Geometric Searching

John Barretto
Friday, 20 Apr 2012, 16:00 to 17:30
To describe searching in its simplest abstract setting, suppose we have some accumulated data (called the 'file') and some new data item (called the 'sample'). Then, searching consists of relating the sample to the file. As Knuth pointed out, searching simply means locating the appropriate record (or records) in a  given collection of records.

Although all searching problems share some basic common feature, the geometric setting gives rise to questions of a somewhat unique flavour. First of all, in geometric applications, files are primarily not just 'collections', as in several other information processing activities. Rather, they are frequently representations of more complex structures, such as polygons, polyhedra etc. Even the case of a collection of segments in the plane appears deceptively unstructured, since each segment is associated with its end-point co-ordinates, which are implicitly related by the metric structure of the plane. Moreover, the fact that co-ordinates of points are thought of as real numbers entails that a search may not exactly find an item in the file matching the sample, but rather it will locate the latter in relation to the file. This provides an additional point of contrast between geometric searching and conventional searching.

Now, suppose we have a collection of geometric data, and we want to know if it possesses a certain property (say convexity). In the simplest case, the query will be asked only once, in which case it would be wasteful to do any preconditioning in the hope of speeding up future queries. A one-time query of this type will be referred to as 'single-shot'. Many times, however, queries will be performed repeatedly on the same file. Such queries will be referred to as 'repetitive-mode' queries. In this case, it may be worthwhile to arrange the information into an organized structure to facilitate searching. This can be accomplished only at some additional expense though. Then, our analysis must focus on four separate cost measures, namely -

1) Query Time
2) Preprocessing Time
3) Storage Space
4) Update Time

In my talk, I will try to illustrate the various trade-offs among these cost measures through the following 2 problems -

A) RANGE SEARCHING : Given N points in the plane, how many lie in a given rectangle with sides parallel to the co-ordinate axes? That is, how many points (x,y) satisfy a