Methods of imaging based on wavelet retrieval of scenes6751363Abstract Methods of imaging objects based on wavelet retrieval of scenes utilize wavelet transformation of plural defined regions of a query image. By increasing the granularity of the query image to greater than one region, accurate feature vectors are obtained that allow for robust extraction of corresponding regions from a database of target images. The methods further include the use of sliding windows to decompose the query and target images into regions, and the clustering of the regions utilizing a novel similarity metric that ensures robust image matching in low response times. Claims We claim: Description BACKGROUND OF THE INVENTION
RESOLUTION AVERAGES DETAIL COEFFICIENTS
2 [2, 2, 5, 7] --
1 [2, 6] [0, 1]
0 [4] [2]
The wavelet transform for the original image with four pixels is defined as the single coefficient representing the overall average of the pixel values followed by the detail coefficients in the order of increasing resolution. Thus, the one-dimensional Haar wavelet transform for the original image is given by: I'=[4, 2, 0, 1]. Each entry in I' is called a wavelet coefficient. Using the wavelet transform of an image rather than the image itself has a number of advantages. One such advantage is that a large number of the detail coefficients tend to be very small values. Truncating these small values from the transform introduces only small errors in the reconstructed image, thereby yielding a form of "lossy" image compression. The wavelet coefficients in the above example carry different weights with respect to their importance to the reconstructed image. For example, the overall average of the whole data set is more important than any of the detail coefficients because it affects the entire range of reconstructed values. In order to equalize the importance of all coefficients, the final wavelet coefficients must be normalized appropriately. Normalization is achieved by dividing each wavelet coefficient by 2.sup.i, where i denotes the index of the approximation level in which the coefficient appears (where level 0 is the finest resolution level). Thus, the wavelet transform for the previous example becomes, after normalization: I'=[4,2,0,1/2] There are two preferred methods in accordance with the invention, by which wavelets can be used to transform the pixel values in a two-dimensional image. Each of these transforms is a two-dimensional generalization of the one-dimensional wavelet transform described above. The first is called "standard decomposition." In this method, the one-dimensional wavelet transform is first applied to each row of pixel values; this operation produces an average value along with detail coefficients for each row. Next, these transformed rows are treated as if they were themselves an image and the one-dimensional transform is applied to each column; the resulting values are all detail coefficients except for a single overall average coefficient. The second, more preferred transform is called "non-standard decomposition." In this method, one step of horizontal pairwise averaging and differencing is performed on the pixel values in each row of the image. Next, vertical pairwise averaging and differencing is applied to each column of the result. This process is repeated recursively only on the quadrant containing averages in both directions. The procedure for computing the wavelet transform W for a w.times.w image I using non-standard decomposition is illustrated below in C++ code, although it will be apparent that other computer languages may instead be used: ##EQU2## In applying the non-standard decomposition method of the present invention in a square coordinate system, the coordinate of the upper left corner pixel of I is [1,1], the lower left corner pixel is [1, n] and the upper right corner pixel is [n, 1]. Each horizontal, followed by vertical, pairwise averaging and differencing involves pixels in the 2.times.2 box rooted at coordinates [2i-1,2j-1] for 1.ltoreq.i, j.ltoreq.w/2. The horizontal and vertical averaging of pixels in each 2.times.2 box rooted at [2i-1,2j-1] results in 4 new pixel values and these are computed in program Steps 3-6 above. The upper left value (computed in Step 3) is the average of the 4 pixel values and is stored in a new w/2.times.w/2 temporary. matrix A whose primary purpose is to store averages on which the above averaging and differencing process is to be recursively applied. The remaining 3 new pixel values denoting the upper right (Step 4), lower left (Step 5) and lower right (Step 6) pixels are assigned to pixel [i, j] in the upper right, lower left and lower right w/2.times.w/2 quadrants of W, respectively; these are the detail coefficients. Once the averages for all of the 2.times.2 boxes have been computed, the computeWavelet procedure is recursively invoked on A in Step 9, which is simply a subroutine that computes the Haar wavelets in a manner similar to that performed above. Note that the size of A is w/2.times.w/2 --thus, if w=2 then A has a single average value which is the average value of all of the pixels in I and is thus assigned to W[1,1] in Step 11. As mentioned above, to normalize the coefficients for the one-dimensional Haar transform, the coefficients in the i-th band are divided by 2.sup.i. For the two-dimensional Haar transform, this factor becomes 2.sup.i. Referring now to FIGS. 2A and 2B, an 8.times.8 image I is there shown and the non-standard decomposition method is applied. The matrix I' is the result of the first horizontal and vertical pairwise averaging and differencing on the 2.times.2 boxes of the original image. The average 2.5 of the values in each box is assigned to the 2.times.2 matrix A, while the detail coefficients are stored in the three 2.times.2 quadrants of the W matrix. The process is then recursively performed a second time on the averages contained in the A matrix, resulting in detail coefficients of 0 and an average value of 2.5, which is stored in the upper-left corner pixel of the W matrix. To begin the actual imaging process in accordance with the invention, in the first instance each image is broken into sliding windows (step 130 of FIG. 1) (which may overlap) with different sizes ranging from w.sub.min.times.w.sub.min to w.sub.max.times.w.sub.max. As the signature for each window, the s.sup.2 coefficients are used from the lowest frequency band of the Haar wavelet transform for the window--that is, the s.times.s matrix at the upper left corner of the wavelet transform for the window. To cluster the sliding windows (step 140) in the image, the Euclidean distance between their respective signatures is preferably used as the distance metric. Each cluster thus contains a set of similar windows which together define a region. The centroid of the cluster can be used as the signature for the region; alternately, the bounding box of all signatures in the cluster can be used as the signature as mentioned above. The query image is thereby decomposed into a number of regions. To perform region matching (step 180), for each region of the query image a spatial index is used to find all regions in the database that are similar--, that is, regions whose signatures are within .epsilon. distance of a region of the query. As mentioned above, any database may be used to index the regions. In the preferred embodiment, the R*-tree spatial index (to be described in further detail below) is utilized for spatial indexing and building of the target database. In the image indexing phase, the regions of each image in the database computed in the previous step are preferably indexed using their signatures. The previous step computes all of the pairs of matching regions (Q.sub.i,T.sub.1) for the query image Q and each target image T in the database. This information is used to compute the best similar region pair set for Q and T, i.e. the one that covers the maximum area in the two images and thus maximizes the similarity measure between Q and T as defined above, thereby matching Q to T. It should be noted that for target images T whose similarity to Q exceeds the threshold, steps 1-4 can be repeated with more detailed signatures (that is, by retaining a larger number of coefficients from the wavelet transform) to further improve the quality of matching. In accordance with the inventive methods, a dynamic programming algorithm to efficiently compute wavelet signatures for sliding windows in an image has been developed. This algorithm has similarly been implemented in C++ code, although any programming language can be used. An image is a two dimensional array of pixel values. Images can be represented using a number of different color spaces: i.e. RGB, HSV and YIQ, each of which store 3 color components for every pixel. The dynamic algorithm of the present invention for a single color value per pixel is described below, and the extension of the algorithm to multiple color channels is straightforward. The time complexity of computing the wavelet decomposition for a window is linear in the number of pixels in the window. That is, for a square window of width w, the complexity of computing the wavelet coefficients is O(w.sup.2) For an n.sub.1.times.n.sub.2 image and window size of w, the total number of windows in the image is (n.sub.1 -w)(n.sub.2 -w). Thus, the complexity of naively computing wavelet coefficients of windows rooted at every pixel in the image becomes O(w.sup.2 (n.sub.1 -w)(n.sub.2 -w)), which could be prohibitive. Instead, in accordance with the inventive methods, the dynamic programming algorithm incrementally computes the coefficients for larger windows using those computed for smaller windows, thereby reusing the computations performed for smaller windows. In particular, and assuming computed signatures for windows of size w/2.times.w/2, it is possible to compute signatures for windows of size w.times.w using the signatures of the smaller windows of size w/2.times.w/2. The algorithm can be visualized graphically by first referring to FIGS. 3A, 3B and 3C in which the process for computing a wavelet transform W for a single window is depicted. Let I (FIG. 3A) be the w.times.w window whose wavelet transform W it is desired to compute and let I.sub.1, I.sub.2, I.sub.3 and I.sub.4 denote the four smaller w/2.times.w/2 windows contained therein. Further, let W.sub.1, W.sub.2, W.sub.3 and W.sub.4 (FIG. 3B) be the wavelet transforms for I.sub.1, I.sub.2, I.sub.3, and I.sub.4. Consider a further decomposition of each of W.sub.1, W.sub.2, W.sub.3 and W.sub.4 into quadrants, and label them by 1, 2, 3 and 4, as shown FIG. 3C. The quadrants 2, 3 and 4 denote the upper right, lower left and lower right detail coefficients generated by the first round of horizontal and vertical averaging and differencing on pixels in each of I.sub.1, I.sub.2, I.sub.3 and I.sub.4 by the computeWavelets procedure described above for Haar wavelets. Thus, were the computeWavelets procedure applied to I, then after the first round of averaging and differencing, the detail coefficients in the upper right, lower left and lower right quadrants of W would have values as shown in the right hand matrix. Simply stated, the detail coefficients in each quadrant of W are a combination of the detail coefficients from the corresponding quadrant of each of W.sub.1, W.sub.2, W.sub.3 and W.sub.4. Assume now that A is the w/2.times.w/2 matrix of average values of 2.times.2 boxes of pixels in I after the first round of averaging and differencing. Thus, the upper left quadrant of W is A, and A consists of averages of 2.times.2 boxes from I.sub.1, I.sub.2, I.sub.3 and I.sub.4 after the first round of averaging. Then, since quadrant 1 in each of W.sub.1, W.sub.2, W.sub.3 and W.sub.4 are the wavelet transforms of the averages of 2.times.2 boxes in I.sub.1, I.sub.2, I.sub.3 and I.sub.4, respectively, after the first round of averaging, W.sub.1 [1], . . . , W.sub.4 [1] are the wavelet transforms for the four quadrants of A. Since the upper left quadrant of W is the wavelet transform for A, it is simply necessary to repeat the earlier steps with W.sub.1 [1], W.sub.2 [1], W.sub.3 [1] and W.sub.4 [1] as the wavelet transforms quadrants that comprise A and the upper left quadrant of W as the target where the wavelet transform for A is to be stored. The above-described recursive process terminates when W.sub.1 [1], W.sub.2 [1], W.sub.3 [1] and W.sub.4 [1] each contain only a single value which is the average of all of the pixel values in I.sub.1, I.sub.2, I.sub.3 and I.sub.1, I.sub.2, I.sub.3 and I.sub.4, respectively. At this point, values for the four upper left pixels of W, W.sub.1 [1,1], W.sub.2 [1,2], W.sub.3 [2,1] and W.sub.4 [2,2] can be computed by performing vertical averaging over the four averages in W.sub.1 [1], . . . , W.sub.4 [1] as described in the computeWavelets procedure. FIGS. 4A through 4C, depict the wavelet transforms for the four quadrants of I. In accordance with the inventive methods, the wavelet transform W for I is computed from the wavelet transforms for its four quadrants. First, the detail coefficients are obtained for the three non-upper left quadrants of W by copying them from their respective positions in the four smaller quadrants as described with respect to FIG. 3A through 3C. W is shown in FIG. 4B. The coefficients in the upper left quadrant of W are then computed from the average values for pixels in each of the four quadrants (stored in the upper left corner pixel of each quadrant), and the resulting W is shown in FIG. 4C. The procedure for computing the wavelet transform for a w.times.w window from the wavelets for its four subwindows is shown below in the C++ programming language: procedure computeSingleWindow (W.sub.1, W.sub.2, W.sub.3, W.sub.4, W, w) ##EQU3## The terms W.sub.1, W.sub.2, W.sub.3 and W.sub.4 are the wavelet coefficients for the four w.times.w subwindows of the window whose wavelet transform it is desired to compute, and w is the target for the computed wavelet coefficients for the w.times.w window. If w is 2, then in steps 2-5, it is preferable to perform horizontal and vertical averaging and differencing on the upper left pixel of each of W.sub.1, W.sub.2, W.sub.3 and W.sub.4. Otherwise, the copyBlocks procedure of step 8 above is invoked which copies the upper right, lower left and lower right quadrants of W.sub.1, W.sub.2, W.sub.3 and W.sub.4 to the corresponding quadrants of W as described with respect to FIG. 3. The procedure then calls itself recursively to compute the coefficients for the upper left w/2.times.w/2 quadrant of W using the coefficients in the upper left w/2.times.w/2 quadrants of W.sub.1, W.sub.2, W.sub.3 and W.sub.4. It should be noted that since only the s.times.s signature for each window is of interest, it is only necessary to compute coefficients for the upper left s.times.s matrix of W. From the foregoing discussion on the computation of the wavelet transform for a single window, it follows that this can be computed by recursively invoking the copyBlocks procedure on the s/2.times.s/2 upper left coefficients of the wavelet transform for the four subwindows in the window. Thus, invoking procedure computeSingleWindow with w=s computes the upper left s.times.s wavelet matrix for W with the upper left s/2.times.s/2 wavelet matrices of its four subwindows W.sub.1, W.sub.2, W.sub.3 and W.sub.4 as the starting point. The procedure computeSingleWindow computes the wavelet signature for a single window. Dynamic programming methods of the present invention compute signatures for multiple sliding windows in an n.sub.1.times.n.sub.2 image. A procedure called computeSlidingWindows of the present invention computes s.times.s signatures for all sliding windows in an image whose sizes are a power of 2 and do not exceed w.sub.max. Procedure computeSlidingWindows also accepts as an input parameter the number of pixels t by which to slide each window, where t is the horizontal/vertical distance between the upper left pixels of any two adjacent windows. The parameters s, w.sub.max and t are all required to be powers of 2. This procedure, in C++, is as follows: procedure ComputeSlidingWindows s, w.sub.max, t ##EQU4## In a single iteration of the outermost loop in Step 1, wavelet signatures are computed for all w.times.w windows in the image. In each successive iteration, w is doubled, beginning with an initial value of 2 for w until it reaches the maximum window size w.sub.max. The wavelet signature for a window of size w and whose upper left pixel is at [i, j] is stored in W.sup.w [i, j]. The wavelet signature for each w.times.w window is computed using those computed in the previous iteration for the four w/2.times.w/2 subwindows in it. W.sup.w [i, j] for every 1.times.1 window rooted at pixel [i, j] is initialized to the value of the pixel [i, j] (that is, the raw image intensity at a pixel is the signature for the 1.times.1 window containing the pixel). In Step 2, dist, the distance between any two successive sliding windows is set to be the minimum of t and the window width w; this is for alignment purposes and to ensure that for any w.times.w window, the wavelet signatures for its four w/2.times.w/2 subwindows have been computed in the previous iteration. In order to show that this is indeed the case, consider two cases: if w.ltoreq.t then dist is w and, during the previous iteration, dist must have been w/2. Thus, it follows that wavelet signatures for all windows comprising the window would have been computed in the previous iteration. On the other hand, if w>t, then dist is the same during the previous iteration and the current iteration, and is equal to t. Since t and w are both multiples of 2, it follows that w/2 is a multiple of t and therefore the wavelet signatures for the four w/2.times.w/2 windows would have been computed in the previous iteration. In an n.sub.1.times.n.sub.2 image, there are ##EQU5## possible w.times.w windows at a distance dist apart and rooted at a single row of pixels; similarly there can be ##EQU6## windows rooted at a single column of pixels. Thus, for the loops of steps 3 and 4, the coordinates [x, y] for the upper left corner pixel for each window is varied from 1 to n.sub.1 -w+1 at increments of dist in the horizontal direction, and from 1 to n.sub.2 -w+1 with increments of dist in the vertical direction. Finally, in Step 7, the procedure computeSingleWindow is invoked to compute the signature for the w.times.w window rooted at [x, y] from signatures for its four w/2.times.w/2 subwindows and the size of the signature to be computed, min (w, s), is passed as a parameter. Therefore, to generate signatures for all windows with sizes that are a power of 2 and range from w.sub.min.times.w.sub.min to w.sub.max.times.w.sub.max, passing w.sub.max as the input parameter to the dynamic programming algorithm generates all of the desired signatures. It will be noted that signatures for windows whose size w is less than w.sub.min can be deleted once they have been used to compute the signatures for windows of size 2w. The number of windows of size w.times.w for which wavelet signatures are computed during a single iteration of the procedure computeSlidingWindows is ##EQU7## Furthermore, O(s.sup.2) operations are required to compute the s.times.s wavelet signature for each window. The computation of most of the signature coefficients simply involves simply copying them from the signature computed for a window of width w/2. Only the four lowest band coefficients require pairwise averaging and differencing to be performed. Thus, the overall time complexity of computing signatures for windows of a given w is O(NS), where N=n.sub.1 n.sub.2 and S=s.sup.2. Since it is desired to compute signatures for log.sub.2 w.sub.max window sizes ranging from 2 to w.sub.max, the overall time complexity of computing signatures for all of the sliding windows is O(N S log.sub.2 w.sub.max). For each window, it is further desired to obtain a two-dimensional array of size s.times.s to store its signature. Thus, for all windows of a size w, N*S memory is needed where N=n.sub.1 n.sub.2 and S=s.sup.2. The computation of the signature for a new, bigger window requires just the previously computed signatures for smaller subwindows. Consequently, signatures are needed for only two consecutive window sizes need to fit in main memory for the purpose of efficiently computing signatures without incurring disk I/O. The total memory space requirements for the dynamic programming algorithm of the present invention is therefore 2N*S. The memory requirements to compute signatures for windows can actually be reduced from 2N*S to simply N*S. It will be recognized that each window of size w/2.times.w/2 participates in the computation of signatures for exactly four windows of size w.times.w (except near the image borders). Moreover, if windows are processed from top to bottom, and left to right, then when computing W.sup.w [i,j] the three other w.times.w windows that use W.sup.2 [i, j] have already been processed and W.sup.w/2 [i, j] can therefore be deleted once W.sup.w [i, j] has been computed. The total auxiliary memory space required by the procedure computeSlidingWindows thus advantageously becomes exactly N*S. One other way to compute the wavelet signatures for all windows of a given size w is to first compute all of the wavelet coefficients for each window and then truncate the higher frequency bands that are not of interest. This is a so called "naive" algorithm whose strength is its reduced memory requirements which come about from processing one window at a time so that only enough memory to store the wavelet transform for a single window of size w.sup.2 is needed. Thus, the prior art naive algorithm requires only w.sup.2 memory space as compared to the N*S space required by the dynamic programming algorithm. However, the time complexity of the naive algorithm can be much higher since computing the wavelet transform of each w.times.w window requires O(w.sup.2) operations and there are O(N) such windows. Thus, the overall time complexity of the naive algorithm is O(Nw.sup.2.sub.max) which can be much worse than the O(NSlog.sub.2 w.sub.max) complexity for the inventive dynamic programming algorithm since, typically, signatures are much smaller than the windows themselves and thus s<<w.sub.max. The number of sliding windows generated by the previous steps can be quite large--there are for example nearly 1000 64.times.64 windows at a spacing of 2 pixels in a 128.times.128 pixel image. The storing of signatures for every sliding window for every image could therefore be prohibitively expensive in terms of both storage and processing costs. One way to reduce this overhead is to cluster the similar windows within an image and store a single representative signature for all of the windows in a cluster. Since an image may generate a sizeable number of sliding windows, in order to guarantee low response times in the clustering steps 130, 200 of FIG. 1, clustering algorithms with linear time complexity are desired. Most clustering algorithms in the prior art have at least quadratic complexity. Moreover, since it is highly desired to ensure that a cluster contains windows that are fairly alike, it is preferable to be able to specify a threshold on the radius of the cluster, i.e. the maximum distance between the center of the cluster and a point in the cluster. The pre-clustering phase taught in T. Zhang, R. Ramakrishnan, and M. Livny, "Birch: An Efficient Data Clustering Method for Very Large Database," Proceedings of the ACM SIGMOD Conference of Management of Data, pp. 103-114 (June 1996), is expressly incorporated herein by reference (hereinafter referred to as "BIRCH") is a state-of-the-art clustering algorithm for large data sets and meets these requirements. This preclustering phase has time complexity that is linear in the input size and accepts an .epsilon..sub.c parameter that is the threshold on cluster size. It stores a compact summary for each cluster in a CF-tree, which is a balanced tree structure similar to an R*-tree. For each successive data point, it traverses the CF-tree to find the closest cluster and, if the point is within distance .epsilon..sub.c of the cluster, it is absorbed into it; otherwise, it starts a new cluster. The threshold parameter .epsilon..sub.c and the signatures of all of the sliding windows in an image are presented as inputs to the BIRCH pre-clustering algorithm. BIRCH then generates a set of clusters each, of whose radius is generally within .epsilon..sub.c. Depending on the complexity of a given image, the number of clusters will vary. For example, simple uniform images tend to have a lot of similar windows while complex images containing many objects and abrupt color transitions can be expected to have a considerable number of different windows. Thus, the number of clusters generally increases with image complexity. Each cluster defines a region of the image. Either the centroid or the bounding box of the signatures for windows in the cluster can be used as the signature for the corresponding region. For each region, a bitmap can be computed for the pixels of the image covered by windows in its cluster. This information is utilized in the final step of the similarity metric of the present invention to compute the area of an image covered by multiple matching (and possibly overlapping) regions. For each region, its signature along with its bitmap is stored in an R*-tree which is an efficient disk-based index structure for the storage and retrieval of high-dimensional data points. It should be noted that in the bitmap for a region, it is not necessary to store a bit for each pixel--instead, a coarse representation can be used in which a single bit is kept for each k.times.k array of pixels, thus decreasing the storage overhead by a factor of k.sup.2. The regions for database images are then preferably stored in a disk-based spatial index. Each region is indexed using either the centroid or the bounding box for signatures in its cluster. Given a query image Q, its regions are extracted using steps identical to those used for the database images. The index is then probed to locate all regions in the database whose signatures are within .epsilon. distance of any of the query's regions. Thus, if signatures are bounding rectangles, then the bounding rectangles of regions in the query image are extended by .epsilon. and the index is then used to locate all overlapping bounding rectangles therein. Since each bounding rectangle in the index represents a region belonging to an image in the database, this step effectively retrieves all database images that have at least one similar region to that of the query. The choice of a good spatial index for indexing region signatures is important to the attainable performance of the inventive methods and algorithms. Most proposed methods explode exponentially with dimensionality, eventually reducing to sequential scanning. The R-tree family of multi-dimensional structures is preferred because they are very effective spatial disk-based indexes. R-tree spatial indexes are taught in A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching," Proceedings ACM SIGMOD, pp. 47-57 (June 1984), which is expressly incorporated herein by reference. Specifically, the R*-tree spatial index taught in N. Beckman, H. P. Kriegel, R. Schneider and B. Seeger, "The R*-tree: An Efficient and Robust Access Method for Points and Rectangles," Proceedings of ACM SIGMOD, pp. 322-331, (May 1990), which is also expressly incorporated herein by reference, exhibits better performance than other R-tree variants and is preferably used in the inventive methods. Like the R-tree, each internal node in an R*-tree stores a minimum bounding rectangle (MBR) for each of its children and MBRs of siblings are permitted to overlap. Nodes are split when they become full; however, rather than just considering the area, the node splitting heuristic in the R*-tree also minimizes the perimeter and overlap of the MBRs. Furthermore, in order to make the shape of the tree less dependent on the order of insertion, when a node becomes full it is not immediately split but, instead, a portion of the node is reinserted from the top level. Due to these two enhancements, the R*-tree generally outperforms the R-tree. The quickest similarity metric to compute is one in which a union of the bitmaps is obtained for the matching regions from Q and T, and the area covered by the regions is then computed. The similarity is then the fraction of the total area of the two images that the computed area for the regions represents. This procedure is very fast (linear time complexity in n) and corresponds to relaxing the requirement that each region appear only once in a similar region pair set. A drawback of this approach, however, is that the same query region may match a number of different regions in the target, making the covered area in the target image large since all of the matched regions get included in the computation of area. This may nevertheless not present a problem since very few regions from the query image match regions in the target and, thus, the covered area in the query itself could be small. The foregoing situation can in any event be remedied by adopting the strict definition of a similar region pair set which restricts the relationship between regions of Q and T to a one-to-one correspondence, i.e. it prohibits a region from appearing multiple times in the similar region pair set. Ideally, it is advantageous to compute the similar region pair set that results in the maximum value for the similarity measure. However, since overlap between regions is permitted, the problem of determining such matching pairs of distinct regions from Q and T that maximize the covered area is a non-trivial problem. Therefore, given matching pairs of regions (Q.sub.1 ; T.sub.1) . . . (Q.sub.n ; T.sub.n, the problem of computing a similar region pair set with the maximum covered area is denoted as NP-hard. This can be shown by the following straightforward reduction from the set cover problem, which can be stated as follows: Given elements e.sub.1, . . . , e.sub.n, and sets L.sub.1, . . . L.sub.m, are there k or less sets such that every element e is contained in one of the k sets? The reduction is carried out by constructing matching regions in Q and R as follows. Consider k non-overlapping windows Q.sub.1, . . . Q.sub.k in Q, each with a very small area .delta.. Each Q.sub.i is a separate region of Q. Construct m regions T.sub.1, . . . , T.sub.m in T, one corresponding to each set L.sub.i as follows. First, for every element e.sub.i there are n.sub.i windows in T, where n.sub.i is the number of sets containing e.sub.i. Then, n.sub.i windows corresponding to e.sub.i are arranged in T as follows. First, windows for two distinct elements have no overlap. Second, any pair of windows for the same element have a very large overlap .gtoreq.A which is almost equal to the area of the window itself (also A>>.delta.). Each region T.sub.i is then constructed to contain one window for each element e.sub.i contained in the corresponding set L.sub.i --as a result, T.sub.i is the union of [L.sub.i ] windows. Finally, every region from Q is allowed to match every region from T. Thus, there are k or less sets containing every element e.sub.i if and only if there is a similar region pair set with area at least nA. The "only if" part is straightforward--simply choose the regions in T corresponding to the k sets that cover the n elements and, since the windows for elements do not overlap and the window for each element has area of at least A, the area of the similar region pair set is at least nA. For the "if" direction, choose the sets corresponding to the k matching regions from T. If the area covered is greater than nA, then because A is chosen to be very large, it must be the case that a window for each element is contained in the chosen regions in T, and, thus, the sets corresponding to them must contain all of the n elements. In the following, a "greedy" heuristic is proposed for computing the similar region pair set with the maximum area. The basic idea is to relatively choose the best pair of matching regions that maximizes the area covered by the regions. Let B.sub.Q and B.sub.T be bitmaps maintained for the two images Q and T, respectively, each initially set to zero. Initially, inSet contains the matching pairs of regions from Q and T, (Q.sub.1 ; T.sub.1); : : : ; (Q.sub.n ; T.sub.n), while outSet, into which the similar region pair set is transformed, is empty. The steps of this greedy heuristic are as follows: 1. Add to outSet the pair (Qi; Ti) from inSet for which area (B.sub.Q.orgate.Qi)+area(B.sub.T.orgate.Ti) is maximum. 2. Delete all pairs containing either Q.sub.i or T.sub.i from inSet. Set B.sub.Q =B.sub.Q.orgate.Qi and BT=BT.orgate.Ti. 3. If inSet is empty, return outSet; otherwise, return to Step 1. The time complexity of the greedy algorithm is O(n.sup.2), where n is the number of matching pairs in inSet. The reason for this is that steps 1 and 2 are repeated at most n times, and the complexity of selecting the best matching pair in Step 1 is O(n). During the image indexing phase, practical considerations force the use of signatures with small sizes, and relatively large .epsilon..sub.c values for the clustering procedure. Smaller signatures reduce the overhead associated with storing keys in indices, and are advantageous since the R-tree family of indices is known to be inefficient for high dimensions. Smaller signatures however, contain less information and, thus, may be less accurate than larger ones. Similarly, a larger value for .epsilon..sub.c results in fewer and larger clusters and, consequently, fewer regions and reduced storage and computation requirements. This also means, however, that windows within a region may be less homogeneous and that the region signatures may be less representative of windows in the region. These considerations, however, are inapplicable when considering only a single target image T whose similarity measure to Q has already been found to exceed the specified threshold. In this case, even with a smaller .epsilon..sub.c for the clustering phase, the number of regions for T can be expected to be fairly small. As a result, storing larger signatures for each region of T becomes feasible and these signatures will easily fit in main memory. It is therefore possible to perform a refined matching phase in which larger wavelet signatures are recomputed for sliding windows in both Q and T, the windows with a smaller .epsilon..sub.c are clustered, and the region and image matching phases on Q and T are performed as described hereinabove. This refined phase can result in a marked improvement in the quality of the matching process due to more accurate signatures and more homogeneous clusters. If, on the other hand, response time is a significant concern, this refined step of image matching can be selectively omitted. In an effort to experimentally demonstrate the performance benefits gained by the inventive dynamic programming methods herein taught and disclosed for computing wavelet signatures for sliding windows, the execution times were measured for the inventive method and for the "naive" algorithm discussed above. Three important factors that affect the performance of the algorithms are image size, sliding window size and signature size. An exemplary image size is 256.times.256 and the sliding window size and signature size are varied. The distance between any two adjacent sliding windows is set to be one pixel. FIGS. 5A and 5B, depict the experimentally determined effects of these parameters on computation times. FIG. 5A shows Execution Time (in seconds) against sliding window time, while FIG. 5B shows Execution Time (in seconds) against signature size. In FIG. 5A the running times of both the inventive and prior art algorithms for computing 2.times.2 signatures as the sliding window size is varied from 2.times.2 to 128.times.128. The numbers 2.sup.i along the x-axis represent a window size of 2.sup.I.times.2.sup.I. Since the sliding window size is always a power of two, the x-axis is shown in log scale. It will be observed that the execution time increases slowly for the inventive dynamic programming algorithm as the window size increases. However, the performance of the "naive" algorithm degrades significantly as the window size grows. The reason for this is that the time complexity of the naive algorithm is proportional to the square of the window size, while the execution time of the inventive dynamic programming algorithm is proportional to the logarithm of window size. For a window size of 128.times.128, the naive algorithm takes almost 25 seconds to run which is about 17 times slower than the inventive dynamic programming algorithm taught herein. Similarly, FIG. 5B shows the execution times for the inventive and prior art algorithms as signature size is increased for a fixed sliding window size of 128.times.128. It can be seen from this graph that the execution time of the dynamic programming algorithm of the present invention increases slowly as signature size is increased, while that of the naive algorithm stays almost constant at 25 seconds. However, even for signatures sized as large 32.times.32, the inventive dynamic programming algorithm is nearly 5 times faster than the naive algorithm. Since typical signature sizes can be expected to be between 2.times.2 and 8.times.8 due to the inability of existing indices to handle high-dimensional data, in most practical situations the inventive dynamic programming algorithm(s) will be far superior to the prior art naive algorithm and, indeed, to most other computationally expensive imaging algorithms. Thus, the methods of imaging based on wavelet retrieval of scenes as disclosed herein efficiently and effectively compute fixed-size, low-dimensional feature signatures independent of resolution, image size and dithering effects. The inventive methods are computationally inexpensive and can therefore be run on general purpose digital computers. Additionally, the algorithms that implement wavelet retrieval in accordance with the invention are very fast as compared to prior art imaging techniques and are able to differentiate and image objects containing complex colors, textures and shapes, as well as multiple images within an object. Such results have not heretofore been achieved in the art. Thus, while there have shown and described and pointed out fundamental novel features of the invention as applied to a preferred embodiment thereof, it will be understood that various omissions and substitutions and changes in the form and details of the devices illustrated, and in their operation, may be made by those skilled in the art without departing from the spirit of the invention. For example, it is expressly intended that all combinations of those elements and/or method steps which perform substantially the same function in substantially the same way to achieve the same results are within the scope of the invention. Moreover, it should be recognized that structures and/or elements and/or method steps shown and/or described in connection with any disclosed form or embodiment of the invention may be incorporated in any other disclosed or described or suggested form or embodiment as a general matter of design choice. It is the intention, therefore, to be limited only as indicated by the scope of the claims appended hereto.
|
Same subclass Same class Consider this |
||||||||||
