In this article, an R implementation of locality sensitive hashing will be used for fast approximate nearest neighbor search in images. The idea of document retrieval using LSH appears as one assignment in the Coursera Course Machine Learning Clustering and Retrieval. As kd-tree based implementation of ANN search does not scale well with high dimensional data (such as text data), LSH provides an alternative implementation.
A face image is chosen from the face dataset as shown below and LSH query is used to retrieve the top 25 nearest neighbors (images) that are most similar to the query image.The query image face is oriented to the left so the nearest negihbors of this image is expected to be oriented to the left as well.
The next figure shows the paritions induced by the LSH training process on the face images, images in the same partition (bin) has the same color, with a particular run of LSH the following 22 bins were obtained.
As can be seen from the above figure, that the top 25 query results become more relevant as the search radius grows.
As we increase the search radius, we find more neighbors that are more similar to the query image.
With increased search radius comes a greater number images that have to be searched. Query time is higher as a consequence.
With sufficiently high search radius, the results of LSH begin to resemble the results of brute-force search.
As shown in the next figure, the impact of search radius on the quality metrics for the neighbors retrieved are examined.
As can, be seen from the above figures, the top 25 similar images obtained using LSH query starts to resemeble the true nearest neighbors of the query image, as the search radius increases.
The next animation shows how the nearest neighbors change with LSH search radius, when queried with an image of 0.