Approximate Nearest Neighbors Oh Yeah (ANNOY) is a library for fast approximate nearest neighbour search. ANNOY builds a forest of trees by random projections: each node splits the feature space with a random hyperplane, so a query only has to be compared against the items in a few leaves rather than against the whole database. You can create an AnnoyIndex for faster retrieval as shown here:

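import os
from annoy import AnnoyIndex

def create_annoy(target_features):
    # The index needs the feature dimension and a distance metric up front.
    t = AnnoyIndex(layer_dimension, 'angular')
    for idx, target_feature in enumerate(target_features):
        t.add_item(idx, target_feature)
    t.build(10)  # number of trees
    t.save(os.path.join(work_dir, 'annoy.ann'))

create_annoy(target_features)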

The dimension of the features is required to create the index. The items are then added to the index and the trees are built. The more trees there are, the more accurate the results will be, at the cost of more build time and a larger index. Once created, the index can be saved to disk and loaded back into memory. The index can be queried as shown here:

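# The dimension and metric must match the ones used to build the saved index.
annoy_index = AnnoyIndex(layer_dimension, 'angular')
annoy_index.load(os.path.join(work_dir, 'annoy.ann'))
# Indices of the 20 items closest to the query feature.
matches = annoy_index.get_nns_by_vector(query_feature, 20)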

The query returns the indices of the matching items, which can then be used to look up the image details.
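To see why trees built from random projections narrow the search, the idea can be sketched in plain Python without the library. This is only an illustration under simplifying assumptions, not ANNOY's actual implementation: the helper names (build_tree, leaf_for, query_forest), the leaf size of 8, and the random test data are all made up for the example.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def build_tree(points, ids, rng, leaf_size=8):
    """Split ids recursively with random hyperplanes until leaves are small."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    # Pick two items at random; the hyperplane lies halfway between them.
    a, b = (points[i] for i in rng.sample(ids, 2))
    normal = [x - y for x, y in zip(a, b)]
    offset = dot(normal, [(x + y) / 2 for x, y in zip(a, b)])
    left = [i for i in ids if dot(normal, points[i]) < offset]
    right = [i for i in ids if dot(normal, points[i]) >= offset]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return {"leaf": ids}
    return {"normal": normal, "offset": offset,
            "left": build_tree(points, left, rng, leaf_size),
            "right": build_tree(points, right, rng, leaf_size)}

def leaf_for(tree, query):
    """Walk down one tree and return the ids stored in the leaf reached."""
    node = tree
    while "leaf" not in node:
        below = dot(node["normal"], query) < node["offset"]
        node = node["left"] if below else node["right"]
    return node["leaf"]

def query_forest(forest, points, query, n):
    """Pool the candidates from every tree, then rank them by true distance."""
    candidates = set()
    for tree in forest:
        candidates.update(leaf_for(tree, query))
    return sorted(candidates, key=lambda i: sq_dist(points[i], query))[:n]

# Made-up data: 500 random 8-dimensional vectors and a forest of 10 trees.
rng = random.Random(42)
points = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(500)]
forest = [build_tree(points, list(range(len(points))), rng) for _ in range(10)]
nearest = query_forest(forest, points, points[0], 5)
print(nearest)  # item 0 comes first because the query is item 0 itself
```

A single tree hands back at most one leaf of candidates, so a true neighbour that fell on the other side of a split is missed. Pooling the leaves of several trees makes that less likely, which mirrors the accuracy versus time and space trade-off of the number of trees. The real library is more elaborate (for example, it explores several branches per tree using a priority queue), so treat the sketch only as an aid to intuition.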

Advantages of ANNOY

There are many reasons for using ANNOY. The main advantages are listed as follows:

  • It uses a memory-mapped data structure, so it is less demanding on RAM, and the same index file can be shared among multiple processes.
  • Several distance metrics, such as Manhattan, cosine, or Euclidean, can be used to compute the similarity between the query image and the target database.
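To make the difference between these distances concrete, here is a small plain-Python sketch that computes them for a pair of vectors. The function names are illustrative and the vectors are assumed to be non-zero. For the cosine case, the sketch follows the formula that the Annoy documentation gives for its 'angular' metric, sqrt(2(1 - cos(u, v))). In the library itself the metric is selected by name when the index is created, for example AnnoyIndex(layer_dimension, 'euclidean').

```python
import math

def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def angular(u, v):
    """Cosine-based distance, sqrt(2 * (1 - cos)); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = dot / (norm_u * norm_v)
    # Clamp at zero to absorb tiny negative values from rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cosine)))

u, v = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # same direction, different length
print(euclidean(u, v), manhattan(u, v), angular(u, v))
```

The two vectors point in the same direction, so the angular distance is essentially zero while the Euclidean and Manhattan distances are not. A cosine-based metric is therefore a reasonable choice when only the direction of a feature vector matters and its magnitude does not.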