Content-based image retrieval (CBIR) takes a query image as input and outputs a ranked list of images from a database of target images. CBIR is an image-to-image search engine: the target images with the minimum distance from the query image are returned. We could compute similarity directly on the raw images, but this has the following problems:
- The image has very high dimensionality
- There is a lot of redundancy in the pixels
- A pixel doesn't carry semantic information
So, we train a model for object classification and use the features from the model for retrieval. We then pass the query image and the database of target images through the same model to get their features. Such models are also called encoders, as they encode the information in an image for a particular task. Encoders should be able to capture both global and local features. We can use the models that we studied in the image classification chapter, trained for a classification task. Searching the database may still take a lot of time, as a brute-force or linear scan compares the query against every target image. Hence, methods for faster retrieval are required. Here are some methods for faster matching:
- Locality-sensitive hashing (LSH): LSH projects the features to a lower-dimensional subspace and groups them into buckets, so that similar features are likely to share a bucket. It can give a candidate list first, and a fine ranking on the full features can be done later. Like PCA and t-SNE, which we covered earlier in the chapter, LSH is also a dimensionality reduction technique.
- Multi-index hashing: This method hashes the features into binary codes and, much like pigeonhole fitting, places them in hash tables, which makes the search faster. It uses the Hamming distance to make the computation faster. The Hamming distance between two binary codes is the number of bit positions at which they differ. For example, 1011 and 1001 have a Hamming distance of 1.
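The two-stage idea behind these methods (a cheap binary hash to get a candidate list, followed by a fine ranking on the full features) can be sketched as follows. This is a minimal illustration, not a production index: it assumes random-hyperplane hashing as the LSH scheme, uses random NumPy arrays as hypothetical stand-ins for encoder features, and all names (`targets`, `planes`, `hash_bits`, `n_candidates`) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for encoder outputs: 1,000 target images, 128-d features.
targets = rng.normal(size=(1000, 128))
# The query is a slightly perturbed copy of target 42, so the right answer is known.
query = targets[42] + 0.01 * rng.normal(size=128)

# Random-hyperplane LSH (an assumed scheme): each bit records on which side
# of a random hyperplane a feature vector falls.
n_bits = 16
planes = rng.normal(size=(128, n_bits))

def hash_bits(features):
    """Return a 0/1 code of n_bits for each feature vector."""
    return (features @ planes > 0).astype(np.uint8)

target_codes = hash_bits(targets)  # shape (1000, 16)
query_code = hash_bits(query)      # shape (16,)

# Hamming distance: number of bit positions where two codes differ.
hamming = np.count_nonzero(target_codes != query_code, axis=1)

# Stage 1: keep a short candidate list with the smallest Hamming distances.
n_candidates = 50
candidates = np.argsort(hamming, kind="stable")[:n_candidates]

# Stage 2: fine ranking of the candidates by exact Euclidean distance.
exact = np.linalg.norm(targets[candidates] - query, axis=1)
ranked = candidates[np.argsort(exact)]
print(ranked[:5])
```

Only 50 of the 1,000 targets go through the exact distance computation here. In this synthetic setup, target 42 should come out on top; with real features, the number of bits and the candidate list size control the trade-off between speed and accuracy.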
These methods are faster and need less memory, with the trade-off being accuracy. They also don't capture semantic differences. The matched results can be re-ranked based on the query to get better results, that is, the returned target images are reordered. Re-ranking may use one of the following techniques:
- Geometric verification: This method matches the geometries of the query and target images, and only target images with similar geometries are returned.
- Query expansion: This method expands the query using the top-ranked target images and searches again with the expanded query.
- Relevance feedback: This method gets feedback from the user on the returned results. The re-ranking is then done based on the user input.
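As an illustration of re-ranking, here is a minimal sketch of one common form of query expansion, often called average query expansion: the query feature is combined with the features of its top matches, and the database is searched again with the combined feature. This variant is an assumption and not necessarily the exact procedure meant above; the features are synthetic, and the function names and the `top_k` parameter are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def rank(query, targets):
    """Return target indices sorted from most to least similar to the query."""
    return np.argsort(-(targets @ query), kind="stable")

def rerank_with_query_expansion(query, targets, top_k=5):
    """Combine the query with its top_k matches, then search again."""
    first_pass = rank(query, targets)
    expanded = l2_normalize(query + targets[first_pass[:top_k]].sum(axis=0))
    return rank(expanded, targets)

rng = np.random.default_rng(0)
# Hypothetical features: 20 images of the same object scattered around a
# shared centre, followed by 500 unrelated images.
centre = rng.normal(size=64)
related = centre + 0.5 * rng.normal(size=(20, 64))
unrelated = rng.normal(size=(500, 64))
targets = l2_normalize(np.vstack([related, unrelated]))
query = l2_normalize(centre + 0.5 * rng.normal(size=64))

reranked = rerank_with_query_expansion(query, targets, top_k=5)
print(reranked[:10])
```

The expanded query averages out some of the noise in the single query feature, which is why it can pull in relevant images that the first pass ranked lower. It can also hurt when the first-pass top matches are wrong, so in practice it is often combined with a check such as geometric verification.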
These techniques are well developed for text and can also be used for images. In this chapter, we will focus on extracting features and using them for CBIR.