Efficient retrieval

Retrieval can be slow because it is a brute-force method: the query is compared against every item in the database. Matching can be made faster using approximate nearest neighbor search. The curse of dimensionality also kicks in, as shown in the following figure: With every increasing dimension, complexity increases as the...
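
One family of approximate nearest neighbor methods is locality-sensitive hashing. The following is a minimal sketch under stated assumptions, not necessarily the method this text goes on to use: features are plain Python lists, the toy data is random, and the names `make_hyperplanes`, `hash_vector`, `build_index`, and `approx_nearest` are hypothetical. Vectors are bucketed by the signs of a few random projections, so a query is compared only against the vectors in its own bucket instead of the whole database.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def hash_vector(vec, planes):
    """Hash a vector to a tuple of bits: one sign bit per hyperplane."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def build_index(vectors, planes):
    """Offline step: group vector indices by hash bucket."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(hash_vector(vec, planes), []).append(idx)
    return buckets

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def approx_nearest(query, vectors, planes, buckets):
    """Online step: scan only the query's bucket (None if it is empty)."""
    candidates = buckets.get(hash_vector(query, planes), [])
    return min(
        candidates,
        key=lambda i: squared_distance(query, vectors[i]),
        default=None,
    )

# Toy data (made up for illustration): 200 random 16-dimensional features.
rng = random.Random(42)
vectors = [[rng.uniform(-1.0, 1.0) for _ in range(16)] for _ in range(200)]
planes = make_hyperplanes(num_bits=4, dim=16)
buckets = build_index(vectors, planes)

# Querying with a stored vector finds that same vector in its own bucket.
match = approx_nearest(vectors[3], vectors, planes, buckets)
```

The trade-off is that the true nearest neighbor can land in a different bucket and be missed, which is why such methods are called approximate.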

Building the retrieval pipeline

The sequence of steps to get the best matches from target images for a query image is called the retrieval pipeline. The retrieval pipeline has multiple steps or components. The features of the image database have to be extracted offline and stored in a database. For...

Content-based image retrieval

Content-based image retrieval (CBIR) takes a query image as input and outputs a ranking of the images in a database of target images. CBIR is an image-to-image search engine with a specific goal. A database of target images is required for...
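
The ranking step can be sketched as follows, assuming each image has already been reduced to a feature vector. The feature values, the file names, and the choice of cosine similarity as the matching score are all assumptions made for illustration; how the features are extracted is not shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_images(query_feature, database):
    """Return image names ordered from most to least similar to the query."""
    return sorted(
        database,
        key=lambda name: cosine_similarity(query_feature, database[name]),
        reverse=True,
    )

# Hypothetical target database: image name -> precomputed feature vector.
database = {
    "cat_1.jpg": [0.9, 0.1, 0.0],
    "dog_1.jpg": [0.1, 0.9, 0.1],
    "car_1.jpg": [0.0, 0.1, 0.9],
}
query_feature = [0.8, 0.2, 0.0]
ranking = rank_images(query_feature, database)
```

Here the query feature points in almost the same direction as the first entry, so that image is ranked first.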

Classification

Image classification is the task of labelling the whole image with an object or concept, together with a confidence score. The applications include gender classification given an image of a person’s face, identifying the type of pet, tagging photos, and so on. The following is an...

Deep learning for computer vision

Computer vision enables the properties of human vision on a computer. A computer could be in the form of a smartphone, drone, CCTV camera, MRI scanner, and so on, with various sensors for perception. The sensor produces images in a digital form that has to be interpreted by...

Long short-term memory (LSTM)

Long short-term memory (LSTM) can store information for longer periods of time, and hence, it is efficient in capturing long-term dependencies. The following figure illustrates how an LSTM cell is designed: LSTM has several gates: forget, input, and output....
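
To make the role of the gates concrete, here is a minimal sketch of one step of an LSTM cell with scalar inputs and states. It follows the commonly used LSTM formulation, which is an assumption here since the cell's design is only shown in the figure; the function `lstm_step` and the parameter names (`w_f`, `u_f`, `b_f`, and so on) are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps parameter names to weights."""
    # Each gate is a value in (0, 1) computed from the input and previous output.
    forget_gate = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])
    input_gate = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])
    output_gate = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])
    # Candidate value that may be written into the cell state.
    candidate = math.tanh(p["w_c"] * x + p["u_c"] * h_prev + p["b_c"])
    # Forget part of the old state, then add the gated candidate.
    c = forget_gate * c_prev + input_gate * candidate
    # The output gate decides how much of the state is exposed.
    h = output_gate * math.tanh(c)
    return h, c

# All parameters zero except a large forget-gate bias (illustrative values).
params = {f"{kind}_{gate}": 0.0 for kind in "wub" for gate in "fioc"}
params["b_f"] = 10.0

h, c = lstm_step(x=0.0, h_prev=0.0, c_prev=2.0, p=params)
```

With the forget gate close to 1, the previous cell state passes through almost unchanged, which is how the cell can hold information over many steps.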

Recurrent neural networks (RNN)

Recurrent neural networks (RNN) can model sequential information. They do not assume that the data points are independent. They perform the same task for every element of a sequence, with each output depending on the results from the previous elements. This can also be thought of as memory. RNN cannot...
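
The idea of applying the same operation at every step while carrying a state forward can be sketched with a scalar recurrence. This is a minimal illustration assuming a simple tanh recurrence; the function name `rnn_forward` and the weight values are hypothetical.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Apply the same update to every element, carrying the hidden state."""
    h = h0
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are still non-zero:
# the hidden state carries information forward through the sequence.
states = rnn_forward([1.0, 0.0, 0.0], w_x=1.0, w_h=1.0, b=0.0)
```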

Max pooling in Convolutional neural network

Pooling layers are placed between convolution layers. Pooling layers reduce the size of the image across layers by sampling. In max pooling, the sampling is done by selecting the maximum value in a window; average pooling instead averages over the window. Pooling also acts as a...
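
Both kinds of pooling can be sketched on a small 2D list. This is an illustration only, assuming non-overlapping square windows (the window moves by its own size); the helper names `pool2d` and `mean` are hypothetical.

```python
def pool2d(image, size, reduce_fn):
    """Reduce each non-overlapping size x size window to a single value."""
    rows, cols = len(image), len(image[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            window = [
                image[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            ]
            pooled_row.append(reduce_fn(window))
        pooled.append(pooled_row)
    return pooled

def mean(values):
    return sum(values) / len(values)

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
max_pooled = pool2d(image, 2, max)   # [[6, 8], [14, 16]]
avg_pooled = pool2d(image, 2, mean)  # [[3.5, 5.5], [11.5, 13.5]]
```

A 4 x 4 input becomes a 2 x 2 output in both cases, which is the size reduction described above.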

Kernel in Convolutional neural network

The kernel is the parameter that a convolution layer uses to convolve the image. The convolution operation is shown in the following figure: The kernel has two parameters, called stride and size. The size can be any rectangular dimension. Stride is the number of...
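
The effect of kernel size and stride on the output can be sketched in plain Python. This is a minimal illustration under some assumptions: no padding is applied, the kernel is not flipped (as is usual in deep learning libraries), stride is taken to be the number of positions the kernel moves at each step, and the function name `convolve2d` is hypothetical.

```python
def convolve2d(image, kernel, stride=1):
    """Slide the kernel over the image and sum the elementwise products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    # Number of kernel positions that fit along each axis without padding.
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            r0, c0 = i * stride, j * stride
            row.append(sum(
                image[r0 + a][c0 + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            ))
        output.append(row)
    return output

image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[1, 0], [0, -1]]  # 2 x 2 kernel: top-left minus bottom-right

stride_1 = convolve2d(image, kernel, stride=1)  # 3 x 3 output
stride_2 = convolve2d(image, kernel, stride=2)  # 2 x 2 output
```

A larger stride visits fewer positions, so the output shrinks from 3 x 3 to 2 x 2 for the same image and kernel.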

Convolutional neural network

Convolutional neural networks (CNN) are similar to the neural networks described in the previous sections. Like them, CNNs have weights and biases, and produce outputs through a nonlinear activation. Regular neural networks take inputs, and their neurons are fully connected to the next layers....

Playing with TensorFlow playground

TensorFlow playground is an interactive visualization of neural networks. Visit http://playground.tensorflow.org/ and experiment by changing the parameters to see how the previously mentioned terms work together. Here is a screenshot of the playground: ...

Stochastic gradient descent

Stochastic gradient descent (SGD) is the same as gradient descent, except that each training step uses only part of the data. The parameter that controls how much is called the mini-batch size. Theoretically, even one example can be used for training. In practice, it is better to experiment with various numbers. In...
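
Mini-batch training can be sketched by fitting a line with a mean squared error loss. This is an illustration under stated assumptions: the data is synthetic and noise-free, the learning rate and epoch count are arbitrary choices, and the function name `sgd_linear_fit` is hypothetical.

```python
import random

def sgd_linear_fit(xs, ys, batch_size, learning_rate=0.2, epochs=200, seed=0):
    """Fit y = w * x + b, updating on one mini-batch at a time."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            errors = [(w * xs[i] + b) - ys[i] for i in batch]
            # Gradient of the mean squared error over this mini-batch only.
            grad_w = 2.0 * sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = 2.0 * sum(errors) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b

# Synthetic data on the line y = 2x + 1.
xs = [i / 19 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys, batch_size=4)
```

Setting `batch_size=1` gives the single-example case mentioned above, and `batch_size=len(xs)` reduces to ordinary gradient descent.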

Softmax in Neural Network

Softmax is a way of forcing the outputs of a neural network to sum to 1. The output values of the softmax function can therefore be treated as a probability distribution. This is useful in multi-class classification problems. Softmax is a kind...
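
The normalization can be sketched in a few lines using the standard definition, where each output is the exponential of its input divided by the sum of all the exponentials. Subtracting the maximum before exponentiating is a common numerical safeguard added here; the input scores are made up.

```python
import math

def softmax(scores):
    """Map raw scores to positive values that sum to 1."""
    shift = max(scores)  # avoids overflow; does not change the result
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
```

The largest score gets the largest probability, so taking the index of the maximum gives the predicted class in a multi-class problem.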