Deep learning models are often criticized for being hard to interpret. A neural network is frequently treated as a black box because it is difficult for humans to reason about how it arrives at its output. An image passes through many layers, and the activation functions make each transformation non-linear, so the intermediate representations cannot be visualized easily. Several methods have been developed to address this criticism by visualizing the layers of a deep network. In this section, we will look at these attempts to visualize the deep layers in an effort to understand how a model works.
Visualization can be done using the activations and the gradients of the model. The activations can be visualized using the following techniques:
- Nearest neighbour: The activation of a layer is computed for an image, and the images whose activations are closest to it are displayed together.
- Dimensionality reduction: The dimension of the activation can be reduced by principal component analysis (PCA) or t-Distributed Stochastic Neighbor Embedding (t-SNE) so that it can be visualized in two or three dimensions. PCA reduces the dimension by projecting the values onto the directions of maximum variance. t-SNE reduces the dimension by mapping the points to two or three dimensions such that points that are close in the original space stay close in the embedding. Dimensionality reduction techniques are outside the scope of this book; you are advised to refer to basic machine learning material to learn more about them. Wikipedia is a good source for understanding these techniques.
- Maximal patches: A single neuron is selected, and the image patches that produce its maximum activation are captured.
- Occlusion: The image is occluded (obstructed) at various positions, and the resulting change in activation is shown as a heat map to understand which portions of the image are important.
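The nearest neighbour and PCA techniques from the list above can be sketched with NumPy alone. This is a minimal illustration rather than a definitive implementation: the `activations` array is a randomly generated, hypothetical stand-in for real layer activations, and the helper names `nearest_neighbours` and `pca_project` are assumptions introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for layer activations: 6 images, 8 features each.
# Images 0-2 and images 3-5 form two well-separated clusters.
activations = np.vstack([
    rng.normal(0.0, 0.1, size=(3, 8)),
    rng.normal(5.0, 0.1, size=(3, 8)),
])


def nearest_neighbours(activations, query_index, k=2):
    """Return the indices of the k images whose activations are closest
    (in Euclidean distance) to the activation of the query image."""
    distances = np.linalg.norm(activations - activations[query_index], axis=1)
    order = np.argsort(distances)
    # Skip the query itself, which always has distance zero.
    return [int(i) for i in order if i != query_index][:k]


def pca_project(activations, n_components=2):
    """Project the activations onto the directions of maximum variance."""
    centred = activations - activations.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


neighbours = nearest_neighbours(activations, query_index=0)
projected = pca_project(activations)
print("Nearest neighbours of image 0:", neighbours)
print("2-D coordinates for plotting:\n", projected)
```

With real data, the rows of `activations` would be the flattened outputs of a chosen layer for each image, and the two columns of `projected` could be passed to a scatter plot.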
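The occlusion technique from the list above can be sketched as follows. This is only an illustration under stated assumptions: `toy_score` is a hypothetical stand-in for a model's activation or class score, and the function name `occlusion_heatmap` and the `patch`, `stride`, and `fill` parameters are choices made for this example.

```python
import numpy as np


def occlusion_heatmap(image, score_fn, patch=4, stride=4, fill=0.0):
    """Slide a square occluder over the image and record how much the
    score drops at each position. Larger values mark more important regions."""
    height, width = image.shape[:2]
    baseline = score_fn(image)
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    heatmap = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            top, left = i * stride, j * stride
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            heatmap[i, j] = baseline - score_fn(occluded)
    return heatmap


def toy_score(img):
    """Hypothetical model score: mean brightness of the central 8x8 region."""
    return float(img[4:12, 4:12].mean())


# A 16x16 image with a bright square in the centre.
image = np.zeros((16, 16))
image[4:12, 4:12] = 1.0

heatmap = occlusion_heatmap(image, toy_score)
print(heatmap)
```

In this toy setup only the four occluder positions that overlap the bright centre lower the score, so the heat map highlights the centre. With a trained network, `score_fn` would run a forward pass and return the activation or class probability of interest.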