In contrast with the unsupervised techniques, supervised learning methods require labeled ground-truth data and a training stage to adapt the system to the task at hand, in this case vessel pixel segmentation. Supervised methods tend to follow the same pattern: the problem is formulated as a binary classification task (vessel vs. non-vessel).
Image features are hand-engineered, and a machine learning classifier is then trained to map those features (such as gradient information, interest point descriptors, or responses to image processing filters like Gabor wavelets) either to the probability that the pixel is a vessel or directly to the binary label (vessel / non-vessel).
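The generic supervised pipeline (per-pixel features, a trained classifier, a per-pixel vessel probability) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not any cited author's method: the two features (raw intensity and a 3x3 local mean), the synthetic patch, and the plain gradient-descent logistic regression are all hypothetical choices made for brevity.

```python
import numpy as np

def pixel_features(img):
    """Hypothetical hand-engineered features: raw intensity and a 3x3 local mean."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    local_mean = sum(padded[dy:dy + h, dx:dx + w]
                     for dy in range(3) for dx in range(3)) / 9.0
    X = np.stack([img.ravel(), local_mean.ravel()], axis=1)   # one row per pixel
    # Standardize with this image's statistics (a real system would reuse training stats).
    return (X - X.mean(axis=0)) / X.std(axis=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit p(vessel | x) = sigmoid(x . w + b) by batch gradient descent on the log-loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y          # gradient of the log-loss w.r.t. the logit
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

# Synthetic patch: a bright 3-pixel-wide "vessel" (as in an inverted green channel)
# on a noisy dark background, together with its ground-truth labels.
rng = np.random.default_rng(0)
img = rng.normal(0.2, 0.05, size=(32, 32))
labels = np.zeros((32, 32))
labels[10:13, :] = 1.0
img[labels == 1] += 0.5

X = pixel_features(img)
w, b = train_logistic(X, labels.ravel())
prob = sigmoid(X @ w + b).reshape(img.shape)   # per-pixel vessel probability
mask = prob > 0.5                              # or threshold directly to a binary map
```

Swapping the feature function or the classifier is the main axis along which the methods discussed here differ; the surrounding structure stays the same.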
For example, [5] computes features from pixel responses to Gaussian derivative filters and then exploits an interesting result: these feature vectors can be “rotated” by multiplication with a linear operator. They use this idea to achieve rotation invariance, since blood vessels appear at all orientations. With the feature vectors in hand, they apply a Support Vector Machine (SVM) as their classifier. One drawback of their approach is that, by learning the appearance of vessels rather than modelling them explicitly, they obtained a classifier that also responds strongly to the optic disc, producing many false positive detections across the test data set.
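The idea that a feature vector can be rotated by a linear operator can be illustrated with the classical steerability of Gaussian derivatives: the first- and second-order derivative responses in a frame rotated by an angle theta are fixed linear combinations of the axis-aligned responses. The sketch below is only an illustration of that principle, assuming features [x, y, xx, xy, yy] at a single scale; the kernel radius, sigma, and feature ordering are assumptions, and this is not the specific operator used in [5].

```python
import numpy as np

def derivative_kernels(sigma, theta=0.0):
    """Sampled Gaussian derivative kernels in a frame rotated by theta:
    returns [G_u, G_v, G_uu, G_uv, G_vv] with u = (cos, sin), v = (-sin, cos)."""
    r = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)   # y = row offset, x = column offset
    c, s = np.cos(theta), np.sin(theta)
    a = c * x + s * y            # coordinate along u
    b = -s * x + c * y           # coordinate along v
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    s2, s4 = sigma**2, sigma**4
    return np.stack([-a / s2 * g, -b / s2 * g,
                     (a**2 / s4 - 1 / s2) * g, a * b / s4 * g, (b**2 / s4 - 1 / s2) * g])

def features_at(img, row, col, kernels):
    """Correlate each kernel with the patch centred on (row, col)."""
    r = kernels.shape[1] // 2
    patch = img[row - r:row + r + 1, col - r:col + r + 1]
    return np.tensordot(kernels, patch, axes=([1, 2], [0, 1]))

def steering_matrix(theta):
    """Linear operator mapping axis-aligned features [x, y, xx, xy, yy]
    to the corresponding features in the frame rotated by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [c,  s, 0,      0,             0],
        [-s, c, 0,      0,             0],
        [0,  0, c * c,  2 * c * s,     s * s],
        [0,  0, -c * s, c * c - s * s, c * s],
        [0,  0, s * s,  -2 * c * s,    c * c],
    ])

rng = np.random.default_rng(1)
img = rng.random((41, 41))
f0 = features_at(img, 20, 20, derivative_kernels(sigma=2.0))   # computed once
theta = 0.7
steered = steering_matrix(theta) @ f0                           # rotated by a matrix product
direct = features_at(img, 20, 20, derivative_kernels(sigma=2.0, theta=theta))
print(np.allclose(steered, direct))   # True: no re-filtering needed for a new orientation
```

Because the rotated features are a matrix product away, a classifier can be trained or evaluated at any orientation without recomputing the filter responses.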
[21] introduced the DRIVE database to test their system, which starts with ridge detection based on curvature (using the Hessian) to find the vessel center-lines. These lines are then grouped into convex sets by a region-growing process that examines a neighborhood around each pixel and checks (via the eigenvectors) that the directions of adjacent lines are similar and that they do not lie on parallel lines.
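A Hessian-based ridge measure of the kind described above can be sketched with separable Gaussian derivative filters: the larger Hessian eigenvalue measures curvature across a dark vessel, and the perpendicular eigenvector gives the local center-line direction that a grouping step could compare between neighbors. This is a simplified illustration, not the exact ridge criterion of [21]; the scale sigma = 2, the zero-padded borders, and the helper names are assumptions.

```python
import numpy as np

def gaussian_1d(sigma, order):
    """Sampled 1-D Gaussian (order 0) or its first / second derivative."""
    r = int(np.ceil(3 * sigma))
    x = np.arange(-r, r + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()
    if order == 1:
        return -x / sigma**2 * g
    if order == 2:
        d2 = (x**2 / sigma**4 - 1 / sigma**2) * g
        return d2 - d2.mean()          # zero-sum, so flat regions give no curvature
    return g

def filter_xy(img, kx, ky):
    """Separable filtering: kx along x (axis 1), then ky along y (axis 0).
    np.convolve's 'same' mode zero-pads, so responses near the border are unreliable."""
    tmp = np.apply_along_axis(np.convolve, 1, img, kx, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, ky, mode="same")

def hessian_ridges(img, sigma=2.0):
    """Ridge strength and local vessel direction from Hessian eigen-analysis.
    A dark vessel on a bright background has a large positive eigenvalue across it."""
    g0, g1, g2 = (gaussian_1d(sigma, k) for k in range(3))
    ixx = filter_xy(img, g2, g0)
    iyy = filter_xy(img, g0, g2)
    ixy = filter_xy(img, g1, g1)
    half_diff = np.sqrt(((ixx - iyy) / 2) ** 2 + ixy ** 2)
    lam_max = (ixx + iyy) / 2 + half_diff                # curvature across the vessel
    across = 0.5 * np.arctan2(2 * ixy, ixx - iyy)        # eigenvector angle of lam_max
    along = across + np.pi / 2                           # center-line direction
    return np.maximum(lam_max, 0.0), along

img = np.ones((41, 41))
img[19:22, :] = 0.3                  # dark horizontal vessel, rows 19-21
strength, direction = hessian_ridges(img)
```

In this toy image the strength peaks on row 20 and the direction there is horizontal (an angle of 0 or pi, which is the same line); a region-growing step could then join neighboring pixels whose directions agree.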
Interestingly, this amounts to an ad-hoc test for the kind of perceptual grouping that tensor voting performs. The authors then extract 18 different features based on color and various properties of the detected ridge lines and convex sets. These features are fed into a k-NN classifier (with k = 101 in their case). Their system produces excellent results, with an area under the ROC curve of 0.95. However, k-NN classifiers have two disadvantages. The first is classification time: at the time of publication it took 15 minutes to classify an image on a 1 GHz Pentium computer. The second is memory: because their “training” consists simply of memorizing all of the training data, k-NN classifiers require large amounts of memory. This was not a problem for them, but they note that certain failure cases could be handled better with more training data, which would directly increase the memory requirements of the system since the training data is never “compressed.”
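Both disadvantages follow directly from how a k-NN classifier works, as a brute-force sketch shows: fitting only stores the labeled feature vectors, and every query is compared against all of them. The vessel probability below is the fraction of the k nearest stored samples labeled as vessel; the class name and the two synthetic 2-D clusters are hypothetical, and only k = 101 comes from [21].

```python
import numpy as np

class KNNVesselClassifier:
    """Brute-force k-NN: 'training' just stores every labeled feature vector."""

    def __init__(self, k=101):
        self.k = k

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)   # memory grows linearly with the training set
        self.y = np.asarray(y, dtype=float)
        return self

    def predict_proba(self, Xq):
        # Squared distance from every query to every stored sample: O(n_query * n_train * d)
        # work, which is why classifying all pixels of an image is slow.
        d2 = ((Xq[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argpartition(d2, self.k - 1, axis=1)[:, :self.k]   # k closest, unordered
        return self.y[nearest].mean(axis=1)   # fraction of vessel neighbors

rng = np.random.default_rng(0)
vessel = rng.normal([2.0, 2.0], 0.5, size=(1000, 2))
background = rng.normal([0.0, 0.0], 0.5, size=(1000, 2))
X = np.vstack([vessel, background])
y = np.concatenate([np.ones(1000), np.zeros(1000)])

clf = KNNVesselClassifier(k=101).fit(X, y)
p = clf.predict_proba(np.array([[2.0, 2.0], [0.0, 0.0]]))
print(clf.X.nbytes)   # 32000: all 2000 x 2 float64 training vectors are kept
```

Doubling the training set doubles both the stored bytes and the per-query distance computations, which is the trade-off [21] points out.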
[19] implements its own preprocessing to deal with camera aperture issues by extending the border of the image (replicating pixel values). The authors then invert the green channel and compute the wavelet transform using the 2-D Gabor wavelet, chosen for its good properties for localizing details. Because the Gabor wavelet can be steered, they compute the transform at orientations from 0 to 180 degrees in steps of 10 degrees and, for each pixel, take the maximum wavelet response as that pixel's feature value. The pixel intensity and the maximum wavelet response are then passed to two classifiers: a Gaussian mixture model (GMM) classifier and a logistic regression classifier.
[20] is more closely related to our work in the sense that its features are simple (matched filters) and the authors train a conditional probability density function, estimated from histograms, which serves as the decision criterion by assigning a likelihood ratio to the vessel vs. non-vessel classification problem.
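The maximum-over-orientations Gabor feature of [19] can be sketched as follows: invert the green channel, replicate border pixels, filter with a complex Gabor kernel at each orientation, and keep the per-pixel maximum magnitude. This is an illustrative sketch only; the kernel form, sigma, wavelength, elongation gamma, and the use of the response magnitude are assumptions rather than the exact settings of [19]. Orientations stop at 170 degrees because, for a real image, the 180-degree kernel is the complex conjugate of the 0-degree one and gives the same magnitude.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20

def gabor_kernel(theta, sigma=2.0, wavelength=6.0, gamma=0.5):
    """Complex 2-D Gabor kernel; all parameter values are illustrative assumptions."""
    r = int(np.ceil(3 * sigma / gamma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)      # along the carrier
    yr = -x * np.sin(theta) + y * np.cos(theta)     # along the stripes (elongated envelope)
    k = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.exp(2j * np.pi * xr / wavelength)
    return k - k.mean()                             # zero DC: flat regions respond with ~0

def max_gabor_response(img, step_deg=10):
    """Per-pixel maximum Gabor magnitude over orientations 0, 10, ..., 170 degrees."""
    best = np.zeros(img.shape)
    for deg in range(0, 180, step_deg):
        k = gabor_kernel(np.deg2rad(deg))
        r = k.shape[0] // 2
        padded = np.pad(img, r, mode="edge")        # extend the border by replicating pixels
        windows = sliding_window_view(padded, k.shape)
        resp = np.abs(np.einsum("ijkl,kl->ij", windows, k))
        best = np.maximum(best, resp)
    return best

green = np.full((48, 48), 0.8)
green[:, 23:26] = 0.2                    # dark vertical vessel in a synthetic green channel
img = 1.0 - green                        # inverted green channel
best = max_gabor_response(img)
features = np.stack([img.ravel(), best.ravel()], axis=1)   # (intensity, max response) per pixel
```

The resulting two-column feature matrix is the kind of input that would then be handed to the GMM and logistic regression classifiers.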
With a few exceptions, supervised learning approaches produce outstanding results. Various combinations of features and learning methods have produced results of around 0.95 area under the ROC curve (our metric of choice because it is widely reported by others). However, supervised approaches require labeled training data and, depending on their features, may need to be retrained, for example for systems with a different camera field of view (FOV) or other differences between the input data and the training data.
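Since the area under the ROC curve is the comparison metric throughout, a small sketch of how it can be computed may help. It uses the equivalent Mann-Whitney form: the probability that a randomly chosen vessel pixel receives a higher score than a randomly chosen non-vessel pixel, with ties counted as one half. The pairwise comparison is only suitable for small inputs; for whole images a sort- or rank-based implementation would be needed. The example scores and labels are hypothetical.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic.
    Builds an n_pos x n_neg comparison matrix, so it is meant for small examples."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    greater = (pos[:, None] > neg[None, :]).astype(float)
    ties = (pos[:, None] == neg[None, :]).astype(float)
    return (greater + 0.5 * ties).mean()

scores = [0.9, 0.8, 0.3, 0.2, 0.7]   # e.g. per-pixel vessel probabilities
labels = [1, 0, 1, 0, 0]             # ground truth (1 = vessel)
print(roc_auc(scores, labels))       # 4 of the 6 vessel/non-vessel pairs are ordered correctly
```

Because it depends only on the ordering of the scores, this metric compares methods without fixing a threshold, which is one reason it is so widely reported.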
[5] G. González, F. Fleuret, and P. Fua, “Learning rotational features for filament detection,” in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009, pp. 1582–1589.
[19] J. Soares, J. Leandro, and R. M. Cesar, “Retinal vessel segmentation using the 2-D Gabor wavelet and supervised classification,” Medical Imaging, IEEE Transactions on, 2006.
[20] M. Sofka and C. V. Stewart, “Retinal vessel centerline extraction using multiscale matched filters, confidence and edge measures,” Medical Imaging, IEEE Transactions on, vol. 25, no. 12, pp. 1531–1546, 2006.
[21] J. Staal, M. D. Abràmoff, M. Niemeijer, M. A. Viergever, and B. van Ginneken, “Ridge-based vessel segmentation in color images of the retina,” Medical Imaging, IEEE Transactions on, vol. 23, no. 4, pp. 501–509, 2004.