We detect the face and mouth using the Viola-Jones object detection framework as implemented in OpenCV. Rather than working directly with pixel intensities, the Viola-Jones method classifies image regions using Haar-like features.
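To make the idea of a Haar-like feature concrete, the sketch below computes a two-rectangle feature (sum of the left half of a window minus sum of the right half) using an integral image, the summed-area table that lets Viola-Jones evaluate rectangle sums in constant time. This is a minimal pure-Python illustration; the function names are hypothetical, and OpenCV's internal implementation differs.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column.

    ii[y][x] holds the sum of all pixels above and to the left of (x, y).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w-by-h rectangle with top-left corner (x, y): four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Left half minus right half of a w-by-h window (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 2x4 "image": dark on the left, bright on the right.
img = [[1, 1, 5, 5],
       [1, 1, 5, 5]]
ii = integral_image(img)
feature = two_rect_feature(ii, 0, 0, 4, 2)
```

A strongly negative or positive response indicates a vertical edge inside the window; a cascade combines many such features at different positions and scales.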
OpenCV provides trained cascades for various facial features. In our pipeline, we first use a face cascade to locate faces in each frame. Of the detected faces, we keep only the largest. Next, we apply a mouth detector within the bottom half of the face region. Nesting the cascades in this way reduces the false positives that arise when a mouth detector is run on the entire image. If no mouth is found, we use the lower third of the face region as the "detected" mouth. Once a mouth region is determined, we expand it slightly beyond the detected bounds so that more pixels are available for blending in the next step. An example of a detected face and mouth region is shown in Figure 1.
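The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not our exact implementation: the helper names, the 10% expansion margin, and the choice to keep the largest mouth candidate are all assumptions. The two cascade arguments are expected to behave like OpenCV's `cv2.CascadeClassifier`, whose `detectMultiScale` method returns `(x, y, w, h)` boxes, and `gray` is assumed to be a grayscale frame stored as a NumPy array.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    boxes = [tuple(int(v) for v in b) for b in boxes]
    return max(boxes, key=lambda b: b[2] * b[3], default=None)


def bottom_half(face):
    """Search region for the mouth: the bottom half of the face box."""
    x, y, w, h = face
    return (x, y + h // 2, w, h - h // 2)


def lower_third(face):
    """Fallback mouth region: the lower third of the face box."""
    x, y, w, h = face
    top = (2 * h) // 3
    return (x, y + top, w, h - top)


def expand(box, margin, frame_w, frame_h):
    """Grow a box by `margin` (fraction of its size) per side, clipped to the frame."""
    x, y, w, h = box
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(frame_w, x + w + dx), min(frame_h, y + h + dy)
    return (x0, y0, x1 - x0, y1 - y0)


def locate_mouth(gray, face_cascade, mouth_cascade, margin=0.1):
    """Return an expanded mouth box in frame coordinates, or None if no face."""
    face = largest_box(face_cascade.detectMultiScale(gray))
    if face is None:
        return None

    # Run the mouth detector only inside the bottom half of the face.
    sx, sy, sw, sh = bottom_half(face)
    roi = gray[sy:sy + sh, sx:sx + sw]
    mouth = largest_box(mouth_cascade.detectMultiScale(roi))  # assumption: keep largest

    if mouth is None:
        box = lower_third(face)
    else:
        mx, my, mw, mh = mouth
        box = (sx + mx, sy + my, mw, mh)  # ROI coordinates -> frame coordinates

    frame_h, frame_w = gray.shape[:2]
    return expand(box, margin, frame_w, frame_h)
```

In practice the cascades would be created with `cv2.CascadeClassifier(path_to_xml)` and the frame converted with `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)` before calling `locate_mouth`; the specific cascade files are not fixed by this sketch.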