Object detection is the task of locating and recognizing objects in media such as images: each object of interest must be localized and classified. It has been a popular topic in 2D computer vision since convolutional neural networks (CNNs) succeeded at image recognition.

Deep learning approaches have dominated object detection since they were introduced. In 2D computer vision, such an approach takes an image as input and outputs bounding boxes with class labels.
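As a rough illustration of that input/output contract, the sketch below models what a 2D detector might return for one image. The `Detection` class, the `score` field, and the `keep_confident` helper are hypothetical names chosen for this example, and the values are made up; real detectors differ in box encoding and label representation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Detection:
    """One detected object: an axis-aligned bounding box plus a class label."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str
    score: float  # assumed confidence in [0, 1]; not specified in the text

    def area(self) -> float:
        """Box area in square pixels (zero for degenerate boxes)."""
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Return only the detections whose score is at least `threshold`."""
    return [d for d in detections if d.score >= threshold]


# Hypothetical detector output for a single image (values are made up).
detections = [
    Detection(12.0, 30.0, 112.0, 230.0, "person", 0.91),
    Detection(150.0, 80.0, 310.0, 200.0, "car", 0.42),
]
confident = keep_confident(detections)
```

Filtering by a confidence threshold is a common post-processing step, shown here only to give the structure something to do.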

The well-known You Only Look Once (YOLO) model, introduced in 2016, and its variants have played an important role in many real-world applications. In contrast, because point clouds are sparse and have variable point density, feature extraction for point clouds relied heavily on hand-crafted features until the introduction of VoxelNet.
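To make sparsity and variable point density concrete, the sketch below bins a toy point cloud into a regular voxel grid and counts the points in each voxel. Grouping points into voxels is the general idea that VoxelNet builds on, but this is not VoxelNet's implementation; the function name `voxel_counts`, the voxel size, and the sample points are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]


def voxel_counts(points: Iterable[Point], voxel_size: float) -> Dict[VoxelIndex, int]:
    """Count how many points fall into each cubic voxel of side `voxel_size`."""
    counts: Counter = Counter()
    for x, y, z in points:
        # Floor division maps each coordinate to its integer voxel index.
        index = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        counts[index] += 1
    return dict(counts)


# Made-up points: a small cluster near the origin and one distant point.
points = [
    (0.1, 0.2, 0.1), (0.3, 0.1, 0.4), (0.2, 0.4, 0.3),
    (9.5, 9.5, 0.2),
]
counts = voxel_counts(points, voxel_size=1.0)
```

Here only two voxels are occupied, and they hold three points and one point respectively, so most of the grid is empty and the occupied cells differ in density.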