The world is three-dimensional (3-D). Humans perceive the world in 3-D, yet the images formed on the retina are only 2-D. We nevertheless have the ability to estimate the depth of objects in a scene. In computer vision, we want to mimic this ability to recover a 3-D perception of a scene from a 2-D image.
The formation of an image depends on the light source (intensity), the camera (extrinsic and intrinsic parameters) and the scene (shape). Digital images are stored as arrays of numbers that typically represent intensity: the higher the number, the brighter the pixel. The basic elements of an imaging device are an aperture that limits the amount of light entering the system, an optical system that focuses the light, and a photosensitive imaging surface such as film or a sensor array.

When we take a picture, the 3-D world is mapped onto a 2-D image. One dimension is lost in the process, which is what makes recovering 3-D from 2-D a hard problem in computer vision. A projection is a mapping from a space of one dimensionality to a space of another; since the world is 3-D, we need a projection model that maps this 3-D world onto the 2-D image we observe. Several projection models exist. The most widely used is the pinhole camera model, also called perspective projection. In this model, the distance from the lens to the image plane is the focal length, and the distance from the lens to the object is the depth. Another model is orthographic projection, in which there is no perspective effect; it corresponds to viewing the scene from very far away. Perspective projection can be approximated by a scaled orthographic (weak-perspective) projection, in which the focal length and a reference depth are absorbed into a single global scale factor.
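To make these models concrete, a standard formulation is sketched below for a camera-centered coordinate system with the optical axis along $Z$; the symbols $(X, Y, Z)$ for a scene point, $(x, y)$ for its image, $f$ for the focal length and $Z_0$ for a reference depth follow common textbook convention and are introduced here only for illustration.

\[
x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z} \qquad \text{(perspective / pinhole projection)}
\]
\[
x = X, \qquad y = Y \qquad \text{(orthographic projection)}
\]
\[
x = s\,X, \qquad y = s\,Y, \qquad s = \frac{f}{Z_0} \qquad \text{(scaled orthographic projection)}
\]

In the scaled orthographic case, all points are effectively projected at the common reference depth $Z_0$, which is a reasonable approximation when the depth variation within the object is small compared with its distance from the camera.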
There are many methods to recover 3-D from 2-D in computer vision. Although our retinal images are 2-D, the human visual system perceives the world in 3-D by exploiting several depth cues: binocular vision (stereo), motion, convergence, occlusion and texture density. In computer vision, we try to recover 3-D information or 3-D shape from 2-D images using one or several of these cues. Such 3-D reconstruction methods are called shape-from-X, where X can be stereo, motion, shading, texture or contours. There are many applications of 3-D reconstruction from 2-D images, such as face recognition, object recognition, remote sensing (for example from unmanned aerial vehicles), robotics and computer graphics. Face recognition is a widely used security tool for verifying and identifying individuals at checkpoints such as airports and by law enforcement agencies. Currently available face recognition systems, which are mostly based on 2-D face images, are sensitive to lighting conditions and changes in head pose. Systems with 3-D face reconstruction capabilities therefore have great potential in security applications, as they can enhance face recognition performance.
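As a minimal illustration of how one of these cues yields depth, consider the standard rectified stereo configuration: two identical cameras with focal length $f$ whose optical centers are separated by a baseline $B$. A scene point whose horizontal image positions differ by a disparity $d$ in the two views lies at depth

\[
Z = \frac{f\,B}{d},
\]

so larger disparities correspond to closer points. The symbols $f$, $B$ and $d$ are generic and are used here only to sketch the principle behind shape-from-stereo.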