Monocular Head Pose Estimation

Our system automatically estimates the human head pose from single-view images. The 6DOF head pose is estimated using Pose from Orthography and Scaling with ITerations (POSIT), where a statistical anthropometric 3D rigid model approximates the human head, combined with Active Appearance Models (AAM) for facial feature extraction and tracking.
The results show that head orientation and location were found, on average, within error standard deviations of 2 degrees and 1 cm, respectively.

Head Pose Estimation
The fully automatic framework for head pose extraction is composed of two parts. First, an AAM is fitted to the subject, which tracks the locations of the shape model landmarks over time.

Figure 1: Anthropometric head used as POSIT 3D model.
The head pose is then estimated using POSIT. An anthropometric 3D rigid model of the human head is used as the 3D model (see figure 1), since it is the most suitable rigid body model for describing the 3D face surface of several individuals. It was acquired by a frontal 3D laser scan of a physical model; the 3D points equivalent to those of the AAM annotation procedure were then selected, creating a sparse 3D model. Figure 2 illustrates this procedure.
Figure 2: Left) Physical model used. Center) Laser scan data acquired. Right) OpenGL built model using the AAM shape features.
Because features are tracked in each video frame and AAMs are landmark-based, the image/3D model registration problem that POSIT requires is easily solved: each tracked 2D landmark corresponds to a known point of the sparse 3D model.
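The POSIT step can be illustrated with a minimal sketch of the classic iteration, written here in Python with NumPy. This is not the implementation used by the system; the function name `posit` and its parameters are hypothetical. The sketch assumes the model points are non-coplanar, the image points are given in pixels relative to the principal point, the focal length is known in pixels, and the camera frame has x to the right, y down and z forward.

```python
import numpy as np

def posit(model_points, image_points, focal_length, n_iter=100, tol=1e-12):
    """Illustrative POSIT sketch (hypothetical helper, not the authors' code).

    model_points: (N, 3) non-coplanar 3D points in the model frame.
    image_points: (N, 2) matching 2D points, relative to the principal point.
    Returns the rotation matrix R and the translation t of the model origin,
    so that a model point M maps to R @ M + t in the camera frame.
    """
    model_points = np.asarray(model_points, dtype=float)
    image_points = np.asarray(image_points, dtype=float)

    # Vectors from the reference point (first model point) to all others.
    vectors = model_points[1:] - model_points[0]
    pinv = np.linalg.pinv(vectors)          # shape (3, N-1)

    eps = np.zeros(len(vectors))            # perspective correction terms
    for _ in range(n_iter):
        # Image vectors of the scaled orthographic projection.
        xp = image_points[1:, 0] * (1.0 + eps) - image_points[0, 0]
        yp = image_points[1:, 1] * (1.0 + eps) - image_points[0, 1]

        # Least-squares estimates of the scaled first two rotation rows.
        I = pinv @ xp
        J = pinv @ yp
        s1, s2 = np.linalg.norm(I), np.linalg.norm(J)
        scale = 0.5 * (s1 + s2)
        i, j = I / s1, J / s2
        k = np.cross(i, j)
        k /= np.linalg.norm(k)

        # Depth of the reference point and updated correction terms.
        z0 = focal_length / scale
        new_eps = vectors @ k / z0
        converged = np.max(np.abs(new_eps - eps)) < tol
        eps = new_eps
        if converged:
            break

    R = np.vstack([i, j, k])
    # Camera-frame position of the reference point, then of the model origin.
    p0 = np.array([image_points[0, 0], image_points[0, 1], focal_length]) / scale
    t = p0 - R @ model_points[0]
    return R, t
```

With noisy landmarks the rows `i` and `j` are generally not exactly orthogonal, so a practical implementation would likely re-orthonormalize `R` before extracting angles.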

Experimental Results
The orientation of the estimated pose is represented by the Roll, Pitch and Yaw (RPY) angles. Figure 3 shows some examples of pose estimation, where the pose is represented by an animated 3DOF rotational OpenGL model shown at the top right of each image. This model, used only for display, follows the subject's head rotations, ignoring translational effects.
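As an illustration of the RPY representation, the sketch below converts between a rotation matrix and the three angles. The angle convention is not stated above, so the one used here is an assumption: R = Rz(roll) Ry(yaw) Rx(pitch), with the camera x axis to the right, y down and z forward. The function names `rpy_to_matrix` and `matrix_to_rpy` are hypothetical.

```python
import numpy as np

def rpy_to_matrix(roll, pitch, yaw):
    """Compose R = Rz(roll) @ Ry(yaw) @ Rx(pitch); angles in radians.

    The axis assignment (roll about z, yaw about y, pitch about x) is an
    assumed convention, not one taken from the text.
    """
    cr, sr = np.cos(roll), np.sin(roll)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    Rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    return Rz @ Ry @ Rx

def matrix_to_rpy(R):
    """Recover (roll, pitch, yaw) in radians under the same assumed convention.

    Valid away from the singularity at yaw = +/- 90 degrees.
    """
    roll = np.arctan2(R[1, 0], R[0, 0])
    pitch = np.arctan2(R[2, 1], R[2, 2])
    yaw = np.arctan2(-R[2, 0], np.hypot(R[0, 0], R[1, 0]))
    return roll, pitch, yaw
```

A display-only model such as the one in figure 3 would use just these three angles and discard the translation.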

Figure 3: Example of pose estimation.

The application, which combines AAM model fitting with POSIT for pose estimation, runs at 5 frames/s on 1024x768 images using a 3.4 GHz Intel Pentium 4 processor under Linux. The AAM is based on 58 shape landmark points (N=58) and samples 48178 pixels with color information (m=48178x3=144534) by OpenGL hardware-assisted texture mapping using an Nvidia GeForce 7300 graphics board.

Demo video