Human body motion tracking and analysis have received significant attention in the computer vision research community over the past decade. This interest has been motivated by the ambitious goal of achieving a vision-based perceptual user interface in which the state and actions of the user(s) are automatically inferred from a set of video cameras. The objective is to extend current mouse-and-keyboard interaction techniques so that users can interact naturally in an immersive environment, with the system perceiving and responding appropriately to the user's gestures. Understanding human action in an environment is a challenging task, as it involves different levels of granularity in analysis and description depending on the targeted application.