Abstract: In computer vision and multimedia, perceiving human motion at the semantic level is an important yet challenging problem. In this work, a novel approach is presented that maps low-level filter responses to semantic descriptions of human actions. The features are built on the detections of deformable part models, which implicitly encode body pose information under specific human actions. The filter responses of the detectors are mapped to an effective feature description that encodes the position and appearance information of the human body and its parts. The resulting features capture the relative configuration of body parts and are robust to false detections produced by individual part detectors. Comprehensive experiments conducted on three databases show that the presented method achieves remarkable performance in most cases.
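To make the described feature construction concrete, the sketch below shows one plausible way to turn per-part detector outputs into a descriptor combining normalized part positions (relative configuration) with filter-response scores (appearance). The detection tuple format, the choice of the first part as the root, and the function name `build_pose_feature` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def build_pose_feature(part_detections, image_size):
    """Minimal sketch: map per-part DPM detections to a pose descriptor.

    part_detections : list of (x, y, w, h, score) tuples, one per body part
                      (hypothetical format; the actual detector output may differ).
    image_size      : (width, height) used to normalize part positions.
    """
    W, H = image_size
    positions, scores = [], []
    for (x, y, w, h, score) in part_detections:
        # Normalized part-center position encodes the body configuration.
        cx = (x + w / 2.0) / W
        cy = (y + h / 2.0) / H
        positions.append([cx, cy])
        # The filter response (detection score) carries appearance information.
        scores.append(score)

    positions = np.asarray(positions)
    # Relative configuration: offsets of each part from the root (first) part,
    # making the descriptor invariant to the person's absolute image position.
    offsets = positions - positions[0]

    # Concatenate relative positions and appearance scores into one descriptor.
    return np.concatenate([offsets.ravel(), np.asarray(scores)])


# Usage with three hypothetical parts (root, head, torso).
feature = build_pose_feature(
    [(40, 30, 80, 200, 1.2), (60, 20, 40, 40, 0.8), (55, 80, 60, 90, 0.5)],
    image_size=(320, 240),
)
print(feature.shape)  # (9,) -> 3 parts x 2 relative coordinates + 3 scores
```

The concatenation of relative part offsets with detection scores mirrors the abstract's claim that the descriptor captures both the relative configuration of body parts and their appearance evidence.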