Abstract: Selecting the most suitable motion words from the surrounding text to describe the motions of persons depicted in images is an important step in semantic image understanding. Traditional approaches typically learn a generative model of the co-occurrence probability between visual objects and motions and their corresponding annotated tags; the learned model is then used to recognize persons' actions in new images outside the training set. However, existing approaches neglect the grouping effect inherent in the high-dimensional heterogeneous features extracted from images. In fact, different kinds of heterogeneous features have different intrinsic discriminative power for image understanding; for instance, features extracted from the arms are the most discriminative for recognizing a waving motion. Selecting discriminative groups of features is therefore crucial for motion recognition. In this paper, we propose an approach that uses Group LASSO to select discriminative subgroups of visual features from high-dimensional pose features during the learning of the generative model, in order to improve motion recognition. Experiments show that the proposed approach achieves better performance in recognizing motions with large pose changes.
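To illustrate how Group LASSO selects whole groups of features rather than individual ones, the following is a minimal sketch in Python with NumPy. It is not the authors' method: it replaces the generative model with a plain least-squares loss, and the group layout (three hypothetical 3-dimensional groups standing in for, say, arm, leg, and torso pose features), the per-group weights sqrt(|g|), the regularization strength, and all function names are illustrative assumptions. The objective (1/2n)||y - Xw||^2 + lam * sum_g sqrt(|g|) ||w_g||_2 is minimized by proximal gradient descent (ISTA), whose block soft-thresholding step sets entire groups of coefficients exactly to zero.

```python
import numpy as np


def block_soft_threshold(v, groups, thresholds):
    """Proximal operator of sum_g t_g * ||v_g||_2: shrink each group's norm by t_g, zeroing the group if its norm is <= t_g."""
    out = np.zeros_like(v)
    for g, t in zip(groups, thresholds):
        norm = np.linalg.norm(v[g])
        if norm > t:
            out[g] = (1.0 - t / norm) * v[g]
    return out


def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * sum_g sqrt(|g|) * ||w_g||_2 by proximal gradient descent (ISTA).

    Assumption: a least-squares loss stands in for the paper's generative model.
    """
    n, d = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the least-squares gradient
    weights = np.array([np.sqrt(len(g)) for g in groups])  # common size-based group weights
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n  # gradient of the smooth least-squares term
        w = block_soft_threshold(w - step * grad, groups, step * lam * weights)
    return w


def selected_groups(w, groups):
    """Indices of groups whose coefficient block is not entirely zero."""
    return [i for i, g in enumerate(groups) if np.any(w[g] != 0)]


# Hypothetical synthetic data: three 3-dimensional feature groups,
# where only the first group (e.g. "arm" features) drives the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 9))
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
w_true = np.array([2.0, -1.5, 1.0, 0, 0, 0, 0, 0, 0])
y = X @ w_true + 0.1 * rng.standard_normal(200)

w_hat = group_lasso(X, y, groups, lam=0.1)
print(selected_groups(w_hat, groups))
```

The key design choice is the group-wise l2 norm: whereas the ordinary LASSO's l1 penalty zeroes individual coefficients, this penalty keeps or discards each group as a unit, which matches the idea that the features of a whole body part are either relevant to a motion or not.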