Abstract: With the development of video acquisition and transmission technologies and the widespread use of mobile terminal devices, a growing number of images are available as sets. The key issue in image set classification is how to measure the distance between two sets given the complexity of each set's inner structure. To address this problem, this paper presents a framework called double sparse regularizations for image set distance learning (DSRID). In DSRID, the distance between two sets is computed as the distance between two prominent sub-structures, one from each set, which improves the robustness and discriminative power of the measure. Depending on the set representation, the framework is implemented in traditional Euclidean space and on two common manifolds: the symmetric positive definite (SPD) matrix manifold and the Grassmann manifold. Extensive experiments on set-based face recognition, action recognition, and object categorization demonstrate the effectiveness of the proposed method.
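To make the idea of a set-to-set distance between sub-structures concrete, the following is a minimal sketch for the Euclidean case only. It is not the DSRID formulation, which the abstract does not specify. As a stand-in, it assumes each set is summarized by a point in the convex hull of its samples, and it measures the distance between the two closest such points using projected gradient descent. The names `project_simplex` and `set_distance` are hypothetical, and the simplex constraint (nonnegative weights summing to one) replaces the paper's sparse regularizers.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    n = v.size
    u = np.sort(v)[::-1]                      # sort in descending order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, n + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index satisfying the condition
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def set_distance(X, Y, n_iter=500):
    """Distance between the convex hulls of two sample sets.

    X: (d, n) array, Y: (d, m) array; columns are samples.
    Returns (distance, a, b), where a and b are the weight vectors.
    Assumption: simplex-constrained weights stand in for the sparse
    regularizers of the paper, which are not given in the abstract.
    """
    n, m = X.shape[1], Y.shape[1]
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    # Lipschitz constant of the gradient of ||Xa - Yb||^2 in (a, b)
    L = 2.0 * np.linalg.norm(np.hstack([X, -Y]), 2) ** 2
    step = 1.0 / max(L, 1e-12)
    for _ in range(n_iter):
        r = X @ a - Y @ b                      # current residual
        a = project_simplex(a - step * 2.0 * (X.T @ r))
        b = project_simplex(b + step * 2.0 * (Y.T @ r))
    return np.linalg.norm(X @ a - Y @ b), a, b

# Two vertical segments in the plane, two units apart.
X = np.array([[0.0, 0.0], [0.0, 1.0]])
Y = np.array([[2.0, 2.0], [0.0, 5.0]])
dist, a, b = set_distance(X, Y)
```

The weights `a` and `b` indicate which samples in each set contribute to the closest pair of points. A sparse-regularized formulation would additionally penalize the number of contributing samples, and the manifold variants would replace the Euclidean residual with a distance suited to SPD matrices or subspaces.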