Abstract: This paper proposes a visual word soft-histogram for image representation based on statistical modeling and discriminative learning of visual words. Each visual word is modeled by a Gaussian mixture model (GMM) that reflects its appearance variation, and the GMMs are estimated with the max-min posterior pseudo-probabilities discriminative learning method. The similarities between each visual word and its corresponding local features are computed, summed, and normalized to construct the soft-histogram. Two implementations of this representation are discussed. The first, called the classification-based soft histogram, assigns each local feature only to the visual word with which it has maximum similarity. The second, called the completely soft histogram, assigns each local feature to all the visual words. Experimental results on Caltech-4 and PASCAL VOC 2006 confirm the effectiveness of the method.
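
The two histogram variants described above can be illustrated with a minimal sketch. It assumes the similarities between local features and visual words have already been computed (in the paper these come from the learned GMMs; here they are simply given as a matrix), and it assumes normalization means dividing by the sum of all bins. The function name `soft_histogram`, its `mode` argument, and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_histogram(similarities, mode="complete"):
    """Build a normalized soft histogram from a similarity matrix.

    similarities: array of shape (n_features, n_words); entry [i, j] is the
        similarity between local feature i and visual word j (assumed given).
    mode: "classification" keeps, for each feature, only its similarity to
        the best-matching word; "complete" keeps all similarities.
    """
    s = np.asarray(similarities, dtype=float)
    if s.ndim != 2:
        raise ValueError("similarities must be a 2-D array")

    if mode == "classification":
        # Each feature contributes only to the word of maximum similarity.
        winners = s.argmax(axis=1)
        mask = np.zeros_like(s)
        mask[np.arange(s.shape[0]), winners] = 1.0
        s = s * mask
    elif mode != "complete":
        raise ValueError("mode must be 'classification' or 'complete'")

    # Sum the retained similarities per visual word, then normalize
    # (sum-to-one normalization is an assumption of this sketch).
    hist = s.sum(axis=0)
    total = hist.sum()
    return hist / total if total > 0 else hist


# Hypothetical similarities: 3 local features, 3 visual words.
sims = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
])

hist_classification = soft_histogram(sims, mode="classification")
hist_complete = soft_histogram(sims, mode="complete")
```

In this example the classification-based variant leaves the third bin empty because no feature is most similar to the third word, whereas the completely soft variant spreads every feature's similarity over all three bins.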