Abstract: Object tracking is one of the most important tasks in many computer vision applications. It is challenging because target objects often undergo significant appearance changes caused by deformation, abrupt motion, background clutter, and occlusion. It is therefore important to build a robust object appearance model for visual tracking. Discriminative correlation filters (DCF) with deep convolutional features have achieved favorable performance on recent tracking benchmarks. The object in each frame is located through the corresponding response map, so the desired response map should attain its highest value at the object's location. Because the response values are continuous, estimating the response map can be naturally formulated as a continuous conditional random field (CRF) learning problem. Moreover, the integral in the partition function can be computed in closed form, so the log-likelihood maximization can be solved exactly. We therefore propose a CRF-based robust object tracking algorithm to improve deep correlation filters, and design an end-to-end deep convolutional neural network that estimates response maps from input images by integrating the unary and pairwise potentials of the continuous CRF into the tracking model. The unary potential yields an initial response map and the pairwise potential yields a similarity matrix; combining the two produces a smoother and more accurate response map, which improves tracking robustness. The proposed approach is evaluated against 9 state-of-the-art trackers on the OTB-2013 and OTB-2015 benchmarks. Extensive experiments demonstrate that the proposed algorithm outperforms the baseline methods by 3% and 3.5% in the success plot, and by 6.1% and 4.8% in the precision plot, on OTB-2013 and OTB-2015 respectively.
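To illustrate how an initial response map and a similarity matrix can be combined in closed form, the following is a minimal sketch rather than the proposed network. It assumes a Gaussian continuous CRF with energy E(y) = ||y - z||^2 + (1/2) sum_ij W_ij (y_i - y_j)^2, where z is the initial response map from the unary term and W is a symmetric, non-negative similarity matrix from the pairwise term. This energy form is an assumption, since the abstract does not specify it. Under it, the MAP response map solves the linear system (I + D - W) y = z, with D the diagonal degree matrix of W. The function names and the 4-neighbour similarity are hypothetical stand-ins for the learned potentials.

```python
import numpy as np


def refine_response_map(z, W):
    """Closed-form MAP estimate of an assumed Gaussian continuous CRF.

    z : (H, W) initial response map (stand-in for the unary prediction).
    W : (N, N) symmetric, non-negative similarity matrix, N = H * W
        (stand-in for the pairwise term).
    Returns the refined response map y solving (I + D - W) y = z.
    """
    z = np.asarray(z, dtype=float)
    W = np.asarray(W, dtype=float)
    n = z.size
    if W.shape != (n, n):
        raise ValueError("W must have shape (z.size, z.size)")
    D = np.diag(W.sum(axis=1))          # degree matrix of the similarity graph
    A = np.eye(n) + D - W               # identity plus graph Laplacian
    y = np.linalg.solve(A, z.ravel())   # exact minimiser of the quadratic energy
    return y.reshape(z.shape)


def grid_similarity(shape, weight=1.0):
    """Hypothetical similarity: constant weight between 4-neighbour pixels."""
    h, w = shape
    n = h * w
    W = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            if c + 1 < w:               # right neighbour
                W[i, i + 1] = W[i + 1, i] = weight
            if r + 1 < h:               # bottom neighbour
                W[i, i + w] = W[i + w, i] = weight
    return W


if __name__ == "__main__":
    z = np.zeros((5, 5))
    z[2, 3] = 1.0                       # a single sharp peak at the target
    y = refine_response_map(z, grid_similarity(z.shape, weight=0.5))
    peak = np.unravel_index(np.argmax(y), y.shape)
    print("predicted target location (row, col):", peak)
```

In this sketch the refined map is a smoothed version of z whose maximum still marks the predicted target location. The dense solve is only practical for small maps; a real tracker would need a sparse or learned formulation.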