Abstract: Humans have the ability to quickly select a subset of the visual input and allocate processing resources to those visually important regions. In the computer vision community, understanding and emulating this attention mechanism of the human visual system has attracted considerable research interest and found a wide range of applications. More recently, with ever-increasing computational power and the availability of large-scale saliency datasets, deep learning has become a popular tool for modeling visual attention. This review covers recent advances in visual attention modeling, including fixation prediction and salient object detection. It also discusses popular visual attention benchmarks and various evaluation metrics. The review emphasizes both deep learning based studies and representative non-deep-learning models. Extensive experiments are also performed on various benchmarks to evaluate the performance of these visual attention models. Finally, the review highlights current research trends and provides insight into future directions.