Abstract: Image aesthetic assessment and emotional analysis aim to enable computers to recognize, respectively, the aesthetic and emotional responses that visual stimuli evoke in human beings. Existing research usually treats them as two independent tasks. However, people's aesthetic and emotional responses do not arise in isolation; from the perspective of cognitive psychology, the two responses are interrelated and mutually influential. Therefore, this study follows the idea of deep multi-task learning to handle image aesthetic assessment and emotional analysis under a unified framework and to explore their relationship. Specifically, a novel adaptive feature interaction module is proposed to correlate the backbone networks of the two tasks and achieve unified prediction. In addition, a dynamic feature interaction mechanism is introduced to adaptively determine the degree of feature interaction between the tasks according to their feature dependencies. Because the multi-task network also updates structural parameters, and the two tasks differ in complexity and convergence speed, a novel gradient balancing strategy is proposed to ensure that the network parameters of each task are learned smoothly under the unified prediction framework. Furthermore, the study constructs a large-scale unified aesthetic and emotional image dataset, UAE. To the best of the authors' knowledge, UAE is the first image collection containing both aesthetic and emotional labels. Finally, the models and code of the proposed method, as well as the UAE dataset, have been released at https://github.com/zhenshen-mla/Aesthetic-Emotion-Dataset.
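
The abstract only names the adaptive feature interaction module without detailing it. As a rough illustration of the general idea, the sketch below shows a cross-stitch-style interaction between the two task branches with a learned per-sample gate that controls how strongly each branch borrows features from the other. The class name, gating design, and tensor shapes are assumptions for illustration, not the authors' implementation; the actual module is in the released code at the link above.

```python
import torch
import torch.nn as nn


class AdaptiveFeatureInteraction(nn.Module):
    """Hypothetical sketch of a feature interaction module between two
    task branches (aesthetics and emotion). A small gate network looks
    at both feature maps and predicts, per sample, how much each branch
    mixes in the other's features, so the degree of interaction adapts
    to the feature dependencies. Generic cross-stitch-style design, not
    the paper's exact module."""

    def __init__(self, channels: int):
        super().__init__()
        # Gate: pool both feature maps, predict two mixing weights in [0, 1].
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2),
            nn.Sigmoid(),
        )

    def forward(self, feat_aes: torch.Tensor, feat_emo: torch.Tensor):
        # feat_*: (B, C, H, W) intermediate features from each backbone.
        pooled = torch.cat(
            [feat_aes.mean(dim=(2, 3)), feat_emo.mean(dim=(2, 3))], dim=1
        )                                # (B, 2C) global descriptors
        g = self.gate(pooled)            # (B, 2) per-sample mixing weights
        g_aes = g[:, 0].view(-1, 1, 1, 1)  # broadcast over C, H, W
        g_emo = g[:, 1].view(-1, 1, 1, 1)
        # Each branch keeps its own features and adaptively mixes in
        # the other task's features.
        out_aes = feat_aes + g_aes * feat_emo
        out_emo = feat_emo + g_emo * feat_aes
        return out_aes, out_emo
```

In this reading, a gate value near zero lets a branch ignore the other task, while a value near one yields strong feature sharing, which matches the abstract's claim that the degree of interaction is determined adaptively rather than fixed in advance.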