Abstract: This study proposes a structure-motivated interactive deep reinforcement learning (SMILE) method to address the low training efficiency and poor policy interpretability of deep reinforcement learning (DRL) in high-dimensional robot behavior control. First, the high-dimensional single-robot control problem is transformed into a low-dimensional multi-controller control problem through structural decomposition schemes, thereby mitigating the curse of dimensionality in continuous control. SMILE then dynamically infers the dependencies among the controllers through two coordination graph (CG) models, ATTENTION and PODT, enabling information exchange and coordinated learning among the robot's internal joints. To balance the computational complexity and information redundancy of these two CG models, two further models, APODT and PATTENTION, are proposed to update the CG, allowing dynamic adaptation between short-term and long-term dependencies among the controllers. Experimental results show that this structurally decomposed learning substantially improves learning efficiency, and that relational inference and coordinated learning among a robot's components yield more explicit interpretations of the final learned policy.
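To make the decomposition idea concrete, the following minimal Python sketch illustrates how a high-dimensional joint action space might be split into per-controller sub-actions and how a coordination graph could be derived by thresholding pairwise dependency scores (such as attention weights). The 12-DoF quadruped layout and all names (`CONTROLLERS`, `split_action`, `coordination_graph`) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical structural decomposition of a 12-DoF quadruped:
# each leg becomes one low-dimensional controller (illustrative only).
CONTROLLERS = {
    "front_left":  [0, 1, 2],
    "front_right": [3, 4, 5],
    "rear_left":   [6, 7, 8],
    "rear_right":  [9, 10, 11],
}

def split_action(joint_action: np.ndarray) -> dict:
    """Map one high-dimensional joint action to per-controller sub-actions."""
    return {name: joint_action[idx] for name, idx in CONTROLLERS.items()}

def coordination_graph(scores: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Build a binary coordination graph (CG) from pairwise dependency scores,
    e.g. the outputs of a CG model such as ATTENTION or PODT (assumed form)."""
    adjacency = (scores > threshold).astype(int)
    np.fill_diagonal(adjacency, 0)  # no self-edges between controllers
    return adjacency

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    action = rng.uniform(-1.0, 1.0, size=12)   # full 12-D action
    sub_actions = split_action(action)         # four 3-D sub-actions
    scores = rng.uniform(size=(4, 4))          # stand-in dependency scores
    print(sub_actions)
    print(coordination_graph(scores))
```

In this sketch, each controller receives only its own low-dimensional sub-action, while the adjacency matrix indicates which controllers should exchange information during coordinated learning.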