Abstract: In recent years, deep reinforcement learning has achieved impressive results in complex control tasks. However, its applicability to real-world problems is severely limited by its high sensitivity to hyperparameters and the difficulty of guaranteeing convergence. Metaheuristic algorithms, a class of black-box optimization methods that simulate natural laws, can effectively avoid this hyperparameter sensitivity. Nevertheless, they have problems of their own, such as poor scalability to very large numbers of parameters and low sample efficiency. To address these problems, this study proposes the twin delayed deep deterministic policy gradient based on a gravitational search algorithm (GSA-TD3), which combines the advantages of both types of algorithms. Specifically, it updates the policy by gradient optimization for higher sample efficiency and faster learning. Moreover, it applies a population update method based on the law of gravity to the policy search process, making the search more exploratory and more stable. GSA-TD3 is applied to a series of complex control tasks, and experiments show that it significantly outperforms comparable state-of-the-art deep reinforcement learning methods.
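To make the gravity-based population update concrete, the following is a minimal sketch of one step of the generic gravitational search algorithm, applied to a population of flattened policy parameter vectors. It is not the GSA-TD3 implementation: the abstract does not specify how fitness, masses, or the gravitational constant are defined, so the function name `gsa_step`, its arguments, the min-max mass normalization, and the use of episode return as fitness are all assumptions made for illustration.

```python
import numpy as np

def gsa_step(positions, velocities, fitness, g_const, rng, eps=1e-8):
    """One generic GSA update (maximization). Hypothetical helper, not from the paper.

    positions, velocities: arrays of shape (n_agents, n_params)
    fitness: array of shape (n_agents,), e.g. episode return (assumed)
    g_const: current gravitational constant (its decay schedule is left to the caller)
    """
    n = positions.shape[0]
    best, worst = fitness.max(), fitness.min()
    if best == worst:
        # All agents equally fit: assign equal masses to avoid dividing by zero.
        masses = np.full(n, 1.0 / n)
    else:
        raw = (fitness - worst) / (best - worst)  # worst agent gets mass 0
        masses = raw / raw.sum()

    # diff[i, j] = x_j - x_i, the direction from agent i toward agent j.
    diff = positions[None, :, :] - positions[:, None, :]
    dist = np.linalg.norm(diff, axis=-1)

    # Randomly weighted pull of agent j on agent i, proportional to j's mass.
    weights = rng.random((n, n)) * g_const * masses[None, :] / (dist + eps)
    np.fill_diagonal(weights, 0.0)  # an agent exerts no force on itself
    accel = (weights[:, :, None] * diff).sum(axis=1)

    # Stochastic momentum term plus gravitational acceleration.
    new_velocities = rng.random((n, 1)) * velocities + accel
    return positions + new_velocities, new_velocities
```

In a hybrid scheme of the kind the abstract describes, each row of `positions` would hold the parameters of one actor network, and a step like this would be interleaved with gradient updates. How GSA-TD3 actually schedules the two is not stated here.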