Abstract: Many reinforcement learning methods do not take into account the safety of the decisions made by agents. Despite many successful applications in research and industry, it is still necessary to ensure that agent decisions are safe. Traditional approaches to the safety problem mainly modify the objective function or the agent's exploration process; however, they neglect the potentially grave consequences of unsafe decisions and therefore cannot effectively solve the problem. To address this issue, a safe Sarsa(λ) method and a safe Sarsa method, based on constrained Markov decision processes, are proposed by imposing safety constraints on the action space. During the solution process, the agent must not only maximize the state-action value but also satisfy the safety constraints, so as to obtain an optimal safe policy. Since standard reinforcement learning methods are no longer suitable for solving the safe Sarsa(λ) and safe Sarsa models, a solution method for safe reinforcement learning is also introduced to obtain the globally optimal state-action value function under the constraints. This method linearizes the multidimensional constraints and applies the Lagrange multiplier method to transform the safe reinforcement learning model into a convex problem, provided that the objective and constraint functions are differentiable. The proposed solution algorithm steers the agent away from local optima and improves solution efficiency and precision. The feasibility of the algorithm is proved, and its effectiveness is verified by experiments.
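For orientation, the following is a minimal sketch of the kind of constrained formulation and Lagrangian relaxation the abstract alludes to, written in the standard constrained-MDP convention; the symbols $Q^{\pi}$, $C_i^{\pi}$, $d_i$, and $\lambda_i$ are illustrative assumptions, not the paper's own notation.

% Sketch: maximize the state-action value subject to safety constraints on actions.
\[
  \max_{\pi}\; Q^{\pi}(s,a)
  \quad \text{s.t.} \quad
  C_i^{\pi}(s,a) \le d_i, \qquad i = 1,\dots,m ,
\]
% Lagrange multiplier relaxation that turns the constrained problem into an
% unconstrained saddle-point problem (convex under suitable differentiability assumptions).
\[
  \mathcal{L}(\pi,\lambda)
  = Q^{\pi}(s,a)
  - \sum_{i=1}^{m} \lambda_i \bigl( C_i^{\pi}(s,a) - d_i \bigr),
  \qquad \lambda_i \ge 0 .
\]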