Model-free Safe Reinforcement Learning Method Based on Constrained Markov Decision Processes
CLC Number: TP18
    Abstract:

    Many reinforcement learning methods do not take the safety of the agent's decisions into consideration. In fact, despite many successful applications in research and industry, it is still necessary to ensure that agent decisions are safe. Traditional approaches to the safety problem mainly involve modifying the objective function or the agent's exploration process; however, they neglect the possibly grave consequences of unsafe decisions and therefore cannot solve the problem effectively. To address this issue, a safe Sarsa(λ) method and a safe Sarsa method, based on constrained Markov decision processes, are proposed by imposing safety constraints on the action space. During the solution process, the agent must not only seek the maximum state-action value but also satisfy the safety constraints, so as to obtain an optimal safe policy. Since standard reinforcement learning methods are no longer suitable for solving the safe Sarsa(λ) and safe Sarsa models, a solution model for safe reinforcement learning is also introduced in order to obtain the globally optimal state-action value function under the constraints. The model is based on linearized multidimensional constraints and adopts the Lagrange multiplier method to transform the safe reinforcement learning model into a convex model, provided that the objective and constraint functions are differentiable. The proposed solution algorithm guides the agent away from local optima and improves solution efficiency and precision. The feasibility of the algorithm is proved, and its effectiveness is verified by experiments.
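To make the constrained-MDP idea in the abstract concrete, the following is a minimal illustrative sketch, not the paper's exact algorithm: tabular Sarsa on a toy chain MDP where a risky action incurs a safety cost, with the constraint handled by a Lagrangian relaxation (the reward is penalized by λ times the cost, and λ is updated by dual ascent). The environment, the cost budget `d`, and all step sizes are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 5, 2   # chain MDP; action 1 moves faster but is unsafe

def step(s, a):
    """Action 1 jumps two states but costs 1; action 0 is safe and slow."""
    cost = 1.0 if a == 1 else 0.0
    s_next = min(s + (2 if a == 1 else 1), N_STATES - 1)
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    done = s_next == N_STATES - 1
    return s_next, reward, cost, done

Q = np.zeros((N_STATES, N_ACTIONS))
lam = 0.0                     # Lagrange multiplier for the safety constraint
d = 1.0                       # allowed expected cost per episode (assumed)
alpha, eta, gamma, eps = 0.1, 0.01, 0.95, 0.1

def policy(s):
    """Epsilon-greedy with respect to the current Q estimate."""
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

for episode in range(2000):
    s, a = 0, policy(0)
    ep_cost, done = 0.0, False
    while not done:
        s2, r, c, done = step(s, a)
        ep_cost += c
        a2 = policy(s2)
        # Sarsa update on the Lagrangian reward r - lam * c
        target = (r - lam * c) + (0.0 if done else gamma * Q[s2, a2])
        Q[s, a] += alpha * (target - Q[s, a])
        s, a = s2, a2
    # Dual ascent: raise lam when the episode exceeded the cost budget,
    # lower it (toward zero) when the constraint was slack
    lam = max(0.0, lam + eta * (ep_cost - d))
```

With the budget `d = 1.0`, the penalty steers the agent toward policies that use the risky action at most once per episode, whereas unconstrained Sarsa would use it at every step; this is the same mechanism, in miniature, as transforming the constrained problem via Lagrange multipliers as described above.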

Get Citation

Zhu Fei, Ge Yangyang, Ling Xinghong, Liu Quan. Model-free safe reinforcement learning method based on constrained MDP. Journal of Software, 2022, 33(8): 3086-3102 (in Chinese).

History
  • Received: August 30, 2019
  • Revised: September 08, 2020
  • Online: August 13, 2022
  • Published: August 06, 2022
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4