Abstract: Many reinforcement learning methods do not take into account the safety of the decisions made by agents. Despite many successful applications in research and industry, it is still necessary to ensure that agent decisions are safe. Traditional approaches to the safety problem mainly modify the objective function or the agent's exploration process; however, they neglect the potentially grave consequences of unsafe decisions and therefore cannot effectively solve the problem. To address this issue, a safe Sarsa(λ) method and a safe Sarsa method, based on constrained Markov decision processes, are proposed by imposing safety constraints on the action space. During the solution process, the agent must not only maximize the state-action value but also satisfy the safety constraints, so as to obtain an optimal safe policy. Since standard reinforcement learning methods are no longer suitable for solving the safe Sarsa(λ) and safe Sarsa models, a solution method for safe reinforcement learning is also introduced to obtain the globally optimal state-action value function under the constraints. This method linearizes the multidimensional constraints and applies the Lagrange multiplier method to transform the safe reinforcement learning model into a convex problem, provided that the objective and constraint functions are differentiable. The proposed solution algorithm steers the agent away from local optima and improves solution efficiency and precision. The feasibility of the algorithm is proved, and its effectiveness is verified by experiments.
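For orientation, the following is a minimal sketch of the kind of constrained formulation and Lagrangian relaxation the abstract alludes to, written in the standard constrained-MDP convention; the symbols $Q^{\pi}$, $C_i^{\pi}$, $d_i$, and $\lambda_i$ are illustrative assumptions, not the paper's own notation.

% Sketch: maximize the state-action value subject to safety constraints on actions.
\[
  \max_{\pi}\; Q^{\pi}(s,a)
  \quad \text{s.t.} \quad
  C_i^{\pi}(s,a) \le d_i, \qquad i = 1,\dots,m ,
\]
% Lagrange multiplier relaxation that turns the constrained problem into an
% unconstrained saddle-point problem (convex under suitable differentiability assumptions).
\[
  \mathcal{L}(\pi,\lambda)
  = Q^{\pi}(s,a)
  - \sum_{i=1}^{m} \lambda_i \bigl( C_i^{\pi}(s,a) - d_i \bigr),
  \qquad \lambda_i \ge 0 .
\]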