安全强化学习算法及其在CPS智能控制中的应用
作者:
作者单位:

作者简介:

赵恒军(1985-),男,博士,讲师,CCF专业会员,主要研究领域为信息物理系统,形式化方法;
曾霞(1987-),女,博士,讲师,主要研究领域为信息物理系统,数值符号计算;
李权忠(1995-),男,硕士生,主要研究领域为强化学习,智能控制;
刘志明(1961-),男,博士,教授,博士生导师,CCF高级会员,主要研究领域为软件理论与方法.

通讯作者:

曾霞,E-mail:xzeng0712@swu.edu.cn

中图分类号:

TP311

基金项目:

国家自然科学基金(61902325,62032019,61972385,61732019,61702425);西南大学国家人才建设项目(SWU116007)


Safe Reinforcement Learning Algorithm and Its Application in Intelligent Control for CPS
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    信息物理系统(cyber-physical system,CPS)的安全控制器设计是一个热门研究方向,现有基于形式化方法的安全控制器设计存在过度依赖模型、可扩展性差等问题.基于深度强化学习的智能控制可处理高维非线性复杂系统和不确定性系统,正成为非常有前景的CPS控制技术,但是缺乏对安全性的保障.针对强化学习控制在安全性方面的不足,围绕一个工业油泵控制系统典型案例,开展安全强化学习算法和智能控制应用研究.首先,形式化了工业油泵控制的安全强化学习问题,搭建了工业油泵仿真环境;随后,通过设计输出层结构和激活函数,构造了神经网络形式的油泵控制器,使得油泵开关时间的线性不等式约束得到满足;最后,为了更好地权衡安全性和最优性控制目标,基于增广拉格朗日乘子法设计实现了新型安全强化学习算法.在工业油泵案例上的对比实验表明,该算法生成的控制器在安全性和最优性上均超越了现有同类算法.在进一步评估中,所生成神经网络控制器以90%的概率通过了严格形式化验证;同时,与理论最优控制器相比实现了低至2%的最优目标值损失.所提方法有望推广至更多应用场景,实例研究的方案有望为安全智能控制和形式化验证领域其他学者提供借鉴.

    Abstract:

    The problem of safe controller design for cyber-physical systems (CPS) is a hot research topic. The existing safe controller design based on formal methods has problems such as excessive reliance on system models and poor scalability. Intelligent control based on deep reinforcement learning can handle high-dimensional nonlinear complex systems and uncertain systems, and is becoming a very promising CPS control technology, but it lacks safety guarantees. This study addresses the safety issues of reinforcement learning control by focusing on a case study of a typical industrial oil pump control system, and carries out research in designing new safe reinforcement learning algorithm and applying the algorithm in intelligent control scenario. First, the safe reinforcement learning problem of the industrial oil pump is formulated, and simulation environment of the oil pump is built. Then, by designing the structure and activation function of the output layer, the neural network type oil pump controller is constructed to satisfy the linear inequality constraints of the oil pump switching time. Finally, in order to better balance the safety and optimality control objectives, a new safe reinforcement learning algorithm is designed based on the augmented Lagrange multiplier method. Comparative experiment on the industrial oil pump shows that the controller generated by the proposed algorithm surpasses existing algorithms in the same category, both in safety and optimality. In further evaluation, the neural network controllers generated in this study pass rigorous formal verification with probability of 90%. Meanwhile, compared with the theoretically optimal controller, neural network controllers achieve a loss of optimal objective value as low as 2%. The method proposed in this study is expected to be extended to more application scenarios, and the case study scheme is expected to be referenced by other researchers in the field of intelligent control and formal verification.

    参考文献
    相似文献
    引证文献
引用本文

赵恒军,李权忠,曾霞,刘志明.安全强化学习算法及其在CPS智能控制中的应用.软件学报,2022,33(7):2538-2561

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2021-09-05
  • 最后修改日期:2021-10-14
  • 录用日期:
  • 在线发布日期: 2022-01-28
  • 出版日期: 2022-07-06
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号