Causal Spatiotemporal Semantic-Driven Deep Reinforcement Learning Abstraction Modeling Method

Corresponding author: 杜德慧, E-mail: dhdu@sei.ecnu.edu.cn

Chinese Library Classification (CLC) number: TP311
    Abstract:

    With the rapid development of intelligent cyber-physical systems (ICPS), intelligent techniques are increasingly applied to perception, decision-making, and planning and control. Among them, deep reinforcement learning (DRL) is widely used in ICPS control components because it handles complex, dynamic environments efficiently. However, because the operating environment is open and ICPS are complex, DRL must explore a complex and highly variable state space during learning, which easily leads to inefficient decision generation and poor generalization. A common remedy is to abstract a large-scale, fine-grained Markov decision process (MDP) into a small-scale, coarse-grained MDP, which reduces the model's computational complexity and improves solution efficiency. Existing methods, however, do not consider how to preserve the spatiotemporal semantic information of the original states or how to keep the clustered abstract system space semantically consistent with the real system space. To address these problems, this paper proposes a DRL abstraction modeling method based on causal spatiotemporal semantics. First, causal spatiotemporal semantics that reflect the distribution of value changes over time and space are proposed, and on this basis a two-stage semantic abstraction of the states is performed to construct an abstract MDP of the DRL process. Second, abstraction optimization techniques are used to tune the abstract model and reduce the semantic error between abstract states and their corresponding concrete states. Finally, extensive experiments are conducted on case studies including lane keeping, adaptive cruise control, and intersection crossing, and the models are evaluated and analyzed with the verifier PRISM. The results show that the proposed abstraction modeling technique performs well in terms of abstract expressiveness, accuracy, and semantic equivalence.
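The general idea of abstracting a fine-grained MDP into a coarse-grained one can be illustrated with a minimal Python sketch. It is not the paper's causal spatiotemporal method: the grouping rule here (binning states by a value estimate) and the names `abstract_state`, `build_abstract_mdp`, and `bin_width` are assumptions made only for illustration. The sketch maps concrete states to abstract states and estimates abstract transition probabilities by counting observed transitions.

```python
from collections import Counter, defaultdict


def abstract_state(value, bin_width=0.5):
    """Map a concrete state's value estimate to a coarse bin index.

    This binning rule is a hypothetical stand-in for a real abstraction.
    """
    return int(value // bin_width)


def build_abstract_mdp(transitions, state_value, bin_width=0.5):
    """Estimate abstract transition probabilities by counting.

    transitions: iterable of (state, action, next_state) from concrete traces.
    state_value: dict mapping each concrete state to a value estimate.
    Returns {(abstract_state, action): {abstract_next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for s, a, s_next in transitions:
        z = abstract_state(state_value[s], bin_width)
        z_next = abstract_state(state_value[s_next], bin_width)
        counts[(z, a)][z_next] += 1
    # Normalize the counts of each (abstract state, action) pair.
    return {
        key: {z_next: n / sum(c.values()) for z_next, n in c.items()}
        for key, c in counts.items()
    }


if __name__ == "__main__":
    # Toy data, invented for this example only.
    values = {"s0": 0.1, "s1": 0.2, "s2": 0.7, "s3": 0.9}
    traces = [
        ("s0", "keep", "s1"),
        ("s1", "keep", "s2"),
        ("s0", "keep", "s2"),
        ("s2", "keep", "s3"),
    ]
    for key, dist in build_abstract_mdp(traces, values).items():
        print(key, dist)
```

In this toy run, four concrete states collapse into two abstract states. How closely the abstract transition distribution matches the concrete behaviour depends entirely on the grouping rule, which is the kind of semantic error the abstract says the optimization step is meant to reduce.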

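Cite this article

田丽丽, 杜德慧, 聂基辉, 陈逸康, 李荥达. Causal Spatiotemporal Semantic-Driven Deep Reinforcement Learning Abstraction Modeling Method. Journal of Software (软件学报), 2025, 36(8):0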

History
  • Received: 2024-08-26
  • Last revised: 2024-10-14
  • Published online: 2024-12-10