因果时空语义驱动的深度强化学习抽象建模方法
作者:
通讯作者:

杜德慧,E-mail:dhdu@sei.ecnu.edu.cn

中图分类号:

TP18


Causal-spatiotemporal-semantics-driven Abstraction Modeling Method for Deep Reinforcement Learning
Author:
  • 摘要
  • | |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • | |
  • 文章评论
    摘要:

    随着智能信息物理融合系统(intelligent cyber-physical system, ICPS)的快速发展, 智能技术在感知、决策、规控等方面的应用日益广泛. 其中, 深度强化学习因其在处理复杂的动态环境方面的高效性, 已被广泛用于ICPS的控制组件中. 然而, 由于运行环境的开放性和ICPS系统的复杂性, 深度强化学习在学习过程中需要对复杂多变的状态空间进行探索, 这极易导致决策生成时效率低下和泛化性不足等问题. 目前对于该问题的常见解决方法是将大规模的细粒度马尔可夫决策过程(Markov decision process, MDP)抽象为小规模的粗粒度马尔可夫决策过程, 从而简化模型的计算复杂度并提高求解效率. 但这些方法尚未考虑如何保证原状态的时空语义信息、聚类抽象的系统空间和真实系统空间之间的语义一致性问题. 针对以上问题, 提出基于因果时空语义的深度强化学习抽象建模方法. 首先, 提出反映时间和空间价值变化分布的因果时空语义, 并在此基础上对状态进行双阶段语义抽象以构建深度强化学习过程的抽象马尔可夫模型; 其次, 结合抽象优化技术对抽象模型进行调优, 以减少抽象状态与相应具体状态之间的语义误差; 最后, 结合车道保持、自适应巡航、交叉路口会车等案例进行了大量的实验, 并使用验证器PRISM对模型进行评估分析, 结果表明所提出的抽象建模技术在模型的抽象表达能力、准确性及语义等价性方面具有较好的效果.

    Abstract:

    With the rapid advancement of intelligent cyber-physical system (ICPS), intelligent technologies are increasingly utilized in components such as perception, decision-making, and control. Among these, deep reinforcement learning (DRL) has gained wide application in ICPS control components due to its effectiveness in managing complex and dynamic environments. However, the openness of the operating environment and the inherent complexity of ICPS necessitate the exploration of highly dynamic state spaces during the learning process. This often results in inefficiencies and poor generalization in decision-making. A common approach to address these issues is to abstract large-scale, fine-grained Markov decision processes (MDPs) into smaller-scale, coarse-grained MDPs, thus reducing computational complexity and enhancing solution efficiency. Nonetheless, existing methods fail to adequately ensure consistency between the spatiotemporal semantics of the original states, the abstracted system space, and the real system space. To address these challenges, this study proposes a causal spatiotemporal semantic-driven abstraction modeling method for deep reinforcement learning. First, causal spatiotemporal semantics are introduced to capture the distribution of value changes across time and space. Based on these semantics, a two-stage semantic abstraction process is applied to the states, constructing an abstract MDP model for the deep reinforcement learning process. Subsequently, abstraction optimization techniques are employed to fine-tune the abstract model, minimizing semantic discrepancies between the abstract states and their corresponding detailed states. Finally, extensive experiments are conducted on scenarios including lane-keeping, adaptive cruise control, and intersection crossing. The proposed model is evaluated and analyzed using the PRISM verifier. The results indicate that the proposed abstraction modeling technique demonstrates superior performance in abstraction expressiveness, accuracy, and semantic equivalence.

    参考文献
    相似文献
    引证文献
引用本文

田丽丽,杜德慧,聂基辉,陈逸康,李荥达.因果时空语义驱动的深度强化学习抽象建模方法.软件学报,2025,36(8):1-18

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2024-08-26
  • 最后修改日期:2024-10-14
  • 在线发布日期: 2024-12-10
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号