Causal-spatiotemporal-semantics-driven Abstraction Modeling Method for Deep Reinforcement Learning

doi:10.13328/j.cnki.jos.007354

Authors:

TIAN Li-Li (田丽丽)
School of Computer Science and Technology, East China Normal University, Shanghai 200062, China; Institute of AI Education, East China Normal University, Shanghai 200062, China; Shanghai Key Laboratory of Trustworthy Computing (East China Normal University), Shanghai 200062, China

DU De-Hui (杜德慧)
Software Engineering Institute, East China Normal University, Shanghai 200062, China; Institute of AI Education, East China Normal University, Shanghai 200062, China; Shanghai Key Laboratory of Trustworthy Computing (East China Normal University), Shanghai 200062, China

NIE Ji-Hui (聂基辉)
Software Engineering Institute, East China Normal University, Shanghai 200062, China; Shanghai Key Laboratory of Trustworthy Computing (East China Normal University), Shanghai 200062, China

CHEN Yi-Kang (陈逸康)
Software Engineering Institute, East China Normal University, Shanghai 200062, China; Shanghai Key Laboratory of Trustworthy Computing (East China Normal University), Shanghai 200062, China

LI Ying-Da (李荥达)
Software Engineering Institute, East China Normal University, Shanghai 200062, China; Shanghai Key Laboratory of Trustworthy Computing (East China Normal University), Shanghai 200062, China

Corresponding author: DU De-Hui, E-mail: dhdu@sei.ecnu.edu.cn
CLC number: TP18

Abstract:

With the rapid development of intelligent cyber-physical systems (ICPS), intelligent technologies are increasingly used for perception, decision-making, and planning and control. Among these, deep reinforcement learning (DRL) has been widely adopted in ICPS control components because it handles complex, dynamic environments efficiently. However, because the operating environment is open and ICPS are inherently complex, DRL must explore a complex and highly variable state space during learning, which readily leads to inefficient decision generation and poor generalization. A common remedy is to abstract a large-scale, fine-grained Markov decision process (MDP) into a small-scale, coarse-grained MDP, thereby reducing the computational complexity of the model and improving solution efficiency. Existing methods, however, do not consider how to preserve the spatiotemporal semantic information of the original states, or how to ensure semantic consistency between the cluster-abstracted system space and the real system space. To address these problems, this study proposes a causal-spatiotemporal-semantics-driven abstraction modeling method for deep reinforcement learning. First, causal spatiotemporal semantics are introduced to reflect the distribution of value changes over time and space. Based on these semantics, a two-stage semantic abstraction is applied to the states to construct an abstract MDP model of the DRL process. Second, abstraction optimization techniques are used to tune the abstract model, reducing the semantic error between abstract states and their corresponding concrete states. Finally, extensive experiments are conducted on lane-keeping, adaptive cruise control, and intersection-crossing cases, and the abstract model is evaluated and analyzed with the PRISM verifier. The results indicate that the proposed abstraction modeling technique performs well in terms of abstraction expressiveness, accuracy, and semantic equivalence.

Keywords: deep reinforcement learning (DRL); abstraction modeling; causal spatiotemporal semantics; intelligent cyber-physical system (ICPS); Markov decision process (MDP)
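
The general idea mentioned in the abstract, collapsing a fine-grained MDP into a smaller coarse-grained one, can be illustrated with plain state aggregation. The sketch below is not the paper's causal spatiotemporal method: the function name `aggregate_mdp`, the toy four-state chain, and the uniform weighting of the concrete states inside each abstract state are all assumptions made for illustration only.

```python
from collections import defaultdict


def aggregate_mdp(transitions, abstraction):
    """Collapse a fine-grained MDP into a coarse one by state aggregation.

    transitions: dict mapping (state, action) -> {next_state: probability}
    abstraction: dict mapping each concrete state -> abstract state label
    Returns a dict mapping (abstract_state, action) -> {abstract_next: probability}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for (s, a), dist in transitions.items():
        key = (abstraction[s], a)
        counts[key] += 1
        for s_next, p in dist.items():
            sums[key][abstraction[s_next]] += p
    # Average over the concrete states merged into each abstract state,
    # i.e. weight every member state equally (an assumption of this sketch).
    return {
        key: {z: total / counts[key] for z, total in dist.items()}
        for key, dist in sums.items()
    }


# Hypothetical 4-state chain: states 0 and 1 map to "left", 2 and 3 to "right".
transitions = {
    (0, "go"): {0: 0.5, 1: 0.5},
    (1, "go"): {1: 0.5, 2: 0.5},
    (2, "go"): {2: 0.5, 3: 0.5},
    (3, "go"): {3: 1.0},
}
abstraction = {0: "left", 1: "left", 2: "right", 3: "right"}

abstract_mdp = aggregate_mdp(transitions, abstraction)
print(abstract_mdp[("left", "go")])  # {'left': 0.75, 'right': 0.25}
```

In this toy case, concrete state 0 can never leave "left" in one step while state 1 leaves with probability 0.5, yet the abstract state reports 0.25 for both. That gap between an abstract state and its concrete members is the kind of semantic error the abstract says the proposed method aims to reduce.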

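Cite this article:

TIAN Li-Li, DU De-Hui, NIE Ji-Hui, CHEN Yi-Kang, LI Ying-Da. Causal-spatiotemporal-semantics-driven Abstraction Modeling Method for Deep Reinforcement Learning. Journal of Software (软件学报), 2025, 36(8): 1-18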

History:

Received: 2024-08-26
Last revised: 2024-10-14
Published online: 2024-12-10
