Abstract: With the rapid development of Intelligent Cyber-Physical Systems (ICPS), learning-based technologies are increasingly applied in components responsible for perception, decision-making, and control. Among these, deep reinforcement learning (DRL) has been widely used in the control components of ICPS because of its effectiveness in handling complex dynamic environments. However, the openness of the operating environment and the complexity of ICPS require DRL to explore a highly dynamic state space during learning, which can lead to inefficient and poorly generalizing decision-making. A common solution is to abstract a large-scale, fine-grained Markov Decision Process (MDP) into a smaller-scale, coarse-grained MDP, thereby reducing computational complexity and improving solution efficiency. However, existing methods have yet to address how to ensure semantic consistency among the temporal-spatial semantics of the original states, the clustered abstract state space, and the real system space. To address these problems, this paper proposes a causal temporal-spatial semantic-based abstraction modeling method for deep reinforcement learning. First, causal temporal-spatial semantics reflecting the distribution of value changes over time and space are introduced, and based on them a two-stage semantic abstraction is performed on the states to construct an abstract MDP model of the deep reinforcement learning process. Next, abstraction optimization techniques are employed to refine the abstract model, reducing the semantic error between abstract states and their corresponding concrete states. Finally, extensive experiments are conducted on cases such as lane keeping, adaptive cruise control, and intersection crossing, and the resulting models are evaluated and analyzed with the PRISM model checker. The results demonstrate that the proposed abstraction modeling technique performs well in terms of abstraction capability, accuracy, and semantic equivalence.
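To make the notion of MDP abstraction mentioned above concrete, the sketch below illustrates (under simplifying assumptions, and not as the method proposed in this paper) how a fine-grained MDP can be collapsed into a coarse-grained abstract MDP by clustering states and aggregating their transition distributions; the function name, the uniform-weight aggregation rule, and the toy states are all hypothetical.

```python
# Illustrative sketch (hypothetical, not the paper's algorithm): build a
# coarse-grained abstract MDP from a fine-grained MDP by clustering states.
from collections import defaultdict

def build_abstract_mdp(transitions, state_to_cluster):
    """Aggregate fine-grained transitions into an abstract MDP.

    transitions:      dict mapping (state, action) -> {next_state: probability}
    state_to_cluster: dict mapping each concrete state to an abstract state id
    Returns:          dict mapping (abstract_state, action) -> {abstract_next: probability}
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for (s, a), dist in transitions.items():
        z = state_to_cluster[s]
        counts[(z, a)] += 1
        for s_next, p in dist.items():
            sums[(z, a)][state_to_cluster[s_next]] += p
    # Average the outgoing distributions of all concrete states merged into the
    # same abstract state (a simple uniform-weight aggregation assumption).
    return {
        key: {z_next: p / counts[key] for z_next, p in dist.items()}
        for key, dist in sums.items()
    }

# Tiny usage example: four concrete states clustered into two abstract states.
transitions = {
    ("s0", "a"): {"s1": 1.0},
    ("s1", "a"): {"s2": 0.5, "s3": 0.5},
    ("s2", "a"): {"s3": 1.0},
    ("s3", "a"): {"s0": 1.0},
}
clusters = {"s0": "Z0", "s1": "Z0", "s2": "Z1", "s3": "Z1"}
print(build_abstract_mdp(transitions, clusters))
```

In this toy example the abstract transition probabilities remain well-formed (each outgoing distribution sums to one); the paper's contribution lies in choosing the clustering from causal temporal-spatial semantics and in bounding the semantic error between abstract and concrete states, which this sketch does not attempt.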