Causal Spatiotemporal Semantic-Driven Deep Reinforcement Learning Abstraction Modeling Method

CLC Number: TP311

    Abstract:

    With the rapid development of Intelligent Cyber-Physical Systems (ICPS), intelligent technologies are increasingly applied in components such as perception, decision-making, and control. Among these technologies, deep reinforcement learning (DRL) is widely used in the control components of ICPS because it handles complex, dynamic environments efficiently. However, the openness of the operating environment and the complexity of ICPS force DRL to explore a highly dynamic state space during learning, which can lead to inefficient decision-making and poor generalization. A common solution is to abstract a large-scale, fine-grained Markov Decision Process (MDP) into a smaller, coarse-grained MDP, which reduces the model’s computational complexity and improves solution efficiency. However, existing methods do not address how to ensure semantic consistency among the spatiotemporal semantic information of the original states, the clustered abstract system space, and the real system space. To address this problem, this paper proposes a causal spatiotemporal semantic-driven abstraction modeling method for DRL. First, causal spatiotemporal semantics, which reflect the distribution of value changes over time and space, are introduced, and on this basis a two-stage semantic abstraction is performed on the states to construct an abstract MDP model of the DRL process. Next, abstraction optimization techniques are employed to refine the abstract model, reducing the semantic error between each abstract state and its corresponding concrete states. Finally, extensive experiments are conducted on case studies including lane keeping, adaptive cruise control, and intersection crossing, and the resulting models are evaluated and analyzed with the PRISM verifier. The results demonstrate that the proposed abstraction modeling technique performs well in terms of abstraction capability, accuracy, and semantic equivalence.
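
    To make the idea of abstracting a fine-grained MDP into a coarse-grained one concrete, the following is a minimal sketch of plain state aggregation. It is not the causal spatiotemporal method proposed in the paper: the function name `aggregate_mdp`, the dictionary layout, the toy `cluster_of` assignment, and the uniform weighting of states within a cluster are all assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_mdp(transitions, rewards, cluster_of):
    """Collapse a fine-grained MDP into a coarse one by uniform state aggregation.

    transitions[(s, a)] -> {next_state: probability}
    rewards[(s, a)]     -> immediate reward (float)
    cluster_of[s]       -> abstract state label (a hypothetical clustering result)

    Assumes every state in a cluster defines the same set of actions;
    otherwise the aggregated probabilities will not sum to 1.
    """
    # Count how many concrete states fall into each abstract state.
    members = defaultdict(list)
    for s, c in cluster_of.items():
        members[c].append(s)

    abs_transitions = defaultdict(lambda: defaultdict(float))
    abs_rewards = defaultdict(float)
    for (s, a), dist in transitions.items():
        c = cluster_of[s]
        # Uniform weight over cluster members (an assumption of this sketch).
        weight = 1.0 / len(members[c])
        abs_rewards[(c, a)] += weight * rewards[(s, a)]
        for s_next, p in dist.items():
            abs_transitions[(c, a)][cluster_of[s_next]] += weight * p

    return {k: dict(v) for k, v in abs_transitions.items()}, dict(abs_rewards)


# Toy example: four concrete states, one action, two abstract states.
transitions = {
    (0, "go"): {1: 1.0},
    (1, "go"): {2: 0.5, 3: 0.5},
    (2, "go"): {3: 1.0},
    (3, "go"): {3: 1.0},
}
rewards = {(0, "go"): 0.0, (1, "go"): 1.0, (2, "go"): 2.0, (3, "go"): 2.0}
cluster_of = {0: "A", 1: "A", 2: "B", 3: "B"}

abs_transitions, abs_rewards = aggregate_mdp(transitions, rewards, cluster_of)
print(abs_transitions)
print(abs_rewards)
```

    In this sketch the abstract model averages over all concrete states in a cluster, so any difference between those states is lost. That loss is the kind of semantic error between abstract and concrete states that the abstract says the refinement step aims to reduce.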

Get Citation

田丽丽, 杜德慧, 聂基辉, 陈逸康, 李荥达. Causal Spatiotemporal Semantic-Driven Deep Reinforcement Learning Abstraction Modeling Method. Journal of Software, 2025, 36(8):0

History
  • Received: August 26, 2024
  • Revised: October 14, 2024
  • Online: December 10, 2024