Abstract: In recent years, generating policies with generalization ability has become one of the central topics in deep reinforcement learning, and many related research results have appeared. A representative work among them is the generalized value iteration network (GVIN). GVIN is a differentiable planning network that uses a special graph convolution operator to approximate the state-transition matrix and performs planning through a value iteration (VI) process while learning the structural information of irregular graphs, thereby producing policies with generalization ability. In GVIN, each round of VI updates the values of all states over the entire state space synchronously. Because no consideration is given to allocating planning time according to the importance of individual states, these synchronous updates may degrade the planning performance of the network when the state space is large. This work applies the idea of asynchronous updating to further study GVIN. By defining a priority for each state and performing asynchronous VI, a planning network called the generalized asynchronous value iteration network (GAVIN) is proposed. On unknown tasks with irregular graph structures, GAVIN plans more efficiently and effectively than GVIN. Furthermore, this work improves the reinforcement learning algorithm and the graph convolution operator used in GVIN, and their effectiveness is verified by path-planning experiments on irregular graphs and real-world maps.
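GAVIN itself is a learned, differentiable network, but the core idea behind it, prioritizing value updates instead of sweeping the whole state space synchronously, can be illustrated with a small tabular sketch. The Python snippet below assumes a known transition tensor `P` and reward table `R` (quantities the network only approximates), and all function and variable names are illustrative rather than taken from the paper; stale priority-queue entries are tolerated for simplicity.

```python
import heapq
import numpy as np

def async_value_iteration(P, R, gamma=0.99, theta=1e-4, max_backups=100_000):
    """Tabular prioritized (asynchronous) value iteration sketch.

    P: transition probabilities, shape (S, A, S)
    R: expected rewards, shape (S, A)
    States with the largest Bellman error are backed up first,
    rather than updating every state in each sweep.
    """
    S, A, _ = P.shape
    V = np.zeros(S)

    def bellman_backup(s):
        # One-step lookahead: max over actions of expected return.
        return np.max(R[s] + gamma * P[s] @ V)

    # Seed the priority queue with every state's current Bellman error.
    heap = []
    for s in range(S):
        err = abs(bellman_backup(s) - V[s])
        heapq.heappush(heap, (-err, s))          # max-heap via negated priority

    # predecessors[s] = states that can transition into s (their values go stale when V[s] changes)
    predecessors = [set(np.argwhere(P[:, :, s].sum(axis=1) > 0).ravel()) for s in range(S)]

    for _ in range(max_backups):
        if not heap:
            break
        neg_err, s = heapq.heappop(heap)
        if -neg_err < theta:                     # largest remaining error is small enough
            break
        V[s] = bellman_backup(s)                 # asynchronous update of a single state
        # Re-prioritize predecessors affected by the new value of s.
        for p in predecessors[s]:
            err = abs(bellman_backup(p) - V[p])
            if err > theta:
                heapq.heappush(heap, (-err, p))
    return V
```

The priority here is simply the Bellman error of a state; any measure of a state's importance could be substituted, which is the degree of freedom the abstract refers to when it mentions defining a priority for each state.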