Planning Network Model Based on Generalized Asynchronous Value Iteration

CLC Number:

TP181

Fund Project:

National Natural Science Foundation of China (61876119); Natural Science Foundation of Jiangsu Province (BK20181432); Fundamental Research Funds for the Central Universities (022114380010)

    Abstract:

    In recent years, learning policies that generalize has become an active topic in deep reinforcement learning, and many related results have appeared. A representative work is the generalized value iteration network (GVIN). GVIN is a differentiable planning network that uses a special graph convolution operator to approximate the state-transition matrix and runs value iteration (VI) to plan while learning structural information in irregular graphs, which yields policies that generalize. In GVIN, each round of VI updates the values of all states in the state space synchronously. Because planning time is not allocated according to the importance of states, synchronous updates may degrade the network's planning performance when the state space is large. This work applies the idea of asynchronous updates to GVIN. By defining a priority for each state and performing asynchronous VI, it proposes a planning network called the generalized asynchronous value iteration network (GAVIN). On unknown tasks with irregular graph structure, GAVIN plans more efficiently and effectively than GVIN. Furthermore, this work improves the reinforcement learning algorithm and the graph convolution operator used in GVIN, and verifies the effectiveness of these improvements through path planning experiments on irregular graphs and real maps.
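
    To make the contrast between synchronous and prioritized asynchronous VI concrete, here is a minimal tabular sketch. It is not the GAVIN model: the abstract does not state how state priority is defined, so the sketch assumes the Bellman error as the priority (in the style of prioritized sweeping), a small hand-written graph with known edge costs instead of a learned graph convolution operator, and a discount factor of 0.9. All names (`GRAPH`, `backup`, `synchronous_vi`, `asynchronous_vi`, `greedy_path`) are hypothetical.

```python
import heapq

# Hypothetical toy graph (assumption): node -> list of (neighbor, edge cost).
GRAPH = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("A", 1.0), ("C", 1.0), ("D", 5.0)],
    "C": [("A", 4.0), ("B", 1.0), ("D", 1.0)],
    "D": [("B", 5.0), ("C", 1.0)],
}


def backup(graph, values, state, gamma):
    """Bellman backup on a deterministic graph with reward = -edge cost."""
    return max(-cost + gamma * values[nxt] for nxt, cost in graph[state])


def synchronous_vi(graph, goal, gamma=0.9, tol=1e-9):
    """Update every non-goal state in each sweep until values stop changing."""
    values = {s: 0.0 for s in graph}
    backups = 0
    while True:
        new_values = {
            s: 0.0 if s == goal else backup(graph, values, s, gamma)
            for s in graph
        }
        backups += len(graph) - 1
        delta = max(abs(new_values[s] - values[s]) for s in graph)
        values = new_values
        if delta < tol:
            return values, backups


def asynchronous_vi(graph, goal, gamma=0.9, tol=1e-9):
    """Update one state at a time, largest Bellman error first (assumed priority)."""
    values = {s: 0.0 for s in graph}
    # Predecessors are the only states whose backup changes when a value changes.
    preds = {s: set() for s in graph}
    for s, edges in graph.items():
        for nxt, _ in edges:
            preds[nxt].add(s)
    # heapq is a min-heap, so priorities are stored negated.
    heap = [(-abs(backup(graph, values, s, gamma) - values[s]), s)
            for s in graph if s != goal]
    heapq.heapify(heap)
    backups = 0
    while heap:
        _, s = heapq.heappop(heap)
        new_v = backup(graph, values, s, gamma)
        backups += 1
        if abs(new_v - values[s]) < tol:
            continue  # stale heap entry: this state is already up to date
        values[s] = new_v
        for p in preds[s]:
            if p == goal:
                continue  # the goal value stays fixed at 0
            err = abs(backup(graph, values, p, gamma) - values[p])
            if err >= tol:
                heapq.heappush(heap, (-err, p))
    return values, backups


def greedy_path(graph, values, start, goal, gamma=0.9):
    """Follow the greedy policy induced by the values from start toward goal."""
    path = [start]
    while path[-1] != goal and len(path) <= len(graph):
        nxt, _ = max(graph[path[-1]],
                     key=lambda edge: -edge[1] + gamma * values[edge[0]])
        path.append(nxt)
    return path


sync_values, sync_backups = synchronous_vi(GRAPH, "D")
async_values, async_backups = asynchronous_vi(GRAPH, "D")
print(greedy_path(GRAPH, async_values, "A", "D"))
print(sync_backups, async_backups)
```

    Both routines converge to the same values on this toy graph; they differ only in which states are backed up and in what order. On a graph this small the backup counts are not a meaningful benchmark, so the sketch makes no claim about the efficiency gains reported for GAVIN.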

Get Citation

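Chen Zixuan, Zhang Zongzhang, Pan Zhiyuan, Zhang Linjing. Planning Network Model Based on Generalized Asynchronous Value Iteration. Journal of Software (软件学报), 2021, 32(11): 3496-3511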

History
  • Received: November 12, 2019
  • Revised: March 17, 2020
  • Online: November 05, 2021
  • Published: November 06, 2021