Abstract: Partially observable Markov decision processes (POMDPs) provide a rich framework for sequential decision-making in stochastic domains under uncertainty. However, solving POMDPs is typically computationally intractable: belief-state planning suffers from two curses, the curse of dimensionality and the curse of history, and existing online algorithms cannot simultaneously achieve low error and high timeliness. To address these problems, this paper proposes a point-based online value iteration (PBOVI) algorithm for POMDPs. The algorithm speeds up POMDP solving by performing value backups only at specific reachable belief points, rather than over the entire belief simplex. The paper exploits a branch-and-bound pruning approach to prune the AND/OR tree of belief states online, and proposes a novel idea of reusing the belief states computed in the previous step to avoid repeated computation. Experimental and simulation results show that the proposed algorithm is effective in reducing the cost of computing policies while retaining policy quality, so it can meet the requirements of real-time systems.
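As a rough illustration of the point-based backup mentioned above, the following is a minimal sketch of backing up the value at a single belief point rather than over the whole belief simplex. All names, array layouts, and the toy model numbers here are illustrative assumptions, not the paper's actual implementation or benchmark domain.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma=0.95):
    """Back up the value at one belief point b (sketch, assumed layout).

    b     : (S,) belief over states
    Gamma : list of (S,) alpha vectors from the previous iteration
    T     : (A, S, S) transition probabilities T[a, s, s']
    O     : (A, S, n_obs) observation probabilities O[a, s', o]
    R     : (A, S) immediate rewards
    Returns the best new alpha vector for b.
    """
    A, S, _ = T.shape
    n_obs = O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(A):
        g_a = R[a].astype(float).copy()
        for o in range(n_obs):
            # For each observation, keep the old alpha vector that
            # maximizes the backed-up value at b:
            # g[s] = sum_{s'} T[a, s, s'] * O[a, s', o] * alpha[s']
            candidates = [T[a] @ (O[a, :, o] * alpha) for alpha in Gamma]
            g_a += gamma * max(candidates, key=lambda g: b @ g)
        if b @ g_a > best_val:
            best_val, best_alpha = b @ g_a, g_a
    return best_alpha

# Toy two-state, two-action, two-observation model (hypothetical numbers).
T = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.5, 0.5], [0.5, 0.5]]])
O = np.array([[[0.8, 0.2], [0.2, 0.8]],
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([0.5, 0.5])
alpha = point_based_backup(b, [np.zeros(2)], T, O, R)
```

In a full point-based solver this backup is repeated over a finite set of reachable belief points, which is what keeps the cost bounded.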