Point-Based Online Value Iteration Algorithm for POMDPs

doi:10.3724/SP.J.1001.2013.04258


Funding: National Natural Science Foundation of China (61074058, 60874042); Doctoral Program Fund of the Ministry of Education of China (20090162120068)

Point-Based Online Value Iteration Algorithm for POMDPs

Abstract:

Partially observable Markov decision processes (POMDPs) provide a rich framework for sequential decision-making in dynamic, uncertain environments. However, existing offline algorithms suffer from the two curses of belief states, the curse of dimensionality and the curse of history, while existing online algorithms cannot simultaneously achieve low error and real-time performance. As a result, the POMDP model has seen little use in practical engineering. To address these problems, this paper proposes a point-based online value iteration (PBOVI) algorithm for POMDPs. To speed up solving, the algorithm performs value backups only at given reachable belief points rather than over the entire belief simplex. It applies branch-and-bound pruning to prune the AND/OR tree of belief states online, and it reuses belief-state nodes already computed at the previous time step to avoid repeated computation. Experimental results show that the algorithm achieves a low error rate and fast convergence, and that it meets the real-time requirements of the system.
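
To make "value backup at specific reachable belief points" concrete, the following is a minimal Python sketch of a generic point-based backup together with the Bayesian belief update that generates reachable points. It is not the authors' PBOVI implementation: it omits the branch-and-bound pruning and node reuse, and the array layout (`T[a][s][s2]`, `O[a][s2][o]`, `R[a][s]`), the function names, and the classic Tiger toy model used as data are all assumptions introduced here for illustration.

```python
from typing import List, Tuple

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))

def belief_update(b: Vector, a: int, o: int, T, O) -> Tuple[Vector, float]:
    """Bayes filter: return the next belief and P(o | b, a)."""
    n = len(b)
    unnorm = [O[a][s2][o] * sum(T[a][s][s2] * b[s] for s in range(n))
              for s2 in range(n)]
    prob = sum(unnorm)
    if prob == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    return [x / prob for x in unnorm], prob

def point_backup(b: Vector, Gamma: List[Vector], T, O, R,
                 gamma: float) -> Tuple[Vector, int]:
    """Back up the value function at the single belief point b.

    Gamma is the current set of alpha vectors. Returns the new alpha
    vector that is best at b and the action it corresponds to.
    """
    n = len(b)
    num_obs = len(O[0][0])
    best_alpha, best_action, best_value = None, -1, float("-inf")
    for a in range(len(T)):
        alpha_a = list(R[a])  # immediate reward term
        for o in range(num_obs):
            # Project every alpha vector back through (a, o) and keep
            # only the projection that scores highest at b.
            candidates = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                 for s in range(n)]
                for alpha in Gamma
            ]
            g = max(candidates, key=lambda v: dot(v, b))
            alpha_a = [x + gamma * y for x, y in zip(alpha_a, g)]
        value = dot(alpha_a, b)
        if value > best_value:
            best_alpha, best_action, best_value = alpha_a, a, value
    return best_alpha, best_action

# Tiger toy model (assumed example data, not taken from the paper).
# States: 0 = tiger-left, 1 = tiger-right.
# Actions: 0 = listen, 1 = open-left, 2 = open-right.
# Observations: 0 = hear-left, 1 = hear-right.
uniform = [[0.5, 0.5], [0.5, 0.5]]
T = [[[1.0, 0.0], [0.0, 1.0]], uniform, uniform]
O = [[[0.85, 0.15], [0.15, 0.85]], uniform, uniform]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]

b0 = [0.5, 0.5]
b1, p_obs = belief_update(b0, 0, 0, T, O)            # listen, hear-left
alpha, action = point_backup(b0, [[0.0, 0.0]], T, O, R, 0.95)
```

Each backup touches one belief point and costs time polynomial in the numbers of states, actions, observations, and alpha vectors, which is why restricting updates to reachable points avoids sweeping the whole belief simplex.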


Cite this article

Wu Bo, Wu Min, She Jinhua. Point-based online value iteration algorithm for POMDPs. Journal of Software (软件学报), 2013, 24(1): 25-36


History

Received: 2012-02-03
Last revised: 2012-05-18
Published online: 2012-12-29
