A Solution Method for FMDP Models Based on Feature Vector Extraction

2005, Vol. 16, No. 5, pp. 733-743

Funding: Supported by the National Natural Science Foundation of China under Grant No. 60173011; the National High-Tech Research and Development Plan of China (863 Program) under Grant Nos. 863-317-01-04-99, 2001AA113120

An Efficient Solution Algorithm for Factored MDP Using Feature Vector Extraction

Abstract (Chinese version):

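In typical factored Markov decision process (FMDP) models such as robot soccer, different state attributes influence state evaluation to different degrees in different states. Among them are a few key state attributes that can uniquely or approximately determine how good the current state is. To address the "curse of dimensionality" that is pervasive in FMDP models, this paper approximates the state utility function by extracting state feature vectors, for the case where the utility function is nonlinear. Depending on how well the FMDP model is known, the solution is approached from two angles, linear programming and reinforcement learning: the system of constraint inequalities is simplified in the former, and the state utility function is transplanted to higher dimensions in the latter. Both reduce computational complexity and speed up generation of the joint policy. Experiments set in free-kick tactical cooperation in robot soccer verify the effectiveness of the reinforcement learning algorithm based on state feature vectors and the transferability of its learning results. Compared with traditional reinforcement learning algorithms, the algorithm based on state feature vectors greatly speeds up policy learning. More importantly, the learned state utility function can easily be transplanted to a higher-dimensional FMDP model, so the joint policy can be computed directly without learning again.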

Abstract:

In factored Markov decision processes (FMDPs) such as the RoboCup system, different state attributes contribute differently to state value evaluation. Some key state attributes can determine the value of the whole state either uniquely or at least approximately. Instead of exploiting the relevance among states to reduce the state space, this paper addresses the curse of dimensionality in large FMDPs by approximating the state value function through feature vector extraction. A key contribution is that it reduces computational complexity through constraint reduction in linear programming and, in reinforcement learning, speeds up the generation of joint strategies by transplanting the value function to more complex games. Experimental results on RoboCup free kicks give a promising indication of the efficiency of the approach and of its ability to transplant learning results. A comparison with an existing state-of-the-art approach indicates that the algorithm not only improves learning speed but can also transplant the state value function to RoboCup games with more players without learning again.
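
The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a hypothetical toy model: a one-dimensional pitch where the ball position is the single key attribute and every other state attribute is irrelevant noise. The names `GOAL`, `features`, `move_ball`, `step`, `td_learn` and `greedy_action` are invented for illustration. TD(0) learns a value table keyed by the extracted feature vector in a low-dimensional model, and the same table is then reused for one-step lookahead in a higher-dimensional model without relearning.

```python
import random

GOAL = 5  # assumption: the ball scores when it reaches cell 5 of a toy 1-D pitch


def features(state):
    """Extract the key attribute; state = (ball, *other_attributes)."""
    return (state[0],)


def move_ball(ball, action):
    """Deterministic ball dynamics: action is -1 or +1, clipped to the pitch."""
    return min(GOAL, max(0, ball + action))


def step(state, action, rng):
    """Toy transition: the ball moves, all other attributes change at random."""
    ball = move_ball(state[0], action)
    others = tuple(rng.randint(0, 2) for _ in state[1:])
    reward = 1.0 if ball == GOAL else 0.0
    return (ball, *others), reward, ball == GOAL


def td_learn(n_others, episodes=2000, alpha=0.1, gamma=0.9, seed=0):
    """TD(0) evaluation of a random policy on a table keyed by feature vectors."""
    rng = random.Random(seed)
    value = {}
    for _ in range(episodes):
        state = (0, *(rng.randint(0, 2) for _ in range(n_others)))
        done = False
        while not done:
            action = rng.choice((-1, 1))
            nxt, reward, done = step(state, action, rng)
            f, fn = features(state), features(nxt)
            target = reward + (0.0 if done else gamma * value.get(fn, 0.0))
            value[f] = value.get(f, 0.0) + alpha * (target - value.get(f, 0.0))
            state = nxt
    return value


def greedy_action(state, value, gamma=0.9):
    """One-step lookahead with the learned table; works for any state dimension."""
    def score(action):
        ball = move_ball(state[0], action)
        if ball == GOAL:
            return 1.0  # immediate scoring reward, episode ends
        return gamma * value.get((ball,), 0.0)
    return max((-1, 1), key=score)


# Learn in a 2-attribute model, then reuse the table in a 5-attribute model.
value = td_learn(n_others=1)
print(sorted(value))
print(greedy_action((3, 2, 0, 1, 2), value))
```

Because the table is indexed by the feature vector rather than the full state, its size stays at five entries however many extra attributes the model has, which is the property that lets the learned values carry over to the larger model in this sketch.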

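Cite this article

Zhang Shuangmin (张双民), Shi Chunyi (石纯一). An Efficient Solution Algorithm for Factored MDP Using Feature Vector Extraction. Journal of Software (软件学报), 2005, 16(5): 733-743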

History

Received: 2004-02-25
Last revised: 2004-05-08
