A Solution Method for FMDP Models Based on Feature Vector Extraction
Funding:

Supported by the National Natural Science Foundation of China under Grant No. 60173011; the National High-Tech Research and Development Plan of China (863 Program) under Grant Nos. 863-317-01-04-99, 2001AA113120


An Efficient Solution Algorithm for Factored MDP Using Feature Vector Extraction

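    Abstract (translated from the Chinese):

    In typical factored Markov decision process (FMDP) models such as robot soccer, different state attributes influence state evaluation to different degrees in different states. Among them are a few key state attributes that can determine, uniquely or approximately, how good the current state is. To address the curse of dimensionality that is pervasive in FMDP models, and for the case where the utility function is nonlinear, the state utility function is approximated by extracting state feature vectors. Depending on how much is known about the FMDP model, the problem is approached from two solution perspectives, linear programming and reinforcement learning: the set of constraint inequalities is simplified in the former, and the state utility function is transplanted to higher dimensions in the latter, thereby reducing computational complexity and speeding up the generation of joint strategies. Experiments set in free-kick tactical cooperation in robot soccer verify the effectiveness of the reinforcement learning algorithm based on state feature vectors and the portability of its learning results. Compared with traditional reinforcement learning algorithms, the algorithm based on state feature vectors greatly accelerates policy learning. More importantly, the learned state utility function can be conveniently transplanted to higher-dimensional FMDP models, so that the joint strategy can be computed directly without learning again.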

    Abstract:

    In factored Markov decision processes (FMDPs) such as RoboCup, different state attributes contribute differently to state evaluation, and their influence varies from state to state. A few key state attributes determine the value of the whole state either uniquely or, at least, approximately. Instead of exploiting the relevance among states to reduce the state space, this paper addresses the curse of dimensionality in large FMDPs by approximating the state value function through feature vector extraction. The key contribution is twofold: in linear programming, the approach reduces computational complexity by reducing the number of constraints; in reinforcement learning, it speeds up the generation of joint strategies by transplanting the value function to a more complex game. Experiments on RoboCup free kicks indicate that the approach is efficient and that its learning results can be transplanted. A comparison with an existing state-of-the-art approach indicates that the algorithm not only improves the learning speed but also allows the state value function to be transplanted to RoboCup games with more players without learning again.
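The idea of learning a value function over extracted feature vectors and reusing it in a larger model can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain TD(0) update on a value table keyed by the feature vector, and the attribute names (`ball_zone`, `nearest_opponent`, `mate1`, ...), the extractor `extract_features`, and the step sizes are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def extract_features(state):
    """Hypothetical extractor: keep only the assumed key attributes.

    A state is a dict of attribute name -> value. Treating 'ball_zone'
    and 'nearest_opponent' as the key attributes is an assumption.
    """
    return (state["ball_zone"], state["nearest_opponent"])


def td0_update(values, state, reward, next_state, alpha=0.5, gamma=0.9):
    """One TD(0) backup on a value table keyed by feature vector."""
    f = extract_features(state)
    f_next = extract_features(next_state)
    target = reward + gamma * values[f_next]
    values[f] += alpha * (target - values[f])
    return values[f]


# Value table over feature vectors, not over full states.
values = defaultdict(float)

# Small model: two teammates whose positions are not key attributes.
s0 = {"ball_zone": 0, "nearest_opponent": "far", "mate1": 3, "mate2": 5}
s1 = {"ball_zone": 1, "nearest_opponent": "far", "mate1": 4, "mate2": 5}
td0_update(values, s0, reward=1.0, next_state=s1)

# Larger model: more attributes, but the same key attributes.
s_big = {"ball_zone": 0, "nearest_opponent": "far",
         "mate1": 9, "mate2": 2, "mate3": 7, "mate4": 1}

# The learned value is looked up directly, with no further learning.
transferred = values[extract_features(s_big)]
```

Because the table is indexed by the feature vector rather than the full state, its size does not grow with the number of non-key attributes, and a state from a model with more players maps onto an entry that has already been learned, provided the key attributes mean the same thing in both models.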

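Cite this article

Zhang Shuangmin (张双民), Shi Chunyi (石纯一). An Efficient Solution Algorithm for Factored MDP Using Feature Vector Extraction. Journal of Software (软件学报), 2005, 16(5): 733-743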

History
  • Received: 2004-02-25
  • Last revised: 2004-05-08