基于后悔值的多Agent冲突博弈强化学习模型
DOI:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:


Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    对于冲突博弈,研究了一种理性保守的行为选择方法,即最小化最坏情况下Agent的后悔值.在该方法下,Agent当前的行为策略在未来可能造成的损失最小,并且在没有任何其他Agent信息的条件下,能够得到Nash均衡混合策略.基于后悔值提出了多Agent复杂环境下冲突博弈的强化学习模型以及算法实现.该模型中通过引入交叉熵距离建立信念更新过程,进一步优化了冲突博弈时的行为选择策略.基于Markov重复博弈模型验证了算法的收敛性,分析了信念与最优策略的关系.此外,与MMDP(multi-agent markov decision process)下Q学习扩展算法相比,该算法在很大程度上减少了冲突发生的次数,增强了Agent行为的协调性,并且提高了系统的性能,有利于维持系统的稳定.

    Abstract:

    For conflict game, a rational but conservative action selection method is investigated, namely, minimizing regret function in the worst case. By this method the loss incurred possibly in future is the lowest under this very policy, and Nash equilibrium mixed policy is obtained without information about other agents. Based on regret, a reinforcement learning model and its algorithm for conflict game under multi-agent complex environment are put forward. This model also builds agents' belief updating process on the concept of cross entropy distance, which further optimizes action selection policy for conflict games. Based on Markov repeated game model, this paper demonstrates the convergence property of this algorithm, and analyzes the relationship between belief and optimal policy. Additionally, compared with extended Q-learning algorithm under MMDP (multi-agent markov decision process), the proposed algorithm decreases the number of conflicts dramatically, enhances coordination among agents, improves system performance, and helps to maintain system stability.

    参考文献
    相似文献
    引证文献
引用本文

肖 正,张世永.基于后悔值的多Agent冲突博弈强化学习模型.软件学报,2008,19(11):2957-2967

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2007-06-28
  • 最后修改日期:2007-08-24
  • 录用日期:
  • 在线发布日期:
  • 出版日期:
文章二维码
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号