Reinforcement Learning Model Based on Regret for Multi-Agent Conflict Games


    Abstract:

    For conflict games, a rational but conservative action-selection method is investigated, namely, minimizing the regret function in the worst case. Under this method, the possible future loss incurred by the agent's current policy is minimized, and a Nash equilibrium mixed policy can be obtained without any information about the other agents. Based on regret, a reinforcement learning model and its algorithm for conflict games in complex multi-agent environments are put forward. The model also builds the agents' belief-updating process on the cross-entropy distance, which further optimizes the action-selection policy for conflict games. Based on a Markov repeated-game model, this paper demonstrates the convergence of the algorithm and analyzes the relationship between belief and the optimal policy. Additionally, compared with the extended Q-learning algorithm under the MMDP (multi-agent Markov decision process) model, the proposed algorithm dramatically decreases the number of conflicts, enhances coordination among agents, improves system performance, and helps maintain system stability.
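The two core ideas of the abstract, choosing the action whose worst-case regret is smallest and measuring belief drift with a cross-entropy (KL) distance, can be sketched as follows. This is a minimal illustration in Python/NumPy, not the paper's implementation; the 2x2 game matrix, the mixing rate, and the function names are made-up examples.

```python
import numpy as np

def worst_case_regret(payoff):
    """Worst-case regret of each of the row player's actions.

    payoff[i, j] is the row player's payoff for own action i
    against opponent action j.
    """
    best_response = payoff.max(axis=0)   # best attainable payoff per opponent action
    regret = best_response - payoff      # regret[i, j] >= 0
    return regret.max(axis=1)            # worst case over opponent actions

def minimax_regret_action(payoff):
    # Pick the action whose worst-case regret is smallest.
    return int(np.argmin(worst_case_regret(payoff)))

def update_belief(belief, observed_freq, rate=0.5):
    """Move the belief about the opponent toward observed action frequencies.

    The KL divergence (cross-entropy distance) between the observed
    frequencies and the current belief quantifies how stale the belief is.
    """
    eps = 1e-12
    kl = float(np.sum(observed_freq * np.log((observed_freq + eps) / (belief + eps))))
    new_belief = (1 - rate) * belief + rate * observed_freq
    return new_belief / new_belief.sum(), kl

# A 2x2 conflict game: action 0 is greedy (high best case, bad worst case),
# action 1 is safe.
payoff = np.array([[4.0, 0.0],
                   [3.0, 2.0]])
print(worst_case_regret(payoff))      # [2. 1.]
print(minimax_regret_action(payoff))  # 1 (the safe action)
```

Here the greedy action can lose up to 2 relative to the best response, while the safe action loses at most 1, so the minimax-regret criterion selects the safe action; in a repeated game, the belief update then biases future action selection toward the opponent's observed behavior.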

Cite this article:

Xiao Z, Zhang SY. Reinforcement learning model based on regret for multi-agent conflict games. Journal of Software (软件学报), 2008,19(11):2957-2967

History
  • Received: 2007-06-28
  • Revised: 2007-08-24