Factored Multi-agent Centralised Policy Gradients with Parameterized Action Space and Its Application
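Authors: Tian Shucong (田树聪), Xie Yu (谢愈), Zhang Yuanlong (张远龙), Zhou Zhengchun (周正春), Gao Yang (高阳)

CLC number: TP18

Fund project: National Natural Science Foundation of China (62173336, 92271108)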


    Abstract:

    In recent years, multi-agent reinforcement learning (MARL) methods have shown excellent decision-making capability and broad application prospects, as demonstrated by AlphaStar, AlphaDogFight, and AlphaMosaic. In real-world multi-agent decision-making systems, the decision space of a task is often a parameterized action space that contains both discrete and continuous action variables. This complex structure means that traditional MARL algorithms, which handle only discrete or only continuous actions, no longer apply. Research on MARL algorithms for parameterized action spaces is therefore of significant practical importance. This study proposes a factored multi-agent centralised policy gradients algorithm for parameterized action spaces. The factored centralised policy gradient ensures effective coordination among the agents. The dual-headed policy output of the parameterized deep deterministic policy gradient algorithm is used to effectively couple the discrete and continuous components of the parameterized action space. Experiments under different parameter settings in the Hybrid Predator-Prey scenario show that the algorithm performs well on classic multi-agent cooperative tasks with parameterized action spaces. The algorithm is further validated in a multi-cruise-missile cooperative penetration scenario. There, the results demonstrate its effectiveness and feasibility in cooperative tasks with highly dynamic, complex behavior.
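The two mechanisms named in the abstract can be illustrated with a minimal, hypothetical Python sketch. It is not the authors' implementation, and it rests on several assumptions:

- **Actor.** In PA-DDPG style, each actor has a discrete head that emits one score per discrete action. A parameter head emits a parameter vector for every discrete action. The executed action is the highest-scoring discrete action paired with its own parameters.
- **Critic.** The factored critic is shown as a QMIX-style monotonic mixer. Fixed weights stand in for the state-conditioned hypernetwork.
- **Gradient.** The centralised gradient is approximated with finite differences rather than backpropagation.

The names `dual_head_to_action`, `monotonic_mix`, and `centralised_action_grads` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class ParamAction:
    discrete: int               # index k of the chosen discrete action
    params: Tuple[float, ...]   # continuous parameters attached to action k


def bounded(x: float, low: float, high: float) -> float:
    """Squash an unbounded head output into [low, high] using tanh."""
    return low + (math.tanh(x) + 1.0) * 0.5 * (high - low)


def dual_head_to_action(discrete_scores: Sequence[float],
                        param_outputs: Sequence[Sequence[float]],
                        low: float = -1.0, high: float = 1.0) -> ParamAction:
    """Couple the two heads (assumed PA-DDPG style): argmax over the
    discrete scores, then keep only that action's parameter vector."""
    if len(discrete_scores) != len(param_outputs):
        raise ValueError("need one parameter vector per discrete action")
    k = max(range(len(discrete_scores)), key=lambda i: discrete_scores[i])
    return ParamAction(k, tuple(bounded(x, low, high) for x in param_outputs[k]))


def monotonic_mix(agent_qs: Sequence[float], raw_weights: Sequence[float],
                  bias: float) -> float:
    """Factored critic (assumed QMIX-style): Q_tot = sum_i |w_i| * Q_i + b.
    Fixed weights are a hypothetical stand-in for a hypernetwork."""
    return sum(abs(w) * q for w, q in zip(raw_weights, agent_qs)) + bias


def centralised_action_grads(q_fns: Sequence[Callable[[float], float]],
                             actions: Sequence[float],
                             raw_weights: Sequence[float], bias: float,
                             eps: float = 1e-5) -> List[float]:
    """Central finite-difference estimate of dQ_tot/da_i, evaluated at the
    joint action in which every agent acts with its current policy."""
    def q_tot(acts: Sequence[float]) -> float:
        return monotonic_mix([q(a) for q, a in zip(q_fns, acts)], raw_weights, bias)

    grads = []
    for i in range(len(actions)):
        up, down = list(actions), list(actions)
        up[i] += eps        # perturb only agent i's action
        down[i] -= eps
        grads.append((q_tot(up) - q_tot(down)) / (2.0 * eps))
    return grads


if __name__ == "__main__":
    # Three discrete actions; the last one carries no continuous parameter.
    act = dual_head_to_action([0.2, 1.5, -0.3], [[0.0], [0.5, -2.0], []])
    print(act.discrete, act.params)
    print(monotonic_mix([1.0, 2.0], [-0.5, 1.0], 0.1))
    toy_qs = [lambda a: -(a - 0.3) ** 2, lambda a: -(a + 0.2) ** 2]
    print(centralised_action_grads(toy_qs, [0.0, 0.0], [-0.5, 1.0], 0.1))
```

Taking absolute values of the mixing weights keeps dQ_tot/dQ_i non-negative, so improving one agent's utility cannot lower the joint value. In a FACMAC-style update, each actor would then follow the gradient of the mixed value with respect to its own action. All agents' actions are taken from their current policies, not from the replay buffer. The finite-difference helper only mimics that quantity for scalar actions.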

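Cite this article

Tian Shucong, Xie Yu, Zhang Yuanlong, Zhou Zhengchun, Gao Yang. Factored Multi-agent Centralised Policy Gradients with Parameterized Action Space and Its Application. Journal of Software (软件学报), ,(): 1-18.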

History
  • Received: 2023-07-11
  • Last revised: 2023-10-09
  • Published online: 2024-07-17
Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing, postal code 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn