Dynamic Decision Making Based on Explicit Knowledge Reasoning and Deep Reinforcement Learning
Authors: 张昊迪, 陈振浩, 陈俊扬, 周熠, 连德富, 伍楷舜, 林方真

Corresponding authors: 伍楷舜, E-mail: wu@szu.edu.cn; 林方真, E-mail: flin@cse.ust.hk

CLC number: TP18

Fund projects: National Natural Science Foundation of China (61806132, U2001207, 61872248); Natural Science Foundation of Guangdong Province (2017A030312008); Shenzhen Natural Science Foundation (ZDSYS20190902092853047, R2020A045); Pearl River Talent Plan (2019ZT08X603); Innovation Team Project of Guangdong Universities (2019KCXTD005)




    Abstract:

    In recent years, deep reinforcement learning has been widely and successfully applied to sequential decision making, and it is particularly effective in application scenarios with high-dimensional inputs and large state spaces. However, deep reinforcement learning still faces limitations such as a lack of interpretability, inefficient early training, and cold-start problems. To address these issues, this study proposes a dynamic decision framework that combines explicit knowledge reasoning with deep reinforcement learning. The framework embeds human prior knowledge into agent training via explicit knowledge representation and lets knowledge reasoning results intervene in the agent's decisions during reinforcement learning, thereby improving training efficiency and model interpretability. The explicit knowledge is categorized into two kinds: heuristic acceleration knowledge and evasive safety knowledge. Heuristic acceleration knowledge intervenes in the agent's decisions during early training to speed up training, while evasive safety knowledge keeps the agent from making catastrophic decisions so that the training process stays stable. Experimental results show that the proposed framework significantly improves training efficiency and model interpretability across different reinforcement learning algorithms and application scenarios.
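The two kinds of intervention described in the abstract can be illustrated with a minimal sketch of knowledge-guided action selection for a discrete-action, value-based agent. This is an illustrative reconstruction, not the paper's actual method or API: the names `safety_rules`, `heuristic_rules`, and `warmup_steps` are assumptions made for the example. Evasive safety knowledge is modeled as a veto that masks forbidden actions; heuristic acceleration knowledge as a suggested action that overrides the (still poorly trained) policy during early training.

```python
def select_action(state, q_values, safety_rules, heuristic_rules,
                  step, warmup_steps=1000):
    """Pick an action, letting explicit knowledge intervene on the policy.

    q_values        -- learned value estimates, one per discrete action
    safety_rules    -- state -> set of forbidden action indices (evasive knowledge)
    heuristic_rules -- state -> suggested action index or None (acceleration knowledge)
    """
    n_actions = len(q_values)

    # Evasive safety knowledge: mask actions the rules forbid in this state.
    forbidden = safety_rules(state)
    allowed = [a for a in range(n_actions) if a not in forbidden]
    if not allowed:  # every action vetoed: fall back to the unconstrained policy
        allowed = list(range(n_actions))

    # Heuristic acceleration knowledge: early in training, follow the rule's
    # suggestion instead of the still-unreliable learned policy.
    if step < warmup_steps:
        hint = heuristic_rules(state)
        if hint is not None and hint in allowed:
            return hint

    # Otherwise choose the best allowed action under the learned estimates.
    return max(allowed, key=lambda a: q_values[a])
```

In this sketch the safety mask applies at every step, while the heuristic override fades out after `warmup_steps`, matching the abstract's division of labor: acceleration knowledge speeds up early training, safety knowledge keeps the whole process stable.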

Cite this article:

张昊迪, 陈振浩, 陈俊扬, 周熠, 连德富, 伍楷舜, 林方真. Dynamic Decision Making Based on Explicit Knowledge Reasoning and Deep Reinforcement Learning. 软件学报 (Journal of Software): 1-15.

History
  • Received: 2021-09-05
  • Revised: 2021-10-14
  • Online: 2022-01-28
Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190
Tel: 010-62562563  Fax: 010-62562533  E-mail: jos@iscas.ac.cn