Learnable Weighting Mechanism in Model-based Reinforcement Learning
Author biographies:

HUANG Wen-Zhen (1992-), male, Ph.D., his research interests include reinforcement learning; YIN Qi-Yue (1990-), male, Ph.D., associate professor, CCF professional member, his research interests include machine learning, data mining, and artificial intelligence and games; ZHANG Jun-Ge (1986-), male, Ph.D., professor, his research interests include game decision-making, reinforcement learning, pattern recognition, and artificial intelligence; HUANG Kai-Qi (1977-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member, his research interests include computer vision, pattern recognition, human-computer gaming, and visual surveillance applications.

Corresponding author:

ZHANG Jun-Ge, jgzhang@nlpr.ia.ac.cn

CLC number:

TP181

Funds:

National Natural Science Foundation of China (61876181, 61673375); Beijing Municipal Science and Technology Innovation Plan (Z19110000119043); Youth Innovation Promotion Association of the Chinese Academy of Sciences; Project of the Chinese Academy of Sciences (QYZDB-SSW-JSC006)


Author:
  • HUANG Wen-Zhen
    School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Research on Intelligent System and Engineering (CRISE), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • YIN Qi-Yue
    School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Research on Intelligent System and Engineering (CRISE), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • ZHANG Jun-Ge
    School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Research on Intelligent System and Engineering (CRISE), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
  • HUANG Kai-Qi
    School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China; Center for Research on Intelligent System and Engineering (CRISE), Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai 200031, China
Abstract:

Model-based reinforcement learning methods use the collected samples to train a model that simulates the environment, and then use the imaginary samples generated by this model to optimize the policy, so they have the potential to improve sample efficiency. Nevertheless, owing to the shortage of training samples, the learned environment model is often inaccurate, and the imaginary samples it generates carry prediction errors that can harm the training process. To address this problem, a learnable weighting mechanism is proposed that reduces the negative effect of the generated samples on training by reweighting them. The effect of an imaginary sample on training is quantified as follows: the value and policy networks are first updated with the sample under evaluation, the loss on real samples is computed before and after this update, and the change in loss measures the sample's effect on the training process. Experimental results show that a reinforcement learning algorithm built on this weighting mechanism outperforms existing model-based and model-free algorithms on multiple tasks.
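To make the quantification step described in the abstract concrete, the sketch below scores a batch of model-generated ("imaginary") transitions by the change in a loss measured on real transitions before and after a trial update. It is a minimal illustration under assumed names (ValueNet, td_loss, score_imaginary_batch) and a simplified one-step TD objective, not the authors' implementation.

```python
# Minimal sketch of scoring an imaginary (model-generated) batch by the change
# it causes in the loss measured on real transitions. Names and the TD objective
# are illustrative assumptions, not the paper's exact formulation.
import copy
import torch
import torch.nn as nn

class ValueNet(nn.Module):
    """Tiny state-value network standing in for the value/policy networks."""
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, s):
        return self.net(s).squeeze(-1)

def td_loss(value_net, batch, gamma=0.99):
    """One-step TD regression loss on a batch of (s, r, s') transitions."""
    s, r, s_next = batch
    with torch.no_grad():
        target = r + gamma * value_net(s_next)
    return nn.functional.mse_loss(value_net(s), target)

def score_imaginary_batch(value_net, imaginary_batch, real_batch, lr=1e-3):
    """Return the decrease in real-data loss caused by one update on the
    imaginary batch; a positive score means the imaginary batch helped."""
    trial_net = copy.deepcopy(value_net)          # do not disturb the live network
    opt = torch.optim.SGD(trial_net.parameters(), lr=lr)

    loss_before = td_loss(trial_net, real_batch).item()
    opt.zero_grad()
    td_loss(trial_net, imaginary_batch).backward()  # trial update with imaginary data
    opt.step()
    loss_after = td_loss(trial_net, real_batch).item()
    return loss_before - loss_after

if __name__ == "__main__":
    torch.manual_seed(0)
    state_dim = 4
    net = ValueNet(state_dim)
    real = (torch.randn(32, state_dim), torch.randn(32), torch.randn(32, state_dim))
    fake = (torch.randn(32, state_dim), torch.randn(32), torch.randn(32, state_dim))
    print(f"effect of imaginary batch on real loss: {score_imaginary_batch(net, fake, real):+.6f}")
```

The sketch isolates only the before/after loss comparison; in the paper, this quantity is used to reweight the generated samples so that those which harm training contribute less.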

Cite this article:

HUANG Wen-Zhen, YIN Qi-Yue, ZHANG Jun-Ge, HUANG Kai-Qi. Learnable weighting mechanism in model-based reinforcement learning. Ruan Jian Xue Bao/Journal of Software, 2023, 34(6): 2765-2775 (in Chinese with English abstract).
History:
  • Received: 2021-04-14
  • Revised: 2021-06-07
  • Published online: 2022-10-14
  • Published: 2023-06-06