Offline Reinforcement Learning Method with Diffusion Model and Expectation Maximization
Authors: Liu Quan (刘全), Yan Jie (颜洁), Wulan (乌兰)
CLC number: TP18
Fund projects: National Natural Science Foundation of China (62376179, 62176175); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01A238); Priority Academic Program Development of Jiangsu Higher Education Institutions
Abstract:

Offline reinforcement learning has achieved remarkable results on tasks with continuous, dense rewards. However, because its training process does not interact with the environment, its generalization ability is reduced, and its performance is hard to guarantee in environments with discrete and sparse rewards. A diffusion model adds noise to incorporate information from the neighborhood of the sample data and generates actions that stay close to the data distribution, which strengthens the agent's learning and generalization abilities. To address these problems, an offline reinforcement learning method with diffusion models and expectation maximization (DMEM) is proposed. The method updates the objective function by maximizing the expectation of the log-likelihood, which makes the policy more generalizable. A diffusion model is introduced into the policy network, and its diffusion property is exploited to enhance the policy's ability to learn from the data samples. Meanwhile, the value function is updated by expectile regression viewed from the perspective of high-dimensional space, and a penalty term is introduced to make value estimation more accurate. DMEM is applied to a series of tasks with discrete and sparse rewards, and experiments show that it holds a clear performance advantage over other classical offline reinforcement learning methods.
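The abstract names the building blocks without giving their concrete form. The following is a minimal, non-authoritative sketch of how a DDPM-style diffusion policy and an expectile-regression value update with a penalty term are commonly wired together in PyTorch; it is not the authors' implementation, and the expectile level tau, the penalty weight lambda_pen, the 5-step noise schedule, and the network sizes are assumptions made purely for illustration.

```python
# Minimal sketch (NOT the paper's code) of two components described in the
# abstract: an expectile-regression value loss with an extra penalty term,
# and a DDPM-style diffusion policy trained to denoise dataset actions.
# tau, lambda_pen, the 5-step beta schedule, and the network widths are
# illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric squared loss: residuals above zero are weighted by tau,
    residuals below zero by (1 - tau)."""
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()


def value_loss(v_net, q_net, states, actions, lambda_pen: float = 0.1):
    """Expectile-regression value objective plus a hypothetical penalty that
    discourages V(s) from rising above the target Q(s, a)."""
    v = v_net(states).squeeze(-1)
    q = q_net(states, actions).squeeze(-1).detach()
    penalty = F.relu(v - q).pow(2).mean()   # illustrative over-estimation penalty
    return expectile_loss(q - v, tau=0.7) + lambda_pen * penalty


class DiffusionPolicy(nn.Module):
    """Tiny conditional denoiser: predicts the Gaussian noise that was added
    to an action, conditioned on the state and the diffusion step."""

    def __init__(self, state_dim: int, action_dim: int, T: int = 5, hidden: int = 256):
        super().__init__()
        self.T = T
        betas = torch.linspace(1e-4, 0.2, T)
        self.register_buffer("alpha_bars", torch.cumprod(1.0 - betas, dim=0))
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + 1, hidden), nn.Mish(),
            nn.Linear(hidden, hidden), nn.Mish(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, noisy_action, t):
        t_emb = t.float().unsqueeze(-1) / self.T          # crude step embedding
        return self.net(torch.cat([state, noisy_action, t_emb], dim=-1))

    def denoising_loss(self, state, action):
        """Standard DDPM objective: recover the injected noise, which keeps
        generated actions close to the behavior-data distribution."""
        t = torch.randint(0, self.T, (action.shape[0],), device=action.device)
        noise = torch.randn_like(action)
        ab = self.alpha_bars[t].unsqueeze(-1)
        noisy = ab.sqrt() * action + (1.0 - ab).sqrt() * noise
        return F.mse_loss(self(state, noisy, t), noise)


if __name__ == "__main__":
    # Smoke test with random data standing in for an offline batch.
    policy = DiffusionPolicy(state_dim=17, action_dim=6)
    s, a = torch.randn(32, 17), torch.randn(32, 6)
    print("diffusion-policy loss:", policy.denoising_loss(s, a).item())
```

A full DMEM-style agent would further couple the two parts, for example by weighting the policy's denoising objective with the learned value estimates in the expectation-maximization update described above; that coupling is omitted from this sketch.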

Cite this article:

Liu Q, Yan J, Wulan. Offline reinforcement learning method with diffusion model and expectation maximization. Ruan Jian Xue Bao/Journal of Software, (): 1–15 (in Chinese with English abstract).

Article history
  • Received: 2024-05-06
  • Revised: 2024-07-18
  • Published online: 2025-02-19