Abstract: Reinforcement learning (RL) addresses long-term reward maximization by learning correct short-term decisions from previous experience. Reward shaping, which replaces the actual environmental reward with a simpler and easier-to-learn reward function, has been shown to be an effective way to guide and accelerate reinforcement learning. However, building a shaping reward requires either domain knowledge or demonstrations from an optimal policy, both of which involve costly human expertise. This work investigates whether a better shaping reward can be learned automatically alongside the RL process. RL algorithms sample many trajectories throughout the learning process. These passive samples, though containing many failed attempts, may still provide useful information for building a shaping reward function. A policy-invariance condition for reward shaping is introduced to handle noisy examples more effectively, followed by the RFPotential approach, which learns a shaping reward efficiently from massive examples. Empirical studies on various RL algorithms and domains show that RFPotential can accelerate the RL process.
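As background for the policy-invariance condition mentioned above, the classical potential-based shaping result can be sketched in standard MDP notation (the symbols below are the usual textbook ones, not taken from this abstract, and this is not the paper's specific condition):

$$ \tilde{R}(s, a, s') \;=\; R(s, a, s') \;+\; \gamma\,\Phi(s') \;-\; \Phi(s), $$

where $R$ is the environmental reward, $\Phi$ is a potential function over states, and $\gamma$ is the discount factor. Because the added term telescopes along any trajectory, shaping of this form leaves the optimal policy unchanged, so even an imperfect $\Phi$, for instance one estimated from noisy sampled trajectories, cannot change which policy is optimal; how this work's condition and RFPotential extend or depart from this classical form is not specified in the abstract itself.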