Group-wise Contrastive Learning Based Sequence-aware Skill Discovery

DOI: 10.13328/j.cnki.jos.007184

Author: YANG Shangdong, YU Miaoying, CHEN Xingguo, CHEN Lei
CLC Number: TP18


Abstract:

Reinforcement learning has achieved remarkable results in decision-making tasks such as intelligent dialogue systems, yet its efficiency drops markedly in tasks with complex structure and sparse rewards. Researchers have therefore integrated skill discovery frameworks into reinforcement learning, learning skill policies by maximizing the differences between skills in order to improve agent performance on such tasks. However, because the sampled trajectory data have limited diversity, existing skill discovery methods can learn only a single skill per reinforcement learning episode. As a result, they perform poorly on complex tasks that require combining several skills in sequence within one episode. To address this challenge, this paper proposes a group-wise contrastive learning based sequence-aware skill discovery method (GCSSD), which integrates contrastive learning into the skill discovery framework. First, to increase the diversity of trajectory data, the complete trajectories collected through interaction with the environment are segmented and grouped, and a contrastive loss is used to learn skill embedding representations from the grouped trajectory segments. Second, skill policies are trained by combining these skill embedding representations with reinforcement learning. Finally, to improve performance on tasks that involve different sequential combinations of skills, sampled trajectories are segmented into skill representations that are fed into the learned policy network, so that the learned skill policies can be combined sequentially. Experimental results show that GCSSD is effective on tasks with sparse rewards and sequential skill combinations, and that it can use its learned skills to adapt quickly to tasks with different sequential skill combinations.
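The first step, learning skill embeddings by contrasting grouped trajectory segments, can be illustrated with a small sketch. The abstract does not specify the segment length, how groups are formed, the encoder, or the exact form of the contrastive loss, so everything here is an assumption and not the GCSSD implementation: `segment_trajectory`, `embed_segment`, and `group_contrastive_loss` are hypothetical names; trajectories are cut into fixed-length, non-overlapping windows; the encoder is a mean-pool plus L2-normalization stand-in for a learned network; and the loss is a supervised-contrastive (SupCon-style) InfoNCE variant in which segments sharing a group id are positives and every other segment in the batch is a negative.

```python
import math
from typing import List, Sequence

Vector = List[float]


def segment_trajectory(states: Sequence[Vector], seg_len: int) -> List[List[Vector]]:
    """Split one trajectory into consecutive, non-overlapping segments of seg_len states.

    Assumption: a trailing remainder shorter than seg_len is dropped.
    """
    if seg_len <= 0:
        raise ValueError("seg_len must be positive")
    n_full = len(states) // seg_len
    return [list(states[k * seg_len:(k + 1) * seg_len]) for k in range(n_full)]


def embed_segment(segment: Sequence[Vector]) -> Vector:
    """Hypothetical encoder: mean-pool the states of a segment, then L2-normalize.

    A real system would use a trained neural network here.
    """
    dim = len(segment[0])
    mean = [sum(s[d] for s in segment) / len(segment) for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]


def group_contrastive_loss(embeddings: Sequence[Vector], groups: Sequence[int],
                           temperature: float = 0.1) -> float:
    """SupCon-style loss: same-group segments are positives, all others are negatives.

    For each anchor i with at least one positive, the loss averages
    -log(exp(s_ip / t) / sum_{a != i} exp(s_ia / t)) over its positives p,
    where s is the dot product of normalized embeddings.
    """
    n = len(embeddings)
    sims = [[sum(a * b for a, b in zip(embeddings[i], embeddings[j])) / temperature
             for j in range(n)] for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and groups[p] == groups[i]]
        if not positives:
            continue  # anchors without a positive contribute nothing
        others = [sims[i][a] for a in range(n) if a != i]
        m = max(others)  # log-sum-exp trick for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += -sum(sims[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy trajectory: two hypothetical "skills" (states along x, then along y),
    # plus one trailing state that is dropped by segmentation.
    traj = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4 + [[5.0, 5.0]]
    embs = [embed_segment(s) for s in segment_trajectory(traj, seg_len=2)]
    print("aligned groups:", group_contrastive_loss(embs, [0, 0, 1, 1]))
    print("mixed groups:  ", group_contrastive_loss(embs, [0, 1, 0, 1]))
```

With these toy inputs the loss is close to zero when the group ids match the underlying skills and roughly 10 when they are mixed; a learned encoder would be trained to drive the loss down, pulling same-group segments together and pushing different groups apart.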


Citation:

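YANG Shangdong, YU Miaoying, CHEN Xingguo, CHEN Lei. Group-wise contrastive learning based sequence-aware skill discovery (基于分组对比学习的序贯感知技能发现). Journal of Software (软件学报), 1-15.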



History

Received: September 20, 2023
Revised: December 25, 2023
Online: November 20, 2024

Copyright: Institute of Software, Chinese Academy of Sciences