Classification-based Retrieval Procedure Planner

CLC number: TP18

Fund project: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0160900)

    Abstract:

    This study addresses procedure planning in instructional videos: given a start and an end visual observation, the task is to plan, within the action space provided by the instructional videos, an action sequence that transforms the start state into the end state. Instructional videos record and demonstrate how various events are carried out. Each event involves a specific set of actions, which forms the action space of that event, and the action spaces of all events together make up the overall action space of the instructional videos. Previous methods fail to exploit the semantic information of events and rely heavily on complex training techniques such as reinforcement learning, which complicates algorithm design and yields poorly interpretable models. To address these problems, this study takes the characteristics of instructional videos into account and proposes the classification-based retrieval procedure planner (CPP), a pipeline that solves procedure planning in stages, from coarse to fine. Specifically, the planner first identifies the event category from the given visual observations, restricting the action space to a much smaller subspace and thereby significantly reducing planning complexity; it then plans the action sequence within that subspace. In addition, a hybrid planning strategy that combines retrieval and prediction of action sequences further improves planning performance. Experiments on three instructional video datasets of different scales show that the method achieves competitive results, providing a simple yet effective baseline for procedure planning.
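
The coarse-to-fine pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the data, the feature tuples standing in for visual observations, the nearest-neighbour classifier, the per-step majority predictor, and the `max_dist` rule for choosing between retrieval and prediction are all assumptions made here for illustration. The abstract does not state how CPP combines the two.

```python
from collections import Counter

# Hypothetical toy data: (event, start_state, end_state, action_sequence).
# States are small feature tuples; the paper works with visual observations.
TRAIN = [
    ("make_coffee", (1.0, 0.0), (1.0, 1.0), ["add_coffee", "pour_water", "stir"]),
    ("make_coffee", (0.9, 0.1), (1.0, 0.9), ["add_coffee", "pour_water", "stir"]),
    ("change_tire", (0.0, 1.0), (0.1, 0.0), ["jack_up", "remove_wheel", "mount_wheel"]),
    ("change_tire", (0.1, 0.9), (0.0, 0.1), ["loosen_nuts", "jack_up", "remove_wheel"]),
    ("change_tire", (0.0, 0.9), (0.1, 0.1), ["jack_up", "remove_wheel", "mount_wheel"]),
]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_event(start, end):
    """Stage 1 (assumed stand-in): nearest-neighbour event classification."""
    query = start + end  # concatenate start and end features
    nearest = min(TRAIN, key=lambda ex: sq_dist(query, ex[1] + ex[2]))
    return nearest[0]


def retrieve_plan(event, start, end, horizon):
    """Closest stored plan of the given length inside the event's subspace."""
    query = start + end
    candidates = [
        (sq_dist(query, s + e), acts)
        for ev, s, e, acts in TRAIN
        if ev == event and len(acts) == horizon
    ]
    return min(candidates, key=lambda c: c[0]) if candidates else None


def predict_plan(event, horizon):
    """Assumed stand-in predictor: most frequent action at each step."""
    plans = [acts for ev, _, _, acts in TRAIN if ev == event and len(acts) >= horizon]
    if not plans:
        return []
    return [Counter(p[t] for p in plans).most_common(1)[0][0] for t in range(horizon)]


def plan(start, end, horizon, max_dist=0.5):
    """Stage 2: hybrid planning restricted to the classified event's actions.

    Assumption: use the retrieved plan when it is close enough,
    otherwise fall back to the predicted plan.
    """
    event = classify_event(start, end)
    hit = retrieve_plan(event, start, end, horizon)
    if hit is not None and hit[0] <= max_dist:
        return event, hit[1]
    return event, predict_plan(event, horizon)


if __name__ == "__main__":
    print(plan((1.0, 0.05), (1.0, 0.95), horizon=3))
```

Because the event is fixed before any action is chosen, both the retrieval and the prediction step only ever see actions from that event's subspace, which is the source of the reduced planning complexity the abstract refers to.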

Cite this article

Wu Yilu, Wang Hanlin, Wang Limin. Classification-based Retrieval Procedure Planner. Journal of Software,,():1-18

History
  • Received: 2025-01-07
  • Last revised: 2025-02-10
  • Published online: 2025-11-05