Long-tailed Temporal Action Detection Based on Semi-supervised Learning

Authors: Wang Yuhong, Wu Gangshan, Wang Limin

CLC number: TP18

Funding: Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2022ZD0160900); National Natural Science Foundation of China (62076119, 61921006)

    Abstract:

    The label distribution of real-world data often exhibits a long-tail effect, in which a small number of categories account for the vast majority of samples, and temporal action detection is no exception. Existing temporal action detection methods tend to model the head categories with abundant samples thoroughly while neglecting the tail categories with few samples. This study gives a systematic definition of the long-tailed temporal action detection problem and proposes a weighted class-rebalancing self-training method (WCReST) based on a semi-supervised learning framework. WCReST makes full use of the large-scale unlabeled data that exists in the real world to rebalance the label distribution of the training samples, thereby improving the model's fit on tail categories. In addition, a pseudo-label loss weighting method is proposed for the temporal action detection task to stabilize model training. Experiments are conducted on the THUMOS14 and HACS Segments datasets, with video samples drawn from the THUMOS15 and ActivityNet1.3 datasets to form the corresponding unlabeled datasets. Furthermore, to meet the application requirements of video review, the Dance dataset is collected; it contains 35 action categories, 6632 labeled videos, and 13264 unlabeled videos, and preserves the pronounced long-tail effect of the data distribution. A variety of baseline models are used in experiments on the THUMOS14, HACS Segments, and Dance datasets. The results demonstrate that the proposed WCReST improves detection performance on tail action categories and can be applied to different baseline temporal action detection models to boost their performance.
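
    The abstract describes WCReST only at a high level. To make the class-rebalancing self-training idea concrete, below is a minimal sketch in the style of CReST (Wei et al., CVPR 2021), the framework the method's name points to; the exact per-class weighting used in WCReST may differ. The function name and the exponent alpha are illustrative assumptions, not the paper's notation.

    ```python
    import numpy as np

    def class_rebalanced_sampling_rates(class_counts, alpha=3.0):
        """Per-class inclusion rates for pseudo-labeled samples (CReST-style sketch).

        With classes sorted by labeled-sample count N_1 >= N_2 >= ... >= N_K,
        the class at rank k keeps a fraction (N_{K-k+1} / N_1) ** alpha of its
        pseudo-labeled samples, so tail classes absorb proportionally more
        unlabeled data in each self-training generation.
        """
        counts = np.asarray(class_counts, dtype=float)
        order = np.argsort(-counts)        # class indices from most to least frequent
        sorted_counts = counts[order]      # N_1 >= N_2 >= ... >= N_K
        K = len(counts)
        rates = np.empty_like(counts)
        for rank, cls in enumerate(order):  # rank 0 = most frequent class
            rates[cls] = (sorted_counts[K - 1 - rank] / sorted_counts[0]) ** alpha
        return rates

    # Example: the head class keeps almost no pseudo-labels, the tail class keeps all.
    print(class_rebalanced_sampling_rates([1000, 100, 10]))  # -> [1e-06, 0.001, 1.0]
    ```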
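
    The abstract also mentions a pseudo-label loss weighting method that stabilizes training. The sketch below shows one plausible form, confidence-based masking and weighting of the per-proposal classification loss; the function name, the threshold tau, and the exact weighting rule are assumptions for illustration, not the paper's definition.

    ```python
    import torch
    import torch.nn.functional as F

    def weighted_pseudo_label_loss(logits, pseudo_labels, confidences, tau=0.7):
        """Confidence-weighted classification loss on pseudo-labeled proposals.

        Pseudo-labels below the confidence threshold tau are masked out, and
        the remaining ones are weighted by their confidence, so noisy
        pseudo-labels perturb training less. (Illustrative sketch only.)
        """
        per_sample = F.cross_entropy(logits, pseudo_labels, reduction="none")
        weights = torch.where(confidences >= tau, confidences,
                              torch.zeros_like(confidences))
        return (weights * per_sample).sum() / weights.sum().clamp(min=1e-6)

    # Example: 4 pseudo-labeled proposals over 3 action classes.
    logits = torch.randn(4, 3)
    labels = torch.tensor([0, 2, 1, 2])
    conf = torch.tensor([0.95, 0.40, 0.80, 0.99])  # the 0.40 proposal is masked out
    loss = weighted_pseudo_label_loss(logits, labels, conf)
    ```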

Cite this article:

Wang YH, Wu GS, Wang LM. Long-tailed temporal action detection based on semi-supervised learning. Ruan Jian Xue Bao/Journal of Software, 2025, 36(2): 625–643 (in Chinese).

History
  • Received: 2023-08-11
  • Revised: 2023-09-07
  • Published online: 2024-07-17