Temporal Structure Learning with Grenander Inference for Action Recognition

CLC number: TP181

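Fund projects:

National Key Research and Development Program of China (2017YFB1002203); National Natural Science Foundation of China (61503111); Natural Science Foundation of Anhui Province (1808085MF168); Fundamental Research Funds for the Central Universities (PA2020GDSK0059)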



    Abstract (Chinese version):

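    Existing action recognition methods that model the temporal structure of the whole video are disturbed by temporal noise and ambiguous information, which leads to misclassified action categories. To address this problem, a novel temporal graph model with Grenander inference (TGM-GI) is proposed. First, a 3D CNN-LSTM module is constructed, in which the 3D CNN extracts dynamic features of actions and the LSTM module optimizes the temporal dependencies among these features. Second, building on this deep module, a temporal graph model for action recognition is constructed using Grenander theory, and two modules are designed to handle, respectively, temporal redundancy caused by slow actions and interference caused by abnormal actions, yielding temporal structure proposals with suppressed temporal noise. Third, a Grenander measure that fuses feature constraints and semantic constraints is designed, and a temporally incremental Viterbi algorithm is proposed to correct ambiguous information in the temporal patterns of actions. Finally, a pattern-matching method based on dynamic time warping performs action recognition from these temporal patterns. Compared with a range of existing deep-learning-based action recognition methods on two widely used datasets, UCF101 and Olympic Sports, the proposed method achieves the highest recognition accuracy. It outperforms the baseline 3D CNN-LSTM method, improving recognition accuracy by 6.41% on UCF101 and by 5.67% on Olympic Sports.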
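    The incremental Viterbi step described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each clip carries a per-state log-score standing in for the feature constraint and that a fixed transition log-score matrix stands in for the semantic constraint; the actual Grenander measure, its generators, and the incremental scheme used by TGM-GI are not specified in this abstract. The class `IncrementalViterbi` and its methods are hypothetical names. The trellis is extended one clip at a time, so a best path is available after every new clip.

```python
from typing import List, Sequence


class IncrementalViterbi:
    """Viterbi decoding extended one clip at a time (illustrative sketch).

    Hypothetical scoring, not the paper's Grenander measure:
      unary[s]       -- "feature" log-score of state s for the current clip
      pairwise[p][s] -- "semantic" log-score for moving from state p to s
    A path's score is the sum of its unary and pairwise terms.
    """

    def __init__(self, pairwise: Sequence[Sequence[float]]) -> None:
        self.pairwise = pairwise
        self.n_states = len(pairwise)
        self.scores: List[float] = []        # best path score ending in each state
        self.backptrs: List[List[int]] = []  # per clip: best predecessor of each state

    def step(self, unary: Sequence[float]) -> None:
        """Add one clip to the trellis using only the previous column of scores."""
        if not self.scores:
            self.scores = list(unary)
            self.backptrs.append([-1] * self.n_states)  # first clip has no predecessor
            return
        new_scores: List[float] = []
        ptrs: List[int] = []
        for s in range(self.n_states):
            # Best predecessor maximizes previous score + transition score.
            best_p = max(range(self.n_states),
                         key=lambda p: self.scores[p] + self.pairwise[p][s])
            new_scores.append(self.scores[best_p] + self.pairwise[best_p][s] + unary[s])
            ptrs.append(best_p)
        self.scores = new_scores
        self.backptrs.append(ptrs)

    def best_path(self) -> List[int]:
        """Backtrack the highest-scoring state sequence over all clips seen so far."""
        state = max(range(self.n_states), key=lambda s: self.scores[s])
        path = [state]
        for ptrs in reversed(self.backptrs[1:]):
            state = ptrs[state]
            path.append(state)
        return path[::-1]


# Usage: two hypothetical states; staying costs 0.0, switching costs -1.0.
dec = IncrementalViterbi([[0.0, -1.0], [-1.0, 0.0]])
for unary in ([0.0, -2.0], [0.0, -2.0], [0.0, -2.0], [-2.0, 0.0], [-2.0, 0.0]):
    dec.step(unary)
print(dec.best_path())  # -> [0, 0, 0, 1, 1]
```

    Because `step` only reads the previous column of scores, the decoder can be updated as each clip arrives. With the switching penalty of -1.0 in the usage example, the decoded path changes state exactly once, at the clip where the per-clip scores change.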

    Abstract:

    Action recognition is a crucial and challenging task in computer vision. Most existing methods model the temporal structure of the whole video but ignore its temporal noise and ambiguous features, which leads to recognition errors. To address this problem, a novel temporal graph model with Grenander inference (TGM-GI) is proposed. First, a 3D CNN-LSTM module is constructed to learn deep features: the 3D CNN extracts dynamic features from video clips, and the LSTM optimizes the temporal dependencies between clip features. Second, a temporal graph model is constructed from these deep features using the generator space of Grenander theory. The original temporal pattern is modified by two operations: a combination operation that removes redundant clips, such as those caused by slow motion, and a denoising operation that removes low-frequency clips, such as those caused by abnormal motion. Third, an incremental Viterbi algorithm is proposed for temporal pattern learning with Grenander inference, using a Grenander measure that combines a feature bond and a semantic bond. Finally, dynamic time warping is used to match the Grenander temporal pattern of a test video against those of the training set, and the label of the test video is predicted accordingly. Experimental results show that TGM-GI outperforms state-of-the-art methods on two widely used datasets, UCF101 and Olympic Sports. TGM-GI also outperforms the 3D CNN-LSTM baseline, improving accuracy by 6.41% on UCF101 and by 5.67% on Olympic Sports.
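    The combination, denoising, and matching steps can be sketched as follows, assuming a video has already been reduced to a sequence of integer state labels, one per clip. The helpers `combine`, `denoise`, `dtw`, and `predict`, the run-length threshold `min_len`, and the 0/1 mismatch cost are hypothetical choices for illustration and are not taken from the paper: runs of repeated labels stand in for slow-motion redundancy, short runs stand in for low-frequency abnormal clips, and classic dynamic time warping with a 1-nearest-neighbour rule stands in for the matching stage.

```python
from typing import List, Sequence, Tuple


def combine(pattern: Sequence[int]) -> List[Tuple[int, int]]:
    """Hypothetical 'combination' step: merge runs of identical consecutive
    labels (e.g. clips repeated by slow motion) into (label, run_length) pairs."""
    runs: List[Tuple[int, int]] = []
    for label in pattern:
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1] + 1)
        else:
            runs.append((label, 1))
    return runs


def denoise(runs: Sequence[Tuple[int, int]], min_len: int = 2) -> List[int]:
    """Hypothetical 'denoise' step: drop runs shorter than min_len (treated as
    rare/abnormal clips), then re-merge neighbours that have become equal."""
    out: List[int] = []
    for label, length in runs:
        if length >= min_len and (not out or out[-1] != label):
            out.append(label)
    return out


def dtw(a: Sequence[int], b: Sequence[int]) -> float:
    """Classic dynamic time warping distance with a 0/1 label-mismatch cost."""
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[len(a)][len(b)]


def predict(test: Sequence[int], train: Sequence[Tuple[Sequence[int], str]]) -> str:
    """Return the label of the training pattern closest to `test` under DTW."""
    return min(train, key=lambda item: dtw(test, item[0]))[1]


# Usage with made-up patterns and class names (not data from the paper).
train = [([3, 7, 5], "class_a"), ([3, 9], "class_b")]
test_pattern = denoise(combine([3, 3, 3, 8, 7, 7, 5, 5]))
print(test_pattern)                  # -> [3, 7, 5]
print(predict(test_pattern, train))  # -> class_a
```

    In the usage example the single clip labelled 8 is discarded as noise, the repeated clips collapse to one label each, and the cleaned pattern matches `class_a` with DTW distance 0 versus 2 for `class_b`. DTW is used here because it aligns patterns of different lengths without requiring equal numbers of clips.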

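Cite this article

Wu Kewei, Gao Tao, Xie Zhao, Guo Wenbin. Temporal Structure Learning with Grenander Inference for Action Recognition. Journal of Software (Ruan Jian Xue Bao), 2022, 33(5): 1865-1879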

History
  • Received: 2020-05-08
  • Last revised: 2020-06-27
  • Published online: 2022-05-09
  • Publication date: 2022-05-06
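Copyright: Institute of Software, Chinese Academy of Sciences
Address: No. 4 South Fourth Street, Zhongguancun, Haidian District, Beijing; postal code: 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn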