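Multimodal and Multi-granularity Graph Convolutional Networks for Elderly Daily Activity Recognition
Authors: Ding Jing (丁静), Shu Xiangbo (舒祥波), Huang Peng (黄捧), Yao Yazhou (姚亚洲), Song Yan (宋砚)
Affiliation: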

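Author biographies:

Ding Jing (1997-), female, master's student; research interest: video action recognition. Shu Xiangbo (1986-), male, Ph.D., professor, doctoral supervisor, CCF senior member; research interests: image and video content analysis, multimedia analysis, computer vision. Huang Peng (1996-), male, Ph.D. student; research interest: video action recognition. Yao Yazhou (1987-), male, Ph.D., professor, CCF professional member; research interests: multimedia technology, computer vision, machine learning. Song Yan (1983-), female, Ph.D., associate professor; research interests: multimedia content analysis, video content understanding, computer vision.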

Corresponding author:

Shu Xiangbo, shuxb@njust.edu.cn

Chinese Library Classification (CLC) number:

TP183

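Funding:

Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102001); National Natural Science Foundation of China (62072245, 61932020, 62102182, 61976116)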


    Abstract:

    As population aging becomes more serious, increasing attention is being paid to the safety of elderly people at home. Several research institutions in China and abroad are studying intelligent monitoring of the daily activities of the elderly through home cameras, including robot-view cameras, with the aim of providing early warning, alarms, and reports of dangerous behaviors. To promote the industrialization of these technologies, this work studies how to automatically recognize the daily activities of the elderly, such as “drinking water”, “washing hands”, “reading a book”, and “reading a newspaper”. An investigation of daily activity videos of the elderly shows that the semantics of these activities are markedly fine-grained. For example, “drinking water” and “taking medicine” are semantically very similar, and only a small number of key frames accurately reflect their category semantics. To address this problem, this work proposes a new multimodal and multi-granularity graph convolutional network (MM-GCN). MM-GCN applies graph convolutional networks to two modalities, modeling skeleton joints (“point”) and bones (“line”) in one and key frames (“frame”) and video proposals (“segment”) in the other, and thereby captures semantics at the four granularities of “point-line-frame-segment”. Experiments are conducted on ETRI-Activity3D (110,000+ video clips, 50+ activity classes), currently the largest dataset of daily activities of the elderly. Compared with state-of-the-art methods, the proposed MM-GCN achieves the highest recognition accuracy. In addition, to verify the robustness of MM-GCN on general human action recognition, experiments are also carried out on the NTU RGB+D benchmark, and the results show that MM-GCN is comparable to state-of-the-art methods.
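As a rough illustration of the “point” and “line” granularities mentioned in the abstract, the sketch below runs one generic graph convolution layer over skeleton joints and derives bone vectors from joint pairs. It is not the authors' MM-GCN implementation: the five-joint toy skeleton, the identity weight matrix, and every function name here are assumptions made for illustration, and the layer uses the common symmetric normalization D^{-1/2}(A + I)D^{-1/2}, which the abstract does not specify.

```python
import math

# Hypothetical toy skeleton: (child, parent) joint index pairs, joint 0 is the root.
BONES = [(1, 0), (2, 1), (3, 1), (4, 1)]
NUM_JOINTS = 5


def normalized_adjacency(num_nodes, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected graph with self-loops."""
    a = [[1.0 if i == j else 0.0 for j in range(num_nodes)] for i in range(num_nodes)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def graph_conv(a_hat, features, weight):
    """One graph convolution layer: ReLU(A_hat @ X @ W)."""
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def bone_vectors(joints, bones):
    """'Line' features: each bone is the child joint minus its parent joint."""
    return [[c - p for c, p in zip(joints[child], joints[parent])]
            for child, parent in bones]


# Assumed 3D joint coordinates for a single frame.
joints = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 2.0, 0.0],
          [-1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

a_hat = normalized_adjacency(NUM_JOINTS, BONES)
point_features = graph_conv(a_hat, joints, identity)  # joint-level ("point") features
line_features = bone_vectors(joints, BONES)           # bone-level ("line") features
```

In a trained model the identity matrix would be replaced by learned weights, and similar graph layers could in principle be built over key frames and video proposals; how MM-GCN actually constructs and fuses those graphs is described in the full paper, not in this sketch.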

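Cite this article

Ding Jing, Shu Xiangbo, Huang Peng, Yao Yazhou, Song Yan. Multimodal and Multi-granularity Graph Convolutional Networks for Elderly Daily Activity Recognition. Journal of Software (软件学报), 2023, 34(5): 2350-2364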

History
  • Received: 2021-04-02
  • Last revised: 2021-06-06
  • Published online: 2022-09-30