Multimodal and Multi-granularity Graph Convolutional Networks for Elderly Daily Activity Recognition

CLC Number: TP183

    Abstract:

    As population aging becomes an increasingly serious problem, more attention is being paid to the safety of the elderly when they are home alone. To provide early warnings, alarms, and reports of dangerous behaviors, several domestic and foreign research institutions are studying the intelligent monitoring of the daily activities of the elderly from the robot's view. To promote the industrialization of these technologies, this work mainly studies how to automatically recognize the daily activities of the elderly, such as "drinking water", "washing hands", "reading a book", and "reading a newspaper". An investigation of daily activity videos of the elderly shows that the semantics of these activities are obviously fine-grained. For example, the semantics of "drinking water" and "taking medicine" are highly similar, and only a small number of video frames can accurately reflect their category semantics. To effectively address this problem in elderly behavior recognition, this work proposes a new multimodal multi-granularity graph convolutional network (MM-GCN), which applies graph convolutional networks on four modalities, i.e., the skeleton ("point"), bone ("line"), frame ("frame"), and proposal ("segment"), to model the activities of the elderly and capture the semantics under the four granularities of "point-line-frame-proposal". Finally, experiments are conducted to validate the activity recognition performance of the proposed method on ETRI-Activity3D (110000+ videos, 50+ classes), the largest daily activity dataset for the elderly. Compared with the state-of-the-art methods, the proposed MM-GCN achieves the highest recognition accuracy. In addition, to verify the robustness of MM-GCN on normal human action recognition tasks, experiments are also carried out on the benchmark NTU RGB+D, and the results show that MM-GCN is comparable to the SOTA methods.
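    The abstract describes running graph convolutions over several skeleton-derived modalities (joints as "points", bones as "lines") and combining the resulting streams. The paper's actual architecture is not reproduced here; the following is a minimal, hypothetical NumPy sketch of that multi-stream idea: two GCN streams over a toy 5-joint skeleton with random weights, fused by averaging class scores. All names, the topology, and the fusion rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetrically normalize A + I (the standard GCN propagation rule).
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(X, A_norm, W):
    # One graph-convolution layer: propagate features over the graph, project, ReLU.
    return np.maximum(A_norm @ X @ W, 0.0)

# Toy 5-joint chain skeleton (hypothetical topology, not the paper's joint graph).
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
n_joints, n_feats, n_classes = 5, 3, 4
A = np.zeros((n_joints, n_joints))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A_norm = normalize_adjacency(A)

rng = np.random.default_rng(0)
X_joint = rng.standard_normal((n_joints, n_feats))  # "point" stream: 3D joint coords
X_bone = np.zeros_like(X_joint)                     # "line" stream: bone vectors
for i, j in edges:
    X_bone[j] = X_joint[j] - X_joint[i]             # child joint minus parent joint

# Shared toy weights for both streams; real models would train these separately.
W1 = rng.standard_normal((n_feats, 8))
W2 = rng.standard_normal((8, n_classes))

def stream(X):
    h = gcn_layer(X, A_norm, W1)
    return (A_norm @ h @ W2).mean(axis=0)           # pool joints -> class scores

# Late fusion: average the per-stream class scores.
scores = (stream(X_joint) + stream(X_bone)) / 2.0
print(scores.shape)  # (4,)
```

    The frame- and proposal-level streams described in the abstract would follow the same pattern on graphs whose nodes are video frames or temporal segments rather than joints.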

Get Citation

Ding J, Shu XB, Huang P, Yao YZ, Song Y. Multimodal and multi-granularity graph convolutional networks for elderly daily activity recognition. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2350–2364 (in Chinese with English abstract).

History
  • Received: April 02, 2021
  • Revised: June 06, 2021
  • Online: September 30, 2022
  • Published: May 06, 2023