Survey of Novel Video Analysis Systems Based on Deep Learning

Special topic: New Technologies for Intelligent Information Systems

Corresponding author:

Xu Chen (徐辰), cxu@dase.ecnu.edu.cn

Funding:

National Natural Science Foundation of China (61902128); Shanghai Sailing Program (19YF1414200)

    Abstract:

    The proliferation of camera devices in daily life has led to rapid growth in video data, which contains rich information. Early on, researchers built video analysis systems on traditional computer vision techniques to extract and analyze video data. In recent years, deep learning has made breakthroughs in areas such as face recognition, and novel video analysis systems based on deep learning have emerged. This paper surveys the research progress of novel video analysis systems from the perspectives of applications, technologies, and systems. First, the development history of video analysis systems is reviewed, and the differences between novel and traditional video analysis systems are pointed out. Second, the challenges that novel video analysis systems face in computation and storage are analyzed, and the factors that influence these systems are discussed in terms of the organization and distribution of video data and the application requirements of video analysis. Third, novel video analysis systems are classified into two categories, those optimized for computation and those optimized for storage; typical representatives of each are selected and their core design ideas are introduced. Finally, the novel video analysis systems are compared and analyzed along multiple dimensions, their current problems are pointed out, and future research and development directions are outlined accordingly.

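Cite this article

Meng LR, Ding GY, Xu C, Qian WN, Zhou AY. Survey of novel video analysis systems based on deep learning. Ruan Jian Xue Bao/Journal of Software, 2022, 33(10): 3635-3655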

History
  • Received: 2021-07-20
  • Last revised: 2021-08-30
  • Published online: 2022-02-22
  • Publication date: 2022-10-06