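Fast Surveillance Video Retrieval Model Based on Tolerant Training and Privacy Protection
Authors: Qin Hao (覃浩), Wang Pinghui (王平辉), Zhang Ruofei (张若非), Qin Zunying (覃遵颖)
About the authors:

Qin Hao (覃浩, 1998-), male, master's student; research interests: natural language processing, vision-language pre-trained models, model compression, and video retrieval. Zhang Ruofei (张若非, 1974-), male, Ph.D., professor, doctoral supervisor; research interests: machine learning, data mining, natural language processing, and multimodal content representation and understanding. Wang Pinghui (王平辉, 1984-), male, Ph.D., professor, doctoral supervisor, CCF senior member; research interests: machine learning and data mining, natural language processing, and mobile Internet security. Qin Zunying (覃遵颖, 1985-), female, senior engineer; research interests: machine learning and data mining.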

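Corresponding author:

Wang Pinghui (王平辉), phwang@mail.xjtu.edu.cn

Funding:

National Natural Science Foundation of China (61902305, 61922067); Shenzhen Basic Research Grant (JCYJ20170816100819428); Ministry of Education-China Mobile "Artificial Intelligence" Project (MCM20190701)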


Fast Surveillance Video Retrieval Model Based on Tolerant Training and Privacy Protection
    Abstract:

    Keyframe retrieval and attribute search over surveillance video have many applications in transportation, security, education, and other fields. Applying deep learning models to massive video data reduces manual effort to some extent, but it suffers from privacy leakage, heavy consumption of computing resources, and long processing time. To address these problems, this study proposes a secure and fast retrieval model for large-scale surveillance video. Specifically, because the cloud has abundant computing power while surveillance cameras have little, a heavyweight model is deployed in the cloud and distilled with the proposed tolerant training strategy into a customized lightweight model, which is then deployed inside the surveillance cameras. Meanwhile, a local encryption algorithm encrypts the sensitive parts of each image; combined with trusted execution environment (TEE) technology in the cloud and a user authorization mechanism, this achieves privacy protection with very low resource consumption. By properly controlling the "tolerance" of the distillation strategy, the time spent in the camera-side video input stage and the cloud-side retrieval stage can be balanced, ensuring extremely low retrieval latency while maintaining very high accuracy. Compared with traditional retrieval methods, the proposed model is secure, efficient, scalable, and low-latency. Experimental results on multiple public datasets show that it provides a 9× to 133× speedup over traditional retrieval methods.
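The abstract does not say how "tolerance" is implemented, so the following is only a minimal sketch under one assumed reading: the camera-side lightweight model indexes each frame under its K highest-scoring labels (K is the tolerance), and the cloud-side heavyweight model verifies only the candidate frames found in that index. The function names, the toy scores, and the `HEAVY_LABELS` table that stands in for the heavyweight model's output are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple


def build_index(frame_scores: List[Dict[str, float]], tolerance: int) -> Dict[str, Set[int]]:
    """Camera side (assumed): index each frame under its `tolerance` best labels."""
    index: Dict[str, Set[int]] = {}
    for frame_id, scores in enumerate(frame_scores):
        # Labels ordered by the lightweight model's confidence, highest first.
        top_labels = sorted(scores, key=scores.get, reverse=True)[:tolerance]
        for label in top_labels:
            index.setdefault(label, set()).add(frame_id)
    return index


def retrieve(query: str, index: Dict[str, Set[int]],
             heavy_labels: List[str]) -> Tuple[List[int], int]:
    """Cloud side (assumed): verify candidates with the heavyweight model.

    Returns the confirmed frame ids and the number of frames the
    heavyweight model had to inspect.
    """
    candidates = index.get(query, set())
    hits = sorted(f for f in candidates if heavy_labels[f] == query)
    return hits, len(candidates)


# Toy lightweight-model scores for four frames (hypothetical data).
FRAME_SCORES = [
    {"car": 0.70, "bus": 0.20, "person": 0.10},
    {"bus": 0.50, "car": 0.40, "person": 0.10},  # lightweight top-1 is wrong here
    {"person": 0.80, "car": 0.15, "bus": 0.05},
    {"bus": 0.60, "person": 0.30, "car": 0.10},
]
# Stand-in for the heavyweight model's prediction on each frame.
HEAVY_LABELS = ["car", "car", "person", "bus"]

for k in (1, 2):
    hits, checked = retrieve("car", build_index(FRAME_SCORES, k), HEAVY_LABELS)
    print(f"tolerance={k}: hits={hits}, frames verified in cloud={checked}")
```

In this toy setting, a tolerance of 1 verifies one frame but misses frame 1, while a tolerance of 2 recovers both car frames at the cost of verifying three. This is the kind of trade-off between camera-side indexing and cloud-side retrieval time that the abstract refers to.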

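Cite this article

Qin H, Wang PH, Zhang RF, Qin ZY. Fast surveillance video retrieval model based on tolerant training and privacy protection. Ruan Jian Xue Bao/Journal of Software, 2023, 34(3): 1292-1309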

History
  • Received: 2022-05-15
  • Last revised: 2022-09-07
  • Published online: 2022-10-26
  • Publication date: 2023-03-06