Fast Surveillance Video Retrieval Model Based on Tolerant Training and Privacy Protection

doi:10.13328/j.cnki.jos.006790


Published in: Journal of Software (软件学报), 2023, Vol. 34, No. 3, pp. 1292-1309.


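Authors:
QIN Hao, School of Cyber Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China; Ministry of Education Key Laboratory for Intelligent Networks and Network Security (Xi'an Jiaotong University), Xi'an 710049, Shaanxi, China
WANG Ping-Hui, School of Cyber Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China; Ministry of Education Key Laboratory for Intelligent Networks and Network Security (Xi'an Jiaotong University), Xi'an 710049, Shaanxi, China
ZHANG Ruo-Fei, School of Cyber Science and Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China; Ministry of Education Key Laboratory for Intelligent Networks and Network Security (Xi'an Jiaotong University), Xi'an 710049, Shaanxi, China
QIN Zun-Ying, School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, China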
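Author biographies: QIN Hao (1998-), male, master's student; research interests: natural language processing, vision-language pre-trained models, model compression, and video retrieval. ZHANG Ruo-Fei (1974-), male, Ph.D., professor and doctoral supervisor; research interests: machine learning, data mining, natural language processing, and multimodal content representation and understanding. WANG Ping-Hui (1984-), male, Ph.D., professor and doctoral supervisor, CCF senior member; research interests: machine learning and data mining, natural language processing, and mobile Internet security. QIN Zun-Ying (1985-), female, senior engineer; research interests: machine learning and data mining.
Corresponding author: WANG Ping-Hui, phwang@mail.xjtu.edu.cn
Funding: National Natural Science Foundation of China (61902305, 61922067); Shenzhen Basic Research Grant (JCYJ20170816100819428); Ministry of Education-China Mobile "Artificial Intelligence" Project (MCM20190701)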

Abstract:

Keyframe retrieval and attribute search in surveillance video have many applications in traffic, security, education, and other fields. Applying deep learning models to massive video data reduces manual effort to some extent, but it suffers from privacy leakage, heavy consumption of computing resources, and long processing time. For these scenarios, this study proposes a secure and fast video retrieval model for large-scale surveillance video. Specifically, because the cloud has abundant computing power while surveillance cameras have little, a heavyweight model is deployed in the cloud, and the proposed tolerant training strategy is used to distill it into a customized lightweight model, which is then deployed inside the surveillance cameras. Meanwhile, a local encryption algorithm encrypts the sensitive parts of each image and, combined with cloud TEE (trusted execution environment) technology and a user authorization mechanism, achieves privacy protection with very low resource consumption. By reasonably controlling the "tolerance" of the distillation strategy, the time spent in the camera video input stage and the cloud retrieval stage can be well balanced, ensuring extremely low retrieval latency while maintaining very high accuracy. Compared with traditional retrieval methods, the proposed model is secure, efficient, scalable, and low-latency. Experimental results on multiple public datasets show that the proposed model is 9× to 133× faster than traditional retrieval methods.

Keywords: video retrieval; privacy protection; knowledge distillation; curriculum learning
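
The abstract does not spell out the tolerant training strategy, so it is not reproduced here. As background for the distillation step it builds on, the following is a minimal sketch of the standard soft-target knowledge distillation loss, in which a lightweight student is trained to match the temperature-softened outputs of a heavyweight teacher. The function names, the `temperature` and `alpha` parameters, and their default values are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence.

    `alpha` and `temperature` are hypothetical hyperparameters chosen for
    illustration; the paper's own settings are not given in the abstract.
    """
    # Hard-label term: cross-entropy of the student's prediction at T = 1.
    p_student = softmax(student_logits)
    hard = -math.log(p_student[label])

    # Soft-target term: KL(teacher || student) at temperature T.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft-term gradient scale comparable across T.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    teacher = [0.4, 2.2, 0.0]  # heavyweight (cloud) model logits, made up
    student = [0.5, 2.0, 0.1]  # lightweight (camera) model logits, made up
    print(distillation_loss(student, teacher, label=1))
```

A student whose logits track the teacher's yields a smaller loss than one that ranks the classes differently; in this generic formulation, `alpha` trades off fidelity to the ground-truth label against fidelity to the teacher.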

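Cite this article

QIN Hao, WANG Ping-Hui, ZHANG Ruo-Fei, QIN Zun-Ying. Fast surveillance video retrieval model based on tolerant training and privacy protection. Journal of Software (软件学报), 2023, 34(3): 1292-1309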


History

Received: 2022-05-15
Revised: 2022-09-07
Published online: 2022-10-26
Publication date: 2023-03-06
