TV Logo Detection and Recognition Based on Data Synthesis and Metric Learning

DOI: 10.13328/j.cnki.jos.006619

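Authors:
ZHANG Guang-Peng, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
ZHANG Dong-Ming, National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
ZHANG Jing, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
WANG Chuan-Ning, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
WANG Li-Dong, Beijing Radio & Television Station, Beijing 100022, China
ZOU Xue-Qiang, National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China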

                    
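Author biographies: ZHANG Guang-Peng (b. 1997), male, M.S., research interest: artificial intelligence system design and integration; ZHANG Dong-Ming (b. 1977), male, Ph.D., research professor, doctoral supervisor, CCF professional member, research interests: video coding and multimedia content retrieval; ZHANG Jing (b. 1975), female, Ph.D., professor, doctoral supervisor, CCF professional member, research interest: multimedia content analysis and processing; WANG Chuan-Ning (b. 1997), male, M.S., research interest: artificial intelligence system design and application; WANG Li-Dong (b. 1967), female, professor-level senior engineer, research interests: radio and television engineering, video and audio signal processing, and media networks; ZOU Xue-Qiang (b. 1978), male, Ph.D., senior engineer, research interest: network security.
Corresponding author: ZHANG Dong-Ming, E-mail: zhdm@cert.org.cn
CLC number: TP391
Funding: National Key Research and Development Program of China (2018YFB080402); National Natural Science Foundation of China (61672495, 61971016); Beijing Natural Science Foundation and Beijing Municipal Education Commission Joint Project (KZ201910005007)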




Abstract:

A TV logo carries important semantic information about a video. However, TV logo detection and recognition face many challenges, including numerous categories, complex structures, small regions, low information content, and severe background interference. To improve the generalization ability of the detection model, this study proposes constructing a training dataset by synthesizing TV logo data, that is, by superimposing TV logo images on background images. Further, a two-stage scalable logo detection and recognition (SLDR) method is proposed, which uses batch-hard metric learning to rapidly train a matching model that determines the category of each TV logo. Because SLDR separates detection from recognition, its detection targets can be extended to unknown categories. The experimental results show that synthetic data effectively improves the generalization ability and detection precision of the models, and that SLDR achieves precision comparable to that of an end-to-end model without updating the detection model.
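
The data-synthesis idea above (superimposing TV logo images on background images) can be illustrated as alpha compositing. This is a minimal sketch under stated assumptions, not the authors' pipeline: the scale range, the opacity range, the uniform placement, nearest-neighbour resizing, and the helper names `resize_nearest` and `synthesize_logo_sample` are all hypothetical choices. The returned box is what would make each synthetic image usable as a labelled detection sample.

```python
import numpy as np


def resize_nearest(img: np.ndarray, scale: float) -> np.ndarray:
    """Nearest-neighbour resize of an H x W x C array by a scale factor."""
    h, w = img.shape[:2]
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.arange(new_h) * h // new_h  # source row for each output row
    cols = np.arange(new_w) * w // new_w  # source column for each output column
    return img[rows][:, cols]


def synthesize_logo_sample(background, logo_rgba, rng,
                           scale_range=(0.5, 1.5), opacity_range=(0.6, 1.0)):
    """Superimpose an RGBA logo on an RGB background (hypothetical helper).

    Returns the composited uint8 image and the logo box (x1, y1, x2, y2).
    """
    logo = resize_nearest(logo_rgba, rng.uniform(*scale_range))
    lh, lw = logo.shape[:2]
    bh, bw = background.shape[:2]
    if lh > bh or lw > bw:
        raise ValueError("scaled logo does not fit in the background")
    # Uniform random position (assumption; broadcast logos tend to sit near corners).
    y1 = int(rng.integers(0, bh - lh + 1))
    x1 = int(rng.integers(0, bw - lw + 1))
    # Per-pixel alpha scaled by a random global opacity mimics semi-transparent logos.
    alpha = logo[..., 3:4].astype(np.float64) / 255.0 * rng.uniform(*opacity_range)
    out = background.astype(np.float64)
    region = out[y1:y1 + lh, x1:x1 + lw]
    out[y1:y1 + lh, x1:x1 + lw] = alpha * logo[..., :3] + (1.0 - alpha) * region
    return np.clip(out, 0, 255).round().astype(np.uint8), (x1, y1, x1 + lw, y1 + lh)
```

In use, one would call `synthesize_logo_sample(frame, logo, np.random.default_rng(seed))` repeatedly over many logo-free background frames and logo images; the random scale, position, and opacity are what add variety to the training set.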

Key words: data synthesis; metric learning; scalable; TV logo detection and recognition
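
The recognition stage described in the abstract trains a matching model with batch-hard metric learning and then assigns categories by matching, which is why new logo classes can be added without retraining the detector. The sketch below shows, on precomputed embeddings, the batch-hard triplet loss of Hermans, Beyer and Leibe (2017), in which each anchor is paired with its farthest positive and nearest negative in the batch, together with a nearest-neighbour gallery lookup. It is an illustration under assumptions, not the SLDR implementation: the margin of 0.2, the Euclidean distance, the rejection threshold for unknown logos, and the helper names are hypothetical, and in practice the loss would be computed inside a deep-learning framework so that it can be back-propagated.

```python
import numpy as np


def pairwise_distances(emb: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix for an N x D batch of embeddings."""
    sq = np.sum(emb ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * emb @ emb.T
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives caused by rounding


def batch_hard_triplet_loss(emb, labels, margin=0.2):
    """For every anchor, use the farthest same-class sample (hardest positive)
    and the closest other-class sample (hardest negative) within the batch."""
    dist = pairwise_distances(emb)
    same = labels[:, None] == labels[None, :]
    # The zero self-distance is among the positives; it never wins the max
    # when each class has at least two samples in the batch (P x K sampling).
    hardest_pos = np.where(same, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(same, np.inf, dist).min(axis=1)
    return float(np.mean(np.maximum(hardest_pos - hardest_neg + margin, 0.0)))


def match_to_gallery(query, gallery, gallery_labels, threshold):
    """Label each query embedding with its closest gallery entry, or None
    when no entry lies within `threshold` (treated as an unknown logo)."""
    d = np.linalg.norm(query[:, None, :] - gallery[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    return [gallery_labels[j] if d[i, j] <= threshold else None
            for i, j in enumerate(nearest)]
```

With this setup, supporting a new channel amounts to appending one reference embedding and its label to `gallery` and `gallery_labels`; the detector and the embedding network stay unchanged.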


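Cite this article:

Zhang GP, Zhang DM, Zhang J, Wang CN, Wang LD, Zou XQ. TV logo detection and recognition based on data synthesis and metric learning. Journal of Software, 2022, 33(9): 3180-3194.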


History:
Received: 2021-06-23
Last revised: 2021-08-15
Published online: 2022-02-22
Published: 2022-09-06
