TV Logo Detection and Recognition Based on Data Synthesis and Metric Learning
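Authors: Zhang Guangpeng (张广朋), Zhang Dongming (张冬明), Zhang Jing (张菁), Wang Chuanning (王川宁), Wang Lidong (王立冬), Zou Xueqiang (邹学强)
Author biographies:

  • Zhang Guangpeng (1997-), male, M.S.; research interests: artificial intelligence system design and integration.
  • Zhang Dongming (1977-), male, Ph.D., research professor, doctoral supervisor, CCF professional member; research interests: video coding and multimedia content retrieval.
  • Zhang Jing (1975-), female, Ph.D., professor, doctoral supervisor, CCF professional member; research interests: multimedia content analysis and processing.
  • Wang Chuanning (1997-), male, M.S.; research interests: artificial intelligence system design and application.
  • Wang Lidong (1967-), female, professor-level senior engineer; research interests: broadcast and television engineering, video and audio signal processing, and media networks.
  • Zou Xueqiang (1978-), male, Ph.D., senior engineer; research interests: network security.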

Corresponding author:

Zhang Dongming, E-mail: zhdm@cert.org.cn

CLC number:

TP391

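Funding:

National Key Research and Development Program of China (2018YFB080402); National Natural Science Foundation of China (61672495, 61971016); Joint Funding Project of the Beijing Natural Science Foundation and the Beijing Municipal Education Commission (KZ201910005007)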


    Abstract:

    A TV logo carries important semantic information about a video, but detecting and recognizing logos is difficult: there are many categories, the structures are complex, the regions are small, the information content is low, and background interference is strong. To improve the generalization ability of the detection model, this study proposes synthesizing TV logo data by superimposing logo images on background images and using the result to build a training dataset. It further proposes a two-stage scalable logo detection and recognition (SLDR) method, which uses batch-hard metric learning to rapidly train a matching model that determines the category of each TV logo. Because SLDR separates detection from recognition, its detection targets can be extended to unknown categories. Experimental results show that synthetic data effectively improves the generalization ability and detection precision of the models, and that SLDR achieves precision comparable to that of an end-to-end model without updating the detection model.
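
    The batch-hard metric learning mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only computes the forward value of a batch-hard triplet loss in plain Python, and the function name, the Euclidean distance, and the margin of 0.2 are assumptions chosen for illustration. For each anchor in a mini-batch, the loss takes the farthest sample of the same class (hardest positive) and the nearest sample of a different class (hardest negative).

```python
import math


def batch_hard_triplet_loss(embeddings, labels, margin=0.2):
    """Mean batch-hard triplet loss over one mini-batch (forward value only).

    embeddings: sequence of equal-length float vectors, one per logo crop.
    labels:     sequence of class ids, aligned with embeddings.
    margin:     assumed value for illustration; the abstract does not give one.
    """
    n = len(embeddings)
    losses = []
    for a in range(n):
        # Distances from the anchor to all other samples of the same class.
        pos = [math.dist(embeddings[a], embeddings[j])
               for j in range(n) if j != a and labels[j] == labels[a]]
        # Distances from the anchor to all samples of other classes.
        neg = [math.dist(embeddings[a], embeddings[j])
               for j in range(n) if labels[j] != labels[a]]
        if not pos or not neg:
            continue  # this anchor cannot form a triplet in this batch
        # Hardest positive is the farthest; hardest negative is the nearest.
        losses.append(max(0.0, margin + max(pos) - min(neg)))
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Two classes with two 2-D embeddings each (toy values).
    batch = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
    print(batch_hard_triplet_loss(batch, [0, 0, 1, 1]))
```

    In a real training loop the same selection rule would be applied to the embeddings produced by the matching network, using a framework with automatic differentiation. Because recognition then reduces to comparing embeddings, a new logo class can in principle be handled by adding reference embeddings rather than retraining the detector, which is consistent with the scalability the abstract describes.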

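Cite this article

Zhang GP, Zhang DM, Zhang J, Wang CN, Wang LD, Zou XQ. TV logo detection and recognition based on data synthesis and metric learning. Journal of Software (软件学报), 2022, 33(9): 3180-3194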

History
  • Received: 2021-06-23
  • Last revised: 2021-08-15
  • Published online: 2022-02-22
  • Published: 2022-09-06