Knowledge Distillation for Scene Text Detection via Mask Information Entropy Transfer

CLC number: TP18

Fund projects: Natural Science Foundation of Xiamen, China (3502Z20227180); National Natural Science Foundation of China (62173282)


    Abstract:

    Mainstream methods for scene text detection mostly rely on complex networks with many layers to improve detection accuracy. Such networks demand heavy computation and large storage, making them difficult to deploy on embedded devices with limited computing resources. Knowledge distillation compresses models by introducing soft-target information from a teacher network to assist the training of a lightweight student network. However, existing knowledge distillation methods are mostly designed for image classification: they extract the softened probability distribution output by the teacher network as knowledge, and the amount of information this distribution carries is highly correlated with the number of categories. When applied to the binary classification task in text detection, it therefore carries too little information. To address this, this study defines a new form of knowledge based on information entropy for scene text detection and, building on it, proposes a knowledge distillation method based on mask entropy transfer (MaskET). MaskET adds information entropy knowledge to traditional distillation methods to increase the amount of information transferred to the student network. Moreover, to eliminate the interference of background information in images, MaskET applies a mask so that information entropy knowledge is extracted only within text regions. Experiments on six public benchmark datasets, namely ICDAR 2013, ICDAR 2015, TD500, TD-TR, Total-Text, and CASIA-10K, show that MaskET outperforms both the baseline models and other knowledge distillation methods. For example, MaskET improves the F1 score of MobileNetV3-based DBNet from 65.3% to 67.2% on the CASIA-10K dataset.
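    The abstract does not give the exact form of the distillation loss, so the following is only a minimal sketch of the idea it describes: compute a per-pixel binary (Bernoulli) entropy map from the teacher's and student's text-probability outputs, and penalize their difference only inside the ground-truth text mask. The function names and the squared-error formulation are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def binary_entropy(p, eps=1e-7):
    # Shannon entropy (in nats) of a per-pixel Bernoulli distribution;
    # clipping avoids log(0) at saturated predictions.
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def masked_entropy_loss(teacher_prob, student_prob, text_mask):
    # Squared difference between teacher and student entropy maps,
    # averaged only over pixels inside the ground-truth text mask.
    diff = (binary_entropy(teacher_prob) - binary_entropy(student_prob)) ** 2
    return float((diff * text_mask).sum() / max(text_mask.sum(), 1))

# Illustrative usage with random probability maps.
rng = np.random.default_rng(0)
teacher = rng.random((32, 32))                     # teacher text-probability map
student = rng.random((32, 32))                     # student text-probability map
mask = np.zeros((32, 32)); mask[8:24, 8:24] = 1.0  # ground-truth text region
loss = masked_entropy_loss(teacher, student, mask)
```

    In a full training setup, a term like this would be added to the student's usual detection loss; the mask confines the transferred entropy knowledge to text regions, matching the motivation stated in the abstract.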

Cite this article:

Chen Jianwei, Shen Yinglong, Yang Fan, Lai Yongxuan. Knowledge distillation for scene text detection via mask information entropy transfer. Journal of Software: 1-20

History
  • Received: 2023-03-28
  • Revised: 2024-01-11
  • Published online: 2025-01-15
Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190, China
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn