General Visual Hard Sample Mining Based on Maximum Discrepancy Competition

CLC number: TP391

Fund projects: National Natural Science Foundation of China (62461028, U24A20220, 62562034, 62402201); Natural Science Foundation of Jiangxi Province (20243BCE51139, 20232BAB202001, 20252BAC240197)




    Abstract:

In recent years, deep learning has developed rapidly and achieved significant success in computer vision, and model evaluation and improvement remain central concerns for researchers. However, the commonly used model comparison paradigm relies on training (or validation) and testing on closed datasets, and then identifies hard samples from discrepancies between predictions and ground-truth labels, thereby providing feedback on model weaknesses and directions for improvement. This paradigm suffers from two major limitations: 1) the limited size and coverage of a dataset often fail to faithfully reflect the true weaknesses of models; 2) procedures such as pretraining may introduce data leakage, so the reported performance can be biased. To address these issues, this study proposes a general visual hard sample mining algorithm based on maximum discrepancy competition, which automatically mines real hard samples to reveal model deficiencies. The proposed algorithm follows the principle of “comparing models through competition” and discovers potential hard samples by jointly optimizing intra-task and cross-task prediction dissimilarities, aiming to provide new test benchmarks for computer vision in a controllable and efficient manner. Experimental results demonstrate that the constructed benchmark, GHS-CV, exposes model weaknesses more effectively than single-task hard sample benchmarks (i.e., the semantic segmentation hard sample set SS-C and the salient object detection hard sample set SOD-C). Specifically, compared with its performance on SS-C, the mIoU of DeepLabv3+ drops by about 20% on GHS-CV, while compared with its performance on SOD-C, the Fβ of VST decreases by about 36% on GHS-CV.
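The core selection idea described above — ranking unlabeled samples by how strongly competing models disagree, rather than by deviation from ground-truth labels — can be sketched as follows. This is a minimal illustration with hypothetical toy "models" and a single intra-task dissimilarity (1 − IoU of binary masks); it does not reproduce the paper's joint intra-task and cross-task optimization.

```python
# Minimal sketch of maximum-discrepancy hard sample mining.
# The two "models" below are hypothetical stand-ins, not the
# segmentation/saliency networks evaluated in the paper.
import numpy as np

rng = np.random.default_rng(0)

def model_a(img):
    # Toy predictor: foreground = pixels above the image mean.
    return (img > img.mean()).astype(np.uint8)

def model_b(img):
    # A second toy predictor with a different decision rule.
    return (img > np.median(img) + 0.1).astype(np.uint8)

def discrepancy(p, q):
    # 1 - IoU between two binary masks: higher means stronger disagreement.
    inter = np.logical_and(p, q).sum()
    union = np.logical_or(p, q).sum()
    return 1.0 - (inter / union if union else 1.0)

# Unlabeled candidate pool: random stand-in "images".
pool = [rng.random((32, 32)) for _ in range(100)]

# Score every sample by model disagreement; no ground truth is needed.
scores = [discrepancy(model_a(x), model_b(x)) for x in pool]

# Keep the top-k most contested samples as candidate hard samples.
top_k = np.argsort(scores)[::-1][:10]
hard_samples = [pool[i] for i in top_k]
```

The key property is that sample selection is label-free: disagreement between models substitutes for the prediction-vs-ground-truth error used in the closed-dataset paradigm, which is what allows mining from arbitrary unlabeled pools.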

Cite this article:

Yan Jiebin, Zhu Wentao, Liu Xuelin, Chen Junjie, Qian Feng, Fang Yuming. General visual hard sample mining based on maximum discrepancy competition. Journal of Software, , (): 1-19.

History
  • Received: 2025-02-24
  • Revised: 2025-07-27
  • Published online: 2026-02-11
Copyright: Institute of Software, Chinese Academy of Sciences. 京ICP备05046678号-3
Address: 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190
Tel: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn
Technical support: Beijing Qinyun Technology Development Co., Ltd.

京公网安备 11040202500063号