Efficient Sample Retrieval Techniques for Multimodal Model Training
Author: TANG Xiu, WU Sai, HOU Jie, CHEN Gang

    Abstract:

    Training multimodal models in deep learning often requires large amounts of high-quality annotated data from diverse modalities such as images, text, and audio. However, acquiring such data at scale is challenging and costly. Active learning has emerged as a powerful paradigm for addressing this issue: by selectively annotating only the most informative samples, it reduces annotation costs while improving model performance. Yet existing active learning methods suffer from inefficient data scanning and costly index maintenance under large-scale updates. To overcome these challenges, this study proposes So-CBI (semi-ordered class boundary index), a novel approach that efficiently retrieves samples for multimodal model training. So-CBI incorporates inter-class boundary perception and a semi-ordered indexing structure to minimize maintenance costs and enhance retrieval efficiency. Experimental evaluations on various datasets demonstrate the effectiveness of So-CBI in the context of active learning.
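    The abstract refers to selectively annotating the most informative samples. As context, the following is a minimal sketch of the classic entropy-based uncertainty-sampling step that such active learning pipelines build on; it illustrates generic uncertainty sampling only, not the paper's So-CBI index, and all names in it are illustrative:

```python
import numpy as np

def entropy(probs):
    # Shannon entropy of each row of predicted class probabilities;
    # higher entropy means the model is less certain about the sample.
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_informative(probs, k):
    # Return indices of the k unlabeled samples with the highest
    # predictive entropy (the most "informative" ones to annotate).
    scores = entropy(probs)
    return np.argsort(scores)[-k:][::-1]

# Toy pool of 4 unlabeled samples and their predicted class probabilities
pool = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.34, 0.33, 0.33],   # near-uniform: most informative
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
])
picked = select_informative(pool, 2)
print(picked)  # indices of the two highest-entropy samples
```

    So-CBI's contribution is to answer this kind of "most informative samples" query through an index structure rather than by scanning every candidate's score, which is what makes it efficient under large-scale updates.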

Get Citation

Tang X, Wu S, Hou J, Chen G. Efficient sample retrieval techniques for multimodal model training. Ruan Jian Xue Bao/Journal of Software, 2024, 35(3): 1125-1139 (in Chinese with English abstract).
History
  • Received: July 17, 2023
  • Revised: September 05, 2023
  • Online: November 08, 2023
  • Published: March 06, 2024
Copyright: Institute of Software, Chinese Academy of Sciences