Efficient Sample Retrieval Techniques for Multimodal Model Training
Author: TANG Xiu, WU Sai, HOU Jie, CHEN Gang

    Abstract:

    Training multimodal models in deep learning often requires large amounts of high-quality annotated data from diverse modalities such as images, text, and audio. However, acquiring such data at scale is challenging and costly. Active learning has emerged as a powerful paradigm for addressing this issue: by selectively annotating only the most informative samples, it reduces annotation costs while improving model performance. Yet existing active learning methods suffer from inefficient data scanning and costly index maintenance under large-scale updates. To overcome these challenges, this study proposes So-CBI (semi-ordered class boundary index), a novel approach that efficiently retrieves samples for multimodal model training. So-CBI incorporates inter-class boundary perception and a semi-ordered indexing structure to minimize maintenance costs and enhance retrieval efficiency. Experimental evaluations on various datasets demonstrate the effectiveness of So-CBI in the context of active learning.
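    The abstract refers to selectively annotating the most informative samples. As context, the following is a minimal sketch of the classic entropy-based uncertainty-sampling step that such active learning pipelines build on; it illustrates generic uncertainty sampling only, not the paper's So-CBI index, and all names in it are illustrative:

```python
import numpy as np

def entropy(probs):
    # Shannon entropy of each row of predicted class probabilities;
    # higher entropy means the model is less certain about the sample.
    p = np.clip(probs, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_informative(probs, k):
    # Return indices of the k unlabeled samples with the highest
    # predictive entropy (the most "informative" ones to annotate).
    scores = entropy(probs)
    return np.argsort(scores)[-k:][::-1]

# Toy pool of 4 unlabeled samples and their predicted class probabilities
pool = np.array([
    [0.98, 0.01, 0.01],   # confident prediction
    [0.34, 0.33, 0.33],   # near-uniform: most informative
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
])
picked = select_informative(pool, 2)
print(picked)  # indices of the two highest-entropy samples
```

    So-CBI's contribution is to answer this kind of "most informative samples" query through an index structure rather than by scanning every candidate's score, which is what makes it efficient under large-scale updates.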

Get Citation

Tang X, Wu S, Hou J, Chen G. Efficient sample retrieval techniques for multimodal model training. Ruan Jian Xue Bao/Journal of Software, 2024, 35(3): 1125-1139 (in Chinese with English abstract).
History
  • Received: July 17, 2023
  • Revised: September 05, 2023
  • Online: November 08, 2023
  • Published: March 06, 2024
Copyright: Institute of Software, Chinese Academy of Sciences