Reliable Multi-modal Learning: A Survey

doi:10.13328/j.cnki.jos.006167

Authors:

YANG Yang, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
ZHAN De-Chuan, State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China
JIANG Yuan, State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, China
XIONG Hui, Rutgers Business School, Newark, NJ 07012, USA

Author biographies:

YANG Yang (1991-), male, Ph.D., professor, CCF professional member. His research interests include machine learning and data mining.
ZHAN De-Chuan (1982-), male, Ph.D., professor, doctoral supervisor, CCF professional member. His research interests include artificial intelligence, machine learning, and data mining.
JIANG Yuan (1976-), female, Ph.D., professor, doctoral supervisor, CCF professional member. Her research interests include artificial intelligence, machine learning, and data mining.
XIONG Hui (1972-), male, Ph.D., professor, doctoral supervisor, CCF professional member. His research interests include data mining and business intelligence.

Corresponding author: ZHAN De-Chuan, E-mail: zhandc@nju.edu.cn

Fund Project:

National Natural Science Foundation of China (61673201, 62006118, 61773198, 61632004); Natural Science Foundation of Jiangsu Province, China (BK20200460); CCF-BAIDU Songguo Foundation (CCF-BAIDU OF2020011); BAIDU TIC Foundation

Abstract:

In recent years, multi-modal learning has become one of the major research topics in machine learning and data mining, and it has been applied successfully in many real-world scenarios, such as cross-media search, multi-language processing, and click-through rate estimation with auxiliary information. Traditional multi-modal learning methods usually exploit the consistency or complementarity among modalities to design loss functions or regularization terms for joint training, thereby improving both single-modal and ensemble performance. In open environments, however, factors such as missing data and noise make multi-modal data imbalanced: the information in a single modality may be insufficient or missing. This leads to two challenges, "inconsistent modal feature representations" and "inconsistent modal alignment relationships", and applying traditional multi-modal methods directly to imbalanced multi-modal data can even degrade single-modal and ensemble performance. Reliable multi-modal learning has been proposed and widely studied to address these problems. This paper systematically summarizes and analyzes the progress made by domestic and international scholars on reliable multi-modal learning, and discusses the challenges that future research may face.

Key words: imbalanced multi-modal data; inconsistent modal feature representations; inconsistent modal alignment relationships; reliable multi-modal learning

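Cite this article:

YANG Yang, ZHAN De-Chuan, JIANG Yuan, XIONG Hui. Reliable multi-modal learning: A survey. Journal of Software (软件学报), 2021, 32(4): 1067-1081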

History:

Received: 2019-06-17
Last revised: 2020-04-28
Published online: 2020-12-02
Published: 2021-04-06
