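Reliable Multi-modal Learning: A Survey
Authors: Yang Yang, Zhan De-Chuan, Jiang Yuan, Xiong Hui
Author biographies:

Yang Yang (1991-), male, Ph.D., professor, CCF professional member. His research interests include machine learning and data mining.
Zhan De-Chuan (1982-), male, Ph.D., professor, doctoral supervisor, CCF professional member. His research interests include artificial intelligence, machine learning, and data mining.
Jiang Yuan (1976-), female, Ph.D., professor, doctoral supervisor, CCF professional member. Her research interests include artificial intelligence, machine learning, and data mining.
Xiong Hui (1972-), male, Ph.D., professor, doctoral supervisor, CCF professional member. His research interests include data mining and business intelligence.

Corresponding author:

Zhan De-Chuan, E-mail: zhandc@nju.edu.cn

Funding:

National Natural Science Foundation of China (61673201, 62006118, 61773198, 61632004); Natural Science Foundation of Jiangsu Province, China (BK20200460); CCF-Baidu Songguo Foundation (CCF-BAIDU OF2020011); Baidu TIC Foundation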



    Abstract:

    In recent years, multi-modal learning has become a research hotspot in machine learning and data mining, and it has been applied successfully in many real-world scenarios, such as cross-media search, multilingual processing, and click-through rate estimation with auxiliary information. Traditional multi-modal learning methods usually exploit the consistency or complementarity among modalities to design loss functions or regularization terms for joint training, thereby improving both single-modal and ensemble performance. In open environments, however, missing data and noise make multi-modal data imbalanced: the information in an individual modality may be insufficient or missing. This leads to two major challenges, “inconsistent strength of modal representations” and “inconsistent modal alignment relationships”. Applying traditional multi-modal methods directly to such imbalanced data can even degrade single-modal and ensemble performance. Reliable multi-modal learning has been proposed and widely studied to address these problems. This paper systematically summarizes and analyzes the progress made by researchers in China and abroad on reliable multi-modal learning, and discusses the challenges that future research may face.
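
As a rough illustration of the "consistency as a regularization term" idea mentioned in the abstract, the sketch below combines two per-modality fitting losses with a penalty on disagreement between the modalities' predictions. It is a minimal, generic example and not a method from the survey: the function names, the squared-error losses, and the weight `lam` are all assumptions made for illustration.

```python
def mean_squared_diff(u, v):
    """Mean squared difference between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)


def joint_consistency_loss(pred_a, pred_b, labels, lam=0.5):
    """Hypothetical joint objective for two modalities.

    pred_a, pred_b: predictions from modality A and modality B
    labels:         ground-truth targets
    lam:            weight of the consistency (disagreement) penalty
    """
    fit_a = mean_squared_diff(pred_a, labels)         # modality A fitting loss
    fit_b = mean_squared_diff(pred_b, labels)         # modality B fitting loss
    disagreement = mean_squared_diff(pred_a, pred_b)  # consistency regularizer
    return fit_a + fit_b + lam * disagreement


# Modality A predicts perfectly; modality B is weak (wrong on the first sample).
loss = joint_consistency_loss([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], lam=0.5)
```

In this toy setting the disagreement term is nonzero only because modality B is weak. Minimizing it pulls the strong modality's predictions toward the weak one, which is one way to picture the degradation on imbalanced data that the abstract describes.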

Cite this article

Yang Y, Zhan DC, Jiang Y, Xiong H. Reliable multi-modal learning: A survey. Journal of Software, 2021, 32(4): 1067-1081.

History
  • Received: 2019-06-17
  • Last revised: 2020-04-28
  • Published online: 2020-12-02
  • Publication date: 2021-04-06