Task Knowledge Fusion for Multimodal Knowledge Graph Completion

Authors: Chen Qiang, Zhang Dong, Li Shoushan, Zhou Guodong
CLC number: TP18
Funding: National Natural Science Foundation of China (62206193, 62076176, 62076175)

    Abstract:

    The knowledge graph completion task aims to infer missing fact triples in a knowledge graph from existing fact triples (head entity, relation, tail entity). Existing research primarily focuses on exploiting the structural information within the knowledge graph. However, these efforts overlook the fact that other modal information contained in the knowledge graph may also be helpful for knowledge graph completion. In addition, since task-specific knowledge is typically not injected into general pre-trained models, how to fuse task-related knowledge while extracting modal information becomes crucial. Moreover, because different modal features contribute differently to knowledge graph completion, effectively preserving useful multimodal information poses a significant challenge. To address these issues, this study proposes a multimodal knowledge graph completion method that incorporates task knowledge. It employs a multimodal encoder fine-tuned on the current task to obtain entity vector representations in each modality. A modal fusion-filtering module based on recurrent neural networks then removes task-irrelevant multimodal features. Finally, the study utilizes a homogeneous graph network to represent and update all features, thus effectively accomplishing the multimodal knowledge graph completion task. Experimental results demonstrate that the proposed approach effectively extracts information from different modalities and further enhances entity representation capability through multimodal filtering and fusion, thereby improving the performance of multimodal knowledge graph completion.
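    The core idea of the fusion-filtering step described above — gating modal features so that task-irrelevant components are suppressed before fusion with structural features — can be illustrated with a minimal sketch. This is not the authors' implementation: the per-dimension gate, the toy vectors, and all names below are illustrative assumptions standing in for the paper's RNN-based fusion-filtering module, which learns its gate parameters during training.

    ```python
    import math

    def sigmoid(x: float) -> float:
        return 1.0 / (1.0 + math.exp(-x))

    def gated_filter(structural, modal, w, b):
        """GRU-style per-dimension gate: z decides how much of each modal
        component survives; z near 1 keeps the structural signal and
        suppresses a (presumably task-irrelevant) modal one."""
        fused = []
        for s, m, wi, bi in zip(structural, modal, w, b):
            z = sigmoid(wi * (s + m) + bi)       # gate computed from both inputs
            fused.append(z * s + (1.0 - z) * m)  # convex mix per dimension
        return fused

    # Toy 3-dimensional entity features (illustrative values only).
    structural = [0.8, -0.2, 0.5]   # e.g. from a graph encoder
    visual     = [0.1,  0.9, -0.4]  # e.g. from a fine-tuned image encoder
    w = [5.0, -5.0, 0.0]            # learned in practice; fixed here for the demo
    b = [0.0, 0.0, 0.0]

    fused = gated_filter(structural, visual, w, b)
    print([round(v, 3) for v in fused])
    ```

    In the first dimension the gate opens toward the structural feature, in the second toward the visual feature, and at zero gate weight the two are averaged; in the actual model these decisions are learned jointly with the completion objective.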

Cite this article:

Chen Q, Zhang D, Li SS, Zhou GD. Task knowledge fusion for multimodal knowledge graph completion. Ruan Jian Xue Bao/Journal of Software, 2025, 36(4): 1590–1603 (in Chinese with English abstract).

History
  • Received: 2023-08-25
  • Revised: 2023-11-03
  • Published online: 2024-07-03
Copyright: Institute of Software, Chinese Academy of Sciences