Bidirectional Imitation Distillation for Efficient Incremental Pre-training of E-commerce Social Knowledge Graph
Author: Zhu YS, Zhang W, Wang XK, Li ZY, Chen MY, Yao Z, Chen H, Chen HJ
Affiliation:

CLC Number: TP181

    Abstract:

    Pre-trained knowledge graph (KG) models facilitate various downstream tasks in e-commerce applications. However, large-scale social KGs are highly dynamic, and the pre-trained models need to be updated regularly to reflect the changes in node features caused by user interactions. This study proposes an efficient incremental update framework for pre-trained KG models. The framework comprises three components: a bidirectional imitation distillation method that makes full use of the different types of facts in the new data; a sampling strategy based on the normality and abnormality of samples, which selects the most valuable facts from all new facts to reduce the size of the training data; and a reverse replay mechanism that generates high-quality negative facts better suited to the incremental training of e-commerce social KGs. Experimental results on real-world e-commerce datasets and related downstream tasks demonstrate that the proposed framework updates pre-trained KG models more effectively and efficiently than state-of-the-art methods.
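
    The framework itself is described in the full paper; as a rough illustration of the kind of incremental-update step the abstract outlines, the following Python/PyTorch sketch combines a replay-derived negative-sampling routine with score distillation from the frozen old model. It is a minimal sketch under stated assumptions, not the paper's bidirectional imitation distillation: the TransE-style scorer, reverse_replay_negatives, incremental_step, and distill_weight are hypothetical names introduced here for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TransEScorer(nn.Module):
        """A TransE-style scorer standing in for the pre-trained KG model (assumption)."""
        def __init__(self, n_entities, n_relations, dim=64):
            super().__init__()
            self.ent = nn.Embedding(n_entities, dim)
            self.rel = nn.Embedding(n_relations, dim)

        def score(self, h, r, t):
            # Higher score = more plausible triple (negative L1 distance).
            return -(self.ent(h) + self.rel(r) - self.ent(t)).abs().sum(-1)

    def reverse_replay_negatives(replay_triples, n_entities):
        """Build negative facts from replayed old facts by reversing head/tail and
        re-sampling the tail entity (hypothetical stand-in for reverse replay)."""
        h, r, t = replay_triples.unbind(-1)
        rand_t = torch.randint(0, n_entities, t.shape)
        return torch.stack([t, r, rand_t], dim=-1)

    def incremental_step(student, teacher, new_triples, replay_triples,
                         n_entities, distill_weight=0.5, margin=1.0):
        """One incremental-update step: rank new facts above replay-derived negatives
        while distilling the old model's scores on replayed facts."""
        h, r, t = new_triples.unbind(-1)
        pos_scores = student.score(h, r, t)
        nh, nr, nt = reverse_replay_negatives(replay_triples, n_entities).unbind(-1)
        neg_scores = student.score(nh, nr, nt)
        task_loss = F.relu(margin + neg_scores.mean() - pos_scores.mean())
        rh, rr, rt = replay_triples.unbind(-1)
        with torch.no_grad():
            teacher_scores = teacher.score(rh, rr, rt)   # frozen old model
        distill_loss = F.mse_loss(student.score(rh, rr, rt), teacher_scores)
        return task_loss + distill_weight * distill_loss

    # Toy usage with random ids; real ids would come from the old and new KG snapshots.
    teacher = TransEScorer(1000, 20)
    student = TransEScorer(1000, 20)
    student.load_state_dict(teacher.state_dict())        # warm-start from the old model
    new_facts = torch.randint(0, 1000, (32, 3)); new_facts[:, 1] %= 20
    old_facts = torch.randint(0, 1000, (32, 3)); old_facts[:, 1] %= 20
    incremental_step(student, teacher, new_facts, old_facts, n_entities=1000).backward()

    The fixed distill_weight is a simplification of the broader trade-off the abstract describes: learning from newly arrived facts while preserving what the previously pre-trained model already encodes.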

Get Citation

Zhu YS, Zhang W, Wang XK, Li ZY, Chen MY, Yao Z, Chen H, Chen HJ. Bidirectional imitation distillation for efficient incremental pre-training of e-commerce social knowledge graph. Ruan Jian Xue Bao/Journal of Software, 2025, 36(3): 1218–1239.

History
  • Received: June 25, 2023
  • Revised: October 10, 2023
  • Online: June 14, 2024