Continual Attention Modeling for Successive Sentiment Analysis in Low-resource Scenarios
Authors:
Author biographies:

Zhang Han (1998-), male, master's student, CCF student member; his main research interest is natural language processing. Wang Jingjing (1990-), male, Ph.D., associate professor, CCF professional member; his main research interest is natural language processing. Luo Jiamin (1997-), female, Ph.D. candidate, CCF student member; her main research interest is natural language processing. Zhou Guodong (1967-), male, Ph.D., professor, Ph.D. supervisor, CCF distinguished member; his main research interest is natural language processing.

Corresponding author:

Wang Jingjing, E-mail: djingwang@suda.edu.cn

CLC number:

TP18

Funding:

National Natural Science Foundation of China (62006166, 62076175, 62076176); Priority Academic Program Development of Jiangsu Higher Education Institutions



    Abstract:

Currently, sentiment analysis research generally relies on big-data-driven models, which incur heavy annotation and computation costs, so research on sentiment analysis in low-resource scenarios is particularly urgent. However, existing studies on low-resource sentiment analysis mainly focus on a single task, which makes it difficult for models to acquire knowledge from external tasks. Therefore, this study constructs a successive sentiment analysis task in low-resource scenarios, aiming to let a model learn multiple sentiment analysis tasks over time steps through continual learning. In this way, the data of different tasks can be fully exploited and the sentiment information of different tasks can be learned, which alleviates the shortage of training data for a single task. Successive sentiment analysis in low-resource scenarios faces two core problems: one is preserving the sentiment information of each individual task, and the other is fusing sentiment information across different tasks. To address these two problems, this study proposes a continual attention modeling approach for successive sentiment analysis in low-resource scenarios. The approach first constructs a sentiment-masked Adapter (SMA), which generates hard-attention sentiment masks for different tasks; this preserves the sentiment information of each task and thus mitigates catastrophic forgetting. Second, the approach constructs dynamic sentiment attention (DSA), which dynamically fuses the features extracted by different Adapters according to the current time step and task similarity; this fuses sentiment information across tasks. Experimental results on multiple datasets show that the proposed approach significantly outperforms the state-of-the-art baseline approaches. Further analysis indicates that, compared with the baselines, the proposed approach achieves the best sentiment information retention and fusion abilities while maintaining high running efficiency.
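
To make the two mechanisms described above concrete, the following is a minimal PyTorch sketch, for illustration only: a bottleneck Adapter gated by a per-task hard-attention sentiment mask (the SMA idea) and a similarity-weighted fusion of Adapter features (the DSA idea). The class names, layer sizes, the sigmoid gating temperature s, and the scaled dot-product similarity are assumptions of this sketch, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentimentMaskedAdapter(nn.Module):
    # Bottleneck Adapter whose hidden units are gated by a per-task
    # (pseudo-)hard attention mask; an illustrative sketch, not the paper's design.
    def __init__(self, hidden_size=768, bottleneck=64, num_tasks=5):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        # One learnable mask embedding per task over the bottleneck units.
        self.mask_emb = nn.Embedding(num_tasks, bottleneck)

    def forward(self, h, task_id, s=400.0):
        # A large temperature s pushes the sigmoid toward a near-binary (hard) mask.
        idx = torch.tensor(task_id, device=h.device)
        mask = torch.sigmoid(s * self.mask_emb(idx))          # [bottleneck]
        z = F.relu(self.down(h)) * mask                       # keep units assigned to this task
        return h + self.up(z)                                 # residual, as in standard Adapters


class DynamicSentimentAttention(nn.Module):
    # Fuses the current task's feature with features from previously learned
    # Adapters, weighted by similarity to the current task; illustrative only.
    def __init__(self, hidden_size=768):
        super().__init__()
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)

    def forward(self, current_feat, past_feats):
        # current_feat: [B, H]; past_feats: [B, T, H], one slice per earlier time step.
        q = self.query(current_feat).unsqueeze(1)                         # [B, 1, H]
        k = self.key(past_feats)                                          # [B, T, H]
        sim = torch.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)  # similarity weights [B, T]
        fused = (sim.unsqueeze(-1) * past_feats).sum(dim=1)               # weighted mixture [B, H]
        return current_feat + fused


if __name__ == "__main__":
    # Toy usage: a sentence representation passes through the task-0 Adapter,
    # then is fused with features from two earlier tasks.
    h = torch.randn(2, 768)
    sma = SentimentMaskedAdapter()
    dsa = DynamicSentimentAttention()
    cur = sma(h, task_id=0)
    past = torch.randn(2, 2, 768)
    out = dsa(cur, past)
    print(out.shape)  # torch.Size([2, 768])
```

In this sketch, the near-binary mask reserves a subset of Adapter units for each task so that later training does not overwrite them, which mirrors how the abstract motivates mitigating catastrophic forgetting; the similarity weights let features from related earlier tasks contribute more strongly when fusing sentiment information across tasks.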

Cite this article:

Zhang H, Wang JJ, Luo JM, Zhou GD. Continual attention modeling for successive sentiment analysis in low-resource scenarios. Journal of Software, 2024, 35(12): 5470-5486 (in Chinese).

History
  • Received: 2023-03-31
  • Revised: 2023-08-20
  • Online: 2024-01-03
  • Published: 2024-12-06