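Survey on Aspect Term Extraction
Authors: Xu Qingting, Hong Yu, Pan Yuchen, Yao Jianmin, Zhou Guodong
About the authors:

Xu Qingting (born 1994), female, M.S., CCF student member; her main research interest is aspect term extraction. Hong Yu (born 1978), male, Ph.D., professor, doctoral supervisor, CCF professional member; his main research interests are information extraction, intelligent question answering, low-resource machine translation, and discourse analysis. Pan Yuchen (born 1996), male, master's student; his main research interest is aspect term extraction. Yao Jianmin (born 1971), male, Ph.D., principal researcher; his main research interests are machine translation and information extraction. Zhou Guodong (born 1967), male, Ph.D., professor, doctoral supervisor, CCF distinguished member; his main research interest is natural language processing.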

Corresponding author:

Hong Yu, hongy@suda.edu.cn

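Funding:

National Key Research and Development Program of China (2020YFB1313601); National Natural Science Foundation of China (62076174, 62076175); Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX21_2955)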


    Abstract:

    Aspect term extraction is a natural language processing task that automatically recognizes and extracts aspect terms from free-form text. This survey first revisits the basic task definition, the authoritative datasets, and the common evaluation protocols. On this basis, it comprehensively reviews state-of-the-art techniques, covering both traditional extraction methods based on statistical strategies and feature engineering and neural extraction methods based on deep learning. In particular, starting from the linguistic nature of aspect expressions and the limitations exposed by existing techniques, it explains in detail the technical difficulties of the field and its likely directions of development.
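
The abstract states the task only in words. As a rough illustration, and not the survey's own formulation, aspect term extraction is commonly cast as token-level sequence labeling with B/I/O tags and scored by exact-match precision, recall, and F1 over the extracted spans. The sketch below assumes that setup; the function names `decode_bio` and `exact_match_f1`, the tag scheme, and the example sentence are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def decode_bio(tags: List[str]) -> List[Span]:
    """Convert a B/I/O tag sequence into a list of aspect-term spans."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new term begins; close any term that is still open.
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # A stray "I" with no preceding "B" is treated as a term start.
            if start is None:
                start = i
        else:  # "O": outside any term, so close the open term if there is one
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def exact_match_f1(gold: List[Span], pred: List[Span]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 where a span counts only if both boundaries match."""
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical review sentence with two aspect terms: "battery life" and "screen".
tokens = ["The", "battery", "life", "is", "great", "but", "the", "screen", "is", "dim"]
gold_tags = ["O", "B", "I", "O", "O", "O", "O", "B", "O", "O"]
# An imagined prediction that truncates the first term to "battery".
pred_tags = ["O", "B", "O", "O", "O", "O", "O", "B", "O", "O"]

gold_spans = decode_bio(gold_tags)
pred_spans = decode_bio(pred_tags)
gold_terms = [" ".join(tokens[s:e]) for s, e in gold_spans]
precision, recall, f1 = exact_match_f1(gold_spans, pred_spans)
```

Under exact matching, the truncated prediction "battery" earns no credit for "battery life", which is why small boundary errors weigh heavily on this kind of score.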

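Cite this article

Xu Qingting, Hong Yu, Pan Yuchen, Yao Jianmin, Zhou Guodong. Survey on Aspect Term Extraction. Journal of Software (软件学报), 2023, 34(2):690-711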

History
  • Received: 2021-08-12
  • Last revised: 2022-04-14
  • Published online: 2022-07-22
  • Publication date: 2023-02-06