ResLCNN Model for Short Text Classification
Author: Wang Junli, Yang Yaxing, Wang Xiaomin
Funding: National High Technology Research and Development Program of China (863 Program) (2015IM030300); Shanghai Science and Technology Innovation Plan (15DZ1101202); Shanghai Municipal Science and Technology Commission Project (14JC1405800); Fundamental Research Funds for the Central Universities of Tongji University

    Abstract:

    Short text classification is a key task in Internet text data processing. The long short-term memory (LSTM) network and the convolutional neural network (CNN) are two of the most widely used deep learning models for short text classification. Research on deep learning in computer vision and speech recognition shows that deeper neural network models have a stronger ability to express data features. Inspired by this, a deep learning model named ResLCNN (residual-LSTM-CNN), built from three LSTM layers and one CNN layer, is proposed for deep text classification. In this model, the LSTM layers capture long-distance dependency features of sequence data, and the CNN layer extracts local features of a sentence through convolution operations, so ResLCNN effectively combines the advantages of LSTM and CNN. At the same time, drawing on residual model theory, ResLCNN adds an identity mapping between the first LSTM layer and the CNN layer to build a residual layer, which alleviates the vanishing gradient problem in deep models. To explore the ability of the ResLCNN model in deep short text classification, experiments are conducted on several data sets comparing it with LSTM, CNN and their combination models. The results show that, compared with a single-layer LSTM and CNN combination model, the deep ResLCNN model improves accuracy by 1.0%, 0.5% and 0.47% on the MR, SST-2 and SST-5 data sets respectively, achieving better classification results.
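
    The architecture described in the abstract can be summarized in code. Below is a minimal PyTorch sketch of the idea, not the authors' implementation: the layer widths, kernel size, ReLU activation, max-over-time pooling and the exact placement of the residual addition are illustrative assumptions, while the overall shape (three stacked LSTM layers, an identity mapping from the first LSTM layer's output to the CNN layer's input, and a convolution followed by pooling and a classifier) follows the description above.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ResLCNN(nn.Module):
            # Sketch of the ResLCNN structure: 3 LSTM layers + 1 CNN layer,
            # with a residual identity mapping bridging the LSTM stack.
            def __init__(self, vocab_size, embed_dim=300, hidden_dim=300,
                         num_filters=100, kernel_size=3, num_classes=2):
                super().__init__()
                self.embed = nn.Embedding(vocab_size, embed_dim)
                # Three stacked LSTM layers capture long-distance dependencies.
                self.lstm1 = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
                self.lstm2 = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
                self.lstm3 = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
                # A 1-D convolution extracts local n-gram features of the sentence.
                self.conv = nn.Conv1d(hidden_dim, num_filters, kernel_size)
                self.fc = nn.Linear(num_filters, num_classes)

            def forward(self, x):                  # x: (batch, seq_len) token ids
                e = self.embed(x)                  # (batch, seq_len, embed_dim)
                h1, _ = self.lstm1(e)
                h2, _ = self.lstm2(h1)
                h3, _ = self.lstm3(h2)
                # Residual identity mapping: the first LSTM layer's output is
                # added to the CNN layer's input, easing gradient flow.
                h = (h1 + h3).transpose(1, 2)      # Conv1d expects (batch, C, L)
                c = F.relu(self.conv(h))           # (batch, num_filters, L')
                p = F.max_pool1d(c, c.size(2)).squeeze(2)  # max-over-time pooling
                return self.fc(p)                  # class logits

    Under these assumptions, a binary task such as MR would use ResLCNN(vocab_size=20000, num_classes=2) and SST-5 would use num_classes=5; the vocab_size of 20,000 is a hypothetical placeholder for the data set's actual vocabulary size.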

Cite this article

Wang JL, Yang YX, Wang XM. ResLCNN model for short text classification. Ruan Jian Xue Bao/Journal of Software, 2017,28(s2):61-69 (in Chinese with English abstract)

History
  • Received: 2017-06-30
  • Published online: 2018-01-05