Chinese Word Sense Disambiguation Based on Maximum Entropy Model with Feature Selection

    Abstract:

    Word sense disambiguation (WSD) can be viewed as a classification problem, and feature selection is of great importance in such a task. In general, features are selected manually, which requires a deep understanding of both the task itself and the classification model employed. In this paper, the effect of the feature template on Chinese WSD is studied, and an automatic feature selection algorithm based on the maximum entropy model (MEM) is proposed, which includes uniform feature template selection for all ambiguous words and customized feature template selection for each word. Experimental results show that automatic feature selection can reduce the feature size and improve Chinese WSD performance. Compared with the best evaluation results of SemEval-2007 task #5, this method increases MicroAve (micro-average accuracy) by 3.10% and MacroAve (macro-average accuracy) by 2.96%, respectively.
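
    As a concrete illustration of the procedure described above, the sketch below implements maximum-entropy WSD with greedy (forward) feature-template selection in Python. It is not the authors' implementation: the feature templates, the toy "bank" examples, the greedy_select routine, and the use of scikit-learn's LogisticRegression as a stand-in for a conditional maximum entropy (multinomial logit) classifier are all assumptions made for illustration.

"""
Minimal sketch (not the paper's code) of maximum-entropy WSD with
greedy feature-template selection.  The templates, the toy data, and
the use of scikit-learn's LogisticRegression (a multinomial logistic
model, i.e. a conditional maximum-entropy classifier) are illustrative
assumptions only.
"""
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Candidate feature templates: each maps (tokens, target index) -> {feature: 1}
TEMPLATES = {
    "w-1": lambda toks, i: {"w-1=" + (toks[i-1] if i > 0 else "<s>"): 1},
    "w+1": lambda toks, i: {"w+1=" + (toks[i+1] if i + 1 < len(toks) else "</s>"): 1},
    "w-2": lambda toks, i: {"w-2=" + (toks[i-2] if i > 1 else "<s>"): 1},
    "bow": lambda toks, i: {"bow=" + w: 1 for j, w in enumerate(toks) if j != i},
}

def extract(instances, template_names):
    """Apply the chosen templates to every (tokens, target_index) pair."""
    feats = []
    for toks, i in instances:
        d = {}
        for name in template_names:
            d.update(TEMPLATES[name](toks, i))
        feats.append(d)
    return feats

def score(instances, labels, template_names, cv=2):
    """Cross-validated accuracy of a MaxEnt classifier using these templates."""
    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, extract(instances, template_names), labels, cv=cv).mean()

def greedy_select(instances, labels):
    """Forward selection: add the template that helps most, stop when none does."""
    chosen, best = [], 0.0
    remaining = set(TEMPLATES)
    while remaining:
        cand = max(remaining, key=lambda t: score(instances, labels, chosen + [t]))
        acc = score(instances, labels, chosen + [cand])
        if acc <= best:
            break
        chosen.append(cand)
        remaining.remove(cand)
        best = acc
    return chosen, best

if __name__ == "__main__":
    # Toy corpus for one ambiguous word; a real experiment would use the
    # SemEval-2007 task #5 training data instead.
    data = [
        (["deposit", "money", "in", "the", "bank"], 4, "finance"),
        (["the", "bank", "raised", "interest", "rates"], 1, "finance"),
        (["open", "an", "account", "at", "the", "bank"], 5, "finance"),
        (["sat", "on", "the", "river", "bank"], 4, "river"),
        (["the", "bank", "of", "the", "stream"], 1, "river"),
        (["fishing", "from", "the", "grassy", "bank"], 4, "river"),
    ]
    instances = [(toks, i) for toks, i, _ in data]
    labels = [sense for _, _, sense in data]
    templates, acc = greedy_select(instances, labels)
    print("selected templates:", templates, "cv accuracy: %.2f" % acc)

    Running the selection loop once over the pooled training data of all ambiguous words would correspond to a uniform feature template, while running it separately on the instances of each word would yield a customized template per word; the selection criterion here is cross-validated accuracy, but any held-out measure could be substituted.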

Get Citation

He JZ, Wang HF. Chinese word sense disambiguation based on maximum entropy model with feature selection. Journal of Software, 2010,21(6):1287-1295 (in Chinese with English abstract).

Article Metrics
  • Abstract: 6068
  • PDF: 8051
  • HTML: 0
  • Cited by: 0
History
  • Revised: February 24, 2009