Abstract: Most existing discriminative training methods adopt smooth loss functions that can be optimized directly. In natural language processing (NLP), however, many applications adopt evaluation metrics that take the form of a step function, such as character error rate (CER). To address this problem, a recently proposed discriminative training method, minimum sample risk (MSR), is analyzed. Unlike other discriminative methods, MSR directly takes a step function as its loss function. MSR is first analyzed and its time/space complexity is improved. An improved version, MSR-II, is then proposed, which makes the computation of interference among features in the feature selection step more stable. In addition, experiments on domain adaptation are conducted to investigate the robustness of MSR-II. Evaluations on the task of Japanese text input show that: (1) MSR/MSR-II significantly outperforms a traditional trigram model, reducing CER by 20.9%; (2) MSR/MSR-II is comparable to two other state-of-the-art discriminative algorithms, Boosting and Perceptron; (3) MSR-II outperforms MSR not only in time/space complexity but also in the stability of feature selection; (4) experimental results on domain adaptation demonstrate the robustness of MSR-II. Overall, MSR/MSR-II is a highly effective algorithm. Given its step loss function, MSR/MSR-II can be widely applied to many NLP tasks, such as spelling correction and machine translation.
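For concreteness, the objective that MSR minimizes can be sketched as follows; the notation here is illustrative rather than the paper's exact formulation: $A_i$ denotes the $i$-th input (e.g., a phonetic string in Japanese text input), $W_i^R$ its reference output, $\mathrm{GEN}(A_i)$ the candidate set, $\mathbf{f}(W, A_i)$ a feature vector, $\boldsymbol{\lambda}$ the model parameters, and $\mathrm{Er}(\cdot,\cdot)$ the per-sample error (e.g., the number of character errors), which is a step function of $\boldsymbol{\lambda}$.

\[
\mathrm{SR}(\boldsymbol{\lambda}) \;=\; \sum_{i=1}^{M} \mathrm{Er}\!\left(W_i^{R},\, \hat{W}_i(\boldsymbol{\lambda})\right),
\qquad
\hat{W}_i(\boldsymbol{\lambda}) \;=\; \arg\max_{W \in \mathrm{GEN}(A_i)} \boldsymbol{\lambda}\cdot \mathbf{f}(W, A_i).
\]

Because $\mathrm{Er}$ changes only when the top-ranked candidate $\hat{W}_i$ changes, $\mathrm{SR}(\boldsymbol{\lambda})$ is piecewise constant in $\boldsymbol{\lambda}$, which is why gradient-based optimization of smooth surrogates does not apply directly.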