Abstract:

Most existing discriminative training methods adopt smooth loss functions that can be optimized directly. In natural language processing (NLP), however, many applications are evaluated with metrics that take the form of a step function, such as character error rate (CER). To address this mismatch, this paper analyzes a recently proposed discriminative training method called minimum sample risk (MSR). Unlike other discriminative methods, MSR takes a step function directly as its loss function. MSR is first analyzed and its time/space complexity improved. An improved version, MSR-II, is then proposed, which makes the computation of feature interference during feature selection more stable. In addition, domain-adaptation experiments are conducted to investigate the robustness of MSR-II. Evaluations on the task of Japanese text input show that: (1) MSR/MSR-II significantly outperforms a traditional trigram model, reducing CER by 20.9%; (2) MSR/MSR-II is comparable to two other state-of-the-art discriminative algorithms, Boosting and Perceptron; (3) MSR-II outperforms MSR not only in time/space complexity but also in the stability of feature selection; (4) the domain-adaptation experiments demonstrate the robustness of MSR-II. Overall, MSR/MSR-II is an effective algorithm. Because it optimizes a step loss function directly, MSR/MSR-II can be widely applied in NLP, for example to spelling correction and machine translation.
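
To make the abstract's core idea concrete: a step loss such as the total error count of the top-ranked candidates is piecewise constant in any single feature weight, so an exact line search only needs to probe the finitely many weights at which two candidates swap rank. The sketch below (Python) illustrates this for one feature in a reranking setting; the function name and the (base_score, feature_value, n_errors) candidate layout are hypothetical, not the authors' implementation.

    # Minimal sketch of a step-loss line search over one feature weight,
    # in the spirit of MSR (toy assumptions; not the paper's code).
    # Each sentence is a list of candidates; a candidate is a tuple
    # (base_score, feature_value, n_errors).
    def best_weight(sentences):
        # Collect breakpoints: weights where two candidates tie,
        # i.e. b1 + w*f1 == b2 + w*f2  =>  w = (b2 - b1) / (f1 - f2).
        points = set()
        for cands in sentences:
            for i, (b1, f1, _) in enumerate(cands):
                for b2, f2, _ in cands[i + 1:]:
                    if f1 != f2:
                        points.add((b2 - b1) / (f1 - f2))
        if not points:
            return 0.0  # the feature never changes any ranking

        # The risk is constant between breakpoints, so probe one
        # weight per interval (plus one beyond each end).
        grid = sorted(points)
        probes = ([grid[0] - 1.0]
                  + [(a + b) / 2.0 for a, b in zip(grid, grid[1:])]
                  + [grid[-1] + 1.0])

        def risk(w):
            # Total errors of the highest-scoring candidate per sentence.
            return sum(max(cands, key=lambda c: c[0] + w * c[1])[2]
                       for cands in sentences)

        return min(probes, key=risk)

    # Toy usage: two sentences with two candidates each.
    sents = [[(1.0, 0.0, 2), (0.8, 1.0, 0)],
             [(0.5, 0.0, 0), (0.4, 2.0, 3)]]
    print(best_weight(sents))  # prints a weight minimizing total errors

In MSR, one-dimensional searches of this kind are applied feature by feature, which is where the time/space complexity concerns and the feature-selection stability addressed by MSR-II come into play.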

Footnote: Preliminary experiments showed that for 0 ≤ α < 0.7, MSR/MSR-II performs worse than with α = 1; those results are therefore not listed.
Citation:

Yuan W, Gao JF, Bu FL. Research and improvement of the minimum sample risk algorithm in language modeling. Journal of Software, 2007, 18(2): 196-204.

Article Metrics
  • Abstract views: 4219
  • PDF downloads: 5110
  • HTML views: 0
  • Cited by: 0
History
  • Received: January 04, 2006
  • Revised: June 12, 2006