Co-Training Framework for Feature Weight Optimization of Statistical Machine Translation

    Abstract:

    This paper investigates domain adaptation of feature weights and proposes a co-training framework to handle it: translation results produced by another, heterogeneous decoder are used as pseudo references and added to the development set, so that minimum error rate training biases the feature weights toward the domain of the test set. Furthermore, a minimum Bayes-risk combination is applied to pseudo-reference selection, which picks suitable translations from the candidates of both decoders and thereby smooths the training process. Experimental results show that this co-training method with minimum Bayes-risk combination yields significant improvements in the target domain.
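    The minimum Bayes-risk pseudo-reference selection described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: it assumes a uniform posterior over the pooled candidates from both decoders and uses an add-one-smoothed sentence-level BLEU as the gain function; all names (`ngrams`, `sentence_bleu`, `mbr_select`) are hypothetical.

```python
# Sketch of minimum Bayes-risk (MBR) selection over the pooled
# translation candidates of two decoders. Assumes a uniform posterior
# and a smoothed sentence-level BLEU gain; names are illustrative.
from collections import Counter
import math

def ngrams(tokens, n):
    # Multiset of n-grams of the token sequence.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp, ref, max_n=4):
    """Add-one-smoothed sentence-level BLEU of hyp against one ref."""
    if not hyp or not ref:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, r[g]) for g, c in h.items())
        total = max(sum(h.values()), 1)
        log_prec += math.log((overlap + 1) / (total + 1))  # add-one smoothing
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))  # brevity penalty
    return bp * math.exp(log_prec / max_n)

def mbr_select(candidates):
    """Return the candidate with maximum expected BLEU against all
    candidates (uniform posterior): the MBR 'consensus' translation."""
    def expected_gain(cand):
        return sum(sentence_bleu(cand, other) for other in candidates)
    return max(candidates, key=expected_gain)

# Toy candidate pool, as if merged from two decoders' n-best lists.
cands = [
    "the weather is nice today".split(),
    "the weather is fine today".split(),
    "today weather nice".split(),
]
best = mbr_select(cands)
```

    The selected sentence then serves as a pseudo reference for the test-domain development set used in minimum error rate training; because the MBR choice agrees most with the other candidates, it is a safer reference than any single decoder's 1-best output.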

Get Citation
Liu SJ, Li ZH, Li M, Zhou M. A co-training method of feature weights for statistical machine translation. Journal of Software, 2012, 23(12): 3101-3114 (in Chinese).
History
  • Received: September 01, 2011
  • Revised: March 15, 2012
  • Online: December 05, 2012
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4