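Chinese-Vietnamese Convolutional Neural Machine Translation Incorporating Syntactic Parse Trees
Authors: Wang Zhenhan (王振晗), He Jianyalin (何建雅琳), Yu Zhengtao (余正涛), Wen Yonghua (文永华), Guo Junjun (郭军军), Gao Shengxiang (高盛祥)
Author biographies:

Wang Zhenhan (born 1993), male, bachelor's degree, CCF student member. Research interests: natural language processing, machine translation.
He Jianyalin (born 1993), female, master's degree. Research interests: natural language processing, machine translation.
Yu Zhengtao (born 1970), male, Ph.D., professor, doctoral supervisor, CCF senior member. Research interests: natural language processing, machine translation, information retrieval.
Wen Yonghua (born 1979), male, lecturer, CCF student member. Research interests: natural language processing, machine translation.
Guo Junjun (born 1987), male, Ph.D., lecturer, CCF professional member. Research interests: natural language processing, neural machine translation, information retrieval.
Gao Shengxiang (born 1977), female, Ph.D., associate professor, CCF professional member. Research interests: natural language processing, machine translation, information retrieval.

Corresponding author:

Yu Zhengtao, E-mail: ztyu@hotmail.com

Funding:

National Natural Science Foundation of China (61732005, 61672271, 61761026, 61866020); Natural Science Foundation of Yunnan Province (2018FB04); Yunnan Provincial Talent Training Program (KKSY201703005, KKSY201703015)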

    Abstract:

    Neural machine translation is currently the most widely used approach to machine translation and performs well on language pairs with abundant corpus resources. It performs poorly, however, on pairs that lack bilingual data, such as Chinese-Vietnamese. Considering the differences in grammatical structure between Chinese and Vietnamese, this study proposes a Chinese-Vietnamese neural machine translation method that incorporates the syntactic parse tree of the source language. A depth-first traversal is used to obtain a vectorized representation of the source-language parse tree, and the resulting syntax vectors are added to the source-language word embeddings to form the input on which the translation model is trained. In experiments on the Chinese-Vietnamese language pair, the method improves on the baseline system by 0.6 BLEU. The results indicate that incorporating syntactic parse trees can effectively improve the performance of machine translation models when resources are scarce.
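
The input construction described in the abstract can be illustrated with a small sketch. This is only one possible reading, not the authors' implementation: the abstract states that a depth-first traversal yields a vectorized representation of the source parse tree and that this is added to the source word embeddings, but it does not say how each word's syntax vector is formed. The sketch assumes that a word's syntax vector is the mean of the embeddings of the constituent labels on its root-to-leaf path. The toy tree, the function names (`dfs_leaves`, `make_table`, `encoder_inputs`), and the random embedding tables are all hypothetical stand-ins for a real parser output and learned parameters.

```python
import random

# Hypothetical toy constituency tree for "我 爱 你".
# A node is (label, child, ...); a leaf is (POS tag, word).
TREE = ("S",
        ("NP", ("PN", "我")),
        ("VP", ("VV", "爱"), ("NP", ("PN", "你"))))


def dfs_leaves(node, path=()):
    """Depth-first walk yielding (word, labels on the path from root to POS tag)."""
    label, *children = node
    path = path + (label,)
    if len(children) == 1 and isinstance(children[0], str):
        yield children[0], path  # reached a word
        return
    for child in children:
        yield from dfs_leaves(child, path)


def make_table(keys, dim, rng):
    """Random embedding table standing in for learned parameters."""
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in sorted(keys)}


def encoder_inputs(tree, word_emb, label_emb, dim):
    """Add to each word embedding the mean label embedding of its path (assumed scheme)."""
    inputs = []
    for word, path in dfs_leaves(tree):
        syn = [sum(label_emb[l][i] for l in path) / len(path) for i in range(dim)]
        inputs.append([w + s for w, s in zip(word_emb[word], syn)])
    return inputs


DIM = 4
rng = random.Random(0)
leaves = list(dfs_leaves(TREE))
word_emb = make_table({w for w, _ in leaves}, DIM, rng)
label_emb = make_table({l for _, p in leaves for l in p}, DIM, rng)
X = encoder_inputs(TREE, word_emb, label_emb, DIM)  # one DIM-sized vector per source word
```

Because the syntax vector is summed with the word embedding rather than concatenated, the encoder input keeps the same dimensionality as in a baseline model, so the rest of the architecture would not need to change under this assumption.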

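Cite this article

Wang Zhenhan, He Jianyalin, Yu Zhengtao, Wen Yonghua, Guo Junjun, Gao Shengxiang. Chinese-Vietnamese convolutional neural machine translation incorporating syntactic parse trees. Journal of Software (软件学报), 2020, 31(12): 3797-3807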

History
  • Received: 2019-04-24
  • Last revised: 2019-07-20
  • Published online: 2020-12-03
  • Publication date: 2020-12-06