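Liu Shengqi, Zhu Donghua. Automatic Term Alignment Based on Advanced Multi-Strategy and Giza++ Integration. Journal of Software (软件学报), 2015, 26(7): 1650-1661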
Automatic Term Alignment Based on Advanced Multi-Strategy and Giza++ Integration
Received: 2013-11-03  Revised: 2014-04-09
DOI:10.13328/j.cnki.jos.004615
Keywords: term alignment  multilingual term extraction  cross-language  cross-phylum
Funding: National Defense Basic Scientific Research Program of China (Q172011A001)
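Author  Affiliation  E-mail
Liu Shengqi  School of Management and Economics, Beijing Institute of Technology, Beijing 100081  shengqiliu@126.com
Zhu Donghua  School of Management and Economics, Beijing Institute of Technology, Beijing 100081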
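Chinese abstract (translated):
      Cross-phylum term alignment is of low quality because it relies on low-quality term extraction and alignment. This paper proposes a term alignment method that integrates multiple strategies with Giza++ (AGiza). To improve term extraction quality, it uses rules on the first and last parts of speech to raise recall, uses independence filtering and stop filtering to raise precision, and then identifies term pairs that co-occur in the same sentence pair. To improve alignment accuracy: an independence correspondence degree and a stopping correspondence degree are proposed, based on the degree of independence and the degree of stopping; the seed-pair correspondence and the word association degree are combined by probabilistic addition into a semantic correspondence degree; a first-last correspondence degree is proposed from how the first and last words align, and pairs whose value is 0 are removed; a part-of-speech similarity is constructed from part-of-speech composition features; the value g is computed by GIZA++; after a correlation-coefficient analysis of the attributes, they are combined by multiplication into the term alignment degree a; finally, the term pairs whose a exceeds the term alignment threshold (set according to recall) are selected. Experimental results show that AGiza handles cross-phylum term alignment effectively, with quality higher than GIZA++, Dice, F2, LLR, K-VEC, and DKVEC.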
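The way the abstract combines its attributes can be illustrated with a small Python sketch. The abstract does not give the formulas, so everything concrete here is an assumption: all attribute values are taken to lie in [0, 1], "probabilistic addition" is read as p + q - p*q, and the names `prob_add`, `PairAttributes`, and `alignment_degree` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from math import prod


def prob_add(p: float, q: float) -> float:
    """Probabilistic addition of two scores in [0, 1] (assumed form: p + q - p*q)."""
    return p + q - p * q


@dataclass
class PairAttributes:
    """Attribute values for one candidate term pair (all assumed to be in [0, 1])."""
    independence: float    # independence correspondence degree
    stopping: float        # stopping correspondence degree
    seed: float            # seed-pair correspondence
    word_assoc: float      # word association degree
    first_last: float      # first-last correspondence degree
    pos_similarity: float  # part-of-speech similarity
    g: float               # value obtained from GIZA++


def alignment_degree(x: PairAttributes) -> float:
    """Multiply the six attributes into a single alignment degree a."""
    # Seed-pair correspondence and word association are merged first.
    semantic = prob_add(x.seed, x.word_assoc)
    return prod([
        x.independence,
        x.stopping,
        semantic,
        x.first_last,
        x.pos_similarity,
        x.g,
    ])


if __name__ == "__main__":
    pair = PairAttributes(0.9, 0.8, 0.5, 0.4, 1.0, 0.7, 0.6)
    print(alignment_degree(pair))
```

Because the attributes are multiplied, a pair whose first-last correspondence is 0 receives an alignment degree of 0, which matches the abstract's statement that such pairs are removed.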
English abstract:
      The quality of cross-phylum term alignment depends on the quality of the term extraction and alignment methods. This paper proposes an automatic term alignment method based on advanced multi-strategy and Giza++ (AGiza) integration. After analyzing the properties of term extraction as performed by existing methods in the literature, rules on the first and last parts of speech of strings are designed to increase recall. The precision of term extraction is increased by: (1) an independence filter; (2) a stopping filter; and (3) recognition of terms that co-occur in sentence pairs. The following steps are implemented to increase alignment quality: (1) design the degree of independence correspondence based on the degree of independence; (2) construct the degree of stopping correspondence based on the degree of stopping usage; (3) propose the degree of semantic correspondence, computed from the seed pairs' correspondence and the word pairs' similarity by additivity of probability; (4) construct the alignment correspondence degree of the first and last parts of the term pairs, in order to discard term pairs whose value is zero; (5) present the part-of-speech similarity degree between term pairs, considering the patterns that define the morphosyntactic structures of terms; and (6) obtain the value g from GIZA++. The term-aligned degree (a) is computed from the six attributes of a term pair by multiplication of probability, after analyzing their correlations. Term pairs are extracted by selecting the candidate term pairs whose a exceeds the term-aligned threshold, which is set so that the tolerance of recall is less than 1%. Experimental results on Chinese-English term alignment show that AGiza-based automatic term alignment can extract cross-phylum term pairs effectively. Furthermore, it outperforms GIZA++, the Dice coefficient, the F2 coefficient, the log-likelihood ratio, K-VEC and DKVEC.
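The recall-based threshold mentioned in the abstract can be sketched in Python as follows. This is only one possible reading of "tolerance of recall is less than 1%": given alignment degrees for a set of known-correct reference pairs, pick the highest threshold that loses less than the tolerated fraction of them. The function names `pick_threshold` and `select_pairs` are hypothetical, and the sketch keeps pairs at or above the threshold, since the abstract does not say how the boundary case is treated.

```python
from typing import Dict, List, Sequence, Tuple


def pick_threshold(reference_scores: Sequence[float], tolerance: float = 0.01) -> float:
    """Highest threshold that drops less than `tolerance` of the reference pairs.

    A pair counts as kept when its score is >= the threshold (an assumption).
    """
    if not reference_scores:
        raise ValueError("need at least one reference pair")
    if not 0.0 < tolerance <= 1.0:
        raise ValueError("tolerance must be in (0, 1]")
    ordered = sorted(reference_scores)
    n = len(ordered)
    dropped = 0
    # Allow one more dropped pair while the lost fraction stays below tolerance.
    while (dropped + 1) / n < tolerance:
        dropped += 1
    return ordered[dropped]


def select_pairs(candidates: Dict[Tuple[str, str], float],
                 threshold: float) -> List[Tuple[str, str]]:
    """Return the candidate term pairs whose alignment degree reaches the threshold."""
    return [pair for pair, a in candidates.items() if a >= threshold]


if __name__ == "__main__":
    # Toy alignment degrees for reference pairs (illustrative values only).
    reference = [0.2, 0.5, 0.9]
    t = pick_threshold(reference)
    candidates = {("术语", "term"): 0.6, ("对齐", "align"): 0.1}
    print(t, select_pairs(candidates, t))
```

With only a handful of reference pairs, no pair can be dropped without exceeding a 1% tolerance, so the threshold falls back to the lowest reference score.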
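Sponsors: Institute of Software, Chinese Academy of Sciences; China Computer Federation
Editorial office: +86-10-62562563 E-mail: jos@iscas.ac.cn
Copyright Institute of Software, Chinese Academy of Sciences, Journal of Software. All Rights Reserved
The journal's full-text database is copyrighted; reproduction without permission is prohibited, and the journal reserves the right to pursue legal liability.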