Automatic Term Alignment Based on Advanced Multi-Strategy and Giza++ Integration
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    The quality of cross-phylum term alignment depends on the quality of term extraction and alignment method. This paper proposes an automatic term alignment based on advanced multi-strategy and Giza++ (AGiza) integration. By analyzing the properties of the term extraction performed by using some existing methodologies in the literature, the rules of the first and the last part of speech of strings are designed to increase the recall rate. Methods that are applied for the purpose of increasing the precision of the term extraction include: (1) independence filter; (2) stopping filter; and (3) recognition of the co-occurrence of terms in the sentence pairs. The following steps are also implemmented to increase the alignment quality: (1) design the degree of the independence correspondence based on the degree of independence; (2) construct the degree of the stopping correspondence based on the degree of stopping usage; (3) propose the degree of semantic correspondence that computed by the seed pairs' correspondence and word pairs' similarity based on additivity of probability; (4) construct the alignment correspondence degree of the first part and last part between the term pairs in order to cancel the term pairs whose value is equal to zero; (5) present the similarity degree of the part of speech between the term pairs considering the patterns that define the morphosyntactic structures of terms; and (6) obtain the value of g based on GIZA++. The term-aligned degree (a) is computed by the six attributes of term pairs based on multiplication of probability after analyzing their correlations. Term pairs is extracted by select the term-aligned pairs based on the candidate term pairs whose a is more than the term-aligned threshold that make the tolerance of recall is less than 1%. The simulation results of Chinese-English term alignment show that automatic term alignment based on AGiza can be used to extract cross-phylum term pairs effectively. Furthermore, it outperforms GIZA++, the Dice coefficient, the F2 coefficient, the log-likelihood ratio, K-VEC and DKVEC.

    Reference
    Related
    Cited by
Get Citation

刘胜奇,朱东华.基于多策略融合Giza++的术语对齐法.软件学报,2015,26(7):1650-1661

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:November 03,2013
  • Revised:April 09,2014
  • Adopted:
  • Online: July 02,2015
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063