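Wang Houfeng (王厚峰), Wang Bo (王波). A Computational Model for Chinese Syntactic Structure Induction Based on Sentence Alignment. Journal of Software (软件学报), 2007, 18(3): 538-546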
A Computational Model for Chinese Syntactic Structure Induction Based on Sentence Alignment
Original title: 基于句子对齐的汉语句法结构推导的计算模型
Received: 2004-01-26  Revised: 2006-04-12
Keywords: sentence alignment; unsupervised learning; boundary friction; similarity priority; difference priority; Chinese syntactic structure induction
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 60473138 and 60675035.
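Authors and affiliations:
Wang Houfeng: Institute of Computational Linguistics, School of Information Science and Technology, Peking University, Beijing 100871
Wang Bo: Institute of Computational Linguistics, School of Information Science and Technology, Peking University, Beijing 100871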
Abstract:
      This paper introduces an unsupervised learning framework for Chinese syntactic structure induction based on sentence similarity. First, all sentence pairs in a Chinese sentence corpus are aligned, and each pair is partitioned into alternating identical segments and differing segments. Then, following either a similarity-priority or a difference-priority strategy, the aligned identical segments or the aligned differing segments, respectively, are selected as constituent candidates. Because candidate segments may overlap and cause boundary friction, the affected candidates are disambiguated. Finally, the sentence constituents are reduced step by step to derive the syntactic structure tree. To reduce word sparseness during alignment, some words that follow obvious patterns are replaced by classes in advance. Three forms of basic sentence unit are examined: the sequence of words, the sequence of POS (part-of-speech) tags, and the sequence of words combined with POS tags. The induced syntactic structures are evaluated for each form. The results show that the difference-priority strategy performs better than the similarity-priority strategy, and that the F-measure is above 46% for all three forms, with the best value being 49.52%, which is better than previously reported results.
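The alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which alignment algorithm the authors use, so this example substitutes Python's standard `difflib.SequenceMatcher` as a stand-in. The function names `align` and `candidates`, the `priority` parameter, and the toy sentences are hypothetical and serve only as illustration. Boundary-friction disambiguation and the stepwise reduction into a tree are not covered.

```python
from difflib import SequenceMatcher


def align(a, b):
    """Split two token lists into alternating 'same' and 'diff' segments.

    Stand-in for the paper's alignment step; difflib is an assumption,
    not the authors' algorithm.
    """
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    segments = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        kind = "same" if tag == "equal" else "diff"
        segments.append((kind, a[i1:i2], b[j1:j2]))
    return segments


def candidates(segments, priority="diff"):
    """Pick constituent candidates from the first sentence.

    priority="diff" mimics the difference-priority strategy,
    priority="same" the similarity-priority strategy (hypothetical API).
    """
    return [seg_a for kind, seg_a, _ in segments if kind == priority and seg_a]


# Toy, pre-segmented sentences (illustrative only, not from the paper's corpus).
s1 = ["他", "买", "了", "书"]  # "He bought a book"
s2 = ["他", "卖", "了", "书"]  # "He sold a book"

segments = align(s1, s2)
diff_first = candidates(segments, priority="diff")
same_first = candidates(segments, priority="same")
print(segments)
print(diff_first)
print(same_first)
```

Under the difference-priority setting the sketch proposes the differing span (here the single verb) as a candidate, while the similarity-priority setting proposes the shared spans. The same functions work unchanged if the tokens are POS tags or word/POS pairs, matching the three unit types evaluated in the paper.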