Part of Speech Tagging Chinese Corpus Based on Statistics and Rules

Authors:
Zhang Min (张民), Li Sheng (李生), Zhao Tiejun (赵铁军), Zhang Yanfeng (张艳风)
Department of Computer Science and Engineering, Harbin Institute of Technology, 150001
Fund project: This research was supported by the National 863 High-Tech Program.


Abstract:

This paper proposes and implements an algorithm for automatic part-of-speech (POS) tagging of Chinese text that integrates statistical and rule-based techniques, giving priority to quantitative statistical analysis. The algorithm introduces confidence intervals for parameter estimates so that the high-accuracy statistical technique is applied first. Rules are then used to tag the part of the corpus left untagged and to correct some of the errors made by the statistical tagger. Closed and open tests show accuracies of 98.9% and 98.1%, respectively, when unknown words and word segmentation errors are not considered.

Key words: Chinese, part-of-speech tagging, hidden Markov model, rule, confidence interval.
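The abstract gives only the outline of the method: statistics first where the estimate is reliable, rules for whatever remains. The sketch below illustrates that control flow under several assumptions that do not come from the paper. It uses a Wilson score interval (the abstract does not say which interval the authors use), a simplified table of counts keyed by previous tag and word instead of full hidden Markov model decoding, a hypothetical tag set (`a`, `n`, `v`, `d`), and a single toy fallback rule. All function names are illustrative.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if trials == 0:
        return 0.0, 1.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def statistical_tag(counts):
    """Return the most frequent tag only if its interval clears the runner-up's.

    counts maps candidate tag -> number of times it was observed for this
    word in this context (hypothetical training counts). Returns None when
    the evidence is too weak, so the caller can fall back to rules.
    """
    total = sum(counts.values())
    ranked = sorted(counts, key=counts.get, reverse=True)
    if not ranked or total == 0:
        return None
    if len(ranked) == 1:
        return ranked[0]
    best_low, _ = wilson_interval(counts[ranked[0]], total)
    _, second_high = wilson_interval(counts[ranked[1]], total)
    return ranked[0] if best_low > second_high else None


def rule_tag(prev_tag, candidates):
    """Toy fallback rule (an assumption): prefer a noun after an adjective."""
    if prev_tag == "a" and "n" in candidates:
        return "n"
    return sorted(candidates)[0]


def tag_sentence(words, context_counts):
    """Tag left to right: statistics when confident, rules otherwise."""
    tags = []
    prev = "<s>"
    for word in words:
        counts = context_counts.get((prev, word), {})
        tag = statistical_tag(counts)
        if tag is None:
            # Unseen contexts default to the hypothetical noun tag "n".
            tag = rule_tag(prev, list(counts) or ["n"])
        tags.append(tag)
        prev = tag
    return tags


# Hypothetical counts: the first word is clearly an adjective, the second is ambiguous.
demo_counts = {
    ("<s>", "新"): {"a": 95, "d": 5},
    ("a", "发展"): {"n": 6, "v": 5},
}
demo_tags = tag_sentence(["新", "发展"], demo_counts)
```

With these made-up counts, the first word is tagged statistically because the two intervals are well separated. The second word has overlapping intervals, so the decision is deferred to the rule. The sketch covers only the rule-based tagging of leftover words; using rules to correct statistical errors, which the abstract also mentions, is not shown.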

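Cite this article

Zhang Min, Li Sheng, Zhao Tiejun, Zhang Yanfeng. Part of Speech Tagging Chinese Corpus Based on Statistics and Rules. Journal of Software (软件学报), 1998, 9(2): 134-138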


History

Received: 1996-08-21
Last revised: 1997-03-20
