Analysis and Improvement of Statistics-Based Chinese Part-of-Speech Tagging

Funding: This research was supported by the National "Ninth Five-Year Plan" Key Science and Technology Project Fund of China (Nos. 96-B08-1-3, 98-779-01-02).

Abstract:

This paper studies a widely used statistics-based training and tagging method for Chinese text. The nonlinear relationship between training corpus size and tagging accuracy is analyzed in terms of the structure and numerical behavior of two matrices: the part-of-speech transition probability matrix and the lexical (symbol) probability matrix. To make fuller use of the training corpus and raise tagging accuracy, the method is improved in two respects: exploiting word-related grammatical attributes, and strengthening the handling of unknown words. With the improved method, tagging accuracy reaches 96.5% in the closed test and 96% in the open test.
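To make the two matrices named in the abstract concrete, the following is a minimal sketch of a generic bigram hidden Markov model tagger, not the authors' implementation. It assumes add-one smoothing as a crude stand-in for unknown-word handling (the paper's actual technique is not described on this page), and the function names, the `<s>` start symbol, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Count tag-bigram events and word/tag events from a tagged corpus."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]: transition counts
    emit = defaultdict(Counter)   # emit[tag][word]: lexical (symbol) counts
    for sentence in tagged_sentences:
        prev = "<s>"              # assumed sentence-start symbol
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    return trans, emit

def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (an assumption, not the paper's method)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))

def viterbi(words, trans, emit):
    """Return the most probable tag sequence for a non-empty word list."""
    tags = list(emit)
    vocab = {w for counts in emit.values() for w in counts}
    n_words = len(vocab) + 1      # one extra slot reserved for unknown words
    n_tags = len(tags)
    # best[tag] = (log score, tag path) of the best path ending in that tag
    best = {
        t: (log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words), [t])
        for t in tags
    }
    for word in words[1:]:
        new_best = {}
        for t in tags:
            e = log_prob(emit[t], word, n_words)
            score, path = max(
                ((best[p][0] + log_prob(trans[p], t, n_tags) + e, best[p][1]) for p in tags),
                key=lambda item: item[0],
            )
            new_best[t] = (score, path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

# Toy corpus with made-up tags: r = pronoun, v = verb, n = noun, ns = place name.
corpus = [
    [("我", "r"), ("爱", "v"), ("北京", "ns")],
    [("他", "r"), ("爱", "v"), ("音乐", "n")],
]
trans, emit = train(corpus)
print(viterbi(["我", "爱", "音乐"], trans, emit))  # ['r', 'v', 'n']
```

With so little training data most counts are zero, so smoothing dominates the estimates. This is one intuition for why the size of the training corpus affects the two matrices, and hence accuracy, in a nonlinear way.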

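Cite this article

Wei Ou, Wu Jian, Sun Yufang. Analysis and Improvement of Statistics-Based Chinese Part-of-Speech Tagging. Journal of Software (软件学报), 2000, 11(4): 473-480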

History

Received: 1998-11-23
Last revised: 1999-04-21
