A Discriminative Reranking Approach to Spelling Correction

Journal of Software, 2008, Vol. 19, No. 3, pp. 557-564

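Authors:
ZHANG Yang (张扬), School of Computer Science and Technology, Tianjin University, Tianjin 300072
HE Pilian (何丕廉), School of Computer Science and Technology, Tianjin University, Tianjin 300072
XIANG Wei (向伟), Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong
LI Mu (李沐), Microsoft Research Asia, Beijing 100080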

Fund Project: Supported by the National Natural Science Foundation of China under Grant No. 60603027; the Science Technology Development Project of Tianjin of China under Grant No. 04310941R; the Applied Basic Research Project of Tianjin of China under Grant No. 05YFJMJC11700

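Abstract:

This paper proposes a spelling correction method based on a discriminative model. It reranks the output of an existing spelling correction system, Aspell, using the discriminative model Ranking SVM to improve its performance. Well-established spelling correction techniques (edit distance, letter-based n-grams, phonetic similarity and the noisy channel model) are integrated into the model as features. This significantly improves the initial ranking quality of the Aspell baseline, and the resulting system also outperforms the spelling correction modules of some commercial systems (e.g. Microsoft Word 2003). In addition, a method is proposed for automatically extracting spelling correction training pairs from search engine query log chains. A model trained on pairs extracted this way performs close to one trained on manually annotated data: they reduce the baseline system's error rate by 32.2% and 32.6%, respectively.

Key words: spelling correction; discriminative model; reranking; log mining; query chain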
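The reranking step can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. It uses only two of the features named above: Levenshtein edit distance and a letter-bigram Dice overlap. Phonetic similarity and the noisy channel score are omitted. Hand-made candidate lists stand in for Aspell output. The linear Ranking SVM is trained by subgradient descent on the pairwise hinge loss rather than with a dedicated SVM solver. The names `features`, `train_ranking_svm` and `rerank` are hypothetical.

```python
# Minimal sketch (not the paper's code): rerank spelling candidates with a
# linear Ranking SVM over two hypothetical features.


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, dynamic programming with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def letter_bigrams(word: str) -> set:
    """Letter bigrams with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def features(misspelling: str, candidate: str) -> list:
    """Hypothetical feature vector: [-edit distance, bigram Dice overlap]."""
    x, y = letter_bigrams(misspelling), letter_bigrams(candidate)
    dice = 2 * len(x & y) / (len(x) + len(y))
    return [-float(edit_distance(misspelling, candidate)), dice]


def train_ranking_svm(samples, c=1.0, epochs=200, lr=0.01):
    """samples: (misspelling, candidate list, correct word) triples.

    Each (correct, competitor) pair yields a difference vector d that the
    model should score at least 1. We minimise
    0.5*||w||^2 + c * sum(max(0, 1 - w.d)) with per-pair subgradient steps.
    """
    diffs = []
    for miss, cands, gold in samples:
        g = features(miss, gold)
        for cand in cands:
            if cand != gold:
                o = features(miss, cand)
                diffs.append([gi - oi for gi, oi in zip(g, o)])
    n = len(diffs)
    w = [0.0] * len(diffs[0])
    for _ in range(epochs):
        for d in diffs:
            violated = sum(wk * dk for wk, dk in zip(w, d)) < 1
            for k in range(len(w)):
                grad = w[k] / n - (c * d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


def rerank(w, misspelling, candidates):
    """Sort candidates by descending w.x; ties keep the incoming order."""
    def score(cand):
        return sum(wk * xk for wk, xk in zip(w, features(misspelling, cand)))
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Made-up candidate lists standing in for Aspell's initial ranking.
    train = [("speling", ["spewing", "spelling"], "spelling"),
             ("occured", ["cured", "occurred"], "occurred")]
    w = train_ranking_svm(train)
    print(rerank(w, "untill", ["unit", "until"]))  # prints ['until', 'unit']
```

Ranking SVM learns only from the difference between the correct candidate's feature vector and each competitor's. The model is therefore fit to order the candidates of one misspelling rather than to classify each candidate in isolation. The final list is read off by sorting on w·x.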

Abstract:

This paper proposes an approach to spelling correction that reranks the output of an existing spelling corrector, Aspell. A discriminative model (Ranking SVM) is employed to improve upon the initial ranking, using additional features as evidence. These features are derived from state-of-the-art spelling correction techniques, including edit distance, letter-based n-grams, phonetic similarity and the noisy channel model. The paper also presents a method for automatically extracting training samples from query log chains. The system substantially outperforms the Aspell baseline, as well as previous models and several off-the-shelf systems (e.g. the spelling corrector in Microsoft Word 2003). Results obtained with training pairs mined from query chains are comparable to those obtained with manually annotated pairs: the two reduce the baseline error rate by 32.2% and 32.6%, respectively.

Key words: spelling correction; discriminative model; reranking; log mining; query chain
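The abstract does not spell out how training pairs are mined from query chains. One plausible heuristic is sketched here; it is an assumption and not the paper's method. A chain is taken to be the time-ordered queries of one session. Two consecutive queries that differ in a single word are kept when the earlier word is missing from a lexicon, the later word is in it, and the two words are close in edit distance. `LogEntry`, `extract_pairs`, the lexicon and the thresholds `max_gap` and `max_dist` are all hypothetical.

```python
# Hypothetical sketch: mine word-level (misspelling, correction) pairs from
# search-engine query chains. The filtering rules are assumptions.
from dataclasses import dataclass


@dataclass
class LogEntry:
    session_id: str
    timestamp: float  # seconds
    query: str


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, dynamic programming with one rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def extract_pairs(log, lexicon, max_gap=60.0, max_dist=2):
    """Return (wrong, right) word pairs from consecutive queries in a session.

    Hypothetical rules: the second query follows within max_gap seconds,
    both queries have the same word count and differ in exactly one word,
    that word is out of the lexicon before and in it after, and the two
    words are at most max_dist edits apart.
    """
    chains = {}
    for entry in log:
        chains.setdefault(entry.session_id, []).append(entry)
    pairs = []
    for chain in chains.values():
        chain.sort(key=lambda e: e.timestamp)
        for prev, nxt in zip(chain, chain[1:]):
            if nxt.timestamp - prev.timestamp > max_gap:
                continue
            src, tgt = prev.query.lower().split(), nxt.query.lower().split()
            if len(src) != len(tgt):
                continue
            diff = [(s, t) for s, t in zip(src, tgt) if s != t]
            if len(diff) != 1:
                continue
            wrong, right = diff[0]
            if (wrong not in lexicon and right in lexicon
                    and edit_distance(wrong, right) <= max_dist):
                pairs.append((wrong, right))
    return pairs
```

Pairs mined this way could feed the same pairwise training as manually annotated pairs. The time window and lexicon test are the knobs that trade recall for noise. The abstract reports that the two training sources give comparable error-rate reductions.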

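Cite this article

ZHANG Yang, HE Pilian, XIANG Wei, LI Mu. A discriminative reranking approach to spelling correction. Journal of Software, 2008, 19(3): 557-564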

History
Received: 2006-05-03
Last revised: 2007-02-05
