A Sequence-Based Automatic Text Classification Algorithm

Journal issue: 2002, Vol. 13, No. 4, pp. 783-789

Fund project: National "Ninth Five-Year Plan" Key Science and Technology Project (96-743-01-05-01)



Abstract:

This paper presents a sequence-based algorithm for automatic text classification. The algorithm exploits semantic relevance at two levels: relevance between sentences (subpatterns), and relevance between the keywords within a sentence that carry specific meanings (concept nodes). This allows each keyword to be weighted dynamically. For subpatterns that contain no keywords, a Markov model is used to estimate their signal amplitude, which yields a feature sequence for the text to be classified. In experiments on Chinese text classification, the algorithm reaches a break-even point (BEP) of about 83%. It is also easy to implement in a practical system.
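The abstract reports its result as a BEP value but does not say how it was computed. As a rough illustration only, the sketch below uses the common definition of the break-even point as the value at which precision equals recall for a category, found by accepting exactly as many top-ranked documents as there are true members. The function name `break_even_point` and the sample scores and labels are hypothetical and are not taken from the paper.

```python
def break_even_point(scores, labels):
    """Return the precision/recall break-even point for one category.

    scores: classifier scores, higher means more likely in the category.
    labels: 1 if the document truly belongs to the category, else 0.
    (Both inputs are illustrative assumptions, not data from the paper.)
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive document is required")
    # Rank documents by descending score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # Accept exactly n_pos documents, so precision and recall share the
    # same denominator and are therefore equal.
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


if __name__ == "__main__":
    # Made-up scores and labels for six documents.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [1, 1, 0, 1, 0, 0]
    print(break_even_point(scores, labels))
```

Cutting the ranking at the number of true members is one simple way to locate the point where the two measures meet; evaluations that interpolate between thresholds may give slightly different figures.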


Cite this article

XIE Chongfeng (解冲锋), LI Xing (李星). A sequence-based automatic text classification algorithm. Journal of Software (软件学报), 2002, 13(4): 783-789


History

Received: 2000-08-01
Last revised: 2000-10-30