基于分类规则树的频繁模式文本分类
DOI:
作者:
作者单位:

作者简介:

通讯作者:

中图分类号:

基金项目:

Supported by the National Natural Science Foundation of China under Grant No.60173027(国家自然科学基金);the Science and Technology Foundation of Education Office of Fujian Province of China under Grant No.JB02069(福建省教育厅科技基金)


Text Categorization Based on Classification Rules Tree by Frequent Patterns
Author:
Affiliation:

Fund Project:

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    基于频繁模式的关联分类是近年来出现的一种分类方法,该方法利用各类别频繁出现的模式构造分类规则,并对新文本进行分类.但现有关联分类方法应用于文本分类时存在两方面不足:一方面,用以构造分类规则的频繁模式仅考虑特征词在文本中出现与否,从而忽视了出现频度;另一方面,当产生的规则数量较多时,为提高分类效率需要进行规则修剪,修剪后的分类准确性明显降低.为此,提出了基于分类规则树的带词频的频繁模式文本分类方法.研究结果表明,词频的引入可以提高关联分类的准确率;而采用分类规则树可使分类时间明显加快又确保不降低分类质量.这两方面的措施弥补了现有关联分类应用于文本分类的不足.与3种典型文本分类方法比较后发现,在低维特征空间中,关联分类的性能优于Bayes,kNN(k nearest neighbor)和SVM(support vectormachines),因此是一种很有应用前景的文本分类方法.

    Abstract:

    Association categorization approach based on frequent patterns has been recently presented, which builds the classification rules according to frequent patterns in various categories and classifies the new text employing these rules. But there are two shortages when the method is applied to classify text data: one is that the method ignores the information about word’s frequency in a text; another is that the rule pruning to improve the classification efficiency will lead to obvious descending of accuracy when mass rules are generated. Therefore, a text categorization algorithm based on frequent patterns with term frequency is presented. This study illuminates that the word frequency is helpful for improving the accuracy of the association categorization and the classification rule tree can improve the efficiency of the association classification. The result of experiments shows the performance of association classification is better than three typical text classification methods Bayes, kNN (k nearest neighbor) and SVM (support vector machines), so it is a promising text classification method.

    参考文献
    相似文献
    引证文献
引用本文

陈晓云,陈袆,王雷,李荣陆,胡运发.基于分类规则树的频繁模式文本分类.软件学报,2006,17(5):1017-1025

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2004-04-15
  • 最后修改日期:2005-05-08
  • 录用日期:
  • 在线发布日期:
  • 出版日期:
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号