Text Categorization Based on Classification Rules Tree by Frequent Patterns

    Abstract:

    Association-based categorization using frequent patterns has recently been proposed: it builds classification rules from the frequent patterns found in each category and uses these rules to classify new texts. However, the method has two shortcomings when applied to text data. First, it ignores how often a word occurs within a text. Second, when a large number of rules are generated, the rule pruning used to improve classification efficiency causes a marked drop in accuracy. To address these problems, this paper presents a text categorization algorithm based on frequent patterns with term frequency. The study shows that word frequency helps improve the accuracy of association-based categorization, and that a classification rule tree improves its efficiency. Experimental results show that the proposed association classification outperforms three typical text classification methods, Bayes, kNN (k-nearest neighbor), and SVM (support vector machines), making it a promising text classification method.
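
    The general idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given here. It assumes brute-force enumeration of word patterns instead of a real frequent-pattern miner, a plain prefix tree standing in for the classification rule tree, and a hypothetical scoring rule (rule confidence multiplied by the smallest term frequency among the pattern's words). All function names, thresholds, and the toy data are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def mine_rules(docs, min_support=2, min_conf=0.6, max_len=2):
    """Return {pattern: (label, confidence)} from (tokens, label) pairs.

    Patterns are sorted word tuples. Enumeration is brute force, which is
    only reasonable for toy data (assumption: stands in for a real miner).
    """
    pattern_labels = defaultdict(Counter)  # pattern -> label -> doc count
    for tokens, label in docs:
        words = sorted(set(tokens))
        for k in range(1, max_len + 1):
            for pattern in combinations(words, k):
                pattern_labels[pattern][label] += 1
    rules = {}
    for pattern, by_label in pattern_labels.items():
        label, support = by_label.most_common(1)[0]
        conf = support / sum(by_label.values())
        if support >= min_support and conf >= min_conf:
            rules[pattern] = (label, conf)
    return rules


def build_rule_tree(rules):
    """Store the rules in a prefix tree keyed by the sorted pattern words."""
    root = {"children": {}, "rule": None}
    for pattern, rule in rules.items():
        node = root
        for word in pattern:
            node = node["children"].setdefault(
                word, {"children": {}, "rule": None})
        node["rule"] = rule
    return root


def classify(tree, tokens):
    """Sum the scores of all matching rules and return the best label.

    Hypothetical weighting: confidence times the minimum term frequency
    of the words in the matched pattern.
    """
    tf = Counter(tokens)
    words = sorted(tf)
    scores = Counter()

    def walk(node, start, min_tf):
        for i in range(start, len(words)):
            child = node["children"].get(words[i])
            if child is None:
                continue
            weight = min(min_tf, tf[words[i]])
            if child["rule"] is not None:
                label, conf = child["rule"]
                scores[label] += conf * weight
            walk(child, i + 1, weight)

    walk(tree, 0, float("inf"))
    return scores.most_common(1)[0][0] if scores else None


# Toy training set (illustrative only).
train = [
    (["stock", "market", "price"], "finance"),
    (["stock", "price", "bank"], "finance"),
    (["match", "team", "goal"], "sport"),
    (["team", "goal", "price"], "sport"),
]
rules = mine_rules(train)
tree = build_rule_tree(rules)
```

    In this toy setup, `classify(tree, ["price", "goal"])` returns `"sport"`, while `classify(tree, ["price", "price", "goal"])` returns `"finance"`: repeating a word changes the decision, which is the kind of information a purely presence-based rule match would discard. Walking the tree visits only the rule prefixes that actually occur in the document, so the full rule set never has to be scanned.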

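Citation: Chen Xiaoyun, Chen Yi, Wang Lei, Li Ronglu, Hu Yunfa. Text categorization based on classification rules tree by frequent patterns. Journal of Software, 2006, 17(5): 1017-1025.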

History
  • Received: April 15, 2004
  • Revised: May 8, 2005