Improved Stumps Combined by Boosting for Text Categorization
DOI:
Author:
Affiliation:

Clc Number:

Fund Project:

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    Stumps, classification trees with only one split at the root node, have been shown by Schapire and Singer to be an effective method for text categorization when embedded in a boosting algorithm as its base classifiers. In their experiments, the splitting point (the partition) of each stump is decided by whether a certain term appears or not in a text document, which is too weak to obtain satisfied accuracy even after they are combined by boosting, and therefore the iteration times needed by boosting is sharply increased as an indicator of low efficiency. To improve these base classifiers, an idea is proposed in this paper to decide the splitting point of each stump by all the terms of a text document. Specifically, it employs the numerical relationship between the similarities of the VSM-vector of text document and the representational VSM-vector of each class as the partition criteria of the base classifiers. Meanwhile, to further facilitate its convergence, the boosting weights assigned to sample documents are introduced to the computation of representational VSM-vectors for possible classes dynamically. Experimental results show that the algorithm is both more efficient for training and more effective than its predecessor for fulfilling text categorization tasks. This trend seems more conspicuous along with the incensement of problem scale.

    Reference
    Related
    Cited by
Get Citation

刁力力,胡可云,陆玉昌,石纯一.用Boosting方法组合增强Stumps进行文本分类.软件学报,2002,13(8):1361-1367

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:October 15,2001
  • Revised:February 06,2002
  • Adopted:
  • Online:
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063