A Sequence-Based Automatic Text Classification Algorithm
    Abstract:

    This paper presents a sequence-based automatic text classification algorithm. It exploits semantic relevance on two levels: relevance between sentences (subpatterns) and relevance between keywords that carry a specific meaning (concept nodes) within one sentence. In this way, each keyword can be assigned a dynamic weight. For subpatterns that contain no keywords, a Markov model is used to estimate the amplitude of their signals, and from these the feature sequence of the text to be classified is constructed. In an experiment on classifying Chinese documents, its BEP value is about 83%. Furthermore, it is easy to implement in a practical system.
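The abstract reports a BEP value without defining it. In text categorization, BEP usually denotes the precision/recall break-even point, that is, the value at which precision equals recall. The sketch below assumes that reading. The function name `break_even_point` and the sample scores are hypothetical and are not taken from the paper. It relies on the fact that when exactly as many documents are accepted as there are true positives, precision and recall have the same denominator and therefore coincide.

```python
from typing import Sequence


def break_even_point(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Return the precision/recall break-even point for one category.

    Assumption: BEP is taken as the precision (equal to recall) obtained when
    the top-k ranked documents are accepted, with k = number of true positives.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("at least one positive document is required")
    # Rank documents from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # True positives among the top n_pos documents.
    tp = sum(1 for _, y in ranked[:n_pos] if y)
    # precision = tp / n_pos and recall = tp / n_pos, so they are equal here.
    return tp / n_pos


if __name__ == "__main__":
    # Hypothetical classifier scores and gold labels for six documents.
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    labels = [True, False, True, True, False, False]
    print(f"BEP = {break_even_point(scores, labels):.3f}")
```

Some evaluations instead interpolate between the nearest precision and recall values along the ranking; the fixed cutoff used here is only one common convention.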

Citation:

Xie Chongfeng, Li Xing. A Sequence-Based Automatic Text Classification Algorithm. Journal of Software, 2002, 13(4): 783-789

History
  • Received: August 1, 2000
  • Revised: October 30, 2000