Latent Concept Extraction and Text Clustering Based on Information Theory
    Abstract:

    To capture the fuzzy relations among words, latent concepts, texts, and topics, an information-theoretic approach to latent concept extraction and text clustering is proposed. A latent concept variable and a topic variable are introduced to model these relations, and a global objective function is defined within the framework of rate-distortion theory. An annealing-like algorithm is designed to extract a hierarchical tree of latent concepts and, at the same time, to group the texts under the corresponding concept hierarchy. The number of concepts and the final text clustering are then determined by a concept selection method based on the minimum description length (MDL) criterion. The approach is a soft co-clustering method; in experiments it outperforms methods based on the word space as well as an existing hard text co-clustering method based on latent concepts.
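
The abstract does not give the objective function or the algorithm itself, so the following is only a minimal sketch of the general idea: soft clustering driven by a rate-distortion style trade-off and solved by annealing. It is not the authors' method. It assumes a KL-divergence distortion between a document's word distribution and a cluster centroid, equally likely documents, a fixed number of clusters, and a fixed schedule of inverse temperatures. The names `soft_cluster`, `kl`, `betas`, and `EPS` are hypothetical, and the concept hierarchy and MDL-based selection described above are not modeled.

```python
import math

EPS = 1e-12  # numerical floor; an implementation detail, not from the paper


def kl(p, q):
    """KL(p || q) in nats, with q floored at EPS to avoid log(0)."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def soft_cluster(docs, n_clusters, betas=(2.0, 5.0, 20.0), iters=30):
    """Soft-assign documents (rows of p(word | doc)) to clusters.

    Returns (q, centroids): q[d][t] approximates p(cluster t | doc d),
    and centroids[t] approximates p(word | cluster t).
    """
    n_docs, n_words = len(docs), len(docs[0])
    uniform = 1.0 / n_words
    # Seed each centroid from an evenly spaced document, smoothed toward uniform.
    seeds = [docs[(t * n_docs) // n_clusters] for t in range(n_clusters)]
    centroids = [[0.5 * p + 0.5 * uniform for p in seed] for seed in seeds]
    prior = [1.0 / n_clusters] * n_clusters
    q = []
    for beta in betas:  # anneal: raise beta (lower the "temperature") in stages
        for _ in range(iters):
            # Assignment step: q(t | d) is proportional to
            # prior(t) * exp(-beta * KL(doc || centroid_t)), computed in log space.
            q = []
            for doc in docs:
                logits = [
                    math.log(max(prior[t], EPS)) - beta * kl(doc, centroids[t])
                    for t in range(n_clusters)
                ]
                top = max(logits)
                weights = [math.exp(v - top) for v in logits]
                total = sum(weights)
                q.append([v / total for v in weights])
            # Update step: cluster priors and centroids as weighted averages
            # (documents are assumed equally likely).
            prior = [sum(row[t] for row in q) / n_docs for t in range(n_clusters)]
            for t in range(n_clusters):
                if prior[t] > EPS:  # leave an empty cluster's centroid unchanged
                    centroids[t] = [
                        sum(q[d][t] * docs[d][j] for d in range(n_docs))
                        / (n_docs * prior[t])
                        for j in range(n_words)
                    ]
    return q, centroids
```

Each row of the returned `q` is a probability distribution over clusters, which is what makes the assignment soft: a document can belong partly to several clusters, and the assignments sharpen as `beta` grows.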

Get Citation

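Li Xiaoguang, Yu Ge, Wang Daling, Bao Yubin. Latent concept extraction and text clustering based on information theory. Journal of Software, 2008, 19(9): 2276-2284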

History
  • Received: December 28, 2006
  • Revised: August 3, 2007