Latent Concept Extraction and Text Clustering Based on Information Theory

Volume 19, Issue 9, 2008, pp. 2276-2284

Authors: LI Xiao-Guang, YU Ge, WANG Da-Ling, BAO Yu-Bin

Abstract:

To capture the fuzzy relations among words, latent concepts, texts, and topics, this paper proposes an information-theoretic approach to latent concept extraction and text clustering. A latent concept variable and a topic variable are introduced to model these relations, and a global objective function is defined within the framework of rate-distortion theory. An annealing-like algorithm is designed to extract a hierarchical tree of latent concepts and, at the same time, to group the texts under the corresponding concept hierarchy. The number of concepts and the final text clustering are then determined by a concept selection method based on the minimum description length (MDL) criterion. The result is a soft co-clustering method; in experiments it outperforms methods based on the word space as well as an existing hard text co-clustering method based on latent concepts.
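The abstract does not give the objective function or the update rules, so the following is only a minimal sketch of the general idea of annealed soft clustering under a rate-distortion style trade-off. It uses the well-known iterative information bottleneck updates as a stand-in and is not the authors' algorithm. The function names (kl, normalize, soft_cluster), the uniform document prior, the beta schedule, and the toy data are all assumptions made for illustration. The sketch fixes the number of clusters in advance and does not build the concept hierarchy or perform MDL-based concept selection.

```python
import math
import random

EPS = 1e-12  # numerical guard, an illustrative choice


def kl(p, q):
    """KL(p || q) in nats; EPS guards against log(0)."""
    return sum(pi * math.log(pi / max(qi, EPS)) for pi, qi in zip(p, q) if pi > 0)


def normalize(v):
    """Scale a non-negative vector so that it sums to 1."""
    s = sum(v)
    return [x / s for x in v]


def soft_cluster(docs, n_clusters, betas=(2.0, 5.0, 25.0), n_iter=50, seed=0):
    """Annealed soft clustering of documents (hypothetical helper).

    docs: list of word distributions p(w|d), each summing to 1.
    betas: increasing trade-off values; larger beta gives harder assignments.
    Returns the soft assignments p(t|d), one row per document.
    """
    rng = random.Random(seed)
    n_docs, n_words = len(docs), len(docs[0])
    p_d = 1.0 / n_docs  # assumption: uniform prior over documents
    # Random soft initialisation of p(t|d).
    assign = [normalize([rng.random() + 0.5 for _ in range(n_clusters)])
              for _ in range(n_docs)]
    for beta in betas:  # annealing loop: raise beta step by step
        for _ in range(n_iter):
            # Cluster priors p(t) = sum_d p(d) p(t|d).
            p_t = [sum(p_d * row[t] for row in assign) for t in range(n_clusters)]
            # Cluster centroids p(w|t) = sum_d p(d) p(t|d) p(w|d) / p(t).
            centroids = [
                [sum(p_d * assign[d][t] * docs[d][w] for d in range(n_docs))
                 / max(p_t[t], EPS) for w in range(n_words)]
                for t in range(n_clusters)
            ]
            # Reassign: p(t|d) proportional to p(t) * exp(-beta * KL(p(w|d) || p(w|t))).
            new_assign = []
            for d in range(n_docs):
                logits = [math.log(max(p_t[t], EPS)) - beta * kl(docs[d], centroids[t])
                          for t in range(n_clusters)]
                m = max(logits)  # subtract the max for numerical stability
                new_assign.append(normalize([math.exp(l - m) for l in logits]))
            assign = new_assign
    return assign


if __name__ == "__main__":
    # Toy corpus (made up): two documents about the first two words,
    # two documents about the last two words.
    toy_docs = [
        [0.70, 0.20, 0.05, 0.05],
        [0.60, 0.30, 0.05, 0.05],
        [0.05, 0.05, 0.20, 0.70],
        [0.05, 0.05, 0.30, 0.60],
    ]
    for row in soft_cluster(toy_docs, n_clusters=2):
        print([round(x, 3) for x in row])
```

At small beta the assignments stay soft and every document shares its mass across clusters; as beta grows the assignments sharpen, which is the sense in which such a procedure resembles annealing. On the toy corpus the two document pairs should end up in different clusters.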

Key words: latent concept; topic; text clustering; information theory

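Citation:

LI Xiao-Guang, YU Ge, WANG Da-Ling, BAO Yu-Bin. Latent concept extraction and text clustering based on information theory. Journal of Software (软件学报), 2008, 19(9): 2276-2284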


History

Received: December 28, 2006
Revised: August 3, 2007

Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn