Abstract: To capture the fuzzy relations among words, latent concepts, texts, and topics, an information-theoretic approach to latent concept extraction and text clustering is proposed. A latent concept variable and a topic variable are introduced to model these relations, and a global objective function is defined within the framework of rate-distortion theory. An annealing-like algorithm is designed to extract the hierarchical tree of latent concepts and, at the same time, to group the texts under the corresponding concept hierarchy. Furthermore, the number of concepts and the final text clustering are determined by a concept selection method based on the minimum description length (MDL) criterion. The proposed approach is a soft co-clustering method; in experiments it outperforms methods based on the word space as well as the current hard text co-clustering method based on latent concepts.
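
To make the idea of annealed, rate-distortion style soft clustering concrete, the following is a minimal sketch and not the algorithm proposed here. It assumes a KL-divergence distortion between each text's word distribution and a cluster centroid, a uniform prior over texts, and a fixed schedule of inverse temperatures; the function name `soft_cluster`, the `betas` schedule, and all parameters are hypothetical. It does not build a concept hierarchy and does not apply MDL-based concept selection.

```python
import numpy as np

def soft_cluster(p_w_given_d, n_clusters, betas=(2.0, 8.0, 32.0), n_iter=50, seed=0):
    """Return soft assignments q(c|d) for texts given word distributions p(w|d).

    Illustrative assumption: the distortion is KL(p(w|d) || p(w|c)) and the
    inverse temperature beta is raised step by step (annealing-like).
    """
    rng = np.random.default_rng(seed)
    n_docs, _ = p_w_given_d.shape
    eps = 1e-12  # guards against log(0) and division by zero

    # Start from a random soft assignment; each row sums to 1.
    q = rng.random((n_docs, n_clusters))
    q /= q.sum(axis=1, keepdims=True)

    for beta in betas:
        for _ in range(n_iter):
            # Cluster marginal p(c) under a uniform prior over texts.
            p_c = q.mean(axis=0)
            # Centroids p(w|c): assignment-weighted average of p(w|d).
            centroids = (q.T @ p_w_given_d) / (q.sum(axis=0)[:, None] + eps)
            # KL distortion between every text and every centroid.
            log_ratio = (np.log(p_w_given_d[:, None, :] + eps)
                         - np.log(centroids[None, :, :] + eps))
            kl = (p_w_given_d[:, None, :] * log_ratio).sum(axis=2)
            # Update: q(c|d) proportional to p(c) * exp(-beta * distortion).
            logits = np.log(p_c + eps)[None, :] - beta * kl
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            q = np.exp(logits)
            q /= q.sum(axis=1, keepdims=True)
    return q

if __name__ == "__main__":
    # Four toy texts over a four-word vocabulary (made-up numbers).
    docs = np.array([
        [0.6, 0.3, 0.05, 0.05],
        [0.5, 0.4, 0.05, 0.05],
        [0.05, 0.05, 0.4, 0.5],
        [0.05, 0.05, 0.3, 0.6],
    ])
    print(soft_cluster(docs, n_clusters=2))
```

Each row of the returned matrix is a probability distribution over clusters, which is what makes the assignment soft: a text can belong partly to several clusters. Larger values of `beta` push the rows toward hard, one-cluster assignments.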