Domain Topic Hierarchy Model for Topic Aspect Sharing

    Abstract:

    Hierarchical topic models are an important tool for organizing topics into a hierarchy. Most existing hierarchical topic models introduce the nested Chinese restaurant process (nCRP) construction into the topic model to provide a tree-structured prior distribution over document topics, but they cannot produce a topic hierarchy with clear domain meaning, referred to here as a domain topic hierarchy. Moreover, domain topics are related not only hierarchically: sub-topics under different parent topics may also share topic aspects. No existing model in research on topic relationships yields such a domain topic hierarchy. To mine the hierarchical and correlated relationships among domain topics from domain texts automatically and effectively, this study proposes the following improvements. First, it extends the nCRP construction with a topic sharing mechanism and proposes the nCRP+ hierarchical construction method, which provides a tree-structured prior distribution with hierarchical topic aspect sharing for the topics generated by topic models. Second, it develops the reallocated hierarchical Dirichlet process (rHDP) model based on nCRP+ and the HDP model. Third, using the domain taxonomy, word semantics, and the domain representation of topic words, it defines three kinds of domain knowledge: a domain membership degree based on a voting mechanism, the semantic relevance between words and domain topics, and the contribution degree of hierarchical topic words. Finally, this domain knowledge is used to improve the allocation of domain topics and topic words in the rHDP model, yielding the rHDP with domain knowledge (rHDP_DK) model, which improves the sampling process. Experimental results show that hierarchical topic models based on nCRP+ outperform those based on nCRP (hLDA and nHDP) and the neural topic model TSNTM on the evaluation metrics. The topic hierarchy built by the rHDP_DK model exhibits a clear domain topic hierarchy and explicit domain differences among related sub-topics. Furthermore, the model provides a general framework for automatically mining domain topic hierarchies.
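
    For orientation, the tree-structured prior that the abstract builds on can be sketched as below. This is a minimal sketch of the standard nested Chinese restaurant process only, not the nCRP+ method or the rHDP sampler proposed in the article, whose details the abstract does not give. The function names `crp_choice` and `sample_ncrp_paths` and the concentration parameter `gamma` are illustrative assumptions.

```python
import random
from collections import defaultdict


def crp_choice(counts, gamma, rng):
    """Chinese restaurant process step.

    Returns the index of an existing child with probability proportional
    to its customer count, or len(counts) (a new child) with probability
    proportional to gamma.
    """
    total = sum(counts) + gamma
    r = rng.random() * total
    acc = 0.0
    for i, c in enumerate(counts):
        acc += c
        if r < acc:
            return i
    return len(counts)


def sample_ncrp_paths(num_docs, depth, gamma, seed=0):
    """Draw one root-to-leaf path per document from a standard nCRP prior.

    The root is level 1, so each path records depth - 1 child choices.
    A path is a tuple of child indices, e.g. (0, 2).
    """
    rng = random.Random(seed)
    # Maps a node (identified by its path tuple) to per-child customer counts.
    children = defaultdict(list)
    paths = []
    for _ in range(num_docs):
        node = ()
        for _ in range(depth - 1):
            counts = children[node]
            k = crp_choice(counts, gamma, rng)
            if k == len(counts):
                counts.append(0)  # open a new branch under this node
            counts[k] += 1
            node = node + (k,)
        paths.append(node)
    return paths


paths = sample_ncrp_paths(num_docs=20, depth=3, gamma=1.0)
```

    In this standard construction each sub-topic has exactly one parent, which is the restriction that the topic sharing mechanism described in the abstract is meant to relax.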

Get Citation

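Wan Changxuan, Zhang Yitao, Liu Dexi, Liu Xiping, Liao Guoqiong, Wan Qizhi. Domain Topic Hierarchy Model for Topic Aspect Sharing. Journal of Software, 2024, 35(4): 1790-1818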

History
  • Received: March 09, 2022
  • Revised: June 28, 2022
  • Online: July 28, 2023
  • Published: April 06, 2024