Abstract: Hierarchical topic models are an important tool for organizing topics into a hierarchy. Most existing hierarchical topic models use the nested Chinese restaurant process (nCRP) construction to provide a tree-structured prior distribution over document topics, but they cannot produce a topic hierarchy with clear domain meaning, referred to here as a domain topic hierarchy. Moreover, domain topics are related not only hierarchically: sub-topic aspects may also be shared under different parent topics. Existing research on topic relationships offers no model that yields such a domain topic hierarchy. To mine the hierarchical and correlated relationships of domain topics from domain texts automatically and effectively, this study proposes the following improvements. First, it extends the nCRP construction with a topic sharing mechanism, yielding the nCRP+ hierarchical construction method, which provides topics generated by topic models with a tree-structured prior distribution that supports hierarchical sharing of topic aspects. Second, it combines nCRP+ with the hierarchical Dirichlet process (HDP) to develop the reallocated hierarchical Dirichlet process (rHDP) model. Third, using the domain taxonomy, word semantics, and the domain representation of topic words, it defines three kinds of domain knowledge: the domain membership degree based on a voting mechanism, the semantic relevance between words and domain topics, and the contribution degree of hierarchical topic words. Finally, this domain knowledge is used to improve how the rHDP model allocates domain topics and topic words, giving the rHDP with domain knowledge (rHDP_DK) model, which has an improved sampling process. Experimental results show that hierarchical topic models based on nCRP+ outperform those based on nCRP (hLDA and nHDP) and the neural topic model TSNTM on the evaluation metrics. The topic hierarchy built by the rHDP_DK model has a clear domain topic hierarchy and explicit domain differences among related sub-topics. The model can also serve as a general framework for automatically mining domain topic hierarchies.
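For readers unfamiliar with the baseline that nCRP+ extends, the following is a minimal sketch of how the standard nCRP draws a root-to-leaf topic path for one document. It illustrates only the textbook nCRP prior, not the nCRP+ sharing mechanism or the rHDP sampler proposed here. The function name `sample_ncrp_path`, the tuple-based node encoding, and the parameters `depth` and `gamma` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_ncrp_path(child_counts, depth, gamma, rng):
    """Draw one root-to-leaf path from a standard nCRP prior.

    child_counts maps a node (a tuple of child indices, root = ()) to a
    list holding how many earlier documents chose each of its children.
    The counts are updated in place so later draws see this document.
    """
    node = ()
    path = [node]
    for _ in range(depth - 1):
        counts = child_counts[node]
        # Existing child i has weight counts[i]; a new child has weight gamma.
        r = rng.random() * (sum(counts) + gamma)
        chosen = len(counts)  # default: open a new child
        acc = 0.0
        for i, c in enumerate(counts):
            acc += c
            if r < acc:
                chosen = i
                break
        if chosen == len(counts):
            counts.append(0)
        counts[chosen] += 1
        node = node + (chosen,)
        path.append(node)
    return path


# Hypothetical usage: 20 documents, a tree of depth 3, gamma = 1.0.
tree = defaultdict(list)
rng = random.Random(0)
paths = [sample_ncrp_path(tree, 3, 1.0, rng) for _ in range(20)]
```

In this sketch every document follows exactly one path, so a sub-topic belongs to a single parent. That single-parent restriction is what the topic sharing mechanism described above is meant to relax.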