Interactive Topic Modeling Based on Hierarchical Dirichlet Process
Author:
Affiliation:

Clc Number:

Fund Project:

National Natural Science Foundation of China (61472354); National High-Tech R&D Program of China (863) (2012AA12A404)

  • Article
  • |
  • Figures
  • |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • |
  • Materials
  • |
  • Comments
    Abstract:

    With the rapid development of information technology, large amounts of text data have been produced, collected and stored. Topic modeling is one of the important tools in text analysis, and is widely used for large text collection analysis. However, the topic model usually cannot be combined with users' domain knowledge intuitively and effectively during the topic modeling process. In order to solve this problem, this paper proposes an interactive visual analysis system to help users refine generated topic models. First, the hierarchical Dirichlet process is modified to support the word constraints. Then, the generated topic models is displayed via a matrix view to visually reveal the underlying relationship between words and topics, and semantic-preserving word clouds is used to help users find word constraints effectively. User can interactively refine the topic models by adding word constraints. Finally, the applicability of this new system is demonstrated via case studies and user studies.

    Reference
    Related
    Cited by
Get Citation

严宇宇,陶煜波,林海.基于层次狄利克雷过程的交互式主题建模.软件学报,2016,27(5):1114-1126

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:July 24,2015
  • Revised:November 09,2015
  • Adopted:
  • Online: May 06,2016
  • Published:
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063