Interactive Topic Modeling Based on Hierarchical Dirichlet Process
Author: Yan Yuyu, Tao Yubo, Lin Hai
Affiliation:

Fund Project:

National Natural Science Foundation of China (61472354); National High-Tech R&D Program of China (863) (2012AA12A404)

    Abstract:

    With the rapid development of information technology, large amounts of text data are being produced, collected, and stored. Topic modeling is an important tool in text analysis and is widely used to analyze large text collections. However, during the modeling process, users' domain knowledge usually cannot be incorporated into the topic model intuitively and effectively. To address this problem, this paper proposes an interactive visual analysis system that helps users refine generated topic models. First, the hierarchical Dirichlet process is modified to support word constraints. Then, the generated topic models are displayed in a matrix view that visually reveals the underlying relationships between words and topics, and semantic-preserving word clouds are used to help users identify word constraints effectively. Users can interactively refine the topic models by adding word constraints. Finally, the applicability of the system is demonstrated through case studies and user studies.
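    The abstract builds on the hierarchical Dirichlet process (HDP) of Teh et al. (2006) without spelling out its generative process. The following is a minimal Python/NumPy sketch of a truncated stick-breaking approximation of the standard HDP topic model, offered only as an illustration. The truncation level K, the hyperparameters gamma, alpha and eta, and all function names are assumptions chosen here; the sketch does not reproduce the paper's word-constrained HDP or its inference procedure, and the hypothetical word_topic_matrix helper is only a crude stand-in for the matrix view.

```python
import numpy as np

# Hypothetical illustration of a plain (unconstrained) truncated HDP topic model;
# it is NOT the paper's word-constrained HDP.


def stick_breaking(gamma, K, rng):
    """Truncated GEM(gamma) weights: beta_k = v_k * prod_{l<k} (1 - v_l)."""
    v = rng.beta(1.0, gamma, size=K)
    v[-1] = 1.0  # truncation: the last stick absorbs the remaining mass
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def sample_hdp_corpus(n_docs, doc_len, vocab_size, K=20,
                      gamma=1.0, alpha=1.0, eta=0.1, seed=0):
    """Draw a toy corpus from a K-truncated HDP topic model."""
    rng = np.random.default_rng(seed)
    beta = stick_breaking(gamma, K, rng)                   # corpus-level topic weights
    phi = rng.dirichlet(np.full(vocab_size, eta), size=K)  # K x V topic-word distributions
    docs = []
    for _ in range(n_docs):
        # With a base measure on K atoms, DP(alpha, beta) reduces to Dirichlet(alpha * beta).
        pi = rng.dirichlet(alpha * beta)                   # document-level topic weights
        z = rng.choice(K, size=doc_len, p=pi)              # topic assignment per token
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(words)
    return beta, phi, docs


def word_topic_matrix(phi, n_top=5):
    """Rows: union of each topic's top words; columns: topics (hypothetical matrix view)."""
    top = np.argsort(-phi, axis=1)[:, :n_top]              # top word ids per topic
    rows = np.unique(top)                                  # sorted distinct word ids
    return rows, phi[:, rows].T                            # len(rows) x K matrix
```

    For example, sample_hdp_corpus(n_docs=3, doc_len=10, vocab_size=30, K=8) returns corpus-level weights that sum to one, an 8 x 30 topic-word matrix, and three documents of ten word ids each. Truncating the stick-breaking process keeps the sketch finite and simple. A real HDP sampler (for example, one based on the Chinese restaurant franchise) would instead let the number of topics grow with the data.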

Citation:

Yan YY, Tao YB, Lin H. Interactive topic modeling based on hierarchical Dirichlet process. Journal of Software, 2016,27(5):1114-1126.

History
  • Received: July 24, 2015
  • Revised: November 09, 2015
  • Online: May 06, 2016
Copyright: Institute of Software, Chinese Academy of Sciences