Mining Unstructured Economic Indicators Based on PSP_HDP Topic Model

Fund Project:

National Natural Science Foundation of China (61972184, 61562032, 61662027, 61762042); Natural Science Foundation of Jiangxi Province of China (20152ACB20003)

    Abstract:

    As economic activity data grow richer, a large number of financial texts have appeared on the Internet, and these texts contain the factors that influence economic development. Mining these economic factors from the texts is key to conducting economic analysis based on unstructured data. To address the limitations of manually selecting economic indicators and the inaccuracy of modelling economic indicators in unstructured texts, the CRF (Chinese restaurant franchise) allocation processes in the hierarchical Dirichlet process (HDP) topic model are extended to a more efficient pattern. To describe the dish style of a restaurant, existing economic taxonomies are used to determine the domain membership of a document. The semantic similarity between words is exploited to define the semantic relevance between words and topics, which reflects the similarity of customers' requirements for dishes. For each word, its representativeness of each topic is used to evaluate its contribution to that topic, which corresponds to the loyalty of a customer to each dish. By combining documents' domain properties, word semantics, and words' presence in topics with the HDP topic model, a novel model, the PSP_HDP topic model, is proposed. Because the PSP_HDP topic model improves the document-topic and topic-word allocation processes, it increases both the accuracy of identifying economic topics and the distinctiveness of those topics, which leads to more effective mining of economic topics and economic factors. Experimental results show that the proposed model not only performs better in terms of topic diversity, topic perplexity, and topic complexity, but is also effective in finding more cohesive unstructured economic indicators and economic factors.
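The restaurant metaphor in the abstract can be made concrete with a minimal sketch of the standard Chinese restaurant process seating rule, on which the CRF is built. This is not the PSP_HDP allocation itself, whose formulas are not given here. The function name `crp_seating_probs` and the optional `weights` argument are hypothetical: the weights only illustrate how a per-table relevance score (for example, a semantic similarity) could bias the standard rule, and they are an assumption rather than the authors' definition.

```python
from typing import List, Optional


def crp_seating_probs(
    table_counts: List[int],
    alpha: float,
    weights: Optional[List[float]] = None,
) -> List[float]:
    """Seating probabilities for the next customer.

    Returns one probability per existing table, followed by the
    probability of opening a new table (last element).

    Standard CRP: an existing table k is chosen in proportion to its
    occupancy n_k, and a new table in proportion to alpha.

    `weights` is a hypothetical extension (an assumption, not taken from
    the paper): each existing table's score n_k is multiplied by w_k
    before normalisation.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if weights is None:
        weights = [1.0] * len(table_counts)
    if len(weights) != len(table_counts):
        raise ValueError("weights must match table_counts in length")

    # Unnormalised scores for existing tables, then the new-table score.
    scores = [n * w for n, w in zip(table_counts, weights)]
    scores.append(alpha)

    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    # Two occupied tables with 3 and 1 customers, concentration alpha = 1.
    print(crp_seating_probs([3, 1], alpha=1.0))
    # Same tables, with the second table up-weighted by a relevance score.
    print(crp_seating_probs([3, 1], alpha=1.0, weights=[1.0, 3.0]))
```

With unit weights the function reduces to the plain rule n_k / (n + alpha) for existing tables and alpha / (n + alpha) for a new one. In the franchise setting the same rule is applied at two levels: customers (words) choose tables within a restaurant (document), and tables choose dishes (topics) shared across restaurants.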

Get Citation

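Zhang Yitao, Wan Changxuan, Liu Xiping, Jiang Tengjiao, Liu Dexi, Liao Guoqiong. Mining Unstructured Economic Indicators Based on PSP_HDP Topic Model. Journal of Software, 2020, 31(3): 845-865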

History
  • Received: July 5, 2019
  • Revised: September 10, 2019
  • Online: January 10, 2020
  • Published: March 6, 2020
Copyright: Institute of Software, Chinese Academy of Sciences