Abstract: As economic activity data grow richer, a large number of financial texts have emerged on the Internet, and these texts contain the factors that influence economic development. Mining these economic factors from text is the key to conducting economic analysis on unstructured data. To address the limitations of manually selecting economic indicators and the inaccuracy of modelling economic indicators in unstructured texts, the CRF (Chinese restaurant franchise) allocation processes in the HDP (hierarchical Dirichlet process) topic model are extended to a more efficient pattern. To describe the dish style of a restaurant, existing economic taxonomies are used to determine the domain membership of a document. The semantic similarity between words is exploited to define the semantic relevance between words and topics, which reflects the similarity of customers' requirements for dishes. For each word, its representativeness of each topic is used to evaluate its contribution to that topic, which captures the loyalty of a customer to each dish. By combining documents' domain properties, word semantics and words' presence in topics with the HDP topic model, a novel model, the PSP_HDP topic model, is proposed. Because the PSP_HDP topic model improves the document-topic and topic-word allocation processes, it increases both the accuracy of identifying economic topics and the distinctiveness of those topics, which leads to more effective mining of economic topics and economic factors. Experimental results show that the proposed model not only achieves better performance in terms of topic diversity, topic perplexity and topic complexity, but is also effective in finding more cohesive unstructured economic indicators and economic factors.
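For readers unfamiliar with the restaurant metaphor, the sketch below illustrates the seating step of a standard single-restaurant Chinese restaurant process, the building block that the Chinese restaurant franchise applies across documents. It is not the PSP_HDP allocation proposed here: the domain, semantic and presence terms are omitted, and the function names `crp_seat_probabilities` and `crp_sample_seating` and the concentration parameter `alpha` are illustrative assumptions rather than names taken from the paper.

```python
import random


def crp_seat_probabilities(table_counts, alpha):
    """Seating probabilities for the next customer in a standard CRP.

    table_counts[i] is the number of customers already at table i.
    The last entry of the result is the probability of opening a new table.
    alpha (> 0) is the concentration parameter (hypothetical name).
    """
    total = sum(table_counts) + alpha
    probs = [n / total for n in table_counts]  # existing tables: proportional to occupancy
    probs.append(alpha / total)                # new table: proportional to alpha
    return probs


def crp_sample_seating(num_customers, alpha, seed=0):
    """Seat customers one at a time and return the final table occupancies."""
    rng = random.Random(seed)
    table_counts = []
    for _ in range(num_customers):
        probs = crp_seat_probabilities(table_counts, alpha)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(table_counts):
            table_counts.append(1)       # customer opens a new table
        else:
            table_counts[choice] += 1    # customer joins an existing table
    return table_counts
```

In this plain form a customer's choice depends only on table occupancy and `alpha`. The extension described in the abstract would, under this reading, reweight such choices using a document's domain membership, word-topic semantic relevance and word representativeness.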