Research on Aggregation Model for Chinese Short Texts

Fund Project:

National Natural Science Foundation of China (61300018); China Unionpay-UESTC-Project of Financial Big Data

    Abstract:

    The aggregation task for Chinese short texts is to associate pairs of similar short texts, drawn from two data sets, that refer to the same entity. This problem is of theoretical and practical interest for integrating data resources across different fields. This article presents an effective aggregation model for Chinese short texts. Through two key steps, fast matching and refined matching, the model sharply reduces the number of candidate pairs to be matched while preserving matching accuracy. In addition, to address the deficiencies of traditional similarity algorithms on short texts, an improved similarity algorithm called generalized Jaro-Winkler is proposed. Aggregation experiments on different merchant data sets suggest that the new algorithm outperforms the traditional algorithms in both matching accuracy and stability.
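
    The two-step idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how fast matching works or how generalized Jaro-Winkler is defined, so the code below makes two assumptions of its own: the fast step is approximated by keeping only pairs that share at least one character bigram, and the refined step uses the classic Jaro-Winkler similarity rather than the authors' generalized variant. The function names, the prefix weight `p=0.1`, and the `threshold=0.85` cutoff are all hypothetical.

```python
from collections import defaultdict


def jaro(s: str, t: str) -> float:
    """Classic Jaro similarity in [0, 1] (not the paper's generalized form)."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters count as matching only within this sliding window.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order are transpositions.
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Classic Jaro-Winkler: boost the Jaro score for a shared prefix."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * p * (1 - base)


def bigrams(text: str) -> set:
    """Character bigrams; very short strings fall back to the string itself."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def aggregate(left, right, threshold=0.85):
    """Pair each text in `left` with its best match in `right`, if good enough."""
    # Fast matching (assumed here): index `right` by bigram so that only texts
    # sharing at least one bigram become candidate pairs.
    index = defaultdict(set)
    for j, name in enumerate(right):
        for gram in bigrams(name):
            index[gram].add(j)
    pairs = []
    for name in left:
        candidates = set()
        for gram in bigrams(name):
            candidates |= index.get(gram, set())
        # Refined matching: score the surviving candidates and keep the best.
        best = max(sorted(candidates),
                   key=lambda j: jaro_winkler(name, right[j]),
                   default=None)
        if best is not None and jaro_winkler(name, right[best]) >= threshold:
            pairs.append((name, right[best]))
    return pairs


if __name__ == "__main__":
    # Made-up merchant names, used purely for illustration.
    left = ["星巴克咖啡春熙路店"]
    right = ["星巴克咖啡(春熙路店)", "肯德基春熙路餐厅", "海底捞火锅天府店"]
    print(aggregate(left, right))
```

    In this sketch the bigram index discards the third made-up name before any similarity score is computed, which is the kind of candidate reduction the abstract attributes to fast matching. The threshold would need tuning on real data.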

Get Citation

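Liu Zhen, Chen Jing, Zheng Jianbin, Hua Jinzhi, Xiao Linfeng. Research on Aggregation Model for Chinese Short Texts. Journal of Software, 2017, 28(10): 2674-2692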

History
  • Received: March 3, 2016
  • Revised: September 7, 2016
  • Online: September 30, 2017