Research on Aggregation Model for Chinese Short Texts

doi:10.13328/j.cnki.jos.005147

Journal of Software, Volume 28, Issue 10, 2017, pp. 2674-2692

Author:
LIU Zhen
Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China; Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731, China
CHEN Jing
Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
ZHENG Jian-Bin
Institute of Electronic Payment, China Unionpay Limited Liability Company, Shanghai 201201, China
HUA Jin-Zhi
Institute of Electronic Payment, China Unionpay Limited Liability Company, Shanghai 201201, China
XIAO Lin-Feng
Web Sciences Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China

                    
Affiliation:
Clc Number:
Fund Project: National Natural Science Foundation of China (61300018); China Unionpay-UESTC Project of Financial Big Data

Abstract:

The aggregation task for Chinese short texts is to link pairs of similar short texts that refer to the same entity across two data sets. This problem has important theoretical and practical value for integrating data resources across different fields. This article devises an effective aggregation model for Chinese short texts. Through two key steps, fast matching and refined matching, the model sharply reduces the number of candidate pairs that must be compared while preserving matching accuracy. In addition, to address the shortcomings of traditional similarity algorithms on short texts, an improved similarity algorithm called generalized Jaro-Winkler is proposed. Aggregation experiments on several merchant data sets show that, compared with traditional algorithms, the new algorithm achieves the best performance in both matching accuracy and stability.
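The abstract names Jaro-Winkler as the baseline but does not define the generalized variant or the fast-matching criterion. As a point of reference only, the following minimal Python sketch implements the *standard* Jaro-Winkler similarity and wraps it in a hypothetical two-stage pipeline: a character-bigram inverted index stands in for fast matching, and a Jaro-Winkler threshold stands in for refined matching. The bigram blocking, the default `threshold=0.9`, and the names `bigrams` and `aggregate` are assumptions for illustration; this is not the authors' generalized Jaro-Winkler or their model.

```python
from collections import defaultdict


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity; each Unicode character is one symbol."""
    if not s1 and not s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    # Two characters can match only if they lie within this distance of each other.
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order, halved.
    seq1 = [c for c, ok in zip(s1, matched1) if ok]
    seq2 = [c for c, ok in zip(s2, matched2) if ok]
    t = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Standard Jaro-Winkler: boosts Jaro by the common prefix (at most max_prefix chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def bigrams(s: str) -> set:
    """Hypothetical blocking key: character bigrams (a 0/1-char string yields itself)."""
    return {s[i:i + 2] for i in range(len(s) - 1)} or {s}


def aggregate(left: list, right: list, threshold: float = 0.9) -> list:
    """Hypothetical two-stage pairing of texts in `left` with texts in `right`."""
    # Stage 1 (stand-in for fast matching): index `right` by bigram so that
    # only pairs sharing at least one bigram are ever scored.
    index = defaultdict(set)
    for k, text in enumerate(right):
        for g in bigrams(text):
            index[g].add(k)
    pairs = []
    for i, text in enumerate(left):
        candidates = set()
        for g in bigrams(text):
            candidates |= index.get(g, set())
        # Stage 2 (stand-in for refined matching): keep the best-scoring
        # candidate if its Jaro-Winkler score reaches the threshold.
        best = max(candidates, key=lambda k: jaro_winkler(text, right[k]), default=None)
        if best is not None and jaro_winkler(text, right[best]) >= threshold:
            pairs.append((i, best))
    return pairs
```

For example, `aggregate(["星巴克咖啡", "麦当劳"], ["肯德基", "星巴克咖啡店"])` returns `[(0, 1)]`: the first pair scores about 0.967, while 麦当劳 shares no bigram with any entry in `right` and is never scored. Because Python strings index by code point, each Chinese character is treated as one symbol, so this sketch needs no word segmentation.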

Key words: Chinese short text; aggregation model; text similarity; generalized Jaro-Winkler; fast matching; refined matching

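Citation:

LIU Zhen, CHEN Jing, ZHENG Jian-Bin, HUA Jin-Zhi, XIAO Lin-Feng. Research on aggregation model for Chinese short texts. Journal of Software, 2017, 28(10): 2674-2692.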

History

Received: March 3, 2016
Revised: September 7, 2016
Online: September 30, 2017

Copyright: Institute of Software, Chinese Academy of Sciences
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn
