Abstract: The aggregation task for Chinese short texts is to link pairs of similar short texts that refer to the same entity in two data sets. Such study is of theoretical and practical interest for integrating data resources across different fields. In this article, an effective aggregation model is devised for Chinese short texts. Through two key steps, fast matching and refined matching, the model sharply reduces the number of candidate pairs to be matched while preserving matching accuracy. In addition, to address the deficiencies of traditional similarity algorithms on short texts, an improved similarity algorithm, called generalized Jaro-Winkler, is proposed. Aggregation experiments performed on different merchant data sets suggest that the new algorithm outperforms the traditional algorithms in both matching accuracy and stability.
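
To make the two-step idea concrete, the following is a minimal sketch and not the model described in this article. It assumes that fast matching can be approximated by blocking on shared character bigrams, and it uses the standard Jaro-Winkler similarity for refined matching, since the generalized variant is not specified in the abstract. The function names, the 0.85 threshold, and the sample merchant names are all hypothetical.

```python
from collections import defaultdict


def jaro(s1, s2):
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Standard Jaro-Winkler: Jaro boosted by the common prefix length."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def bigrams(text):
    """Character bigrams; texts shorter than two characters map to themselves."""
    return {text[i:i + 2] for i in range(len(text) - 1)} or {text}


def aggregate(left, right, threshold=0.85):
    """Return (left_index, right_index) pairs judged to be the same entity."""
    # Fast matching (assumed here to be bigram blocking): index the right
    # data set so that only texts sharing a bigram become candidates.
    index = defaultdict(set)
    for j, name in enumerate(right):
        for gram in bigrams(name):
            index[gram].add(j)
    pairs = []
    for i, name in enumerate(left):
        candidates = set()
        for gram in bigrams(name):
            candidates |= index.get(gram, set())
        # Refined matching: keep the best-scoring candidate above the threshold.
        best = max(candidates, key=lambda j: jaro_winkler(name, right[j]), default=None)
        if best is not None and jaro_winkler(name, right[best]) >= threshold:
            pairs.append((i, best))
    return pairs


# Hypothetical merchant names from two data sets.
left = ["北京烤鸭店", "上海小笼包"]
right = ["上海小笼包馆", "北京烤鸭总店", "广州早茶楼"]
print(aggregate(left, right))
```

In this sketch the blocking step avoids scoring every pair across the two data sets, so the more expensive similarity computation runs only on texts that already share some surface overlap. A real system would likely need a blocking key and threshold tuned to its own data.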