Sequence Clustering Algorithms Based on Global and Local Similarity

Volume 21, Issue 4, 2010, pp. 702-717

Author: DAI Dong-Bo, TANG Chun-Lei, XIONG Yun

Abstract:

Many current sequence clustering algorithms assume that a sequence can be characterized by its local features, and they do not distinguish between global and local similarity across applications. This assumption suits biological sequences such as DNA and proteins, which contain conserved sub-patterns. However, in domains such as comparing customers' purchase behaviors in retail transaction databases or matching patterns in time series data, frequent sub-patterns are difficult to form, so it is more reasonable to cluster such sequences by global similarity. In addition, among sequence clustering algorithms based on local similarity, the ability of sub-patterns to characterize a sequence needs improvement. This paper therefore proposes two clustering algorithms for different application fields: GSClu (global similarity clustering), based on global similarity, and LSClu (local similarity clustering), based on local similarity. GSClu uses the bisecting k-means technique, and LSClu adopts sub-patterns with gap constraints, to cluster the sequence data of the corresponding application field. The experiments use retail transaction data and protein data. The results show that GSClu and LSClu process data quickly and produce high-quality clusters.
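
To make the idea of sub-patterns with a gap constraint concrete, the following is a minimal sketch and not the authors' LSClu implementation. It assumes that a sub-pattern matches a sequence when its items appear in order with at most `max_gap` skipped items between consecutive matched items. The function names `matches_with_gap` and `pattern_features` and the parameter `max_gap` are hypothetical and do not come from the paper.

```python
from typing import Hashable, List, Sequence


def matches_with_gap(seq: Sequence[Hashable],
                     pattern: Sequence[Hashable],
                     max_gap: int) -> bool:
    """Return True if `pattern` occurs in `seq` in order, with at most
    `max_gap` skipped items between consecutive matched items.

    Assumed definition of a gap constraint, for illustration only.
    """
    if not pattern:
        return True
    # Positions in seq where the first pattern item can be matched.
    ends = {i for i, item in enumerate(seq) if item == pattern[0]}
    for symbol in pattern[1:]:
        next_ends = set()
        for end in ends:
            # The next match may lie at most max_gap + 1 positions further on.
            upper = min(len(seq), end + max_gap + 2)
            for j in range(end + 1, upper):
                if seq[j] == symbol:
                    next_ends.add(j)
        if not next_ends:
            return False
        ends = next_ends
    return bool(ends)


def pattern_features(seq: Sequence[Hashable],
                     patterns: List[Sequence[Hashable]],
                     max_gap: int) -> List[int]:
    """Describe a sequence by which sub-patterns it contains (1) or not (0)."""
    return [int(matches_with_gap(seq, p, max_gap)) for p in patterns]
```

Under these assumptions, `pattern_features("ACGTAC", ["AGT", "CC"], 1)` yields one 0/1 entry per sub-pattern. Such vectors could then be passed to any standard clustering routine, which is one possible way for local sub-patterns to characterize a sequence.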

Key words: sequence data; similarity; clustering

Citation:

DAI Dong-Bo, TANG Chun-Lei, XIONG Yun. Sequence clustering algorithms based on global and local similarity. Journal of Software (软件学报), 2010, 21(4): 702-717


History

Received: July 9, 2008
Revised: February 24, 2009

Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address: 4# South Fourth Street, Zhong Guan Cun, Beijing 100190
Phone: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn
Technical Support: Beijing Qinyun Technology Development Co., Ltd.