    Abstract:

    Clustering transactions can reveal potentially useful patterns that help improve product profit. This paper proposes CATD, a two-step clustering algorithm applicable to large transaction databases. First, the database is divided into partitions, and the transactions in each partition are partially clustered into a number of subclusters; a hierarchical clustering algorithm is used to control the distance between these subclusters. Then, in the global clustering step, a k-medoids clustering algorithm is run on the subclusters to obtain a set of k global clusters and to identify noise. The algorithm is feasible for large databases because it scans the original database only once, and the clustering can be performed in main memory thanks to the partitioning scheme and the support vector representation of subclusters.
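
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the subcluster summary, or the noise rule, so the sketch assumes Jaccard distance between item sets, summarises each subcluster by per-item support counts (reduced to its frequent items), and flags a subcluster as noise when it lies farther than `noise_dist` from its medoid. All function and parameter names (`local_subclusters`, `max_dist`, `min_support`, `noise_dist`, `catd_sketch`) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two item sets (an assumed metric)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def frequent_items(support, size, min_support=0.5):
    """Summarise a subcluster by the items found in at least
    min_support of its transactions (an assumed representation)."""
    return frozenset(i for i, c in support.items() if c / size >= min_support)


def local_subclusters(partition, max_dist):
    """Step 1: agglomerative merging inside one partition. Merging stops
    once the closest pair of subclusters is farther apart than max_dist."""
    clusters = [(Counter(t), 1) for t in partition]  # (support counts, size)
    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), jaccard_distance(frequent_items(*clusters[i]),
                                       frequent_items(*clusters[j])))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1])
        if d > max_dist:
            break
        merged = (clusters[i][0] + clusters[j][0],
                  clusters[i][1] + clusters[j][1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)
    return clusters


def assign(points, medoids):
    """Label every point with the index of its nearest medoid."""
    return [min(medoids, key=lambda m: jaccard_distance(p, points[m]))
            for p in points]


def k_medoids(points, weights, k, max_iter=20):
    """Step 2: simple alternating k-medoids over subcluster summaries,
    weighted by subcluster size. Requires k <= len(points)."""
    medoids = list(range(k))  # deterministic start: the first k points
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for m in medoids:
            members = [n for n, lab in enumerate(labels) if lab == m]
            if not members:  # keep a medoid that attracted no members
                new_medoids.append(m)
                continue
            new_medoids.append(min(
                members,
                key=lambda c: sum(weights[n] * jaccard_distance(points[c], points[n])
                                  for n in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


def catd_sketch(transactions, n_partitions, max_dist, k, noise_dist):
    """Partition, cluster locally, then cluster the subclusters globally.
    The transaction list is read once, partition by partition."""
    size = -(-len(transactions) // n_partitions)  # ceiling division
    subs = []
    for start in range(0, len(transactions), size):
        subs.extend(local_subclusters(transactions[start:start + size], max_dist))
    points = [frequent_items(sup, n) for sup, n in subs]
    weights = [n for _, n in subs]
    medoids, labels = k_medoids(points, weights, k)
    noise = [n for n, lab in enumerate(labels)
             if jaccard_distance(points[n], points[lab]) > noise_dist]
    return points, medoids, labels, noise
```

Only the compact subcluster summaries reach the global step, which is what keeps the second phase small enough for main memory in this sketch. The pairwise search in `local_subclusters` is quadratic per merge and is meant for illustration only.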

Citation

Chen Ning, Chen An, Zhou Longxiang. An effective clustering algorithm for large transaction databases. Journal of Software (软件学报), 2001, 12(4): 475-484.

History
  • Received: July 28, 2000
  • Revised: December 19, 2000