Abstract: In data mining, many clustering algorithms have been developed, but most of them are limited in scalability and interpretability. To address this problem, a concept-based data clustering model is presented. Working from the metadata that describe the samples, the model first extracts basic concepts from the preprocessed dataset and then generalizes them into higher-level concepts that represent the clustering results. Finally, the samples are assigned to the resulting final concepts, which completes the clustering process. While preserving the accuracy of the clustering results, the model can greatly reduce the number of tuples that must be processed, improving the data scalability of clustering algorithms. In addition, because knowledge is discovered and analyzed in terms of concepts, the model improves the interpretability of the clustering results and facilitates interaction with users. Experimental results show that the proposed model is more beneficial for algorithms with higher computational cost and better clustering quality.
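The three-stage pipeline summarized in the abstract (extract basic concepts, generalize them into higher-level concepts, assign samples) can be illustrated with a small Python sketch. It is not the model's actual algorithm. Several choices here are assumptions made purely for illustration: discretization by user-supplied bin edges as the preprocessing step, each distinct discretized tuple acting as a basic concept, and single-linkage agglomeration on Hamming distance as the generalization step. The names `discretize`, `hamming`, and `concept_clustering` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def discretize(sample, bins):
    """Hypothetical preprocessing: map each numeric attribute to a bin index."""
    return tuple(sum(v >= edge for edge in edges) for v, edges in zip(sample, bins))


def hamming(a, b):
    """Number of attributes on which two concepts differ (bin codes treated as categories)."""
    return sum(x != y for x, y in zip(a, b))


def concept_clustering(samples, bins, k):
    # Step 1 (assumed): basic concepts are the distinct discretized tuples,
    # each with a support count of how many samples it covers.
    keys = [discretize(s, bins) for s in samples]
    support = Counter(keys)
    concepts = sorted(support)

    # Step 2 (assumed stand-in for generalization): merge basic concepts with
    # single-linkage agglomeration until only k higher-level concepts remain.
    # This loop runs over distinct concepts, not over every sample.
    groups = [[c] for c in concepts]
    while len(groups) > k:
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda p: min(hamming(a, b) for a in groups[p[0]] for b in groups[p[1]]),
        )
        groups[i].extend(groups.pop(j))  # i < j, so index i is unaffected by the pop

    # Step 3: each sample inherits the final concept of its basic concept.
    label_of = {c: g for g, members in enumerate(groups) for c in members}
    labels = [label_of[key] for key in keys]
    return labels, support


samples = [(1.0, 10.0), (1.2, 11.0), (0.9, 10.5), (5.0, 50.0), (5.5, 52.0), (5.2, 51.0)]
labels, support = concept_clustering(samples, bins=[[3.0, 5.1], [30.0]], k=2)
print(labels)         # [0, 0, 0, 1, 1, 1]
print(dict(support))  # {(0, 0): 3, (1, 1): 1, (2, 1): 2}
```

In this toy run the six samples collapse into three basic concepts, so the merging step works on three items instead of six. That reduction in the number of tuples processed is the kind of scalability gain the abstract describes, and each final cluster remains readable as a set of attribute-range concepts.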