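[Keywords]
[Abstract]
Conceptual clustering is suited to data mining tasks in which domain knowledge is incomplete or lacking. This paper defines a semantics-based distance criterion function and uses domain knowledge to conceptualize continuous attribute values. For data objects described by a mix of categorical and numerical attributes, it proposes a dynamic conceptual clustering algorithm, DDCA (domain-based dynamic clustering algorithm). The algorithm determines the number of clusters automatically, revises cluster centers according to how frequently attribute values occur within each cluster, and, through concept induction, explains the clustering output with conjunctive concept expressions. The study shows that DDCA, which combines the semantic distance criterion function with domain-knowledge-based dynamic conceptual clustering, is effective.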
[Keywords]
[Abstract]
Conceptual clustering is suitable for discovering knowledge in databases where domain background information is incomplete or absent. Original conceptual clustering methods have difficulty handling data objects described by numerical attribute values. This paper proposes a new criterion function based on semantic distance and presents a novel domain-based dynamic conceptual clustering algorithm (DDCA). By discretizing continuous attribute values, the algorithm works well on datasets described by mixed numerical and categorical attributes. It automatically determines the number of clusters, adjusts each cluster center according to the frequency of attribute values within the cluster, and interprets the clustering result with conceptual complex (conjunctive) expressions. Experiments demonstrate that the semantic-based criterion function and the dynamic conceptual clustering algorithm are effective and efficient.
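The steps named in the abstract can be illustrated with a minimal Python sketch. It is not the DDCA algorithm itself: the abstract does not define the semantic distance function or the clustering procedure, so a simple attribute-mismatch ratio and a threshold-based incremental assignment stand in for them. The concept boundaries in `AGE_CONCEPTS`, the `threshold` parameter, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical domain knowledge: upper bounds that map an age to a concept.
AGE_CONCEPTS = [(30, "young"), (60, "middle-aged"), (float("inf"), "senior")]


def conceptualize(value, concepts):
    """Map a numeric value to the first concept whose upper bound exceeds it."""
    for upper, label in concepts:
        if value < upper:
            return label
    raise ValueError(f"no concept covers {value!r}")


def mismatch_distance(a, b):
    """Fraction of attributes on which two objects differ.

    A stand-in for the paper's semantic distance, which the abstract
    does not specify.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mode_center(members):
    """Build a center from the most frequent value of each attribute."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))


def dynamic_cluster(objects, threshold):
    """Assign each object to the nearest center, or open a new cluster.

    The number of clusters is not fixed in advance; it grows whenever an
    object is farther than `threshold` from every existing center.
    """
    clusters, centers = [], []
    for obj in objects:
        best = None
        if centers:
            dists = [mismatch_distance(obj, c) for c in centers]
            best = min(range(len(centers)), key=dists.__getitem__)
        if best is not None and dists[best] <= threshold:
            clusters[best].append(obj)
            centers[best] = mode_center(clusters[best])  # frequency-based update
        else:
            clusters.append([obj])
            centers.append(obj)
    return clusters, centers


def describe(center, names):
    """Render a center as a conjunction of attribute-value conditions."""
    return " AND ".join(f"{n} = {v}" for n, v in zip(names, center))


raw = [(25, "student"), (28, "student"), (65, "retired"),
       (70, "retired"), (27, "engineer")]
objects = [(conceptualize(age, AGE_CONCEPTS), job) for age, job in raw]
clusters, centers = dynamic_cluster(objects, threshold=0.5)
descriptions = [describe(c, ("age", "occupation")) for c in centers]
```

With this toy data the sketch forms two clusters without being told the count, and each cluster is summarized by a conjunction such as `age = young AND occupation = student`.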
[CLC number]
[Funding]
Supported by the National Natural Science Foundation of China (69835010)