Abstract:Currently, the mainstream subspace clustering methods for categorical data are dependent on linear similarity measure and the relationship between attributes is overlooked. In this study, an approach is proposed for clustering categorical data with a novel kernel soft feature-selection scheme. First, categorical data is projected into the high-dimensional kernel space by introducing the kernel function and the similarity measure of categorical data in kernel subspace is given. Based on the measure, the kernel subspace clustering objective function is derived and an optimization method is proposed to solve the objective function. At last, kernel subspace clustering algorithm for categorical data is proposed, the algorithm considers the relationship between the attributes and each attribute assigned with weights measuring its degree of relevance to the clusters, enabling automatic feature selection during the clustering process. A cluster validity index is also defined to evaluate the categorical clusters. Experimental results carried out on some synthetic datasets and real-world datasets demonstrate that the proposed method effectively excavates the nonlinear relationship among attributes and improves the performance and efficiency of clustering.