Abstract:Within the internal organization of the data, the data points respectively play three different structural roles: the hub, centroid and outlier. The neighborhood-based density factor (NDF) used in the neighborhood based clustering (NBC) algorithm has the ability of identifying which points act as hubs, centriods or outliers in separated-well data set. However, NDF often works poorly in the circumstances of noise and overlapping. This paper introduces a polynomial kernel based neighborhood density factor (PKNDF) to address this issue. Relying on the PKNDF, a structural data clustering algorithm is further presented which can find all salient clusters with arbitrary shapes and unbalanced sizes in a noisy or overlapping data set. It builds clusters into the framework of directed trees in graph theory and thereby each point is scanned only once in the process of clustering. Hence, its computational complexity is nearly linear in the size of the input data. Experimental results on both synthetic and real-world datasets have demonstrated its effectiveness and efficiency.