Abstract:Spectral clustering is a flexible and effective clustering method for complex structure data sets. It is based on spectral graph theory and can produce satisfactory clustering results by mapping the data points into a low-dimensional space constituted by eigenvectors so that the data structure is optimized. But in the process of spectral clustering, the computational complexity of eigen-decomposition is usually O(n3), which limits the application of spectral clustering algorithm in big data problems. Nyström extension method uses partial points sampled from the data set and approximate calculation to simulate the real eigenspace. In this way, the computational complexity can be effectively reduced, which provides a new idea for big data spectral clustering algorithm. The selection of sampling strategy is essential for Nyström extension technology. In this paper, the design of an adaptive Nyström sampling method is presented. The sampling probability of every data point will be updated after each sampling pass, and a proof is given that the sampling error will decrease exponentially with the increase of sample times. Based on the adaptive Nyström sampling method, a spectral clustering algorithm for big data analysis is presented, and its feasibility and effectiveness is verified by experiments.