Abstract:This study proposes a feature extraction algorithm based on the principal component analysis (PCA) of the anisotropic Gaussian kernel penalty which is different from the traditional kernel PCA algorithms. In the non-linear data dimensionality reduction, the nondimensionalization of raw data is ignored by the traditional kernel PCA algorithms. Meanwhile, the previous kernel function is mainly controlled by one identical kernel width parameter in each dimension, which cannot reflect the significance of different features in each dimension precisely, resulting in the low accuracy of dimensionality reduction process. To address the above issues, contraposing the current problem of nondimensionalization of raw data, an averaging algorithm is proposed in this study, which has shown sound performance in improving the variance contribution rate of the original data typically. Then, anisotropic Gaussian kernel function is introduced owing each dimension has different kernel width parameters which can critically reflect the importance of the dimension data features. In addition, the feature penalty function of kernel PCA is formulated based on the anisotropic Gaussian kernel function to represent the raw data with fewer features and reflect the importance of each principal component information. Furthermore, the gradient descent method is introduced to update the kernel width of feature penalty function and control the iterative process of the feature extraction algorithm. To verify the effectiveness of the proposed algorithm, several algorithms are compared on UCI public data sets and KDDCUP99 data sets, respectively. The experimental results show that the feature extraction algorithm of the PCA based on the anisotropic Gaussian kernel penalty is 4.49% higher on average than the previous PCA algorithms on UCI public data sets. The feature extraction algorithm of the PCA based on the anisotropic Gaussian kernel penalty is 8% higher on average than the previous PCA algorithms on KDDCUP99 data sets.