Abstract: In the era of big data, growing sample sizes and the dynamic updating and variation of dimensionality greatly increase the computational burden. Moreover, most such data sets do not consist of a single data type; they are hybrid data containing both symbolic and numerical attributes. For this reason, scholars have proposed many feature selection algorithms for hybrid data. However, most existing algorithms are applicable only to static data or small-scale incremental data and cannot handle large-scale, dynamically changing data, especially large-scale incremental data sets whose data distribution changes. To address this limitation, this paper analyzes how the granularity space and granularity structure vary and update in dynamic data, and proposes a multi-granulation incremental feature selection algorithm for dynamic hybrid data based on an information fusion mechanism. The algorithm focuses on three mechanisms: the construction of the granularity space in dynamic hybrid data, the dynamic updating of multiple data granularity structures, and information fusion under variations in data distribution. Finally, the feasibility and efficiency of the proposed algorithm are verified by comparing its experimental results with those of other algorithms on UCI data sets.