Abstract:In the open environment, data streams have the characteristics of high-speed data generation, unlimited data volume, and concept drift. In the task of data stream classification, it is expensive and impractical to generate a large amount of training data by manual annotation. A data stream with a small number of samples labeled and a large number of samples unlabeled and with concept drifts presents a great challenge to machine learning. However, the existing research mainly focuses on supervised classification of data streams, while semi-supervised classification of data streams with concept drifts has not yet attracted attention enough. Therefore, based on the comprehensive collection of the work of semi-supervised classification of data streams, this study sorts the existing semi-supervised data stream classification algorithms into several types from several aspects, describes and summarizes many existing algorithms based on the types of classifiers used in the algorithms and the concept drift detection methods utilized. On some widely employed real and synthetic datasets, several representative semi-supervised classification algorithms for data streams are chosen to be compared and analyzed in many aspects. Finally, this study proposes some issues that are worthy to be further discussed in future for semi-supervised classification of data streams with concept drifts. The experimental results show that the classification accuracy of the algorithms for semi-supervised data stream classification is related to many factors, but it has the greatest relationship with the changes of data distribution. This review will help the interested researchers quickly enter into the field of semi-supervised classification of data streams.