Abstract:Deep learning has been widely employed in many fields and yields excellent performance. However, this often requires the support of large amounts of labeled data, which usually means high costs and harsh application conditions. Therefore, with the development of deep learning, how to break through data limitations in practical scenarios has become an important research problem. Specifically, as one of the most important research directions, semi-supervised learning greatly relieves the data requirement pressure of deep learning by conducting learning with the assistance of abundant unlabeled data and a small number of labeled data. The pseudo-labeling method plays a significant role in semi-supervised learning, and the quality of its generated pseudo labels will influence the final results of semi-supervised learning. Focusing on pseudo-labeling in semi-supervised learning, this study proposes the pseudo-labeling method based on optimal transport theory, which introduces the pseudo-labeling procedure constraint with labeled data as generation process guidance. On this basis, the pseudo-labeling procedure is converted to the optimization problem of optimal transport, which offers a new form for solving pseudo-labeling. Meanwhile, to solve this problem, this study introduces the Sinkhorn-Knopp algorithm for approximate fast solutions to avoid the heavy computation burden. As an independent module, the proposed method can be combined with other semi-supervised learning tricks such as consistency regularization for complete semi-supervised learning. Finally, this study conducts experiments on four classic public image classification datasets of CIFAR-10, SVHN, MNIST, and FashionMNIST to verify the effectiveness of the proposed method. The experimental results show that compared with the state-of-the-art semi-supervised learning methods, this method yields better performance, especially under fewer labeled data.