Abstract: Domain adaptation (DA) refers to a family of machine learning tasks in which the training set (source domain) and the test set (target domain) follow different distributions. Its key challenge is overcoming the negative impact of these distributional differences, in other words, designing an effective training strategy that yields a classifier with high generalization performance by minimizing the discrepancy between domains. This study focuses on unsupervised DA (UDA), where annotations are available in the source domain but absent in the target domain. The problem can be framed as training a classifier from partially annotated and unannotated data within a semi-supervised learning framework. Two semi-supervised learning techniques, pseudo-labeling (PL) and consistency regularization (CR), are therefore used to augment and annotate data in the target domain for training the classifier, allowing it to generalize better on UDA tasks. This study proposes augmentation-based UDA (A-UDA), in which unannotated target-domain data are augmented by random augmentation, and high-confidence samples are pseudo-labeled according to the model's predicted outputs. The classifier is trained on the augmented data set. The distributional distance between the source and target domains is measured by the maximum mean discrepancy (MMD); by minimizing this distance, the classifier achieves high generalization performance. The proposed method is evaluated on multiple UDA benchmarks, including MNIST-USPS, Office-Home, and ImageCLEF-DA, and outperforms existing methods on these tasks.
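To make the MMD criterion in the abstract concrete, the following is a minimal NumPy sketch of a (biased) squared-MMD estimate with an RBF kernel between source and target feature samples. The function names, the kernel bandwidth `gamma`, and the synthetic Gaussian data are illustrative assumptions, not part of the paper's implementation.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    # Pairwise RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2).
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(source, target, gamma=0.5):
    # Biased estimate of squared MMD between two samples:
    # mean k(s, s') + mean k(t, t') - 2 * mean k(s, t).
    k_ss = rbf_kernel(source, source, gamma)
    k_tt = rbf_kernel(target, target, gamma)
    k_st = rbf_kernel(source, target, gamma)
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()

# Toy illustration: identical distributions give a small MMD,
# a shifted distribution gives a clearly larger one.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 2))       # "source" features
tgt_near = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution
tgt_far = rng.normal(3.0, 1.0, size=(200, 2))   # shifted distribution

print(mmd2(src, tgt_near))  # small
print(mmd2(src, tgt_far))   # much larger
```

In a UDA training loop, a differentiable version of this quantity (e.g. in PyTorch) would be added to the classification loss so that minimizing it pulls the source and target feature distributions together.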