Abstract: As a new distributed machine learning paradigm, federated learning leverages the computing power and local data of many distributed clients to jointly train a machine learning model while meeting user privacy and data confidentiality requirements. In cross-device federated learning scenarios, the clients usually comprise thousands or even tens of thousands of mobile or terminal devices. Because of communication and computing cost constraints, the aggregation server selects only a few clients to participate in each round of training. Yet several widely used federated optimization algorithms select clients completely at random, an approach that has been shown to leave substantial room for optimization. In recent years, how to efficiently and reliably select a suitable subset of massive, heterogeneous clients to participate in training, and thereby optimize the resource consumption and model performance of federated learning protocols, has been studied extensively, but a comprehensive survey of this key issue is still lacking. Therefore, this study presents a comprehensive survey of client selection algorithms for cross-device federated learning. Specifically, it gives a formal description of the client selection problem, presents a classification of selection algorithms, and then discusses and analyzes the algorithms one by one. Finally, it explores some future research directions for client selection algorithms.