National Natural Science Foundation of China (U1435220); Beijing Major Science and Technology Projects (D171100003417002); Civil Aviation Science and Technology Major Project (MHRD20160109)
Machine learning problems can be viewed as optimization-centric programs, and optimization algorithms are the key tool for solving their objective functions. In the era of big data, speeding up training requires parallel and distributed optimization algorithms built on multi-core and distributed computing technologies. This field has seen extensive research in recent years, and some of the resulting algorithms are widely used on machine learning platforms. This paper surveys five common classes of optimization algorithms: gradient descent, second-order optimization, proximal gradient, coordinate descent, and the alternating direction method of multipliers (ADMM). Each class is analyzed from both the parallel and the distributed perspective, and algorithms of the same class are compared by model type, input data characteristics, algorithm evaluation, and parallel communication mode. The implementations and applications of these optimization algorithms on representative scalable machine learning platforms are also analyzed. All the optimization algorithms introduced in the paper are organized into a hierarchical classification diagram, which can be used to select an appropriate algorithm for a given type of objective function and to explore how existing algorithms might be applied to new types of objective function. Finally, open problems of existing optimization algorithms are discussed, and possible solutions and future research directions are proposed.