Abstract:Stochastic optimization is one of the efficient methods for solving large-scale machine learning problems. In stochastic learning, the full gradient is replaced by the gradient of loss function in terms of a randomly selected single sample to avoid high computational cost. However, large variance is usually caused. Recent studies have shown that the convergence rate of stochastic gradient methods can be effectively improved by reducing the variance. In this paper, the variance reduction issue in COMID (composite objective mirror descent) is addressed when solving non-smooth optimization problems. First a proof is provided to show that the COMID has a convergence rate O(1/√T+σ2/√T) in the terms of variance, where T is the iteration number and σ2 is the variance. This convergence rate ensures the effectiveness of reducing the variance. Then, a stochastic optimization algorithm α-MDVR (mirror descent with variance reduction) is obtained by incorporating the strategy of reducing variance into COMID. Unlike Prox-SVRG (proximal stochastic variance reduced gradient), α-MDVR does not depend on the number of samples and only uses a small portion of samples to modify the gradient. The comparative experiments demonstrate that α-MDVR not only reduces the variance but also decreases the computational time.