一种减小方差求解非光滑问题的随机优化算法

doi:10.13328/j.cnki.jos.004890

微信服务号

微信订阅号

2025年7月30日 21:43 星期三

首页 > 过刊浏览>2015年第26卷第11期 >2752-2761. DOI:10.13328/j.cnki.jos.004890

PDF HTML阅读 XML下载导出引用引用提醒

一种减小方差求解非光滑问题的随机优化算法
DOI:
                        10.13328/j.cnki.jos.004890
                    
CSTR:
                        
                    
作者:
                        朱小辉朱小辉
中国人民解放军陆军军官学院 十一系, 安徽 合肥 230031
在期刊界中查找
在百度中查找
在本站中查找
陶卿陶卿
中国人民解放军陆军军官学院 十一系, 安徽 合肥 230031
在期刊界中查找
在百度中查找
在本站中查找
邵言剑邵言剑
中国人民解放军陆军军官学院 十一系, 安徽 合肥 230031
在期刊界中查找
在百度中查找
在本站中查找
储德军储德军
中国人民解放军陆军军官学院 十一系, 安徽 合肥 230031
在期刊界中查找
在百度中查找
在本站中查找

                    
作者单位:
作者简介:
通讯作者:
中图分类号:
基金项目:国家自然科学基金(61273296); 安徽省自然科学基金(1308085QF121)

Stochastic Optimization Algorithm with Variance Reduction for Solving Non-Smooth Problems

Author:

ZHU Xiao-Hui
ZHU Xiao-Hui
11st Department, Army Officer Academy of PLA, Hefei 230031, China
在期刊界中查找
在百度中查找
在本站中查找
TAO Qing
TAO Qing
11st Department, Army Officer Academy of PLA, Hefei 230031, China
在期刊界中查找
在百度中查找
在本站中查找
SHAO Yan-Jian
SHAO Yan-Jian
11st Department, Army Officer Academy of PLA, Hefei 230031, China
在期刊界中查找
在百度中查找
在本站中查找
CHU De-Jun
CHU De-Jun
11st Department, Army Officer Academy of PLA, Hefei 230031, China
在期刊界中查找
在百度中查找
在本站中查找

Affiliation:

Fund Project:

摘要

图/表

访问统计

参考文献 [19]

相似文献 [20]

引证文献

资源附件

文章评论

摘要:

随机优化算法是求解大规模机器学习问题的高效方法之一.随机学习算法使用随机抽取的单个样本梯度代替全梯度,有效节省了计算量,但却会导致较大的方差.近期的研究结果表明:在光滑损失优化问题中使用减小方差策略,能够有效提高随机梯度算法的收敛速率.考虑求解非光滑损失问题随机优化算法COMID(compositeobjective mirror descent)的方差减小问题.首先证明了COMID具有方差形式的O(1/√T+σ²/√T)收敛速率,其中,T是迭代步数,σ²是方差.该收敛速率保证了减小方差的有效性,进而在COMID中引入减小方差的策略,得到一种随机优化算法α-MDVR(mirror descent with variance reduction).不同于Prox-SVRG(proximal stochastic variance reduced gradient),α-MDVR收敛速率不依赖于样本数目,每次迭代只使用部分样本来修正梯度.对比实验验证了α-MDVR既减小了方差,又节省了计算时间.

关键词:机器学习;随机算法;非光滑;方差;composite objective mirror descent(COMID)

Abstract:

Stochastic optimization is one of the efficient methods for solving large-scale machine learning problems. In stochastic learning, the full gradient is replaced by the gradient of loss function in terms of a randomly selected single sample to avoid high computational cost. However, large variance is usually caused. Recent studies have shown that the convergence rate of stochastic gradient methods can be effectively improved by reducing the variance. In this paper, the variance reduction issue in COMID (composite objective mirror descent) is addressed when solving non-smooth optimization problems. First a proof is provided to show that the COMID has a convergence rate O(1/√T+σ²/√T) in the terms of variance, where T is the iteration number and σ² is the variance. This convergence rate ensures the effectiveness of reducing the variance. Then, a stochastic optimization algorithm α-MDVR (mirror descent with variance reduction) is obtained by incorporating the strategy of reducing variance into COMID. Unlike Prox-SVRG (proximal stochastic variance reduced gradient), α-MDVR does not depend on the number of samples and only uses a small portion of samples to modify the gradient. The comparative experiments demonstrate that α-MDVR not only reduces the variance but also decreases the computational time.

Key words:machine learning;stochastic algorithm;non-smooth;variance;composite objective mirror descent (COMID)

参考文献

[1] Tseng P. Approximation accuracy, gradient methods, and error bound for structured convex optimization. Mathematical Programming, 2010,125(2):263-295.[doi:10.1007/s10107-010-0394-2]

[2] Nemirovski A, Juditsky A, Lan G, Shapiro A. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 2009,19(4):1574-1609.[doi:10.1137/070704277]

[3] Shalev-Shwartz S, Tewari A. Stochastic methods for L1 regularized loss minimization. In:Proc. of the 26th Annual Int'l Conf. on Machine Learning. 2009. 929-936.

[4] Johnson R, Zhang T. Accelerating stochastic gradient descent using predictive variance reduction. In:Proc. of the Advances in Neural Information Processing Systems 26. 2013. 315-323.

[5] Shalev-Shwartz S, Zhang T. Stochastic dual coordinate ascent methods for regularized loss minimization. arXiv preprint arXiv:1209.1873, 2012.

[6] Le Roux N, Schmidt M, Bach F. A stochastic gradient method with an exponential convergence rate for strongly convex optimization with finite training sets. arXiv preprint, arXiv:1202.6258, 2012.

[7] Xiao L. Dual averaging methods for regularized stochastic learning and online optimization. In:Advances in Neural Information Processing Systems. 2009. 2116-2124.

[8] Xiao L, Zhang T. A proximal stochastic gradient method with progressive variance reduction. arXiv:1403.4699v1, 2014.

[9] Duchi J, Shalev-Shwartz S, Singer Y, Tewari A. Composite objective mirror descent. In:Proc. of the 23rd Annual Workshop on Computational Learning Theory. ACM Press, 2010. 116-128.

[10] Duchi J, Shalev-Shwartz S, Singer Y. Efficient projections onto the L1-ball for learning in high dimensions. In:Proc. of the 25th Int'l Conf. on Machine Learning. 2008. 272-279.

[11] Beck A, Teboulle M. A fast iterative shrinkage thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2009,2(1):183-202.[doi:10.1137/080716542]

[12] Nesterov Y. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, 1983,27(2):372-376.

[13] Bottou L. Stochastic Gradient Descent Tricks. Neural Networks:Tricks of the Trade. Berlin, Heidelberg:Springer-Verlag, 2012. 421-436.[doi:10.1007/978-3-642-35289-8_25]

[14] Beck A, Teboulle M. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 2003,31(3):167-175.[doi:10.1016/S0167-6377(02)00231-6]

[15] Bregman LM. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 1967,7(3):200-217.[doi:10.1016/0041-5553 (67)90040-7]

[16] Hazan E, Kale S. Beyond the regret minimization barrier:An optimal algorithm for stochastic strongly convex optimization. Journal of Machine Learning Research, 2014,15(1):2489-2512.

[17] Lan G. An optimal method for stochastic composite optimization. Mathematical Programming, Series A, 2012,133(1-2):365-397.[doi:10.1007/s10107-010-0434-y]

[18] Shalev-Shwartz S, Singer Y, Srebro N. Pegasos:Primal estimated sub-gradient solver for SVM. Mathematical Programming, 2011, 127(1):3-30.[doi:10.1007/s10107-010-0420-4]

[19] Lin QH, Chen X, Peña J. A sparsity preserving stochastic gradient method for composite optimization. Manuscript, Carnegie Mellon University, 2011. 15213.

引用本文

朱小辉,陶卿,邵言剑,储德军.一种减小方差求解非光滑问题的随机优化算法.软件学报,2015,26(11):2752-2761

复制

文章指标

点击次数:5240
下载次数: 7724
HTML阅读次数: 2586
引用次数: 0

历史

收稿日期:2015-02-26
最后修改日期:2015-08-26
录用日期:
在线发布日期: 2015-11-04
出版日期:

微信服务号

微信订阅号

引用本文

相关视频

分享

文章指标

历史

文章二维码

微信服务号

微信订阅号

引用本文

相关视频

分享

微信扫一扫：分享

文章指标

历史

文章二维码