Stochastic Algorithm with Optimal Convergence Rate for Strongly Convex Optimization Problems
(一种求解强凸优化问题的最优随机算法)

Authors: 邵言剑, 陶卿, 姜纪远, 周柏

Funding: National Natural Science Foundation of China (61273296)
Abstract:

Stochastic gradient descent (SGD) is one of the efficient methods for dealing with large-scale data. Recent research shows that the black-box SGD method can reach the optimal O(1/T) convergence rate for strongly convex problems. However, for solving regularized learning problems with L1 plus L2 terms, structural optimization methods such as COMID (composite objective mirror descent) only attain an O(ln T/T) convergence rate. This paper presents a weighted algorithm based on COMID that preserves the sparsity imposed by the L1 regularization term. A proof is provided to show that it achieves an O(1/T) convergence rate. Furthermore, the proposed scheme takes advantage of on-the-fly computation, which reduces the computational cost. Experimental results demonstrate the correctness of the theoretical analysis and the effectiveness of the proposed algorithm.
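
The abstract does not spell out the update rule, so the following is only a minimal sketch of the kind of method it describes: a COMID-style proximal step for an L1+L2 regularized objective, combined with a weighted average of the iterates that is maintained on the fly. The step size, the polynomial-decay weight rho = 2/(t+2), and the names weighted_comid, grad_loss, and draw_example are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise soft-thresholding: the proximal operator of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def weighted_comid(grad_loss, draw_example, dim, T, lam1=1e-4, lam2=1e-2, seed=0):
    """Illustrative solver for  E[loss(w; xi)] + lam1*||w||_1 + (lam2/2)*||w||_2^2.

    grad_loss(w, xi) returns a stochastic gradient of the loss at w on example xi;
    draw_example(rng) samples one training example. The L2 term makes the
    objective lam2-strongly convex; the L1 term is handled inside the proximal
    step, so the iterates themselves stay sparse.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    w_bar = np.zeros(dim)                      # weighted average, updated on the fly
    for t in range(1, T + 1):
        xi = draw_example(rng)
        g = grad_loss(w, xi) + lam2 * w        # gradient of the smooth part (loss + L2)
        eta = 2.0 / (lam2 * (t + 1))           # step size suited to strong convexity
        w = soft_threshold(w - eta * g, eta * lam1)   # composite (proximal) COMID step
        rho = 2.0 / (t + 2)                    # weight grows roughly like t
        w_bar = (1.0 - rho) * w_bar + rho * w  # O(d) update, no stored history of iterates
    return w_bar
```

For example, with a squared loss one could pass grad_loss = lambda w, i: (X[i] @ w - y[i]) * X[i] and draw_example = lambda rng: rng.integers(len(y)). The returned w_bar is the weighted average that O(1/T) analyses of this kind typically bound, while the last iterate w keeps the sparsity pattern produced by soft-thresholding.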

Cite this article:

邵言剑, 陶卿, 姜纪远, 周柏. Stochastic algorithm with optimal convergence rate for strongly convex optimization problems (一种求解强凸优化问题的最优随机算法). Journal of Software (软件学报), 2014,25(9):2160-2171.

History
  • Received: 2014-01-23
  • Revised: 2014-04-09
  • Published online: 2014-09-09