Stochastic Algorithm with Optimal Convergence Rate for Strongly Convex Optimization Problems
(一种求解强凸优化问题的最优随机算法)

Authors: 邵言剑, 陶卿, 姜纪远, 周柏

Funding: National Natural Science Foundation of China (61273296)
Abstract:

Stochastic gradient descent (SGD) is one of the efficient methods for dealing with large-scale data. Recent research shows that the black-box SGD method can reach the optimal O(1/T) convergence rate for strongly convex problems. However, for solving regularized learning problems with L1 plus L2 terms, structural optimization methods such as COMID (composite objective mirror descent) only attain an O(ln T/T) convergence rate. This paper presents a weighted algorithm based on COMID that preserves the sparsity imposed by the L1 regularization term. A proof is provided to show that it achieves an O(1/T) convergence rate. Furthermore, the proposed scheme takes advantage of on-the-fly computation, which reduces the computational cost. Experimental results demonstrate the correctness of the theoretical analysis and the effectiveness of the proposed algorithm.
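
The abstract does not spell out the update rule, so the following is only a minimal sketch of the kind of method it describes: a COMID-style proximal step for an L1+L2 regularized objective, combined with a weighted average of the iterates that is maintained on the fly. The step size, the polynomial-decay weight rho = 2/(t+2), and the names weighted_comid, grad_loss, and draw_example are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise soft-thresholding: the proximal operator of tau*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def weighted_comid(grad_loss, draw_example, dim, T, lam1=1e-4, lam2=1e-2, seed=0):
    """Illustrative solver for  E[loss(w; xi)] + lam1*||w||_1 + (lam2/2)*||w||_2^2.

    grad_loss(w, xi) returns a stochastic gradient of the loss at w on example xi;
    draw_example(rng) samples one training example. The L2 term makes the
    objective lam2-strongly convex; the L1 term is handled inside the proximal
    step, so the iterates themselves stay sparse.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    w_bar = np.zeros(dim)                      # weighted average, updated on the fly
    for t in range(1, T + 1):
        xi = draw_example(rng)
        g = grad_loss(w, xi) + lam2 * w        # gradient of the smooth part (loss + L2)
        eta = 2.0 / (lam2 * (t + 1))           # step size suited to strong convexity
        w = soft_threshold(w - eta * g, eta * lam1)   # composite (proximal) COMID step
        rho = 2.0 / (t + 2)                    # weight grows roughly like t
        w_bar = (1.0 - rho) * w_bar + rho * w  # O(d) update, no stored history of iterates
    return w_bar
```

For example, with a squared loss one could pass grad_loss = lambda w, i: (X[i] @ w - y[i]) * X[i] and draw_example = lambda rng: rng.integers(len(y)). The returned w_bar is the weighted average that O(1/T) analyses of this kind typically bound, while the last iterate w keeps the sparsity pattern produced by soft-thresholding.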

Cite this article:

邵言剑, 陶卿, 姜纪远, 周柏. Stochastic algorithm with optimal convergence rate for strongly convex optimization problems (一种求解强凸优化问题的最优随机算法). Journal of Software (软件学报), 2014,25(9):2160-2171.

History
  • Received: 2014-01-23
  • Revised: 2014-04-09
  • Published online: 2014-09-09