Stochastic Optimization Algorithm with Variance Reduction for Solving Non-Smooth Problems
Author: 朱小辉, 陶卿, 邵言剑, 储德军
Affiliation:

Fund Project: National Natural Science Foundation of China (61273296); Natural Science Foundation of Anhui Province (1308085QF121)


    Abstract:

    Stochastic optimization is one of the most efficient methods for solving large-scale machine learning problems. In stochastic learning, the full gradient is replaced by the gradient of the loss function at a single randomly selected sample, which greatly reduces the computational cost but usually introduces large variance. Recent studies have shown that, for smooth loss functions, the convergence rate of stochastic gradient methods can be effectively improved by reducing this variance. This paper addresses the variance reduction issue of COMID (composite objective mirror descent), a stochastic optimization algorithm for non-smooth loss problems. First, a proof is given that COMID has a variance-dependent convergence rate of O(1/√T + σ²/√T), where T is the number of iterations and σ² is the variance; this rate guarantees that reducing the variance is effective. A variance reduction strategy is then incorporated into COMID, yielding a stochastic optimization algorithm called α-MDVR (mirror descent with variance reduction). Unlike Prox-SVRG (proximal stochastic variance reduced gradient), the convergence rate of α-MDVR does not depend on the number of samples, and each iteration uses only a small portion of the samples to correct the gradient. Comparative experiments demonstrate that α-MDVR both reduces the variance and saves computational time.
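The abstract describes α-MDVR only at a high level, so the following is a minimal sketch of the variance-reduction idea in a composite (non-smooth) setting: an SVRG-style mini-batch gradient correction inside the Euclidean case of a mirror-descent step, i.e. a proximal gradient step with an L1 regularizer and logistic loss. This is an illustration under these assumptions, not the paper's α-MDVR algorithm; the function and parameter names (variance_reduced_step, eta, lam, batch_size) are hypothetical.

```python
import numpy as np

def soft_threshold(w, tau):
    """Proximal operator of tau * ||w||_1 (closed-form handling of the non-smooth part)."""
    return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

def grad_logistic(w, X, y):
    """Average logistic-loss gradient over the rows of X (labels y in {-1, +1})."""
    z = -y * (X @ w)
    p = 1.0 / (1.0 + np.exp(-z))          # sigmoid(z)
    return X.T @ (-y * p) / X.shape[0]

def variance_reduced_step(w, w_snapshot, full_grad_snapshot, X, y, batch_idx, eta, lam):
    """One proximal step with a variance-reduced gradient estimate.

    The correction term uses only the mini-batch batch_idx, so the per-iteration
    cost stays small while the estimator's variance shrinks as w approaches w_snapshot.
    """
    Xb, yb = X[batch_idx], y[batch_idx]
    g = grad_logistic(w, Xb, yb) - grad_logistic(w_snapshot, Xb, yb) + full_grad_snapshot
    return soft_threshold(w - eta * g, eta * lam)

# --- tiny usage example on synthetic data ---
rng = np.random.default_rng(0)
n, d = 500, 20
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

w = np.zeros(d)
eta, lam, batch_size = 0.1, 1e-3, 10
for epoch in range(20):
    w_snapshot = w.copy()
    full_grad_snapshot = grad_logistic(w_snapshot, X, y)   # one full gradient per epoch
    for _ in range(n // batch_size):
        batch_idx = rng.choice(n, size=batch_size, replace=False)
        w = variance_reduced_step(w, w_snapshot, full_grad_snapshot,
                                  X, y, batch_idx, eta, lam)

print("final objective:",
      np.mean(np.log1p(np.exp(-y * (X @ w)))) + lam * np.abs(w).sum())
```

The point mirrored from the abstract is that each inner step touches only the mini-batch batch_idx, while the correction grad_f_B(w) - grad_f_B(w_snapshot) + grad_F(w_snapshot) drives the variance of the gradient estimate toward zero as w approaches the snapshot.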

Cite this article:

朱小辉, 陶卿, 邵言剑, 储德军. Stochastic optimization algorithm with variance reduction for solving non-smooth problems. 软件学报 (Journal of Software), 2015, 26(11): 2752-2761.

History
  • Received: 2015-02-26
  • Revised: 2015-08-26
  • Accepted:
  • Published online: 2015-11-04
  • Published in print: