Abstract: Stochastic optimization algorithms are essential for handling large-scale data and complex models in machine learning. Among them, variance reduction methods such as the STORM algorithm have attracted attention for achieving the optimal convergence rate of $\mathrm{O}(T^{-1/3})$. However, traditional variance reduction methods typically require specific problem parameters (e.g., the smoothness constant, the noise variance, and an upper bound on the gradient norm) to set the learning rate and momentum, which limits their practical applicability. To overcome this limitation, this study proposes an adaptive variance reduction method based on a normalization technique, which eliminates the need for prior knowledge of problem parameters while maintaining the optimal convergence rate. Compared with existing adaptive variance reduction methods, the proposed approach offers several advantages: (1) it relies on no additional assumptions, such as bounded gradients, bounded function values, or an excessively large initial batch size; (2) it achieves the optimal convergence rate of $\mathrm{O}(T^{-1/3})$ without an extra $\mathrm{O}(\log T)$ factor; (3) its proof is concise and straightforward, facilitating extensions to other stochastic optimization problems. Numerical experiments further validate the proposed method, demonstrating improved performance compared to other approaches.
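To make the described update concrete, below is a minimal sketch of a STORM-style recursive-momentum estimator combined with a normalized step, in the spirit of the abstract. The function name `normalized_storm`, the oracle interface `grad(x, seed)`, and the $T$-only parameter choices $\eta = \beta = T^{-2/3}$ are illustrative assumptions for exposition, not the paper's exact schedule.

```python
import numpy as np

def normalized_storm(grad, x0, T, eta=None, beta=None, rng=None):
    """Illustrative sketch of normalized STORM (assumed interface, not the
    paper's exact algorithm).

    grad(x, seed): returns a stochastic gradient estimate at x; calling it
    twice with the same seed reuses the same sample, as STORM requires.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Assumed T-only schedules: no smoothness constant, noise variance,
    # or gradient bound is needed to set the step size or momentum.
    eta = T ** (-2.0 / 3.0) if eta is None else eta
    beta = T ** (-2.0 / 3.0) if beta is None else beta

    x_prev = x0.copy()
    d = grad(x_prev, int(rng.integers(1 << 31)))      # initial estimator: plain stochastic gradient
    x = x_prev - eta * d / max(np.linalg.norm(d), 1e-12)  # normalized step

    for _ in range(1, T):
        seed = int(rng.integers(1 << 31))             # same sample at x and x_prev
        g_new, g_old = grad(x, seed), grad(x_prev, seed)
        d = g_new + (1.0 - beta) * (d - g_old)        # STORM recursive momentum
        x_prev = x
        x = x - eta * d / max(np.linalg.norm(d), 1e-12)  # normalized step
    return x
```

The design intuition is that dividing the update by $\|d_t\|$ fixes the step length at $\eta$ regardless of the gradient scale, which is why the step size can be chosen from $T$ alone rather than from problem parameters.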