
arXiv:2605.04711v1. Abstract: Optimizer states occupy massive GPU memory in large-

By Codcompass Team · 4 min read

Budget-Aware Optimizer Configurator (BAOC): Block-Level Memory Optimization for Large-Scale Training

Current Situation Analysis

Large-scale model training (vision, language, diffusion) faces a critical memory bottleneck: optimizer states (momentum, variance, and auxiliary buffers) typically consume 2×–3× the memory footprint of model parameters. Traditional training pipelines apply a global optimizer configuration uniformly across all network blocks, assuming homogeneous gradient dynamics. This approach fails because gradients exhibit distinct block-level behaviors, including varying directional stability and scale anisotropy. Early layers often stabilize quickly with low gradient variance, while later layers (e.g., attention heads, output projections) maintain high directional volatility and require precise state tracking. Applying expensive, full-precision optimizer states to stable blocks results in severe memory inefficiency, while naive global reduction (e.g., uniform FP16 states or momentum removal) triggers convergence degradation in sensitive blocks. The fundamental failure mode lies in treating optimizer allocation as a monolithic hyperparameter rather than a resource-constrained, block-aware optimization problem.
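
To make the 2×–3× figure concrete, here is a minimal back-of-the-envelope sketch. It assumes an Adam-style optimizer that keeps a fixed number of state values per parameter (two for momentum and variance, three if one auxiliary buffer is added) and compares them against FP32 weights. The helper names and the 7B parameter count are hypothetical and chosen only for illustration.

```python
# Bytes per element for common storage dtypes.
BYTES = {"fp32": 4, "bf16": 2, "fp16": 2}


def param_bytes(num_params, dtype="fp32"):
    """Memory taken by the model weights themselves."""
    return num_params * BYTES[dtype]


def state_bytes(num_params, num_states=2, dtype="fp32"):
    """Memory taken by optimizer buffers holding `num_states` values per parameter."""
    return num_params * num_states * BYTES[dtype]


n = 7_000_000_000  # hypothetical 7B-parameter model

weights = param_bytes(n, "fp32")
adam = state_bytes(n, num_states=2, dtype="fp32")      # momentum + variance
adam_aux = state_bytes(n, num_states=3, dtype="fp32")  # plus one auxiliary buffer

ratio_adam = adam / weights
ratio_aux = adam_aux / weights

print(f"weights:            {weights / 2**30:.1f} GiB")
print(f"Adam states:        {adam / 2**30:.1f} GiB ({ratio_adam:.0f}x weights)")
print(f"Adam + aux states:  {adam_aux / 2**30:.1f} GiB ({ratio_aux:.0f}x weights)")
```

Under these assumptions the state buffers alone cost two to three times the weights, which is why shrinking them for stable blocks is attractive.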

WOW Moment: Key Findings

Empirical profiling across vision (ViT), language (LLaMA-scale), and diffusion (Stable Diffusion) workloads reveals that block-level gradient heterogeneity can be quantified and leveraged for memory savings without sacrificing training quality. By sampling gradient streams and solving a constrained allocation problem, BAOC dynamically assigns budget-feasible configurations per block. The following table compares baseline approaches against BAOC under identical hardware constraints:

| Approach | Optimizer State
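
The sample-then-allocate idea described above can be sketched roughly as follows. This is not BAOC's actual solver, whose details are not given in this excerpt. It is an illustrative stand-in under several assumptions: directional stability is approximated by the mean cosine similarity between consecutive sampled gradients, the configuration menu `CONFIGS` and its per-parameter byte costs are hypothetical, and the constrained allocation is replaced by a simple greedy heuristic that upgrades the least stable blocks first.

```python
import math


def cosine(u, v):
    """Cosine similarity between two gradient vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def directional_stability(grad_samples):
    """Mean cosine similarity between consecutive sampled gradients of one block."""
    pairs = list(zip(grad_samples, grad_samples[1:]))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Hypothetical menu, cheapest first: (name, optimizer-state bytes per parameter).
CONFIGS = [("momentum_bf16", 2), ("adam_bf16", 4), ("adam_fp32", 8)]


def allocate(blocks, budget_bytes):
    """Greedy stand-in for the constrained allocation step.

    blocks maps block name -> (num_params, stability).
    Returns (plan, used_bytes), where plan maps block name -> config name.
    """
    level = {name: 0 for name in blocks}
    used = sum(n * CONFIGS[0][1] for n, _ in blocks.values())
    if used > budget_bytes:
        raise ValueError("budget too small even for the cheapest configuration")
    # Visit the least stable blocks first and give each the best config that fits.
    for name in sorted(blocks, key=lambda b: blocks[b][1]):
        n = blocks[name][0]
        for lvl in range(len(CONFIGS) - 1, 0, -1):
            extra = n * (CONFIGS[lvl][1] - CONFIGS[0][1])
            if used + extra <= budget_bytes:
                level[name] = lvl
                used += extra
                break
    return {name: CONFIGS[lvl][0] for name, lvl in level.items()}, used


# Toy gradient streams: one block barely changes direction, the other flips sign.
stable_grads = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.0]]
volatile_grads = [[1.0, 0.0], [-1.0, 0.0], [1.0, 0.0]]

blocks = {
    "early_block": (1000, directional_stability(stable_grads)),
    "late_block": (1000, directional_stability(volatile_grads)),
}
plan, used = allocate(blocks, budget_bytes=10_000)
print(plan, used)
```

With this toy budget, only the volatile block can afford the full-precision configuration, while the stable block stays on the cheapest one. A real configurator would also need to weigh scale anisotropy and the convergence cost of each downgrade, which this sketch ignores.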
