Weight Decay

August 23, 2025

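Regularization technique that penalizes large parameter values to reduce overfitting. Under plain SGD it is equivalent to adding an L2 penalty to the loss; modern optimizers like AdamW decouple the decay from the gradient-based update, shrinking the weights directly instead of folding the penalty into the gradient.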
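
To make the coupled versus decoupled distinction concrete, here is a minimal sketch of a single Adam-style update on one scalar weight. The function name `adam_step`, its parameters, and the default hyperparameters are illustrative assumptions, not any library's API. The decoupled branch follows the common convention of scaling the decay by the learning rate.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=True):
    """One Adam update on a scalar weight (hypothetical helper for illustration)."""
    if not decoupled:
        # Coupled (classic L2): the penalty gradient weight_decay * w is added
        # to the loss gradient, so the adaptive denominator rescales it too.
        grad = grad + weight_decay * w

    # Standard Adam moment estimates with bias correction.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)

    if decoupled:
        # Decoupled (AdamW-style): shrink the weight directly, outside the
        # adaptive step, so the decay is not rescaled by the moment estimates.
        w_new = w_new - lr * weight_decay * w

    return w_new, m, v

# Zero loss gradient isolates the effect of the decay term alone.
w_decoupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.01, decoupled=True)
w_coupled, _, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, weight_decay=0.01, decoupled=False)
print(w_decoupled, w_coupled)
```

With a zero loss gradient, the decoupled step shrinks the weight by `lr * weight_decay * w`. The coupled step instead passes the penalty through Adam's normalization, so on this first step the weight moves by roughly `lr` regardless of how small `weight_decay` is. That interaction is what decoupling is meant to avoid.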