Gradient Checkpointing

August 22, 2025

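Gradient checkpointing is a memory‑saving training technique: instead of keeping every intermediate activation from the forward pass, it stores only a subset (the checkpoints) and recomputes the rest during the backward pass. This trades extra compute for a smaller memory footprint, which lets larger models or batches fit on the same hardware.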
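To make the trade-off concrete, here is a minimal pure-Python sketch rather than any framework's implementation. It assumes a toy chain of scalar layers `y = tanh(w * x)`; the names `layer_forward`, `layer_backward`, `grad_plain`, `grad_checkpointed`, and the `segment` parameter are all hypothetical and chosen only for illustration. Both functions return the same gradient with respect to the input, but the checkpointed version holds fewer values at once.

```python
import math

# Toy layer (an assumption for illustration): y = tanh(w * x).
def layer_forward(w, x):
    return math.tanh(w * x)

# Local derivative: dy/dx = w * (1 - y^2), chained with the incoming gradient.
def layer_backward(w, x, grad_out):
    y = math.tanh(w * x)
    return grad_out * w * (1.0 - y * y)

def grad_plain(weights, x):
    """Store every layer input, then backpropagate.

    Returns (gradient w.r.t. the input, number of stored values)."""
    inputs = []
    for w in weights:
        inputs.append(x)
        x = layer_forward(w, x)
    grad = 1.0  # d(output)/d(output)
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        grad = layer_backward(w, x_in, grad)
    return grad, len(inputs)

def grad_checkpointed(weights, x, segment=4):
    """Store only the input of each segment; recompute the rest in backward.

    Returns (gradient w.r.t. the input, peak number of stored values),
    counting the checkpoints plus one recomputed segment."""
    checkpoints = []  # (index of first layer in segment, input to that layer)
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints.append((i, x))
        x = layer_forward(w, x)

    grad = 1.0
    peak = len(checkpoints)
    for start, x_seg in reversed(checkpoints):
        seg_weights = weights[start:start + segment]
        # Recompute this segment's layer inputs from its checkpoint.
        inputs = []
        x_cur = x_seg
        for w in seg_weights:
            inputs.append(x_cur)
            x_cur = layer_forward(w, x_cur)
        peak = max(peak, len(checkpoints) + len(inputs))
        # Backpropagate through the segment, then drop its inputs.
        for w, x_in in zip(reversed(seg_weights), reversed(inputs)):
            grad = layer_backward(w, x_in, grad)
    return grad, peak

if __name__ == "__main__":
    weights = [0.5 + 0.1 * i for i in range(16)]
    print(grad_plain(weights, 0.3))
    print(grad_checkpointed(weights, 0.3, segment=4))
```

The cost is roughly one extra forward pass through each segment. In practice you would not write this by hand: frameworks expose the technique directly, for example `torch.utils.checkpoint.checkpoint` in PyTorch and `jax.checkpoint` in JAX.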