Compressing model weights and activations to lower numerical precision, for example 8-bit or 4-bit integers, to reduce memory use and speed up inference.
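A minimal sketch of the idea, assuming NumPy and simple symmetric per-tensor int8 quantization (illustrative only, not any particular library's API):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Map float weights to int8 via a single scale factor (symmetric, per-tensor).
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate float values from the int8 representation.
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# int8 storage is 4x smaller than float32; rounding error per element
# is bounded by about half the scale.
print(np.max(np.abs(weights - restored)))
```

Real systems refine this with per-channel scales, zero points for asymmetric ranges, and calibration or quantization-aware training to limit accuracy loss.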