Scalar step size for gradient updates; too high diverges, too low slows training. ← Layer 1 (L1) Keccak‑256 →