Alignment method in which a reward model is trained on human preference comparisons, and the base model is then fine‑tuned with reinforcement learning against that reward signal to produce the preferred behavior.
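A minimal sketch of the first stage, assuming a Bradley‑Terry style pairwise objective (a common but not the only choice): the reward model is trained so that the human‑preferred response scores higher than the rejected one. The tensors and function names below are illustrative, not from any specific library.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this pushes the reward model to assign a higher score
    to the response humans preferred in each comparison.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical batch of reward-model scores for (chosen, rejected) pairs
chosen = torch.tensor([1.2, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.9, 1.5])
loss = preference_loss(chosen, rejected)  # smaller when chosen outranks rejected
```

In the second stage, the trained reward model scores the base model's outputs, and a reinforcement learning algorithm (PPO is a common choice) updates the base model to increase that score, typically with a penalty that keeps it close to its original behavior.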