Model trained to score outputs according to preferences or policies, used to guide reinforcement learning or re‑ranking of generations. ← Zero‑Knowledge Proof Account Abstraction →