Learning paradigm where an agent interacts with an environment and optimizes actions to maximize cumulative reward over time.
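
As a rough illustration of this definition, the sketch below runs tabular Q-learning on a toy five-cell corridor. Q-learning is only one of many reinforcement learning algorithms and is used here for brevity. The corridor environment, the function names (`step`, `choose_action`, `train`), and all hyperparameters are assumptions made for this example, not part of the entry.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (hypothetical toy setup)
ACTIONS = (0, 1)      # 0 = step left, 1 = step right


def step(state, action):
    """Toy environment: apply an action, return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties at random so the untrained agent does not favor one direction.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values from interaction; returns the Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _episode in range(episodes):
        state = 0
        for _t in range(100):  # cap episode length
            action = choose_action(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Q-learning target: immediate reward plus discounted best future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q_table = train()
# Greedy action in each non-terminal cell after training.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

The agent only ever sees states, actions, and rewards. The discount factor `gamma` is what turns the single reward at the goal into a cumulative, time-sensitive objective that favors reaching the goal sooner.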