Reinforcement learning architecture with a policy network (actor) and a value estimator (critic) that learn together.
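As a rough illustration of how the two parts can learn together, here is a minimal sketch of a one-step actor-critic update. It is not a reference implementation. To stay self-contained it swaps the neural networks for a few tabular parameters, and the task (one state, two actions, where action 1 pays 1.0 and action 0 pays 0.0) is invented purely for this example, as are the names `train_actor_critic`, `prefs`, and `value`.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train_actor_critic(steps=2000, actor_lr=0.1, critic_lr=0.1, seed=0):
    """One-step actor-critic on a hypothetical one-state, two-action task.

    Action 1 pays reward 1.0 and action 0 pays 0.0 (an assumption made
    purely for illustration). Each episode lasts a single step.
    """
    rng = random.Random(seed)
    prefs = [0.0, 0.0]  # actor parameters: one preference per action
    value = 0.0         # critic parameter: estimated value of the state

    for _ in range(steps):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = float(action)

        # TD error; the next state is terminal, so its value is 0.
        td_error = reward - value

        # Critic update: move the value estimate toward the observed reward.
        value += critic_lr * td_error

        # Actor update: policy-gradient step scaled by the critic's TD error.
        # For a softmax policy, d log pi(a) / d prefs[b] = 1[a == b] - probs[b].
        for b in range(len(prefs)):
            grad_log_pi = (1.0 if b == action else 0.0) - probs[b]
            prefs[b] += actor_lr * td_error * grad_log_pi

    return softmax(prefs), value
```

Calling `probs, value = train_actor_critic()` should leave most of the probability mass on action 1, with `value` tracking the reward the current policy tends to earn. The critic's TD error is the only learning signal the actor sees, which is the coupling the definition refers to.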