Reinforcement learning algorithm that stabilizes updates with clipped objectives, used in RLHF.
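
The "clipped objective" can be illustrated with a small sketch. It assumes the entry refers to the clipped surrogate objective of Proximal Policy Optimization (PPO), where the probability ratio between the new and old policy is clipped to the range [1 - epsilon, 1 + epsilon]. The function name `clipped_surrogate` and the default `epsilon=0.2` are illustrative choices, not taken from the entry.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    ratios:     new-policy probability / old-policy probability, per sample
    advantages: advantage estimates, per sample
    epsilon:    clip range (0.2 is an assumed, commonly used default)
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if not ratios:
        raise ValueError("batch must be non-empty")

    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - epsilon, 1 + epsilon].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


# A large ratio with a positive advantage is capped at (1 + epsilon) * advantage.
capped = clipped_surrogate([1.5], [1.0])
```

Taking the minimum of the two terms removes any incentive to push the ratio beyond the clip range in the direction the advantage favors, which is what keeps each update close to the previous policy.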