Alignment method in which a reward model is trained on human preference comparisons, and the base model is then fine‑tuned with reinforcement learning against that reward signal to produce the preferred behavior.
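A minimal sketch of the first stage, assuming a Bradley‑Terry style pairwise objective (a common but not the only choice): the reward model is trained so that the human‑preferred response scores higher than the rejected one. The tensors and function names below are illustrative, not from any specific library.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).

    Minimizing this pushes the reward model to assign a higher score
    to the response humans preferred in each comparison.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Hypothetical batch of reward-model scores for (chosen, rejected) pairs
chosen = torch.tensor([1.2, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.9, 1.5])
loss = preference_loss(chosen, rejected)  # smaller when chosen outranks rejected
```

In the second stage, the trained reward model scores the base model's outputs, and a reinforcement learning algorithm (PPO is a common choice) updates the base model to increase that score, typically with a penalty that keeps it close to its original behavior.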