Model Alignment
Alignment is the step that makes a model follow chat prompts instead of just auto-completing paragraphs. We teach the model to choose preferred responses over rejected responses.
Why is DPO simpler than RLHF?
DPO mathematically bypasses the need to train a separate reward model or use PPO reinforcement learning.
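To make the card's answer concrete, here is a minimal sketch of the DPO loss for a single preference pair. It assumes you already have the summed token log-probabilities of the preferred and rejected responses under both the model being trained and a frozen reference copy. The function name `dpo_loss`, the helper `_softplus`, and the default `beta=0.1` are illustrative choices, not part of the lesson, and a real training run would compute this on batched tensors in a deep-learning framework rather than on Python floats.

```python
import math


def _softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (preferred, rejected) response pair.

    Inputs are summed token log-probabilities of each response.
    beta (an assumed default) controls how strongly the policy is
    pushed away from the frozen reference model.
    """
    # Implicit reward: how much more likely the policy makes a response
    # than the reference does. No separate reward model is involved.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Loss = -log(sigmoid(logits)), which equals softplus(-logits).
    logits = beta * (chosen_margin - rejected_margin)
    return _softplus(-logits)


if __name__ == "__main__":
    # Policy identical to the reference: loss is log(2).
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
    # Policy favors the preferred response more than the reference does: lower loss.
    print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

The loss is an ordinary classification-style objective that can be minimized with standard gradient descent, which is why no reward model or PPO loop is needed. When the policy matches the reference the loss is log 2 (about 0.693), and it falls as the policy assigns relatively more probability to the preferred response.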