RLHF
Full Form: Reinforcement Learning from Human Feedback
Category: AI Techniques
📖 Definition
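RLHF fine-tunes an AI model using human preference data. Human raters compare or score model outputs, a reward model is typically trained to predict those judgments, and the model is then optimized with reinforcement learning to produce outputs that score well. The aim is to bring the model's behavior closer to what people actually want.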
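The "learn what humans prefer" step can be illustrated with a toy reward model. The sketch below is a minimal illustration, not any lab's actual pipeline: it assumes each output has already been reduced to two hand-made, hypothetical features (a helpfulness score and a rudeness score), fits a linear reward with a Bradley-Terry style pairwise loss, and leaves out the reinforcement learning step (often PPO in practice) that would then tune the model against this reward.

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss -log(sigmoid(margin)); small when the chosen output scores higher."""
    margin = reward_chosen - reward_rejected
    # Two algebraically equal forms, picked to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def reward(weights, features):
    """Linear reward: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_weights(pairs, dim, lr=0.5, epochs=200):
    """Fit reward weights from (chosen_features, rejected_features) pairs by gradient descent."""
    weights = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            diff = [c - r for c, r in zip(chosen, rejected)]
            margin = reward(weights, diff)
            # Gradient of the loss w.r.t. weights is -(1 - sigmoid(margin)) * diff.
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            weights = [w + lr * scale * d for w, d in zip(weights, diff)]
    return weights

# Hypothetical toy data: features are [helpfulness, rudeness].
# In each pair, the first item is the output the human rater preferred.
toy_pairs = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.8, 0.1], [0.3, 0.6]),
]
toy_weights = train_reward_weights(toy_pairs, dim=2)
```

After training, `reward(toy_weights, chosen)` is higher than `reward(toy_weights, rejected)` for each toy pair, which is the signal a later reinforcement learning stage would push the model toward.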
🔑 Key Points
- Uses human judgments of model outputs as the training signal
- Key technique for making AI helpful and harmless
- Used in training ChatGPT, Claude, and many other modern chatbots
- Collecting human preference data can be expensive and time-consuming
💡 Why It Matters
RLHF is one of the main ways AI labs make their models more helpful and better aligned with human values. That makes it central to building AI that people want to use.