Stickipedia University

Reinforcement Learning from Human Feedback (RLHF) is a key training technique that helped turn a raw pretrained language model into the helpful, conversational assistant that ChatGPT users interact with today.

How It Works

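  1. Human raters rank several model responses to the same prompt from best to worst
  2. A separate reward model is trained on those rankings to predict which responses humans prefer
  3. The language model is fine-tuned with reinforcement learning to maximize the reward model's score (OpenAI used Proximal Policy Optimization, PPO)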
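The three steps can be sketched end to end on a deliberately tiny toy problem. This is a minimal illustration under stated assumptions, not OpenAI's implementation: the four candidate responses, their two numeric features, the linear reward model and every hyperparameter (including `beta`) are hypothetical. Step 2 fits the reward model with a pairwise loss, -log sigmoid(r_preferred - r_rejected). Step 3 pushes a softmax "policy" toward high-reward responses while subtracting a KL penalty that keeps it close to its starting distribution, a regularizer commonly used in RLHF. Because the toy has only four possible outputs, step 3 computes the policy gradient exactly rather than sampling responses and running PPO as a real system would.

```python
import math

# Hypothetical toy data: four candidate responses to one prompt, each
# described by two made-up numeric features in [0, 1].
FEATURES = [
    [0.9, 0.3],
    [0.6, 0.8],
    [0.2, 0.5],
    [0.1, 0.9],
]

# Step 1: a hypothetical rater's ranking of those responses, best first.
RANKING = [0, 1, 2, 3]


def ranking_to_pairs(ranking):
    """Expand a best-to-worst ranking into (preferred, rejected) index pairs."""
    return [(ranking[i], ranking[j])
            for i in range(len(ranking))
            for j in range(i + 1, len(ranking))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(w, x):
    """Linear reward model (an assumption for this sketch): score = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_reward_model(features, pairs, lr=0.5, steps=2000):
    """Step 2: minimise the mean of -log sigmoid(r(preferred) - r(rejected))."""
    w = [0.0] * len(features[0])
    for _ in range(steps):
        grad = [0.0] * len(w)
        for good, bad in pairs:
            margin = reward(w, features[good]) - reward(w, features[bad])
            # d/dw of -log sigmoid(margin) is -(1 - sigmoid(margin)) * (x_good - x_bad)
            coeff = -(1.0 - sigmoid(margin))
            for k in range(len(w)):
                grad[k] += coeff * (features[good][k] - features[bad][k])
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimise_policy(rewards, ref_logits, beta=0.5, lr=0.5, steps=300):
    """Step 3: gradient ascent on E_p[r] - beta * KL(p || p_ref) over softmax logits."""
    ref = softmax(ref_logits)
    logits = list(ref_logits)  # start from the reference ("pre-RLHF") policy
    for _ in range(steps):
        p = softmax(logits)
        # Each response's reward minus its per-response KL penalty term.
        shaped = [r - beta * math.log(pi / qi) for r, pi, qi in zip(rewards, p, ref)]
        baseline = sum(pi * s for pi, s in zip(p, shaped))
        # Exact gradient of the objective with respect to each softmax logit.
        logits = [z + lr * pi * (s - baseline) for z, pi, s in zip(logits, p, shaped)]
    return softmax(logits)


pairs = ranking_to_pairs(RANKING)
w = train_reward_model(FEATURES, pairs)
scores = [reward(w, x) for x in FEATURES]
policy = optimise_policy(scores, ref_logits=[0.0] * len(FEATURES))
print("reward weights:", [round(v, 3) for v in w])
print("reward scores: ", [round(s, 3) for s in scores])
print("tuned policy:  ", [round(p, 3) for p in policy])
```

In this toy the learned reward scores follow the rater's ranking, and the tuned policy moves most of its probability onto the top-ranked response. Raising `beta` strengthens the KL penalty and keeps the policy closer to its uniform starting point.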

This is why ChatGPT feels more like a helpful assistant than a raw text predictor.

