Reinforcement Learning Fine-Tuning Technique by OpenAI
OpenAI just announced #Reinforcement Fine-Tuning and demoed it live.
But what is Reinforcement Fine-Tuning (RFT)?
RFT is a new model customization technique that lets users fine-tune OpenAI’s models (specifically the #O1 series) with reinforcement learning rather than traditional supervised fine-tuning. The key difference: supervised fine-tuning teaches a model to mimic the example outputs it is shown, while RFT teaches it to develop new reasoning capabilities over custom domains.
𝐇𝐨𝐰 𝐢𝐭 𝐰𝐨𝐫𝐤𝐬:
1. 𝘋𝘢𝘵𝘢 𝘗𝘳𝘦𝘱𝘢𝘳𝘢𝘵𝘪𝘰𝘯:
- Users provide a dataset in JSONL format
- Each entry contains:
- Input data (e.g., a problem or case)
- Instructions for the model
 - The correct answer (used for grading, not shown to the model during training)
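A minimal sketch of what one JSONL training entry might look like. The field names here are illustrative assumptions, not OpenAI’s exact schema:

```python
import json

# Hypothetical training example; field names are illustrative,
# not OpenAI's actual RFT schema.
example = {
    "case_report": "Patient presents with hearing loss and renal cysts; no vision loss.",
    "instructions": "List candidate genes ranked by likelihood, with reasoning.",
    "correct_answer": ["PKD1", "PKD2"],  # used only by the grader, never shown to the model
}

# JSONL = one JSON object per line
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```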
2. 𝘎𝘳𝘢𝘥𝘪𝘯𝘨 𝘚𝘺𝘴𝘵𝘦𝘮:
- Users define or use pre-built “graders”
- Graders score model outputs from 0 to 1
- Scores can be binary or partial credit
- The grading helps reinforce correct reasoning patterns
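To make the binary vs. partial-credit distinction concrete, here is a toy sketch of the two grader styles (these are illustrative stand-ins, not OpenAI’s pre-built graders):

```python
def binary_grader(answer: str, correct: str) -> float:
    """All-or-nothing: 1.0 for an exact match, else 0.0."""
    return 1.0 if answer.strip().lower() == correct.strip().lower() else 0.0

def partial_credit_grader(answers: list[str], correct: list[str]) -> float:
    """Partial credit: fraction of the expected items the model produced (0.0-1.0)."""
    if not correct:
        return 0.0
    hits = sum(1 for a in answers if a in correct)
    return min(hits / len(correct), 1.0)
```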
3. 𝘛𝘳𝘢𝘪𝘯𝘪𝘯𝘨 𝘗𝘳𝘰𝘤𝘦𝘴𝘴:
- When the model sees a problem, it’s given space to think
- The model’s answer is graded
- Reinforcement learning algorithms:
- Reinforce thinking patterns that led to correct answers
- Disincentivize patterns that led to incorrect answers
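The loop above can be illustrated with a toy example. Real RFT updates the model’s weights with a policy-gradient method; this sketch only reweights two hypothetical “reasoning strategies” by their grades to show the reinforce/disincentivize dynamic:

```python
import random

# Toy illustration of the RFT loop, not OpenAI's algorithm:
# strategies that earn high grades gain weight, low-graded ones lose it.
strategies = {"symptom_match": 1.0, "random_guess": 1.0}

def grade(strategy: str) -> float:
    # Stand-in grader: only the sound strategy earns credit.
    return 1.0 if strategy == "symptom_match" else 0.0

random.seed(0)
lr = 0.1
for _ in range(200):
    # Sample a strategy in proportion to its current weight
    total = sum(strategies.values())
    r, chosen = random.uniform(0, total), None
    for name, w in strategies.items():
        r -= w
        if r <= 0:
            chosen = name
            break
    chosen = chosen or name
    score = grade(chosen)
    # Reinforce above-average scores, decay below-average ones
    strategies[chosen] *= 1 + lr * (score - 0.5)

print(max(strategies, key=strategies.get))  # the graded-up strategy wins out
```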
𝐊𝐞𝐲 𝐁𝐞𝐧𝐞𝐟𝐢𝐭𝐬:
- Can achieve results with very small datasets (as few as a few dozen examples)
- Allows models to learn new reasoning capabilities, not just mimicry
- Can make smaller models perform better than larger base models
- Works especially well for tasks requiring deep expertise
The example in the demo is genetic disease diagnosis:
𝘐𝘯𝘱𝘶𝘵:
- Patient case reports containing:
- Symptoms present
- Symptoms absent
- Patient history
𝘛𝘢𝘴𝘬:
- Identify genes potentially responsible for genetic diseases
- Rank genes by likelihood
- Provide reasoning for selections
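Because the task asks for a ranked list rather than a single answer, a natural grader for it would give credit that decays with the causal gene’s rank. A hedged sketch of such a grader (my illustration, not the one used in the demo):

```python
def rank_grader(ranked_genes: list[str], correct_gene: str) -> float:
    """Reciprocal-rank credit: 1.0 if the causal gene is ranked first,
    0.5 if second, and so on; 0.0 if it is missing entirely."""
    try:
        return 1.0 / (ranked_genes.index(correct_gene) + 1)
    except ValueError:
        return 0.0
```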
𝘙𝘦𝘴𝘶𝘭𝘵𝘴 𝘴𝘩𝘰𝘸𝘯 𝘪𝘯 𝘵𝘩𝘦 𝘥𝘦𝘮𝘰:
- Base O1-mini: 17.7% accuracy
- Base O1: 25% accuracy
- RFT-trained O1-mini: 31% accuracy
The impressive part: after reinforcement fine-tuning, the smaller model (O1-mini) outperformed the larger base model (O1).
This technique represents a significant advancement in model customization, allowing organizations to create highly specialized AI models for complex domain-specific tasks while requiring relatively small amounts of training data.