A rough mental model of a reinforcement learning curriculum.

2026-02-16 · Bits and Bobs 2/16/26

- You want to give examples at the right time to the model as it's learning.
  - If the example is too hard too early, it confuses the model.
  - You want to stay in its zone of proximal development.
- Generally you want to segment training examples by difficulty.
- At the beginning, the mix has lots of easy examples and few hard ones.
- Then shift the mix based on your surprisal: how far the model's results land from what you expected.
  - This is a multi-armed bandit optimization.
- "I don't think a hard one would work… oh, it does? Add more hard ones in now."
- "I do think this easy one works, so don't include it. Oh, it doesn't? Increase the mix of easy ones."
- Surfing along your edge of surprisal.
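
The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not an established algorithm: the class name `SurprisalCurriculum`, the `prior`, `lr`, and `floor` parameters, and the choice of an exponential moving average are all assumptions made for the example. Each difficulty bucket keeps a predicted success rate; surprise is the gap between that prediction and what actually happened, and a bucket's share of the mix drifts toward its recent surprise.

```python
import random


class SurprisalCurriculum:
    """Hypothetical sampler: a bucket's share of the mix follows its recent surprise."""

    def __init__(self, initial_mix, prior=None, lr=0.2, floor=0.05, seed=0):
        # initial_mix: starting weight per bucket, e.g. lots of easy, few hard.
        self.weights = dict(initial_mix)
        # prior: how likely we think the model is to succeed on each bucket.
        # Buckets without a stated prior start at 0.5 ("no idea yet").
        prior = prior or {}
        self.predicted = {name: prior.get(name, 0.5) for name in initial_mix}
        self.lr = lr        # how fast predictions and weights move (assumed value)
        self.floor = floor  # keeps every bucket sampled at least occasionally
        self.rng = random.Random(seed)

    def mix(self):
        """Return the current mix as normalized fractions."""
        total = sum(self.weights.values())
        return {name: w / total for name, w in self.weights.items()}

    def sample(self):
        """Pick the bucket to draw the next training example from."""
        names = list(self.weights)
        return self.rng.choices(names, weights=[self.weights[n] for n in names])[0]

    def update(self, bucket, succeeded):
        """Record an outcome and shift the mix toward surprising buckets."""
        outcome = 1.0 if succeeded else 0.0
        surprise = abs(outcome - self.predicted[bucket])
        # Move the prediction toward what actually happened.
        self.predicted[bucket] += self.lr * (outcome - self.predicted[bucket])
        # Move the bucket's weight toward its surprise (exponential moving average).
        blended = (1 - self.lr) * self.weights[bucket] + self.lr * surprise
        self.weights[bucket] = max(self.floor, blended)
        return surprise


curriculum = SurprisalCurriculum(
    initial_mix={"easy": 0.8, "hard": 0.2},
    prior={"easy": 0.9, "hard": 0.1},  # "easy works, hard won't"
)
curriculum.update("hard", succeeded=True)   # unexpected success: hard share grows
curriculum.update("easy", succeeded=True)   # expected success: easy share shrinks
curriculum.update("easy", succeeded=False)  # unexpected failure: easy share grows
```

An unexpected success on a hard example and an unexpected failure on an easy one both produce high surprise, so both pull more of that bucket into the mix, while outcomes that match the prediction let the bucket fade toward the floor.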
