A rough mental model of curriculum design for reinforcement learning.
You want to give the model examples at the right time as it learns.
If an example is too hard too early, it confuses the model.
You want to stay in its zone of proximal development.
Generally you want to segment training by difficulty.
At the beginning you have lots of easy and few hard examples.
Then shift the mix based on your surprisal: how far the model's results land from what you predicted.
"I don't think hard would work… oh it does, add more hard in now".
"I do think this easy one works so don't include it. Oh it doesn't? Increase the mix of easy ones."
Surfing along your edge of surprisal.
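
One way to picture the loop above is a small sketch. Everything here is an assumption for illustration: the names `update_hard_fraction` and `sample_batch`, the fixed step size, and the clamping bounds are all hypothetical, and only the two surprise cases quoted above are handled.

```python
import random

def update_hard_fraction(hard_frac, predicted_pass, actual_pass, is_hard,
                         step=0.05, lo=0.05, hi=0.95):
    """Nudge the share of hard examples when a prediction turns out wrong.

    step, lo, and hi are illustrative values, not recommendations.
    """
    if predicted_pass == actual_pass:
        return hard_frac  # no surprise, keep the current mix
    if is_hard and actual_pass:
        hard_frac += step  # "oh it does": a hard example worked, add more hard
    elif not is_hard and not actual_pass:
        hard_frac -= step  # "oh it doesn't": an easy example failed, add more easy
    # Other surprises are left alone; the note does not say what to do with them.
    return min(hi, max(lo, hard_frac))

def sample_batch(easy, hard, hard_frac, batch_size, rng):
    """Draw a batch (with replacement) using the current easy/hard mix."""
    n_hard = round(batch_size * hard_frac)
    return rng.choices(hard, k=n_hard) + rng.choices(easy, k=batch_size - n_hard)

# Start with mostly easy examples, then adjust after a surprising result.
rng = random.Random(0)
hard_frac = 0.1
hard_frac = update_hard_fraction(hard_frac, predicted_pass=False,
                                 actual_pass=True, is_hard=True)
batch = sample_batch(["easy_task"], ["hard_task"], hard_frac, 20, rng)
```

A fixed step keeps the sketch simple; scaling the step by how confident the prediction was would track surprisal more closely.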