AlphaZero showed how much learning could happen if you had a rigorous ground truth system.

· Bits and Bobs 1/27/25
  • AlphaZero showed how much learning could happen if you had a rigorous ground truth system.
    • Games like Go have rigid, clear rules about what moves are legal and what constitutes a win.
      • The ground truth can be applied without a human in the loop, because the rules are black and white and easy to model in a computer with full fidelity.
    • That means if you set up a co-evolutionary loop, you can pour extraordinary amounts of compute into it and it will get better and better, with no humans in the loop (see the toy sketch after this list).
      • A self-catalyzing infinite stream of training data.
    • That hasn't worked for things like reasoning yet because there's no ground truth you can efficiently compare against.
    • But GPT-4-class models are now a commodity, and there are a number of open-weights versions.
    • Those models can act as the "ground truth" for other models to bootstrap off of (see the second sketch below)… it just requires conveniently ignoring the license of the open-weights models.
    • Some of the recent breakthroughs likely happened in this way.
      • "We just released an MIT-licensed Llama-derived model."
      • "Wait, what?"[adq]
    • It's impossible to imagine this being stopped; the technique is too powerful, and it's too easy for someone to have a licensing "oopsie".
      • Once the weights are published, there's no taking them back.
      • By the time the original publisher of the infringing model is taken down (which might take a long time, especially if they're international), that derivative model has been picked up by the swarm.
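
Here's the toy sketch referenced above: a minimal, illustrative self-play loop in Python. The game (Nim), the tabular policy, and the multiplicative reinforcement rule are all assumptions made up for this example, not anything from a real AlphaZero codebase; the point is just that when the referee is pure code, every episode is free labeled training data.

```python
# Toy self-play loop: a programmatic referee (the rules of Nim) acts as
# ground truth, so the loop can mint unlimited training data with no
# human in the loop. Illustrative only, not a real AlphaZero setup.
import random
from collections import defaultdict

PILE, MAX_TAKE = 21, 3  # Nim: take 1-3 stones; taking the last stone wins.

# policy[stones][take] = preference weight; starts uniform.
policy = defaultdict(lambda: {t: 1.0 for t in range(1, MAX_TAKE + 1)})

def legal_moves(stones):
    """Ground truth: the rules are black and white and fully modelable."""
    return [t for t in range(1, MAX_TAKE + 1) if t <= stones]

def choose(stones):
    moves = legal_moves(stones)
    weights = [policy[stones][t] for t in moves]
    return random.choices(moves, weights=weights)[0]

def self_play_episode():
    """Play one game against ourselves; return each player's moves and the winner."""
    stones, player, history = PILE, 0, ([], [])
    while True:
        take = choose(stones)
        history[player].append((stones, take))
        stones -= take
        if stones == 0:
            return history, player  # this player took the last stone and wins
        player = 1 - player

for _ in range(50_000):  # pour compute in; every episode is free labeled data
    history, winner = self_play_episode()
    for stones, take in history[winner]:      # reinforce winning moves
        policy[stones][take] *= 1.01
    for stones, take in history[1 - winner]:  # dampen losing moves
        policy[stones][take] *= 0.99

# With enough episodes this tends toward the game-theoretic optimum:
# leave a multiple of 4 stones, so from 21 the best move is to take 1.
best = max(policy[PILE], key=policy[PILE].get)
print(f"learned opening move from {PILE}: take {best} (optimal: 1)")
```

Note that the loop never consults a human: the rules function is the ground truth, and the only limit on data generation is compute.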
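And here is a hedged sketch of the bootstrapping move: treat a strong commodity model as the labeler and write its answers out as supervised data for a smaller student. The endpoint URL, model name, and prompts are hypothetical placeholders; the request shape follows the widely implemented OpenAI-compatible chat API, but check your own server's docs.

```python
# Sketch of teacher-as-ground-truth bootstrapping: sample answers from a
# strong commodity model and save them as supervised training data for a
# smaller student. URL, model name, and prompts are placeholders.
import json
import requests

TEACHER_URL = "http://localhost:8000/v1/chat/completions"  # assumed local server
TEACHER_MODEL = "some-open-weights-model"                  # placeholder name

def ask_teacher(prompt: str) -> str:
    """Treat the stronger model's answer as a training label."""
    resp = requests.post(TEACHER_URL, json={
        "model": TEACHER_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }, timeout=120)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

prompts = ["Explain why the sky is blue.", "Sum the integers 1 to 100."]

with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        answer = ask_teacher(prompt)
        # Each line becomes one supervised example for the student model.
        f.write(json.dumps({"prompt": prompt, "completion": answer}) + "\n")
```

Once a file like this exists, fine-tuning a student on it quietly launders the teacher's provenance, which is exactly the licensing "oopsie" described above.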
