AlphaZero showed how much learning is possible when you have a rigorous ground-truth system.
- Games like Go have rigid, clear rules about what moves are legal and what constitutes a win.
- The ground truth can be applied without a human in the loop, because the rules are black and white and can be modeled in a computer with full fidelity.
- That means if you set up a co-evolutionary loop, you can pour extraordinary amounts of compute into it and it will get better and better, with no humans in the loop.
- A self-catalyzing infinite stream of training data.
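The loop described above can be sketched in miniature. This is a toy illustration (tic-tac-toe with a random policy standing in for a learned one), not AlphaZero itself: the point is that the game's rules act as an exact, automatic referee, so labeled training data can be generated endlessly with no human in the loop.

```python
import random

# All eight winning lines on a 3x3 board.
LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    """Exact, rule-based referee: returns 'X', 'O', or None."""
    for a, b, c in LINES:
        if board[a] and board[a] == board[b] == board[c]:
            return board[a]
    return None

def random_policy(board, player, rng):
    """Stand-in for a learned policy: pick any legal move."""
    return rng.choice([i for i, cell in enumerate(board) if cell is None])

def self_play_game(policy, rng):
    """Play one game against itself; return the visited states and the result."""
    board, states = [None] * 9, []
    for turn in range(9):
        player = 'X' if turn % 2 == 0 else 'O'
        board[policy(board, player, rng)] = player
        states.append(tuple(board))
        if winner(board):
            return states, winner(board)
    return states, None  # draw

def generate_dataset(n_games, seed=0):
    """Label every state with the game's eventual outcome: +1 X win, -1 O win, 0 draw."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_games):
        states, result = self_play_game(random_policy, rng)
        label = {'X': 1, 'O': -1, None: 0}[result]
        data.extend((state, label) for state in states)
    return data
```

In a real co-evolutionary loop, a policy/value network would be trained on this data and then swapped in as the next round's `policy`, so each generation of data is produced by a stronger player than the last. That feedback is what makes the stream of training data self-catalyzing.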
- But GPT-4-class models are now a commodity, and a number of open-weights versions exist.
- Those models can act as the "ground truth" for other models to bootstrap off of… it just requires conveniently ignoring the license of the open-weights model.
- Some of the recent breakthroughs likely happened in this way.
- "We just released an MIT-licensed Llama-derived model."
- "Wait, what?"
- It's hard to imagine this being stopped: the technique is too powerful, and a licensing "oopsie" is too easy.
- Once the weights are published, there's no taking them back.
- By the time the original publisher of the infringing model is taken down (which might take a long time, especially if they're international), the derivative model has already been picked up by the swarm.