Normal LLMs are like the brain's System 1.
That is, highly parallel, vibes-matching from past experience.
OpenAI's o1 (codenamed Strawberry) is different: like the brain's System 2.
Heavy, expensive machinery for general purpose problem solving.
Some problems are one-ply problems.
These are problems where the right expert could answer from the gut and be right.
Normal LLMs are great at these.
Some problems are inherently multi-ply problems.
Even experts in the field would have to sit down and think it through.
This is where something like o1 is useful.
A normal LLM generates output by YOLOing tokens, one after another.
If its gut answer for an early token is wrong, everything built on top of it is trash too.
o1, by contrast, can recognize it made an error and revise.
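A toy sketch of the difference (not a real LLM; the "model" is just a lookup table with one deliberately bad gut answer baked in, and the verifier is a hypothetical checker):

```python
# gut_answer maps a partial transcript to the next "token".
# The first entry is wrong on purpose (6 * 7 is 42, not 36);
# every later entry is the correct continuation of whatever came before.
gut_answer = {
    (): "6*7=36",            # bad gut reaction
    ("6*7=36",): "36+8=44",  # locally fine, but built on the bad premise
    ("6*7=42",): "42+8=50",  # continuation of the corrected premise
}

def yolo_decode():
    """One pass, never looking back: each token conditions on the ones before."""
    out = []
    for _ in range(2):
        out.append(gut_answer[tuple(out)])
    return out

def verify(tok):
    """Hypothetical step-checker: does the arithmetic in a token hold up?"""
    lhs, rhs = tok.split("=")
    return eval(lhs) == int(rhs)

def decode_with_revision():
    """Emit a token, check it, and redo the step if the check fails."""
    out = []
    for _ in range(2):
        tok = gut_answer[tuple(out)]
        if not verify(tok):  # the "wait, that's wrong" moment
            lhs = tok.split("=")[0]
            tok = lhs + "=" + str(eval(lhs))  # redo this step before moving on
        out.append(tok)
    return out

print(yolo_decode())          # ['6*7=36', '36+8=44'] -- garbage in, garbage out
print(decode_with_revision())  # ['6*7=42', '42+8=50']
```

The point of the sketch: once the one-pass decoder commits to `6*7=36`, its later steps are internally consistent but globally wrong; the revising decoder catches the bad step before building on it.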
o1-style models lean on process supervision rather than just outcome supervision: the training rewards each reasoning step, not only the final answer.
Having an exceptionally well-tuned process can be very powerful.
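Here's a minimal sketch of the distinction, with hypothetical scorers (not any real training setup): outcome supervision looks only at the final answer, while process supervision grades every intermediate step.

```python
def outcome_score(final_answer, target):
    """1 if the final answer is right, 0 otherwise -- the steps are ignored."""
    return 1.0 if final_answer == target else 0.0

def process_score(steps, step_checker):
    """Fraction of intermediate steps that check out."""
    if not steps:
        return 0.0
    return sum(step_checker(s) for s in steps) / len(steps)

def check(step):
    """Hypothetical checker: does the arithmetic in one step hold up?"""
    lhs, rhs = step.split("=")
    return eval(lhs) == int(rhs)

# A solution that lucks into the right answer through a wrong step:
steps = ["6*7=36", "36+6=42"]  # first step is wrong, yet the answer lands on 42

print(outcome_score(42, 42))       # 1.0 -- outcome supervision is fooled
print(process_score(steps, check))  # 0.5 -- process supervision penalizes the bad step
```

The design point: outcome-only grading can't tell a sound derivation from a lucky one, which is exactly what grading the process fixes.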
In high school, one of my favorite teachers was my AP Physics teacher, Dr. Patel.
On the very first day he gave all of us an answer key with the final answer to every question in the book.
He'd grade our homework on how rigorously we followed the process, not the right answer (which we already knew).
Every single formula, input, and intermediate step had to be shown in order, cleanly written.
He graded extraordinarily harshly.
At the beginning it drove me crazy.
But as time went on and I got good at it, it felt like flying: no matter how complicated the problem, I was confident I could break it down into smaller pieces until it yielded to the process.
o1's training is like giving the model its own Dr. Patel.
Whereas working with other LLMs feels like paddling a little kayak, o1 is a massive ship.
Use it when you want to pull out the big guns.
Or when it's worth it to write a mini spec of what to do and come back later when it's done.
It used to be that getting great results out of LLMs required a lot of prompt-fu expertise.
o1 can give very good results no matter how good the prompt is, but it's harder for people with prompt-fu to steer.