o1 is not like other LLMs; it's a different type of thing.

Comparing it to other LLMs is more akin to comparing whole systems or scaffolding built around a model (e.g. GitHub Copilot Workspace) than to comparing base models.

It just so happens to build into the model itself what other people have implemented as a system around a model.

o1 is a monolith, but a self-improving one, produced by a generative process rather than by humans.

That gives it some benefits, chiefly the ability to train it to do better (possibly without limit, if you are willing to invest tons of compute into it, like AlphaGo).

But it also means that it's "integrated", and therefore hard to tweak, direct, and configure.

One of the dials you can't reach in o1: how long it should think about a given task.

A more modular system, discovered by a swarming ecosystem, could still do better.
