The quality of LLMs is model + harness.

  • Model quality is getting saturated.
  • The differential quality now comes from the harness.
  • It's gotten way harder to do a vibe check when the models are all so good.
  • Long-running agentic tool-calling is where the incremental quality is visible.
  • But most uses just don't need that quality.
  • Andrew Ng has noted that GPT-3.5 wrapped in a good agentic workflow outperformed GPT-4 used zero-shot: the quality jump from adding the harness was bigger than the jump between model generations.
  • If the harness matters more than the model, but the harness is cheap to build and easy to reverse engineer, that implies different strategic outcomes.
  • By wrapping the models and standing on their shoulders you can get further with way less capital, but also less moat.
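Concretely, a "harness" is just the loop around the model that executes the tool calls it requests and feeds the results back until it produces an answer. A minimal sketch, with a stubbed-out model function and a hypothetical `calculator` tool standing in for a real LLM API and real tools:

```python
# Minimal sketch of an agentic tool-calling harness.
# stub_model is a stand-in for an LLM API call; calculator is a toy tool.

def calculator(expression: str) -> str:
    """A tool the harness exposes to the model."""
    # A real harness would sandbox tool execution; this is illustrative only.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_model(messages):
    """Stand-in for an LLM call: request a tool once, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "calculator", "args": {"expression": "17 * 3"}}
    return {"answer": f"The result is {messages[-1]['content']}."}

def run_harness(prompt: str, max_steps: int = 5) -> str:
    """The harness loop: call model, run requested tools, feed results back."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = stub_model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])
        messages.append({"role": "tool", "content": result})
    return "step budget exhausted"

print(run_harness("What is 17 * 3?"))  # → The result is 51.
```

Everything that differentiates real products lives in this loop and around it: which tools are exposed, how results are summarized back into context, retries, step budgets, and memory across long runs. The model call itself is one line.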
