The quality of LLMs is model + harness.
- Model quality is saturating.
- The differential quality now comes from the harness.
- It's gotten way harder to do a vibe check when the models are all so good.
- Long-running agentic tool-calling is where the incremental quality is visible.
- But most use cases just don't need that quality.
- Andrew Ng has noted in the past that the quality jump from adding a good agentic harness to GPT-3.5 was larger than the jump from GPT-3.5 to GPT-4.
- If the harness matters more than the model, but the harness is cheap to build and easy to reverse engineer, that implies different strategic outcomes.
- By wrapping the models and standing on their shoulders you can get further with way less capital, but also with less moat.
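To make "harness" concrete: at its core it's just a loop that lets the model call tools, feeds observations back, and stops on a final answer. A minimal sketch follows, with the model stubbed by a deterministic function (`stub_model`, `calculator`, and the message format are all hypothetical stand-ins, not any particular vendor's API):

```python
# Minimal sketch of an agentic tool-calling harness.
# The "model" here is a stub; in practice it would be an LLM API call.

def calculator(expression: str) -> str:
    # Hypothetical tool: evaluates simple arithmetic for the demo.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def stub_model(messages):
    # Stand-in for an LLM: requests a tool call first, then answers
    # once a tool result appears in the conversation.
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "calculator", "args": {"expression": "6 * 7"}}
    return {"final": f"The answer is {tool_results[-1]['content']}"}

def run_harness(user_prompt, model, tools, max_steps=5):
    # The harness: loop until the model emits a final answer,
    # dispatching tool calls and appending observations in between.
    messages = [{"role": "user", "content": user_prompt}]
    for _ in range(max_steps):
        action = model(messages)
        if "final" in action:
            return action["final"]
        observation = tools[action["tool"]](**action["args"])
        messages.append({"role": "tool", "content": observation})
    raise RuntimeError("step budget exhausted")

print(run_harness("What is 6 * 7?", stub_model, TOOLS))
```

Everything outside `stub_model` — the step budget, tool dispatch, message accumulation, stop condition — is harness logic, which is exactly the part that's cheap to build and easy to reverse engineer.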