LLM generation is slow and error-prone.

Bits and Bobs · 7/1/24

Even when it works 95% of the time, the 5% where it fails is hard to predict.

LLMs are great at answering a specific, unique question… but then the user needs to sit there and wait while the answer unspools.

Some use cases get enough value from the LLM to justify the wait, but many more aren't viable if the user has to wait a long time for an answer that could very well be wrong.

A successful large-scale system will use LLMs to create a lot of answers, then cache them so that a similar future question can be answered with a retrieval, not a generation.
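
One way to picture the cache half of this: store an embedding of each precomputed question next to its answer, and serve any new question whose embedding lands close enough to a stored one. Here's a minimal sketch; the embed() stand-in, the 0.95 similarity threshold, and the call_llm() fallback are all illustrative assumptions, not anything from this post (a real system would call an embedding model and an LLM API):

```python
import math

def embed(text: str) -> list[float]:
    # Toy stand-in: a real system would call an embedding model here.
    # This hashes character codes into a small unit vector so the sketch
    # runs on its own, but it only matches near-identical strings.
    vec = [0.0] * 16
    for i, ch in enumerate(text.lower()):
        vec[i % 16] += ord(ch)
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def call_llm(question: str) -> str:
    # Stand-in for a real (slow, occasionally wrong) LLM call.
    return f"Generated answer for: {question}"

class AnswerCache:
    """Precomputed answers, keyed by question embedding."""

    def __init__(self, threshold: float = 0.95):
        self.entries: list[tuple[list[float], str]] = []
        self.threshold = threshold  # how similar counts as "the same question"

    def put(self, question: str, answer_text: str) -> None:
        self.entries.append((embed(question), answer_text))

    def get(self, question: str) -> str | None:
        q = embed(question)
        best_score, best_answer = 0.0, None
        for vec, answer_text in self.entries:
            score = cosine(q, vec)
            if score > best_score:
                best_score, best_answer = score, answer_text
        return best_answer if best_score >= self.threshold else None

def answer(question: str, cache: AnswerCache) -> str:
    # Fast path: retrieve a precomputed answer to a similar question.
    cached = cache.get(question)
    if cached is not None:
        return cached
    # Slow path: live generation, cached so the next asker gets the fast path.
    generated = call_llm(question)
    cache.put(question, generated)
    return generated
```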

Guess at the kinds of questions users will have, then unleash a swarm of LLMs on them in a precomputation step.

By the time the user asks their question, there's already a pre-cached answer ready to go.
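
And the precomputation half, reusing AnswerCache, answer(), and call_llm() from the sketch above: fan the guessed questions out across concurrent LLM calls before any user shows up. The question list, worker count, and thread-pool approach are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def precompute(questions: list[str], cache: AnswerCache, workers: int = 8) -> None:
    # Fan the anticipated questions out to concurrent LLM calls and
    # store every answer, so user-time requests become cache hits.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for question, answer_text in zip(questions, pool.map(call_llm, questions)):
            cache.put(question, answer_text)

anticipated = [
    "How do I reset my password?",
    "What is your refund policy?",
    "How do I cancel my subscription?",
]
cache = AnswerCache()
precompute(anticipated, cache)

# An exact repeat is a guaranteed hit; with a real embedding model,
# paraphrases like "How can I reset my password?" would match too.
print(answer("How do I reset my password?", cache))
```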

The power of LLMs, but with faster results.
