The fundamental realities of KVCaches dominate what kinds of UXes are viable.

2026-05-04 · Bits and Bobs 5/4/26

The fundamental realities of KVCaches dominate what kinds of UXes are viable.
- If your session is still in the KVCache, it's trivial to serve: just stream out the new tokens.
- If your session has to be recreated, the provider has to reprocess the whole context before it can generate anything.
  - What counts as the same session is an "exact prefix match."
  - That means multiple people in the same workflow could share the same cached prefix.
- LLM providers keep your sessions warm in the cache for your next response.
- LLM providers have been cutting how long they keep a session warm from 60 minutes to closer to 5 minutes to improve efficiency.
- If you want to cost the model provider a ton, send a single question a little under 5 minutes after each response finishes, so your session stays in the cache permanently.
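
The mechanics in the list above can be sketched as a toy model. This is not any provider's real implementation: `ToyPrefixCache`, `serve`, and `common_prefix_len` are hypothetical names, the 300-second window is an assumption taken from the "closer to 5 minutes" figure, and real systems typically cache in blocks rather than token by token. The point is only to show why a warm session is cheap, why a shared prefix helps, and why an expired session pays for the whole context again.

```python
def common_prefix_len(a, b):
    """Number of leading tokens that a and b share exactly."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class ToyPrefixCache:
    """Hypothetical stand-in for a provider-side KV cache (not a real API)."""

    def __init__(self, ttl_seconds=300.0):
        # Assumed retention window: roughly the 5 minutes mentioned above.
        self.ttl_seconds = ttl_seconds
        self.entries = []  # list of (token tuple, expires_at)

    def serve(self, tokens, now):
        """Return how many tokens must be recomputed for this request."""
        tokens = tuple(tokens)
        # Drop anything whose retention window has run out.
        live = [(t, exp) for t, exp in self.entries if exp > now]
        # Only an exact shared prefix with a warm entry can be reused.
        reused = max((common_prefix_len(tokens, t) for t, _ in live), default=0)
        # Serving the request warms the cache for the full new context.
        live.append((tokens, now + self.ttl_seconds))
        self.entries = live
        return len(tokens) - reused


cache = ToyPrefixCache(ttl_seconds=300)
context = list(range(1000))  # a 1000-token conversation so far

cold = cache.serve(context, now=0)                       # nothing cached yet
warm = cache.serve(context + [1000], now=60)             # follow-up inside the window
shared = cache.serve(context[:500] + [-1], now=90)       # another user, same first 500 tokens
expired = cache.serve(context + [1000, 1001], now=1000)  # every entry has timed out
print(cold, warm, shared, expired)  # prints: 1000 1 1 1002
```

In this sketch the follow-up and the second user each recompute a single token, while the request that arrives after the window has lapsed recomputes all 1002. A client that pings just inside the window on every turn never hits that last case, which is exactly the keep-alive trick described above.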
