The fundamental realities of KVCaches dominate what kinds of UXes are viable.

· Bits and Bobs 5/4/26
  • The fundamental realities of KVCaches dominate what kinds of UXes are viable.
    • If your session is still in the KVCache, it's trivial to serve: just stream out the new tokens.
    • If your session has to be recreated, the model has to reprocess the whole context before it can respond.
      • What counts as a session is "exact prefix match."
      • That means that multiple people in the same workflow could share the same prefix.
    • LLM providers keep your sessions warm in the cache for your next response.
    • LLM providers have been dropping that retention window from 60 minutes to closer to 5 minutes to get more efficiency.
    • If you want to cost the model provider a ton, send a single question just under 5 minutes after the last one finished, every time, to stay permanently in the cache.
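
To make the mechanics above concrete, here is a toy sketch of an exact-prefix cache with a retention window. It is not any provider's actual implementation: the `PrefixCache` class, its `serve` method, and the string "tokens" are all hypothetical stand-ins, and the 300-second default is just the roughly 5-minute figure mentioned above. The return value is how many tokens would have to be reprocessed for a request.

```python
class PrefixCache:
    """Toy model of a KV cache keyed by exact token prefix (hypothetical)."""

    def __init__(self, ttl_seconds: float = 300.0):
        # Assumed retention window; real providers pick their own values.
        self.ttl_seconds = ttl_seconds
        # Maps a cached token prefix to the time at which it expires.
        self.expiry: dict[tuple[str, ...], float] = {}

    def serve(self, tokens: list[str], now: float) -> int:
        """Return the number of tokens that must be reprocessed."""
        # Drop every entry whose retention window has passed.
        self.expiry = {p: t for p, t in self.expiry.items() if t > now}

        # Find the longest cached entry that exactly matches the start
        # of this request's context.
        best = 0
        for prefix in self.expiry:
            n = len(prefix)
            if best < n <= len(tokens) and tuple(tokens[:n]) == prefix:
                best = n

        # Cache the full context so the next turn can build on it.
        self.expiry[tuple(tokens)] = now + self.ttl_seconds
        return len(tokens) - best


cache = PrefixCache(ttl_seconds=300.0)
shared = ["system", "question-1", "answer-1"]

cold = cache.serve(shared, now=0.0)                       # nothing cached yet
warm = cache.serve(shared + ["question-2"], now=200.0)    # same session, still warm
other = cache.serve(shared + ["other-user"], now=250.0)   # different user, same prefix
late = cache.serve(shared + ["question-2", "answer-2"], now=1000.0)  # window expired

print(cold, warm, other, late)
```

In this sketch the warm request and the second user's request each pay for one new token, while the late request pays for its entire context again. It also shows why a request sent just inside the window is expensive for the provider: each one resets the expiry, so the entry never gets evicted.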
