The fundamental realities of KVCaches dominate what kinds of UXes are viable.
If your session is still in the KVCache, it's trivial to serve: the provider only has to process your new input and stream out the new tokens.
If your session has to be recreated, the provider has to run the whole context through the model again before it can generate anything.
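To make the gap concrete, here is a rough sketch in Python. The function name and the token counts are made up for illustration, and it ignores everything except how many input tokens have to be pushed through the model before generation can start.

```python
def prefill_tokens(context_tokens: int, new_tokens: int, cache_hit: bool) -> int:
    """Input tokens the server must process before it can start generating.

    Hypothetical toy model: on a cache hit only the new input is processed;
    on a miss the whole prior context is processed again as well.
    """
    if cache_hit:
        return new_tokens
    return context_tokens + new_tokens


# Illustrative numbers only: a long session plus a short follow-up question.
warm = prefill_tokens(context_tokens=100_000, new_tokens=50, cache_hit=True)
cold = prefill_tokens(context_tokens=100_000, new_tokens=50, cache_hit=False)
print(warm, cold)
```

The longer the session, the bigger the gap between the warm and cold cases, which is why cache residency matters so much for long-context products.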
LLM providers keep your session warm in the cache for a while in case you send a follow-up.
They have been cutting that retention window from 60 minutes to closer to 5 minutes to get more efficiency.
If you want to cost the model provider a ton, send a single question a little under 5 minutes after the last one finished, every time, so your session never ages out of the cache.
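A toy simulation of that keep-alive pattern, in Python. It assumes a sliding expiry that restarts on every request and treats responses as instantaneous; real providers may evict differently, and the function name and timings are invented for the example.

```python
def count_cache_hits(request_times: list[float], ttl_seconds: float) -> int:
    """Count requests that arrive while the session is still cached.

    Assumption: the cache entry expires ttl_seconds after the most recent
    request, and every request (hit or miss) restarts that timer.
    """
    hits = 0
    expires_at = None  # no cache entry exists before the first request
    for t in request_times:
        if expires_at is not None and t < expires_at:
            hits += 1
        expires_at = t + ttl_seconds
    return hits


ttl = 300  # 5 minutes, in seconds
just_in_time = [i * 290 for i in range(10)]  # one request every 4m50s
too_late = [i * 310 for i in range(10)]      # one request every 5m10s

print(count_cache_hits(just_in_time, ttl))  # 9: every request after the first is a hit
print(count_cache_hits(too_late, ttl))      # 0: the entry always expires first
```

Under these assumptions, one tiny request per window is enough to hold the whole session's cache entry indefinitely, while arriving slightly late forces a full rebuild every time.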