LLMs are significantly cheaper if you only append the tokens.

· Bits and Bobs 4/20/26
  • LLMs are significantly cheaper if you only append the tokens.
    • If you only append tokens, you can reuse the existing KVCache from earlier runs instead of having to regenerate it.
    • That can be a quadratic speed-up over the whole generation: without the cache, producing token n means re-processing tokens 0 through n-1 from scratch, so an n-token generation touches on the order of n² token positions instead of n.
    • That's one of the reasons various UIs lean on chat as the core concept: a chat transcript is append-only, which keeps you naturally inside the cache.
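A toy cost model makes both points concrete: the quadratic-vs-linear gap, and why append-only edits keep the cache valid while in-place edits invalidate it. This is an illustrative sketch, not tied to any particular inference stack; the function names are made up for this example.

```python
def tokens_processed_without_cache(n: int) -> int:
    # No cache: producing token i re-runs the model over the whole
    # i-token prefix, so total work is 1 + 2 + ... + n = n(n+1)/2.
    return sum(i for i in range(1, n + 1))

def tokens_processed_with_cache(n: int) -> int:
    # With a KV cache: each step processes only the newly appended token.
    return n

def reusable_prefix(cached: list, new: list) -> int:
    # Append-only edits keep the old tokens as a prefix of the new ones,
    # so the entire cache is reusable; editing earlier tokens invalidates
    # the cache from the first differing position onward.
    k = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        k += 1
    return k
```

For a 1000-token generation, the no-cache path processes 500500 token positions versus 1000 with the cache, which is the quadratic-vs-linear gap above. And `reusable_prefix(["a", "b"], ["a", "b", "c"])` is 2 (full reuse after an append), while `reusable_prefix(["a", "b"], ["a", "x", "c"])` is only 1.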
