· Bits and Bobs 4/20/26
  • There's a trade-off between throughput and latency in LLM inference.
    • Higher throughput means a provider can serve the same-quality completion for cheaper.
    • Most services don't let you choose, but for async workloads throughput matters more than latency.
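The trade-off above can be sketched with a toy batching model (all numbers are made-up assumptions, not measurements): decoding a step for a larger batch takes roughly the same wall-clock time as for a single sequence while bandwidth-bound, so bigger batches raise throughput and cut cost per token, but each request waits longer for the batch to assemble.

```python
# Toy model of the batching trade-off in LLM inference.
# All constants are illustrative assumptions, not real measurements.

ARRIVAL_RATE = 10.0     # assumed request arrival rate (requests/sec)
STEP_TIME_S = 0.05      # assumed wall-clock time per decode step, any batch size
GPU_COST_PER_S = 0.001  # assumed GPU cost in dollars per second

for batch in (1, 8, 32):
    # Throughput scales with batch size while decode is memory-bandwidth bound.
    tokens_per_s = batch / STEP_TIME_S
    # Cost per token falls proportionally: same GPU-seconds, more tokens.
    cost_per_token = GPU_COST_PER_S * STEP_TIME_S / batch
    # But the first request in a batch waits for the rest to arrive.
    assembly_wait_s = (batch - 1) / ARRIVAL_RATE
    print(f"batch={batch:2d}  tok/s={tokens_per_s:6.1f}  "
          f"$/tok={cost_per_token:.2e}  extra latency={assembly_wait_s:.2f}s")
```

For an async workload nobody is waiting on `assembly_wait_s`, so the lower `cost_per_token` at large batch sizes is pure win; interactive chat has to balance the two.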