· Bits and Bobs 4/20/26
  • There's a trade-off between throughput and latency in LLM inference.
    • Higher throughput means a provider can serve the same-quality completion for cheaper.
    • Most services don't let you choose, but for async workloads throughput matters more than latency.
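The trade-off above can be sketched with a toy batching model (all numbers are made-up assumptions, not measurements): decoding a step for a larger batch takes roughly the same wall-clock time as for a single sequence while bandwidth-bound, so bigger batches raise throughput and cut cost per token, but each request waits longer for the batch to assemble.

```python
# Toy model of the batching trade-off in LLM inference.
# All constants are illustrative assumptions, not real measurements.

ARRIVAL_RATE = 10.0     # assumed request arrival rate (requests/sec)
STEP_TIME_S = 0.05      # assumed wall-clock time per decode step, any batch size
GPU_COST_PER_S = 0.001  # assumed GPU cost in dollars per second

for batch in (1, 8, 32):
    # Throughput scales with batch size while decode is memory-bandwidth bound.
    tokens_per_s = batch / STEP_TIME_S
    # Cost per token falls proportionally: same GPU-seconds, more tokens.
    cost_per_token = GPU_COST_PER_S * STEP_TIME_S / batch
    # But the first request in a batch waits for the rest to arrive.
    assembly_wait_s = (batch - 1) / ARRIVAL_RATE
    print(f"batch={batch:2d}  tok/s={tokens_per_s:6.1f}  "
          f"$/tok={cost_per_token:.2e}  extra latency={assembly_wait_s:.2f}s")
```

For an async workload nobody is waiting on `assembly_wait_s`, so the lower `cost_per_token` at large batch sizes is pure win; interactive chat has to balance the two.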