There's a trade-off between throughput and latency in LLM inference (the sketch after the list below illustrates why).
- Higher throughput means a provider can serve the same-quality completion at a lower cost per token, since more requests share the same hardware.
- Most services don't let you specify a preference, but for async workloads (batch jobs, offline evaluation) throughput matters more than latency.
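As a rough illustration of why the trade-off exists, here is a toy model of a batched decode step. The constants and the linear cost model are assumptions for the sketch, not measurements of any real system; real kernels behave less cleanly.

```python
# Toy model: one decode step over a batch of size B is assumed to take
# t_step = T_FIXED + B * T_PER_SEQ. Larger batches amortize the fixed
# cost (higher aggregate throughput), but every request in the batch
# waits for the whole step (higher per-request latency).

T_FIXED = 10.0    # ms of fixed overhead per forward pass (assumed)
T_PER_SEQ = 1.0   # ms of marginal cost per sequence in the batch (assumed)

def step_time_ms(batch_size: int) -> float:
    """Wall-clock time for one decode step over the whole batch."""
    return T_FIXED + batch_size * T_PER_SEQ

for batch_size in (1, 8, 64):
    latency = step_time_ms(batch_size)           # per-token latency each request sees
    throughput = batch_size / latency * 1000.0   # tokens/sec across all requests
    print(f"batch={batch_size:3d}  latency/token={latency:6.1f} ms  "
          f"throughput={throughput:7.1f} tok/s")
```

Running it shows the shape of the trade-off: going from batch 1 to batch 64 roughly 10x the aggregate throughput while each individual request's per-token latency grows about 7x. Latency-sensitive chat traffic wants small batches; async workloads want large ones.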