The rate at which an AI system can process work, typically measured in tokens per second. Higher throughput enables serving more users simultaneously. Data center design increasingly optimizes for throughput, because inference workloads grow continuously with user demand, whereas a training run is a fixed, bounded job.
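As a minimal sketch of what "tokens per second" means in practice, the snippet below times a generation call and divides token count by elapsed wall-clock time. The `generate` function and the toy model standing in for it are hypothetical, not from the source.

```python
import time

def measure_throughput(generate, prompt: str, n_tokens: int) -> float:
    """Time one generation call and return tokens per second."""
    start = time.perf_counter()
    generate(prompt, n_tokens)  # hypothetical generation callable
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# Toy stand-in for a model: pretends to emit one token per millisecond.
def fake_generate(prompt: str, n_tokens: int) -> None:
    time.sleep(n_tokens * 0.001)

tps = measure_throughput(fake_generate, "hello", 100)
print(f"{tps:.0f} tokens/s")  # roughly 1000 tokens/s for this toy model
```

Real serving systems report the same ratio, usually split into prefill and decode phases and aggregated across concurrent requests.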
Discussed in Chapter 1 of This Is Server Country