Key Takeaways
1. AI processes text as numerical tokens—about 1,300 tokens per 1,000 words
2. Inference happens billions of times daily, requiring massive infrastructure
3. Memory bandwidth—not compute speed—is the limiting factor
Tokens: Where AI Begins
When you type a question, AI doesn't see words—it sees numbers called tokens.
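A toy sketch makes the idea concrete. Real tokenizers (byte-pair encoding and similar) learn subword vocabularies from data; the vocabulary and the word-level split below are invented purely for illustration.

```python
# Illustrative only: a hypothetical vocabulary mapping words to token IDs.
# Production tokenizers use learned subword units, not a hand-written dict.
VOCAB = {"the": 464, "cat": 2368, "sat": 7731, "on": 319, "mat": 2603, ".": 13}

def tokenize(text):
    """Map each lowercase word (and '.') to its toy token ID."""
    words = text.lower().replace(".", " .").split()
    return [VOCAB[w] for w in words]

print(tokenize("The cat sat on the mat."))
# [464, 2368, 7731, 319, 464, 2603, 13]
```

The model never sees "cat"—only the integer 2368, which it then maps to a vector of numbers it can compute with.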
The Transformer Revolution
In 2017, the transformer architecture changed everything. Instead of reading word-by-word, AI now processes all words simultaneously using attention.
Sequential processing: reads one word at a time and forgets the beginning by the end
Parallel attention: sees all relationships between words at once
Training vs. Inference
There are two phases in AI: training (teaching the model once) and inference (using it constantly). Inference now dominates infrastructure needs.
The Memory Bottleneck
The limiting factor isn't compute speed—it's how fast chips can move data.
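A back-of-envelope calculation shows why. To generate each token, the chip must stream essentially every model weight out of memory. The figures below are illustrative assumptions (a 70-billion-parameter model in 16-bit precision, roughly 3.35 TB/s of high-bandwidth memory), not vendor benchmarks:

```python
# Sketch: memory bandwidth, not arithmetic speed, caps token generation.
params = 70e9          # assumed model size: 70B parameters
bytes_per_param = 2    # 16-bit (FP16) weights
bandwidth = 3.35e12    # assumed HBM bandwidth in bytes/second

# Each generated token requires reading all weights from memory once.
bytes_per_token = params * bytes_per_param           # 140 GB per token
max_tokens_per_sec = bandwidth / bytes_per_token
print(round(max_tokens_per_sec, 1))  # ~23.9 tokens/sec, with compute units idle
```

Under these assumptions the chip tops out near 24 tokens per second per request—no matter how fast its arithmetic units are—which is why serving billions of daily requests demands so much hardware.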
Why This Requires Infrastructure
This is why fields in rural Michigan become $7 billion projects. The AI boom isn't a software story—it's an infrastructure story.
Go Deeper
Chapter 1 of This Is Server Country explores the technical foundations of AI in depth—how attention mechanisms work, why memory bandwidth matters, and how the shift to inference economics is driving the trillion-dollar buildout.
Learn more about the book →