The dominant neural network architecture for modern AI systems, introduced in the 2017 paper "Attention Is All You Need." Transformers use attention mechanisms to process entire input sequences in parallel (unlike earlier recurrent networks, which read text one token at a time), letting them capture relationships between words regardless of how far apart they appear in the text. GPT, Claude, and most other large language models are built on the transformer architecture.
Discussed in Chapter 1 of This Is Server Country
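For readers who want to see the core operation concretely, the sketch below implements scaled dot-product attention, the computation at the heart of every transformer layer, in plain numpy. It is a minimal illustration, not code from any real model: the function name, toy sequence length, and embedding size are assumptions chosen for demonstration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attend over all positions at once: each query scores every key,
    so relationships are captured regardless of distance in the sequence."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # similarity of each query to each key
    # Numerically stable softmax over key positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                # weighted mix of value vectors

# Toy example: a 4-token sequence with 8-dimensional embeddings (illustrative sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)                             # (4, 8): one output vector per token
```

Because every token attends to every other token in a single matrix operation, the whole sequence is processed in parallel, which is what distinguishes transformers from the sequential recurrent networks that preceded them.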