A type of computer memory that stacks multiple DRAM dies vertically, enabling extremely high data transfer rates (up to 3 terabytes per second on modern GPUs). HBM is essential for AI inference because model weights must be read from memory rapidly and repeatedly during processing. Memory bandwidth, not raw computation speed, is often the limiting factor in AI performance.
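A back-of-the-envelope sketch of why bandwidth sets the ceiling, assuming a hypothetical 70-billion-parameter model stored at 2 bytes per weight (FP16) and the 3 TB/s figure above: during token-by-token generation, every weight must be read from memory for each token produced, so bandwidth divided by the weight footprint bounds tokens per second.

```python
# Back-of-the-envelope: memory-bandwidth ceiling on inference throughput.
# All figures below are illustrative assumptions, not measurements.

params = 70e9            # assumed model size: 70B parameters
bytes_per_weight = 2     # assumed FP16 precision (2 bytes per weight)
hbm_bandwidth = 3e12     # bytes/second, per the 3 TB/s figure above

model_bytes = params * bytes_per_weight   # 140 GB weight footprint

# Generating one token requires streaming every weight from memory once,
# so bandwidth divided by footprint bounds tokens per second:
max_tokens_per_sec = hbm_bandwidth / model_bytes

print(f"Weight footprint: {model_bytes / 1e9:.0f} GB")
print(f"Bandwidth-bound ceiling: ~{max_tokens_per_sec:.0f} tokens/s")
```

Under these assumptions the ceiling is roughly 21 tokens per second, regardless of how fast the arithmetic units are; this is the sense in which bandwidth, rather than raw computation, limits performance.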
Discussed in Chapter 2 of This Is Server Country