The process of using a trained AI model to generate predictions, responses, or outputs from new input. Unlike training, which happens once per model, inference happens billions of times daily as users interact with AI systems. A single ChatGPT response requires a forward pass through all 96+ transformer layers for every token it generates, consuming significant computational resources.
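A minimal sketch of this token-by-token inference loop, using the Hugging Face transformers library with the small "gpt2" model purely as an illustration (production systems use far larger models with many more layers); the 10-token limit and greedy decoding are simplifying assumptions:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Inference is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate one token at a time: each step is a full forward pass
# through every transformer layer in the model.
with torch.no_grad():
    for _ in range(10):  # illustrative cap of 10 generated tokens
        logits = model(input_ids).logits           # forward pass through all layers
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy: pick the most likely token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=1)

print(tokenizer.decode(input_ids[0]))

Because every generated token repeats the full forward pass, the cost of a response grows with its length, which is why inference dominates the ongoing computational bill of deployed AI systems.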
Discussed in Chapter 1 of This Is Server Country.