The process of using a trained AI model to generate predictions, responses, or outputs from new input. Unlike training, which happens once per model, inference happens billions of times daily as users interact with AI systems. A single ChatGPT response requires a forward pass through all 96+ transformer layers for every token it generates, consuming significant computational resources.
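A minimal sketch of this token-by-token inference loop, using the Hugging Face transformers library with the small "gpt2" model purely as an illustration (production systems use far larger models with many more layers); the 10-token limit and greedy decoding are simplifying assumptions:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Inference is"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate one token at a time: each step is a full forward pass
# through every transformer layer in the model.
with torch.no_grad():
    for _ in range(10):  # illustrative cap of 10 generated tokens
        logits = model(input_ids).logits           # forward pass through all layers
        next_id = logits[:, -1, :].argmax(dim=-1)  # greedy: pick the most likely token
        input_ids = torch.cat([input_ids, next_id.unsqueeze(-1)], dim=1)

print(tokenizer.decode(input_ids[0]))

Because every generated token repeats the full forward pass, the cost of a response grows with its length, which is why inference dominates the ongoing computational bill of deployed AI systems.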
Discussed in Chapter 1 of This Is Server Country.