Inference
Inference is the process of using a trained AI model to generate outputs — in other words, actually running the model to get a response. Training is what happens when the model learns from data (expensive, done once or periodically). Inference is what happens every time you send a message to an AI tool (cheaper, happens constantly).
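The training/inference split can be sketched in a few lines of Python. Here the model is a toy linear classifier whose weights are simply hardcoded, standing in for what an earlier (expensive) training run would have produced; inference is then just running those fixed weights on new input:

```python
# Hypothetical weights, standing in for the result of training.
WEIGHTS = [0.8, -0.5]
BIAS = 0.1

def infer(features):
    """Inference: run the already-trained model on a new input.

    No learning happens here; the weights stay fixed. This is the
    cheap step that runs every time a user sends a request.
    """
    score = sum(w * x for w, x in zip(WEIGHTS, features)) + BIAS
    return "positive" if score > 0 else "negative"

print(infer([1.0, 0.2]))  # prints "positive"
```

Calling `infer` repeatedly costs only a forward pass each time, which is why serving a model is far cheaper per request than training it.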
The speed and cost of inference determine how quickly AI tools respond and how much it costs to run them at scale. Advances in inference efficiency are a major driver of falling AI costs.