Two words run the entire AI-compute conversation, and most coverage uses them without defining them. Here they are, from the source. NVIDIA's annual report describes its platform serving "artificial intelligence, or AI, model training and inference" (NVIDIA Form 10-K, FY2026). Those are the two halves.

Here is how it works. Training is teaching the model: you feed it enormous amounts of data and adjust billions of parameters until its outputs are accurate enough. It's expensive and bursty: a giant job that runs for weeks, then stops. Inference is using the trained model. Every time you ask a chatbot a question, that's an inference request. Each request is cheap relative to training, but requests arrive constantly, and the total grows with every user.
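To make the split concrete, here is a toy sketch in Python. It assumes a model with a single parameter instead of billions, and the names `train` and `infer` are illustrative, not from any real framework. Training loops over the data many times and nudges the parameter; inference is one cheap calculation with the finished parameter.

```python
# Toy illustration only: one parameter instead of billions.

def train(data, steps=200, lr=0.01):
    """Training: repeatedly adjust the parameter w to reduce error on the data."""
    w = 0.0
    for _ in range(steps):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def infer(w, x):
    """Inference: a single use of the already-trained parameter."""
    return w * x

# Hypothetical data where the true relationship is y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = train(data)          # the expensive part, done once
print(round(w, 3))
print(round(infer(w, 10.0), 2))  # the cheap part, repeated per request
```

The training call does hundreds of passes over the data and then stops. The inference call does one multiplication, but in a real service it would run once per user request.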

Why does the distinction matter economically? Because the two have opposite cost shapes. Training cost is front-loaded and finite per model. Inference cost is open-ended: it scales with adoption for as long as the model stays in service. Put roughly, total cost is the one-time training cost plus the per-query cost multiplied by the number of queries. A lab can afford a huge training run as a one-time bet, but it has to make inference cheap or the model loses money on every query. That's why so much of the patent activity covered on this site (mixture-of-experts routing, perplexity-based adaptive compute) targets inference efficiency specifically.
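A rough worked example shows the two cost shapes side by side. Every figure below is a made-up placeholder chosen for round arithmetic, not a number from NVIDIA's filing or from any lab, and the function names are hypothetical. Queries are counted in thousands so the math stays in whole dollars.

```python
import math

# All figures are hypothetical placeholders, not real lab or NVIDIA numbers.
TRAINING_COST = 100_000_000   # one-time training cost, in dollars
COST_PER_1K = 2               # inference cost per 1,000 queries, in dollars
REVENUE_PER_1K = 3            # revenue per 1,000 queries, in dollars

def cumulative_cost(thousands_of_queries):
    """Fixed training cost plus inference cost that grows with usage."""
    return TRAINING_COST + COST_PER_1K * thousands_of_queries

def break_even(training_cost, cost_per_1k, revenue_per_1k):
    """Thousands of queries needed to recoup training, or None if no query earns a margin."""
    margin = revenue_per_1k - cost_per_1k
    if margin <= 0:
        return None
    return math.ceil(training_cost / margin)

# Inference starts as a small share of total cost and ends up dominating it.
for k in (0, 10_000_000, 100_000_000, 1_000_000_000):
    share = (COST_PER_1K * k) / cumulative_cost(k)
    print(f"{k:>13,} thousand queries: inference is {share:.0%} of total cost")

print(break_even(TRAINING_COST, COST_PER_1K, REVENUE_PER_1K))
print(break_even(TRAINING_COST, 3, 3))  # no margin per query, so never
```

Under these assumptions the training bill never changes, while the inference share of total cost keeps climbing with usage. When the per-query margin is zero or negative, no volume of queries pays back the training run, which is the case the paragraph above describes.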

One analogy, then we drop it: training is developing a restaurant's recipes; inference is cooking each order. You design the menu once. You cook every meal a customer orders, and a popular restaurant cooks a lot of meals.

So next time you read that a company is "spending on AI compute," ask which half. Training spend is a capability bet. Inference spend is an operating cost that tracks usage. NVIDIA mentions both in a single phrase because it sells into both, but for understanding the business they're two different stories, and the 10-K names each one for a reason.