Definition
Inference is the process of using a trained model to generate outputs for new inputs. It's the "production" phase after training is complete.
- **Training vs. Inference:**
  - Training: learning from data (expensive, done once or infrequently)
  - Inference: applying what was learned (cheap per request, done many times)
- **Inference Considerations** (see the sketch after this list):
  - Latency: time to generate a single response
  - Throughput: requests handled per second
  - Cost: compute resources consumed per request
  - Accuracy: quality of the predictions
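To make the split concrete, here is a minimal Python sketch (assuming scikit-learn is installed) that runs the expensive training step once, then times a single inference call. The dataset and model are illustrative stand-ins, not a recommendation.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Training: learn parameters from data (the expensive, one-time phase).
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Inference: reuse the learned parameters on a new input (done many times).
new_input = X[:1]  # stand-in for a fresh, unseen example
start = time.perf_counter()
prediction = model.predict(new_input)
latency_ms = (time.perf_counter() - start) * 1000

print(f"prediction={prediction[0]}, latency={latency_ms:.2f} ms")
```

In a real deployment, measured latency would also include network transfer and serialization, which is why it's tracked separately from raw model compute.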
Examples
When you ask ChatGPT a question, it performs inference to generate the answer.
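For a hosted model like ChatGPT, that inference happens behind an API call. A minimal sketch, assuming the openai Python SDK (v1+) with an API key set in the environment; the model name and prompt are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Each call runs inference on the provider's servers: the already-trained
# model generates a new output for this specific input.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any available chat model works
    messages=[{"role": "user", "content": "What is inference in ML?"}],
)

print(response.choices[0].message.content)
```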