Definition
Knowledge distillation transfers knowledge from a large "teacher" model to a smaller "student" model.
Process:
1. Train a large teacher model
2. Teacher generates soft predictions
3. Student learns to match the teacher's outputs
4. Student becomes a smaller approximation of the teacher
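To make steps 2–4 concrete, here is a minimal sketch of one distillation training step in PyTorch. It assumes `teacher` and `student` are ordinary classifiers that return logits; the temperature `T` and weighting `alpha` are illustrative choices, not part of a fixed recipe.

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, optimizer, x, y, T=4.0, alpha=0.5):
    """One training step of response-based distillation (sketch).

    teacher, student: nn.Module classifiers returning raw logits.
    T: softmax temperature; alpha: weight on the distillation loss
    versus the ordinary cross-entropy on hard labels y.
    """
    with torch.no_grad():                      # teacher is frozen
        teacher_logits = teacher(x)            # step 2: soft predictions

    student_logits = student(x)

    # Step 3: student matches the teacher's softened distribution.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_probs = F.log_softmax(student_logits / T, dim=-1)
    distill_loss = F.kl_div(log_probs, soft_targets, reduction="batchmean") * T * T

    # Keep some signal from the true hard labels as well.
    hard_loss = F.cross_entropy(student_logits, y)

    loss = alpha * distill_loss + (1 - alpha) * hard_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```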
Why It Works:
- Soft labels contain more information than hard labels
- The teacher's uncertainty is informative
- "Dark knowledge" transfer: the relative probabilities the teacher assigns to wrong classes encode how classes relate to one another
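A toy illustration of the soft-label point, using invented logits for a three-class problem (the class names and numbers exist only for this example):

```python
import torch
import torch.nn.functional as F

# Hypothetical teacher logits for one image over classes
# ["cat", "dog", "truck"]; the hard label is simply "cat".
logits = torch.tensor([5.0, 3.5, -2.0])

hard_label = F.one_hot(torch.tensor(0), num_classes=3).float()
soft_T1 = F.softmax(logits, dim=-1)        # temperature 1
soft_T4 = F.softmax(logits / 4.0, dim=-1)  # higher temperature

print(hard_label)  # tensor([1., 0., 0.])      -- says nothing about dog vs. truck
print(soft_T1)     # roughly [0.82, 0.18, 0.00] -- "dog" is far more plausible than "truck"
print(soft_T4)     # roughly [0.54, 0.37, 0.09] -- softer still, easier for the student to learn from
```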
Types:
- Response Distillation: Match output probabilities
- Feature Distillation: Match intermediate representations (see the sketch after this list)
- Relation Distillation: Match relationships between examples
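Response distillation is what the training-loop sketch above implements. For feature distillation, a minimal sketch follows; the hidden sizes and the learned projection layer are assumptions for the example, not values from any particular model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Feature distillation: match hidden representations instead of outputs.
# A learned projection bridges the gap between student and teacher widths.
teacher_dim, student_dim = 768, 384
project = nn.Linear(student_dim, teacher_dim)

def feature_distill_loss(teacher_hidden, student_hidden):
    """MSE between teacher features and projected student features."""
    return F.mse_loss(project(student_hidden), teacher_hidden)

# Stand-in activations for a batch of 8 examples.
t_h = torch.randn(8, teacher_dim)
s_h = torch.randn(8, student_dim)
print(feature_distill_loss(t_h, s_h))
```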
Applications:
- Deploy smaller models on edge devices
- Reduce inference costs
- Model compression
Examples:
- DistilBERT: 40% smaller and 60% faster than BERT, retaining about 97% of its performance
- Many distilled LLMs are available
Examples
DistilBERT learns from BERT, ending up about 40% smaller while keeping most of its capability.
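As a usage note (assuming the Hugging Face `transformers` library is installed), the distilled checkpoint is a drop-in replacement for the original model:

```python
from transformers import pipeline

# Load the distilled checkpoint; it is used exactly like BERT.
fill_mask = pipeline("fill-mask", model="distilbert-base-uncased")

for prediction in fill_mask("Knowledge distillation makes models [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```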