Definition
Scaling laws describe predictable relationships between model size, training data, compute, and performance.
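One widely cited way to make this precise is the parametric loss from the Chinchilla paper (Hoffmann et al., 2022); the constants are fitted empirically and vary between setups, so treat the form rather than any particular values as the takeaway.

```latex
% Parametric scaling law for pretraining loss (Hoffmann et al., 2022):
% N = model parameters, D = training tokens, E = irreducible loss,
% A, B, alpha, beta = empirically fitted constants.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```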
Key Findings:
- Performance improves predictably with scale
- Compute-optimal training balances model size and training data
- The improvements follow power-law relationships (see the fitting sketch below)
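A minimal sketch of how such a power law is used in practice, assuming hypothetical loss measurements from small training runs; the numbers are illustrative, not real results:

```python
import numpy as np

# Hypothetical (parameter count, validation loss) pairs from small training runs.
params = np.array([1e7, 3e7, 1e8, 3e8, 1e9])
losses = np.array([4.2, 3.8, 3.4, 3.1, 2.8])

# A power law L(N) = c * N**(-alpha) is a straight line in log-log space,
# so a linear fit recovers the exponent and lets us extrapolate.
slope, intercept = np.polyfit(np.log(params), np.log(losses), deg=1)
alpha, c = -slope, np.exp(intercept)

# Predict the loss of a 10B-parameter model before training it.
predicted = c * (1e10) ** (-alpha)
print(f"fitted alpha ~ {alpha:.3f}, predicted loss at 10B params ~ {predicted:.2f}")
```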
Chinchilla Scaling Laws:
- Earlier large models were undertrained for their size
- Compute-optimal training uses a smaller model trained on more data
- Roughly 20 training tokens per parameter is the recommended ratio (see the sketch below)
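A back-of-the-envelope sketch of the Chinchilla heuristic, assuming the ~20 tokens-per-parameter rule of thumb and the common C ≈ 6·N·D approximation for training FLOPs; real compute-optimal allocations come from fitted constants that differ across setups:

```python
def chinchilla_budget(n_params: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal data and compute budget for a model with n_params parameters."""
    tokens = tokens_per_param * n_params   # Chinchilla rule of thumb: ~20 tokens per parameter
    flops = 6.0 * n_params * tokens        # standard approximation: C ~ 6 * N * D
    return tokens, flops

# Example: a 70B-parameter model, roughly Chinchilla-sized.
tokens, flops = chinchilla_budget(70e9)
print(f"training tokens ~ {tokens:.2e}, training FLOPs ~ {flops:.2e}")
```

At 70B parameters this gives about 1.4 trillion tokens, which matches the data budget Chinchilla itself was trained on.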
Implications:
- Larger models yield better performance
- More data yields better performance
- More compute yields better performance
- The returns from each are predictable
Emergent Abilities:
- Some capabilities appear suddenly at scale
- Examples: arithmetic, multi-step reasoning
- Whether these abilities are truly "emergent" or an artifact of how they are measured is debated
Limitations:
- Returns eventually diminish
- Scaling laws do not capture all capabilities
- They may not hold for every task or domain
Examples
GPT-4 outperforming GPT-3 primarily because it was trained at a larger scale.