Scaling Laws

Empirical relationships showing how AI performance improves with more data, compute, and parameters.

Definition

Scaling laws describe predictable relationships between model size, training data, compute, and performance.

Key Findings:
- Performance improves predictably with scale
- Compute-optimal training balances model size against training data
- Loss follows power-law relationships with scale (see the sketch below)
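As a rough illustration of the power-law form, the sketch below evaluates a Kaplan-style loss curve L(N) = (N_c / N)^α as model size N grows. The constants are close to values reported by Kaplan et al. (2020) but are used here purely for illustration, not as fitted results.

```python
# Illustrative Kaplan-style power law: loss falls as a power of model size N.
# The constants below roughly follow Kaplan et al. (2020) but are only for
# demonstration; real coefficients depend on the dataset and architecture.

def loss_from_params(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Approximate loss as a power law in parameter count: L(N) = (N_c / N) ** alpha."""
    return (n_c / n_params) ** alpha

for n in (1e8, 1e9, 1e10, 1e11):
    # Loss decreases smoothly and predictably as parameters grow by 10x steps.
    print(f"{n:.0e} params -> loss ~ {loss_from_params(n):.3f}")
```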

Chinchilla Scaling Laws:
- Earlier large models were undertrained: too few training tokens for their parameter count
- For a fixed compute budget, it is better to train a smaller model on more data
- Roughly 20 training tokens per parameter recommended (illustrated below)
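A minimal sketch of the 20-tokens-per-parameter heuristic, assuming the common approximation that transformer training compute is about C ≈ 6·N·D FLOPs for N parameters and D tokens. The allocation below is a back-of-the-envelope estimate, not the paper's fitted scaling law.

```python
import math

# Back-of-the-envelope Chinchilla-style allocation: given a FLOP budget C,
# pick parameters N and tokens D so that D ~= 20 * N, using the common
# approximation C ~= 6 * N * D for transformer training compute.

TOKENS_PER_PARAM = 20  # Chinchilla's rough compute-optimal ratio

def compute_optimal_split(flop_budget: float) -> tuple[float, float]:
    """Return (n_params, n_tokens) for a given training FLOP budget."""
    # C = 6 * N * D and D = 20 * N  =>  C = 120 * N^2  =>  N = sqrt(C / 120)
    n_params = math.sqrt(flop_budget / (6 * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Example: roughly the Chinchilla training budget (~5.8e23 FLOPs) recovers a
# model on the order of 70B parameters trained on about 1.4T tokens.
n, d = compute_optimal_split(5.8e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```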

Implications:
- Larger models yield better performance
- More training data yields better performance
- More compute yields better performance
- The returns from scaling are predictable

Emergent Abilities:
- Some capabilities appear abruptly once models reach sufficient scale
- Examples: multi-digit arithmetic, multi-step reasoning
- Whether these abilities are truly "emergent" or artifacts of how they are measured is debated

Limitations:
- Returns eventually diminish as scale grows
- Loss alone does not capture all capabilities of interest
- The laws may not hold for every task or domain

Examples

GPT-4 outperforms GPT-3 primarily because it was trained at a much larger scale of parameters, data, and compute.
