
Perplexity

A metric for how well a language model predicts text; lower is better.


Definition

Perplexity measures how "surprised" a language model is by text, indicating prediction quality.

Intuition:
- Lower perplexity = better predictions.
- A perplexity of 10 means the model is, on average, as uncertain as if it were choosing uniformly among 10 options at each token (see the sketch below).
- Good models have low perplexity on held-out test data.
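
A minimal sketch of that intuition, using only the standard library: a model that assigns uniform probability 1/k to every observed token has a perplexity of exactly k, and a model that predicts every token perfectly hits the floor of 1.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-probability
    the model assigned to each observed token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

# A model that is always "choosing among 10 options" assigns
# probability 1/10 to each observed token.
uniform_10 = [1 / 10] * 50     # 50 tokens, each predicted at p = 0.1
print(perplexity(uniform_10))  # ~10.0

# Perfect prediction (p = 1 for every token) gives the minimum of 1.
print(perplexity([1.0] * 50))  # 1.0
```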

Calculation:
- Exponential of the cross-entropy loss: PPL = exp(loss).
- Equivalently, the geometric mean of the inverse probabilities the model assigned to each token.
- Written out: PPL = exp(-(1/N) * Σ log p(x_i | x_<i)), where N is the number of tokens. The two routes agree, as checked numerically below.
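
Both formulations give the same number, which a few lines of self-contained Python can confirm (the per-token probabilities here are invented for illustration):

```python
import math

# Hypothetical per-token probabilities a model assigned to an observed sequence.
probs = [0.25, 0.10, 0.60, 0.05, 0.33]
n = len(probs)

# Route 1: exponential of the average cross-entropy (negative log-likelihood).
cross_entropy = -sum(math.log(p) for p in probs) / n
ppl_from_loss = math.exp(cross_entropy)

# Route 2: geometric mean of the inverse probabilities.
ppl_geometric = math.prod(1 / p for p in probs) ** (1 / n)

print(ppl_from_loss)  # ~5.26
print(ppl_geometric)  # same value, up to floating-point error
```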

Typical Values:
- State-of-the-art LLMs: roughly 3-10 on standard benchmarks
- Random guessing: equal to the vocabulary size
- Perfect prediction: 1 (the theoretical minimum)

Limitations:
- Doesn't measure coherence
- Doesn't measure factuality
- Dataset-dependent: a score is only meaningful relative to the corpus it was measured on
- Not comparable across models with different tokenizers, since per-token probabilities are defined over different text units

Examples

GPT-4 achieving a perplexity of 8.5 on the WikiText-103 benchmark.
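
Figures like this come from scoring a held-out corpus with the model and exponentiating the average loss. A minimal sketch of that measurement, assuming the Hugging Face transformers library and using GPT-2 on a placeholder string (not the actual GPT-4/WikiText-103 setup):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM with a matching tokenizer works the same way.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "Perplexity is the exponential of the average negative log-likelihood."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return its own cross-entropy loss,
    # averaged over the predicted tokens.
    outputs = model(**inputs, labels=inputs["input_ids"])

ppl = torch.exp(outputs.loss)
print(f"Perplexity: {ppl.item():.2f}")
```

Benchmark runs on long corpora such as WikiText-103 typically slide a fixed-length window across each document rather than scoring one short string, but the core computation is the same exp(loss).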
