Understanding Model Parameters
You see it everywhere: "Llama 3 70B", "Mistral 7B", "GPT-4 with 1.7 trillion parameters."
What do these numbers mean, and why should you care?
What Are Parameters?
Parameters are the learnable numbers inside a neural network that determine its behavior.
Think of parameters like the settings on a massive mixing board:
- Millions of knobs
- Each slightly adjusted during training
- Together they determine what the model outputs
When an LLM is trained, it's essentially finding the right values for all these parameters to predict text accurately.
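To make that concrete, here is a minimal sketch (plain Python, no ML libraries) that counts the parameters in a single fully connected layer. The layer size is made up for illustration, roughly in line with a 7B-class model's hidden dimension; real LLMs stack thousands of such layers, which is how the billions add up.

```python
# A single fully connected (linear) layer mapping 4096 inputs to 4096 outputs.
# Its "parameters" are the entries of a weight matrix plus a bias vector --
# each one a learnable number that training nudges up or down.

hidden_size = 4096  # illustrative size, similar to a 7B-class model's hidden dimension

weight_params = hidden_size * hidden_size  # 4096 x 4096 weight matrix
bias_params = hidden_size                  # one bias value per output

layer_params = weight_params + bias_params
print(f"One linear layer: {layer_params:,} parameters")  # ~16.8 million

# A transformer stacks many such layers (attention + feed-forward blocks),
# which is how the total reaches billions of parameters.
```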
The Numbers in Context
| Model | Parameters | Rough Capability |
|---|---|---|
| GPT-2 | 1.5B | Basic text generation |
| Llama 3 8B | 8B | Good for simple tasks |
| Mistral 7B | 7B | Surprisingly capable |
| Llama 3 70B | 70B | Strong all-around |
| GPT-4 | ~1.7T (unconfirmed) | State of the art |
| Llama 3 405B | 405B | Near GPT-4 level |
B = Billion, T = Trillion
Does Bigger Always Mean Better?
Generally, Yes
More parameters = more capacity to learn complex patterns.
A 70B model will usually outperform a 7B model on:
- Complex reasoning
- Nuanced understanding
- Following intricate instructions
But It's Not That Simple
Architecture matters:
- Mistral 7B outperforms many older 13B-class models despite being half the size
- Training quality can compensate for size
- Mixture of Experts (MoE) changes the math
Diminishing returns:
- The jump from 7B to 70B is huge
- The jump from 70B to 700B? Less dramatic
Why Size Matters for You
Running Models Locally
Larger models need more resources:
| Model Size | VRAM Needed | Can Run On |
|---|---|---|
| 7B | ~8GB | Gaming GPU (RTX 3080) |
| 13B | ~16GB | High-end GPU (RTX 4090) |
| 70B | ~40GB | Server GPU (A100) |
| 405B | ~200GB+ | Multiple A100s |
Rule of thumb: ~2 bytes per parameter at full 16-bit precision; the figures above assume quantized weights (more on quantization below).
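Here is a rough estimator showing where those numbers come from. The bytes-per-parameter figures (2 for 16-bit, 1 for 8-bit, 0.5 for 4-bit) and the ~20% overhead factor are approximations; actual usage also depends on context length and the inference framework.

```python
def estimate_vram_gb(num_params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight bytes plus ~20% overhead for activations
    and the KV cache (an approximation; varies by framework and context length)."""
    weight_bytes = num_params_billion * 1e9 * bytes_per_param
    return weight_bytes * overhead / 1e9  # report in GB

for name, size_b in [("7B", 7), ("13B", 13), ("70B", 70), ("405B", 405)]:
    fp16 = estimate_vram_gb(size_b, 2.0)   # full 16-bit precision (the 2-bytes-per-parameter rule)
    int8 = estimate_vram_gb(size_b, 1.0)   # 8-bit quantization
    int4 = estimate_vram_gb(size_b, 0.5)   # 4-bit quantization
    print(f"{name}: fp16 ~{fp16:.0f} GB | 8-bit ~{int8:.0f} GB | 4-bit ~{int4:.0f} GB")
```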
API Costs
Bigger models = higher API costs:
- GPT-3.5: ~$0.002 per 1K tokens
- GPT-4: ~$0.03 per 1K tokens (15x more)
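To see what that gap means at scale, here is a back-of-the-envelope comparison; the per-token prices are the approximate figures quoted above, and the monthly volume is a made-up example.

```python
# Hypothetical workload: 10 million tokens per month (made-up volume for illustration).
monthly_tokens = 10_000_000

# Approximate prices per 1K tokens quoted above (check current pricing before relying on these).
price_per_1k = {"GPT-3.5": 0.002, "GPT-4": 0.03}

for model, price in price_per_1k.items():
    cost = monthly_tokens / 1000 * price
    print(f"{model}: ${cost:,.2f} per month")
# GPT-3.5: $20.00 vs GPT-4: $300.00 -- the same 15x gap, now in dollars.
```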
Speed
Bigger models = slower responses:
- 7B: Near-instant responses
- 70B: Noticeable pause
- 400B+: Several seconds per response
Choosing the Right Size
When to Use Smaller Models (7B-13B)
- Simple tasks (summarization, classification)
- Running locally on consumer hardware
- High-volume, cost-sensitive applications
- Speed is critical
When to Use Larger Models (70B+)
- Complex reasoning required
- Nuanced, creative writing
- Multi-step problems
- Accuracy is paramount
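If you want these rules of thumb in one place, here is a small helper that encodes the guidance above as a lookup. The task categories and recommendations are just a restatement of the two lists, not a formal benchmark.

```python
# Task categories taken from the lists above; this mapping is a rule of thumb, not a benchmark.
SMALL_MODEL_TASKS = {"summarization", "classification", "high-volume", "latency-critical"}
LARGE_MODEL_TASKS = {"complex reasoning", "creative writing", "multi-step problems", "accuracy-critical"}

def suggest_model_size(task: str) -> str:
    if task in SMALL_MODEL_TASKS:
        return "7B-13B (local, fast, cheap)"
    if task in LARGE_MODEL_TASKS:
        return "70B+ (more capable, slower, costlier)"
    return "unclear -- start with a 7B model and scale up only if quality falls short"

print(suggest_model_size("classification"))     # 7B-13B (local, fast, cheap)
print(suggest_model_size("complex reasoning"))  # 70B+ (more capable, slower, costlier)
```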
The Open Source Sweet Spot
For most people running local AI:
Llama 3 8B / Mistral 7B
- Runs on gaming hardware
- Surprisingly capable
- Fast responses
- Free to use
Llama 3 70B
- Needs serious hardware (or cloud)
- Near-commercial quality
- Great for serious projects
Quantization: Cheating the Size Limit
Quantization reduces model precision to fit larger models on smaller hardware.
A 70B model at full 16-bit precision needs ~140GB of memory, but:
- 8-bit quantization: ~70GB (half!)
- 4-bit quantization: ~35GB (quarter!)
- 2-bit quantization: ~17GB (but quality suffers)
Trade-off: Some quality loss for massive memory savings.
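To show what this looks like in practice, here is a minimal sketch of loading a model with 4-bit weights using the Hugging Face transformers and bitsandbytes libraries. Treat it as an illustration under those assumptions (a CUDA GPU, both libraries installed, and the example model downloadable from the Hub), not a complete setup guide.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model; other causal LMs work similarly

# Store weights in 4-bit precision while computing in 16-bit.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers across available GPUs/CPU automatically
)

# Quick check that generation still works after quantization.
inputs = tokenizer("Quantization lets a 7B model fit in roughly", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```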
The Bottom Line
Parameters indicate model capability, but context matters:
- 7B models are your daily drivers
- 70B models are for serious work
- 400B+ models are API-only for most people
For most tasks, a well-trained 7B model beats a poorly trained 70B model.
Don't chase parameter counts—chase results for your use case.
Next up: Prompt Engineering Basics — How to get better results from AI