Understanding Model Parameters
You see it everywhere: "Llama 3 70B", "Mistral 7B", "GPT-4 with 1.7 trillion parameters."
What do these numbers mean, and why should you care?
What Are Parameters?
Parameters are the learnable numbers inside a neural network that determine its behavior.
Think of parameters like the settings on a massive mixing board:
- Millions of knobs
- Each slightly adjusted during training
- Together they determine what the model outputs
When an LLM is trained, it's essentially finding the right values for all these parameters to predict text accurately.
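To make that concrete, here is a minimal sketch (plain Python, no ML libraries) that counts the parameters in a single fully connected layer. The layer size is made up for illustration, roughly in line with a 7B-class model's hidden dimension; real LLMs stack thousands of such layers, which is how the billions add up.

```python
# A single fully connected (linear) layer mapping 4096 inputs to 4096 outputs.
# Its "parameters" are the entries of a weight matrix plus a bias vector --
# each one a learnable number that training nudges up or down.

hidden_size = 4096  # illustrative size, similar to a 7B-class model's hidden dimension

weight_params = hidden_size * hidden_size  # 4096 x 4096 weight matrix
bias_params = hidden_size                  # one bias value per output

layer_params = weight_params + bias_params
print(f"One linear layer: {layer_params:,} parameters")  # ~16.8 million

# A transformer stacks many such layers (attention + feed-forward blocks),
# which is how the total reaches billions of parameters.
```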
The Numbers in Context
| Model | Parameters | Rough Capability |
|---|---|---|
| GPT-2 | 1.5B | Basic text generation |
| Llama 3 8B | 8B | Good for simple tasks |
| Mistral 7B | 7B | Surprisingly capable |
| Llama 3 70B | 70B | Strong all-around |
| GPT-4 | ~1.7T (unconfirmed) | State of the art |
| Llama 3 405B | 405B | Near GPT-4 level |
B = Billion, T = Trillion
Does Bigger Always Mean Better?
Generally, Yes
More parameters = more capacity to learn complex patterns.
A 70B model will usually outperform a 7B model on:
- Complex reasoning
- Nuanced understanding
- Following intricate instructions
But It's Not That Simple
Architecture matters:
- Mistral 7B outperforms many older 13B-class models despite being half the size
- Training quality can compensate for size
- Mixture of Experts (MoE) changes the math
Diminishing returns:
- The jump from 7B to 70B is huge
- The jump from 70B to 700B? Less dramatic
Why Size Matters for You
Running Models Locally
Larger models need more resources:
| Model Size | VRAM Needed | Can Run On |
|---|---|---|
| 7B | ~8GB | Gaming GPU (RTX 3080) |
| 13B | ~16GB | High-end GPU (RTX 4090) |
| 70B | ~40GB | Server GPU (A100) |
| 405B | ~200GB+ | Multiple A100s |
Rule of thumb: ~2 bytes per parameter at full 16-bit precision; the figures above assume quantized weights (more on quantization below).
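Here is a rough estimator showing where those numbers come from. The bytes-per-parameter figures (2 for 16-bit, 1 for 8-bit, 0.5 for 4-bit) and the ~20% overhead factor are approximations; actual usage also depends on context length and the inference framework.

```python
def estimate_vram_gb(num_params_billion: float, bytes_per_param: float, overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight bytes plus ~20% overhead for activations
    and the KV cache (an approximation; varies by framework and context length)."""
    weight_bytes = num_params_billion * 1e9 * bytes_per_param
    return weight_bytes * overhead / 1e9  # report in GB

for name, size_b in [("7B", 7), ("13B", 13), ("70B", 70), ("405B", 405)]:
    fp16 = estimate_vram_gb(size_b, 2.0)   # full 16-bit precision (the 2-bytes-per-parameter rule)
    int8 = estimate_vram_gb(size_b, 1.0)   # 8-bit quantization
    int4 = estimate_vram_gb(size_b, 0.5)   # 4-bit quantization
    print(f"{name}: fp16 ~{fp16:.0f} GB | 8-bit ~{int8:.0f} GB | 4-bit ~{int4:.0f} GB")
```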
API Costs
Bigger models = higher API costs:
- GPT-3.5: ~$0.002 per 1K tokens
- GPT-4: ~$0.03 per 1K tokens (15x more)
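To see what that gap means at scale, here is a back-of-the-envelope comparison; the per-token prices are the approximate figures quoted above, and the monthly volume is a made-up example.

```python
# Hypothetical workload: 10 million tokens per month (made-up volume for illustration).
monthly_tokens = 10_000_000

# Approximate prices per 1K tokens quoted above (check current pricing before relying on these).
price_per_1k = {"GPT-3.5": 0.002, "GPT-4": 0.03}

for model, price in price_per_1k.items():
    cost = monthly_tokens / 1000 * price
    print(f"{model}: ${cost:,.2f} per month")
# GPT-3.5: $20.00 vs GPT-4: $300.00 -- the same 15x gap, now in dollars.
```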
Speed
Bigger models = slower responses:
- 7B: Near-instant responses
- 70B: Noticeable pause
- 400B+: Several seconds per response
Choosing the Right Size
When to Use Smaller Models (7B-13B)
- Simple tasks (summarization, classification)
- Running locally on consumer hardware
- High-volume, cost-sensitive applications
- Speed is critical
When to Use Larger Models (70B+)
- Complex reasoning required
- Nuanced, creative writing
- Multi-step problems
- Accuracy is paramount
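If you want these rules of thumb in one place, here is a small helper that encodes the guidance above as a lookup. The task categories and recommendations are just a restatement of the two lists, not a formal benchmark.

```python
# Task categories taken from the lists above; this mapping is a rule of thumb, not a benchmark.
SMALL_MODEL_TASKS = {"summarization", "classification", "high-volume", "latency-critical"}
LARGE_MODEL_TASKS = {"complex reasoning", "creative writing", "multi-step problems", "accuracy-critical"}

def suggest_model_size(task: str) -> str:
    if task in SMALL_MODEL_TASKS:
        return "7B-13B (local, fast, cheap)"
    if task in LARGE_MODEL_TASKS:
        return "70B+ (more capable, slower, costlier)"
    return "unclear -- start with a 7B model and scale up only if quality falls short"

print(suggest_model_size("classification"))     # 7B-13B (local, fast, cheap)
print(suggest_model_size("complex reasoning"))  # 70B+ (more capable, slower, costlier)
```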
The Open Source Sweet Spot
For most people running local AI:
Llama 3 8B / Mistral 7B
- Runs on gaming hardware
- Surprisingly capable
- Fast responses
- Free to use
Llama 3 70B
- Needs serious hardware (or cloud)
- Near-commercial quality
- Great for serious projects
Quantization: Cheating the Size Limit
Quantization reduces model precision to fit larger models on smaller hardware.
A 70B model at full 16-bit precision needs ~140GB of memory, but:
- 8-bit quantization: ~70GB (half!)
- 4-bit quantization: ~35GB (quarter!)
- 2-bit quantization: ~17GB (but quality suffers)
Trade-off: Some quality loss for massive memory savings.
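To show what this looks like in practice, here is a minimal sketch of loading a model with 4-bit weights using the Hugging Face transformers and bitsandbytes libraries. Treat it as an illustration under those assumptions (a CUDA GPU, both libraries installed, and the example model downloadable from the Hub), not a complete setup guide.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model; other causal LMs work similarly

# Store weights in 4-bit precision while computing in 16-bit.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place layers across available GPUs/CPU automatically
)

# Quick check that generation still works after quantization.
inputs = tokenizer("Quantization lets a 7B model fit in roughly", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```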
The Bottom Line
Parameters indicate model capability, but context matters:
- 7B models are your daily drivers
- 70B models are for serious work
- 400B+ models are API-only for most people
For most tasks, a well-trained 7B model beats a poorly trained 70B model.
Don't chase parameter counts—chase results for your use case.
Next up: Prompt Engineering Basics — How to get better results from AI