
Understanding Model Parameters

What 7B, 70B, and 405B actually mean


You see it everywhere: "Llama 3 70B", "Mistral 7B", "GPT-4 with a rumored 1.7 trillion parameters."

What do these numbers mean, and why should you care?

What Are Parameters?

Parameters are the learnable numbers inside a neural network that determine its behavior.

Think of parameters like the settings on a massive mixing board:

  • Billions of knobs
  • Each slightly adjusted during training
  • Together they determine what the model outputs

When an LLM is trained, it's essentially finding the right values for all these parameters to predict text accurately.
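Concretely, every weight and bias in every layer is one parameter. Here's a minimal sketch in PyTorch (our choice for illustration; the idea is framework-agnostic) that counts them for a toy two-layer network:

```python
import torch.nn as nn

# A tiny two-layer network -- real LLMs stack many wider layers.
model = nn.Sequential(
    nn.Linear(512, 2048),  # weights: 512 * 2048, biases: 2048
    nn.ReLU(),             # activations have no learnable parameters
    nn.Linear(2048, 512),  # weights: 2048 * 512, biases: 512
)

# Every element of every weight matrix and bias vector is one "knob".
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # 2,099,712 -- a "2M" model
```

Scale the layer widths and depth up a few orders of magnitude and you reach the billions in the model names.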

The Numbers in Context

Model          Parameters        Rough Capability
GPT-2          1.5B              Basic text generation
Llama 3 8B     8B                Good for simple tasks
Mistral 7B     7B                Surprisingly capable
Llama 3 70B    70B               Strong all-around
GPT-4          ~1.7T (rumored)   State of the art
Llama 3 405B   405B              Near GPT-4 level

B = Billion, T = Trillion. GPT-4's size has never been officially confirmed.

Does Bigger Always Mean Better?

Generally, Yes

More parameters = more capacity to learn complex patterns.

A 70B model will usually outperform a 7B model on:

  • Complex reasoning
  • Nuanced understanding
  • Following intricate instructions

But It's Not That Simple

Architecture matters:

  • Mistral 7B outperformed the larger Llama 2 13B on most benchmarks at release
  • Better training data and methods can compensate for size
  • Mixture of Experts (MoE) changes the math: only a fraction of the total parameters are active for each token

Diminishing returns:

  • The jump from 7B to 70B is huge
  • The jump from 70B to 700B? Less dramatic

Why Size Matters for You

Running Models Locally

Larger models need more resources:

Model Size   VRAM Needed   Can Run On
7B           ~8GB          Gaming GPU (RTX 3080)
13B          ~16GB         High-end GPU (RTX 4090)
70B          ~40GB         Server GPU (A100)
405B         ~200GB+       Multiple A100s

Rule of thumb: ~2 bytes per parameter at standard 16-bit precision; the lower figures in the table assume quantized weights (covered below).
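That rule is easy to turn into a calculator. A back-of-the-envelope sketch (weights only; real inference also needs memory for activations and the KV cache):

```python
def weight_memory_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Weights-only memory in decimal GB; ignores KV cache and overhead."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for size in (7, 13, 70, 405):
    print(f"{size}B @ 16-bit: ~{weight_memory_gb(size):,.0f} GB")
# 7B ~14 GB, 13B ~26 GB, 70B ~140 GB, 405B ~810 GB
```

Lowering bytes_per_param via quantization is how the table's smaller numbers are reached.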

API Costs

Bigger models = higher API costs:

  • GPT-3.5: ~$0.002 per 1K tokens
  • GPT-4: ~$0.03 per 1K tokens (15x more)
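Per request, that multiplier adds up quickly. A quick sketch using the prices above (illustrative only; real pricing varies by provider, model version, and input vs. output tokens):

```python
def request_cost(tokens: int, price_per_1k: float) -> float:
    """Cost of a single request at a flat per-1K-token price."""
    return tokens / 1000 * price_per_1k

tokens = 2000  # a medium-sized request
print(f"GPT-3.5: ${request_cost(tokens, 0.002):.3f}")  # $0.004
print(f"GPT-4:   ${request_cost(tokens, 0.03):.3f}")   # $0.060
```

At a million requests a day, that's the difference between $4,000 and $60,000.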

Speed

Bigger models = slower responses:

  • 7B: Near-instant responses
  • 70B: Noticeable pause
  • 400B+: Several seconds per response

Choosing the Right Size

When to Use Smaller Models (7B-13B)

  • Simple tasks (summarization, classification)
  • Running locally on consumer hardware
  • High-volume, cost-sensitive applications
  • Speed is critical

When to Use Larger Models (70B+)

  • Complex reasoning required
  • Nuanced, creative writing
  • Multi-step problems
  • Accuracy is paramount
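If you're wiring this choice into an application, the two lists above boil down to a simple routing heuristic. A hypothetical sketch (the task categories and model tags are illustrative, not prescriptive):

```python
def pick_model(task: str, local_only: bool = False) -> str:
    """Route simple or local-only work to a small model, the rest to a big one."""
    simple_tasks = {"summarize", "classify", "extract"}
    if local_only or task in simple_tasks:
        return "llama3:8b"   # fast, cheap, fits on a gaming GPU
    return "llama3:70b"      # stronger reasoning, needs serious hardware

print(pick_model("classify"))         # llama3:8b
print(pick_model("multi-step plan"))  # llama3:70b
```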

The Open Source Sweet Spot

For most people running local AI:

Llama 3 8B / Mistral 7B

  • Runs on gaming hardware
  • Surprisingly capable
  • Fast responses
  • Free to use
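Ollama is one popular way to run these models locally. A minimal sketch (assuming Ollama is running and you've pulled the model with `ollama pull llama3:8b`) that queries its local HTTP API:

```python
import json
import urllib.request

payload = {
    "model": "llama3:8b",
    "prompt": "Explain model parameters in one sentence.",
    "stream": False,  # return a single JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",  # Ollama's default local endpoint
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```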

Llama 3 70B

  • Needs serious hardware (or cloud)
  • Near-commercial quality
  • Great for serious projects

Quantization: Cheating the Size Limit

Quantization reduces model precision to fit larger models on smaller hardware.

A 70B model at standard 16-bit precision normally needs ~140GB of memory, but:

  • 8-bit quantization: ~70GB (half!)
  • 4-bit quantization: ~35GB (quarter!)
  • 2-bit quantization: ~17GB (but quality suffers)

Trade-off: Some quality loss for massive memory savings.
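The arithmetic behind those numbers is just bits per parameter. A quick sketch for the 70B case:

```python
PARAMS = 70e9  # Llama 3 70B

for bits in (16, 8, 4, 2):
    gb = PARAMS * bits / 8 / 1e9  # bits -> bytes -> decimal GB
    print(f"{bits:>2}-bit: ~{gb:.1f} GB")
# 16-bit: 140.0, 8-bit: 70.0, 4-bit: 35.0, 2-bit: 17.5
```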

The Bottom Line

Parameters indicate model capability, but context matters:

  • 7B models are your daily drivers
  • 70B models are for serious work
  • 400B+ models are API-only for most people

For most tasks, a well-trained 7B model beats a poorly-trained 70B model.

Don't chase parameter counts—chase results for your use case.


Next up: Prompt Engineering Basics — How to get better results from AI
