How LLMs Work
Large Language Models like GPT-4 and Claude can write essays, code, and poetry, and hold conversations. How do they actually work?
The Core Insight
LLMs are next-word prediction machines.
When you type "The cat sat on the ___", the model predicts the most likely next word based on patterns it learned from billions of text examples.
That's it. Everything else—conversations, reasoning, creativity—emerges from doing this really, really well.
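You can see the core idea with a toy model. This sketch (using a made-up six-word corpus) counts which word follows which, then predicts by picking the most frequent follower — a crude stand-in for what an LLM does over billions of examples:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "billions of text examples".
corpus = "the cat sat on the mat the cat sat on the rug".split()

# Count which word follows each word (a simple bigram model).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent next word seen after `word`."""
    return following[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" ("cat" follows "the" most often here)
```

A real model conditions on the whole preceding context, not just the last word, but the prediction objective is the same.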
Training: Reading the Internet
Phase 1: Pre-training
The model reads massive amounts of text:
- Websites, books, articles, code
- Wikipedia, Reddit, academic papers
- Billions upon billions of words
For each chunk of text, it plays a game:
- Hide the next word
- Predict what it should be
- Check if it was right
- Adjust to do better next time
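The four-step game above can be sketched in a few lines. This toy version (made-up vocabulary, and a simple score bump instead of real gradient descent) shows the shape of the loop:

```python
vocab = ["the", "cat", "sat", "on", "mat"]
text = ["the", "cat", "sat", "on", "the", "mat"]

# "Which word follows which" scores -- all equal before training.
scores = {w: {v: 1.0 for v in vocab} for w in vocab}

def train_step(prev, target, lr=0.5):
    # 1. Hide the next word; 2. predict it from current scores.
    guess = max(scores[prev], key=scores[prev].get)
    # 3. Check if the guess was right.
    was_right = (guess == target)
    # 4. Adjust scores toward the correct answer.
    scores[prev][target] += lr
    return was_right

for epoch in range(5):
    for prev, nxt in zip(text, text[1:]):
        train_step(prev, nxt)

# After training, the model predicts plausible continuations.
print(max(scores["cat"], key=scores["cat"].get))  # -> "sat"
```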
After seeing enough examples, the model learns:
- Grammar and syntax
- Facts and knowledge
- Reasoning patterns
- Writing styles
Phase 2: Fine-tuning
Raw pre-trained models are like unsocialized geniuses—smart but unhelpful.
Fine-tuning teaches them to:
- Follow instructions
- Be helpful and harmless
- Format responses appropriately
A key technique here is RLHF (Reinforcement Learning from Human Feedback):
- Humans rate model responses
- Model learns what humans prefer
- Repeat until it's actually useful
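Here is a heavily simplified sketch of that feedback loop, with hypothetical response "styles" and hard-coded ratings standing in for human raters (real RLHF trains a separate reward model and updates billions of weights):

```python
# Candidate response styles the model can produce (hypothetical).
styles = ["terse", "helpful", "rude"]

# Stand-in for human raters, who prefer helpful answers.
ratings = {"terse": 0.3, "helpful": 1.0, "rude": 0.0}

# The "policy": preference weights over styles, updated from feedback.
weights = {s: 1.0 for s in styles}

for _ in range(10):         # repeat until it's actually useful
    for style in styles:    # model tries each response style
        # Human rates the response; model shifts toward what humans prefer.
        weights[style] += 0.1 * ratings[style]

print(max(weights, key=weights.get))  # -> "helpful"
```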
The Transformer Architecture
All modern LLMs use transformers (from the 2017 paper "Attention Is All You Need").
The Key Innovation: Attention
Previous models (such as RNNs) read text one word at a time and gradually lost track of earlier words.
Transformers can look at all words simultaneously and decide which ones are important for each prediction.
Example: "The trophy didn't fit in the suitcase because it was too big."
What does "it" refer to? The trophy or suitcase?
Attention lets the model:
- Look at all words in the sentence
- Calculate relevance scores
- Conclude "it" = trophy (the thing that is "too big" to fit must be the trophy, not the suitcase)
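Here's a toy version of that relevance calculation, using made-up 2-dimensional word vectors (real models use vectors with thousands of dimensions, learned during training):

```python
import math

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 2-d "meaning" vectors, chosen so "trophy" aligns with "big".
vectors = {
    "trophy":   [1.0, 0.2],
    "suitcase": [0.3, 0.9],
    "big":      [0.9, 0.1],
}
query = [1.0, 0.1]  # the vector representing "it" in "... it was too big"

# Relevance score = dot product of the query with each word's vector.
words = list(vectors)
scores = [sum(q * v for q, v in zip(query, vectors[w])) for w in words]
attention = softmax(scores)

for word, weight in zip(words, attention):
    print(f"{word}: {weight:.2f}")  # "trophy" gets the largest weight
```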
Why This Matters
Attention enables:
- Understanding context across long passages
- Connecting related concepts
- Handling complex, multi-step reasoning
Inside the Model: Parameters
LLMs have billions of parameters—numbers that determine behavior.
| Model | Parameters |
|---|---|
| GPT-2 | 1.5 billion |
| GPT-3 | 175 billion |
| GPT-4 | ~1.7 trillion (unconfirmed estimate) |
| Llama 3 405B | 405 billion |
More parameters = more capacity to learn patterns = generally better performance.
But also = more expensive to train and run.
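Parameter counts translate directly into hardware costs. A back-of-envelope sketch, assuming 2 bytes per parameter (a common 16-bit inference format):

```python
# Rough memory needed just to hold a model's weights: parameters x bytes each.
# (Real deployments add overhead for activations, KV cache, etc.)
models = {
    "GPT-2":        1.5e9,
    "GPT-3":        175e9,
    "Llama 3 405B": 405e9,
}

BYTES_PER_PARAM = 2  # 16-bit weights

for name, params in models.items():
    gb = params * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{gb:,.0f} GB of weights")  # e.g. GPT-3: ~350 GB
```

This is why larger models need clusters of GPUs rather than a single machine.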
Generation: How Responses Are Created
When you ask a question:
- Tokenization: Your text is split into tokens (roughly 4 characters of English each, on average)
- Encoding: Tokens become numbers the model understands
- Processing: Numbers flow through transformer layers
- Prediction: Model outputs a probability for each possible next token
- Selection: Pick a token (usually using some randomness)
- Repeat: Use the new token to predict the next one
- Stop: Continue until reaching an end signal
This happens fast: often dozens to hundreds of tokens per second, depending on the model and hardware.
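The loop above can be sketched with a hand-written probability table standing in for the transformer (a real model computes these probabilities from the full context at every step):

```python
import random

random.seed(0)

# Toy next-token probabilities (invented for illustration).
next_token_probs = {
    "<start>": {"the": 1.0},
    "the":     {"cat": 0.6, "dog": 0.4},
    "cat":     {"sat": 0.9, "<end>": 0.1},
    "dog":     {"ran": 0.9, "<end>": 0.1},
    "sat":     {"<end>": 1.0},
    "ran":     {"<end>": 1.0},
}

def generate():
    token, output = "<start>", []
    while True:
        probs = next_token_probs[token]                          # predict
        token = random.choices(list(probs), probs.values())[0]   # select
        if token == "<end>":                                     # stop signal
            break
        output.append(token)                                     # repeat
    return " ".join(output)

print(generate())  # e.g. "the cat sat"
```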
The "Temperature" Setting
Temperature controls randomness in token selection:
- Low temperature (0.0-0.3): More predictable, focused responses (at 0, the model always picks the most likely token)
- Medium temperature (0.5-0.7): Balanced creativity and coherence
- High temperature (0.8-1.0+): More creative, but potentially chaotic
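Under the hood, temperature divides the model's raw scores (logits) before they're turned into probabilities. A small sketch:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by temperature before softmax: low T sharpens the
    # distribution toward the top token, high T flattens it toward uniform.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # model's raw scores for three candidate tokens

for t in (0.2, 0.7, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 2) for p in probs]}")
```

As T approaches 0, the distribution collapses onto the single most likely token; very high T approaches a uniform coin flip among all tokens.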
What LLMs Don't Do
LLMs are impressive but not magic:
- They don't "understand" like humans. They find statistical patterns.
- They don't access the internet (unless given tools).
- They can "hallucinate"—generate plausible-sounding nonsense.
- They don't learn from conversations (each chat starts fresh).
The Bottom Line
LLMs are sophisticated pattern-matching machines trained on human text. They predict what comes next based on what they've seen before.
The "intelligence" emerges from:
- Massive scale (billions of parameters)
- Massive data (trillions of words)
- Clever architecture (transformers + attention)
Understanding this helps you use them better—and recognize their limitations.
Next up: Understanding Model Parameters — What "7B" and "70B" actually mean