
Attention Mechanism

A technique allowing models to focus on relevant parts of the input when generating output.

Definition

Attention mechanisms allow neural networks to weigh the importance of different parts of the input when producing each part of the output.

Self-Attention: For each element in a sequence, calculate how much to "attend to" every other element (see the sketch after this list):

1. Create Query, Key, and Value vectors for each token
2. Calculate attention scores (Query × Key)
3. Apply softmax to turn the scores into weights
4. Take the weighted sum of the Values to produce the output
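
A minimal NumPy sketch of those four steps. The projection matrices `W_q`, `W_k`, `W_v` and the division by √d_k (the scaling used in the original Transformer paper) are illustrative assumptions, not part of the definition above:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence.

    X: (seq_len, d_model) token embeddings
    W_q, W_k, W_v: (d_model, d_k) learned projection matrices
    """
    Q = X @ W_q                        # step 1: Query vectors
    K = X @ W_k                        #         Key vectors
    V = X @ W_v                        #         Value vectors
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # step 2: attention scores (scaled)
    weights = softmax(scores)          # step 3: softmax -> attention weights
    return weights @ V, weights        # step 4: weighted sum of Values

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(X, W_q, W_k, W_v)
print(out.shape)       # (4, 8)
print(weights.sum(1))  # each row of weights sums to 1
```

Because each row of `weights` sums to 1, every output is a convex combination of the Value vectors, weighted by relevance.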

Why Attention Revolutionized AI:

- Handles long-range dependencies
- Parallelizable (unlike RNNs)
- Interpretable (attention weights show focus)
- Scales effectively

Multi-Head Attention: Run multiple attention operations in parallel, each head learning a different kind of relationship (syntax, semantics, coreference, etc.), then concatenate and project the heads' outputs.
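
Continuing the NumPy sketch above, a toy multi-head version; the head count, the output projection `Wo`, and the reshape convention here are assumptions chosen for illustration:

```python
import numpy as np

def multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads):
    """Split the model dimension into n_heads independent attention heads.

    X: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model).
    """
    seq_len, d_model = X.shape
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Reshape to (n_heads, seq_len, d_head) so each head attends independently.
    split = lambda M: M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    Q, K, V = split(Q), split(K), split(V)
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # per-head scores
    e = np.exp(scores - scores.max(-1, keepdims=True))
    weights = e / e.sum(-1, keepdims=True)               # per-head softmax
    heads = weights @ V                                  # per-head outputs
    # Concatenate the heads back together and mix them with Wo.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(1)
d_model, n_heads = 16, 4
X = rng.normal(size=(5, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, n_heads).shape)  # (5, 16)
```

Splitting d_model across heads keeps the total computation roughly the same as single-head attention while letting each head specialize in a different relationship.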

Examples

When translating "The cat sat on the mat, it was soft," attention helps the model connect "it" to "mat" rather than "cat."
