Definition
Attention mechanisms allow neural networks to weigh the importance of different parts of the input when producing each part of the output.
Self-Attention: For each element in a sequence, calculate how much to "attend to" every other element (a minimal code sketch follows this list):
1. Create Query, Key, and Value vectors for each token
2. Calculate attention scores (dot product of Query and Key, typically scaled by √d_k)
3. Apply softmax to the scores to get attention weights
4. Take the weighted sum of the Values to produce the output
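The sketch below walks through those four steps for a single attention head using NumPy. The shapes, the random toy inputs, and the projection matrices (`W_q`, `W_k`, `W_v`) are illustrative assumptions, not a specific library's API.

```python
# A minimal single-head self-attention sketch (illustrative shapes and weights).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)    # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """X: (seq_len, d_model) token embeddings."""
    Q = X @ W_q                                 # 1. Query vectors
    K = X @ W_k                                 #    Key vectors
    V = X @ W_v                                 #    Value vectors
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # 2. Scaled attention scores
    weights = softmax(scores, axis=-1)          # 3. Softmax over each row
    return weights @ V, weights                 # 4. Weighted sum of Values

# Toy usage: 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, attn = self_attention(X, W_q, W_k, W_v)
print(out.shape, attn.shape)  # (4, 8) (4, 4)
```

Each row of `attn` sums to 1 and shows how much that token attends to every other token in the sequence.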
Why Attention Revolutionized AI:
- Handles long-range dependencies
- Parallelizable (unlike RNNs)
- Interpretable (attention weights show where the model focuses)
- Scales effectively
Multi-Head Attention: Run multiple attention operations in parallel, each learning different relationships (syntax, semantics, coreference, etc.)
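A rough sketch of the multi-head idea, reusing the `self_attention` function from the sketch above: each head has its own (illustrative, assumed) projection matrices, the heads run independently, and their outputs are concatenated and projected back to the model dimension.

```python
# A rough multi-head attention sketch building on self_attention() above.
import numpy as np

def multi_head_attention(X, Wq_heads, Wk_heads, Wv_heads, W_o):
    """Run one self-attention pass per head, concatenate, then project."""
    head_outputs = []
    for W_q, W_k, W_v in zip(Wq_heads, Wk_heads, Wv_heads):
        out, _ = self_attention(X, W_q, W_k, W_v)   # each head attends independently
        head_outputs.append(out)
    concat = np.concatenate(head_outputs, axis=-1)  # (seq_len, n_heads * d_head)
    return concat @ W_o                             # final output projection

# Toy usage: 2 heads of size 4 over 8-dimensional embeddings
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))
Wq_heads = [rng.normal(size=(8, 4)) for _ in range(2)]
Wk_heads = [rng.normal(size=(8, 4)) for _ in range(2)]
Wv_heads = [rng.normal(size=(8, 4)) for _ in range(2)]
W_o = rng.normal(size=(8, 8))
print(multi_head_attention(X, Wq_heads, Wk_heads, Wv_heads, W_o).shape)  # (4, 8)
```

Because each head has its own projections, different heads are free to specialize in different relationships, such as syntax, semantics, or coreference.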
Examples
When translating "The cat sat on the mat, it was soft," attention helps connect "it" to "mat."