
Transformer

A neural network architecture using self-attention mechanisms, the foundation of modern LLMs.

Definition

Transformers are a neural network architecture introduced in the 2017 paper "Attention Is All You Need." They revolutionized natural language processing (NLP) and now underpin models across many areas of AI.

Key Innovation - Self-Attention: Unlike earlier architectures such as RNNs that process data sequentially, transformers process all parts of the input simultaneously by computing attention scores between every pair of elements, as in the sketch below.
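
To make this concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation described above. The shapes and variable names (seq_len, d_model, Wq, Wk, Wv) are illustrative assumptions, not the layout of any specific model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the chosen axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (seq_len, d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv        # project input into queries, keys, values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)         # attention score for every pair of positions
    weights = softmax(scores, axis=-1)      # each row sums to 1: how much one position attends to the others
    return weights @ V                      # weighted sum of values

# Toy example (assumed sizes): 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8): every position now mixes information from all positions
```

Because the score matrix covers every pair of positions at once, the whole computation is a handful of matrix multiplications, which is what makes it easy to parallelize on GPUs.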

Architecture Components:

  • Encoder: Processes the input (used in BERT)
  • Decoder: Generates the output (used in GPT)
  • Attention Heads: Multiple attention mechanisms run in parallel (multi-head attention; see the sketch after this list)
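
As a rough illustration of the attention-heads item above, the following sketch splits the model dimension into several heads, runs the same scaled dot-product attention in each head independently, and concatenates the results. Names such as num_heads, Wo, and the per-head split are assumptions for the example, not the exact implementation of any particular model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Split d_model into num_heads parallel heads, attend in each, then recombine."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Reshape to (num_heads, seq_len, d_head) so each head attends independently
    split = lambda M: M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (num_heads, seq_len, seq_len)
    heads = softmax(scores, axis=-1) @ Vh                   # (num_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo                                      # final output projection

# Toy example (assumed sizes): 4 tokens, d_model=8, 2 heads
rng = np.random.default_rng(1)
seq_len, d_model, num_heads = 4, 8, 2
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads).shape)  # (4, 8)
```

Each head can learn to focus on a different kind of relationship (e.g., nearby words vs. long-range references), and the output projection mixes the heads back together.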

Why Transformers Dominate:

  • Highly parallelizable (faster training)
  • Better at capturing long-range dependencies
  • Scale effectively with more data and compute

Examples

GPT, BERT, Claude, and virtually all modern LLMs are built on the transformer architecture.
