Definition
Transformers are a neural network architecture introduced in the 2017 paper "Attention Is All You Need." They revolutionized natural language processing (NLP) and are now used across AI.
Key Innovation - Self-Attention: Unlike earlier architectures such as recurrent networks, which process data sequentially, transformers process all parts of the input simultaneously by computing attention scores between every pair of elements.
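To make the pairwise idea concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. It is a toy version: it uses the input directly as queries, keys, and values, whereas a real transformer layer first applies learned projection matrices (W_q, W_k, W_v).

```python
import numpy as np

def self_attention(X):
    """Toy scaled dot-product self-attention over X of shape (seq_len, d_model).

    Assumption: X serves directly as queries, keys, and values; a real layer
    would first project X with learned weight matrices W_q, W_k, W_v.
    """
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)                   # a score for every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ X                                # each output mixes all positions

seq = np.random.randn(5, 8)    # 5 tokens, 8-dimensional embeddings
out = self_attention(seq)
print(out.shape)               # (5, 8)
```

Note that the whole sequence is handled in two matrix multiplies, with no loop over positions, which is exactly what makes the computation parallelizable.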
**Architecture Components:**
- Encoder: Processes the input (used in BERT)
- Decoder: Generates the output (used in GPT)
- Attention Heads: Multiple attention mechanisms running in parallel (see the sketch after this list)
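Here is a minimal sketch of the attention-heads idea under the same toy assumptions as above (no learned per-head projections; each head simply attends over its own slice of the embedding):

```python
import numpy as np

def multi_head_attention(X, num_heads=4):
    """Toy multi-head attention: split d_model into num_heads independent heads.

    Assumption: real layers learn separate Q/K/V projections per head and a
    final output projection; here each head just attends over its own slice.
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]          # this head's slice
        scores = Xh @ Xh.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax per row
        heads.append(weights @ Xh)                      # heads attend independently
    return np.concatenate(heads, axis=-1)               # rejoin head outputs

X = np.random.randn(5, 32)     # 5 tokens, 32 dims -> 4 heads of 8 dims each
print(multi_head_attention(X).shape)   # (5, 32)
```

Because the heads are independent, each can learn to attend to a different kind of relationship in the input, and all of them run in parallel.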
Why Transformers Dominate:
- Highly parallelizable, so training is faster
- Better at capturing long-range dependencies
- Scale effectively with more data and compute
Examples
GPT, BERT, Claude, and virtually all modern LLMs are built on the transformer architecture.
Related Terms
Large Language Model (LLM): AI models trained on massive text datasets that can understand and generate human-like text.
GPT: OpenAI's series of large language models that power ChatGPT.
Attention: A technique allowing models to focus on relevant parts of the input when generating output.
BERT: Bidirectional Encoder Representations from Transformers, Google's influential language model.