Definition
Transformers are a neural network architecture introduced in the 2017 paper "Attention Is All You Need." They revolutionized natural language processing (NLP) and are now used across AI.
Key Innovation - Self-Attention: Unlike earlier architectures such as recurrent networks, which process data sequentially, transformers process all parts of the input simultaneously by computing attention scores between every pair of elements.
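To make the pairwise idea concrete, here is a minimal sketch of scaled dot-product self-attention in NumPy. It is a toy version: it uses the input directly as queries, keys, and values, whereas a real transformer layer first applies learned projection matrices (W_q, W_k, W_v).

```python
import numpy as np

def self_attention(X):
    """Toy scaled dot-product self-attention over X of shape (seq_len, d_model).

    Assumption: X serves directly as queries, keys, and values; a real layer
    would first project X with learned weight matrices W_q, W_k, W_v.
    """
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)                   # a score for every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over each row
    return weights @ X                                # each output mixes all positions

seq = np.random.randn(5, 8)    # 5 tokens, 8-dimensional embeddings
out = self_attention(seq)
print(out.shape)               # (5, 8)
```

Note that the whole sequence is handled in two matrix multiplies, with no loop over positions, which is exactly what makes the computation parallelizable.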
**Architecture Components:**
- Encoder: Processes the input (used in BERT)
- Decoder: Generates the output (used in GPT)
- Attention Heads: Multiple attention mechanisms running in parallel (see the sketch after this list)
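Here is a minimal sketch of the attention-heads idea under the same toy assumptions as above (no learned per-head projections; each head simply attends over its own slice of the embedding):

```python
import numpy as np

def multi_head_attention(X, num_heads=4):
    """Toy multi-head attention: split d_model into num_heads independent heads.

    Assumption: real layers learn separate Q/K/V projections per head and a
    final output projection; here each head just attends over its own slice.
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        Xh = X[:, h * d_head:(h + 1) * d_head]          # this head's slice
        scores = Xh @ Xh.T / np.sqrt(d_head)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax per row
        heads.append(weights @ Xh)                      # heads attend independently
    return np.concatenate(heads, axis=-1)               # rejoin head outputs

X = np.random.randn(5, 32)     # 5 tokens, 32 dims -> 4 heads of 8 dims each
print(multi_head_attention(X).shape)   # (5, 32)
```

Because the heads are independent, each can learn to attend to a different kind of relationship in the input, and all of them run in parallel.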
Why Transformers Dominate:
- Highly parallelizable, so training is faster
- Better at capturing long-range dependencies
- Scale effectively with more data and compute
Examples
GPT, BERT, Claude, and virtually all modern LLMs are built on the transformer architecture.
Related Terms
Large Language Model (LLM): AI models trained on massive text datasets that can understand and generate human-like text.
GPT: OpenAI's series of large language models that power ChatGPT.
Attention: A technique allowing models to focus on relevant parts of the input when generating output.
BERT: Bidirectional Encoder Representations from Transformers, Google's influential language model.