Definition
Batch size determines how many samples are processed before each parameter update during training.
- **Trade-offs:**
  - Large batch: more stable gradient estimates, better use of hardware parallelism, higher memory cost
  - Small batch: noisier gradient estimates (a mild regularization effect), lower memory cost
**Common Sizes:**
- 16, 32, 64, 128, and 256 are typical (see the arithmetic sketch below)
- LLM training: effective batch sizes often in the thousands, built up via accumulation
- The upper limit is usually set by GPU memory
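As a rough illustration of how batch size interacts with dataset size, the short Python snippet below counts optimizer updates per epoch for the typical sizes above; the dataset size is an assumed example value, not taken from this entry.

```python
import math

# Assumed example: a 50,000-sample training set (roughly CIFAR-10-sized).
dataset_size = 50_000

for batch_size in (16, 32, 64, 128, 256):
    # One epoch requires ceil(dataset_size / batch_size) parameter updates.
    updates_per_epoch = math.ceil(dataset_size / batch_size)
    print(f"batch_size={batch_size:>4} -> {updates_per_epoch} updates per epoch")
```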
**Gradient Accumulation:**
- Simulates a larger batch when memory is limited (sketched below)
- Accumulates gradients over multiple forward/backward passes
- Updates the weights after every N accumulation steps
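A minimal PyTorch-style sketch of this pattern, assuming a toy linear model and random data purely for illustration: gradients are accumulated over several small batches and the optimizer steps once per group, with the loss scaled so the accumulated gradient matches one larger-batch update.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy data and model; all shapes and sizes are illustrative assumptions.
data = TensorDataset(torch.randn(512, 20), torch.randint(0, 2, (512,)))
loader = DataLoader(data, batch_size=16, shuffle=True)   # micro-batch of 16
model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accum_steps = 4   # effective batch size = 16 * 4 = 64

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    # Scale the loss so the summed gradients average over the effective batch.
    loss = nn.functional.cross_entropy(model(x), y) / accum_steps
    loss.backward()                        # gradients add up in the .grad buffers
    if (step + 1) % accum_steps == 0:
        optimizer.step()                   # one weight update per accum_steps micro-batches
        optimizer.zero_grad()
```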
**Impact on Training:**
- Affects learning dynamics and generalization
- The learning rate may need adjustment when the batch size changes
- Larger batches often call for a proportionally higher learning rate (illustrated below)
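One widely used heuristic for that adjustment is the linear scaling rule: scale the learning rate in proportion to the batch size. The snippet below is a generic illustration with assumed reference values, not a prescription from this entry.

```python
def scale_learning_rate(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: grow the learning rate in proportion to the batch size."""
    return base_lr * new_batch / base_batch

# Assumed reference point: lr = 0.1 tuned at batch size 256.
print(scale_learning_rate(0.1, 256, 1024))   # -> 0.4 for a 4x larger batch
```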
Examples
Training with batch size 32 means processing 32 images before updating weights.
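A minimal PyTorch sketch of that scenario, using randomly generated stand-in "images" and a toy classifier (the data, model, and hyperparameters are illustrative assumptions): the loop performs exactly one weight update per batch of 32.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# 320 fake 3x32x32 "images" with 10 class labels (illustrative data).
images = torch.randn(320, 3, 32, 32)
labels = torch.randint(0, 10, (320,))
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for x, y in loader:                        # each x holds 32 images
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()                       # weights update once per 32-image batch
```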