
Batch Size

Number of training examples processed together before updating model weights.


Definition

Batch size determines how many samples are processed before each parameter update during training.

**Trade-offs:**

  • Large batch: more stable gradient estimates, faster throughput through parallelism, higher memory use
  • Small batch: noisier gradients (which can act as a mild regularizer), lower memory use, slower per-epoch throughput
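
To make this concrete, here is a minimal sketch in PyTorch (the synthetic dataset, tiny linear model, and hyperparameters are placeholders, not a prescribed setup). Each iteration of the loader yields one batch of 32 samples, and the weights are updated once per batch:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Synthetic data: 1,000 samples, 20 features, binary labels (placeholder for a real dataset)
X = torch.randn(1000, 20)
y = torch.randint(0, 2, (1000,))
dataset = TensorDataset(X, y)

# batch_size controls how many samples each gradient step sees
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

for inputs, targets in loader:          # each iteration yields one batch of 32
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()                     # gradients averaged over the 32 samples
    optimizer.step()                    # one weight update per batch
```

With 1,000 samples and a batch size of 32, one epoch performs 32 optimizer steps (31 full batches plus one partial final batch, since DataLoader keeps the remainder by default); doubling the batch size roughly halves the number of updates per epoch.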

Common Sizes:

  • 16, 32, 64, 128, or 256 are typical
  • LLM training: often thousands of examples per effective batch (built up via accumulation)
  • The practical upper limit is usually GPU memory

Gradient Accumulation:

  • Simulates a larger effective batch when memory is limited
  • Gradients are accumulated over multiple forward/backward passes
  • Weights are updated only after N micro-batches
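
Below is a minimal sketch of gradient accumulation, again assuming PyTorch and the same synthetic placeholder setup as above: micro-batches of 4 samples are accumulated over 8 steps, so each weight update reflects an effective batch of 32.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Same synthetic placeholder setup; micro-batches of 4, accumulated into an effective batch of 32
dataset = TensorDataset(torch.randn(1000, 20), torch.randint(0, 2, (1000,)))
loader = DataLoader(dataset, batch_size=4, shuffle=True)
model = nn.Linear(20, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 8                          # 4 * 8 = effective batch of 32
optimizer.zero_grad()
for step, (inputs, targets) in enumerate(loader, start=1):
    loss = loss_fn(model(inputs), targets)
    (loss / accum_steps).backward()      # scale so accumulated gradients match one large batch
    if step % accum_steps == 0:
        optimizer.step()                 # weight update only every accum_steps micro-batches
        optimizer.zero_grad()
```

Scaling each micro-batch loss by 1 / accum_steps keeps the accumulated gradient comparable to what a single batch of the full effective size would produce.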

Impact on Training:

  • Batch size affects learning dynamics: larger batches give lower-variance gradient estimates
  • The learning rate usually needs to be retuned when the batch size changes
  • Larger batches often call for a proportionally higher learning rate
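
One common heuristic for the last point is the linear scaling rule: scale the learning rate in proportion to the batch size. A rough sketch follows (the base values are arbitrary examples, and in practice warmup and further tuning are often still needed):

```python
# Linear scaling heuristic: adjust the learning rate in proportion to the batch size change.
base_lr = 0.1          # learning rate tuned at the reference batch size (example value)
base_batch_size = 256  # reference batch size (example value)

def scaled_lr(batch_size: int) -> float:
    """Scale the base learning rate linearly with batch size."""
    return base_lr * batch_size / base_batch_size

print(scaled_lr(512))   # 0.2   -> larger batch, higher learning rate
print(scaled_lr(64))    # 0.025 -> smaller batch, lower learning rate
```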

Examples

Training an image classifier with a batch size of 32 means processing 32 images, averaging their gradients, and then updating the weights once.

