Definition
Gradient descent is an iterative optimization algorithm that minimizes a model's loss function by repeatedly nudging parameters in the direction that reduces the loss. It is the core optimization algorithm used to train most machine learning models.
How It Works:
1. Calculate the loss for the current parameters
2. Compute the gradient (the direction of steepest increase)
3. Move the parameters in the opposite direction
4. Repeat until convergence
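To make the four steps concrete, here is a minimal sketch in Python/NumPy that fits a straight line with plain gradient descent. The synthetic data, learning rate, and step count are illustrative assumptions, not part of the definition.

```python
import numpy as np

# Illustrative synthetic data: fit y = w*x + b to noisy points.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)

w, b = 0.0, 0.0   # initial parameters
lr = 0.1          # learning rate (step size)

for step in range(500):
    # 1. Calculate the loss for the current parameters (mean squared error).
    pred = w * x + b
    loss = np.mean((pred - y) ** 2)

    # 2. Compute the gradient of the loss with respect to each parameter.
    grad_w = np.mean(2 * (pred - y) * x)
    grad_b = np.mean(2 * (pred - y))

    # 3. Move the parameters in the opposite direction of the gradient.
    w -= lr * grad_w
    b -= lr * grad_b

# 4. After enough repetitions, w and b approach the values used to generate the data.
print(w, b)  # roughly 3.0 and 0.5
```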
- **Variants:**
- Batch GD: Use all data each step
- Stochastic GD (SGD): Use one sample
- Mini-batch GD: Use small batches of samples (most common in practice; see the sketch below)
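The variants differ only in how much data feeds each update. A mini-batch sketch, reusing the toy line-fitting setup above; the batch size of 32 and epoch count are illustrative assumptions.

```python
import numpy as np

# Same toy line-fitting problem as above; batch size is an illustrative choice.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=1000)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=1000)

def grad(w, b, xb, yb):
    pred = w * xb + b
    return np.mean(2 * (pred - yb) * xb), np.mean(2 * (pred - yb))

w, b, lr, batch_size = 0.0, 0.0, 0.1, 32

for epoch in range(20):
    order = rng.permutation(len(x))              # shuffle the data each epoch
    for i in range(0, len(x), batch_size):
        idx = order[i:i + batch_size]            # one mini-batch of indices
        gw, gb = grad(w, b, x[idx], y[idx])      # gradient estimated on the batch only
        w -= lr * gw
        b -= lr * gb

# batch_size = len(x) would be batch GD; batch_size = 1 would be pure SGD.
```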
- **Advanced Optimizers:**
- Adam: Adaptive per-parameter learning rates combined with momentum
- AdamW: Adam with decoupled weight decay
- SGD with Momentum: Accumulates past gradients to accelerate convergence (usage for all three is sketched below)
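In practice these optimizers are usually used through a framework. A hedged sketch using PyTorch's standard optimizer classes; the tiny model, random batch, and hyperparameter values are common defaults chosen for illustration, not prescriptions.

```python
import torch
import torch.nn.functional as F

# Hypothetical tiny model for demonstration purposes.
model = torch.nn.Linear(10, 1)

sgd_momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
adamw = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

# One training step looks the same whichever optimizer is chosen:
x = torch.randn(32, 10)          # a random batch of inputs
y = torch.randn(32, 1)           # matching targets
optimizer = adamw
optimizer.zero_grad()            # clear gradients from the previous step
loss = F.mse_loss(model(x), y)   # 1. calculate the loss
loss.backward()                  # 2. compute gradients
optimizer.step()                 # 3. apply the optimizer's update rule
```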
Challenges:
- Learning rate selection (illustrated below)
- Local minima
- Saddle points
- Slow convergence
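Why learning rate selection matters can be seen on the simplest possible loss. A toy sketch, with illustrative step sizes, showing that too small is slow and too large diverges.

```python
# On the one-dimensional loss f(x) = x**2 (gradient 2*x), the step size alone
# decides whether gradient descent converges or diverges. Values are illustrative.
def run(lr, steps=20, x=1.0):
    for _ in range(steps):
        x -= lr * 2 * x          # one gradient descent step
    return x

print(run(0.1))   # small step: converges steadily toward the minimum at 0
print(run(0.9))   # larger step: oscillates but still converges (|1 - 2*lr| < 1)
print(run(1.1))   # too large: |1 - 2*lr| > 1, so the iterates blow up
```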
Examples
An Adam optimizer adjusting millions of parameters in a neural network to minimize prediction error.
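For intuition about what that adjustment looks like per parameter, here is a sketch of the Adam update rule itself in NumPy; the hyperparameter values are the commonly cited defaults and the toy loss is an assumption for illustration.

```python
import numpy as np

# Sketch of one Adam update; hyperparameters are the commonly cited defaults.
def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad             # momentum: running average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2        # running average of squared gradients
    m_hat = m / (1 - beta1 ** t)                   # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)   # adaptive per-parameter step
    return theta, m, v

# Usage on a small parameter vector (the same arithmetic scales to millions of parameters):
theta = np.zeros(5)
m, v = np.zeros(5), np.zeros(5)
for t in range(1, 1001):
    grad = 2 * (theta - 1.0)       # gradient of a toy loss whose minimum is at theta = 1
    theta, m, v = adam_step(theta, grad, m, v, t)
```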