Training Neural Networks
Training a neural network means teaching it to make better predictions by adjusting its weights and biases through many iterations.
1. Training Process Overview
- Initialize weights → random small values
- Forward propagation → get predictions
- Calculate loss → measure how wrong the predictions are
- Backward propagation → find how to adjust the weights
- Update weights → using an optimization algorithm
- Repeat → for many examples and iterations (epochs); a minimal code sketch of this loop follows the list
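The steps above map directly onto code. Below is a minimal NumPy sketch of one possible training loop for a tiny one-hidden-layer network on synthetic data; the network size, learning rate, and epoch count are illustrative assumptions, not values from this text.

```python
# Minimal training-loop sketch with plain NumPy on synthetic data.
# All sizes and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data: y = 3x + noise
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X + 0.1 * rng.normal(size=(200, 1))

# 1. Initialize weights with small random values
W1 = 0.01 * rng.normal(size=(1, 8)); b1 = np.zeros((1, 8))
W2 = 0.01 * rng.normal(size=(8, 1)); b2 = np.zeros((1, 1))

lr = 0.1                              # learning rate
for epoch in range(200):              # 6. Repeat for many epochs
    # 2. Forward propagation
    h = np.maximum(0, X @ W1 + b1)    # ReLU hidden layer
    y_hat = h @ W2 + b2

    # 3. Calculate loss (mean squared error)
    loss = np.mean((y_hat - y) ** 2)

    # 4. Backward propagation (hand-derived gradients)
    d_yhat = 2 * (y_hat - y) / len(X)
    dW2 = h.T @ d_yhat;              db2 = d_yhat.sum(axis=0, keepdims=True)
    d_h = (d_yhat @ W2.T) * (h > 0)  # ReLU derivative
    dW1 = X.T @ d_h;                 db1 = d_h.sum(axis=0, keepdims=True)

    # 5. Update weights (plain gradient descent)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if epoch % 50 == 0:
        print(f"epoch {epoch:3d}  loss {loss:.4f}")
```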
2. Optimization Algorithms (Gradient Descent)
Gradient Descent
Moves the weights in the direction that reduces loss.
Like walking downhill to reach the lowest point (minimum loss).
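As a concrete illustration of the update rule (new weight = old weight minus learning rate times the gradient of the loss), here is a toy example that minimizes the assumed one-dimensional loss L(w) = (w - 3)**2:

```python
# Tiny numeric illustration of w <- w - lr * dL/dw on a toy loss.
# The loss L(w) = (w - 3)**2 and all values are assumptions for illustration.
w, lr = 0.0, 0.1
for step in range(25):
    grad = 2 * (w - 3)      # dL/dw
    w = w - lr * grad       # step "downhill"
print(round(w, 3))          # approaches the minimum at w = 3
```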
Optimizer Comparison
| Optimizer | Key Characteristic |
|---|---|
| SGD (Stochastic Gradient Descent) | Updates weights using one data point at a time (fast but noisy) |
| Mini-batch GD | Uses small batches of samples per update (the common choice in practice) |
| Adam | Adaptive optimizer; combines momentum with per-parameter scaling (very popular) |
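For reference, here is how these optimizers are typically selected in code. PyTorch is an assumption here (the text does not name a framework), and the model and learning rates are placeholders:

```python
# Sketch of optimizer selection; PyTorch and all values are assumptions.
import torch
import torch.nn as nn

model = nn.Linear(10, 1)   # placeholder model

# SGD: updates from one sample (or a few) at a time; fast but noisy
sgd = torch.optim.SGD(model.parameters(), lr=0.01)

# Mini-batch GD is the same SGD optimizer, but fed small batches
# (e.g. batch_size=32 from a DataLoader) instead of single samples.

# Adam: adaptive learning rates plus momentum; a common default
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```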
3. Overfitting vs Underfitting
| Term | What Happens | Example |
|---|---|---|
| Overfitting | Model learns the training data too well, including its noise → poor generalization | High accuracy on training data, low on test data |
| Underfitting | Model is too simple → can't capture the underlying patterns | Low accuracy on both training and test data |
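One common way to tell these apart in practice is to compare training and validation accuracy. The sketch below is an illustrative rule of thumb with arbitrary thresholds, not a definition from the text:

```python
# Illustrative diagnostic; the 0.7 and 0.15 thresholds are arbitrary assumptions.
def diagnose(train_acc: float, val_acc: float) -> str:
    if train_acc < 0.7 and val_acc < 0.7:
        return "likely underfitting: low accuracy on both sets"
    if train_acc - val_acc > 0.15:   # large train/validation gap
        return "likely overfitting: great on training, poor on validation"
    return "reasonable fit"

print(diagnose(train_acc=0.99, val_acc=0.72))   # -> likely overfitting
print(diagnose(train_acc=0.55, val_acc=0.53))   # -> likely underfitting
```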
4. Regularization Techniques
These techniques help prevent overfitting (a short code sketch follows the table):
| Method | Description |
|---|---|
| L1 / L2 Regularization | Adds a penalty on large weights to the loss function |
| Dropout | Randomly turns off some neurons during training |
| Early Stopping | Stops training when the validation loss starts increasing |
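Here is a minimal sketch of how the three techniques in the table might look in code, assuming PyTorch; the dropout rate, weight decay, and patience values are placeholder assumptions:

```python
# Regularization sketch; PyTorch and all hyperparameter values are assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),    # Dropout: randomly zeroes neurons during training
    nn.Linear(64, 1),
)

# L2 regularization via the optimizer's weight_decay penalty on large weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""
    def __init__(self, patience: int = 5):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```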
5. Hyperparameters to Tune
Key Hyperparameters in Deep Learning:
| Parameter | Description |
|---|---|
| Learning Rate | Size of the steps taken in gradient descent |
| Epochs | Number of full passes through the training data |
| Batch Size | Number of samples processed per weight update |
| Number of Layers/Neurons | Controls model complexity |
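As a small example of how these hyperparameters are often organized and explored, here is a sketch of a simple grid search; every value in it is an illustrative assumption rather than a recommendation:

```python
# Simple grid-search sketch; all hyperparameter values are assumptions.
from itertools import product

search_space = {
    "learning_rate": [1e-2, 1e-3],   # step size for gradient descent
    "epochs": [10, 30],              # full passes over the training data
    "batch_size": [32, 128],         # samples per weight update
    "hidden_layers": [1, 2],         # model complexity
}

# Try every combination; in real use, train and keep the best validation score.
for lr, epochs, batch, layers in product(*search_space.values()):
    config = dict(zip(search_space, (lr, epochs, batch, layers)))
    print("would train with:", config)
```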
Summary
- Neural networks learn by adjusting weights to minimize loss.
- Optimizers like Adam improve the speed and stability of training.
- Proper tuning and regularization help avoid overfitting.