Regularization in Machine Learning
- Regularization is an approach to address overfitting in a model.
- An overfitted model fails to generalize its estimations to unseen test data.
- When the underlying model to be learned has low bias and high variance, or when we have only a small amount of data, the estimated model is prone to overfitting.
- Regularization reduces the variance of the model.
What is Over-fitting in Machine Learning?
Overfitting is a concept in data science which occurs when a statistical model fits its training data too closely. When this happens, the algorithm cannot perform accurately on unseen data, defeating its purpose. Generalization of a model to new data is ultimately what allows us to use machine learning algorithms every day to make predictions and classify data.
When machine learning algorithms are constructed, they leverage a sample dataset to train the model. However, when the model trains for too long on sample data or when the model is too complex, it can start to learn the “noise,” or irrelevant information, within the dataset. When the model memorizes the noise and fits too closely to the training set, the model becomes “overfitted,” and it is unable to generalize well to new data. If a model cannot generalize well to new data, then it will not be able to perform the classification or prediction tasks that it was intended for.
A low error rate on the training data combined with a high error rate on unseen data is a good indicator of overfitting. To detect this behavior, part of the dataset is typically set aside as a held-out “test set.” If the model has a low error rate on the training data and a high error rate on the test data, it signals overfitting, as in the sketch below.
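A minimal sketch of this check, assuming scikit-learn is available; the synthetic dataset, the decision-tree model, and the split ratio are purely illustrative choices.

```python
# Minimal sketch: compare training vs. held-out error to spot overfitting.
# The dataset, model, and hyperparameters below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# An unconstrained tree can fit the training data almost perfectly (it memorizes noise).
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # typically ~1.0
print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower -> overfitting
```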
Types of Regularization
1. Modify the Loss function
L2 Regularization (Ridge Regularization)
It prevents the weights from getting too large (as measured by the L2 norm). The larger the weights, the more complex the model is, and the higher the chance of overfitting. The penalty is added directly to the loss, as sketched below.
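A minimal NumPy sketch of an L2 (ridge) penalty added to a mean-squared-error loss; the regularization strength `lam` and the tiny dataset are assumptions made only for illustration.

```python
# Sketch of an L2 (ridge) penalty added to a mean-squared-error loss.
# lam (the regularization strength) and the tiny dataset are illustrative assumptions.
import numpy as np

def ridge_loss(w, X, y, lam=0.1):
    residual = X @ w - y
    mse = np.mean(residual ** 2)
    l2_penalty = lam * np.sum(w ** 2)   # discourages large weights
    return mse + l2_penalty

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
print(ridge_loss(np.array([0.1, 0.2]), X, y))
```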
L1 Regularization (Lasso Regularization)
It prevents the weights from getting too large (as measured by the L1 norm). The larger the weights, the more complex the model is, and the higher the chance of overfitting. L1 regularization introduces sparsity in the weights: it forces more weights to be exactly zero rather than just reducing the average magnitude of all weights.
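A small sketch comparing scikit-learn's Lasso (L1) and Ridge (L2) on the same data to show the sparsity effect; the `alpha` values and the synthetic dataset are illustrative assumptions.

```python
# Sketch: scikit-learn's Lasso (L1) vs. Ridge (L2) on the same data.
# alpha values and the synthetic dataset are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, n_informative=5, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# L1 drives many coefficients exactly to zero; L2 only shrinks them.
print("ridge non-zero coefs:", np.sum(ridge.coef_ != 0))
print("lasso non-zero coefs:", np.sum(lasso.coef_ != 0))
```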
Entropy
It is used for models that output probabilities. An entropy penalty pushes the predicted probability distribution towards the uniform distribution, discouraging overconfident predictions.
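A minimal NumPy sketch of one common form of this idea, a confidence penalty that subtracts the entropy of the predicted distribution from the cross-entropy loss; the penalty strength `beta` and the example distributions are illustrative assumptions.

```python
# Sketch of an entropy penalty on a model's predicted probabilities.
# beta (penalty strength) and the example distributions are illustrative assumptions.
import numpy as np

def entropy_regularized_loss(probs, target_idx, beta=0.1):
    cross_entropy = -np.log(probs[target_idx])
    entropy = -np.sum(probs * np.log(probs))   # high for near-uniform distributions
    return cross_entropy - beta * entropy      # subtracting entropy rewards less confident outputs

confident = np.array([0.97, 0.01, 0.01, 0.01])
uniform_ish = np.array([0.40, 0.20, 0.20, 0.20])
print(entropy_regularized_loss(confident, target_idx=0))
print(entropy_regularized_loss(uniform_ish, target_idx=0))
```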
2. Modify Data Sampling
Data Augmentation
Create more data from the available data by randomly cropping, dilating, rotating, adding small amounts of noise, etc.
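A toy NumPy sketch of this idea for images; in practice libraries such as torchvision or albumentations provide ready-made transforms, and the flip probability, rotation choice, and noise scale below are illustrative assumptions.

```python
# Sketch of simple image augmentations with NumPy. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((32, 32))                       # stand-in for a real training image

def augment(img):
    out = img
    if rng.random() < 0.5:
        out = np.fliplr(out)                       # random horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))      # random 90-degree rotation
    out = out + rng.normal(0, 0.01, out.shape)     # small additive noise
    return out

augmented_batch = [augment(image) for _ in range(8)]  # 8 new samples from one image
```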
K-fold Cross-validation
Divide the data into k groups. Train on (k-1) groups and test on the remaining group. Repeat for all k possible combinations.
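A short sketch of k-fold cross-validation with scikit-learn; the choice of k, the logistic-regression model, and the synthetic dataset are illustrative assumptions.

```python
# Sketch of k-fold cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# Each of the 5 rounds trains on 4 folds and evaluates on the held-out fold.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)
print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```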
3. Change training approach
Injecting Noise
Add random noise to the weights while they are being learned. This pushes the model to be relatively insensitive to small variations in the weights, which acts as regularization.
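A toy NumPy sketch of this idea: Gaussian noise is injected into the weights at every gradient-descent step of a simple linear model. The noise scale, learning rate, and synthetic data are illustrative assumptions.

```python
# Sketch: injecting Gaussian noise into the weights at each update of a
# linear model trained by gradient descent. Noise scale and learning rate
# are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(0, 0.1, size=100)

w = np.zeros(5)
lr, noise_std = 0.01, 0.01
for _ in range(500):
    grad = 2 * X.T @ (X @ w - y) / len(y)            # gradient of mean squared error
    w -= lr * grad
    w += rng.normal(0, noise_std, size=w.shape)      # perturb weights with small noise
print(w)
```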
Dropout
This method is generally used for neural networks. Units in a layer (along with their connections) are randomly dropped based on a dropout ratio, and the remaining network is trained in the current iteration. In the next iteration, a different random set of units is dropped.
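A minimal NumPy sketch of (inverted) dropout applied to one layer's activations during training; the dropout ratio and the random activation values are illustrative assumptions, and in practice frameworks provide this as a built-in layer.

```python
# Sketch of (inverted) dropout applied to a layer's activations during training.
# The dropout ratio and activation values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_ratio=0.5, training=True):
    if not training:
        return activations                          # no dropout at inference time
    keep_prob = 1.0 - drop_ratio
    mask = rng.random(activations.shape) < keep_prob
    # Scale by keep_prob so expected activations match between training and inference.
    return activations * mask / keep_prob

hidden = rng.normal(size=(4, 8))                    # activations of one hidden layer
print(dropout(hidden, drop_ratio=0.5))
```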
Thank you for reading!
Please leave a comment if you have any suggestions, would like to add a point, or noticed any mistakes or typos!
P.S. If you found this article helpful, clap! 👏👏👏 [It feels rewarding and gives me the motivation to continue writing.]