Regularization in Machine Learning

Himanshu Gaur
4 min read · Nov 28, 2022


The Lp Norm
  • Regularization is an approach to address over-fitting in a model.
  • An overfitted model fails to generalize its estimates to test data.
  • When the underlying model to be learned is low bias/high variance, or when we have a small amount of data, the estimated model is prone to over-fitting.
  • Regularization reduces the variance of the model.

What is Over-fitting in Machine Learning?

Overfitting is a concept in data science, which occurs when a statistical model fits exactly against its training data. When this happens, the algorithm unfortunately cannot perform accurately against unseen data, defeating its purpose. Generalization of a model to new data is ultimately what allows us to use machine learning algorithms every day to make predictions and classify data.

When machine learning algorithms are constructed, they leverage a sample dataset to train the model. However, when the model trains for too long on sample data or when the model is too complex, it can start to learn the “noise,” or irrelevant information, within the dataset. When the model memorizes the noise and fits too closely to the training set, the model becomes “overfitted,” and it is unable to generalize well to new data. If a model cannot generalize well to new data, then it will not be able to perform the classification or prediction tasks that it was intended for.

A low error rate on the training data combined with a high error rate on unseen data is a good indicator of overfitting. To detect this behavior, part of the dataset is typically set aside as the “test set”. If the model shows a low error rate on the training data but a high error rate on the test data, it signals overfitting.
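As a quick illustration, here is a minimal sketch (assuming scikit-learn and a synthetic dataset, both chosen purely for illustration) of how a held-out test set exposes overfitting: an unconstrained decision tree scores near-perfectly on the training data but noticeably worse on the test data.

```python
# Minimal sketch: detecting overfitting with a held-out test set.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(max_depth=None, random_state=0)  # unconstrained -> prone to overfit
model.fit(X_train, y_train)

print("train accuracy:", model.score(X_train, y_train))  # typically close to 1.0
print("test accuracy: ", model.score(X_test, y_test))    # noticeably lower
```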

How Overfitting affects Prediction

Types of Regularization

1. Modify the Loss function

L2 Regularization (Ridge Regularization)

It prevents the weights from getting too large (as measured by the L2 norm). The larger the weights, the more complex the model and the greater the chance of overfitting.

Loss = Error + λ · Σᵢ wᵢ²   (λ is the regularization strength)
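A minimal sketch of L2 regularization, assuming scikit-learn (Ridge's `alpha` plays the role of λ here): compared with plain least squares, the ridge solution has visibly smaller weights.

```python
# Minimal sketch: L2 (ridge) regularization shrinks the weights.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)   # alpha plays the role of lambda

print("OLS   weight norm:", np.linalg.norm(ols.coef_))
print("Ridge weight norm:", np.linalg.norm(ridge.coef_))  # noticeably smaller
```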

L1 Regularization (Lasso Regularization)

It prevents the weights from getting too large (as measured by the L1 norm). The larger the weights, the more complex the model and the greater the chance of overfitting. L1 regularization introduces sparsity in the weights: it forces more weights to be exactly zero rather than merely reducing the average magnitude of all weights.

Loss = Error + λ · Σᵢ |wᵢ|   (λ is the regularization strength)
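A minimal sketch of L1 regularization, again assuming scikit-learn (Lasso's `alpha` plays the role of λ): with only a few informative features, most of the learned weights end up exactly zero.

```python
# Minimal sketch: L1 (lasso) regularization drives many weights to exactly zero.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]          # only 3 informative features
y = X @ true_w + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)     # alpha plays the role of lambda
print("non-zero weights:", np.sum(lasso.coef_ != 0))  # only a few; the rest are exactly 0
```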

Entropy

It is used for models that output probabilities. It pushes the predicted probability distribution towards the uniform distribution, discouraging over-confident predictions.

Loss = Error − λ · Entropy   (λ is the regularization strength)
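A minimal sketch in plain NumPy (the helper names are illustrative, not from any particular library): the entropy of the predicted distribution is subtracted from the cross-entropy error, so predictions that stay closer to uniform are penalized less than over-confident ones.

```python
# Minimal sketch: an entropy bonus rewards less over-confident predictions.
import numpy as np

def cross_entropy(probs, target_idx):
    return -np.log(probs[target_idx])

def entropy(probs):
    return -np.sum(probs * np.log(probs))

probs = np.array([0.7, 0.2, 0.1])      # model's predicted class probabilities
lam = 0.1                              # regularization strength (lambda)

loss = cross_entropy(probs, target_idx=0) - lam * entropy(probs)
print("regularized loss:", loss)
```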

2. Modify Data Sampling

Data Augmentation

Create more data from the available data by randomly cropping, dilating, rotating, adding a small amount of noise, and so on.
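A minimal sketch assuming torchvision is available (the transform pipeline itself is just an example): each pass over the data sees a randomly cropped, flipped, rotated, slightly noisy version of the same image, which effectively enlarges the training set.

```python
# Minimal sketch: an image augmentation pipeline with torchvision transforms.
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(32),          # random crop + resize
    transforms.RandomHorizontalFlip(),         # random flip
    transforms.RandomRotation(degrees=15),     # small random rotation
    transforms.ToTensor(),
    transforms.Lambda(lambda t: t + 0.01 * torch.randn_like(t)),  # small additive noise
])
# `augment` would be passed as the `transform` argument of a torchvision Dataset.
```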

K-fold Cross-validation

Divide the data into k groups. Train on k−1 groups and validate on the remaining group. Repeat for all k possible combinations so that each group serves as the validation set exactly once.

5-fold cross-validation illustration
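A minimal sketch assuming scikit-learn: `cross_val_score` handles the 5-fold splitting, training on four folds and scoring on the held-out fold each time.

```python
# Minimal sketch: 5-fold cross-validation with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("per-fold accuracy:", scores)
print("mean accuracy:    ", scores.mean())
```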

3. Change training approach

Injecting Noise

Add random noise to the weights while they are being learned. This pushes the model to be relatively insensitive to small variations in the weights, which acts as regularization.
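A minimal sketch in plain NumPy (a toy linear-regression training loop; the setup is chosen for illustration): Gaussian noise is added to the weights before each gradient evaluation, so the solution found must tolerate small weight perturbations.

```python
# Minimal sketch: injecting noise into the weights during gradient descent.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(5)                              # model weights
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0]) # linear regression target

lr, noise_std = 0.01, 0.01
for _ in range(500):
    w_noisy = w + noise_std * rng.normal(size=w.shape)   # perturb the weights
    grad = 2 * X.T @ (X @ w_noisy - y) / len(y)          # gradient at the noisy weights
    w -= lr * grad

print("learned weights:", np.round(w, 2))
```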

Dropout

This method is generally used for neural networks. In each training iteration, a random subset of units (along with their connections) is dropped based on a dropout ratio, and the remaining network is trained. In the next iteration, a different random subset is dropped.

Drop-out
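A minimal sketch assuming PyTorch: `nn.Dropout` zeroes each hidden unit with probability p during training and is switched off in eval mode.

```python
# Minimal sketch: dropout in a small PyTorch network.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # dropout ratio of 0.5
    nn.Linear(64, 2),
)

model.train()            # dropout active during training
out_train = model(torch.randn(4, 20))

model.eval()             # dropout disabled for inference
out_eval = model(torch.randn(4, 20))
```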

Thank you for reading!

Please leave comments if you have any suggestion/s or would like to add a point/s or if you noticed any mistake/typos!

P.S. If you found this article helpful, clap! 👏👏👏 [feels rewarding and gives the motivation to continue my writing].
