Introduction
A common danger in machine learning is overfitting: producing a model that performs well on the training data but generalizes poorly to new, unseen data. This can happen when the model learns the noise in the data, or learns to recognize specific outputs rather than the factors that are actually predictive of the desired outcome.
The other side is underfitting: producing a model that does not perform well even on the training data. This usually happens because the features are not informative enough, the training set is too small, or the model tries to impose a linear relationship on data whose true relationship is nonlinear.
What is Model Fitting?
Model fitting is a measure of how well a machine learning model generalizes to data similar to the data on which it was trained. Generalization to new data is ultimately what lets us use machine learning algorithms every day to make predictions and classify data. A well-fitted model is one that accurately approximates the output when given inputs it has never seen before. Fitting a model is the process of adjusting its parameters to improve its accuracy.
Understanding model fit is important for diagnosing the root cause of poor model accuracy. In fact, overfitting and underfitting are the two biggest causes of poor performance in machine learning algorithms. Model fitting is therefore the essence of machine learning: if our model does not fit the data correctly, the outcomes it produces will not be accurate enough to be useful for practical decision-making.
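To make "adjusting parameters" concrete, here is a minimal fitting sketch. It assumes scikit-learn is available, and the numbers are made up for illustration; any library with a fit/predict interface would tell the same story.

```python
# Minimal sketch of model fitting with scikit-learn (assumed installed):
# fit() adjusts the model's parameters (here, a slope and an intercept)
# to minimize error on the training data.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # toy feature values (illustrative)
y = np.array([5.0, 5.6, 6.1, 6.7])          # toy target values (illustrative)

model = LinearRegression()
model.fit(X, y)                              # "fitting": parameters are tuned to the data

print(model.coef_, model.intercept_)         # the learned parameters
print(model.predict([[5.0]]))                # prediction for an unseen input
```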
Overfitting the Training Data
Say you are visiting a foreign country and the taxi driver rips you off. You might be tempted to say that all taxi drivers in that country are thieves. Overgeneralizing is something that we humans do all too often, and unfortunately, machines can fall into the same trap if we are not careful. In Machine Learning this is called overfitting: it means that the model performs well on the training data, but it does not generalize well.
Picture a high-degree polynomial life satisfaction model that strongly overfits the training data: even though it performs much better on the training data than a simple linear model, would you really trust its predictions?
Complex models such as deep neural networks can detect subtle patterns in the data, but if the training set is noisy, or if it is too small (which introduces sampling noise), then the model is likely to detect patterns in the noise itself. Obviously, these patterns will not generalize to new instances. For example, say you feed your life satisfaction model many more attributes, including uninformative ones such as the country's name. In that case, a complex model may detect patterns like the fact that all countries in the training data with a w in their name have a life satisfaction greater than 7: New Zealand (7.3), Norway (7.4), Sweden (7.2), and Switzerland (7.5).
How confident are you that the w-satisfaction rule generalizes to Rwanda or Zimbabwe? Obviously, this pattern occurred in the training data by pure chance, but the model has no way to tell whether a pattern is real or simply the result of noise in the data.
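You can see overfitting numerically rather than anecdotally with a quick experiment. The sketch below (synthetic data, scikit-learn assumed) fits a straight line and a degree-30 polynomial to the same noisy points: the polynomial scores almost perfectly on the training points it has memorized, yet much worse on held-out data.

```python
# Illustrative sketch: a degree-30 polynomial overfits noisy, linear data,
# while the simple linear model generalizes fine to held-out points.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X.ravel() + rng.normal(scale=0.5, size=60)  # linear truth + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 30):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          "train R^2:", round(model.score(X_train, y_train), 3),
          "test R^2:", round(model.score(X_test, y_test), 3))
```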
Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. The possible solutions are (a regularization sketch follows the list):
To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model.
To gather more training data.
To reduce the noise in the training data (e.g., fix data errors and remove outliers).
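Constraining a model to make it simpler and reduce the risk of overfitting is called regularization. Here is a hedged sketch of one common form, Ridge regression (scikit-learn assumed, illustrative data): the alpha parameter penalizes large coefficients, limiting the model's freedom to chase noise.

```python
# Sketch of constraining a model via Ridge regularization: larger alpha
# means a stronger constraint, i.e., a simpler effective model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = 0.5 * X.ravel() + rng.normal(scale=0.5, size=60)

# The same kind of high-degree polynomial as before, but with its
# coefficients constrained (features scaled so the penalty is fair).
constrained = make_pipeline(PolynomialFeatures(30), StandardScaler(), Ridge(alpha=10.0))
constrained.fit(X, y)
print("train R^2:", round(constrained.score(X, y), 3))
```

The amount of regularization to apply (alpha here) is a hyperparameter: it is set before training, typically by trying several values and validating on held-out data.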
Underfitting the Training Data
Underfitting is the opposite of overfitting: it occurs when your model is too simple to learn the underlying structure of the data. For example, a linear model of life satisfaction is prone to underfit; reality is just more complex than the model, so its predictions are bound to be inaccurate, even on the training examples.
The main options to fix this problem (illustrated in the sketch after this list) are:
Selecting a more powerful model, with more parameters.
Feeding better features to the learning algorithm (feature engineering).
Reducing the constraints on the model.
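The sketch below (synthetic data, scikit-learn assumed) shows the first two options at work: a plain linear model underfits a quadratic relationship, scoring poorly even on its own training data, while adding a polynomial feature, a simple form of feature engineering, captures the structure.

```python
# Sketch: a linear model underfits a quadratic relationship; adding a
# squared feature (feature engineering / a more powerful model) fixes it.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.3, size=100)  # quadratic truth + noise

linear = LinearRegression().fit(X, y)
richer = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X, y)

print("linear train R^2:", round(linear.score(X, y), 3))    # low: underfits
print("degree-2 train R^2:", round(richer.score(X, y), 3))  # high: fits the structure
```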
Another way of thinking about the overfitting problem is as a trade-off between bias and variance. Both are measures of what would happen if you were to retrain your model many times on different sets of training data (from the same larger population).
For example, the degree 0 model in "Overfitting and Underfitting" will make a lot of mistakes for pretty much any training set (drawn from the same population), which means that it has a high bias.
However, any two randomly chosen training sets should give pretty similar models (since any two randomly chosen training sets should have pretty similar average values). So, we say that it has a low variance. High bias and low variance typically correspond to underfitting.
On the other hand, the degree 9 model fits the training set perfectly. It has a very low bias but very high variance (since any two training sets would likely give rise to very different models). This corresponds to overfitting.
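This retraining thought experiment is easy to simulate. The sketch below (synthetic sine data, scikit-learn assumed; the probe point and sample sizes are arbitrary choices) refits degree 0 and degree 9 models on many fresh training sets, then measures how far the average prediction lands from the truth (bias) and how much the predictions scatter (variance).

```python
# Rough simulation of the bias/variance thought experiment: fit the same
# model class to many fresh training sets and compare the predictions at
# a single probe point against the true value.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(2)
x_query = np.array([[1.5]])          # arbitrary probe point
true_value = np.sin(1.5)             # the noiseless truth at that point

for degree in (0, 9):
    preds = []
    for _ in range(200):             # 200 fresh training sets
        X = rng.uniform(-3, 3, size=(20, 1))
        y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=20)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X, y)
        preds.append(model.predict(x_query)[0])
    preds = np.array(preds)
    print(degree,
          "bias^2:", round((preds.mean() - true_value) ** 2, 3),
          "variance:", round(preds.var(), 3))
```

The degree 0 model's predictions barely move between training sets (low variance) but miss the truth badly (high bias); the degree 9 model shows the opposite pattern.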
If your model has high bias (which means it performs poorly even on your training data), then one thing to try is adding more features. Going from the degree 0 model in "Overfitting and Underfitting" to the degree 1 model was a big improvement. If your model has high variance, then you can similarly remove features. But another solution is to obtain more data (if you can).
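One way to check whether more data would actually help is a crude learning curve: refit the high-variance model on ever larger slices of the training set and watch the gap between train and test scores. The sketch below reuses the same kind of synthetic setup as the earlier examples.

```python
# Sketch of a crude learning curve: if the train/test gap shrinks as the
# training set grows, the model's high variance is being tamed by data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(3)
X = rng.uniform(-3, 3, size=(2000, 1))
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=2000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n in (30, 300, 1500):            # growing slices of the training set
    model = make_pipeline(PolynomialFeatures(9), LinearRegression())
    model.fit(X_train[:n], y_train[:n])
    print(n,
          "train:", round(model.score(X_train[:n], y_train[:n]), 3),
          "test:", round(model.score(X_test, y_test), 3))
```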
Summary
In this article, I tried to explain overfitting and underfitting in simple terms. If you have any questions related to the post, put them in the comment section and I will do my best to answer them.