Machine Learning Lesson of the Day – Overfitting and Underfitting

Overfitting occurs when a statistical model or machine learning algorithm captures the noise of the data.  Intuitively, overfitting occurs when the model or the algorithm fits the training data too closely.  Specifically, overfitting occurs if the model or algorithm shows low bias but high variance.  Overfitting is often a result of an excessively complicated model, and it can be prevented by fitting multiple models and using validation or cross-validation to compare their predictive accuracies on held-out data.
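To make this concrete, here is a minimal sketch (my own illustration, not part of the original lesson) of using 5-fold cross-validation to compare polynomial regression models of increasing complexity on noisy synthetic data.  The data set, the choice of degrees, and the scikit-learn functions are illustrative assumptions; the point is simply that the excessively flexible model tends to show a worse cross-validated error, which is the warning sign of overfitting.

```python
# Illustrative sketch: compare models of different complexity with cross-validation.
# The synthetic data (a sine trend plus noise) and polynomial degrees are my own
# hypothetical choices, not taken from the original post.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = np.sin(x) + rng.normal(scale=0.3, size=100)   # underlying trend plus noise
X = x.reshape(-1, 1)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # 5-fold cross-validation; scores are negative mean squared errors
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"degree {degree:2d}: cross-validated MSE = {-scores.mean():.3f}")
```

In a run like this, the degree-15 model typically has the worst cross-validated error despite being the most flexible, so cross-validation steers you toward a simpler model.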

Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data.  Intuitively, underfitting occurs when the model or the algorithm does not fit the training data well enough.  Specifically, underfitting occurs if the model or algorithm shows low variance but high bias.  Underfitting is often a result of an excessively simple model.

Both overfitting and underfitting lead to poor predictions on new data sets.
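A second sketch (again my own illustration, using the same kind of synthetic data as above) shows both failure modes at once by comparing training error with error on a held-out test set: an overly simple model scores poorly on both sets (underfitting), while an overly flexible model scores well on the training set but poorly on the test set (overfitting).

```python
# Illustrative sketch: contrast training error and test error for models that are
# too simple, reasonable, and too complex.  Data, degrees, and labels are
# hypothetical choices for demonstration only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=120)
y = np.sin(x) + rng.normal(scale=0.3, size=120)
X_train, X_test, y_train, y_test = train_test_split(
    x.reshape(-1, 1), y, test_size=0.3, random_state=1)

for degree, label in [(1, "underfit"), (4, "reasonable"), (20, "overfit")]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d} ({label}): train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```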

In my experience with statistics and machine learning, I don’t encounter underfitting very often.  Data sets used for predictive modelling nowadays often come with too many predictors, not too few.  Nonetheless, whenever you build a predictive model, use validation or cross-validation to assess its predictive accuracy, whether you are trying to avoid overfitting or underfitting.

