Overfitting Is Your Friend, Not Your Foe

David Landup

It's true, nobody wants an overfit end model, just like nobody wants an underfit end model.

Overfit models perform great on training data, but can't generalize well to new instances. What you end up with is essentially a hard-coded model tailored to one specific dataset.

Underfit models can't generalize to new data, but they can't model the original training set either.

The right model is one that fits the data in such a way that it predicts well on the training, validation and test sets, as well as on new instances.
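To make the distinction concrete, here's a minimal sketch (not from the original article; the dataset and model choices are purely illustrative) that compares training and test accuracy for an underfit, a reasonably fit, and an overfit decision tree, assuming scikit-learn is available:

```python
# Illustrative sketch: spotting under-/overfitting by comparing
# training accuracy against held-out test accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# max_depth=1 underfits, max_depth=None lets the tree memorize the training set
for name, depth in [("underfit", 1), ("reasonable", 5), ("overfit", None)]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    print(f"{name:>10} | train: {model.score(X_train, y_train):.2f} "
          f"| test: {model.score(X_test, y_test):.2f}")
```

The unconstrained tree scores near-perfectly on the training set while its test score lags well behind, whereas the reasonably constrained tree keeps the two close together - that gap is the telltale sign of overfitting.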

Overfitting vs. Data Scientists

Battling overfitting gets the spotlight because it's more deceptive, and it's more tempting for rookies to create overfit models when they start their Machine Learning journey. Throughout books, blog posts and courses, a common scenario is given:

"This model has a 100% accuracy rate! It's perfect! Or not. Actually, it just badly overfits the dataset, and when testing it on new instances, it performs with just X%, which is equal to random guessing."

After these sections, entire book and course chapters are dedicated to battling overfitting and how to avoid it. The word itself has become stigmatized as a generally bad thing, and this is where the general conception arises: that overfitting is inherently bad and must be avoided at all costs.
