Data Preprocessing

David Landup
Ammar Alyousfi

In the preprocessing stage, we'll prepare the data to be fed into machine learning models, whether we're performing shallow or deep learning. The first step is clearing the dataset of null values and abnormalities, and enforcing data types. Then, we'll one-hot encode the categorical variables, turning them into numerical ones.
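As a rough illustration of those first steps, here is a minimal sketch on a toy DataFrame. The column names (`LotArea`, `Neighborhood`, `SalePrice`) and the choice to fill numeric gaps with the median are assumptions made for the example, not necessarily what we'll do with the real dataset.

```python
import pandas as pd

# Toy data; the column names and values are illustrative assumptions.
df = pd.DataFrame({
    "LotArea": ["8450", "9600", None, "11250"],   # numbers stored as text
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", None],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Enforce data types: coerce the text column to numbers (bad values become NaN).
df["LotArea"] = pd.to_numeric(df["LotArea"], errors="coerce")

# Clear null values: fill the numeric gap with the median (one possible strategy)
# and drop rows that are missing a category.
df["LotArea"] = df["LotArea"].fillna(df["LotArea"].median())
df = df.dropna(subset=["Neighborhood"])

# One-hot encode the categorical column into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["Neighborhood"], dtype=int)
print(encoded)
```

Each category becomes its own indicator column, so the model only ever sees numbers.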

Once that's done, we can split the data into a training and a testing set, and scale/standardize it to help the models train faster and converge more easily. Let's start!
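As a preview of that final step, the sketch below splits synthetic data and standardizes it with scikit-learn. The random data, the 80/20 split and the `random_state` value are assumptions chosen for the example. The scaler is fitted on the training set only, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and target, standing in for the real dataset (assumption).
rng = np.random.default_rng(42)
X = rng.normal(loc=1500, scale=400, size=(100, 3))
y = X @ np.array([100.0, 50.0, 25.0]) + rng.normal(scale=1000, size=100)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.mean(axis=0).round(3), X_train_scaled.std(axis=0).round(3))
```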

Removing Abnormalities

We've previously seen that abnormal listings exist and that, by definition, they're in the minority. Abnormal sales typically pair good features with a lower price because, say, the owner is in a rush to sell the property. Left in the dataset, they'd make those features look like worse indicators of price than they really are.

There are also "partial" sale conditions, which don't impact other variables, but do impact the price. If a house is partially built, it'll have the same area, lot frontage, etc. as a finished house! The overall quality can also be ranked as high if high-quality materials are used, but there's no guarantee that this will also be reflected in the overall condition variable (and even if it is, that variable doesn't correlate much with the price).
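One way to act on this is to keep only the normal sales, sketched below on a toy DataFrame. The `SaleCondition` column and its `Normal`, `Abnorml` and `Partial` labels are assumptions about how the sale condition is encoded; check the actual column name and values in your copy of the data first.

```python
import pandas as pd

# Toy listings; column names and labels are assumptions for illustration.
df = pd.DataFrame({
    "SaleCondition": ["Normal", "Abnorml", "Normal", "Partial", "Normal"],
    "GrLivArea": [1710, 1800, 1262, 1900, 1786],
    "SalePrice": [208500, 120000, 181500, 260000, 223500],
})

# Check how many listings fall under each sale condition.
print(df["SaleCondition"].value_counts())

# Keep only normal sales, dropping abnormal and partial ones.
normal_sales = df[df["SaleCondition"] == "Normal"].reset_index(drop=True)
print(len(df), "->", len(normal_sales))
```

Since these listings are a minority, filtering them out costs few rows while removing prices that the features can't explain.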


© 2013-2024 Stack Abuse. All rights reserved.
