Exploratory Data Analysis (EDA)
Importing Modules
Let's take care of all the imports at the top of the script or Jupyter Notebook, so we don't have to worry about them later:
# Scikit-Learn and Shallow Learning
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn import metrics
# TF and Keras-related imports
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
# Data manipulation and processing
import pandas as pd
import numpy as np
# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
Since we'll be starting out with shallow learning techniques to establish a baseline performance, we've imported a utility function (train_test_split), a scaler for preprocessing (StandardScaler), two regressor models (RandomForestRegressor and ElasticNet) and the metrics module from Scikit-Learn.
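As a rough illustration of how these Scikit-Learn imports typically fit together, here is a minimal sketch of a baseline workflow. It runs on synthetic data generated with NumPy rather than the actual dataset, and the variable names, hyperparameters and the choice of mean absolute error as the metric are all assumptions made for the example, not the approach used later on.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn import metrics

# Synthetic stand-in data (NOT the housing dataset): 200 rows, 5 numeric features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=200)

# Utility function: hold out 20% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaler: fit on the training split only, then apply the same transform to both
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Two regressors, scored with a function from the metrics module
baseline_mae = {}
for model in (
    RandomForestRegressor(n_estimators=50, random_state=42),
    ElasticNet(alpha=0.01),
):
    model.fit(X_train_scaled, y_train)
    predictions = model.predict(X_test_scaled)
    baseline_mae[type(model).__name__] = metrics.mean_absolute_error(
        y_test, predictions
    )

print(baseline_mae)
```

Fitting the scaler on the training split alone keeps information from the test rows out of preprocessing, which is why the test split only goes through transform().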
Through TensorFlow, we import Keras and its commonly used layers module, so we can shorten calls such as tf.keras.layers.Dense() to layers.Dense().
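To make the shortened call concrete, the following small sketch builds a layer through the short layers.Dense() path and checks that it is an instance of the class reached through the longer tf.keras.layers.Dense path. The layer size, activation and input shape are arbitrary values picked for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Build a layer through the shorter path; 8 units is an arbitrary choice
dense = layers.Dense(units=8, activation="relu")

# The layer is an instance of the class available under the longer path
is_same_kind = isinstance(dense, tf.keras.layers.Dense)

# Call the layer on a dummy batch: 2 samples with 4 features each
outputs = dense(tf.zeros((2, 4)))

print(is_same_kind, outputs.shape)
```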
We're naturally importing pandas and numpy for handling and manipulating data, as well as Matplotlib and Seaborn to visualize it.
Loading the Data
The dataset we'll be working with reports sales of residential units between 2006 and 2010 in Ames, a city in Iowa, United States.