Tensorflow 2.0: Solving Classification and Regression Problems

After much hype, Google finally released TensorFlow 2.0 which is the latest version of Google's flagship deep learning platform. A lot of long-awaited features have been introduced in TensorFlow 2.0. This article very briefly covers how you can develop simple classification and regression models using TensorFlow 2.0.

Classification with Tensorflow 2.0

If you have ever worked with Keras library, you are in for a treat. TensorFlow 2.0 now uses Keras API as its default library for training classification and regression models. Before TensorFlow 2.0, one of the major criticisms that the earlier versions of TensorFlow had to face stemmed from the complexity of model creation. Previously you need to stitch graphs, sessions and placeholders together in order to create even a simple logistic regression model. With TensorFlow 2.0, creating classification and regression models have become a piece of cake.

So without further ado, let's develop a classification model with TensorFlow.

The Dataset

The dataset for the classification example can be downloaded freely from this link. Download the file in CSV format. If you open the downloaded CSV file, you will see that the file doesn't contain any headers. The detail of the columns is available at UCI machine learning repository. I will recommend that you read the dataset information in detail from the download link. I will briefly summarize the dataset in this section.

The dataset basically consists of 7 columns:

  1. price (the buying price of the car)
  2. maint ( the maintenance cost)
  3. doors (number of doors)
  4. persons (the seating capacity)
  5. lug_capacity (the luggage capacity)
  6. safety (how safe is the car)
  7. output (the condition of the car)

Given the first 6 columns, the task is to predict the value for the 7th column i.e. the output. The output column can have one of the three values i.e. "unacc" (unacceptable), "acc" (acceptable), good, and very good.

Importing Libraries

Before we import the dataset into our application, we need to import the required libraries.

import pandas as pd
import numpy as np
import tensorflow as tf

import matplotlib.pyplot as plt
%matplotlib inline

import seaborn as sns
sns.set(style="darkgrid")

Before we proceed, I want you to make sure that you have the latest version of TensorFlow i.e. TensorFlow 2.0. You can check your TensorFlow version with the following command:

print(tf.__version__)

If you do not have TensorFlow 2.0 installed, you can upgrade to the latest version via the following command:

$ pip install --upgrade tensorflow

Importing the Dataset

The following script imports the dataset. Change the path to your CSV data file according.

cols = ['price', 'maint', 'doors', 'persons', 'lug_capacity', 'safety','output']
cars = pd.read_csv(r'/content/drive/My Drive/datasets/car_dataset.csv', names=cols, header=None)

Since the CSV file doesn't contain column headers by default, we passed a list of column headers to the pd.read_csv() method.

Let's now see the first 5 rows of the dataset via the head() method.

cars.head()

Output:

You can see the 7 columns in the dataset.

Data Analysis and Preprocessing

Let's briefly analyze the dataset by plotting a pie chart that shows the distribution of the output. The following script increases the default plot size.

plot_size = plt.rcParams["figure.figsize"]
plot_size [0] = 8
plot_size [1] = 6
plt.rcParams["figure.figsize"] = plot_size

And the following script plots the pie chart showing the output distribution.

cars.output.value_counts().plot(kind='pie', autopct='%0.05f%%', colors=['lightblue', 'lightgreen', 'orange', 'pink'], explode=(0.05, 0.05, 0.05,0.05))

Output:

The output shows that majority of cars (70%) are in unacceptable condition while 20% cars are in acceptable conditions. The ratio of cars in good and very good condition is very low.

All the columns in our dataset are categorical. Deep learning is based on statistical algorithms and statistical algorithms work with numbers. Therefore, we need to convert the categorical information into numeric columns. There are various approaches to do that but one of the most common approach is one-hot encoding. In one-hot encoding, for each unique value in the categorical column, a new column is created. For the rows in the actual column where the unique value existed, a 1 is added to the corresponding row of the column created for that particular value. This might sound complex but the following example will make it clear.

The following script converts categorical columns into numeric columns:

price = pd.get_dummies(cars.price, prefix='price')
maint = pd.get_dummies(cars.maint, prefix='maint')

doors = pd.get_dummies(cars.doors, prefix='doors')
persons = pd.get_dummies(cars.persons, prefix='persons')

lug_capacity = pd.get_dummies(cars.lug_capacity, prefix='lug_capacity')
safety = pd.get_dummies(cars.safety, prefix='safety')

labels = pd.get_dummies(cars.output, prefix='condition')

To create our feature set, we can merge the first six columns horizontally:

X = pd.concat([price, maint, doors, persons, lug_capacity, safety] , axis=1)

Let's see how our label column looks now:

labels.head()

Output:

The label column is basically a one-hot encoded version of the output column that we had in our dataset. The output column had four unique values: unacc, acc, good and very good. In the one-hot encoded label dataset, you can see four columns, one for each of the unique values in the output column. You can see 1 in the column for the unique value that originally existed in that row. For instance, in the first five rows of the output column, the column value was unacc. In the labels column, you can see 1 in the first five rows of the condition_unacc column.

Let's now convert our labels into a numpy array since deep learning models in TensorFlow accept numpy array as input.

y = labels.values

The final step before we can train our TensorFlow 2.0 classification model is to divide the dataset into training and test sets:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)

Model Training

To train the model, let's import the TensorFlow 2.0 classes. Execute the following script:

from tensorflow.keras.layers import Input, Dense, Activation,Dropout
from tensorflow.keras.models import Model

As I said earlier, TensorFlow 2.0 uses the Keras API for training the model. In the script above we basically import Input, Dense, Activation, and Dropout classes from tensorflow.keras.layers module. Similarly, we also import the Model class from the tensorflow.keras.models module.

The next step is to create our classification model:

input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(15, activation='relu')(input_layer)
dense_layer_2 = Dense(10, activation='relu')(dense_layer_1)
output = Dense(y.shape[1], activation='softmax')(dense_layer_2)

model = Model(inputs=input_layer, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

As can be seen from the script, the model contains three dense layers. The first two dense layers contain 15 and 10 nodes, respectively with relu activation function. The final dense layer contain 4 nodes (y.shape[1] == 4) and softmax activation function since this is a classification task. The model is trained using categorical_crossentropy loss function and adam optimizer. The evaluation metric is accuracy.

The following script shows the model summary:

print(model.summary())

Output:

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 21)]              0
_________________________________________________________________
dense (Dense)                (None, 15)                330
_________________________________________________________________
dense_1 (Dense)              (None, 10)                160
_________________________________________________________________
dense_2 (Dense)              (None, 4)                 44
=================================================================
Total params: 534
Trainable params: 534
Non-trainable params: 0
_________________________________________________________________
None

Finally, to train the model execute the following script:

history = model.fit(X_train, y_train, batch_size=8, epochs=50, verbose=1, validation_split=0.2)

The model will be trained for 50 epochs but here for the sake of space, the result of only last 5 epochs is displayed:

Epoch 45/50
1105/1105 [==============================] - 0s 219us/sample - loss: 0.0114 - acc: 1.0000 - val_loss: 0.0606 - val_acc: 0.9856
Epoch 46/50
1105/1105 [==============================] - 0s 212us/sample - loss: 0.0113 - acc: 1.0000 - val_loss: 0.0497 - val_acc: 0.9856
Epoch 47/50
1105/1105 [==============================] - 0s 219us/sample - loss: 0.0102 - acc: 1.0000 - val_loss: 0.0517 - val_acc: 0.9856
Epoch 48/50
1105/1105 [==============================] - 0s 218us/sample - loss: 0.0091 - acc: 1.0000 - val_loss: 0.0536 - val_acc: 0.9856
Epoch 49/50
1105/1105 [==============================] - 0s 213us/sample - loss: 0.0095 - acc: 1.0000 - val_loss: 0.0513 - val_acc: 0.9819
Epoch 50/50
1105/1105 [==============================] - 0s 209us/sample - loss: 0.0080 - acc: 1.0000 - val_loss: 0.0536 - val_acc: 0.9856

By the end of the 50th epoch, we have training accuracy of 100% while validation accuracy of 98.56%, which is impressive.

Let's finally evaluate the performance of our classification model on the test set:

score = model.evaluate(X_test, y_test, verbose=1)

print("Test Score:", score[0])
print("Test Accuracy:", score[1])

Here is the output:

WARNING:tensorflow:Falling back from v2 loop because of error: Failed to find data adapter that can handle input: <class 'pandas.core.frame.DataFrame'>, <class 'NoneType'>
346/346 [==============================] - 0s 55us/sample - loss: 0.0605 - acc: 0.9740
Test Score: 0.06045335989359314
Test Accuracy: 0.9739884

Our model achieves an accuracy of 97.39% on the test set. Though it is slightly less than the training accuracy of 100%, it is still very good given the fact that we randomly chose the number of layers and the nodes. You can add more layers to the model with more nodes and see if you can get better results on the validation and test sets.

Regression with TensorFlow 2.0

In regression problem, the goal is to predict a continuous value. In this section, you will see how to solve a regression problem with TensorFlow 2.0

The Dataset

The dataset for this problem can be downloaded freely from this link. Download the CSV file.

The following script imports the dataset. Do not forget to change the path to your own CSV datafile.

petrol_cons = pd.read_csv(r'/content/drive/My Drive/datasets/petrol_consumption.csv')

Let's print the first five rows of the dataset via the head() function:

petrol_cons.head()

Output:

You can see that there are five columns in the dataset. The regression model will be trained on the first four columns, i.e. Petrol_tax, Average_income, Paved_Highways, and Population_Driver_License(%). The value for the last column i.e. Petrol_Consumption will be predicted. As you can see that there is no discrete value for the output column, rather the predicted value can be any continuous value.

Data Preprocessing

In the data preprocessing step we will simply split the data into features and labels, followed by dividing the data into test and training sets. Finally the data will be normalized. For regression problems in general, and for regression problems with deep learning, it is highly recommended that you normalize your dataset. Finally, since all the columns are numeric, here we do not need to perform one-hot encoding of the columns.

X = petrol_cons.iloc[:, 0:4].values
y = petrol_cons.iloc[:, 4].values

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In the above script, in the feature set X, the first four columns of the dataset are included. In the label set y, only the 5th column is included. Next, the data set is divided into training and test size via the train_test_split method of the sklearn.model_selection module. The value for the test_size attribute is 0.2 which means that the test set will contain 20% of the original data and the training set will consist of the remaining 80% of the original dataset. Finally, the StandardScaler class from the sklearn.preprocessing module is used to scale the dataset.

Model Training

The next step is to train our model. This is process is quite similar to training the classification. The only change will be in the loss function and the number of nodes in the output dense layer. Since now we are predicting a single continuous value, the output layer will only have 1 node.

input_layer = Input(shape=(X.shape[1],))
dense_layer_1 = Dense(100, activation='relu')(input_layer)
dense_layer_2 = Dense(50, activation='relu')(dense_layer_1)
dense_layer_3 = Dense(25, activation='relu')(dense_layer_2)
output = Dense(1)(dense_layer_3)

model = Model(inputs=input_layer, outputs=output)
model.compile(loss="mean_squared_error" , optimizer="adam", metrics=["mean_squared_error"])

Our model consists of four dense layers with 100, 50, 25, and 1 node, respectively. For regression problems, one of the most commonly used loss functions is mean_squared_error. The following script prints the summary of the model:

Model: "model_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_4 (InputLayer)         [(None, 4)]               0
_________________________________________________________________
dense_10 (Dense)             (None, 100)               500
_________________________________________________________________
dense_11 (Dense)             (None, 50)                5050
_________________________________________________________________
dense_12 (Dense)             (None, 25)                1275
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 26
=================================================================
Total params: 6,851
Trainable params: 6,851
Non-trainable params: 0

Finally, we can train the model with the following script:

history = model.fit(X_train, y_train, batch_size=2, epochs=100, verbose=1, validation_split=0.2)

Here is the result from the last 5 training epochs:

Epoch 96/100
30/30 [==============================] - 0s 2ms/sample - loss: 510.3316 - mean_squared_error: 510.3317 - val_loss: 10383.5234 - val_mean_squared_error: 10383.5234
Epoch 97/100
30/30 [==============================] - 0s 2ms/sample - loss: 523.3454 - mean_squared_error: 523.3453 - val_loss: 10488.3036 - val_mean_squared_error: 10488.3037
Epoch 98/100
30/30 [==============================] - 0s 2ms/sample - loss: 514.8281 - mean_squared_error: 514.8281 - val_loss: 10379.5087 - val_mean_squared_error: 10379.5088
Epoch 99/100
30/30 [==============================] - 0s 2ms/sample - loss: 504.0919 - mean_squared_error: 504.0919 - val_loss: 10301.3304 - val_mean_squared_error: 10301.3311
Epoch 100/100
30/30 [==============================] - 0s 2ms/sample - loss: 532.7809 - mean_squared_error: 532.7809 - val_loss: 10325.1699 - val_mean_squared_error: 10325.1709

To evaluate the performance of a regression model on test set, one of the most commonly used metrics is root mean squared error. We can find mean squared error between the predicted and actual values via the mean_squared_error class of the sklearn.metrics module. We can then take square root of the resultant mean squared error. Look at the following script:

from sklearn.metrics import mean_squared_error
from math import sqrt

pred_train = model.predict(X_train)
print(np.sqrt(mean_squared_error(y_train,pred_train)))

pred = model.predict(X_test)
print(np.sqrt(mean_squared_error(y_test,pred)))

The output shows the mean squared error for both the training and test sets. The results show that model performance is better on the training set since the root mean squared error value for training set is less. Our model is overfitting. The reason is obvious, we only had 48 records in the dataset. Try to train regression models with a larger dataset to get better results.

50.43599665058207
84.31961060849562

Conclusion

TensorFlow 2.0 is the latest version of Google's TensorFlow library for deep learning. This article briefly covers how to create classification and regression models with TensorFlow 2.0. To have a hands on experience, I would suggest that you practice the examples given in this article and try to create simple regression and classification models with TensorFlow 2.0 using some other datasets.

Author image
About Usman Malik
Paris (France) Twitter
Programmer | Blogger | Data Science Enthusiast | PhD To Be | Arsenal FC for Life