Machine Learning Workflow

David Landup

We've done Exploratory Data Analysis and got familiar with the dataset we're working with. Now - it's time to hop into the standard Machine Learning Workflow, starting with preprocessing data.

Data Preprocessing

We've worked with DataFrames so far, but without the images themselves - we only stored their paths so we could retrieve and plot them when needed. One way to load the images is to simply iterate through the paths in data and read them in:

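import cv2

x = []
y = []

# Load the first 1000 images (data holds the image file paths)
for path in data[:1000]:
    if path.endswith('.png'):
        # The class label is the character right before the '.png' extension
        label = int(path[-5])
        img = cv2.imread(path)
        if img is None:
            continue  # skip files OpenCV couldn't read
        # Transformation steps, such as resizing
        img = cv2.resize(img, (200, 200))
        x.append(img)
        y.append(label)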
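
To put rough numbers on the memory involved, here's a back-of-the-envelope sketch. It assumes all 1000 paths are PNGs loaded as 3-channel images (cv2.imread's default, which returns uint8 BGR arrays) and resized to 200x200, and it computes how many bytes a single array of that shape would need in each dtype without actually allocating it:

```python
import numpy as np

# Assumed shape: 1000 images, 200x200 pixels, 3 color channels (BGR from cv2.imread)
shape = (1000, 200, 200, 3)
n_values = int(np.prod(shape))

def footprint_mb(dtype):
    """Bytes needed for one contiguous array of `shape` in `dtype`, in megabytes."""
    return n_values * np.dtype(dtype).itemsize / 1e6

for dtype in ("uint8", "float16", "float32"):
    print(f"{dtype:>8}: {footprint_mb(dtype):.0f} MB")
```

Under these assumptions that's 120 MB as uint8, 240 MB as float16 and 480 MB as float32. Storing float16 halves the float32 cost, though it's still double the size of the raw uint8 pixels.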

x and y are Python lists, which are efficient for appending data one item at a time, but a list of separate image arrays isn't the single numeric array that model-training code expects. Let's convert them to NumPy arrays, split them into training and testing sets, and call the garbage collector (the gc module) to free x and y from memory, since we won't be using them anymore:

import numpy as np

# Use float16 rather than float32 to halve the memory footprint
x = np.array(x, dtype='float16')
y = np.array(y, dtype='float16')

from sklearn.model_selection import train_test_split

# Shuffle the samples and hold out 30% of them for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, shuffle=True, test_size=0.3)

import gc

# Drop the references to the full arrays so their memory can be reclaimed
x = None
y = None
gc.collect()
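
Setting x and y to None only lets Python free the arrays if nothing else still references them. train_test_split builds its outputs by indexing with shuffled integer indices, which produces copies, so that holds here. A NumPy slice such as x[:700], however, is a view that keeps the whole original buffer alive. The sketch below illustrates the difference on a toy array (not the dataset), using weakref to observe whether the original array survives; the helper survives_after_clearing is hypothetical, written just for this illustration:

```python
import gc
import weakref

import numpy as np

def survives_after_clearing(make_derived):
    """Return True if the original array is still alive after its name is cleared
    while an array derived from it (via `make_derived`) is still referenced."""
    x = np.zeros((1000, 10), dtype="float16")  # toy stand-in for the image array
    x_ref = weakref.ref(x)                       # observes x without keeping it alive
    derived = make_derived(x)
    x = None                                     # mirrors `x = None` in the code above
    gc.collect()
    alive = x_ref() is not None
    del derived
    return alive

# Basic slicing returns a view whose .base is the original array, so x stays alive
print(survives_after_clearing(lambda a: a[:700]))            # True
# Integer-array indexing (as in a shuffled split) returns a copy, so x can be freed
print(survives_after_clearing(lambda a: a[np.arange(700)]))  # False
```

So if you split the data by slicing instead of with train_test_split, clearing x won't release its memory until the slices are gone as well.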

© 2013-2024 Stack Abuse. All rights reserved.
