Evaluating a CNN Model Like a Pro

David Landup

There's much more to evaluating a model than computing metrics, predicting a batch and checking the results manually. Metrics are the numbers you'd report in a publication to show how well your model performs. However, the model is still a black box: we have no clue as to why the street from before was classified as a building, and vice versa. We do know that the two classes overlap, but what has the model learned that makes it misclassify a relatively obvious image like the one above?

Note: Before going further, let's unshuffle the test set again. I know it's tedious, and I wish there were a shorter way to do this, but there isn't. Some of the visualizations we'll make down the line are affected by the order of the data, and with shuffle=False the predictions come back in the same order as the generator's labels.

test_generator = test_datagen.flow_from_directory(config['TEST_PATH'],
                                                 target_size=(150,150),
                                                 batch_size=32,
                                                 shuffle=False,
                                                 class_mode='categorical',
                                                 seed=2)

y_preds = model.predict(test_generator)

Identifying Wrong Predictions

Let's start out by identifying the wrong predictions. The test_generator.classes property is a NumPy array of class indices, one for each instance. Its length matches our test set (3000), so it can be compared directly against the most confident prediction for each image in y_preds.
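One way to make that comparison is sketched below. It is a minimal, self-contained illustration rather than the lesson's own code: the small y_preds and y_true arrays are hypothetical stand-ins for the output of model.predict(test_generator) and for test_generator.classes, so that the snippet runs without a trained model.

```python
import numpy as np

# Hypothetical stand-ins so the sketch runs on its own. In the lesson,
# y_preds would come from model.predict(test_generator) and
# y_true from test_generator.classes.
y_preds = np.array([
    [0.10, 0.70, 0.20],  # most confident class: 1
    [0.80, 0.10, 0.10],  # most confident class: 0
    [0.30, 0.30, 0.40],  # most confident class: 2
    [0.05, 0.15, 0.80],  # most confident class: 2
])
y_true = np.array([1, 0, 1, 2])

# The most confident prediction is the index of the highest probability per row.
y_pred_classes = np.argmax(y_preds, axis=1)

# Indices of the instances where the predicted class differs from the true class.
wrong_indices = np.where(y_pred_classes != y_true)[0]

print(f"{len(wrong_indices)} wrong predictions out of {len(y_true)}")
print("Wrong at indices:", wrong_indices)
```

Because the generator was created with shuffle=False, an index in wrong_indices should refer to the same image in both arrays, so with the real generator it could be used to look up the file through test_generator.filenames.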
