Evaluating a CNN Model Like a Pro
There's much more to evaluating a model than computing metrics and manually checking predictions on a batch. Those are the numbers you'd report to signify how well your model performs if you were to write a publication - however, the model is still a black box, and we have no clue as to why the street from before was classified as a building and vice versa. We do know that the classes overlap, but what has the model learned that makes it misclassify a relatively obvious image like the one above?
Note: Before going further, let's unshuffle the test set again. I know - it's tedious, and I wish there were a shorter way to do this, but there isn't. Some of the visualizations we'll make down the line are affected by the order of the data.
test_generator = test_datagen.flow_from_directory(config['TEST_PATH'],
                                                  target_size=(150, 150),
                                                  batch_size=32,
                                                  shuffle=False,
                                                  class_mode='categorical',
                                                  seed=2)
y_preds = model.predict(test_generator)
Identifying Wrong Predictions
Let's start out by identifying the wrong predictions. The test_generator.classes property is a NumPy array of class labels - one for each instance. It has the same length as our test set (3000 entries), so it can be compared directly against the most confident predictions in y_preds:
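As a minimal sketch of that comparison - assuming y_preds holds the softmax probabilities returned by model.predict() above, and using illustrative names like predicted_classes and wrong_indices - this might look like:

import numpy as np

# Most confident class per instance (index of the highest softmax probability)
predicted_classes = np.argmax(y_preds, axis=1)
true_classes = test_generator.classes

# Indices of the instances the model got wrong
wrong_indices = np.where(predicted_classes != true_classes)[0]
print(f'{len(wrong_indices)} wrong predictions out of {len(true_classes)}')

Since the generator is no longer shuffled, these indices line up with test_generator.filenames, so we can trace each wrong prediction back to the exact image that caused it.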