Working with KerasCV

David Landup

A Word from the Keras Team

So, why even have KerasCV? Why not just add new layers to Keras, since preprocessing layers already exist? I contacted the Keras team to ask for the motivation behind building KerasCV (and KerasNLP) as a horizontal addition to Keras itself:

KerasCV and KerasNLP are domain-specific collections of Keras model building blocks, such as layers or metrics. Their purpose is to make it quick and frictionless to assemble computer vision and NLP workflows that are performant and that follow modern best practices. You can think of them as a horizontal extension of the Keras API (`keras.layers`, `keras.metrics`, `keras.losses`, etc.). We could add these APIs in Keras directly, but we believe that using a separate package and namespace makes for greater modularity and increased development velocity. It also improves discoverability!

Such building blocks are constantly needed in mainstream CV and NLP workflows. Yet, many of them are hard to implement correctly, because:

  • Some implementation details can be subtle, such as the correct way to do causal masking and padding-masking in a Transformer block.
  • Some performance concerns might get overlooked (e.g. performant in-graph COCO metrics is very tricky)
  • Best practices with regard to things like initializers and dropout aren't always well-known; a good API with good defaults guides you towards doing the correct thing.

For these reasons, it is highly beneficial to have one team implement these reusable building blocks once, with best practices baked-in, then have everyone else reuse them rather than implementing their own _N_ times.

In true Keras fashion - new additions are democratized, and reasonable defaults make it harder to mess up. While still in development - you'd be wise to add KerasCV to your toolbelt already, since the layers are already able to boost your workflow. As new features are being released, you'll be able to just plug them in.

There's no need to learn a new API - KerasCV integrates seamlessly into Keras. Using it in your toolbelt is akin to expanding Keras itself, which is why it's called a "horizontal" addition to the package.

KerasCV Layers and Preprocessing

Let's take a look at some of the new layers, briefly mentioned in earlier lessons. As of July 2022, there are 28 new layers! Again, in true Keras fashion, they can be plugged into your models to create end-to-end predictors, applied to a Dataset via a simple map() call, or used on individual images. This sort of flexibility lets you leverage the augmentations in whichever working style you prefer, or whichever one your team is bound to.

If you haven't already, install KerasCV:

$ pip install keras-cv

You can import it and use it as:

import keras_cv

output = keras_cv.layers.LayerName(args)(inputs)

The expanding list of new layers can be found in the official documentation, but let's take a look at a few important ones here:

  • MixUp
  • CutMix
  • RandAugment
  • RandomAugmentationPipeline

As times change, so do training strategies. In 2019, Yun et al. released "CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features", in which they argue that existing strategies such as MixUp and Cutout can be improved. MixUp would mix up two images and weigh the labels appropriately, such as overlaying a low-opacity image of a cat over a dog, and assigning 0.5 to 'cat' as well as 0.5 to 'dog'. Cutout would randomly "drop out" spatial data by replacing regions of the image with black or white pixels. CutMix combines these two - cutting rectangles out of images and filling them with patches from images with other labels, weighing the labels by the proportion of the image they take up.
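
To make the label arithmetic concrete, here's a minimal NumPy sketch of the two ideas, written from scratch rather than taken from KerasCV. The function names, the fixed mixing weight and the fixed patch coordinates are assumptions chosen for illustration - real implementations sample them randomly for each image:

```python
import numpy as np

def mixup_pair(img_a, img_b, label_a, label_b, lam):
    """Blend two images and their one-hot labels with weight lam."""
    image = lam * img_a + (1.0 - lam) * img_b
    label = lam * label_a + (1.0 - lam) * label_b
    return image, label

def cutmix_pair(img_a, img_b, label_a, label_b, top, left, height, width):
    """Paste a patch of img_b into img_a and weigh the labels by patch area."""
    image = img_a.copy()
    image[top:top + height, left:left + width] = img_b[top:top + height, left:left + width]
    patch_share = (height * width) / (img_a.shape[0] * img_a.shape[1])
    label = (1.0 - patch_share) * label_a + patch_share * label_b
    return image, label

# Two tiny single-colour "images" standing in for the dog and the cat
dog = np.zeros((4, 4, 3), dtype=np.float32)
cat = np.full((4, 4, 3), 255.0, dtype=np.float32)
dog_label = np.array([1.0, 0.0])
cat_label = np.array([0.0, 1.0])

mixed_img, mixed_label = mixup_pair(dog, cat, dog_label, cat_label, lam=0.5)
cut_img, cut_label = cutmix_pair(dog, cat, dog_label, cat_label, top=0, left=0, height=2, width=2)

print(mixed_label)  # [0.5 0.5]
print(cut_label)    # [0.75 0.25]
```

The 2x2 patch covers a quarter of the 4x4 image, so a quarter of the label mass moves to 'cat'.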

These are easily summarized by applying them to an image and plotting the results. Let's find two images - one of a cat and one of a dog:

dog = ';0,0.190xh&resize=1200:*'
cat = ''

Let's fetch them and load them in using OpenCV:

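import urllib.request

import cv2
import numpy as np
import matplotlib.pyplot as plt

def url_to_array(url):
    # Download the raw bytes and decode them into an image array
    req = urllib.request.urlopen(url)
    arr = np.asarray(bytearray(req.read()), dtype=np.uint8)
    arr = cv2.imdecode(arr, -1)
    # OpenCV decodes to BGR, so convert to RGB for plotting
    arr = cv2.cvtColor(arr, cv2.COLOR_BGR2RGB)
    arr = cv2.resize(arr, (224, 224))
    return arr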

dog_img = url_to_array(dog)
cat_img = url_to_array(cat)

fig, ax = plt.subplots(1, 2, figsize=(16, 6))
ax[0].imshow(dog_img)
ax[1].imshow(cat_img)

Great! We've got a cute puppy and a cat. Let's save them as an array of images, and their labels as another:

images = np.array([dog_img, cat_img])
labels = np.array([0., 1.])

From here, we can define our augmentation layers:

rand_augment = keras_cv.layers.RandAugment(value_range=(0, 255))
cutmix = keras_cv.layers.CutMix()
mixup = keras_cv.layers.MixUp()

And pass the images in:

randaug_imgs = rand_augment(images)
cutmix_imgs = cutmix({"images": images, "labels": labels})
mixup_imgs = mixup({"images": images, "labels": labels})

Note: CutMix and MixUp require a dictionary of {"images": images, "labels": labels}, not just an array of images, because they also compute label proportions in an image. Labels will have to be of type float as well. Because of this - we'll be applying CutMix and MixUp during the preprocessing stages - typically on Dataset objects, rather than within the model's definition, where the rest of the pipeline can go.

Then, we can visualize them:

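def visualize(images):
    fig, ax = plt.subplots(1, 2, figsize=(8, 6))
    for index, img in enumerate(images):
        # Cast to integers so imshow treats the values as 0..255 pixels
        ax[index].imshow(np.asarray(img).astype("uint8"))
    plt.show()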

This results in:

The augmentations range from practically none to funky otherworldly images in negative colors. You can set how strong the augmentations are with the magnitude argument, which is a float between 0 and 1.

Now, alternatively, you can have a Dataset being fed into a CNN with a couple of preprocessing layers on top. Let's apply a random augmentation, and a CutMix or MixUp augmentation (separately, so we can isolate the effects). When working with Dataset instances, you can easily apply functions that transform your data with the map() function.

Let's create the Datasets first:

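import tensorflow as tf

def preprocess_img(img, label):
    img = tf.image.resize(img, (224, 224))
    # resize() already returns float32, so these two calls keep the 0..255 range
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.cast(img, tf.float32)
    return {"images": img, "labels": label}

data = tf.data.Dataset.from_tensor_slices((images, labels)).map(preprocess_img).batch(2)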

And let's map the images from data to images augmented with the new KerasCV layers, by applying the transformations and returning the images:

randaug_imgs = data.map(rand_augment)
cutmix_imgs = data.map(cutmix)
mixup_imgs = data.map(mixup)

Finally, we can visualize the effects of the operations:

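def visualize(dataset):
    fig, ax = plt.subplots(1, 2, figsize=(8, 6))
    for sample in dataset:
        images = sample["images"]
        labels = sample["labels"]
        for index, img in enumerate(images):
            ax[index].imshow(img.numpy().astype("uint8"))
            ax[index].set_title(f"Label: {labels[index].numpy()}")
    plt.show()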


This results in:

The first two images have random augmentations, which were pretty mild here! The second pair is CutMix, where images from the batch (of only these two) are cut up and mixed. You can see that the returned labels are 0.87 dog and 0.12 cat, and 0.78 dog and 0.21 cat! Finally, the MixUp images are simply overlaid - with small artifacts added to the image, and with weighted classes as well. The first image doesn't really look like a mix if you don't know that it is - but it does have underlying "dog" artifacts. Surprisingly enough, ConvNets are pretty sensitive to underlying artifacts, and an entire class of "attacks" on them is based on adding noise to images that is imperceptible to us but very perceptible to ConvNets, making them produce very wrong outputs. You can read more about this in "Explaining and Harnessing Adversarial Examples" by Goodfellow et al.

Since we're not really adding noise, but rather, artifacts of other classes, they can still learn the difference between them even in a joined image!

Note: These are known as inter-class examples.

These two tricks help with generalization by forcing a model to diversify the features that make up a class. Most modern training pipelines use either CutMix or MixUp, and rarely are they both applied, so choose the one you prefer, or leave it up to chance and augment some batches with MixUp and some batches with CutMix! Let's take a look at how you can create an augmentation pipeline that can plug into any network definition or Dataset. We'll be training an EfficientNet with and without the augmentation pipeline on a small dataset for reference.

Custom RandAugment

RandAugment is a special case of RandomAugmentationPipeline, in which a random augmentation layer is applied to the input. Internally, RandomAugmentationPipeline simply has a list of layers that it chooses from.
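
The selection logic can be pictured with a few lines of plain Python. This is only a conceptual sketch - random_pipeline() and the toy operations are hypothetical names, and the real layer works on image tensors and has more options:

```python
import random

def random_pipeline(ops, augmentations_per_image, rng):
    """Return a function that applies randomly chosen ops to its input."""
    def apply(x):
        for _ in range(augmentations_per_image):
            op = rng.choice(ops)  # pick one operation from the list
            x = op(x)
        return x
    return apply

# Toy "augmentations" acting on a flat list of pixel values in 0..255
def invert(pixels):
    return [255 - p for p in pixels]

def darken(pixels):
    return [p // 2 for p in pixels]

rng = random.Random(42)  # seeded so that runs are repeatable
augment = random_pipeline([invert, darken], augmentations_per_image=1, rng=rng)

pixels = [0, 100, 200]
augmented = augment(pixels)
print(augmented)  # either the inverted or the darkened pixels
```

Swapping the list of operations is all it takes to change what the pipeline can do, which is the same idea as passing a custom list of layers.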

You can change this list easily by removing or adding keras_cv.layers.LayerName, or by simply replacing it with a custom list:

pipeline = keras_cv.layers.RandomAugmentationPipeline(
    layers=[keras_cv.layers.Grayscale(), keras_cv.layers.AutoContrast()]
)

This pipeline can then be applied to a Dataset via map():

dataset = dataset.map(pipeline)

Creating an Augmentation Pipeline

Let's create an augmentation pipeline! We'll apply several augmentations, using RandAugment, and we'll apply CutMix or MixUp with a 50/50 probability to the dataset we're training on.

We'll make a non-augmented set, a set with only random augmentation and a set with random augmentation and CutMix or MixUp and compare how they train.

Now, CutMix and MixUp require a different format than we'd use for training - they need a dictionary of the form {"images": images, "labels": labels} as seen before, so we'll have two preparation functions: one for the CutMix/MixUp augmentations, and one for the model:

import random

def preprocess(img, label):
    img = tf.image.resize(img, (224, 224))
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.cast(img, tf.float32)
    label = tf.one_hot(label, n_classes)
    return {"images": img, "labels": label}

def prep_for_model(inputs):
    images, labels = inputs["images"], inputs["labels"]
    images = tf.cast(images, tf.float32)
    return images, labels

def cutmix_or_mixup(samples):
    # Flip a coin: apply CutMix to roughly half of the batches, MixUp to the rest
    if tf.random.uniform(()) > 0.5:
        samples = keras_cv.layers.CutMix()(samples)
    else:
        samples = keras_cv.layers.MixUp()(samples)
    return samples

preprocess() will prepare the data for cutmix_or_mixup() and prep_for_model() will prepare it for the model. The latter really only extracts the images and labels from the singular inputs dictionary into separate returned items.

Furthermore, after the input layer of the model, we'll apply several transformations:

# The value_range is concerned with the value range for your images. i.e. whether they're normalized to 0..1 or not.
value_range = (0, 255)

data_aug_pipeline = keras.Sequential([
    # magnitude sets how 'aggressive' the augmentations are
    keras_cv.layers.RandAugment(value_range=value_range, magnitude=0.3)
])

The pipeline only has one layer, which internally applies many augmentation layers. We'll be applying CutMix/MixUp on the dataset itself.

Note: You can replace RandAugment() with a keras_cv.layers.RandomAugmentationPipeline(layers=[layer1, layer2, ...])

Training a Model with KerasCV Preprocessing Layers

Now, let's load a dataset in using tfds, and apply these transformations:

import tensorflow_datasets as tfds

# Using 0..50% of the data for training for speed
# and to make it more difficult for the network.
# The tuple order follows the split order, so the "validation" split
# lands in valid_set and train[70%:] in test_set
(train_set, valid_set, test_set), info = tfds.load("imagenette",
                                           split=["train[:50%]", "validation", "train[70%:]"],
                                           as_supervised=True, with_info=True)

class_names = info.features["label"].names
n_classes = info.features["label"].num_classes
print(f'Class names: {class_names}')
print('Num of classes:', n_classes) # 10

print("Train set size:", len(train_set)) # 4734
print("Test set size:", len(test_set))   # 2841
print("Valid set size:", len(valid_set)) # 3925

We'll create a non-augmented version (*_set_na) and an augmented version (*_set_aug) of each set:

train_set_na =
test_set_na =
valid_set_na =

train_set_aug =
test_set_aug =
valid_set_aug =

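The first model won't use any augmentation:

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(None, None, 3)),
    keras.applications.EfficientNetV2B0(weights=None, include_top=False),
    # Pool the 4D feature maps into a vector for the classifier
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(n_classes, activation='softmax')
])

# Assumed settings: categorical cross-entropy suits the one-hot labels,
# and 100 epochs matches the comparison further down
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history = model.fit(train_set_na, validation_data=valid_set_na, epochs=100)

While the second one will use only RandAugment():

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(None, None, 3)),
    data_aug_pipeline,
    keras.applications.EfficientNetV2B0(weights=None, include_top=False),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(n_classes, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history2 = model.fit(train_set_na, validation_data=valid_set_na, epochs=100)

And the third one will use both the augmentation pipeline and the corresponding CutMix/MixUp sets:

model = keras.Sequential([
    keras.layers.InputLayer(input_shape=(None, None, 3)),
    data_aug_pipeline,
    keras.applications.EfficientNetV2B0(weights=None, include_top=False),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(n_classes, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
history3 = model.fit(train_set_aug, validation_data=valid_set_aug, epochs=100)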

Generally speaking - MixUp and/or CutMix with a RandAugment of reasonable magnitude (depends on the dataset) will produce better results down the line. They do require a slightly different preprocessing step, and do make the training somewhat slower, but if you're looking to maximize generalization - they're a good bet.

When these three networks train over 100 epochs, here are the validation accuracies:

The MixUp/CutMix pipeline evidently took longer to achieve the higher accuracy - but it eventually overtook the pipeline with just RandAugment(), even though it seemed to have been slightly worse in terms of accuracy in the beginning. This makes sense - it was harder to classify these images but the network eventually got the hang of it. Maybe even more importantly - the individual training/validation curves are significantly different:

With just RandAugment(), the training and validation curves are fairly similar until around epoch 40, where the model starts to overfit and they diverge. Even then, they still follow the same trend. Gradients are calculated from the loss between the network's outputs and the training labels. This network will only see small gradient updates, as it's correctly classifying most training samples. This makes it difficult to squeeze out the last few percentage points, and training becomes slower and harder near the end as the model gets better.
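
A small worked example shows why the updates shrink. For a softmax output trained with cross-entropy, the gradient of the loss with respect to the logits is simply the predicted probabilities minus the target. The sketch below uses plain Python and made-up logits, which are assumptions for illustration only:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_gradient(logits, target):
    """Gradient of softmax cross-entropy w.r.t. the logits: p - y."""
    probs = softmax(logits)
    return [p - y for p, y in zip(probs, target)]

target = [1.0, 0.0]  # one-hot label: "dog"

unsure = logit_gradient([0.2, 0.0], target)     # barely leaning towards "dog"
confident = logit_gradient([6.0, 0.0], target)  # almost certain it's a "dog"

print([round(g, 3) for g in unsure])     # [-0.45, 0.45]
print([round(g, 3) for g in confident])  # [-0.002, 0.002]
```

Once most training samples look like the second case, there's very little signal left to learn from.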

On the other hand, the MixUp/CutMix network never gets to see the curves cross:

The training loss, used to compute the gradient updates, stays high even though the validation accuracy is high. This almost false belief that the network isn't that accurate might very well be what keeps the validation accuracy rising further than without MixUp/CutMix, as the network isn't settling into place. The training curves are closer to the validation curves at epoch 100, compared to say, epoch 40. Presumably, through more epochs, they'd converge together, as well. The point is - with MixUp/CutMix, we're increasing the potential of the network to learn through an extended period of time, compared to the network that didn't use them.
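
One way to see why the training loss stays high: with mixed labels, cross-entropy has a non-zero floor. Even a prediction that matches the soft target exactly still pays the entropy of that target. A minimal sketch in plain Python, with a 0.87/0.13 split as an assumed example label:

```python
import math

def cross_entropy(target, probs):
    """Categorical cross-entropy between a target and a predicted distribution."""
    # Terms with zero target weight contribute nothing, so skip them
    return sum(-t * math.log(p) for t, p in zip(target, probs) if t > 0)

hard_label = [1.0, 0.0]    # plain one-hot label
soft_label = [0.87, 0.13]  # label after mixing two images

# Best achievable loss: the prediction equals the target exactly
best_hard = cross_entropy(hard_label, hard_label)
best_soft = cross_entropy(soft_label, soft_label)

print(round(best_hard, 3))  # 0.0
print(round(best_soft, 3))  # 0.386
```

A network trained on mixed batches can therefore be doing well while still reporting a training loss that looks poor next to the loss measured on clean validation images.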

KerasCV Metrics, Visualization and Explainability Tools

Besides new layers - KerasCV will also feature new metrics, such as COCOMeanAveragePrecision and COCORecall, as well as formatting/utility methods for plotting and calculating bounding boxes, and visualization/explainability tools like GradCam++, covered in an earlier chapter through tf-keras-vis.

These tools are currently scattered between different projects, and some are rarely updated, so having a centralized repository will undoubtedly bring advanced tools to the average engineer.

All in all - it's a really exciting package to keep track of, especially during development!

KerasCV Models

As noted earlier - keras_cv.models will eventually replace keras.applications. When? When they're ready. No hard deadline. Old models are being ported into keras_cv via the community's and the team's efforts, and new models are incoming.

Currently, in July of 2022, several models are already ported:

  • DenseNet
  • MixerMLP
  • ResNets V1 and V2
  • VGGNet
  • DarkNet

And various others, including EfficientNetV2, ConvNeXt, etc. are currently being worked on. The API is much the same, so you shouldn't have issues porting from one to the other:

densenet = keras_cv.models.DenseNet121(include_rescaling=True, include_top=True, num_classes=2)
densenet.summary()

Model: "DenseNet121"
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_2 (InputLayer)           [(None, None, None,  0           []                               
 ...
Total params: 7,039,554
Trainable params: 6,955,906
Non-trainable params: 83,648

The weights argument currently doesn't support the 'imagenet' string - this will presumably be added later. For now, only saved weights can be loaded. For news - follow the KerasCV GitHub and/or documentation. I'll update the course with new developments!


© 2013-2024 Stack Abuse. All rights reserved.