DeepLabV3+ Implementation with Keras
Implementing Atrous Convolutional Blocks and Atrous Spatial Pyramid Pooling
With the dataset ready, it's time to create our DeepLabV3+ model. Let's refer back to the diagram and translate it into code:
The network uses several convolutional blocks with differing dilation rates, both inside the Atrous Spatial Pyramid Pooling (ASPP) module and elsewhere. Let's define a conv_block() for that first:
# Turns into an atrous block when dilation_rate > 1
def conv_block(block_input, num_filters=256, kernel_size=(3, 3), dilation_rate=1, padding="same"):
    # Pass `padding` through instead of hard-coding "same", so the argument takes effect
    x = keras.layers.Conv2D(num_filters, kernel_size=kernel_size, dilation_rate=dilation_rate, padding=padding)(block_input)
    x = keras.layers.BatchNormalization()(x)
    x = keras.layers.Activation('relu')(x)
    return x
By default, it's a regular convolutional block. Setting the dilation_rate to anything above 1 turns it into an "atrous" convolutional block. Apart from that, this is a pretty standard Conv-BN-ReLU block with an adjustable dilation_rate parameter.
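To get a feel for what the dilation rate changes, here is a minimal pure-Python sketch, separate from the Keras model. The helper names effective_kernel_size() and dilated_conv1d_same() are hypothetical, and the rates 6, 12 and 18 are used only as example values. The sketch shows that a dilated kernel keeps the same number of weights while covering a wider span of the input, and that zero ("same") padding keeps the output length equal to the input length.

```python
def effective_kernel_size(kernel_size: int, dilation_rate: int) -> int:
    """Span, in input positions, covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation_rate - 1)


def dilated_conv1d_same(signal, kernel, dilation_rate=1):
    """Zero-padded ("same") 1D cross-correlation with a dilated kernel."""
    k = len(kernel)
    span = effective_kernel_size(k, dilation_rate)
    pad_left = (span - 1) // 2
    pad_right = span - 1 - pad_left
    padded = [0] * pad_left + list(signal) + [0] * pad_right
    # Each output position reads k taps that are dilation_rate steps apart.
    return [
        sum(kernel[j] * padded[i + j * dilation_rate] for j in range(k))
        for i in range(len(signal))
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 1, 1]

regular = dilated_conv1d_same(signal, kernel, dilation_rate=1)
atrous = dilated_conv1d_same(signal, kernel, dilation_rate=2)

# Same output length in both cases, but different neighbours are summed.
print(regular)  # [3, 6, 9, 12, 15, 18, 13]
print(atrous)   # [4, 6, 9, 12, 15, 10, 12]

# A 3-wide kernel at a few example dilation rates.
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel_size(3, rate))
```

The same idea carries over to two dimensions: a 3x3 kernel with a dilation rate of 6 still has nine weights, but they are spread over a 13x13 window.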
Now, let's define the ASPP module, one of the most important parts of DeepLabV3+. There's a small detail omitted from the diagram above - information on how "Image Pooling" is done.
Regular Spatial Pyramid Pooling (on the left) downsamples the input and recovers the output from it by upsampling: it encodes the image into a denser representation and decodes that into a prediction. A U-Net-like encoder-decoder (b) also does this, but additionally feeds the spatial information captured at different scales during downsampling into the corresponding layers during upsampling. DeepLabV3+ tries to use the best of both approaches: it performs Spatial Pyramid Pooling and injects intermediate shortcuts carrying spatial context while upsampling.
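The combination can be made more concrete by tracking feature map sizes through the network. The sketch below is plain shape bookkeeping, not model code. It assumes a square input, a low-level shortcut taken at stride 4 and an encoder output stride of 16, which are the defaults described in the DeepLabV3+ paper rather than values stated in this section. The function name deeplab_feature_sizes() is hypothetical.

```python
def deeplab_feature_sizes(image_size=512, low_level_stride=4, output_stride=16):
    """Side lengths of the feature maps at each stage, for a square input.

    Assumes image_size is divisible by output_stride.
    """
    # Shortcut branch: taken early in the backbone, still spatially detailed.
    low_level = image_size // low_level_stride
    # Encoder output: the resolution the ASPP module operates at.
    encoder_out = image_size // output_stride
    # First upsampling brings the ASPP output up to the shortcut's resolution,
    # so the two can be concatenated along the channel axis.
    decoder_in = encoder_out * (output_stride // low_level_stride)
    # Second upsampling restores the input resolution for the prediction.
    prediction = decoder_in * low_level_stride
    return {
        "low_level": low_level,
        "encoder_out": encoder_out,
        "decoder_in": decoder_in,
        "prediction": prediction,
    }


sizes = deeplab_feature_sizes(512)
print(sizes)
# {'low_level': 128, 'encoder_out': 32, 'decoder_in': 128, 'prediction': 512}
```

Under these assumptions the ASPP output and the shortcut meet at the same resolution (128x128 for a 512x512 input), which is what allows the spatial context to be injected during upsampling.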