Optimizing Deep Learning Models for Computer Vision

In the lessons so far, I've left various tips and tricks in the paragraphs now behind us. It takes time for some things to "click": I personally tend to read a resource, work in that field for a while, and then re-read the resource once some of the knowledge has really solidified. Usually, I find gems sprinkled around that went right over my head on the first read.
For those who don't have the luxury of re-reading multiple books from time to time, and for those who are new to the field and want some guiding notes to keep, here's a TL;DR of the advice given throughout the course.
Model Design
If you're designing your own network, here are a few key elements of performant architectures, along with notes I gathered while reading design papers, experimenting with the architectures, and watching many architectures implemented by newcomers:
- Remember that designing networks isn't limited to PhD holders. Try your hand.
- Skip flattening layers; use global pooling instead (`GlobalAveragePooling2D()`, `GlobalMaxPooling2D()`).
- Use skip connections (shortcut connections); both of these ideas appear in the first sketch after this list.
- Use multiple layers with smaller filter sizes rather than one layer with a larger filter size (two stacked 3×3 convolutions cover the same receptive field as a single 5×5, with fewer parameters and an extra non-linearity).
- For memory efficiency and smaller models (especially on CPU), use depthwise separable convolutions (`SeparableConv2D()`).
- For training speed on devices with GPUs, use regular convolutions (`Conv2D()`).
- Dropout rates are usually good between `0.3` and `0.5`.
- When possible, re-use an existing network and tweak it, usually by adding a new input and top; see the second sketch below. The EfficientNet family is a great all-round network for most work. Alternatively, try ConvNeXt.
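
To make these tips concrete, here's a minimal sketch in TensorFlow/Keras that combines several of them in one small block: stacked 3×3 convolutions, a skip connection around a depthwise separable block, global average pooling instead of flattening, and dropout in the 0.3-0.5 range. The input shape and the number of classes are placeholder assumptions, not values from the course:

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(224, 224, 3))  # placeholder input size

# Two stacked 3x3 convolutions instead of a single 5x5:
# same receptive field, fewer parameters, an extra non-linearity.
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)

# A skip (shortcut) connection around a depthwise separable block.
shortcut = x
x = layers.SeparableConv2D(64, 3, padding="same", activation="relu")(x)
x = layers.SeparableConv2D(64, 3, padding="same")(x)
x = layers.Add()([shortcut, x])
x = layers.Activation("relu")(x)

# Global pooling instead of Flatten() - far fewer parameters in the head.
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.4)(x)
outputs = layers.Dense(10, activation="softmax")(x)  # 10 classes assumed

model = tf.keras.Model(inputs, outputs)
model.summary()
```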
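
And here's a minimal sketch of the "re-use an existing network" tip, again in TensorFlow/Keras: load an EfficientNet base without its original classifier, freeze it, and attach a new input and top. The `EfficientNetB0` variant, the input size, and the class count are placeholder assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Pretrained base without its original classification top.
base = tf.keras.applications.EfficientNetB0(
    include_top=False,
    weights="imagenet",
    input_shape=(224, 224, 3),    # new input; placeholder size
)
base.trainable = False            # freeze the base for initial training

inputs = tf.keras.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(10, activation="softmax")(x)  # 10 classes assumed

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Freezing the base (and calling it with `training=False`) keeps its BatchNormalization statistics fixed while the new top learns; once the top has converged, you can unfreeze some of the later blocks and fine-tune with a low learning rate.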