VGGNet - Deeeeeeeeeep Networks (2014)
LeNet5 had, well, 5 layers. AlexNet had 8 layers. AlexNet was deeper and larger - and it performed better in terms of accuracy. So, what role does depth play in ConvNets? Karen Simonyan and Andrew Zisserman of the Visual Geometry Group (VGG) at Oxford hopped on with their own take on this question, pushing the layer count to 16 and 19.
These were unprecedented depths! With only 8 layers, AlexNet was already sitting at 60M parameters and was difficult to train. How does someone go from there to 19? They fixed most of the hyperparameters in place and went all-in on depth, using "very small" 3x3 kernels throughout. A stack of three 3x3 convolutional layers covers the same effective receptive field as a single 7x7 layer, yet introduces fewer trainable parameters (and adds two extra non-linearities along the way). These smaller kernel sizes are what allowed the architecture to push the layer count up to such a large number at the time.
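The parameter savings are easy to check with a bit of arithmetic. A minimal sketch, using an illustrative channel count of 256 (an assumption for the example, not a figure from the paper):

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Trainable parameters in one convolutional layer."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)

C = 256  # illustrative channel count, kept constant across the stack
one_7x7 = conv_params(7, C, C)        # 49 * C^2 weights (plus biases)
three_3x3 = 3 * conv_params(3, C, C)  # 27 * C^2 weights (plus biases)

print(one_7x7, three_3x3)
assert three_3x3 < one_7x7  # the stacked 3x3 layers come out cheaper
```

Ignoring biases, the ratio is 27/49 - roughly 45% fewer parameters for the same receptive field, and the gap holds for any channel count.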
In "Very Deep Convolutional Networks for Large-Scale Image Recognition", a deep model architecture is outlined: an input layer (224, 224, 3), followed by a couple of convolutional layers, a max pooling layer, more convolutional layers, a max pooling layer... and so on, until the desired depth is reached. This is followed by flattening and two fully connected layers, each with 4096 neurons (like AlexNet), and a classification layer with softmax. The paper includes 11-layer, 13-layer, 16-layer and 19-layer schemas, of which the 16- and 19-layer ones are the best known. They're oftentimes called VGG16 and VGG19. They competed in the 2014 ILSVRC challenge and were the runner-up to the Inception architecture (covered later in this lesson).
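That conv/conv/pool rhythm can be sketched as a flat config list, similar in spirit to how torchvision describes its VGG builders. The `VGG16_CFG` name is ours, but the channel schedule below follows the paper's 16-layer configuration:

```python
# Layer schedule for the 16-layer model: each number is the output
# channel count of a 3x3 convolution, "M" marks a 2x2 max pooling layer.
VGG16_CFG = [64, 64, "M",
             128, 128, "M",
             256, 256, 256, "M",
             512, 512, 512, "M",
             512, 512, 512, "M"]

conv_layers = sum(1 for v in VGG16_CFG if v != "M")  # 13 conv layers
fc_layers = 3  # two 4096-neuron FC layers + the softmax classifier
print(conv_layers + fc_layers)  # 16 weight layers, hence "VGG16"
```

Note that only layers with trainable weights are counted in the name: the five pooling layers don't contribute, so 13 convolutions plus 3 fully connected layers gives the "16" in VGG16.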