AlexNet - Proving CNNs Can Do It (2012)
AlexNet, written by Alex Krizhevsky, Ilya Sutskever and Geoffrey Hinton, was released in 2012. At the time of writing, it's been a full decade since its release! It was a successor to LeNet5 and competed in the 2012 ILSVRC challenge, beating the rest of the competitors by more than 10 percentage points in top-5 error rate! While LeNet5 used a single convolution block followed by average pooling, AlexNet used multiple stacked convolution layers. The authors highlighted how the non-saturating relu activation helps train faster and produces more accurate networks than the saturating tanh, and relu has been used extensively ever since.
This depth was essential to the network's performance, at the cost of longer training and more parameters. The network starts out with a fairly large kernel size (11, 11) and stride (4, 4), and ends up with the much more common (3, 3) kernel size and a much smaller stride. The second convolutional block takes a normalized and pooled representation of the first, so we'll add a MaxPooling2D and BatchNormalization layer in between them.
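A minimal sketch of those first two blocks in Keras might look like the following. The 96 and 256 filter counts follow the original paper, but the input shape, padding, and exact pooling sizes are assumptions for illustration:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    # Block 1: large kernel and stride to aggressively downsample the input
    layers.Conv2D(96, kernel_size=(11, 11), strides=(4, 4),
                  activation='relu', input_shape=(227, 227, 3)),
    layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
    layers.BatchNormalization(),
    # Block 2: a smaller kernel applied to the pooled, normalized representation
    layers.Conv2D(256, kernel_size=(5, 5), padding='same', activation='relu'),
    layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)),
    layers.BatchNormalization(),
])
```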
The third, fourth and fifth convolutional layers are stacked on top of each other without any normalization or pooling in between. Finally, the feature maps are flattened and fed into a dense classifier, with large dropouts (0.5) sprinkled in. Since AlexNet was written for ImageNet, it has 1000 output classes; for our dataset, we'll use an output of 10 classes.
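Continuing the same Sequential model, a sketch of the remaining stack and the classifier head could look like this. The 384/384/256 filter counts and 4096-unit dense layers follow the original paper, while the padding and pooling details are again assumptions:

```python
# Blocks 3-5: stacked 3x3 convolutions with no pooling or normalization between them
model.add(layers.Conv2D(384, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(384, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(layers.Conv2D(256, kernel_size=(3, 3), padding='same', activation='relu'))
model.add(layers.MaxPooling2D(pool_size=(3, 3), strides=(2, 2)))

# Flatten the feature maps and classify with dense layers and heavy dropout
model.add(layers.Flatten())
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(4096, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation='softmax'))  # 10 classes instead of ImageNet's 1000
```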