"Bag of Tricks for CNNs"
Although ResNets were outperformed by the Inception network (covered next), they became a defining cornerstone of computer vision and are still commonly used today, despite newer architectures offering better parameter utilization, faster training, and higher accuracy. While they are gradually being phased out, their performance remains decent, and more importantly, tweaks made to the original architecture have boosted its performance significantly.
These variations have kept ResNets relevant to this day, and architectures like ResNeXt (which stacks blocks in parallel rather than sequentially, similar to Inception) and variations on ResNeXt have achieved state-of-the-art results in 2020!
Some tweaks are pretty small - some are fairly large. Let's take a look at one of the tweaks from "Bag of Tricks for Image Classification with Convolutional Neural Networks" by Tong He et al. They outline several tricks for training CNNs, and aggregate their results for ResNets, Inception, and MobileNet. Their baseline is, naturally, much the same as we've seen it so far - image augmentation (random horizontal flipping, hue/saturation jitter, normalization, etc.), Nesterov-accelerated SGD with a momentum of 0.9, a starting learning rate of 0.1, a factor of 0.1 applied to the learning rate when the network plateaus, and a batch size of 256.
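The "reduce on plateau" part of that baseline is worth making concrete. Below is a minimal, framework-free sketch of the schedule: start at 0.1 and multiply by 0.1 once validation loss stops improving for a few epochs. The function name, the `patience` parameter, and the loss values are illustrative assumptions, not taken from the paper; in practice you would use your framework's built-in scheduler rather than rolling your own.

```python
def reduce_on_plateau(val_losses, start_lr=0.1, factor=0.1, patience=3):
    """Return the learning rate used at each epoch, given per-epoch
    validation losses. Mirrors the baseline policy described above:
    lr starts at 0.1 and is multiplied by 0.1 after `patience`
    consecutive epochs with no improvement. (`patience` is our
    assumption; the paper's exact trigger may differ.)"""
    lr = start_lr
    best = float("inf")
    bad_epochs = 0
    lrs = []
    for loss in val_losses:
        lrs.append(lr)          # lr in effect during this epoch
        if loss < best:
            best = loss         # improvement: reset the counter
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                lr *= factor    # plateau detected: shrink the lr
                bad_epochs = 0
    return lrs


# Hypothetical loss curve: improves, plateaus for four epochs, improves again.
schedule = reduce_on_plateau([1.0, 0.9, 0.9, 0.9, 0.9, 0.9, 0.8])
```

The optimizer side of the baseline (Nesterov momentum 0.9, batch size 256) is orthogonal to this schedule; in PyTorch, for example, it corresponds to `torch.optim.SGD(params, lr=0.1, momentum=0.9, nesterov=True)` paired with `ReduceLROnPlateau(factor=0.1)`.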