Defining a Convolutional Neural Network and Training
Model Definition
So, we got acquainted with the building blocks of CNNs in the previous lesson. How do we put them to good use, and how do you choose which layers to put where? There are some conventions to follow, though all of them can be tweaked or changed depending on your specific dataset, and the best way to find a good architecture for your own model is to experiment.
You'll also find conflicting advice, such as whether the filter size should be small or large in the beginning. Some argue that filter sizes should start out large, to prune away the unimportant data and reduce computational costs while the images are still on the larger end, making the model focus on the more salient features from the get-go. Others argue that smaller filter sizes should be used first, to capture as much local information as possible, with larger filters later on to capture more global features (following the idea of the visual cortex hierarchy). Some keep the filter sizes the same throughout the entire architecture!
So, what's the right approach when it comes to filter sizes? Generally, I've personally found that starting out with smaller filter sizes and keeping them the same size after pooling works well. Since pooling shrinks the feature maps, the same filter size covers a proportionally bigger part of the original image in the next convolutional block, capturing more global patterns. Still, the best way to find filter sizes is to experiment.
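To make the "proportionally bigger" point concrete, here is a minimal sketch in plain Python that tracks how many input pixels a single output value can see after each layer. The stack of 3x3 convolutions with 2x2 pooling, and the helper name `receptive_fields`, are assumptions chosen for illustration, not an architecture taken from this lesson. It also assumes stride-1 convolutions with no dilation.

```python
def receptive_fields(layers):
    """Return (name, receptive_field, jump) after each layer.

    layers: list of (name, kernel_size, stride) tuples.
    receptive_field: width in input pixels seen by one output value.
    jump: distance in input pixels between neighbouring output values.
    """
    rf, jump = 1, 1
    out = []
    for name, kernel, stride in layers:
        # Each extra kernel tap reaches `jump` input pixels further.
        rf += (kernel - 1) * jump
        # Striding spreads neighbouring outputs further apart in the input.
        jump *= stride
        out.append((name, rf, jump))
    return out


# Hypothetical stack: the filter size stays 3x3, pooling halves the maps.
stack = [
    ("conv3x3", 3, 1),
    ("pool2x2", 2, 2),
    ("conv3x3", 3, 1),
    ("pool2x2", 2, 2),
    ("conv3x3", 3, 1),
]

for name, rf, jump in receptive_fields(stack):
    print(f"{name}: sees {rf}x{rf} input pixels, output step = {jump} px")
```

Under these assumptions, the three identical 3x3 filters see 3x3, 8x8 and 18x18 regions of the original image respectively, so the later blocks respond to progressively more global patterns even though the filter size never changes.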