At the end of the semester, a total score, to which the corresponding final grade is assigned, will be calculated as a weighted average of all scores according to the following weights: The conv layers should be using small filters e. There are three major sources of memory to keep track of: Instead of rolling your own architecture for a problem, you should look at whatever architecture currently works best on ImageNet, download a pretrained model, and fine-tune it on your data. There are also several follow-up versions of GoogLeNet, most recently Inception-v4. It features special skip connections and a heavy use of batch normalization. Why use stride of 1 in CONV? You are encouraged to work together on the homework, but you should write up your own solutions.


Case studies

There are several architectures in the field of Convolutional Networks that have a name. The most common are:

Computational Considerations

The largest bottleneck to be aware of when constructing ConvNet architectures is the memory bottleneck. In some cases (especially early in the ConvNet architecture), the amount of memory can build up very quickly with the rules of thumb presented above.

In this arrangement, each neuron on the first CONV layer has a 3x3 view of the input volume. Most of these parameters are in the first fully connected layer, and it has since been found that these FC layers can be removed with no loss in performance, significantly reducing the number of necessary parameters. From the intermediate volume sizes: The architecture is also missing fully connected layers at the end of the network. In addition to the aforementioned benefit of keeping the spatial sizes constant after CONV, doing this actually improves performance. Hence, during the forward pass of a pooling layer it is common to keep track of the index of the max activation (sometimes also called the switches) so that gradient routing is efficient during backpropagation. You should rarely ever have to train a ConvNet from scratch or design one from scratch. Residual Network developed by Kaiming He et al.

Intuitively, stacking CONV layers with tiny filters, as opposed to having one CONV layer with big filters, allows us to express more powerful features of the input, and with fewer parameters.
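
As a rough illustration of the "fewer parameters" point, the sketch below counts weights for a stack of three 3x3 CONV layers versus a single 7x7 CONV layer that sees the same 7x7 region of the input. The channel count C = 64, the choice of three layers, stride 1, and the helper names are all assumptions made for this example; biases are ignored.

```python
def conv_weights(filter_size: int, in_channels: int, out_channels: int) -> int:
    """Weights in one CONV layer with square filters (biases ignored)."""
    return filter_size * filter_size * in_channels * out_channels


def stacked_receptive_field(filter_size: int, num_layers: int) -> int:
    """Side length of the input region seen by one neuron on top of
    num_layers stacked CONV layers, assuming stride 1 throughout."""
    return 1 + num_layers * (filter_size - 1)


C = 64  # hypothetical number of channels, kept the same in every layer

# Three stacked 3x3 CONV layers: 3 * (3 * 3 * C * C) = 27 * C^2 weights.
three_3x3 = 3 * conv_weights(3, C, C)

# One 7x7 CONV layer: 7 * 7 * C * C = 49 * C^2 weights.
one_7x7 = conv_weights(7, C, C)

print("receptive field of the 3x3 stack:", stacked_receptive_field(3, 3))
print("weights, three 3x3 layers:", three_3x3)
print("weights, one 7x7 layer:   ", one_7x7)
```

Under these assumptions both options cover a 7x7 region of the input, but the stack uses 27C^2 weights against 49C^2 for the single layer, and it also applies a non-linearity between the layers if one follows each CONV.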


CS231n Winter 2016

