There are many, many deep learning models out there doing various things. Depending on the exact task they solve, they may be constructed differently. Some will use convolution followed by pooling. Some will use several convolutional layers before any pooling layer. Some will use max-pooling. Some will use mean-pooling. Some will add dropout. Some will have a batch-norm layer here and there. Some will use sigmoid neurons, some will use half-rectifiers (ReLUs). Some will classify and therefore optimize cross-entropy. Others will minimize mean-squared error. Some will use unpooling layers. Some will use deconvolutional layers. Some will use stochastic gradient descent with momentum. Some will use Adam. Some will have ResNet layers, some will use Inception modules. The choices are plentiful (see e.g. here).
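To make that combinatorial space concrete, here is a minimal sketch that commits to one point in it: convolution followed by max-pooling, batch-norm, a half-rectifier, dropout, cross-entropy, and Adam. PyTorch is assumed purely for illustration (the post names no framework), and the input size, channel counts, and class count are arbitrary placeholders.

```python
import torch
import torch.nn as nn

# One arbitrary combination of the choices listed above.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.BatchNorm2d(16),                          # a batch-norm layer "here and there"
    nn.ReLU(),                                   # half-rectifier (sigmoid: nn.Sigmoid())
    nn.MaxPool2d(2),                             # max-pooling (mean-pooling: nn.AvgPool2d(2))
    nn.Dropout(0.5),                             # dropout
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # assumes 32x32 inputs and 10 classes
)

criterion = nn.CrossEntropyLoss()                # classification -> cross-entropy
optimizer = torch.optim.Adam(model.parameters())
# The alternative named above, SGD with momentum, would be:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
```

Swapping any single line for one of its alternatives yields a different model from the same family, which is exactly why the design space described here is so large.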
Reading any of these papers, one is faced with the set of choices the authors made, followed by an evaluation on the dataset of their choice. The discussion of choices typically leans heavily on the papers where the given techniques were first introduced, whereas the results section typically discusses the previous state of the art in detail. The shape of the architecture is often broken down into obvious and non-obvious decisions.