Article Reviewed: Very Deep Convolutional Networks for Large-Scale Image Recognition https://arxiv.org/pdf/1409.1556.pdf
A thorough evaluation of ConvNets built from 3x3 filters; pushing the depth to 16-19 weight layers yields a significant improvement over prior-art configurations and the best-performing ConvNet models of its time.
Questions: 1. What is the smallest side (S) of an isotropically rescaled training image?
📍Keywords: ImageNet, VGG, Depth, Localization, ConvNet
Context: earlier improvements came from smaller receptive window sizes and smaller strides in the first conv layers, and from training and testing the network densely over the whole image and over multiple scales. This paper focuses on another axis of architecture design, depth: steadily increasing the depth by adding more conv layers.
Correctness: ****
Contribution: the best single-network classification accuracy at the time, achieved with very small (3x3) convolution filters in deep ConvNets.
- Depth is beneficial for classification accuracy.
- An example of state-of-the-art performance.
- The learned representations generalize well to a wide range of tasks and datasets.
Clarity: ****
Architecture: first the generic layout, then the specific configurations used in the evaluation.
The input to the ConvNets is a fixed-size 224x224 RGB image.
Pre-processing: subtract the mean RGB value, computed on the training set, from each pixel.
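A minimal pre-processing sketch; the specific mean values below are the commonly circulated ones for VGG and are an assumption here (the paper only says the training-set mean is used):

```python
import numpy as np

# Subtract the per-channel mean RGB value from every pixel.
# MEAN_RGB values are an assumption (widely cited for VGG, not from the paper text).
MEAN_RGB = np.array([123.68, 116.779, 103.939], dtype=np.float32)

def preprocess(img):  # img: HxWx3 uint8 array
    return img.astype(np.float32) - MEAN_RGB
```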
Conv layers: 3x3 receptive fields (the smallest size able to capture the notions of center, left/right, up/down).
1x1 conv layers: a linear transformation of the input channels (followed by a non-linearity).
Spatial pooling is carried out by 5 max-pooling layers (2x2 window, stride 2).
Three fully-connected layers (4096, 4096, 1000) -> soft-max layer.
All hidden layers use the ReLU non-linearity; Local Response Normalization (LRN) is not needed (it does not improve performance but adds memory and compute cost).
The channel width reaches 512. Why a stack of three 3x3 conv layers instead of a single 7x7 layer? Both have a 7x7 effective receptive field, but the stack has three non-linearities instead of one and fewer parameters: \(3(3^2C^2) = 27C^2\) weights vs \(7^2C^2 = 49C^2\) (assuming C channels in and out).
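A quick sanity check of the parameter counts, sketched in PyTorch (my own illustration, not the paper's code):

```python
import torch.nn as nn

C = 512  # channels in and out

# A stack of three 3x3 conv layers (ReLUs in between) ...
stack = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(C, C, kernel_size=3, padding=1),
)
# ... versus a single 7x7 conv layer with the same receptive field.
single = nn.Conv2d(C, C, kernel_size=7, padding=3)

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(n_params(stack), n_params(single))  # ~27*C^2 vs ~49*C^2 (plus biases)
```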
Configurations A-E: 11/13/16/19 weight layers; stacks of 3x3 convs with widths 64/128/256/512 separated by max-pooling, followed by FC-4096, FC-4096, FC-1000 and soft-max.
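A minimal PyTorch sketch of configuration A (the 11-layer net), assuming the layer ordering above; `cfg_a` and `make_features` are hypothetical names:

```python
import torch.nn as nn

# Configuration A (11 weight layers): numbers are 3x3 conv widths,
# 'M' marks a 2x2/stride-2 max-pool.
cfg_a = [64, 'M', 128, 'M', 256, 256, 'M', 512, 512, 'M', 512, 512, 'M']

def make_features(cfg, in_ch=3):
    layers = []
    for v in cfg:
        if v == 'M':
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            in_ch = v
    return nn.Sequential(*layers)

# The conv stack is followed by FC-4096, FC-4096, FC-1000 and soft-max.
```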
Training: multi-scale training images; mini-batch gradient descent (batch size 256) with momentum 0.9. Despite the depth, the nets need fewer epochs to converge:
Single-scale training: S = 256 or 384.
Multi-scale training: S randomly sampled from [Smin, Smax] = [256, 512] (scale jittering).
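A sketch of the scale-jittering pipeline, assuming torchvision transforms; `jittered_crop` is a hypothetical helper:

```python
import random
from torchvision import transforms

# Draw S uniformly from [Smin, Smax], rescale isotropically so the
# smallest image side equals S, then take a random 224x224 crop.
def jittered_crop(img, smin=256, smax=512):
    s = random.randint(smin, smax)
    img = transforms.Resize(s)(img)              # smallest side -> S
    img = transforms.RandomCrop(224)(img)
    img = transforms.RandomHorizontalFlip()(img)
    return transforms.ToTensor()(img)
```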
Dataset: ILSVRC-2012, 1000 classes, split into training/validation/testing sets; performance is measured by top-1 and top-5 error.
Single-scale evaluation: test scale Q = S for fixed-S training, and Q = 0.5(Smin + Smax) = 384 for jittered training.
Multi-scale evaluation: run the model over several rescaled test images, Q = {S-32, S, S+32} for fixed S (or {Smin, 0.5(Smin+Smax), Smax} for jittered training), and average the resulting class posteriors.
Multi-crop evaluation: dense vs. multi-crop vs. multi-crop & dense combined; the combination works best, as the two are complementary (different treatment of convolution boundary conditions).
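For dense evaluation the paper converts the FC layers to conv layers so the net becomes fully convolutional and can run on whole, uncropped images; a shape-level sketch (layer names assumed):

```python
import torch.nn as nn

# The trained FC-4096 layer sees a 512x7x7 feature map (for 224x224 input),
# so it can be rewritten as a 7x7 convolution reusing the same weights:
fc1 = nn.Linear(512 * 7 * 7, 4096)
conv1 = nn.Conv2d(512, 4096, kernel_size=7)
conv1.weight.data = fc1.weight.data.view(4096, 512, 7, 7)
conv1.bias.data = fc1.bias.data
# The remaining FC-4096 and FC-1000 layers become 1x1 convolutions; the
# resulting spatial class-score map is averaged to one score per class.
```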
ConvNet fusion: combine the outputs of several models by averaging their soft-max class posteriors; this gave the best result at the time.
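A minimal sketch of this fusion (the `models` list and `fused_posteriors` name are hypothetical):

```python
import torch

# Average the soft-max class posteriors of several trained nets.
def fused_posteriors(models, images):
    probs = [torch.softmax(m(images), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)  # shape: (batch, num_classes)
```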
Comparison with the state of the art: the deep ConvNets outperform all previous models, even submissions that combine multiple networks.