Week 5

Article reviewed: Densely Connected Convolutional Networks (https://arxiv.org/pdf/1608.06993.pdf)

Summary: DenseNet connects each layer to every other layer in a feed-forward fashion, giving \(L(L+1)/2\) direct connections in an \(L\)-layer network. Pros: 1. alleviates the vanishing-gradient problem, 2. strengthens feature propagation, 3. encourages feature reuse, 4. substantially reduces the number of parameters, 5. achieves higher performance with less computation.

Keywords: DenseNet, CIFAR-10, CIFAR-100

Five C’s:
Category: Deep Learning / DenseNet
Context: Direct connections between any two layers with the same feature-map size. DenseNet shows consistent accuracy improvements as the number of parameters grows, with no signs of performance degradation or overfitting.
Correctness: ⭐️⭐️⭐️⭐️⭐️
Contributions: A building block for various computer vision tasks; compares ResNet and DenseNet.
Clarity: ⭐️⭐️⭐️

Outline:

1. Basic idea:

Each layer connects forward to all subsequent layers and passes on its own feature maps, so information and gradients flow directly throughout the network; features are combined by concatenation rather than by the explicit additive identity transformation used in ResNets.

[[Screen Shot 2022-02-12 at 10.54.31.png]] Key idea shared by these CNN variants: create short paths from early layers to later layers.

3. DenseNets:

ResNets:

add a skip connection around the non-linear transformation with an identity function, so the gradient can flow directly: \(x_l = H_l(x_{l-1}) + x_{l-1}\)

Traditional CNNs connect the \((l-1)\)th layer to the \(l\)th layer only: \(x_l = H_l(x_{l-1})\)

Dense Connectivity:

The \(l\)th layer receives the feature maps of all preceding layers as input: \(x_l = H_l([x_0, x_1, \dots, x_{l-1}])\), where \([\cdot]\) denotes concatenation. (In a regular CNN, by contrast, each layer only receives \(x_{l-1}\).)
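A minimal sketch of the difference (assuming PyTorch; the channel counts and helper names are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

x0 = torch.randn(1, 16, 32, 32)              # input feature maps (16 channels, illustrative)

# ResNet-style: combine by addition, x_1 = H_1(x_0) + x_0
h = nn.Conv2d(16, 16, kernel_size=3, padding=1)
x1_res = h(x0) + x0                          # still 16 channels

# DenseNet-style: combine by concatenation, x_l = H_l([x_0, ..., x_{l-1}])
k = 12                                       # growth rate
h1 = nn.Conv2d(16, k, kernel_size=3, padding=1)
x1 = h1(x0)                                  # 12 new feature maps
h2 = nn.Conv2d(16 + k, k, kernel_size=3, padding=1)
x2 = h2(torch.cat([x0, x1], dim=1))          # layer 2 sees all preceding feature maps
```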

Composite function

\(H_l(\cdot)\): a composite function of three consecutive operations: Batch Normalization, ReLU, and a 3x3 convolution.
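A sketch of this composite function, again assuming PyTorch (the helper name composite_fn is mine, not the paper's):

```python
import torch.nn as nn

def composite_fn(in_channels: int, growth_rate: int) -> nn.Sequential:
    # H_l as a composite: BN -> ReLU -> 3x3 convolution producing k new feature maps
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
    )
```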

Pooling Layer:

Key: transition layers between dense blocks down-sample and change the size of the feature maps; each consists of Batch Normalization, a 1x1 convolution, and a 2x2 average pooling layer.
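A transition-layer sketch under the same PyTorch assumption (transition_layer is a hypothetical helper name):

```python
import torch.nn as nn

def transition_layer(in_channels: int, out_channels: int) -> nn.Sequential:
    # BN -> 1x1 conv (possibly fewer channels, see Compression) -> 2x2 average pooling
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )
```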

Growth rate

The \(l\)th layer has \(k_0 + k(l-1)\) input feature maps, where \(k_0\) is the number of channels in the block's input and \(k\) is the growth rate; since every layer's output is kept and reused, there is no need to re-learn redundant features.
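For example (illustrative values, in the spirit of the paper's small growth rates): with \(k_0 = 16\) and \(k = 12\), the 10th layer in a block receives \(16 + 12 \times 9 = 124\) input feature maps, even though it only adds 12 new ones.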

Bottleneck Layers

A 1x1 convolution is introduced before each 3x3 convolution to improve computational efficiency.

Each layer becomes BN -> ReLU -> Conv(1x1) -> BN -> ReLU -> Conv(3x3) (DenseNet-B). Why: the 1x1 convolution reduces the many concatenated input feature maps to \(4k\) before the expensive 3x3 convolution.
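A bottleneck-layer sketch, again assuming PyTorch (bottleneck_layer is my name for it):

```python
import torch.nn as nn

def bottleneck_layer(in_channels: int, growth_rate: int) -> nn.Sequential:
    # DenseNet-B layer: the 1x1 conv first shrinks the concatenated input
    # to 4*k channels, then the 3x3 conv produces the k new feature maps.
    inter_channels = 4 * growth_rate
    return nn.Sequential(
        nn.BatchNorm2d(in_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(inter_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(inter_channels, growth_rate, kernel_size=3, padding=1, bias=False),
    )
```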

Compression

If a dense block produces \(m\) feature maps, the following transition layer outputs \(\lfloor \theta m \rfloor\) feature maps, where \(0 < \theta \le 1\) is the compression factor; \(\theta = 1\) leaves the number of feature maps unchanged, while \(\theta < 1\) compresses the model (DenseNet-C).
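For example (my numbers): with \(m = 96\) feature maps entering a transition layer and \(\theta = 0.5\) (the value used for DenseNet-C/BC in the paper), the transition layer outputs \(\lfloor 0.5 \times 96 \rfloor = 48\) feature maps.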

Implementation Details

DenseNet-BC structure: the variant that combines bottleneck layers (B) with compression \(\theta < 1\) (C).
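Tying the pieces together, a minimal dense-block sketch (assuming PyTorch and reusing the hypothetical bottleneck_layer helper from above); a transition layer would follow each block:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    # Layer i receives the concatenation of the block input and all i
    # previously produced feature maps, and contributes growth_rate new maps.
    def __init__(self, num_layers: int, in_channels: int, growth_rate: int):
        super().__init__()
        self.layers = nn.ModuleList(
            [bottleneck_layer(in_channels + i * growth_rate, growth_rate)
             for i in range(num_layers)]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))
        # Output has in_channels + num_layers * growth_rate channels.
        return torch.cat(features, dim=1)
```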

4. Experiments

CIFAR: 32x32-pixel images
SVHN (Street View House Numbers): 32x32-pixel images
ImageNet: 1.2M training images

Training

? Why is DenseNet better?

Discussion

DenseNet-BC is consistently the most parameter-efficient variant of DenseNet. ResNet vs DenseNet: DenseNet-BC reaches comparable accuracy with roughly 1/3 of the parameters; the loss function is less complicated, and the connectivity patterns are similar.

Feature Reuse
  1. Features extracted by very early layers are still used directly by much deeper layers in the same dense block.
  2. The weights of the transition layers remain relatively stable, spread over all layers of the preceding block.
  3. The transition layers output many redundant features (which motivates compression).
  4. The final classification layer uses weights from across the entire last dense block, but concentrates on the latest feature maps, which produce more high-level features.