https://arxiv.org/pdf/1608.06993.pdf Article reviewed: Densely Connected Convolutional Networks (DenseNet)
Summary: DenseNet connects each layer to every other layer in a feed-forward fashion, giving \(L(L+1)/2\) direct connections. Pros: 1. alleviates the vanishing-gradient problem, 2. strengthens feature propagation, 3. encourages feature reuse, 4. substantially reduces the number of parameters, 5. achieves higher performance with less computation.
Keywords: DenseNet, CIFAR-10, CIFAR-100
Five C’s:
Category: Deep Learning / DenseNet
Context: Direct connections between any two layers with the same feature-map size; accuracy improves consistently as the parameter count grows, without performance degradation or overfitting.
Correctness: ⭐️⭐️⭐️⭐️⭐️
Contributions: Applicable to a wide range of computer vision tasks; compares ResNet and DenseNet.
Clarity: ⭐️⭐️⭐️
Each layer is connected forward to all subsequent layers, so information is carried forward and preserved explicitly; information and gradients flow directly throughout the network.
[[Screen Shot 2022-02-12 at 10.54.31.png]] Key idea shared by recent CNN architectures: create short paths from early layers to later layers.
ResNet: adds a skip connection that bypasses the non-linear transformation with an identity function, so the gradient can flow directly: \(x_l = H_l(x_{l-1}) + x_{l-1}\)
Traditional CNN: the lth layer's output feeds only the (l+1)th layer: \(x_l = H_l(x_{l-1})\)
DenseNet: the lth layer receives the feature maps of all preceding layers, \(x_l = H_l([x_0, x_1, \ldots, x_{l-1}])\), where \([\cdot]\) denotes concatenation. ? what about the regular CNN
\(H_l(\cdot)\): composite function of Batch Normalization, ReLU, and a 3×3 convolution (sketch below).
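To make the connectivity concrete, here is a minimal sketch, assuming PyTorch; the class names `DenseLayer` and `DenseBlock` are illustrative, not taken from the paper's reference code. It shows \(H_l\) as BN → ReLU → 3×3 conv and a dense block that feeds each layer the concatenation of all preceding feature maps.

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """H_l: BN -> ReLU -> 3x3 conv, producing k (growth_rate) new feature maps."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        return self.conv(self.relu(self.norm(x)))

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all preceding feature maps."""
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_channels + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # x_l = H_l([x_0, x_1, ..., x_{l-1}])
            new_features = layer(torch.cat(features, dim=1))
            features.append(new_features)
        return torch.cat(features, dim=1)
```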
Key: transition layers between dense blocks down-sample (change the size of) the feature maps: BN → 1×1 convolution → 2×2 average pooling.
The lth layer has \(k_0 + k(l-1)\) input feature maps, where \(k_0\) is the number of channels in the block's input and \(k\) is the growth rate; layers can stay narrow because existing feature maps are reused rather than re-learned (quick check below).
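A quick arithmetic check of the input-channel count; the values \(k_0 = 16\), \(k = 12\) are example numbers, not a specific configuration claim.

```python
# Input feature maps seen by layer l inside a dense block: k_0 + k*(l-1).
k0, k = 16, 12            # example initial channels and growth rate
for l in range(1, 7):     # layers 1..6 of one dense block
    print(f"layer {l}: {k0 + k * (l - 1)} input feature maps")
# -> 16, 28, 40, 52, 64, 76
```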
Bottleneck: a 1×1 convolution before each 3×3 convolution reduces the number of input feature maps and improves computational efficiency.
Bottleneck layer loop: BN → ReLU → Conv(1×1) → BN → ReLU → Conv(3×3). ? why: the 1×1 convolution compresses the concatenated input (to \(4k\) feature maps in the paper) before the expensive 3×3 convolution; see the sketch below.
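A sketch of the bottleneck version of \(H_l\) (DenseNet-B), again assuming PyTorch; `BottleneckLayer` is an illustrative name.

```python
import torch.nn as nn

class BottleneckLayer(nn.Module):
    """BN -> ReLU -> Conv(1x1) -> BN -> ReLU -> Conv(3x3)."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        inter_channels = 4 * growth_rate   # 1x1 conv output width, as in the paper
        self.net = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, inter_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(inter_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(inter_channels, growth_rate,
                      kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        return self.net(x)
```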
Compression: if a dense block produces \(m\) feature maps, the following transition layer outputs \(\lfloor \theta m \rfloor\) feature maps, with \(0 < \theta \le 1\); \(\theta = 1\) leaves the number unchanged, while DenseNet-C uses \(\theta = 0.5\) (sketch below).
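A sketch of the transition layer with compression, assuming PyTorch; `TransitionLayer` and the default \(\theta = 0.5\) follow the DenseNet-C setting described above.

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    """BN -> 1x1 conv (theta*m output maps) -> 2x2 average pooling."""
    def __init__(self, in_channels, theta=0.5):
        super().__init__()
        out_channels = int(theta * in_channels)   # floor(theta * m) feature maps
        self.net = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
            nn.AvgPool2d(kernel_size=2, stride=2),  # halves the spatial size
        )

    def forward(self, x):
        return self.net(x)
```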
DenseNet-BC structure: bottleneck layers (B) combined with compression \(\theta < 1\) at the transition layers (C).
Datasets: CIFAR (32×32 pixels), SVHN (Street View House Numbers, 32×32 pixels), ImageNet (~1.2M training images).
? why is DenseNet better
DenseNet-BC is consistently the most parameter-efficient variant of DenseNet; compared with ResNet it reaches similar accuracy with roughly 1/3 of the parameters. The loss function appears less complicated to optimize, and the connectivity pattern is similar in spirit to related approaches (e.g., stochastic-depth ResNets).