"Like the crack of the whip I snap attack
Front to back in this thing called rap
Dig it like a shovel rhyme devil
On a heavenly level
Bang the bass turn up the treble
Radical mind day and night all the time
Seven to fourteen wise divine
Maniac brainiac winning the game
I’m the lyrical Jesse James"
(Snap, The Power)
SNAP is a simple tool to design neural networks based on the ‘Tensorflow’/‘Keras’ backend [1]. It is a wrapper to easily configure deep neural networks for regression, classification and multi-label tasks, with some tweaks and tricks (skip shortcuts [2], embedding [3], feature selection [4] and anomaly detection [5]).
The first example of application is a regression task. We use the Peak function from the mlbench package to generate the benchmark data, as you can see in the sample below.
x.1 | x.2 | x.3 | x.4 | x.5 | x.6 | x.7 | x.8 | x.9 | x.10 | x.11 | x.12 | x.13 | x.14 | x.15 | x.16 | x.17 | x.18 | x.19 | x.20 | y |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.25 | 0.09 | -1.71 | -0.06 | 0.43 | 0.30 | 1.08 | 0.05 | -0.05 | -0.03 | -0.58 | 0.74 | -1.16 | 0.33 | 0.31 | 0.26 | -0.17 | 0.42 | 0.82 | -0.99 | 0.31 |
-0.35 | 0.38 | 0.12 | -0.52 | 0.37 | 0.66 | 0.30 | -0.48 | -0.30 | 0.85 | -0.32 | -0.89 | 0.27 | -0.07 | -0.09 | 0.25 | -1.29 | 0.77 | 0.02 | 0.67 | 1.24 |
0.00 | -0.07 | 0.00 | 0.04 | 0.08 | -0.02 | -0.08 | 0.16 | 0.06 | 0.01 | 0.02 | 0.02 | 0.08 | 0.04 | -0.05 | -0.04 | 0.01 | -0.07 | 0.20 | -0.13 | 23.54 |
0.13 | 0.21 | 0.03 | -0.05 | 0.30 | 0.23 | -0.30 | -0.18 | -0.40 | -0.12 | 0.44 | 0.08 | -0.67 | -0.88 | 0.29 | 0.16 | -0.54 | 0.12 | -0.16 | -0.32 | 7.23 |
0.09 | 0.02 | -0.55 | -0.28 | 0.16 | -0.29 | 0.93 | -0.43 | 0.24 | -0.10 | 0.69 | 0.40 | -0.89 | -0.04 | 0.46 | 0.41 | 0.16 | 0.30 | -0.76 | -0.73 | 2.44 |
0.16 | -0.20 | -0.14 | 0.06 | 0.08 | 0.12 | -0.01 | -0.13 | -0.01 | -0.07 | -0.10 | -0.01 | -0.04 | 0.05 | -0.06 | 0.33 | -0.03 | -0.10 | -0.04 | -0.24 | 21.21 |
0.63 | -0.33 | 0.48 | -0.40 | -0.02 | 0.03 | -1.45 | -0.30 | -0.26 | -0.37 | -0.19 | -0.05 | 0.36 | 0.18 | -0.05 | -0.87 | -0.48 | -0.47 | -0.21 | 0.80 | 1.69 |
-0.46 | -0.26 | 0.04 | 0.09 | 0.07 | -0.23 | 0.08 | 0.11 | 0.48 | -0.11 | -0.27 | -0.03 | -0.14 | 0.31 | 0.68 | 0.05 | -0.02 | -0.22 | -0.08 | 0.00 | 13.00 |
-0.01 | -0.17 | 0.10 | -0.25 | -0.04 | -0.06 | 0.26 | 0.08 | -0.18 | -0.12 | -0.19 | 0.20 | 0.31 | 0.11 | -0.02 | 0.04 | 0.20 | 0.09 | -0.01 | 0.01 | 19.81 |
0.01 | 0.06 | 0.01 | 0.05 | 0.05 | -0.06 | 0.02 | -0.09 | -0.03 | -0.08 | -0.03 | -0.07 | -0.01 | -0.01 | 0.01 | -0.01 | -0.03 | -0.03 | -0.06 | -0.04 | 24.46 |
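For reference, a dataset with this scheme could be generated roughly as follows (a sketch only: the seed is arbitrary and the 300 instances are an assumption based on the instance indices reported later in the output):

```r
library(mlbench)

# Sketch of how a Peak regression benchmark with 20 features could be built
set.seed(42)
peak <- mlbench.peak(n = 300, d = 20)
benchmark1 <- as.data.frame(peak$x)
colnames(benchmark1) <- paste0("x.", 1:20)
benchmark1$y <- peak$y
```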
With SNAP you can easily design various network configurations, such as the one below.
Example n. 1
The first step is the filtering of features and anomalies. The filtering variables allow you to select relevant features and instances, excluding noisy fields and misleading outliers. The imp_thresh variable sets the minimum feature importance required for inclusion (setting the value to 0.4 means that only the features above the 0.4 quantile of the empirical distribution of importance scores will be included in the working set; the default value is 0, including all features). The anom_thresh variable sets the maximum anomaly value allowed for inclusion, as measured by the empirical distribution of the dbscan lof score (setting the value to 0.8 means that only the instances below the 0.8 quantile will be included in the working set; the default value is 1, including all instances).
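Just to clarify the mechanism, here is a minimal sketch of this kind of filtering done by hand with CORElearn's RReliefFbestK and dbscan's lof (this is not snap's internal code; the thresholds and the minPts value are illustrative assumptions):

```r
library(CORElearn)
library(dbscan)

# Feature filtering: keep features whose RReliefF importance exceeds the
# imp_thresh quantile of the empirical importance distribution.
imp <- attrEval(y ~ ., data = benchmark1, estimator = "RReliefFbestK")
keep_feats <- names(imp)[imp > quantile(imp, probs = 0.4)]

# Instance filtering: keep instances whose local outlier factor falls below
# the anom_thresh quantile of the empirical lof distribution.
lof_scores <- lof(scale(benchmark1[, keep_feats]), minPts = 10)
keep_inst <- which(lof_scores < quantile(lof_scores, probs = 0.8))

working_set <- benchmark1[keep_inst, c(keep_feats, "y")]
```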
The second step is a neural network configured with the parameters below (this is not necessarily the best configuration, it's only an example). With SNAP you can easily design different kinds of neural networks with a single line of code.
library(snap)

example1 <- snap(data=benchmark1, target="y", task = "regr", layers = 2, nodes = c(64, 32), activations = c("tanh", "sigmoid"), regularization_L1 = c(0, 0), regularization_L2 = c(0, 0), dropout = c(0.7, 0.4), optimizer = "rmsprop", imp_thresh = 0.4, anom_thresh = 0.8, reps = 5, folds = 2, normalization = F)

snap: 9.25 sec elapsed
The cross-validation is a two-fold scheme repeated five times, as specified by the reps and folds variables.
example1$trials
$trial_train_metrics
    folds  reps    rmse    mae   mdae   mape   rrse    rae    prsn
1  fold_1 rep_1 11.1223 8.6081 5.1666 1.7075 1.2613 1.0785 -0.0549
2 fold_2 rep_1 7.7973 6.2620 5.2499 3.2403 1.0469 0.9863 -0.0492
3 fold_1 rep_2 8.9439 7.6183 7.5325 3.8540 1.0547 1.0239 -0.0770
4 fold_2 rep_2 9.1078 7.5260 7.4695 4.4157 1.1240 1.0513 -0.0274
5 fold_1 rep_3 9.6969 8.3560 8.5024 4.9401 1.1981 1.1496 -0.1269
6 fold_2 rep_3 9.5654 8.0902 8.8777 5.2366 1.1357 1.1104 0.0097
7 fold_1 rep_4 9.8447 8.2428 7.9454 3.8420 1.1497 1.0723 -0.0051
8 fold_2 rep_4 9.4805 8.2230 8.9688 6.3085 1.2071 1.2210 -0.0756
9 fold_1 rep_5 9.5626 7.9044 8.0634 4.8664 1.1744 1.1278 -0.0233
10 fold_2 rep_5 9.8658 8.5781 8.7204 5.3642 1.1745 1.1378 -0.0818
$trial_val_metrics
    folds  reps   rmse    mae   mdae   mape   rrse    rae    prsn
1  fold_1 rep_1 8.6489 6.4152 3.4774 1.9627 1.1613 1.0104 -0.0483
2 fold_2 rep_1 9.7713 8.1149 6.2668 2.8969 1.1081 1.0167 -0.0150
3 fold_1 rep_2 8.7323 7.3657 7.6805 4.3962 1.0777 1.0289 -0.0348
4 fold_2 rep_2 9.1766 7.8225 8.1604 4.0927 1.0821 1.0513 -0.0555
5 fold_1 rep_3 9.5274 8.0984 8.7649 5.3236 1.1312 1.1115 0.0102
6 fold_2 rep_3 9.7352 8.3737 8.3359 4.8916 1.2028 1.1521 -0.1338
7 fold_1 rep_4 9.4571 8.2019 8.9644 6.2933 1.2041 1.2179 -0.0736
8 fold_2 rep_4 9.8482 8.2473 7.9405 3.8550 1.1501 1.0729 -0.0050
9 fold_1 rep_5 9.8222 8.5498 8.7522 5.3491 1.1693 1.1340 -0.0765
10 fold_2 rep_5 9.5909 7.9484 8.2310 4.9010 1.1779 1.1341 -0.0232
The selected features and instances are reported within the list of results:
example1$selected_feat
 [1] "x.1"  "x.2"  "x.4"  "x.5"  "x.6"  "x.7"  "x.9"  "x.11" "x.13" "x.14"
[11] "x.18" "x.19" "x.20"
example1$selected_inst
  [1]   1   2   3   4   5   6   7  10  11  12  13  14  15  16  17  19  20  21
[19] 22 23 25 26 27 28 29 31 32 34 35 36 37 38 39 40 41 43
[37] 44 45 46 47 48 49 50 51 52 53 54 55 56 57 59 60 62 63
[55] 64 65 66 70 73 75 76 77 78 79 80 83 84 85 86 87 89 91
[73] 93 94 95 96 97 98 99 100 101 102 105 106 107 109 110 111 112 113
[91] 115 116 117 118 119 120 121 122 123 124 125 126 128 129 130 131 132 133
[109] 134 135 136 137 139 140 142 143 144 145 146 148 149 152 153 154 156 158
[127] 160 161 162 163 164 165 166 168 169 170 171 172 173 174 175 176 177 178
[145] 181 182 183 184 185 186 189 190 192 195 196 197 198 199 200 202 203 204
[163] 205 206 207 209 210 211 212 213 215 216 217 218 223 224 225 226 227 228
[181] 229 230 231 232 233 234 236 237 239 240 241 242 243 244 246 247 249 250
[199] 251 252 253 254 256 258 260 261 263 264 265 266 267 268 269 270 271 272
[217] 273 277 278 279 280 281 283 284 285 286 287 288 289 290 291 292 293 294
[235] 295 296 297 298 299 300
SNAP creates a prediction function that can be directly applied to data with the same scheme in order to predict new values.
example1$pred_fun(benchmark1[1:10,-21])
   predicted_y
1    14.108350
2 14.184744
3 12.046779
4 7.926267
5 11.787643
6 10.038385
7 1.609311
8 11.890808
9 13.951631
10 12.714508
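As a quick sanity check, you could compare those predictions against the observed target (a hypothetical snippet, assuming the predicted_y column shown above):

```r
# Observed vs. predicted values for the first ten rows of benchmark1
preds <- example1$pred_fun(benchmark1[1:10, -21])
data.frame(observed = benchmark1$y[1:10], predicted = preds$predicted_y)
```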
SNAP can also be used to easily manage classification tasks. To demonstrate, we use the Waveform function from the mlbench package, as you can see below.
x.1 | x.2 | x.3 | x.4 | x.5 | x.6 | x.7 | x.8 | x.9 | x.10 | x.11 | x.12 | x.13 | x.14 | x.15 | x.16 | x.17 | x.18 | x.19 | x.20 | x.21 | classes |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
-2.04 | 0.07 | -0.91 | 0.16 | 0.03 | 1.27 | 0.27 | 1.74 | -0.83 | 2.46 | 2.03 | 2.41 | 3.94 | 6.14 | 5.47 | 4.33 | 3.00 | 2.18 | 2.48 | 1.53 | 0.20 | 3 |
1.49 | 2.35 | 2.15 | 1.92 | 3.26 | 4.09 | 5.36 | 5.85 | 3.04 | 3.71 | 2.44 | 3.64 | 0.77 | 0.48 | -1.54 | 0.37 | 0.32 | -0.58 | -0.64 | 1.99 | 1.33 | 2 |
-1.06 | 1.76 | 3.28 | 2.35 | 4.77 | 4.56 | 5.05 | 3.52 | 3.58 | 3.45 | 3.57 | 1.97 | 0.49 | 1.25 | -0.32 | 0.28 | 0.10 | -0.02 | -0.86 | 1.55 | -0.37 | 2 |
0.94 | -1.35 | -0.11 | 1.26 | 1.02 | 1.75 | 3.57 | 3.74 | 3.87 | 4.50 | 6.60 | 2.95 | 4.85 | 4.46 | 2.06 | 4.02 | -0.56 | 1.06 | 1.96 | -0.84 | -1.71 | 3 |
0.53 | 0.14 | 0.21 | -0.68 | 0.95 | -0.60 | 1.18 | 1.63 | 3.15 | 3.46 | 6.44 | 2.65 | 3.80 | 3.91 | 3.90 | 3.30 | 2.73 | 0.70 | 1.44 | 1.15 | 0.06 | 3 |
-0.90 | -0.69 | -0.11 | 3.41 | 2.47 | 6.51 | 7.71 | 4.00 | 1.98 | 4.83 | 3.08 | 2.48 | 0.55 | 0.53 | 0.58 | 0.12 | 2.06 | 0.87 | -0.35 | -0.15 | 0.79 | 2 |
-0.25 | -1.64 | 0.72 | -0.21 | 2.13 | 0.39 | 5.00 | 2.90 | 3.14 | 3.77 | 4.79 | 3.64 | 2.27 | 2.96 | -0.10 | 0.68 | -0.10 | 0.23 | -0.20 | 0.27 | -0.44 | 2 |
1.18 | 0.32 | -0.71 | 0.15 | 1.01 | 2.34 | -0.29 | -0.56 | 0.48 | 2.84 | 4.35 | 2.72 | 4.89 | 6.25 | 5.93 | 4.38 | 3.00 | 1.64 | 2.00 | -0.20 | -0.01 | 3 |
1.06 | -0.57 | 0.84 | 0.24 | -2.06 | -0.73 | 1.99 | 2.28 | 1.20 | 2.64 | 3.14 | 3.02 | 3.92 | 3.46 | 4.03 | 5.70 | 3.00 | 0.54 | 0.38 | 1.19 | -1.90 | 3 |
0.00 | 2.00 | 3.79 | 4.25 | 3.22 | 4.42 | 3.94 | 5.68 | 5.25 | 3.56 | 0.73 | 1.44 | 0.75 | 0.89 | 1.40 | 0.46 | -0.16 | 1.41 | 0.03 | 0.54 | -0.23 | 2 |
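As above, a dataset with this scheme could be generated along these lines (a sketch; the number of instances and the seed are assumptions):

```r
library(mlbench)

# Sketch of how a Waveform classification benchmark could be built
set.seed(42)
wave <- mlbench.waveform(n = 1000)
benchmark2 <- as.data.frame(wave$x)
colnames(benchmark2) <- paste0("x.", 1:21)
benchmark2$classes <- wave$classes
```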
Besides setting task to "classif", this time we also try to use a residual network. To mitigate the vanishing gradient problem when you have a deep neural network with many layers, you can set skip_shortcut: setting the variable to TRUE automatically adds a summation link between the input layer and the second-to-last layer before the output.
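To make the idea concrete, here is a minimal Keras sketch of that kind of summation link (this is not snap's internal code; the layer sizes simply mirror the example below):

```r
library(keras)

# Minimal sketch of a summation (skip) link: the input is added to the output
# of the last hidden block before the final classification layer.
inputs <- layer_input(shape = 8)
hidden <- inputs %>%
  layer_dense(units = 1024, activation = "relu") %>%
  layer_dense(units = 8)                    # project back to the input width
outputs <- layer_add(list(hidden, inputs)) %>%
  layer_dense(units = 3, activation = "softmax")
model <- keras_model(inputs, outputs)
```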
Here we propose a brief depiction:
Example n. 2
As for the previous case, this is just an example of hyper-parameter configuration (and absolutely not the best possible configuration). Translated into code, you can see it below.
example2 <- snap(data=benchmark2, target="classes", task = "classif", layers = 4, activations = c("mish", "parametric_relu", "bent", "softplus"), nodes = c(1024, 96, 1024, 512), regularization_L1 = c(30, 50, 10, 40), regularization_L2 = c(80, 30, 60, 80), dropout = c(0.5, 0.2, 0.2, 0.2), imp_thresh = 0.66, anom_thresh = 0.95, optimizer = "adam", skip_shortcut = T)

snap: 11.25 sec elapsed
You can visualize the configuration in Keras's style. As you can see, this time the result of each layer is normalized (we haven't changed the default value of normalization).
example2$configuration
  layers                            activations regularization_L1
1      4 mish, parametric_relu, bent, softplus     30, 50, 10, 40
  regularization_L2               nodes            dropout
1    80, 30, 60, 80 1024, 96, 1024, 512 0.5, 0.2, 0.2, 0.2
example2$model
Model: "functional_3"
________________________________________________________________________________
Layer (type)                   Output Shape       Param #    Connected to
================================================================================
input_2 (InputLayer)           [(None, 8)]        0
dense_3 (Dense)                (None, 1024)       9216       input_2[0][0]
activation_2 (Activation)      (None, 1024)       0          dense_3[0][0]
dropout_2 (Dropout)            (None, 1024)       0          activation_2[0][0]
batch_normalization (Batc      (None, 1024)       4096       dropout_2[0][0]
dense_4 (Dense)                (None, 96)         98400      batch_normalization[0][0]
p_re_lu (PReLU)                (None, 96)         96         dense_4[0][0]
dropout_3 (Dropout)            (None, 96)         0          p_re_lu[0][0]
batch_normalization_1 (Ba      (None, 96)         384        dropout_3[0][0]
dense_5 (Dense)                (None, 1024)       99328      batch_normalization_1[0][0]
activation_3 (Activation)      (None, 1024)       0          dense_5[0][0]
dropout_4 (Dropout)            (None, 1024)       0          activation_3[0][0]
batch_normalization_2 (Ba      (None, 1024)       4096       dropout_4[0][0]
dense_6 (Dense)                (None, 512)        524800     batch_normalization_2[0][0]
activation_4 (Activation)      (None, 512)        0          dense_6[0][0]
dropout_5 (Dropout)            (None, 512)        0          activation_4[0][0]
batch_normalization_3 (Ba      (None, 512)        2048       dropout_5[0][0]
dense_7 (Dense)                (None, 8)          4104       batch_normalization_3[0][0]
add (Add)                      (None, 8)          0          dense_7[0][0]
                                                             input_2[0][0]
dense_8 (Dense)                (None, 3)          27         add[0][0]
================================================================================
Total params: 746,595
Trainable params: 741,283
Non-trainable params: 5,312
________________________________________________________________________________
For classification tasks, a different set of metrics is used for training and validation (and you can sharpen your focus on a specific class by indicating the class string in positive):
example2$metrics
         bac    prc    sen    csi    fsc     kpp     kdl
train 0.5275 0.3699 0.3697 0.2266 0.3663  0.0557  0.0207
valid 0.5264 0.3586 0.3676 0.2235 0.3571  0.0539 -0.0411
test  0.4999 0.3463 0.3336 0.2044 0.3374 -0.0004  0.0681
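If you want the metrics focused on a single class, a call like the following (a hypothetical variation of the example above, with the other parameters left at their defaults) would treat class "2" as the positive one:

```r
# Hypothetical: same classification task, metrics focused on class "2"
example2_pos <- snap(data = benchmark2, target = "classes", task = "classif",
                     skip_shortcut = TRUE, positive = "2")
```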
When you have multiple class features, each one with its own set of labels, you have a multilabel task. The example3 data comes from the yeast dataset in the mldr.datasets package: 2417 rows x 117 columns (too much to effectively show a snapshot table here).
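A data frame with that scheme could be assembled roughly like this (a sketch; it assumes the yeast object is available after loading the package, exposes its data frame as $dataset, and that the 14 label columns can be renamed Class1 ... Class14):

```r
library(mldr.datasets)

# Sketch: 103 numeric attributes plus 14 binary labels from the yeast dataset
benchmark3 <- yeast$dataset[, 1:117]
colnames(benchmark3)[104:117] <- paste0("Class", 1:14)
dim(benchmark3)   # expected: 2417 x 117
```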
This time we design a network with a single layer of 1024 nodes and mish activation (no regularization, dropout or normalization). The picture here:
Example n. 3
In the multilabel setting you cannot filter for feature importance (any imp_thresh value above 0 is reset to zero with a message), but you can still sift the instances with anom_thresh.
example3 <- snap(data=benchmark3, target=paste0("Class", 1:14), task = "multilabel", layers = 1, activations = "mish", nodes = 1024, anom_thresh = 0.9, normalization = F)

snap: 51.13 sec elapsed
As in the previous examples, the result list also includes the standard history plot from Keras. Here you can see the plot for example3 (with early stopping using span and min_delta to manage the patience parameter).
example3$plot
This time, when you use the prediction function, you get a whole set of predicted label columns:
example3$pred_fun(benchmark3[1:10,-c(104:117)])
   predicted_Class1 predicted_Class2 predicted_Class3 predicted_Class4
1                 0                0                0                0
2 0 0 1 1
3 0 1 0 0
4 0 0 1 1
5 0 1 1 0
6 0 0 0 0
7 1 0 0 0
8 0 0 1 1
9 0 0 0 0
10 0 0 1 1
   predicted_Class5 predicted_Class6 predicted_Class7 predicted_Class8
1                 0                0                0                0
2 0 0 0 0
3 0 0 0 0
4 0 0 0 0
5 0 0 0 0
6 1 0 0 0
7 0 0 0 0
8 0 0 0 0
9 0 0 0 0
10 0 0 0 0
   predicted_Class9 predicted_Class10 predicted_Class11 predicted_Class12
1                 0                 0                 0                 1
2 0 0 0 0
3 0 0 0 1
4 0 0 0 1
5 0 0 0 1
6 0 0 0 1
7 0 0 0 1
8 0 0 0 1
9 0 0 0 1
10 0 0 0 1
   predicted_Class13 predicted_Class14
1                  1                 0
2 0 0
3 1 0
4 1 0
5 1 0
6 1 0
7 1 0
8 1 0
9 1 0
10 1 0
1. All you need to know about the current implementation of Tensorflow/Keras in R: https://keras.rstudio.com
2. If you need a concise overview of residual networks, you can take a look here: https://en.wikipedia.org/wiki/Residual_neural_network
3. In a future version of this article, I will add some examples on how to use global and local embeddings.
4. For feature selection we use CORElearn's RReliefFbestK (one of the two different versions presented in the package for numeric and factor variables). If you want a better understanding of the specific Relief metric, you can take a look here: https://www.rdocumentation.org/packages/CORElearn/versions/1.56.0/topics/attrEval
5. For anomaly detection, we use the standard lof function from the dbscan package. Here's a link to the documentation: https://rdrr.io/cran/dbscan/man/lof.html