To work with vector data of a manageable size, keep only the top 10,000 most frequently occurring words in the training data. Rare words will be discarded.
The variables train_data and test_data are lists of reviews; each review is a list of word indices (encoding a sequence of words). train_labels and test_labels are lists of 0s and 1s, where 0 stands for a negative review and 1 for a positive one.
Note: The multi-assignment (destructuring) operator %<-%, re-exported from the zeallot package, is automatically available whenever the R Keras package is loaded.
library(keras)
imdb <- dataset_imdb(num_words = 10000)
c(c(train_data, train_labels), c(test_data, test_labels)) %<-% imdb
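The %<-% destructuring is equivalent to plain list indexing. A minimal sketch, assuming the nested train/test, x/y list layout that dataset_imdb() returns:
# Equivalent assignments without the multi-assignment operator
train_data   <- imdb$train$x
train_labels <- imdb$train$y
test_data    <- imdb$test$x
test_labels  <- imdb$test$y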
Here’s how to decode a review back to English words.
# word_index is a named list mapping words to an integer index
word_index <- dataset_imdb_word_index()
# Reverses it, mapping integer indices to words
reverse_word_index <- names(word_index)
names(reverse_word_index) <- word_index
# Decodes the 1st review. Note that the indices are offset by 3 because 0, 1, and 2 are reserved indices for "padding," "start of sequence," and "unknown."
decoded_review <- sapply(train_data[[1]], function(index) {
  word <- if (index >= 3) reverse_word_index[[as.character(index - 3)]]
  if (!is.null(word)) word else "?"
})
paste(decoded_review, collapse = " ")
[1] "? this film was just brilliant casting location scenery story direction everyone's really suited the part they played and you could just imagine being there robert ? is an amazing actor and now the same being director ? father came from the same scottish island as myself so i loved the fact there was a real connection with this film the witty remarks throughout the film were great it was just brilliant so much that i bought the film as soon as it was released for ? and would recommend it to everyone to watch and the fly fishing was amazing really cried at the end it was so sad and you know what they say if you cry at a film it must have been good and this definitely was also ? to the two little boy's that played the ? of norman and paul they were just brilliant children are often left out of the ? list i think because the stars that play them all grown up are such a big profile for the whole film but these children are amazing and should be praised for what they have done don't you think the whole story was so lovely because it was true and was someone's life after all that was shared with us all"
Since you can’t feed lists of integers into a neural network, convert the training data into the tensor format the network needs: a matrix whose rows are 1D vectors.
Each row of the matrix has 10,000 columns, one for each possible word in a review. The words are one-hot encoded (also known as categorical encoding): each entry is a 1 if that word appears in the review and a 0 otherwise. Doing this for each of the 25,000 reviews creates a large, mostly-zero matrix of 250,000,000 1s and 0s (about 2 GB of data). The test data, which is the same size, is encoded the same way.
vectorize_sequences <- function(sequences, dimension = 10000) {
  # Initialize a matrix of all zeroes
  results <- matrix(0, nrow = length(sequences), ncol = dimension)
  # For review i, set the columns named by its word indices to 1
  for (i in seq_along(sequences))
    results[i, sequences[[i]]] <- 1
  results
}
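At larger vocabulary sizes the dense matrix becomes wasteful; a sparse representation is one alternative. A sketch using the Matrix package (an assumption: Matrix is available, and you would densify with as.matrix() before calling fit()):
library(Matrix)
vectorize_sparse <- function(sequences, dimension = 10000) {
  # One (row, column) pair per word occurrence
  i <- rep(seq_along(sequences), lengths(sequences))
  j <- unlist(sequences)
  # Drop repeated words within a review so entries stay 0/1
  keep <- !duplicated(cbind(i, j))
  sparseMatrix(i = i[keep], j = j[keep], x = 1,
               dims = c(length(sequences), dimension))
}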
x_train <- vectorize_sequences(train_data)
x_test <- vectorize_sequences(test_data)
str(x_train[1,])
num [1:10000] 1 1 0 1 1 1 1 1 1 0 ...
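A quick sanity check (illustrative): every word index in the first review should map to a 1 in the first row, and the dense matrix should occupy roughly 2 GB:
all(x_train[1, train_data[[1]]] == 1)       # TRUE
format(object.size(x_train), units = "GB")  # ~1.9 GB of doubles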
Also convert the labels from integer to numeric.
y_train <- as.numeric(train_labels)
y_test <- as.numeric(test_labels)
Now the data is ready to be fed into a neural network.
A type of network that performs well on this type of vector data is a simple stack of fully connected (dense) layers with relu1 activations.
There are two key architecture decisions to be made about such a stack of dense layers: how many layers to use, and how many hidden units to use in each layer.
Having more hidden units (a higher-dimensional representation space) allows your network to learn more-complex representations, but it makes the network more computationally expensive and may lead to learning unwanted patterns (patterns that will improve performance on the training data but not on the test data).
The model chosen here has two intermediate dense layers with 16 hidden units each and a third (sigmoid)2 layer that outputs the scalar prediction regarding the sentiment of the current review.
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
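As a quick check of the architecture, summary(model) prints the layer shapes and parameter counts; the first layer alone accounts for 10,000 × 16 weights + 16 biases = 160,016 parameters (layer names below are illustrative):
summary(model)
# dense   (None, 16)  160016   = 10000 * 16 + 16
# dense_1 (None, 16)     272   = 16 * 16 + 16
# dense_2 (None, 1)       17   = 16 * 1 + 1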
Because this is a binary classification problem and the output of the network is a probability, it’s best to use the binary_crossentropy loss.3
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)
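To make the loss concrete, binary crossentropy for a single prediction p against a true label y is -(y * log(p) + (1 - y) * log(1 - p)); a hand-rolled sketch:
bce <- function(y, p) -(y * log(p) + (1 - y) * log(1 - p))
bce(y = 1, p = 0.9)  # ~0.105: confident and correct -> small loss
bce(y = 1, p = 0.1)  # ~2.303: confident and wrong   -> large loss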
Optimizer parameters can be configured by passing an optimizer instance as the optimizer argument.
model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.001),
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)
Custom loss or metric functions can also be passed via the respective arguments.
model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.001),
  loss = loss_binary_crossentropy,
  metrics = metric_binary_accuracy
)
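Fully custom metrics can be built from backend ops with custom_metric(). A minimal sketch (the metric itself is a toy: it just reports the mean predicted probability):
mean_prob <- custom_metric("mean_prob", function(y_true, y_pred) {
  k_mean(y_pred)
})
model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.001),
  loss = loss_binary_crossentropy,
  metrics = list(metric_binary_accuracy, mean_prob)
)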
Create a validation set by setting apart 10,000 samples from the original training data.
val_indices <- 1:10000
x_val <- x_train[val_indices,]
partial_x_train <- x_train[-val_indices,]
y_val <- y_train[val_indices]
partial_y_train <- y_train[-val_indices]
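Taking the first 10,000 rows works here because the loaded IMDB data is already shuffled; for data with any ordering, sample the indices instead. A sketch:
set.seed(123)  # illustrative seed; results vary with it
val_indices <- sample(nrow(x_train), 10000)
x_val <- x_train[val_indices, ]
partial_x_train <- x_train[-val_indices, ]
y_val <- y_train[val_indices]
partial_y_train <- y_train[-val_indices]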
Train the model for 20 epochs (20 iterations over all samples in the x_train and y_train tensors), in mini-batches of 512 samples. At the same time, monitor loss and accuracy on the 10,000 samples that were set apart in the validation step.
history <- model %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val)
)
Epoch 1/20
30/30 [==============================] - 1s 16ms/step - loss: 0.5095 - accuracy: 0.7856 - val_loss: 0.3816 - val_accuracy: 0.8704
Epoch 2/20
30/30 [==============================] - 0s 9ms/step - loss: 0.3030 - accuracy: 0.9013 - val_loss: 0.3207 - val_accuracy: 0.8752
Epoch 3/20
30/30 [==============================] - 0s 8ms/step - loss: 0.2206 - accuracy: 0.9269 - val_loss: 0.2839 - val_accuracy: 0.8879
Epoch 4/20
30/30 [==============================] - 0s 8ms/step - loss: 0.1735 - accuracy: 0.9431 - val_loss: 0.2745 - val_accuracy: 0.8916
Epoch 5/20
30/30 [==============================] - 0s 7ms/step - loss: 0.1423 - accuracy: 0.9547 - val_loss: 0.2871 - val_accuracy: 0.8875
Epoch 6/20
30/30 [==============================] - 0s 7ms/step - loss: 0.1160 - accuracy: 0.9652 - val_loss: 0.2940 - val_accuracy: 0.8863
Epoch 7/20
30/30 [==============================] - 0s 9ms/step - loss: 0.0973 - accuracy: 0.9703 - val_loss: 0.3214 - val_accuracy: 0.8787
Epoch 8/20
30/30 [==============================] - 0s 10ms/step - loss: 0.0777 - accuracy: 0.9789 - val_loss: 0.3349 - val_accuracy: 0.8802
Epoch 9/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0634 - accuracy: 0.9829 - val_loss: 0.3796 - val_accuracy: 0.8692
Epoch 10/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0525 - accuracy: 0.9863 - val_loss: 0.3840 - val_accuracy: 0.8751
Epoch 11/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0416 - accuracy: 0.9904 - val_loss: 0.4203 - val_accuracy: 0.8758
Epoch 12/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0343 - accuracy: 0.9931 - val_loss: 0.4352 - val_accuracy: 0.8755
Epoch 13/20
30/30 [==============================] - 0s 7ms/step - loss: 0.0262 - accuracy: 0.9945 - val_loss: 0.4682 - val_accuracy: 0.8749
Epoch 14/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0205 - accuracy: 0.9960 - val_loss: 0.5022 - val_accuracy: 0.8714
Epoch 15/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0128 - accuracy: 0.9990 - val_loss: 0.5363 - val_accuracy: 0.8683
Epoch 16/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0117 - accuracy: 0.9989 - val_loss: 0.5709 - val_accuracy: 0.8701
Epoch 17/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0087 - accuracy: 0.9991 - val_loss: 0.6230 - val_accuracy: 0.8637
Epoch 18/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0087 - accuracy: 0.9987 - val_loss: 0.6395 - val_accuracy: 0.8681
Epoch 19/20
30/30 [==============================] - 0s 8ms/step - loss: 0.0039 - accuracy: 0.9998 - val_loss: 0.7504 - val_accuracy: 0.8534
Epoch 20/20
30/30 [==============================] - 0s 7ms/step - loss: 0.0063 - accuracy: 0.9989 - val_loss: 0.7101 - val_accuracy: 0.8658
str(history)
List of 2
$ params :List of 3
..$ verbose: int 1
..$ epochs : int 20
..$ steps : int 30
$ metrics:List of 4
..$ loss : num [1:20] 0.51 0.303 0.221 0.174 0.142 ...
..$ accuracy : num [1:20] 0.786 0.901 0.927 0.943 0.955 ...
..$ val_loss : num [1:20] 0.382 0.321 0.284 0.274 0.287 ...
..$ val_accuracy: num [1:20] 0.87 0.875 0.888 0.892 0.887 ...
- attr(*, "class")= chr "keras_training_history"
plot(history)
[Plot: training and validation loss and accuracy across the 20 epochs]
Validation loss and accuracy show evidence of overfitting after about the fifth epoch: validation accuracy peaks and validation loss bottoms out around epochs 4–5, even as the training metrics keep improving. To mitigate this, we can train for only 5 epochs.
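An alternative to hand-picking the epoch count is an early-stopping callback; a minimal sketch (not used below) that halts once val_loss stops improving:
history <- model %>% fit(
  partial_x_train, partial_y_train,
  epochs = 20, batch_size = 512,
  validation_data = list(x_val, y_val),
  callbacks = list(callback_early_stopping(
    monitor = "val_loss", patience = 2, restore_best_weights = TRUE
  ))
)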
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)
model %>% fit(x_train, y_train, epochs = 5, batch_size = 512)
Epoch 1/5
49/49 [==============================] - 1s 8ms/step - loss: 0.4730 - accuracy: 0.8116
Epoch 2/5
49/49 [==============================] - 0s 7ms/step - loss: 0.2750 - accuracy: 0.9090
Epoch 3/5
49/49 [==============================] - 0s 6ms/step - loss: 0.2091 - accuracy: 0.9275
Epoch 4/5
49/49 [==============================] - 0s 5ms/step - loss: 0.1733 - accuracy: 0.9412
Epoch 5/5
49/49 [==============================] - 0s 5ms/step - loss: 0.1511 - accuracy: 0.9476
results <- model %>% evaluate(x_test, y_test)
782/782 [==============================] - 1s 604us/step - loss: 0.3191 - accuracy: 0.8763
results
loss accuracy
0.3190779 0.8762800
This fairly naive approach achieves an accuracy of about 88%; state-of-the-art approaches can reach around 95%.
To generate predictions, use the predict method directly; each value is the model’s estimated probability that the review is positive.
model %>% predict(x_test[1:10,])
[,1]
[1,] 0.117030084
[2,] 0.996890545
[3,] 0.652673364
[4,] 0.769265294
[5,] 0.925853550
[6,] 0.658951819
[7,] 0.999153733
[8,] 0.002696246
[9,] 0.933195472
[10,] 0.988951802
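To turn these probabilities into hard 0/1 labels, threshold at 0.5 (an illustrative cutoff):
preds <- model %>% predict(x_test[1:10, ])
as.integer(preds > 0.5)  # 1 = predicted positive, 0 = predicted negative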
Further experiments to try:
- Use the mse loss function instead of binary_crossentropy.
- Use the tanh activation (popular in the early days of neural nets) instead of relu.

Takeaways:
- Stacks of dense layers with relu activations can solve a wide range of problems (including sentiment classification), and you’ll likely use them frequently.
- With a sigmoid output on a binary classification problem, the loss function you should use is binary_crossentropy.
- The rmsprop optimizer is generally a good enough choice, whatever your problem. That’s one less thing for you to worry about.

rectified linear unit: zeroes out negative values↩︎
The sigmoid activation function “squashes” arbitrary values into the [0, 1] interval, outputting something that can be interpreted as a probability.↩︎
Crossentropy is a quantity from the field of Information Theory that measures the distance between probability distributions or, in this case, between the ground-truth distribution and your predictions.↩︎