Deep learning is one of the most promising subfields of machine learning. Its models are loosely inspired by the structure and function of the brain's neurons, which is why they are usually called Artificial Neural Networks (ANNs). Deep learning is leading the way in solving many problems, including robotics, image recognition, and Artificial Intelligence (AI) more broadly.
Keras is a high-level neural network API developed for fast experimentation in building neural networks. Let's explore how to implement ANNs using the keras package in R.
The keras package can be installed from CRAN as shown below. The R interface uses the TensorFlow backend engine. To install both the core Keras library and the TensorFlow backend, load the package after installing it and call the install_keras() function.
# install keras package
install.packages("keras")
# load the library after install
library(keras)
#install TensorFlow
install_keras()
From the documentation at keras.rstudio.com, this performs a default CPU-based installation of Keras and TensorFlow. For a more customized installation, e.g. to take advantage of NVIDIA GPUs, see the documentation for install_keras().
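As a rough sketch of such a customization (check ?install_keras for the exact options available in your version), a GPU-enabled setup can be requested by selecting the GPU build of TensorFlow:
# install the GPU build of TensorFlow instead of the default CPU build
install_keras(tensorflow = "gpu")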
Recognizing handwritten digits with a neural network is the example we will follow to understand Keras. We will use the MNIST dataset, which contains 28 x 28 grayscale images of handwritten digits along with their labels. The labels for the first four images are 5, 0, 4, and 1.
The MNIST dataset is included in the keras package we just installed, so we load it with dataset_mnist() and create variables for the training and test sets.
# load library
library(keras)
#load dataset
mnist <- dataset_mnist()
#create test and training sets
x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y
The x data is a 3-d array (images, width, height) of grayscale values. To prepare the data, it is flattened into matrices: the width and height dimensions are collapsed into one, so every 28 x 28 image becomes a vector of length 784.
As well, the data is rescaled: the grayscale values, integers ranging from 0 to 255, are converted to floating point values between 0 and 1 by dividing by 255.
# reshape
dim(x_train) <- c(nrow(x_train), 784)
dim(x_test) <- c(nrow(x_test), 784)
# rescale
x_train <- x_train / 255
x_test <- x_test / 255
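A quick check that the rescaling worked as expected; the pixel values should now lie between 0 and 1:
# confirm values are now in [0, 1]
range(x_train)  # 0 1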
The labels, i.e. the y data, form a vector of integers ranging from 0 to 9. To format this for training, each label is converted into a vector of 10 values, only one of which is 1 and the rest 0. The labels of 5, 0, 4, 1 we saw before are converted to (shown here with one column per label):
## [,1] [,2] [,3] [,4]
## [1,] 0 1 0 0
## [2,] 0 0 0 1
## [3,] 0 0 0 0
## [4,] 0 0 0 0
## [5,] 0 0 1 0
## [6,] 1 0 0 0
## [7,] 0 0 0 0
## [8,] 0 0 0 0
## [9,] 0 0 0 0
## [10,] 0 0 0 0
This process is called one-hot encoding: 'one hot' because, of the 10 binary values for each label, exactly one is equal to 1 and the rest are 0s. This is such a widely used conversion that the keras package comes with a built-in function for it, to_categorical():
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
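To reproduce the matrix shown above, note that to_categorical() returns one row per label; transposing its output gives the one-column-per-label layout printed earlier:
# one-hot encode the four example labels and transpose for display
t(to_categorical(c(5, 0, 4, 1), 10))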
The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the Sequential model, a linear stack of layers.
We begin by creating a sequential model and then adding layers using the pipe (%>%) operator:
# define model as sequential
model <- keras_model_sequential()
# add layers - type 'dense', use pipe operator (%>%)
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')
The input_shape argument to the first layer specifies the shape of the input data (a length 784 numeric vector representing a grayscale image). The final layer outputs a length 10 numeric vector (probabilities for each digit) using a softmax activation function.
Use the summary() function to print the details of the model:
# output the model summary
summary(model)
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## dense_1 (Dense) (None, 256) 200960
## ___________________________________________________________________________
## dropout_1 (Dropout) (None, 256) 0
## ___________________________________________________________________________
## dense_2 (Dense) (None, 128) 32896
## ___________________________________________________________________________
## dropout_2 (Dropout) (None, 128) 0
## ___________________________________________________________________________
## dense_3 (Dense) (None, 10) 1290
## ===========================================================================
## Total params: 235,146
## Trainable params: 235,146
## Non-trainable params: 0
## ___________________________________________________________________________
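The parameter counts in the summary can be verified by hand: a dense layer has (inputs x units) weights plus one bias per unit, while dropout layers add no parameters.
# parameter counts for each dense layer
784 * 256 + 256   # dense_1: 200960
256 * 128 + 128   # dense_2: 32896
128 * 10  + 10    # dense_3: 1290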
Next, compile the model with appropriate loss function, optimizer, and metrics:
# compile model
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
Use the fit() function to train the model for 30 epochs in batches of 128 images, holding out 20% of the training data for validation:
# fit model
history <- model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)
The history object returned by fit() includes loss and accuracy metrics which we can plot:
# plot the model
plot(history)
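The metrics are also available as plain vectors under history$metrics, which is handy for quick numeric checks; for instance (metric names here follow the older Keras convention, where accuracy is recorded as acc/val_acc, matching the evaluate() output below):
# validation accuracy of the final epoch
tail(history$metrics$val_acc, 1)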
Evaluate the model’s performance on the test data:
model %>% evaluate(x_test, y_test)
## $loss
## [1] 0.1085874
##
## $acc
## [1] 0.9801
Generate predictions on new data:
model %>% predict_classes(x_test) %>% head(5)
## [1] 7 2 1 0 4
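The reported test accuracy can be double-checked by comparing the predicted classes against the original (non-encoded) test labels:
# fraction of test digits classified correctly
preds <- model %>% predict_classes(x_test)
mean(preds == mnist$test$y)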
Models can be saved and loaded easily using the save_model_hdf5() and load_model_hdf5() functions:
save_model_hdf5(model, "my_model.h5")
model <- load_model_hdf5("my_model.h5")
The trained model weights can be saved and loaded similarly using the save_model_weights_hdf5() and load_model_weights_hdf5() functions:
model %>% save_model_weights_hdf5("my_model_weights.h5")
model %>% load_model_weights_hdf5("my_model_weights.h5")
As well, you can export the model configuration to JSON or YAML using model_to_json() and model_to_yaml(). To load the configuration back into the workspace, use the model_from_json() and model_from_yaml() functions (note that these serialize only the model's architecture, not its weights):
json_string <- model_to_json(model)
model <- model_from_json(json_string)
yaml_string <- model_to_yaml(model)
model <- model_from_yaml(yaml_string)