1 Deep Learning in R

This is a demo of an end-to-end implementation of deep neural networks (DNN), a class of machine learning (artificial intelligence) models, in R using the R interface to Keras, a high-level neural networks API developed in Python. In this demo, we apply DNN models to two data sets: the MNIST data set and a loan default data set. The demo is organized as follows:

  • In Section Why Keras?, we give an overview of why Keras is important when dealing with DNN models and mention alternatives to Keras. In Section Why R interface?, we highlight what makes R and the R interface to Keras such a useful tool to work with. The installation procedure is also documented, along with the required R code.

  • In Section Preparing the Data, we demonstrate the important steps of the exploratory data analysis stage of the model fitting process. Two data sets are studied: the MNIST and credit default loan data sets.

  • Section Keras model training and evaluation is devoted to the intermediary steps between the inputs and the predicted outputs of a DNN model. That is, we show how to use the R interface to Keras to define a model, train it, and evaluate it (DTE) on the test set. We break this part of the demo into four related subsections:
    • In Subsection Baseline Model, we demonstrate how to implement DTE for a feed-forward DNN using keras. In this part, we fit a plain DNN model to our data sets without employing any regularization techniques;
    • In Subsection Multiple models with regularization, we demonstrate the overfitting problem and perform DTE on the baseline model and on regularized versions of it. Two regularization techniques are discussed: \(l1\)/\(l2\) regularization and dropout. We show graphically how to compare the performance of these models;
    • How to apply hyperparameter tuning in Keras is addressed in Subsection Hyperparameters tuning;
    • Some useful training visualization features of Tensorflow and Keras are presented in Subsection Training visualization.
  • Model performance metrics, including ROC/AUC, are discussed in the last Section, Model Performance.

1.1 Why Keras?

There are various deep learning frameworks available today that enable developers, academics, and practitioners to turn ideas into results: Tensorflow, Theano, Caffe and Keras, to name a few.

However, Keras has several advantages over other frameworks, including:

  • Keras is designed for human beings, not machines; it minimizes the number of user actions required for common use cases;
  • Keras is easy to learn and to use, allowing users to be faster and more productive; and
  • Models developed in Keras can be easily deployed across R and Python web apps such as Shiny or Flask apps.

It is worth highlighting that Keras has the strongest adoption of any deep learning framework after Tensorflow, both in industry and in the research community (including CERN and NASA).

This statistic is based on the total mentions of deep learning frameworks in scientific papers uploaded to the preprint server arXiv.org.


1.2 Why R interface?

1.2.1 Keras in R

Because R is user friendly, it is widely used across industries and academia. From a data science perspective, R has numerous packages that help implement deep learning models alongside other machine learning models. For an overview of deep learning packages in R, just try clicking here. No danger, I promise! :)

Given the advantages of Keras over other frameworks and the user-friendliness of R, it is no surprise that there exist two R interfaces to Keras: the kerasR package and RStudio’s keras package.

Question: What does it mean when a package, such as the RStudio’s keras package, is “an interface” to another package, the Python Keras?

Answer: In short, an R interface means that you, as a user, can enjoy the flexibility and user-friendly features of R and at the same time have access to the full power of the Python Keras package. The best of both worlds.

We demonstrate how to install RStudio’s keras package in the next subsection.

1.2.2 RStudio’s keras package installation

To install RStudio’s keras package, first install the R package from CRAN as follows:

install.packages("keras")

In the next step, we need to install the Tensorflow and Keras libraries, because the Keras R interface uses the TensorFlow backend engine by default. To install both libraries together, we use install_keras().

library(keras)
install_keras()

The install_keras() function has several arguments:

install_keras(method = c("auto", "virtualenv", "conda"),
  conda = "auto", tensorflow = "default",
  extra_packages = c("tensorflow-hub"))

As can be seen from the previous chunk of code, install_keras() offers three methods for installing Keras and Tensorflow. It is worth mentioning that the only supported installation method on Windows is "conda".

The default version of Tensorflow is the CPU version; however, if you wish to make use of your GPU, you are welcome to change the configuration and specify tensorflow = "gpu".

If you wish to do a custom installation of Keras and Tensorflow, it is highly recommended to consult the custom installation documentation.
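After installation, it can be useful to confirm that both the R package and its Python backend are reachable. The following is a minimal sketch of such a check, not an official procedure; the variable names are illustrative, and it assumes that is_keras_available() from the keras package is an adequate test for your setup.

```r
# Is the R package installed at all? (does not attach it)
keras_installed <- requireNamespace("keras", quietly = TRUE)

# Only query the Python backend if the R package is present.
keras_ready <- keras_installed && isTRUE(keras::is_keras_available())

if (keras_ready) {
  message("Keras backend found: ready to define models.")
} else {
  message("Keras is not available yet: run install_keras() first.")
}
```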

2 Preparing the Data

2.1 Exploratory analysis

3 Keras model training and evaluation

As with any other predictive model, the DTE process in Keras takes the following steps:

  • Defining the model, which can be broken into:
    • providing the number of layers and neurons for each layer; and
    • adding any regularization technique to avoid overfitting
  • Compiling the model, which includes:
    • defining the optimizer;
    • providing the loss function; and
    • identifying the metrics
  • Fitting the model, which involves:
    • specifying the batch size;
    • the number of epochs; and
    • the validation split
  • Evaluating the model; for instance:
    • evaluating the model on the test data set; and
    • demonstrating the plots.
  • Predicting the classes/probabilities for the test data set.

A compact flowchart of the process is shown in the following figure.

3.1 Fundamental functions in Keras

3.1.1 Basic functions

This section gives a detailed breakdown of the basic built-in functions and procedures of the Keras library that are used in the aforementioned steps. Supplementary functions, including those used for regularization and hyperparameter tuning, are discussed in the Supplementary functions subsection.

Define

  • The first step in the implementation of a multilayer perceptron neural network is to define a model. By model we mean the way the layers are organized. For our training we use the sequential model, in which layers are stacked one after another. It can be defined as follows:
model = keras_model_sequential()
  • Once a sequential model has been defined, we can then add layers to the model (input, hidden and output layers) by calling layer_dense function as follows:
 model %>% 
    layer_dense(units = 256, activation = 'relu', input_shape = c(784))

Note 1: %>% is the pipe operator in R. In simple terms, the pipe operator passes the result on its left as the first argument of the function on its right, which avoids opening and closing lots of nested parentheses and makes the code more readable. If you are interested in knowing more about the pipe operator in R, see pipe.
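To make Note 1 concrete, here is a small sketch comparing nested calls with their piped equivalent. It loads magrittr directly (the package that defines %>%, which keras re-exports); the vector x is made up for illustration.

```r
library(magrittr)  # defines %>%

x <- c(1, 4, 9, 16)

# Nested calls are read from the inside out ...
nested <- round(mean(sqrt(x)), 1)

# ... while the piped version is read from left to right.
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)
```

The same mechanism is what lets model %>% layer_dense(...) pass model as the first argument of layer_dense().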

  • The arguments of layer_dense() most commonly used in practice for deep learning models are:
    • units: a positive integer giving the dimension of the output space of the layer;
    • activation: the name of the activation function to use. For a list of activation functions available in Keras, see activation;
    • bias_initializer: the initializer for the bias vector. For a list of initializers available in Keras, see initializer;
    • weights: the initial weights for the layer;
    • input_shape: the dimensionality of the input to the first layer of the model. This argument is only used when defining the first layer of the model.
  • After adding customized layers and neurons to the model, we can view the details of the model in a more organized way by calling the summary() function as follows:
summary(model)
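The parameter counts reported by summary() can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below does this arithmetic in base R for the 784-256-128-10 architecture used in the baseline model later on; dense_params is a helper name introduced here for illustration only.

```r
# Parameters of a dense layer: weights (inputs x units) plus one bias per unit.
dense_params <- function(n_inputs, units) n_inputs * units + units

layer_sizes <- c(784, 256, 128, 10)  # input, two hidden layers, output

# Pair each layer's input size with its number of units.
params <- mapply(dense_params,
                 head(layer_sizes, -1),
                 tail(layer_sizes, -1))

params       # parameters per layer
sum(params)  # total trainable parameters
```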

Compile

  • In this part, we compile the previously defined model with an appropriate loss function, optimization technique, and metrics. The compile() function is used for this purpose and is called as follows:
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999),
  metrics = c('accuracy')
)
  • The arguments of compile() most commonly used in practice for deep learning models are:
    • optimizer: the optimization technique (e.g., SGD, Adam, etc.) used to update the weights and biases. For a list of the optimization techniques in Keras, please visit optimizer and optimizerRStudio;
    • loss: the name of the objective function, or the objective function itself (e.g., binary_crossentropy, categorical_crossentropy, etc.). For a list of the loss functions in Keras, please visit loss or lossRStudio;
    • metrics: the list of metrics to be evaluated by the model during training and testing (e.g., binary_accuracy, etc.). A list of the metrics in Keras can be found in metrics and kerasRStudio. A custom metric can also be defined and passed to the metrics argument of compile(), and more than one metric can be used at the same time when training and testing a model. Please visit kerasRStudio for a test case.
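To illustrate what the loss and metrics arguments measure, the following base R sketch computes categorical cross-entropy and accuracy by hand for two made-up samples. It is an illustration of the definitions, not the Keras implementation; the function name and the clipping constant eps are assumptions.

```r
# Categorical cross-entropy for one-hot targets: -sum(y * log(p)) per sample.
categorical_crossentropy <- function(y_true, y_pred, eps = 1e-7) {
  y_pred <- pmin(pmax(y_pred, eps), 1 - eps)  # clip to avoid log(0)
  -rowSums(y_true * log(y_pred))
}

# Two samples, three classes (rows are samples).
y_true <- rbind(c(0, 1, 0), c(1, 0, 0))
y_pred <- rbind(c(0.1, 0.8, 0.1), c(0.3, 0.4, 0.3))

loss <- categorical_crossentropy(y_true, y_pred)
loss        # small for the confident correct prediction, larger for the wrong one
mean(loss)  # the value reported as the batch loss

# Accuracy: share of samples whose most probable class matches the target.
accuracy <- mean(max.col(y_pred) == max.col(y_true))
accuracy
```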

Fit

  • So far, we have not discussed the training data sets or training configurations such as the number of epochs and the batch size. To specify these and other training-related settings, we use the fit() function in Keras. It can be called as follows:
Fitted_model = model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
  )
  • The full list of arguments of the fit() function can be found in fit(). The arguments most commonly used in practice are:
    • x, y: the training data and the label data, respectively;
    • batch_size: the number of samples per gradient update. The default is 32;
    • epochs: the number of epochs to train the model;
    • callbacks: a list of callbacks to be called during training;
    • validation_split: a float between 0 and 1. The model sets apart the last fraction of the x and y data provided and uses it as a validation set;
    • validation_data: data on which to evaluate the model metrics and loss at the end of each epoch. If provided, it overrides validation_split.
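The interaction between batch_size, epochs and validation_split is simple arithmetic, sketched below in base R. The sample count of 60000 is an assumption (the size of the standard MNIST training set); the other values are those used in the fit() call above.

```r
n_samples        <- 60000  # assumed: size of the MNIST training set
validation_split <- 0.2
batch_size       <- 128
epochs           <- 30

n_val   <- as.integer(n_samples * validation_split)  # held out: the last 20% of rows
n_train <- n_samples - n_val

# One gradient update per batch; the last batch may be smaller.
steps_per_epoch <- ceiling(n_train / batch_size)

c(train = n_train, validation = n_val,
  steps_per_epoch = steps_per_epoch,
  total_updates = steps_per_epoch * epochs)
```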

Evaluate

  • The final step in the modeling process is to evaluate the trained model on the test data and to predict on new data. The Keras function used for evaluation is evaluate(), which can be called as follows:
score = model %>% evaluate(
  x_test, y_test
   )

cat('Test loss:', score$loss, '\n')
cat('Test accuracy:', score$acc, '\n')

To generate class predictions on new data, we can use predict_classes() as follows:

model %>% predict_classes(x_test)
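For a softmax output layer, a class prediction is simply the index of the largest predicted probability. The base R sketch below reproduces that step on a hypothetical probability matrix (the values are made up), which can be handy when only the class probabilities are available.

```r
# Hypothetical softmax output for three samples and four classes.
probs <- rbind(c(0.10, 0.70, 0.15, 0.05),
               c(0.60, 0.20, 0.10, 0.10),
               c(0.05, 0.05, 0.10, 0.80))

# Column index of the row-wise maximum, shifted to zero-based class labels.
pred_class <- max.col(probs, ties.method = "first") - 1
pred_class
```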

3.1.2 Supplementary functions

This subsection deals with the functions in the R interface to Keras that are used for regularization and hyperparameter tuning, among others.

  • The first regularization technique is \(l1\)/\(l2\) regularization. Depending on which norm we use in the penalty function, we call either the \(l1\)-related or the \(l2\)-related function inside layer_dense(). The argument of layer_dense() that handles this type of regularization is kernel_regularizer, which can be used as follows:
layer_dense(units = 128, activation = 'relu',
              kernel_regularizer = regularizer_l2(l = 0.001))

regularizer_l2(l = 0.001) specifies l2 regularization with regularization factor \(l = 0.001\).
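The penalty itself is easy to compute by hand. The sketch below uses a made-up weight matrix W to show the term that each regularizer adds to the loss: the regularization factor times the sum of squared weights for \(l2\), or times the sum of absolute weights for \(l1\).

```r
set.seed(42)
W <- matrix(rnorm(6), nrow = 2)  # hypothetical 2 x 3 kernel (weight) matrix
l <- 0.001                       # regularization factor

l2_penalty <- l * sum(W^2)     # term added to the loss by regularizer_l2(l)
l1_penalty <- l * sum(abs(W))  # term added to the loss by regularizer_l1(l)

c(l1 = l1_penalty, l2 = l2_penalty)
```

Because the penalty grows with the size of the weights, minimizing the total loss pushes the weights of the regularized layer toward zero.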

  • The other regularization technique that can be employed in RStudio’s keras package is dropout. It is applied through layer_dropout() as follows:
model %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.4)

As can be seen from the above code, layer_dropout() is added right after the layer we want to regularize. Its rate argument specifies the fraction of that layer's outputs (the inputs to the next layer) that are randomly set to zero during training.
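What dropout does to a layer's outputs during training can be simulated in a few lines of base R. This is a sketch of the idea rather than the Keras implementation: each unit is kept with probability 1 - rate, and the surviving values are scaled up by 1 / (1 - rate) so that the expected output is unchanged. The activations vector is made up.

```r
set.seed(123)
rate <- 0.4
activations <- rep(1, 10000)  # hypothetical layer outputs

# Keep each unit with probability 1 - rate, drop it otherwise.
keep <- rbinom(length(activations), size = 1, prob = 1 - rate)

# Zero the dropped units and rescale the survivors.
dropped <- activations * keep / (1 - rate)

mean(keep == 0)  # fraction of units set to zero, close to rate
mean(dropped)    # average output, close to the original mean of 1
```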

Hyperparameter tuning is another step in the process of training a model, undertaken in the hope of improving the metrics as much as possible.

In RStudio’s keras package, all three methods of hyperparameter tuning can be implemented: manual search, grid search, and Bayesian hyperparameter optimization. In this subsection, we explain how to specify the parameters to be tuned and how to refer to them in the body of the model. The final step, which is to call the tuning method (mainly grid or Bayesian), is discussed and implemented in the Hyperparameters tuning section.

Regardless of which tuning method we use, we need to identify the parameters to tune and assign a flag to each of them. Depending on the type of the parameter, there are four different types of flags.

  • These flags are:
    • flag_numeric(name, default, description = NULL)

    • flag_integer(name, default, description = NULL)

    • flag_boolean(name, default, description = NULL)

    • flag_string(name, default, description = NULL)

All four flag types share the same arguments: name, the name of the parameter (by which it can be referred to when training the model); default, the default value of the parameter; and description, an optional explanation of the parameter.

The following example demonstrates how the flags are used:

FLAGS = flags(
  flag_numeric("dropout1", 0.4),
  flag_numeric("dropout2", 0.3),
  flag_string("activation1", "relu"),
  flag_string("activation2", "relu"),
  flag_string("activation3", "softmax")
)

The defined flags then need to be referenced in the body of the model in the DTE process. The following is an example of how the aforementioned flags are used:

model = keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = FLAGS$activation1, input_shape = c(784)) %>%
  layer_dropout(rate = FLAGS$dropout1) %>%
  layer_dense(units = 128, activation = FLAGS$activation2,
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dropout(rate = FLAGS$dropout2) %>%
  layer_dense(units = 10, activation = FLAGS$activation3)

3.2 Baseline model

In this section, we provide a plain implementation of a multilayer perceptron neural network using Keras in R. The fundamental functions required to perform the DTE task are presented in the Fundamental functions in Keras section. Overfitting, regularization and other model tuning are discussed in separate sections.

In the following piece of code, we put together the whole DTE process for a plain multilayer perceptron neural network.

model = keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

summary(model)

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999),
  metrics = c('accuracy')
)


Fitted_model = model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
  )


plot(Fitted_model)

score <- model %>% evaluate(
  x_test, y_test,
  verbose = 0
)

cat('Test loss:', score$loss, '\n')
cat('Test accuracy:', score$acc, '\n')
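The baseline code assumes that x_train and x_test are numeric matrices with 784 columns and that y_train and y_test are one-hot encoded with 10 columns, as input_shape = c(784) and categorical_crossentropy require. The sketch below shows that preparation in base R on a small random stand-in for MNIST; the data, the helper one_hot and the object names are illustrative assumptions. The keras package offers array_reshape() and to_categorical() for the same purpose.

```r
# Hypothetical stand-in for MNIST: 5 random 28 x 28 "images" with labels 0-9.
set.seed(1)
n <- 5
images <- array(sample(0:255, n * 28 * 28, replace = TRUE), dim = c(n, 28, 28))
labels <- c(3, 0, 9, 1, 3)

# Flatten each image into a row of 784 pixels and rescale to [0, 1].
x <- matrix(images, nrow = n) / 255

# One-hot encode the labels: row i has a 1 in column labels[i] + 1.
one_hot <- function(labels, num_classes = 10) {
  out <- matrix(0, nrow = length(labels), ncol = num_classes)
  out[cbind(seq_along(labels), labels + 1)] <- 1
  out
}
y <- one_hot(labels)

dim(x)  # one row per image, 784 columns
dim(y)  # one row per image, 10 columns
```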

3.3 Multiple models with regularization

This section demonstrates the overfitting problem and the DTE implementation of regularized versions of the baseline model. Two regularization techniques are discussed: \(l1\)/\(l2\) regularization and dropout. We show graphically how to compare the performance of these models.

We break down the implementation into four parts. The first part provides the baseline implementation of the DTE process. The second part demonstrates how DTE is implemented with \(l2\) regularization. The third part provides the DTE implementation of dropout regularization. In the last part, the two methods are combined.

It is worth highlighting that at each step (when implementing a regularization technique) we graphically compare the loss between the baseline and the regularized version of it.

  • Baseline model
model = keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')

model %>% summary()

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999),
  metrics = c('accuracy')
)


Baseline_history = model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  view_metrics = TRUE,
  validation_split = 0.2
)
  • Regularized model using \(l2\) technique
l2_model <- keras_model_sequential()
l2_model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dense(units = 128, activation = 'relu',
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dense(units = 10, activation = 'softmax')

l2_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(lr = 0.001),
  metrics = c('accuracy')
)

l2_history <- l2_model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 30,
  verbose = 1,
  view_metrics = TRUE,
  validation_split = 0.2
)

# Packages used to reshape and plot the training histories
library(tibble)   # rownames_to_column()
library(dplyr)    # mutate()
library(tidyr)    # gather()
library(ggplot2)  # ggplot()

# One row per epoch and series: training/validation loss of both models
comparison_l2 = data.frame(
  Baseline_train = Baseline_history$metrics$loss,
  Baseline_val = Baseline_history$metrics$val_loss,
  l2_train = l2_history$metrics$loss,
  l2_val = l2_history$metrics$val_loss
) %>%
  rownames_to_column() %>%
  mutate(rowname = as.integer(rowname)) %>%
  gather(key = "type", value = "value", -rowname)

ggplot(comparison_l2, aes(x = rowname, y = value, color = type)) +
  geom_line() +
  xlab("epoch") +
  ylab("loss")
  • Regularized model using dropout technique
drop_model <- keras_model_sequential()
drop_model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = .3) %>%
  layer_dense(units = 128, activation = 'relu'
              ) %>%
  layer_dropout(rate = .4) %>%
  layer_dense(units = 10, activation = 'softmax')

drop_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(lr = 0.001),
  metrics = c('accuracy')
)

drop_history <- drop_model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 30,
  verbose = 1,
  view_metrics = TRUE,
  validation_split = 0.2
)

comparison_drop = data.frame(
  Baseline_train= Baseline_history$metrics$loss,
  Baseline_val = Baseline_history$metrics$val_loss,
  drop_train = drop_history$metrics$loss,
  drop_val = drop_history$metrics$val_loss
)%>%
  rownames_to_column() %>%
  mutate(rowname = as.integer(rowname)) %>%
  gather(key = "type", value = "value", -rowname)

ggplot(comparison_drop, aes(x = rowname, y = value, color = type)) +
  geom_line() +
  xlab("epoch") +
  ylab("loss")
  • Regularized model using both \(l2\) and dropout techniques
dropl2_model <- keras_model_sequential()
dropl2_model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = .3) %>%
  layer_dense(units = 128, activation = 'relu',
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dropout(rate = .4) %>%
  layer_dense(units = 10, activation = 'softmax')

dropl2_model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(lr = 0.001),
  metrics = c('accuracy')
)

dropl2_history <- dropl2_model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 30,
  view_metrics = TRUE,
  verbose = 1,
  validation_split = 0.2
)


comparison_dropl2 = data.frame(
  Baseline_train = Baseline_history$metrics$loss,
  Baseline_val = Baseline_history$metrics$val_loss,
  dropl2_train = dropl2_history$metrics$loss,
  dropl2_val = dropl2_history$metrics$val_loss
) %>%
  rownames_to_column() %>%
  mutate(rowname = as.integer(rowname)) %>%
  gather(key = "type", value = "value", -rowname)

ggplot(comparison_dropl2, aes(x = rowname, y = value, color = type)) +
  geom_line() +
  xlab("epoch") +
  ylab("loss")

3.3.1 Model fitting

3.3.2 Results

3.4 Hyperparameters tuning

In this section, we provide the full implementation of the hyperparameter tuning process using flags in RStudio. We implement both grid search and Bayesian hyperparameter optimization. The important functions to call for this purpose are discussed in the Supplementary functions subsection.

Note 2: It is imperative to use the appropriate type of flag when defining the hyperparameters in the model. If the parameter is numeric, we need to call flag_numeric(); if it is a string, we must call flag_string(); and so on. We refer to the flags documentation for further information on these functions and their arguments.

The following chunk of code demonstrates the full implementation of the flag definitions as well as their use in the body of the model, before any tuning technique is applied. So that we can refer to it later, we save the following R script as “mnist_mlp.R”.

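library(keras)  # also provides flags() and the flag_*() helpers

# x_train, y_train, x_test and y_test must be created inside this script
# (see Preparing the Data); clearing the workspace here would remove them.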

FLAGS = flags(
  flag_numeric("dropout1", 0.4),
  flag_numeric("dropout2", 0.3),
  flag_string("activation1", "relu"),
  flag_string("activation2", "relu"),
  flag_string("activation3", "softmax")
)

model = keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = FLAGS$activation1, input_shape = c(784)) %>%
  layer_dropout(rate = FLAGS$dropout1) %>%
  layer_dense(units = 128, activation = FLAGS$activation2,
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dropout(rate = FLAGS$dropout2) %>%
  layer_dense(units = 10, activation = FLAGS$activation3)

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999),
  metrics = c('accuracy')
)


history = model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 20,
  view_metrics = TRUE,
  verbose = 1,
  validation_split = 0.2
)


score = model %>% evaluate(
  x_test, y_test,
  verbose = 0
)

cat('Test loss:', score$loss, '\n')
cat('Test accuracy:', score$acc, '\n')

3.4.1 Grid approach

To perform the tuning process under the grid approach, we call the tuning_run() function from the tfruns library as follows:

runs <- tuning_run("mnist_mlp.R", sample = 0.3, flags = list(
   dropout1 = c(0.2, 0.4, 0.5),
   dropout2 = c(0.1, 0.3, 0.5),
   activation1 = c("relu", "softmax", "sigmoid"),
   activation2 = c("relu", "softmax", "sigmoid"),
   activation3 = c("relu", "softmax", "sigmoid")
 ))

As can be seen from the previous chunk of code, tuning_run() takes several arguments, among which are the file name, sample, and flags. The file name gives the path to the training script (in our example, “mnist_mlp.R”). The flags argument is a list of parameter names, each with the candidate values to try. Because the number of flag combinations grows multiplicatively (here 3^5 = 243), running all of them can be computationally expensive; the sample argument specifies the fraction of combinations to run, chosen at random, instead of all of them.
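The size of the grid and the effect of sample can be previewed in base R before launching any training run. The sketch below builds the full grid from the flag values used above and draws a 30% random subset; the rounding rule (ceiling) is an assumption made for illustration and may differ from what tuning_run() does internally.

```r
flag_values <- list(
  dropout1    = c(0.2, 0.4, 0.5),
  dropout2    = c(0.1, 0.3, 0.5),
  activation1 = c("relu", "softmax", "sigmoid"),
  activation2 = c("relu", "softmax", "sigmoid"),
  activation3 = c("relu", "softmax", "sigmoid")
)

# Every combination of flag values: one row per candidate training run.
grid <- expand.grid(flag_values, stringsAsFactors = FALSE)
nrow(grid)

# Randomly keep about 30% of the combinations (rounding rule assumed).
set.seed(1)
n_runs  <- ceiling(0.3 * nrow(grid))
sampled <- grid[sample(nrow(grid), n_runs), ]
head(sampled)
```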

3.4.2 Bayesian approach (CloudML)

Another approach to hyperparameter tuning is the Bayesian approach. It can be employed in RStudio using CloudML and the cloudml package, the R interface to Google CloudML. To install the “cloudml” package as well as the “Google Cloud SDK”, we refer to CloudML.

Once the required packages are installed, we set up the training configuration for later use in the cloudml-related functions. The following script demonstrates how to create a training configuration file. We name it “tuning.yml” so that we can refer to it later.

trainingInput:
  scaleTier: CUSTOM
  masterType: standard_gpu
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: val_acc
    maxTrials: 10
    maxParallelTrials: 2
    params:
      - parameterName: dropout1
        type: DOUBLE
        minValue: 0.2
        maxValue: 0.5
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: dropout2
        type: DOUBLE
        minValue: 0.1
        maxValue: 0.5
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: activation1
        type: CATEGORICAL
        categoricalValues: [relu, softmax, sigmoid]
      - parameterName: activation2
        type: CATEGORICAL
        categoricalValues: [relu, softmax, sigmoid]
      - parameterName: activation3
        type: CATEGORICAL
        categoricalValues: [relu, softmax, sigmoid]

Note 3: As the tuning.yml file shows, several parameters need to be specified when defining the training configuration for a tuning job. For instance, hyperparameterMetricTag specifies the metric to optimize (maximized or minimized, as set by goal) when training the model. For Keras, the available tags are “acc”, “loss”, “val_acc” and “val_loss”. The type parameter can take one of “integer”, “double”, “categorical” or “discrete”. For other configuration options, we refer to training config.

The next step is to submit a hyperparameter tuning job using CloudML. This task can be done by calling cloudml_train function as follows:

cloudml_train("mnist_mlp.R", config = "tuning.yml")

As can be seen from the previous line of code, we need to pass the training configuration file to cloudml_train() when submitting a tuning job.

3.4.3 Bayesian optimization in R

In this subsection, we provide an overview of how to use the rBayesianOptimization package in R to tune hyperparameters with the Bayesian approach.

Hyperparameter tuning with the rBayesianOptimization package does not require assigning flags to the hyperparameters.

To apply the BayesianOptimization() function, we first define a function that takes the hyperparameters as inputs and performs the DTE process, as follows:

library(keras)
library(data.table)  # setDT()
library(rsample)     # initial_split(), training(), testing()

training_credit = function(initParams){

urlToData = "https://assets.datacamp.com/production/course_1025/datasets/loan_data_ch1.rds"
savePath = tempfile(fileext = ".rds")
download.file(urlToData, destfile = savePath, mode = "wb")
loanData = readRDS(savePath)
# Clean up
rm(urlToData, savePath)
# Convert default status to factor
setDT(loanData)
loanData[, `:=`(loan_status, factor(loan_status))]
loanData = as.data.frame(loanData)

for(i in c(3,5)){
  loanData[which(is.na(loanData[,i])), i] = mean(loanData[,i], na.rm = TRUE)
}

loanData$loan_status = factor(loanData$loan_status)
loanData$loan_status = as.numeric(loanData$loan_status)
loanData$grade = factor(loanData$grade)
loanData$grade = as.numeric(loanData$grade)
loanData$home_ownership = factor(loanData$home_ownership)
loanData$home_ownership = as.numeric(loanData$home_ownership)

maxs = apply(loanData, 2, max) 
mins = apply(loanData, 2, min)

scaled = as.data.frame(scale(loanData, 
                             center = mins, scale = maxs - mins))


splitData = initial_split(scaled, prop = 2/3, strata = "loan_status")
trainData = training(splitData)
testData = testing(splitData)  # held-out rows of the same split

x_train = trainData[-1]
x_train = data.matrix(x_train)
y_train = trainData$loan_status
y_train = data.matrix(y_train)
x_test = testData[-1]
x_test = data.matrix(x_test)
y_test = testData$loan_status
y_test = data.matrix(y_test) # need to convert dataframe to matrix


# Define Model --------------------------------------------------------------

model = keras_model_sequential()
model %>%
  layer_dense(units = 20, activation = 'sigmoid', input_shape = c(7)) %>%
  layer_dropout(rate = initParams$dropout1) %>%
  layer_dense(units = 10, activation = 'sigmoid',
              kernel_regularizer = regularizer_l2(l = 0.001)) %>%
  layer_dropout(rate = initParams$dropout2) %>%
  layer_dense(units = 1, activation = 'sigmoid')

summary(model)

model %>% compile(
  loss = 'binary_crossentropy',
  #optimizer = optimizer_rmsprop(lr = 0.001),
  optimizer = optimizer_adam(lr = 0.001, beta_1 = 0.9, beta_2 = 0.999),
  metrics = c('accuracy')
)



# Training & Evaluation ----------------------------------------------------

history = model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 20,
  view_metrics = TRUE,
  verbose = 1,
  #callbacks = callback_tensorboard("logs/run_a"),
  validation_split = 0.2
)

return(history$metrics$val_acc[20])
}
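The scaling step inside training_credit() is a min-max normalization: centering each column at its minimum and dividing by its range maps every variable to the interval [0, 1]. The sketch below isolates that step on a tiny made-up data frame; the column names and values are hypothetical.

```r
# Hypothetical numeric data (values made up for illustration).
df <- data.frame(loan_amnt = c(5000, 2400, 10000),
                 int_rate  = c(10.65, 13.49, 7.90))

mins <- apply(df, 2, min)
maxs <- apply(df, 2, max)

# (x - min) / (max - min), column by column.
scaled <- as.data.frame(scale(df, center = mins, scale = maxs - mins))

sapply(scaled, range)  # every column now spans 0 to 1
```

Putting all inputs on the same scale keeps variables with large units from dominating the weight updates.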

In the next step, we define a wrapper function that returns the metric we wish to maximize. This function is then passed to BayesianOptimization() to perform the tuning process.

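library(rBayesianOptimization)
library(data.table)  # as.data.table()

# Assumed starting values (not defined in the original script);
# they mirror the flag defaults used earlier.
initParams = list(dropout1 = 0.4, dropout2 = 0.3)

maximizeACC = function(dropout1, dropout2) {
  # Overwrite the starting values with the candidates proposed by the optimizer
  replaceParams = list(dropout1 = dropout1, dropout2 = dropout2)
  updatedParams = modifyList(initParams, replaceParams)

  score = training_credit(updatedParams)
  # BayesianOptimization() expects a list with Score (to be maximized) and Pred
  results = list(Score = score, Pred = 0)
  return(results)
}

# Search range for each hyperparameter
boundsParams = list(dropout1 = c(0.1, 0.7), dropout2 = c(0.1, 0.7))

Final_calibrated = BayesianOptimization(maximizeACC, bounds = boundsParams,
  init_grid_dt = as.data.table(boundsParams),
  init_points = 10, n_iter = 30, acq = "ucb",
  kappa = 2.576, eps = 0, verbose = FALSE)

# Last few evaluated parameter combinations and their scores
tail(Final_calibrated$History)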
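The arguments acq = "ucb" and kappa = 2.576 select the upper confidence bound acquisition rule. Assuming the usual definition, the next point to try is the one that maximizes the predicted mean plus kappa times the predicted standard deviation, so a larger kappa favors exploring uncertain regions. The sketch below applies that rule to made-up predictions; the numbers and column names are hypothetical.

```r
# Hypothetical posterior predictions of validation accuracy at three candidates.
candidates <- data.frame(
  dropout1 = c(0.2, 0.4, 0.6),
  mu       = c(0.890, 0.892, 0.880),  # predicted mean
  sigma    = c(0.001, 0.002, 0.010)   # predicted standard deviation
)

kappa <- 2.576

# Upper confidence bound: optimistic estimate of the score at each candidate.
candidates$ucb <- candidates$mu + kappa * candidates$sigma

# The candidate with the largest bound is evaluated next.
candidates[which.max(candidates$ucb), ]
```

In this example the most uncertain candidate is chosen even though its predicted mean is the lowest, which is the exploratory behavior a large kappa encourages.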

3.5 Training visualization

4 Model Performance