library(readr)
library(keras)
library(plotly)

Introduction

Unlike classification problems, where the target variable is categorical in nature, regression problems have a numerical variable as the target.

This chapter creates a deep neural network to predict a numerical outcome.

Data

The dataset contains simulated data. There are \(4898\) samples, each with \(10\) feature variables and a single target variable. The data are saved in a .csv file in the same folder as this R Markdown file.

data.set <- read_csv("RegressionData.csv",
                     col_names = FALSE)

The dimensions are confirmed below.

dim(data.set)
## [1] 4898   11

Transformation into a matrix

The data structure is transformed into a mathematical matrix using the as.matrix() function before removing the variable (column) names.

# Cast dataframe as a matrix
data.set <- as.matrix(data.set)

# Remove column names
dimnames(data.set) = NULL
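A quick check (added here for illustration) confirms that the object is now a matrix and that the column names have been removed.

# Confirm the conversion and the removal of the column names
class(data.set)
colnames(data.set)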

Distribution of the target variable

The summary statistics of the target variable are shown below.

summary(data.set[, 11])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.500   5.200   5.900   5.879   6.400   9.300

This can be represented as a histogram, as shown in figure 1 below.

f1 <- plot_ly() %>% 
  add_histogram(x = ~data.set[, 11],
                name = "Target variable") %>% 
  layout(title = "Target variable",
         xaxis = list(title = "Values",
                      zeroline = FALSE),
         yaxis = list(title = "Count",
                      zeroline = FALSE))
f1

Fig 1 Histogram of the target variable

Note that the values range from \(2.5\) to \(9.3\).

Train and test split

The dataset, which now exists as a matrix, must be split into a training and a test set. There are various ways in R to perform this split. The method employed in previous chapters is used here. With such a small dataset, the test set will comprise \(20\%\) of the samples.

# Split for train and test data
set.seed(123)
indx <- sample(2,
               nrow(data.set),
               replace = TRUE,
               prob = c(0.8, 0.2)) # Makes index with values 1 and 2
x_train <- data.set[indx == 1, 1:10]
x_test <- data.set[indx == 2, 1:10]
y_train <- data.set[indx == 1, 11]
y_test <- data.set[indx == 2, 11]
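As a quick sanity check (a small addition, not part of the original workflow), the number of samples in each partition can be inspected to confirm that roughly \(20\%\) of the data ended up in the test set.

# Inspect the size of each partition
nrow(x_train)
nrow(x_test)
nrow(x_test) / nrow(data.set)  # Approximate test fraction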

Normalizing the data

To improve learning, the feature variables must be normalized. As before, the method of standardization is used.

The mean and standard deviation of the feature variables are calculated from the training set and stored in the objects mean.train and sd.train. The apply() function calculates the required statistic along the specified axis (the 2 indicating columns). Finally, the scale() function standardizes the test set using these training-set values.

mean.train <- apply(x_train,
                    2,
                    mean)
sd.train <- apply(x_train,
                  2,
                  sd)
x_test <- scale(x_test,
                center = mean.train,
                scale = sd.train)

The training data is standardized with a simple call to the scale() function, which by default centers and scales each column by its own mean and standard deviation (here, those of the training set).

x_train <- scale(x_train)
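As an optional verification (not in the original text), the column means of the standardized training set should be approximately zero and the standard deviations approximately one.

# Verify the standardization of the training features
round(apply(x_train, 2, mean), 3)
round(apply(x_train, 2, sd), 3)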

The model

The code below is used to create a densely connected deep neural network with three hidden layers and an output layer.

Creating the model

Note that there is no activation function for the output layer. Each hidden layer has \(25\) nodes and the rectified linear unit is used as the activation function. Dropout is employed to prevent overfitting.

model <- keras_model_sequential() %>% 
  layer_dense(units = 25,
              activation = "relu",
              input_shape = c(10)) %>% 
  layer_dropout(0.2) %>% 
  layer_dense(units = 25,
              activation = "relu") %>% 
  layer_dropout(0.2) %>% 
  layer_dense(units = 25,
              activation = "relu") %>% 
  layer_dropout(0.2) %>% 
  layer_dense(units = 1)

The summary of the model shows \(1601\) learnable parameters.

model %>% summary()
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense_1 (Dense)                  (None, 25)                    275         
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 25)                    0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 25)                    650         
## ___________________________________________________________________________
## dropout_2 (Dropout)              (None, 25)                    0           
## ___________________________________________________________________________
## dense_3 (Dense)                  (None, 25)                    650         
## ___________________________________________________________________________
## dropout_3 (Dropout)              (None, 25)                    0           
## ___________________________________________________________________________
## dense_4 (Dense)                  (None, 1)                     26          
## ===========================================================================
## Total params: 1,601
## Trainable params: 1,601
## Non-trainable params: 0
## ___________________________________________________________________________

Detailed information that shows all the arguments (including those that were left at their default values) can be viewed with the get_config() function.

model %>% get_config()
## [{'class_name': 'Dense', 'config': {'name': 'dense_1', 'trainable': True, 'batch_input_shape': (None, 10), 'dtype': 'float32', 'units': 25, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}, {'class_name': 'Dropout', 'config': {'name': 'dropout_1', 'trainable': True, 'rate': 0.2, 'noise_shape': None, 'seed': None}}, {'class_name': 'Dense', 'config': {'name': 'dense_2', 'trainable': True, 'units': 25, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}, {'class_name': 'Dropout', 'config': {'name': 'dropout_2', 'trainable': True, 'rate': 0.2, 'noise_shape': None, 'seed': None}}, {'class_name': 'Dense', 'config': {'name': 'dense_3', 'trainable': True, 'units': 25, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}, {'class_name': 'Dropout', 'config': {'name': 'dropout_3', 'trainable': True, 'rate': 0.2, 'noise_shape': None, 'seed': None}}, {'class_name': 'Dense', 'config': {'name': 'dense_4', 'trainable': True, 'units': 1, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}]

Compiling the model

Since this is a regression problem, the mean squared error is used as the loss function. The rmsprop optimizer is used, with its default values, i.e. lr = 0.001, rho = 0.9, epsilon = NULL, decay = 0, clipnorm = NULL, clipvalue = NULL.

model %>% compile(loss = "mse",
                  optimizer = optimizer_rmsprop(),
                  metrics = c("mean_absolute_error"))

Fitting the data

All that remains is to fit the data, with \(0.1\) of the training data reserved as a validation set. A mini-batch size of \(32\) is used. To avoid overfitting (and to prevent an unnecessarily long run), early stopping is employed. The mean absolute error of the validation set is used as the callback monitor, with a patience level of five.

history <- model %>% 
  fit(x_train,
      y_train,
      epochs = 50,
      batch_size = 32,
      validation_split = 0.1,
      callbacks = c(callback_early_stopping(monitor = "val_mean_absolute_error",
                                            patience = 5)),
      verbose = 2)
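The training history can be visualized to see how the loss and the mean absolute error evolved over the epochs. The plot() method for keras history objects produces this directly (as a ggplot2-based figure when ggplot2 is installed).

# Plot the loss and mean absolute error per epoch for the training and validation sets
plot(history)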

Testing the model

The test data can be used to show the loss and the mean absolute error of the model. The code chunk below creates two objects, loss and mae, to hold these values; the %<-% operator (from the zeallot package, re-exported by keras) performs this multiple assignment. The mean absolute error is formatted with the sprintf() function and combined with a message using the paste0() function. The "%.2f" format specifier stipulates two decimal places.

c(loss, mae) %<-% (model %>% evaluate(x_test, y_test, verbose = 0))

paste0("Mean absolute error on test set: ", sprintf("%.2f", mae))
## [1] "Mean absolute error on test set: 0.62"

Conclusion

This chapter introduced a model to solve a regression problem. The following are important points to note when dealing with regression models:

  1. The feature variables were standardized according to the mean and standard deviation of the training set
  2. No activation function is used in the output layer
  3. The mean squared error is a typical loss function in this setting
  4. The mean absolute error is a useful metric