library(readr)   # Data import
library(keras)   # Deep learning
library(plotly)  # Interactive plots
Unlike classification problems, where the target variable is categorical in nature, regression problems have a numerical target variable.
This chapter creates a deep neural network to predict a numerical outcome.
The dataset contains simulated data. There are \(4898\) samples, each with \(10\) feature variables and a single target variable. This data is saved in a .csv file in the same folder as this R markdown file.
data.set <- read_csv("RegressionData.csv",
                     col_names = FALSE)
The dimensions are confirmed below.
dim(data.set)
## [1] 4898 11
The data structure is cast as a mathematical matrix using the as.matrix() function, after which the variable (column) names are removed.
# Cast dataframe as a matrix
data.set <- as.matrix(data.set)
# Remove column names
dimnames(data.set) = NULL
The summary statistics of the target variable are shown below.
summary(data.set[, 11])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   2.500   5.200   5.900   5.879   6.400   9.300
This can be represented as a histogram, as shown in figure 1 below.
f1 <- plot_ly() %>%
  add_histogram(x = ~data.set[, 11],
                name = "Target variable") %>%
  layout(title = "Target variable",
         xaxis = list(title = "Values",
                      zeroline = FALSE),
         yaxis = list(title = "Count",
                      zeroline = FALSE))
f1
Fig 1 Histogram of the target variable
Note that the values range from \(2.5\) to \(9.3\).
The dataset, which now exists as a matrix, must be split into a training set and a test set. There are various ways in R to perform this split; the method employed in previous chapters is used here, and an alternative is sketched after the code below. With such a small dataset, the test set will comprise approximately \(20\%\) of the samples.
# Split into training and test data
set.seed(123)
indx <- sample(2,
               nrow(data.set),
               replace = TRUE,
               prob = c(0.8, 0.2)) # Makes an index with values 1 and 2
x_train <- data.set[indx == 1, 1:10]
x_test <- data.set[indx == 2, 1:10]
y_train <- data.set[indx == 1, 11]
y_test <- data.set[indx == 2, 11]
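As mentioned, other splitting methods exist. The sketch below (using hypothetical object names suffixed with .alt) samples exactly \(80\%\) of the row indices without replacement, instead of assigning each row probabilistically.
# Hypothetical alternative split: sample 80% of row indices directly
set.seed(123)
train.idx <- sample(nrow(data.set),
                    size = round(0.8 * nrow(data.set)))
x_train.alt <- data.set[train.idx, 1:10]
x_test.alt <- data.set[-train.idx, 1:10]
y_train.alt <- data.set[train.idx, 11]
y_test.alt <- data.set[-train.idx, 11]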
To improve learning, the feature variables must be normalized. As before, the method of standardization is used.
The mean and standard deviation of the feature variables are calculated and stored in the objects mean.train and sd.train. The apply() function calculates the required statistic along the specified axis (the 2 indicating columns). Finally, the scale() function standardizes the test set using these training-set statistics.
mean.train <- apply(x_train,
                    2,
                    mean)
sd.train <- apply(x_train,
                  2,
                  sd)
x_test <- scale(x_test,
                center = mean.train,
                scale = sd.train)
The training data itself is standardized with a simple call to the scale() function, whose defaults center and scale each column by that column's own mean and standard deviation, i.e. exactly the values stored in mean.train and sd.train.
x_train <- scale(x_train)
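As an optional sanity check, the apply() function can confirm that each standardized feature column now has a mean of (approximately) \(0\) and a standard deviation of \(1\).
# Column means should be ~0 and standard deviations ~1 after standardization
round(apply(x_train, 2, mean), 3)
round(apply(x_train, 2, sd), 3)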
The code below is used to create a densely connected deep neural network with three hidden layers and an output layer.
Note that there is no activation function for the output layer (it is left as the linear default), as is appropriate for regression. Each hidden layer has \(25\) nodes and uses the rectified linear unit (ReLU) as activation function. Dropout is employed to prevent overfitting.
model <- keras_model_sequential() %>%
  layer_dense(units = 25,
              activation = "relu",
              input_shape = c(10)) %>%
  layer_dropout(0.2) %>%
  layer_dense(units = 25,
              activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(units = 25,
              activation = "relu") %>%
  layer_dropout(0.2) %>%
  layer_dense(units = 1)
The summary of the model shows \(1601\) learnable parameters.
model %>% summary()
## ___________________________________________________________________________
## Layer (type)                        Output Shape                  Param #
## ===========================================================================
## dense_1 (Dense)                     (None, 25)                    275
## ___________________________________________________________________________
## dropout_1 (Dropout)                 (None, 25)                    0
## ___________________________________________________________________________
## dense_2 (Dense)                     (None, 25)                    650
## ___________________________________________________________________________
## dropout_2 (Dropout)                 (None, 25)                    0
## ___________________________________________________________________________
## dense_3 (Dense)                     (None, 25)                    650
## ___________________________________________________________________________
## dropout_3 (Dropout)                 (None, 25)                    0
## ___________________________________________________________________________
## dense_4 (Dense)                     (None, 1)                     26
## ===========================================================================
## Total params: 1,601
## Trainable params: 1,601
## Non-trainable params: 0
## ___________________________________________________________________________
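This total can be verified by hand: a dense layer learns \((n_{in} + 1) \times n_{out}\) parameters, the \(+1\) accounting for the bias, while the dropout layers learn none.
# Parameters per dense layer: (inputs + 1 bias) x units
(10 + 1) * 25 +  # dense_1: 275
  (25 + 1) * 25 +  # dense_2: 650
  (25 + 1) * 25 +  # dense_3: 650
  (25 + 1) * 1     # dense_4: 26, for a total of 1601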
Detailed information that shows all the arguments (including those that were left at their default values) can be viewed with the get_config() function.
model %>% get_config()
## [{'class_name': 'Dense', 'config': {'name': 'dense_1', 'trainable': True, 'batch_input_shape': (None, 10), 'dtype': 'float32', 'units': 25, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}, {'class_name': 'Dropout', 'config': {'name': 'dropout_1', 'trainable': True, 'rate': 0.2, 'noise_shape': None, 'seed': None}}, {'class_name': 'Dense', 'config': {'name': 'dense_2', 'trainable': True, 'units': 25, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}, {'class_name': 'Dropout', 'config': {'name': 'dropout_2', 'trainable': True, 'rate': 0.2, 'noise_shape': None, 'seed': None}}, {'class_name': 'Dense', 'config': {'name': 'dense_3', 'trainable': True, 'units': 25, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}, {'class_name': 'Dropout', 'config': {'name': 'dropout_3', 'trainable': True, 'rate': 0.2, 'noise_shape': None, 'seed': None}}, {'class_name': 'Dense', 'config': {'name': 'dense_4', 'trainable': True, 'units': 1, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'class_name': 'VarianceScaling', 'config': {'scale': 1.0, 'mode': 'fan_avg', 'distribution': 'uniform', 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}}]
Since this is a regression problem, the mean squared error is used as the loss function, and the mean absolute error is tracked as a metric. The rmsprop optimizer is used with its default values, i.e. lr = 0.001, rho = 0.9, epsilon = NULL, decay = 0, clipnorm = NULL, clipvalue = NULL.
model %>% compile(loss = "mse",
                  optimizer = optimizer_rmsprop(),
                  metrics = c("mean_absolute_error"))
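For reference, the same compile step with the first two optimizer defaults (as listed above) spelled out explicitly would look as follows; this is a sketch, equivalent to the call above.
model %>% compile(loss = "mse",
                  optimizer = optimizer_rmsprop(lr = 0.001,
                                                rho = 0.9),
                  metrics = c("mean_absolute_error"))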
All that remains is to fit the model, with \(0.1\) of the training data reserved as a validation set. A mini-batch size of \(32\) is used. To avoid overfitting (and prevent an unnecessarily long run), early stopping is employed, with the mean absolute error on the validation set as the callback monitor and a patience of five epochs.
history <- model %>%
  fit(x_train,
      y_train,
      epochs = 50,
      batch_size = 32,
      validation_split = 0.1,
      callbacks = c(callback_early_stopping(monitor = "val_mean_absolute_error",
                                            patience = 5)),
      verbose = 2)
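The training history can be visualized by passing the history object to the plot() function, which shows the loss and mean absolute error for both the training and validation data over the epochs.
# Plot training and validation loss / metric curves per epoch
plot(history)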
The test data can be used to show the loss and the mean absolute error of the model. The code chunk below creates two objects, loss and mae, to hold these values. The mean absolute error is formatted to two decimal places by the "%.2f" argument of the sprintf() function and combined with a message string using paste0().
c(loss, mae) %<-% (model %>% evaluate(x_test, y_test, verbose = 0))
paste0("Mean absolute error on test set: ", sprintf("%.2f", mae))
## [1] "Mean absolute error on test set: 0.62"
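As a final check, predictions for the test set can be generated with the predict() function and the mean absolute error computed by hand; the result should match the value returned by evaluate() above (y_pred is a hypothetical object name used for this sketch).
# Predict on the unseen test data
y_pred <- model %>% predict(x_test)
# Mean absolute error computed manually
mean(abs(y_test - y_pred))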
This chapter introduced a model to solve a regression problem. The following are important notes when dealing with regression models: