Loading dependencies

Keras is the package that allows for creating deep neural learning models in R ggplot loaded for creating plots

library(keras)
library(ggplot2)

Test train split

The boston housing tries to predict the house prices using various parameters like crime rate in the locality the test train split

dataset<-dataset_boston_housing()
c(c(train_data, train_targets), c(test_data, test_targets))%<-%dataset
str(train_data)
##  num [1:404, 1:13] 1.2325 0.0218 4.8982 0.0396 3.6931 ...
str(test_data)
##  num [1:102, 1:13] 18.0846 0.1233 0.055 1.2735 0.0715 ...
str(train_targets)
##  num [1:404(1d)] 15.2 42.3 50 21.1 17.7 18.5 11.3 15.6 15.6 14.4 ...

Data scaling

data scaling is necessary for regression model beacuase various parameters can be measured in various units The training data mean and std deviation has to be used in test data as well. No assumptions can be made in the test data

mean<-apply(train_data, MARGIN = 2, mean)
std<-apply(train_data, 2, sd)
train_data<-scale(train_data, center = mean, scale=std)
test_data<-scale(test_data, center = mean, scale=std)

Creating model function

The following function uses Keras package to first create a dense layer (where all nodes in input layer are connected to all layers in the deeper layers) Smaller units allows for non-overfitting however under-fitting should be kept in mind

build_model<-function(){
  model<-keras_model_sequential()%>%
    layer_dense(units=64, activation='relu', input_shape = dim(train_data)[2])%>%
    layer_dense(units=64, activation="relu")%>%
    layer_dense(units = 1)
  model%>%compile(
    optimizer="rmsprop",
    loss="mse",
    metrics=c('mae')
  )
}

Creating cross validated model

The boston housing dataset is small hence. We use k fold cross validation -here 4 fold The following function genereates the MAE (mean absolute error) for each epoch (500) and stores in all_mae_history

k<-4
indices<-sample(1:nrow(train_data))
folds<-cut(indices, breaks = k, labels = F)
num_epochs <- 500
all_mae_histories <- NULL
for (i in 1:k) {
  cat("processing fold #", i, "\n")
  val_indices <- which(folds == i, arr.ind = TRUE) 
  val_data <- train_data[val_indices,]
  val_targets <- train_targets[val_indices]
  partial_train_data <- train_data[-val_indices,] 
  partial_train_targets <- train_targets[-val_indices]
  model <- build_model() 
  history <- model %>% fit( partial_train_data, partial_train_targets,validation_data = list(val_data, val_targets), epochs = num_epochs, batch_size = 1, verbose = 0
 ) 
  mae_history <- history$metrics$val_mean_absolute_error
all_mae_histories <- rbind(all_mae_histories, mae_history)
}
## processing fold # 1 
## processing fold # 2 
## processing fold # 3 
## processing fold # 4

Calculating average MAE in each validation set and plots the MAE for each of 500 epoch

The steep drop and relative pleauteaued phase of MAE is hypothesized the optimal epoch size

average_mae_history <- data.frame(
  epoch = seq(1:ncol(all_mae_histories)),
  validation_mae = apply(all_mae_histories, 2, mean)
)

Plotting the average MAE vs number of epoch

ggplot(average_mae_history, aes(x = epoch, y = validation_mae)) + geom_line()

ggplot(average_mae_history, aes(x = epoch, y = validation_mae)) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

# Building a general model to fit on the test data Using the optimal epoch size for fitting onto the training data

model <- build_model()
model %>% fit(train_data, train_targets, 
              epochs = 80, batch_size = 16, verbose = 0)
result <- model %>% evaluate(test_data, test_targets)
result
## $loss
## [1] 19.47835
## 
## $mean_absolute_error
## [1] 2.807683