• 1 Objectives
  • 2 Install and Load Packages
  • 3 Importing the Data
  • 4 Exploratory Data Analysis
    • 4.1 Correlations
    • 4.2 Visualizing Feature relationships with a scatterplot matrix
  • 5 Modelling
    • 5.1 Create training and test sets
    • 5.2 Training Parameters
    • 5.3 Model Training
      • 5.3.1 Grid Search
      • 5.3.2 Random Search
      • 5.3.3 Genetic Algorithm
      • 5.3.4 Differential Evolution
      • 5.3.5 Particle Swarm Optimization
  • 6 Model Performance
    • 6.1 Performance on Training Set
    • 6.2 Performance on Test Set
  • 7 Summary

1 Objectives

The objective of this document is to assess the use of various search methods in finding the optimal values of hyperparameters of a machine learning model. The population-based search methods to be tested are genetic algorithms (GA), differential evolution (DE) and particle swarm optimization (PSO). Grid and random search will also be performed and used as benchmarks.

XGBoost with generalized linear models as base learners will be used to predict the compressive strength of concrete, which is determined by its age and composition.

The dataset used comes from the research paper Modeling of strength of high performance concrete using artificial neural networks by I-Cheng Yeh, published in Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998). The dataset can be downloaded from the UCI Machine Learning Repository.

The dataset contains 1030 examples and the following features:

  • Input Variable: Cement (kg in a m3 mixture)
  • Input Variable: Blast Furnace Slag (kg in a m3 mixture)
  • Input Variable: Fly Ash (kg in a m3 mixture)
  • Input Variable: Water (kg in a m3 mixture)
  • Input Variable: Superplasticizer (kg in a m3 mixture)
  • Input Variable: Coarse Aggregate (kg in a m3 mixture)
  • Input Variable: Fine Aggregate (kg in a m3 mixture)
  • Input Variable: Age (days)
  • Output Variable: Concrete compressive strength (MPa)

2 Install and Load Packages

The pacman package is used to install and load all necessary packages.

# install.packages("pacman", verbose = F, quiet = T)
pacman::p_load(caret, tidyverse, readr, readxl, parallel, doParallel, gridExtra, plyr, pso, GA, DEoptim, GGally, xgboost, broom, knitr, kableExtra, tictoc, install = T)

3 Importing the Data

The data was downloaded from the UCI Machine Learning Data Repository:

# Load library
# download.file(url = "http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/Concrete_Data.xls", destfile = "Concrete_Data.xls", method = "curl", quiet = TRUE)

# Import Data
concrete_data <- read_xls(path = "Concrete_Data.xls", sheet = 1)

# Rename variables
colnames(concrete_data) <- c("Cement", "Slag", "Ash", "Water", "Superplasticizer", "Coarse_Aggregate", "Fine_Aggregate", "Age", "Strength")

4 Exploratory Data Analysis

Check the structure of the dataset:

# Check structure of the dataset
glimpse(concrete_data)
Observations: 1,030
Variables: 9
$ Cement           <dbl> 540.0, 540.0, 332.5, 332.5, 198.6, 266.0, 380...
$ Slag             <dbl> 0.0, 0.0, 142.5, 142.5, 132.4, 114.0, 95.0, 9...
$ Ash              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Water            <dbl> 162, 162, 228, 228, 192, 228, 228, 228, 228, ...
$ Superplasticizer <dbl> 2.5, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
$ Coarse_Aggregate <dbl> 1040.0, 1055.0, 932.0, 932.0, 978.4, 932.0, 9...
$ Fine_Aggregate   <dbl> 676.0, 676.0, 594.0, 594.0, 825.5, 670.0, 594...
$ Age              <dbl> 28, 28, 270, 365, 360, 90, 365, 28, 28, 28, 9...
$ Strength         <dbl> 79.986111, 61.887366, 40.269535, 41.052780, 4...

The amounts of the concrete components were converted to proportions of the total mixture mass, so their values range from 0 to 1.

# Recalculate composition as proportions ranging from 0 to 1
concrete_data[, 1:7] <- t(apply(X = concrete_data[, 1:7], MARGIN = 1, FUN = function(x) {x/sum(x)}))

# Check structure after rescaling
glimpse(concrete_data)
Observations: 1,030
Variables: 9
$ Cement           <dbl> 0.22309440, 0.22172039, 0.14917003, 0.1491700...
$ Slag             <dbl> 0.00000000, 0.00000000, 0.06393001, 0.0639300...
$ Ash              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Water            <dbl> 0.06692832, 0.06651612, 0.10228802, 0.1022880...
$ Superplasticizer <dbl> 0.001032844, 0.001026483, 0.000000000, 0.0000...
$ Coarse_Aggregate <dbl> 0.4296633, 0.4331759, 0.4181247, 0.4181247, 0...
$ Fine_Aggregate   <dbl> 0.2792811, 0.2775611, 0.2664872, 0.2664872, 0...
$ Age              <dbl> 28, 28, 270, 365, 360, 90, 365, 28, 28, 28, 9...
$ Strength         <dbl> 79.986111, 61.887366, 40.269535, 41.052780, 4...

Print summary statistics for all variables and check for missing (NA) values:

summary(concrete_data)
     Cement             Slag               Ash              Water        
 Min.   :0.04482   Min.   :0.000000   Min.   :0.00000   Min.   :0.05139  
 1st Qu.:0.08205   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.06955  
 Median :0.11528   Median :0.009455   Median :0.00000   Median :0.07862  
 Mean   :0.11955   Mean   :0.031643   Mean   :0.02317   Mean   :0.07773  
 3rd Qu.:0.14917   3rd Qu.:0.061972   3rd Qu.:0.05033   3rd Qu.:0.08386  
 Max.   :0.22541   Max.   :0.150339   Max.   :0.08884   Max.   :0.11222  
 Superplasticizer   Coarse_Aggregate Fine_Aggregate        Age        
 Min.   :0.000000   Min.   :0.3459   Min.   :0.2480   Min.   :  1.00  
 1st Qu.:0.000000   1st Qu.:0.3923   1st Qu.:0.3112   1st Qu.:  7.00  
 Median :0.002727   Median :0.4205   Median :0.3305   Median : 28.00  
 Mean   :0.002620   Mean   :0.4152   Mean   :0.3301   Mean   : 45.66  
 3rd Qu.:0.004338   3rd Qu.:0.4376   3rd Qu.:0.3541   3rd Qu.: 56.00  
 Max.   :0.013149   Max.   :0.4798   Max.   :0.4141   Max.   :365.00  
    Strength     
 Min.   : 2.332  
 1st Qu.:23.707  
 Median :34.443  
 Mean   :35.818  
 3rd Qu.:46.136  
 Max.   :82.599  
# Check number of NA values in each column
colSums(is.na(concrete_data), na.rm = F)
          Cement             Slag              Ash            Water 
               0                0                0                0 
Superplasticizer Coarse_Aggregate   Fine_Aggregate              Age 
               0                0                0                0 
        Strength 
               0 

There are no NA values in the dataset; however, the input variables have different ranges of values.

concrete_data %>% 
  gather(key = Variable, value = Value) %>% 
  ggplot() +
    geom_histogram(aes(x = Value), bins = 20, fill = "blue") +
    facet_wrap(~Variable, scales='free') +
    theme_bw() +
    theme(aspect.ratio = 0.5, axis.title = element_blank(), panel.grid = element_blank())

4.1 Correlations

Plot correlation heatmap using the ggcorr() function from GGally package.

# Plot correlation heatmap
ggcorr(concrete_data, label = TRUE, palette = "RdBu", name = "Correlation", hjust = 0.75, label_size = 3, label_round = 2)

There seems to be a considerable positive correlation between the amounts of cement and superplasticizer used and concrete compressive strength. Age is also positively correlated with compressive strength. Most pairwise correlations between the predictors are low.
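
For a quick numeric check of these relationships (an illustrative snippet, not part of the original output), the correlations with the target can also be printed directly:

# Correlation of each variable with compressive strength, sorted
sort(cor(concrete_data)[, "Strength"], decreasing = TRUE)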

4.2 Visualizing Feature relationships with a scatterplot matrix

We can use a scatterplot matrix (a collection of scatterplots organized in a grid) to understand the relationship between each predictor and the target feature.

ggduo(data = concrete_data, 
      columnsX = 1:8, 
      columnsY = 9, 
      types = list(continuous = "smooth_lm"),
      mapping = ggplot2::aes(color = -Strength, alpha = 0.3)
      ) +
  theme_bw()


5 Modelling

5.1 Create training and test sets

80% of the samples in the dataset are randomly selected for the training set; the remaining 20% are held out for testing the models. Stratified partitioning on the target feature will be used to create these subsets.

# Remove any observations with missing values
concrete_data <- concrete_data[complete.cases(concrete_data), ]

# Average the values of compressive strength of replicate experiments
concrete_data <- ddply(.data = concrete_data, 
                       .variables = .(Cement, Slag, Ash, Water, Superplasticizer, `Coarse_Aggregate`, `Fine_Aggregate`, Age), 
                       .fun = function(x) c(Strength = mean(x$Strength)))

# Create training and test set using stratified partitioning
set.seed(1)
training_index <- createDataPartition(y = concrete_data$Strength, p = 0.80)[[1]]
training_set <- concrete_data[training_index, ]
test_set <- concrete_data[-training_index, ]

# Check distribution of strength on training set and test set
par(mfrow = c(1, 2))
hist(training_set$Strength, main = "Training Set", xlab = "Concrete Compressive Strength (MPa)", freq = FALSE)
hist(test_set$Strength, main = "Test Set", xlab = "Concrete Compressive Strength (MPa)", freq = FALSE)

# Print summary statistics for training and test set
bind_rows(summary(training_set$Strength), summary(test_set$Strength)) %>% as_tibble() %>% add_column(Subset = c("Training", "Test"), .before = 1)
Subset      Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
Training    2.331808  23.52147  33.75950  35.18998  44.86524  82.59922
Test        6.267337  23.52216  33.75847  35.13567  44.73222  79.29663

The distributions of the target feature in the training and test sets are similar.
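
As an optional check (not run above), a two-sample Kolmogorov-Smirnov test could be used to compare the two distributions more formally:

# Compare the distributions of Strength in the training and test sets
# (ties in the averaged Strength values may trigger a warning about exact p-values)
ks.test(training_set$Strength, test_set$Strength)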

5.2 Training Parameters

# Training Parameters
CV_folds <- 5 # number of folds
CV_repeats <- 3 # number of repeats
minimum_resampling <- 5 # minimum number of resamples

Parameter tuning and selection will be done using 5-fold cross-validation repeated 3 times. Adaptive resampling will be used for model training with grid and random search.

The caret package will be used for model training, tuning and evaluation.

# Training Settings
set.seed(1)

# trainControl object for standard repeated cross-validation
train_control <- caret::trainControl(method = "repeatedcv", number = CV_folds, repeats = CV_repeats, 
                                     verboseIter = FALSE, returnData = FALSE) 

# trainControl object for adaptive repeated cross-validation with grid search
adapt_control_grid <- caret::trainControl(method = "adaptive_cv", number = CV_folds, repeats = CV_repeats, 
                                     adaptive = list(min = minimum_resampling, # minimum number of resamples tested before model is excluded
                                                     alpha = 0.05, # confidence level used to exclude parameter settings
                                                     method = "gls", # generalized least squares
                                                     complete = TRUE), 
                                     search = "grid", # execute grid search
                                     verboseIter = FALSE, returnData = FALSE) 

# trainControl object for adaptive repeated cross-validation with random search
adapt_control_random <- caret::trainControl(method = "adaptive_cv", number = CV_folds, repeats = CV_repeats, 
                                     adaptive = list(min = minimum_resampling, # minimum number of resamples tested before model is excluded
                                                     alpha = 0.05, # confidence level used to exclude parameter settings
                                                     method = "gls", # generalized least squares
                                                     complete = TRUE), 
                                     search = "random", # execute random search
                                     verboseIter = FALSE, returnData = FALSE) 

5.3 Model Training

Extreme Gradient Boosting (XGBoost), developed by Tianqi Chen, will be used to create the regression models. XGBoost is similar to gradient boosting but adds the capacity for parallel computation on a single machine and regularization to avoid overfitting. Additional advantages of the XGBoost algorithm include its internal cross-validation function, its ability to handle missing values, its flexibility, and its ability to prune the tree when the improvement in the loss function falls below a threshold.

Like GBM, XGBoost fits each new model to the errors of the previous models so that they are reduced at the next iteration. The final model is a weighted combination of the models obtained in previous iterations. XGBoost has several tuning parameters: some depend on the type of booster used (CART or generalized linear model), while others are general, regularization or learning-task parameters.
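
As a toy illustration of this idea (a minimal sketch in base R, not part of the analysis), each new learner is fitted to the residuals of the current ensemble and added with a shrinkage factor:

# Toy sketch of boosting with linear learners: fit each new learner to the
# residuals of the current ensemble and add it with a shrinkage factor (eta)
set.seed(1)
x <- runif(200)
y <- 3 * x + rnorm(200, sd = 0.3)

eta  <- 0.1                # step size shrinkage
pred <- rep(mean(y), 200)  # start from a constant prediction
for (m in 1:50) {
  res  <- y - pred                  # errors of the current ensemble
  fit  <- lm(res ~ x)               # weak (linear) learner fitted to the residuals
  pred <- pred + eta * predict(fit) # shrunken update of the ensemble
}
sqrt(mean((y - pred)^2))   # training RMSE of the toy ensemble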

Models will be created using generalized linear models as learners.

When using the xgbLinear method in caret, there are four hyperparameters to optimize:

  • nrounds: number of boosting iterations
  • eta: step size shrinkage
  • lambda: L2 Regularization
  • alpha: L1 Regularization

The method above requires the xgboost package.
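
For reference, a minimal sketch of how these four parameters map onto a direct call to the xgboost linear booster (illustrative values; the function and parameter names assume the classic xgb.train interface, while the analysis below uses caret's xgbLinear wrapper):

# Sketch: the same four hyperparameters in a direct xgboost call (gblinear booster)
dtrain <- xgboost::xgb.DMatrix(data = as.matrix(training_set[, 1:8]),
                               label = training_set$Strength)

xgb_linear_fit <- xgboost::xgb.train(
  params = list(booster = "gblinear",
                objective = "reg:squarederror",
                eta = 0.3,     # eta: step size shrinkage
                lambda = 1,    # lambda: L2 regularization on weights
                alpha = 0),    # alpha: L1 regularization on weights
  data = dtrain,
  nrounds = 100                # nrounds: number of boosting iterations
)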

5.3.3 Genetic Algorithm

# Set parameter settings for search algorithm
max_iter <- 10 # maximum number of iterations
pop_size <- 10 # population size

The GA package will be used to optimize the hyperparameters of the XGBoost model using a genetic algorithm. Before running the search method, an objective function needs to be created. As with grid and random search, the average cross-validated root-mean-square error (RMSE) will be the quantity to be optimized.

# Create custom function for assessing solutions
eval_function_XGBoost_Linear <- function(x1, x2, x3, x4, data, train_settings) {
  
suppressWarnings(
  XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = data,
                          method = "xgbLinear",
                          trControl = train_settings,
                          verbose = FALSE, 
                          silent = 1,
                          tuneGrid = expand.grid(
                                                nrounds = round(x1), # number of boosting iterations
                                                eta = 10^x2, # learning rate, low value means model is more robust to overfitting
                                                alpha = 10^x3, # L1 Regularization (equivalent to Lasso Regression) on weights
                                                lambda = 10^x4 # L2 Regularization (equivalent to ridge Regression) on weights
                                                ) 
                          )
)

    return(-XGBoost_Linear_model$results$RMSE) # GA maximizes fitness, so return the negative RMSE to minimize it

}

# Define minimum and maximum values for each input
nrounds_min_max <- c(10,10^3)
eta_min_max <- c(-5,3)
alpha_min_max <- c(-3,1)
lambda_min_max <- c(-3,1)

The population size was set to 10 with a maximum of 10 iterations. The minimum and maximum values for each parameter were defined above.

set.seed(1)
n_cores <- detectCores()-1

GA_T0 <- Sys.time()
# Run genetic algorithm
GA_model_XGBoost_Linear <- GA::ga(type = "real-valued", 
                                fitness = function(x) eval_function_XGBoost_Linear(x[1],x[2],x[3],x[4], 
                                                                                   data = training_set, 
                                                                                   train_settings = train_control), 
                                lower = c(nrounds_min_max[1], eta_min_max[1], alpha_min_max[1], lambda_min_max[1]), # minimum values
                                upper = c(nrounds_min_max[2], eta_min_max[2], alpha_min_max[2], lambda_min_max[2]), # maximum values
                                popSize = pop_size, # population size
                                maxiter = max_iter, # number of iterations
                                pmutation = 0.5, # probability of mutation
                                elitism = 0.3, # percentage of elitism (fraction of best current solutions used on next round)
                                # suggestions = starting_point,
                                parallel = n_cores, 
                                optim = F, 
                                keepBest = T,
                                seed = 1
               )
GA_T1 <- Sys.time()
GA_T1-GA_T0 
Time difference of 11.49358 mins
# Print summary of search method
summary(GA_model_XGBoost_Linear)
-- Genetic Algorithm ------------------- 

GA settings: 
Type                  =  real-valued 
Population size       =  10 
Number of generations =  10 
Elitism               =  0.3 
Crossover probability =  0.8 
Mutation probability  =  0.5 
Search domain = 
        x1 x2 x3 x4
lower   10 -5 -3 -3
upper 1000  3  1  1

GA results: 
Iterations             = 10 
Fitness function value = -4.344492 
Solutions = 
           x1        x2         x3       x4
[1,] 419.5652 -3.814308 -0.2802111 0.591394
[2,] 419.5652 -2.778735 -0.2802111 0.591394
GA::plot.ga(GA_model_XGBoost_Linear, main = "Genetic Algorithm: RMSE values at each iteration")

The search completed and found the optimal RMSE value to be 4.344.

Once the optimal values are found for each hyperparameter, the final model is trained with these values.

# Grid of optimal hyperparameter values
GA_XGBoost_Linear_grid <- expand.grid(
                                  nrounds = round(GA_model_XGBoost_Linear@solution[1]),  # number of boosting iterations
                                  eta = 10^GA_model_XGBoost_Linear@solution[2], # learning rate (step size shrinkage)
                                  alpha = 10^GA_model_XGBoost_Linear@solution[3], # L1 Regularization (Lasso Regression)
                                  lambda = 10^GA_model_XGBoost_Linear@solution[4] # L2 Regularization (Ridge Regression)
                                  )

T0 <- Sys.time()
cluster <- makeCluster(detectCores() - 1) # number of cores, convention to leave 1 core for OS
registerDoParallel(cluster) # register the parallel processing

set.seed(1)
# Train model with optimal values
GA_XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = training_set, 
                          method = "xgbLinear",
                          trControl = train_control,
                          verbose = F, metric = "RMSE", maximize = FALSE,
                          silent = 1,
                          # tuneLength = 1
                          tuneGrid = GA_XGBoost_Linear_grid
                          )

stopCluster(cluster) # shut down the cluster 
registerDoSEQ() #  force R to return to single threaded processing
T1 <- Sys.time()
T1-T0
Time difference of 16.56507 secs
GA_XGBoost_Linear_model
eXtreme Gradient Boosting 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times) 
Summary of sample sizes: 640, 640, 640, 640, 640, 640, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  4.832614  0.9115271  3.241825

Tuning parameter 'nrounds' was held constant at a value of 420
Tuning parameter 'lambda' was held constant at a value of 0.001664427
Tuning parameter 'alpha' was held constant at a value of 0.0001533531
Tuning parameter 'eta' was held constant at a value of Inf
saveRDS(object = GA_XGBoost_Linear_model, file = paste0("Models/GA_XGBoost_Linear_model_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))
saveRDS(object = GA_XGBoost_Linear_model$finalModel, file = paste0("Models/GA_XGBoost_Linear_model_", class(GA_XGBoost_Linear_model$finalModel)[1], "_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))

The best model from the search with the genetic algorithm has an average cross-validated RMSE of 4.833.

This model uses 420 boosting iterations and an alpha and lambda of 0.0001534 and 0.0016644, respectively. The step size shrinkage (eta) is reported as Inf: the GA returned two tied solutions, so @solution is a two-row matrix, and single-bracket indexing (e.g. @solution[2]) walks down the matrix column-wise, picking up the second solution's nrounds value (10^419.57 overflows to Inf) instead of the first solution's eta.
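
A minimal sketch (with a hypothetical grid name) of selecting a single row of the solution matrix before back-transforming, so that each position stays tied to its intended hyperparameter:

# Sketch: pick one row of the GA solution matrix before building the grid
GA_best <- GA_model_XGBoost_Linear@solution[1, ]  # first of the tied solutions

GA_XGBoost_Linear_grid_rowwise <- expand.grid(
  nrounds = round(GA_best[1]),  # number of boosting iterations
  eta     = 10^GA_best[2],      # step size shrinkage
  alpha   = 10^GA_best[3],      # L1 regularization (Lasso)
  lambda  = 10^GA_best[4]       # L2 regularization (Ridge)
)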


5.3.4 Differential Evolution

# Set parameter settings for search algorithm
max_iter <- 10 # maximum number of iterations
pop_size <- 10 # population size

The DEoptim package will be used to optimize the hyperparameters of the XGBoost model using differential evolution. Before running the search method, a custom objective function needs to be created. The average cross-validated root-mean-square error (RMSE) will be the quantity to be optimized.

# Create custom function for assessing solutions
eval_function_XGBoost_Linear <- function(x, data, train_settings) {
  
  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]
  
suppressWarnings(
  XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = data,
                          method = "xgbLinear",
                          trControl = train_settings,
                          verbose = FALSE, 
                          silent = 1,
                          tuneGrid = expand.grid(
                                                nrounds = round(x1), # number of boosting iterations
                                                eta = 10^x2, # learning rate, low value means model is more robust to overfitting
                                                alpha = 10^x3, # L1 Regularization (equivalent to Lasso Regression) on weights
                                                lambda = 10^x4 # L2 Regularization (equivalent to ridge Regression) on weights
                                                ) 
                          )
)

    return(XGBoost_Linear_model$results$RMSE) # minimize RMSE

}

# Define minimum and maximum values for each input
nrounds_min_max <- c(10,10^3)
eta_min_max <- c(-5,3)
alpha_min_max <- c(-3,1)
lambda_min_max <- c(-3,1)
set.seed(1)
n_cores <- detectCores()-1

DE_T0 <- Sys.time()
# Run differential evolution algorithm
DE_model_XGBoost_Linear <- DEoptim::DEoptim(
  fn = eval_function_XGBoost_Linear, 
  lower = c(nrounds_min_max[1], eta_min_max[1], alpha_min_max[1], lambda_min_max[1]),
  upper = c(nrounds_min_max[2], eta_min_max[2], alpha_min_max[2], lambda_min_max[2]), 
  control = DEoptim.control(
                            NP = pop_size, # population size
                            itermax = max_iter, # maximum number of iterations
                            CR = 0.5, # probability of crossover
                            storepopfreq = 1, # store every population
                            parallelType = 1 # run parallel processing
                            ),
  data = training_set,
  train_settings = train_control
  )
Iteration: 1 bestvalit: 4.395160 bestmemit:  209.665112   -2.686955   -2.341378    0.912026
Iteration: 2 bestvalit: 4.395160 bestmemit:  209.665112   -2.686955   -2.341378    0.912026
Iteration: 3 bestvalit: 4.395160 bestmemit:  209.665112   -2.686955   -2.341378    0.912026
Iteration: 4 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 5 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 6 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 7 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 8 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 9 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 10 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
DE_T1 <- Sys.time()
DE_T1-DE_T0
Time difference of 18.51168 mins
# Print search results
summary(DE_model_XGBoost_Linear)

***** summary of DEoptim object ***** 
best member   :  577.1248 -0.69281 -2.81211 0.84874 
best value    :  4.32335 
after         :  10 generations 
fn evaluated  :  22 times 
*************************************
DE_solutions <- DE_model_XGBoost_Linear$optim$bestmem

# Plot results
ggplot(mapping = aes(x = 1:length(DE_model_XGBoost_Linear$member$bestvalit), y = DE_model_XGBoost_Linear$member$bestvalit)) +
    geom_line(col = "grey50") + 
    geom_point(col = "grey50") +
    theme_bw() +
    theme(aspect.ratio = 0.9) +
    labs(x = "Iteration", y = "RMSE", title = "Best RMSE value at each iteration", subtitle = "Results using Differential Evolution") +
    scale_x_continuous(breaks = 1:DE_model_XGBoost_Linear$optim$iter, minor_breaks = NULL)

The search completed and found the optimal RMSE value to be 4.323 after 10 iterations and 22 function evaluations.

Once the optimal values are found for each hyperparameter, the final model is trained with these values.

# Grid of optimal hyperparameter values
DE_XGBoost_Linear_grid <- expand.grid(
                                  nrounds = round(DE_solutions[1]),  # number of boosting iterations
                                  eta = 10^DE_solutions[2], # learning rate (step size shrinkage)
                                  alpha = 10^DE_solutions[3], # L1 Regularization (Lasso Regression)
                                  lambda = 10^DE_solutions[4] # L2 Regularization (Ridge Regression)
                                  )

T0 <- Sys.time()
cluster <- makeCluster(detectCores() - 1) # number of cores, convention to leave 1 core for OS
registerDoParallel(cluster) # register the parallel processing

set.seed(1)
# Train model with optimal values
DE_XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = training_set, 
                          method = "xgbLinear",
                          trControl = train_control,
                          verbose = F, metric = "RMSE", maximize = FALSE,
                          silent = 1,
                          # tuneLength = 1
                          tuneGrid = DE_XGBoost_Linear_grid
                          )

stopCluster(cluster) # shut down the cluster 
registerDoSEQ() #  force R to return to single threaded processing
T1 <- Sys.time()
T1-T0
Time difference of 26.43967 secs
DE_XGBoost_Linear_model
eXtreme Gradient Boosting 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times) 
Summary of sample sizes: 640, 640, 640, 640, 640, 640, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  4.589265  0.9202668  3.089244

Tuning parameter 'nrounds' was held constant at a value of 577
Tuning parameter 'lambda' was held constant at a value of 7.0589095
Tuning parameter 'alpha' was held constant at a value of 0.001541313
Tuning parameter 'eta' was held constant at a value of 0.2028573
saveRDS(object = DE_XGBoost_Linear_model, file = paste0("Models/DE_XGBoost_Linear_model_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))
saveRDS(object = DE_XGBoost_Linear_model$finalModel, file = paste0("Models/DE_XGBoost_Linear_model_", class(DE_XGBoost_Linear_model$finalModel)[1], "_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))

The best model from the search with the differential evolution algorithm has an average cross-validated RMSE of 4.589.

This model uses 577 boosting iterations and an alpha and lambda of 0.0015413 and 7.0589095, respectively. The step size shrinkage (eta) is 0.2028573.


5.3.5 Particle Swarm Optimization

# Set parameter settings for search algorithm
max_iter <- 10 # maximum number of iterations
pop_size <- 10 # population size

The pso package will be used to optimize the hyperparameters of the XGBoost model using particle swarm optimization. Before running the search method, a custom objective function needs to be created. The average cross-validated root-mean-square error (RMSE) will be the quantity to be optimized.

# Create custom function for assessing solutions
eval_function_XGBoost_Linear <- function(x, data, train_settings) {
  
  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]
  
suppressWarnings(
  # Create dataframe with proportion of each solid component
  XGBoost_Linear_model <- caret::train(Strength ~., 
                                      data = data,
                                      method = "xgbLinear",
                                      trControl = train_settings,
                                      verbose = FALSE, 
                                      silent = 1,
                                      tuneGrid = expand.grid(
                                                            nrounds = round(x1), # number of boosting iterations
                                                            eta = 10^x2, # learning rate, low value means model is more robust to overfitting
                                                            alpha = 10^x3, # L1 Regularization (equivalent to Lasso Regression) on weights
                                                            lambda = 10^x4 # L2 Regularization (equivalent to ridge Regression) on weights
                                                            ) 
                                      )
)

    return(XGBoost_Linear_model$results$RMSE) # minimize RMSE

}

# Define minimum and maximum values for each input
nrounds_min_max <- c(10,10^3)
eta_min_max <- c(-5,3)
alpha_min_max <- c(-3,1)
lambda_min_max <- c(-3,1)

The psoptim() function from the pso package is used to run the search algorithm. The SPSO2011 method will be used for this search.

set.seed(1)
n_cores <- detectCores()-1

PSO_T0 <- Sys.time()
# Run search algorithm
PSO_model_XGBoost_Linear <- pso::psoptim(
  par = rep(NA, 4),
  fn = eval_function_XGBoost_Linear, 
  lower = c(nrounds_min_max[1], eta_min_max[1], alpha_min_max[1], lambda_min_max[1]),
  upper = c(nrounds_min_max[2], eta_min_max[2], alpha_min_max[2], lambda_min_max[2]), 
  control = list(
                trace = 1, #  produce tracing information on the progress of the optimization
                maxit = max_iter, # maximum number of iterations
                REPORT = 1, #  frequency for reports
                trace.stats = T,
                s = pop_size, # Swarm Size,
                maxit.stagnate = round(0.75*max_iter), # maximum number of iterations without improvement
                vectorize = T,
                type = "SPSO2011" # method used
                ),
  data = training_set,
  train_settings = train_control
  )
PSO_T1 <- Sys.time()
PSO_T1-PSO_T0
Time difference of 24.26905 mins
PSO_summary <- data.frame(
                          Iteration = PSO_model_XGBoost_Linear$stats$it,
                          Mean = PSO_model_XGBoost_Linear$stats$f %>% sapply(FUN = mean),
                          Median = PSO_model_XGBoost_Linear$stats$f %>% sapply(FUN = median),
                          Best = PSO_model_XGBoost_Linear$stats$error %>% sapply(FUN = min)
                          )
PSO_summary %>% 
  gather(key = "Parameter", value = "Value", - Iteration) %>% 
  ggplot(mapping = aes(x = Iteration, y = Value, col = Parameter)) +
    geom_line() +
    geom_point() +
    theme_bw() +
    theme(aspect.ratio = 0.9) +
    scale_x_continuous(breaks = PSO_model_XGBoost_Linear$stats$it, minor_breaks = NULL) +
    labs(x = "Iteration", y = "RMSE", title = "RMSE values at each iteration", subtitle = "Results using Particle Swarm Optimization") +
    scale_color_brewer(type = "qual", palette = "Set1")

The search completed and found the optimal RMSE value to be 4.362 after 10 iterations and 100 function evaluations.

Once the optimal values are found for each hyperparameter, the final model is trained with these values.

# Grid of optimal hyperparameter values
PSO_XGBoost_Linear_grid <- expand.grid(
                                  nrounds = round(PSO_model_XGBoost_Linear$par[1]),  # number of boosting iterations
                                  eta = PSO_model_XGBoost_Linear$par[2], # learning rate; note: the searched value is used directly, without the 10^ back-transform applied in the objective function
                                  alpha = PSO_model_XGBoost_Linear$par[3], # L1 Regularization (Lasso Regression)
                                  lambda = PSO_model_XGBoost_Linear$par[4] # L2 Regularization (Ridge Regression)
                                  )

T0 <- Sys.time()
cluster <- makeCluster(detectCores() - 1) # number of cores, convention to leave 1 core for OS
registerDoParallel(cluster) # register the parallel processing

set.seed(1)
# Train model with optimal values
PSO_XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = training_set, 
                          method = "xgbLinear",
                          trControl = train_control,
                          verbose = F, metric = "RMSE", maximize = FALSE,
                          silent = 1,
                          tuneGrid = PSO_XGBoost_Linear_grid
                          )

stopCluster(cluster) # shut down the cluster 
registerDoSEQ() #  force R to return to single threaded processing
T1 <- Sys.time()
T1-T0
Time difference of 13.50072 secs
PSO_XGBoost_Linear_model
eXtreme Gradient Boosting 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times) 
Summary of sample sizes: 640, 640, 640, 640, 640, 640, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  4.693967  0.9168326  3.119964

Tuning parameter 'nrounds' was held constant at a value of 740
Tuning parameter 'lambda' was held constant at a value of 1
Tuning parameter 'alpha' was held constant at a value of 1
Tuning parameter 'eta' was held constant at a value of 3
saveRDS(object = PSO_XGBoost_Linear_model, file = paste0("Models/PSO_XGBoost_Linear_model_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))
saveRDS(object = PSO_XGBoost_Linear_model$finalModel, file = paste0("Models/PSO_XGBoost_Linear_model_", class(PSO_XGBoost_Linear_model$finalModel)[1], "_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))

The best model from the search with the particle swarm optimization algorithm has an average cross-validated RMSE of 4.694.

This model uses 740 boosting iterations and an alpha and lambda of 1 and 1, respectively. The step size shrinkage (eta) is 3. Note that, unlike the GA and DE grids, this grid uses the searched values directly rather than the 10^ back-transform applied inside the objective function, so these numbers correspond to the upper bounds of the search space rather than to the learning rate and regularization strengths that were actually evaluated during the search.
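
If the intention was to keep the log10 scaling used inside the objective function, a sketch of a back-transformed grid (hypothetical name, not what was run above) would be:

# Sketch: back-transform the searched values the same way the objective function did
PSO_best <- PSO_model_XGBoost_Linear$par

PSO_grid_backtransformed <- expand.grid(
  nrounds = round(PSO_best[1]),  # number of boosting iterations
  eta     = 10^PSO_best[2],      # step size shrinkage
  alpha   = 10^PSO_best[3],      # L1 regularization (Lasso)
  lambda  = 10^PSO_best[4]       # L2 regularization (Ridge)
)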


6 Model Performance

6.1 Performance on Training Set

The training set statistics and hyperparameter values for each search method tested are summarised in the table below.

# Create summary table
Summary_Table_Training <- bind_rows(
  GS_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5),
  RS_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5),
  GA_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5),
  DE_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5),
  PSO_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5))

Summary_Table_Training <- Summary_Table_Training %>% 
  add_column(Method = c("Grid Search", "Random Search", "Genetic Algorithm", "Differential Evolution", "Particle Swarm Optimization"), .before = 1) %>% 
  add_column(`Processing Time` = round(c(GS_T1-GS_T0, RS_T1-RS_T0, GA_T1-GA_T0, DE_T1-DE_T0, PSO_T1-PSO_T0),0))

# Print table
Summary_Table_Training %>% 
  kable(align = "c", caption = "Training Set Statistics and Hyperparameter Values of XGBoost Models.") %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "center") %>%
  kableExtra::footnote(general = paste0("Note:\nSummary statistics obtained using ", CV_folds, "-fold cross validation repeated ", CV_repeats, " times.","\nGrid and Random Search  performed with adaptive resampling."), general_title = "\n ")
Training Set Statistics and Hyperparameter Values of XGBoost Models.
Method RMSE MAE Rsquared nrounds eta lambda alpha Processing Time
Grid Search 4.68290 3.15011 0.91690 500 0.01000 0.50000 1.00000 5 mins
Random Search 4.71698 3.21069 0.91588 64 0.52936 0.91103 0.00186 2 mins
Genetic Algorithm 4.83261 3.24182 0.91153 420 Inf 0.00166 0.00015 11 mins
Differential Evolution 4.58926 3.08924 0.92027 577 0.20286 7.05891 0.00154 19 mins
Particle Swarm Optimization 4.69397 3.11996 0.91683 740 3.00000 1.00000 1.00000 24 mins

Note:
Summary statistics obtained using 5-fold cross validation repeated 3 times.
Grid and Random Search performed with adaptive resampling.

Although there are considerable differences between the hyperparameter values obtained by each search method, the performance metrics are relatively similar.

Differential Evolution obtained the lowest RMSE value (4.5893).
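
Since all five final models were fitted with caret, their resampling distributions could also be compared directly (a sketch, assuming the train objects share the same resampling indices):

# Sketch: compare the cross-validated resampling distributions across methods
resamps <- caret::resamples(list(GS  = GS_XGBoost_Linear_model,
                                 RS  = RS_XGBoost_Linear_model,
                                 GA  = GA_XGBoost_Linear_model,
                                 DE  = DE_XGBoost_Linear_model,
                                 PSO = PSO_XGBoost_Linear_model))
summary(resamps)                   # RMSE, MAE and Rsquared across resamples
bwplot(resamps, metric = "RMSE")   # box-and-whisker comparison (lattice, loaded with caret)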

6.2 Performance on Test Set

# Custom functions to plot observed, predicted and residual values for each method
library(ggplot2, quietly = T, verbose = F)

# Function to plot observed vs predicted values
predicted_observed_plot <- function(predicted_val, observed_val, residual_val, model_name = "", R_squared, ...) {
  
  plot <- ggplot(mapping = aes(x = predicted_val, y = observed_val, col = abs(residual_val))) +
  geom_point(alpha = 0.9, size = 2) +
  geom_abline(intercept = 0, slope = 1) +
    # facet_wrap(~) +
    labs(title = paste0(model_name, "\nPredicted vs Observed: Test Set"),
         subtitle = paste0("R-squared: ", R_squared),
         x = "Predicted",
         y = "Observed",
         col = "Absolute Deviation") +
  theme_bw() +
  theme(aspect.ratio = 0.9, panel.grid.minor.x = element_blank(), legend.title = element_text(size = 10, face="bold"), legend.text = element_text(size = 9), plot.title = element_text(size=12, face="bold"), axis.title=element_text(size=10, face="bold"), axis.text.x = element_text(angle = 0), legend.position = "none") +
  # scale_x_continuous(expand = c(0,0)) +
  # scale_y_continuous(expand = c(0,0)) + 
  coord_equal() + scale_color_viridis_c(direction = -1)

  return (plot)
}

# Function to plot residuals
residuals_plot <- function(predicted_val, residual_val, model_name = "", MAE, RMSE, ...) {

  plot <- ggplot(mapping = aes(x = predicted_val, y = residual_val, col = abs(residual_val))) +
  geom_point(alpha = 0.9, size = 2) +
  geom_abline(intercept = 0, slope = 0) +
    # facet_wrap(~) +
    labs(
       title = paste0(model_name, "\nResiduals: Test Set"),
       subtitle = paste0("RMSE: ", RMSE, ", MAE: ", round(MAE, 3)),
       x = "Predicted",
       y = "Residual",
       col = "Absolute Deviation"
       ) +
  theme_bw() +
  theme(aspect.ratio = 0.9, panel.grid.minor.x = element_blank(), legend.title = element_text(size = 10, face="bold"), legend.text = element_text(size = 9), plot.title = element_text(size=12, face="bold"), axis.title=element_text(size=10, face="bold"), axis.text.x = element_text(angle = 0), legend.position = "none") +
  # scale_x_continuous(expand = c(0,0)) +
  # scale_y_continuous(expand = c(0,0)) +
  coord_equal() + scale_color_viridis_c(direction = -1)

  return (plot)
}
### Grid Search (GS)

# Make predictions on test set
test_set$XGBoost_Linear_GS <- predict(GS_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_GS_residual <- test_set$Strength - test_set$XGBoost_Linear_GS

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_GS, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_GS, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_GS, obs = test_set$Strength), 6)

GS_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_GS_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_GS, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_GS_residual, R_squared = R_squared, model_name = "Grid Search")
XGBoost_Linear_GS_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_GS, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_GS_residual, MAE = MAE, RMSE = RMSE, model_name = "Grid Search")
### Random Search

# Make predictions on test set
test_set$XGBoost_Linear_RS <- predict(RS_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_RS_residual <- test_set$Strength - test_set$XGBoost_Linear_RS

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_RS, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_RS, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_RS, obs = test_set$Strength), 6)

RS_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_RS_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_RS, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_RS_residual, R_squared = R_squared, model_name = "Random Search")
XGBoost_Linear_RS_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_RS, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_RS_residual, MAE = MAE, RMSE = RMSE, model_name = "Random Search")
### Genetic Algorithm (GA)

# Make predictions on test set
test_set$XGBoost_Linear_GA <- predict(GA_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_GA_residual <- test_set$Strength - test_set$XGBoost_Linear_GA

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_GA, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_GA, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_GA, obs = test_set$Strength), 6)

GA_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_GA_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_GA, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_GA_residual, R_squared = R_squared, model_name = "Genetic Algorithm")
XGBoost_Linear_GA_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_GA, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_GA_residual, MAE = MAE, RMSE = RMSE, model_name = "Genetic Algorithm")
### Differential evolution (DE)

# Make predictions on test set
test_set$XGBoost_Linear_DE <- predict(DE_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_DE_residual <- test_set$Strength - test_set$XGBoost_Linear_DE

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_DE, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_DE, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_DE, obs = test_set$Strength), 6)

DE_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_DE_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_DE, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_DE_residual, R_squared = R_squared, model_name = "Differential Evolution")
XGBoost_Linear_DE_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_DE, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_DE_residual, MAE = MAE, RMSE = RMSE, model_name = "Differential Evolution")
### Particle Swarm Optimization (PSO)

# Make predictions on test set
test_set$XGBoost_Linear_PSO <- predict(PSO_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_PSO_residual <- test_set$Strength - test_set$XGBoost_Linear_PSO

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_PSO, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_PSO, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_PSO, obs = test_set$Strength), 6)

PSO_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_PSO_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_PSO, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_PSO_residual, R_squared = R_squared, model_name = "Particle Swarm Optimisation")
XGBoost_Linear_PSO_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_PSO, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_PSO_residual, MAE = MAE, RMSE = RMSE, model_name = "Particle Swarm Optimisation")
# Create summary table
Summary_Table_Test <- rbind(
  GS_Test_Set_Statistics,
  RS_Test_Set_Statistics,
  GA_Test_Set_Statistics,
  DE_Test_Set_Statistics,
  PSO_Test_Set_Statistics, deparse.level = 0 
  ) %>% data.frame()

colnames(Summary_Table_Test) <- c("RMSE", "MAE", "R-squared")
Summary_Table_Test <- Summary_Table_Test %>% add_column(Method = c("Grid Search", "Random Search", "Genetic Algorithm", "Differential Evolution", "Particle Swarm Optimization"), .before = 1)

# Print table
Summary_Table_Test %>% 
  kable(align = "c", caption = "Test Set Statistics of XGBoost Models.") %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "center") %>%
  kableExtra::footnote(general = paste0(""), general_title = "\n ")
Test Set Statistics of XGBoost Models.
Method RMSE MAE R-squared
Grid Search 5.38333 3.03600 0.9432
Random Search 5.61246 3.21112 0.9385
Genetic Algorithm 5.15301 3.08178 0.9483
Differential Evolution 4.93595 2.80724 0.9529
Particle Swarm Optimization 5.74515 3.07984 0.9354

Based on the test set statistics, Differential Evolution obtained the lowest RMSE value (4.936).

g <- gridExtra::grid.arrange(XGBoost_Linear_GS_pred_obs, XGBoost_Linear_GS_residuals, 
                             XGBoost_Linear_RS_pred_obs, XGBoost_Linear_RS_residuals, 
                             XGBoost_Linear_GA_pred_obs, XGBoost_Linear_GA_residuals, 
                             XGBoost_Linear_DE_pred_obs, XGBoost_Linear_DE_residuals, 
                             XGBoost_Linear_PSO_pred_obs, XGBoost_Linear_PSO_residuals, 
                             ncol = 2)

7 Summary

The use of population-based search methods such as genetic algorithms, differential evolution and particle swarm optimization shows some potential for finding the optimal values of the hyperparameters of regression models.

The use of Differential Evolution resulted in the lowest test set RMSE value of all the methods tested.

Due to their high computational demands, these methods may be more suitable for models with several hyperparameters trained on relatively small datasets and/or models that are quick to train. Performing a coarse grid search across a wide range of values beforehand may also help to narrow the search bounds, as sketched below, and make the overall process more efficient.
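
For example, a coarse preliminary grid (illustrative values, not run in this document) could be used to narrow the bounds before launching a population-based search:

# Sketch: coarse preliminary grid to narrow lower/upper bounds before a
# population-based search (81 combinations; may take a while to run)
coarse_grid <- expand.grid(
  nrounds = c(50, 250, 1000),
  eta     = 10^c(-4, -2, 0),
  lambda  = 10^c(-3, -1, 1),
  alpha   = 10^c(-3, -1, 1)
)

set.seed(1)
coarse_model <- caret::train(Strength ~ ., data = training_set,
                             method = "xgbLinear",
                             trControl = train_control,
                             metric = "RMSE",
                             tuneGrid = coarse_grid)
coarse_model$bestTune  # use as a guide for tighter search bounds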