• 1 Objectives
  • 2 Install and Load Packages
  • 3 Importing the Data
  • 4 Exploratory Data Analysis
    • 4.1 Correlations
    • 4.2 Visualizing Feature relationships with a scatterplot matrix
  • 5 Modelling
    • 5.1 Create training and test sets
    • 5.2 Training Parameters
    • 5.3 Model Training
      • 5.3.1 Grid Search
      • 5.3.2 Random Search
      • 5.3.3 Genetic Algorithm
      • 5.3.4 Differential Evolution
      • 5.3.5 Particle Swarm Optimization
  • 6 Model Performance
    • 6.1 Performance on Training Set
    • 6.2 Performance on Test Set
  • 7 Summary

1 Objectives

The objective of this document is to assess the use of various search methods in finding the optimal values of hyperparameters of a machine learning model. The population-based search methods to be tested are genetic algorithms (GA), differential evolution (DE) and particle swarm optimization (PSO). Grid and random search will also be performed and used as benchmarks.

XGBoost with generalized linear models as base learners will be used to predict the compressive strength of concrete, which is determined by its age and composition.

The dataset used comes from the research paper Modeling of strength of high performance concrete using artificial neural networks by I-Cheng Yeh, published in Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998). The dataset can be downloaded from the UCI Machine Learning Repository.

The dataset contains 1030 examples and the following features:

  • Input Variable: Cement (kg in a m3 mixture)
  • Input Variable: Blast Furnace Slag (kg in a m3 mixture)
  • Input Variable: Fly Ash (kg in a m3 mixture)
  • Input Variable: Water (kg in a m3 mixture)
  • Input Variable: Superplasticizer (kg in a m3 mixture)
  • Input Variable: Coarse Aggregate (kg in a m3 mixture)
  • Input Variable: Fine Aggregate (kg in a m3 mixture)
  • Input Variable: Age (days)
  • Output Variable: Concrete compressive strength (MPa)

2 Install and Load Packages

The pacman package is used to install and load all necessary packages.

# install.packages("pacman", verbose = F, quiet = T)
pacman::p_load(caret, tidyverse, readr, readxl, parallel, doParallel, gridExtra, plyr, pso, GA, DEoptim, GGally, xgboost, broom, knitr, kableExtra, tictoc, install = T)

3 Importing the Data

The data was downloaded from the UCI Machine Learning Data Repository:

# Load library
# download.file(url = "http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/Concrete_Data.xls", destfile = "Concrete_Data.xls", method = "curl", quiet = TRUE)

# Import Data
concrete_data <- read_xls(path = "Concrete_Data.xls", sheet = 1)

# Rename variables
colnames(concrete_data) <- c("Cement", "Slag", "Ash", "Water", "Superplasticizer", "Coarse_Aggregate", "Fine_Aggregate", "Age", "Strength")

4 Exploratory Data Analysis

Check the structure of the dataset:

# Check structure of the dataset
glimpse(concrete_data)
Observations: 1,030
Variables: 9
$ Cement           <dbl> 540.0, 540.0, 332.5, 332.5, 198.6, 266.0, 380...
$ Slag             <dbl> 0.0, 0.0, 142.5, 142.5, 132.4, 114.0, 95.0, 9...
$ Ash              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Water            <dbl> 162, 162, 228, 228, 192, 228, 228, 228, 228, ...
$ Superplasticizer <dbl> 2.5, 2.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...
$ Coarse_Aggregate <dbl> 1040.0, 1055.0, 932.0, 932.0, 978.4, 932.0, 9...
$ Fine_Aggregate   <dbl> 676.0, 676.0, 594.0, 594.0, 825.5, 670.0, 594...
$ Age              <dbl> 28, 28, 270, 365, 360, 90, 365, 28, 28, 28, 9...
$ Strength         <dbl> 79.986111, 61.887366, 40.269535, 41.052780, 4...

The amounts of the concrete components were converted to proportions of the total mixture mass, so their values range from 0 to 1.

# Recalculate composition as proportions ranging from 0 to 1
concrete_data[, 1:7] <- t(apply(X = concrete_data[, 1:7], MARGIN = 1, FUN = function(x) {x/sum(x)}))

# Check structure after rescaling
glimpse(concrete_data)
Observations: 1,030
Variables: 9
$ Cement           <dbl> 0.22309440, 0.22172039, 0.14917003, 0.1491700...
$ Slag             <dbl> 0.00000000, 0.00000000, 0.06393001, 0.0639300...
$ Ash              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Water            <dbl> 0.06692832, 0.06651612, 0.10228802, 0.1022880...
$ Superplasticizer <dbl> 0.001032844, 0.001026483, 0.000000000, 0.0000...
$ Coarse_Aggregate <dbl> 0.4296633, 0.4331759, 0.4181247, 0.4181247, 0...
$ Fine_Aggregate   <dbl> 0.2792811, 0.2775611, 0.2664872, 0.2664872, 0...
$ Age              <dbl> 28, 28, 270, 365, 360, 90, 365, 28, 28, 28, 9...
$ Strength         <dbl> 79.986111, 61.887366, 40.269535, 41.052780, 4...

Print summary statistics for all variables and check for missing (NA) values:

summary(concrete_data)
     Cement             Slag               Ash              Water        
 Min.   :0.04482   Min.   :0.000000   Min.   :0.00000   Min.   :0.05139  
 1st Qu.:0.08205   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.06955  
 Median :0.11528   Median :0.009455   Median :0.00000   Median :0.07862  
 Mean   :0.11955   Mean   :0.031643   Mean   :0.02317   Mean   :0.07773  
 3rd Qu.:0.14917   3rd Qu.:0.061972   3rd Qu.:0.05033   3rd Qu.:0.08386  
 Max.   :0.22541   Max.   :0.150339   Max.   :0.08884   Max.   :0.11222  
 Superplasticizer   Coarse_Aggregate Fine_Aggregate        Age        
 Min.   :0.000000   Min.   :0.3459   Min.   :0.2480   Min.   :  1.00  
 1st Qu.:0.000000   1st Qu.:0.3923   1st Qu.:0.3112   1st Qu.:  7.00  
 Median :0.002727   Median :0.4205   Median :0.3305   Median : 28.00  
 Mean   :0.002620   Mean   :0.4152   Mean   :0.3301   Mean   : 45.66  
 3rd Qu.:0.004338   3rd Qu.:0.4376   3rd Qu.:0.3541   3rd Qu.: 56.00  
 Max.   :0.013149   Max.   :0.4798   Max.   :0.4141   Max.   :365.00  
    Strength     
 Min.   : 2.332  
 1st Qu.:23.707  
 Median :34.443  
 Mean   :35.818  
 3rd Qu.:46.136  
 Max.   :82.599  
# Check number of NA values in each column
colSums(is.na(concrete_data), na.rm = F)
          Cement             Slag              Ash            Water 
               0                0                0                0 
Superplasticizer Coarse_Aggregate   Fine_Aggregate              Age 
               0                0                0                0 
        Strength 
               0 

There are no NA values in the dataset; however, the input variables have different ranges of values.

concrete_data %>% 
  gather(key = Variable, value = Value) %>% 
  ggplot() +
    geom_histogram(aes(x = Value), bins = 20, fill = "blue") +
    facet_wrap(~Variable, scales='free') +
    theme_bw() +
    theme(aspect.ratio = 0.5, axis.title = element_blank(), panel.grid = element_blank())

4.1 Correlations

Plot correlation heatmap using the ggcorr() function from GGally package.

# Plot correlation heatmap
ggcorr(concrete_data, label = TRUE, palette = "RdBu", name = "Correlation", hjust = 0.75, label_size = 3, label_round = 2)

There seems to be a considerable positive correlation between the amounts of cement and superplasticizer used and concrete compressive strength. Age is also positively correlated with compressive strength. Most pairwise correlations between the predictors are low.
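
For a quick numeric check of these relationships (an illustrative snippet, not part of the original output), the correlations with the target can also be printed directly:

# Correlation of each variable with compressive strength, sorted
sort(cor(concrete_data)[, "Strength"], decreasing = TRUE)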

4.2 Visualizing Feature relationships with a scatterplot matrix

We can use a scatterplot matrix (a collection of scatterplots organized in a grid) to understand the relationship between each predictor and the target feature.

ggduo(data = concrete_data, 
      columnsX = 1:8, 
      columnsY = 9, 
      types = list(continuous = "smooth_lm"),
      mapping = ggplot2::aes(color = -Strength, alpha = 0.3)
      ) +
  theme_bw()


5 Modelling

5.1 Create training and test sets

80% of the samples in the dataset are randomly selected for the training set; the remaining 20% are held out for testing the models. Stratified partitioning on the target feature will be used to create these subsets.

# Remove any observations with missing values
concrete_data <- concrete_data[complete.cases(concrete_data), ]

# Average the values of compressive strength of replicate experiments
concrete_data <- ddply(.data = concrete_data, 
                       .variables = .(Cement, Slag, Ash, Water, Superplasticizer, `Coarse_Aggregate`, `Fine_Aggregate`, Age), 
                       .fun = function(x) c(Strength = mean(x$Strength)))

# Create training and test set using stratified partitioning
set.seed(1)
training_index <- createDataPartition(y = concrete_data$Strength, p = 0.80)[[1]]
training_set <- concrete_data[training_index, ]
test_set <- concrete_data[-training_index, ]

# Check distribution of strength on training set and test set
par(mfrow = c(1, 2))
hist(training_set$Strength, main = "Training Set", xlab = "Concrete Compressive Strength (MPa)", freq = FALSE)
hist(test_set$Strength, main = "Test Set", xlab = "Concrete Compressive Strength (MPa)", freq = FALSE)

# Print summary statistics for training and test set
bind_rows(summary(training_set$Strength), summary(test_set$Strength)) %>% as_tibble() %>% add_column(Subset = c("Training", "Test"), .before = 1)
Subset      Min.      1st Qu.   Median    Mean      3rd Qu.   Max.
Training    2.331808  23.52147  33.75950  35.18998  44.86524  82.59922
Test        6.267337  23.52216  33.75847  35.13567  44.73222  79.29663

The distributions of the target feature in the training and test sets are similar.
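
As an optional check (not run above), a two-sample Kolmogorov-Smirnov test could be used to compare the two distributions more formally:

# Compare the distributions of Strength in the training and test sets
# (ties in the averaged Strength values may trigger a warning about exact p-values)
ks.test(training_set$Strength, test_set$Strength)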

5.2 Training Parameters

# Training Parameters
CV_folds <- 5 # number of folds
CV_repeats <- 3 # number of repeats
minimum_resampling <- 5 # minimum number of resamples

Parameter tuning and selection will be done using 5-fold cross-validation repeated 3 times. Adaptive resampling will be used for model training with grid and random search.

The caret package will be used for model training, tuning and evaluation.

# Training Settings
set.seed(1)

# trainControl object for standard repeated cross-validation
train_control <- caret::trainControl(method = "repeatedcv", number = CV_folds, repeats = CV_repeats, 
                                     verboseIter = FALSE, returnData = FALSE) 

# trainControl object for adaptive repeated cross-validation with grid search
adapt_control_grid <- caret::trainControl(method = "adaptive_cv", number = CV_folds, repeats = CV_repeats, 
                                     adaptive = list(min = minimum_resampling, # minimum number of resamples tested before model is excluded
                                                     alpha = 0.05, # confidence level used to exclude parameter settings
                                                     method = "gls", # generalized least squares
                                                     complete = TRUE), 
                                     search = "grid", # execute grid search
                                     verboseIter = FALSE, returnData = FALSE) 

# trainControl object for adaptive repeated cross-validation with random search
adapt_control_random <- caret::trainControl(method = "adaptive_cv", number = CV_folds, repeats = CV_repeats, 
                                     adaptive = list(min = minimum_resampling, # minimum number of resamples tested before model is excluded
                                                     alpha = 0.05, # confidence level used to exclude parameter settings
                                                     method = "gls", # generalized least squares
                                                     complete = TRUE), 
                                     search = "random", # execute random search
                                     verboseIter = FALSE, returnData = FALSE) 

5.3 Model Training

Extreme Gradient Boosting (XGBoost), developed by Tianqi Chen, will be used to create the regression models. XGBoost is similar to gradient boosting but adds the capacity for parallel computation on a single machine and regularization to avoid overfitting. Additional advantages of the XGBoost algorithm include its internal cross-validation function, its ability to handle missing values, its flexibility, and its ability to prune the tree when the improvement in the loss function falls below a threshold.

Like GBM, XGBoost fits each new model to the errors of the previous models so that they are reduced at the next iteration. The final model is a weighted combination of the models obtained in previous iterations. XGBoost has several tuning parameters: some depend on the type of booster used (CART or generalized linear model), while others are general, regularization or learning-task parameters.
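
As a toy illustration of this idea (a minimal sketch in base R, not part of the analysis), each new learner is fitted to the residuals of the current ensemble and added with a shrinkage factor:

# Toy sketch of boosting with linear learners: fit each new learner to the
# residuals of the current ensemble and add it with a shrinkage factor (eta)
set.seed(1)
x <- runif(200)
y <- 3 * x + rnorm(200, sd = 0.3)

eta  <- 0.1                # step size shrinkage
pred <- rep(mean(y), 200)  # start from a constant prediction
for (m in 1:50) {
  res  <- y - pred                  # errors of the current ensemble
  fit  <- lm(res ~ x)               # weak (linear) learner fitted to the residuals
  pred <- pred + eta * predict(fit) # shrunken update of the ensemble
}
sqrt(mean((y - pred)^2))   # training RMSE of the toy ensemble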

Models will be created using generalized linear models as learners.

When using the xgbLinear method in caret, there are four hyperparameters to optimize:

  • nrounds: number of boosting iterations
  • eta: step size shrinkage
  • lambda: L2 Regularization
  • alpha: L1 Regularization

The method above requires the xgboost package.
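
For reference, a minimal sketch of how these four parameters map onto a direct call to the xgboost linear booster (illustrative values; the function and parameter names assume the classic xgb.train interface, while the analysis below uses caret's xgbLinear wrapper):

# Sketch: the same four hyperparameters in a direct xgboost call (gblinear booster)
dtrain <- xgboost::xgb.DMatrix(data = as.matrix(training_set[, 1:8]),
                               label = training_set$Strength)

xgb_linear_fit <- xgboost::xgb.train(
  params = list(booster = "gblinear",
                objective = "reg:squarederror",
                eta = 0.3,     # eta: step size shrinkage
                lambda = 1,    # lambda: L2 regularization on weights
                alpha = 0),    # alpha: L1 regularization on weights
  data = dtrain,
  nrounds = 100                # nrounds: number of boosting iterations
)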

5.3.3 Genetic Algorithm

# Set parameter settings for search algorithm
max_iter <- 10 # maximum number of iterations
pop_size <- 10 # population size

The GA package will be used to optimize the hyperparameters of the XGBoost model using a genetic algorithm. Before running the search method, an objective function needs to be created. As with grid and random search, the average cross-validated root-mean-square error (RMSE) will be the quantity to be optimized.

# Create custom function for assessing solutions
eval_function_XGBoost_Linear <- function(x1, x2, x3, x4, data, train_settings) {
  
suppressWarnings(
  XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = data,
                          method = "xgbLinear",
                          trControl = train_settings,
                          verbose = FALSE, 
                          silent = 1,
                          tuneGrid = expand.grid(
                                                nrounds = round(x1), # number of boosting iterations
                                                eta = 10^x2, # learning rate, low value means model is more robust to overfitting
                                                alpha = 10^x3, # L1 Regularization (equivalent to Lasso Regression) on weights
                                                lambda = 10^x4 # L2 Regularization (equivalent to ridge Regression) on weights
                                                ) 
                          )
)

    return(-XGBoost_Linear_model$results$RMSE) # GA maximizes fitness, so return the negative RMSE to minimize it

}

# Define minimum and maximum values for each input
nrounds_min_max <- c(10,10^3)
eta_min_max <- c(-5,3)
alpha_min_max <- c(-3,1)
lambda_min_max <- c(-3,1)

The population size was set to 10 with a maximum of 10 iterations. The minimum and maximum values for each parameter were defined above.

set.seed(1)
n_cores <- detectCores()-1

GA_T0 <- Sys.time()
# Run genetic algorithm
GA_model_XGBoost_Linear <- GA::ga(type = "real-valued", 
                                fitness = function(x) eval_function_XGBoost_Linear(x[1],x[2],x[3],x[4], 
                                                                                   data = training_set, 
                                                                                   train_settings = train_control), 
                                lower = c(nrounds_min_max[1], eta_min_max[1], alpha_min_max[1], lambda_min_max[1]), # minimum values
                                upper = c(nrounds_min_max[2], eta_min_max[2], alpha_min_max[2], lambda_min_max[2]), # maximum values
                                popSize = pop_size, # population size
                                maxiter = max_iter, # number of iterations
                                pmutation = 0.5, # probability of mutation
                                elitism = 0.3, # percentage of elitism (fraction of best current solutions used on next round)
                                # suggestions = starting_point,
                                parallel = n_cores, 
                                optim = F, 
                                keepBest = T,
                                seed = 1
               )
GA_T1 <- Sys.time()
GA_T1-GA_T0 
Time difference of 11.49358 mins
# Print summary of search method
summary(GA_model_XGBoost_Linear)
-- Genetic Algorithm ------------------- 

GA settings: 
Type                  =  real-valued 
Population size       =  10 
Number of generations =  10 
Elitism               =  0.3 
Crossover probability =  0.8 
Mutation probability  =  0.5 
Search domain = 
        x1 x2 x3 x4
lower   10 -5 -3 -3
upper 1000  3  1  1

GA results: 
Iterations             = 10 
Fitness function value = -4.344492 
Solutions = 
           x1        x2         x3       x4
[1,] 419.5652 -3.814308 -0.2802111 0.591394
[2,] 419.5652 -2.778735 -0.2802111 0.591394
GA::plot.ga(GA_model_XGBoost_Linear, main = "Genetic Algorithm: RMSE values at each iteration")

The search completed and found the optimal RMSE value to be 4.344.

Once the optimal values are found for each hyperparameter, the final model is trained with these values.

# Grid of optimal hyperparameter values
GA_XGBoost_Linear_grid <- expand.grid(
                                  nrounds = round(GA_model_XGBoost_Linear@solution[1]),  # number of boosting iterations
                                  eta = 10^GA_model_XGBoost_Linear@solution[2], # learning rate (step size shrinkage)
                                  alpha = 10^GA_model_XGBoost_Linear@solution[3], # L1 Regularization (Lasso Regression)
                                  lambda = 10^GA_model_XGBoost_Linear@solution[4] # L2 Regularization (Ridge Regression)
                                  )

T0 <- Sys.time()
cluster <- makeCluster(detectCores() - 1) # number of cores, convention to leave 1 core for OS
registerDoParallel(cluster) # register the parallel processing

set.seed(1)
# Train model with optimal values
GA_XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = training_set, 
                          method = "xgbLinear",
                          trControl = train_control,
                          verbose = F, metric = "RMSE", maximize = FALSE,
                          silent = 1,
                          # tuneLength = 1
                          tuneGrid = GA_XGBoost_Linear_grid
                          )

stopCluster(cluster) # shut down the cluster 
registerDoSEQ() #  force R to return to single threaded processing
T1 <- Sys.time()
T1-T0
Time difference of 16.56507 secs
GA_XGBoost_Linear_model
eXtreme Gradient Boosting 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times) 
Summary of sample sizes: 640, 640, 640, 640, 640, 640, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  4.832614  0.9115271  3.241825

Tuning parameter 'nrounds' was held constant at a value of 420
Tuning parameter 'lambda' was held constant at a value of 0.001664427
Tuning parameter 'alpha' was held constant at a value of 0.0001533531
Tuning parameter 'eta' was held constant at a value of Inf
saveRDS(object = GA_XGBoost_Linear_model, file = paste0("Models/GA_XGBoost_Linear_model_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))
saveRDS(object = GA_XGBoost_Linear_model$finalModel, file = paste0("Models/GA_XGBoost_Linear_model_", class(GA_XGBoost_Linear_model$finalModel)[1], "_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))

The best model from the search with the genetic algorithm has an average cross-validated RMSE of 4.833.

This model uses 420 boosting iterations and an alpha and lambda of 0.0001534 and 0.0016644, respectively. The step size shrinkage (eta) is reported as Inf: the GA returned two tied solutions, so @solution is a two-row matrix, and single-bracket indexing (e.g. @solution[2]) walks down the matrix column-wise, picking up the second solution's nrounds value (10^419.57 overflows to Inf) instead of the first solution's eta.
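
A minimal sketch (with a hypothetical grid name) of selecting a single row of the solution matrix before back-transforming, so that each position stays tied to its intended hyperparameter:

# Sketch: pick one row of the GA solution matrix before building the grid
GA_best <- GA_model_XGBoost_Linear@solution[1, ]  # first of the tied solutions

GA_XGBoost_Linear_grid_rowwise <- expand.grid(
  nrounds = round(GA_best[1]),  # number of boosting iterations
  eta     = 10^GA_best[2],      # step size shrinkage
  alpha   = 10^GA_best[3],      # L1 regularization (Lasso)
  lambda  = 10^GA_best[4]       # L2 regularization (Ridge)
)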


5.3.4 Differential Evolution

# Set parameter settings for search algorithm
max_iter <- 10 # maximum number of iterations
pop_size <- 10 # population size

The DEoptim package will be used to optimize the hyperparameters of the XGBoost model using differential evolution. Before running the search method, a custom objective function needs to be created. The average cross-validated root-mean-square error (RMSE) will be the quantity to be optimized.

# Create custom function for assessing solutions
eval_function_XGBoost_Linear <- function(x, data, train_settings) {
  
  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]
  
suppressWarnings(
  XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = data,
                          method = "xgbLinear",
                          trControl = train_settings,
                          verbose = FALSE, 
                          silent = 1,
                          tuneGrid = expand.grid(
                                                nrounds = round(x1), # number of boosting iterations
                                                eta = 10^x2, # learning rate, low value means model is more robust to overfitting
                                                alpha = 10^x3, # L1 Regularization (equivalent to Lasso Regression) on weights
                                                lambda = 10^x4 # L2 Regularization (equivalent to ridge Regression) on weights
                                                ) 
                          )
)

    return(XGBoost_Linear_model$results$RMSE) # minimize RMSE

}

# Define minimum and maximum values for each input
nrounds_min_max <- c(10,10^3)
eta_min_max <- c(-5,3)
alpha_min_max <- c(-3,1)
lambda_min_max <- c(-3,1)
set.seed(1)
n_cores <- detectCores()-1

DE_T0 <- Sys.time()
# Run differential evolution algorithm
DE_model_XGBoost_Linear <- DEoptim::DEoptim(
  fn = eval_function_XGBoost_Linear, 
  lower = c(nrounds_min_max[1], eta_min_max[1], alpha_min_max[1], lambda_min_max[1]),
  upper = c(nrounds_min_max[2], eta_min_max[2], alpha_min_max[2], lambda_min_max[2]), 
  control = DEoptim.control(
                            NP = pop_size, # population size
                            itermax = max_iter, # maximum number of iterations
                            CR = 0.5, # probability of crossover
                            storepopfreq = 1, # store every population
                            parallelType = 1 # run parallel processing
                            ),
  data = training_set,
  train_settings = train_control
  )
Iteration: 1 bestvalit: 4.395160 bestmemit:  209.665112   -2.686955   -2.341378    0.912026
Iteration: 2 bestvalit: 4.395160 bestmemit:  209.665112   -2.686955   -2.341378    0.912026
Iteration: 3 bestvalit: 4.395160 bestmemit:  209.665112   -2.686955   -2.341378    0.912026
Iteration: 4 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 5 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 6 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 7 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 8 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 9 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
Iteration: 10 bestvalit: 4.323350 bestmemit:  577.124830   -0.692809   -2.812109    0.848738
DE_T1 <- Sys.time()
DE_T1-DE_T0
Time difference of 18.51168 mins
# Print search results
summary(DE_model_XGBoost_Linear)

***** summary of DEoptim object ***** 
best member   :  577.1248 -0.69281 -2.81211 0.84874 
best value    :  4.32335 
after         :  10 generations 
fn evaluated  :  22 times 
*************************************
DE_solutions <- DE_model_XGBoost_Linear$optim$bestmem

# Plot results
ggplot(mapping = aes(x = 1:length(DE_model_XGBoost_Linear$member$bestvalit), y = DE_model_XGBoost_Linear$member$bestvalit)) +
    geom_line(col = "grey50") + 
    geom_point(col = "grey50") +
    theme_bw() +
    theme(aspect.ratio = 0.9) +
    labs(x = "Iteration", y = "RMSE", title = "Best RMSE value at each iteration", subtitle = "Results using Differential Evolution") +
    scale_x_continuous(breaks = 1:DE_model_XGBoost_Linear$optim$iter, minor_breaks = NULL)

The search completed and found the optimal RMSE value to be 4.323 after 10 iterations and 22 function evaluations.

Once the optimal values are found for each hyperparameter, the final model is trained with these values.

# Grid of optimal hyperparameter values
DE_XGBoost_Linear_grid <- expand.grid(
                                  nrounds = round(DE_solutions[1]),  # number of boosting iterations
                                  eta = 10^DE_solutions[2], # learning rate (step size shrinkage)
                                  alpha = 10^DE_solutions[3], # L1 Regularization (Lasso Regression)
                                  lambda = 10^DE_solutions[4] # L2 Regularization (Ridge Regression)
                                  )

T0 <- Sys.time()
cluster <- makeCluster(detectCores() - 1) # number of cores, convention to leave 1 core for OS
registerDoParallel(cluster) # register the parallel processing

set.seed(1)
# Train model with optimal values
DE_XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = training_set, 
                          method = "xgbLinear",
                          trControl = train_control,
                          verbose = F, metric = "RMSE", maximize = FALSE,
                          silent = 1,
                          # tuneLength = 1
                          tuneGrid = DE_XGBoost_Linear_grid
                          )

stopCluster(cluster) # shut down the cluster 
registerDoSEQ() #  force R to return to single threaded processing
T1 <- Sys.time()
T1-T0
Time difference of 26.43967 secs
DE_XGBoost_Linear_model
eXtreme Gradient Boosting 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times) 
Summary of sample sizes: 640, 640, 640, 640, 640, 640, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  4.589265  0.9202668  3.089244

Tuning parameter 'nrounds' was held constant at a value of 577
Tuning parameter 'lambda' was held constant at a value of 7.0589095
Tuning parameter 'alpha' was held constant at a value of 0.001541313
Tuning parameter 'eta' was held constant at a value of 0.2028573
saveRDS(object = DE_XGBoost_Linear_model, file = paste0("Models/DE_XGBoost_Linear_model_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))
saveRDS(object = DE_XGBoost_Linear_model$finalModel, file = paste0("Models/DE_XGBoost_Linear_model_", class(DE_XGBoost_Linear_model$finalModel)[1], "_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))

The best model from the search with the differential evolution algorithm has an average cross-validated RMSE of 4.589.

This model uses 577 boosting iterations and an alpha and lambda of 0.0015413 and 7.0589095, respectively. The step size shrinkage (eta) is 0.2028573.


5.3.5 Particle Swarm Optimization

# Set parameter settings for search algorithm
max_iter <- 10 # maximum number of iterations
pop_size <- 10 # population size

The pso package will be used to optimize the hyperparameters of the XGBoost model using particle swarm optimization. Before running the search method, a custom objective function needs to be created. The average cross-validated root-mean-square error (RMSE) will be the quantity to be optimized.

# Create custom function for assessing solutions
eval_function_XGBoost_Linear <- function(x, data, train_settings) {
  
  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]
  
suppressWarnings(
  # Create dataframe with proportion of each solid component
  XGBoost_Linear_model <- caret::train(Strength ~., 
                                      data = data,
                                      method = "xgbLinear",
                                      trControl = train_settings,
                                      verbose = FALSE, 
                                      silent = 1,
                                      tuneGrid = expand.grid(
                                                            nrounds = round(x1), # number of boosting iterations
                                                            eta = 10^x2, # learning rate, low value means model is more robust to overfitting
                                                            alpha = 10^x3, # L1 Regularization (equivalent to Lasso Regression) on weights
                                                            lambda = 10^x4 # L2 Regularization (equivalent to ridge Regression) on weights
                                                            ) 
                                      )
)

    return(XGBoost_Linear_model$results$RMSE) # minimize RMSE

}

# Define minimum and maximum values for each input
nrounds_min_max <- c(10,10^3)
eta_min_max <- c(-5,3)
alpha_min_max <- c(-3,1)
lambda_min_max <- c(-3,1)

The psoptim() function from the pso package is used to run the search algorithm. The SPSO2011 method will be used for this search.

set.seed(1)
n_cores <- detectCores()-1

PSO_T0 <- Sys.time()
# Run search algorithm
PSO_model_XGBoost_Linear <- pso::psoptim(
  par = rep(NA, 4),
  fn = eval_function_XGBoost_Linear, 
  lower = c(nrounds_min_max[1], eta_min_max[1], alpha_min_max[1], lambda_min_max[1]),
  upper = c(nrounds_min_max[2], eta_min_max[2], alpha_min_max[2], lambda_min_max[2]), 
  control = list(
                trace = 1, #  produce tracing information on the progress of the optimization
                maxit = max_iter, # maximum number of iterations
                REPORT = 1, #  frequency for reports
                trace.stats = T,
                s = pop_size, # Swarm Size,
                maxit.stagnate = round(0.75*max_iter), # maximum number of iterations without improvement
                vectorize = T,
                type = "SPSO2011" # method used
                ),
  data = training_set,
  train_settings = train_control
  )
PSO_T1 <- Sys.time()
PSO_T1-PSO_T0
Time difference of 24.26905 mins
PSO_summary <- data.frame(
                          Iteration = PSO_model_XGBoost_Linear$stats$it,
                          Mean = PSO_model_XGBoost_Linear$stats$f %>% sapply(FUN = mean),
                          Median = PSO_model_XGBoost_Linear$stats$f %>% sapply(FUN = median),
                          Best = PSO_model_XGBoost_Linear$stats$error %>% sapply(FUN = min)
                          )
PSO_summary %>% 
  gather(key = "Parameter", value = "Value", - Iteration) %>% 
  ggplot(mapping = aes(x = Iteration, y = Value, col = Parameter)) +
    geom_line() +
    geom_point() +
    theme_bw() +
    theme(aspect.ratio = 0.9) +
    scale_x_continuous(breaks = PSO_model_XGBoost_Linear$stats$it, minor_breaks = NULL) +
    labs(x = "Iteration", y = "RMSE", title = "RMSE values at each iteration", subtitle = "Results using Particle Swarm Optimization") +
    scale_color_brewer(type = "qual", palette = "Set1")

The search completed and found the optimal RMSE value to be 4.362 after 10 iterations and 100 function evaluations.

Once the optimal values are found for each hyperparameter, the final model is trained with these values.

# Grid of optimal hyperparameter values
PSO_XGBoost_Linear_grid <- expand.grid(
                                  nrounds = round(PSO_model_XGBoost_Linear$par[1]),  # number of boosting iterations
                                  eta = PSO_model_XGBoost_Linear$par[2], # learning rate; note: the searched value is used directly, without the 10^ back-transform applied in the objective function
                                  alpha = PSO_model_XGBoost_Linear$par[3], # L1 Regularization (Lasso Regression)
                                  lambda = PSO_model_XGBoost_Linear$par[4] # L2 Regularization (Ridge Regression)
                                  )

T0 <- Sys.time()
cluster <- makeCluster(detectCores() - 1) # number of cores, convention to leave 1 core for OS
registerDoParallel(cluster) # register the parallel processing

set.seed(1)
# Train model with optimal values
PSO_XGBoost_Linear_model <- caret::train(Strength ~., 
                          data = training_set, 
                          method = "xgbLinear",
                          trControl = train_control,
                          verbose = F, metric = "RMSE", maximize = FALSE,
                          silent = 1,
                          tuneGrid = PSO_XGBoost_Linear_grid
                          )

stopCluster(cluster) # shut down the cluster 
registerDoSEQ() #  force R to return to single threaded processing
T1 <- Sys.time()
T1-T0
Time difference of 13.50072 secs
PSO_XGBoost_Linear_model
eXtreme Gradient Boosting 

No pre-processing
Resampling: Cross-Validated (5 fold, repeated 3 times) 
Summary of sample sizes: 640, 640, 640, 640, 640, 640, ... 
Resampling results:

  RMSE      Rsquared   MAE     
  4.693967  0.9168326  3.119964

Tuning parameter 'nrounds' was held constant at a value of 740
Tuning parameter 'lambda' was held constant at a value of 1
Tuning parameter 'alpha' was held constant at a value of 1
Tuning parameter 'eta' was held constant at a value of 3
saveRDS(object = PSO_XGBoost_Linear_model, file = paste0("Models/PSO_XGBoost_Linear_model_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))
saveRDS(object = PSO_XGBoost_Linear_model$finalModel, file = paste0("Models/PSO_XGBoost_Linear_model_", class(PSO_XGBoost_Linear_model$finalModel)[1], "_", stringr::str_remove_all(Sys.time(), pattern = ":"),".rds"))

The best model from the search with the particle swarm optimization algorithm has an average cross-validated RMSE of 4.694.

This model uses 740 boosting iterations and an alpha and lambda of 1 and 1, respectively. The step size shrinkage (eta) is 3. Note that, unlike the GA and DE grids, this grid uses the searched values directly rather than the 10^ back-transform applied inside the objective function, so these numbers correspond to the upper bounds of the search space rather than to the learning rate and regularization strengths that were actually evaluated during the search.
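
If the intention was to keep the log10 scaling used inside the objective function, a sketch of a back-transformed grid (hypothetical name, not what was run above) would be:

# Sketch: back-transform the searched values the same way the objective function did
PSO_best <- PSO_model_XGBoost_Linear$par

PSO_grid_backtransformed <- expand.grid(
  nrounds = round(PSO_best[1]),  # number of boosting iterations
  eta     = 10^PSO_best[2],      # step size shrinkage
  alpha   = 10^PSO_best[3],      # L1 regularization (Lasso)
  lambda  = 10^PSO_best[4]       # L2 regularization (Ridge)
)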


6 Model Performance

6.1 Performance on Training Set

The training set statistics and hyperparameter values for each search method tested are summarised in the table below.

# Create summary table
Summary_Table_Training <- bind_rows(
  GS_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5),
  RS_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5),
  GA_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5),
  DE_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5),
  PSO_XGBoost_Linear_model$results %>% arrange(RMSE) %>% .[1,] %>% select(RMSE, MAE, Rsquared, nrounds, eta, lambda, alpha) %>% round(5))

Summary_Table_Training <- Summary_Table_Training %>% 
  add_column(Method = c("Grid Search", "Random Search", "Genetic Algorithm", "Differential Evolution", "Particle Swarm Optimization"), .before = 1) %>% 
  add_column(`Processing Time` = round(c(GS_T1-GS_T0, RS_T1-RS_T0, GA_T1-GA_T0, DE_T1-DE_T0, PSO_T1-PSO_T0),0))

# Print table
Summary_Table_Training %>% 
  kable(align = "c", caption = "Training Set Statistics and Hyperparameter Values of XGBoost Models.") %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "center") %>%
  kableExtra::footnote(general = paste0("Note:\nSummary statistics obtained using ", CV_folds, "-fold cross validation repeated ", CV_repeats, " times.","\nGrid and Random Search  performed with adaptive resampling."), general_title = "\n ")
Training Set Statistics and Hyperparameter Values of XGBoost Models.
Method RMSE MAE Rsquared nrounds eta lambda alpha Processing Time
Grid Search 4.68290 3.15011 0.91690 500 0.01000 0.50000 1.00000 5 mins
Random Search 4.71698 3.21069 0.91588 64 0.52936 0.91103 0.00186 2 mins
Genetic Algorithm 4.83261 3.24182 0.91153 420 Inf 0.00166 0.00015 11 mins
Differential Evolution 4.58926 3.08924 0.92027 577 0.20286 7.05891 0.00154 19 mins
Particle Swarm Optimization 4.69397 3.11996 0.91683 740 3.00000 1.00000 1.00000 24 mins

Note:
Summary statistics obtained using 5-fold cross validation repeated 3 times.
Grid and Random Search performed with adaptive resampling.

Although there are considerable differences between the hyperparameter values obtained by each search method, the performance metrics are relatively similar.

Differential Evolution obtained the lowest RMSE value (4.5893).
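
Since all five final models were fitted with caret, their resampling distributions could also be compared directly (a sketch, assuming the train objects share the same resampling indices):

# Sketch: compare the cross-validated resampling distributions across methods
resamps <- caret::resamples(list(GS  = GS_XGBoost_Linear_model,
                                 RS  = RS_XGBoost_Linear_model,
                                 GA  = GA_XGBoost_Linear_model,
                                 DE  = DE_XGBoost_Linear_model,
                                 PSO = PSO_XGBoost_Linear_model))
summary(resamps)                   # RMSE, MAE and Rsquared across resamples
bwplot(resamps, metric = "RMSE")   # box-and-whisker comparison (lattice, loaded with caret)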

6.2 Performance on Test Set

# Custom functions to plot observed, predicted and residual values for each method
library(ggplot2, quietly = T, verbose = F)

# Function to plot observed vs predicted values
predicted_observed_plot <- function(predicted_val, observed_val, residual_val, model_name = "", R_squared, ...) {
  
  plot <- ggplot(mapping = aes(x = predicted_val, y = observed_val, col = abs(residual_val))) +
  geom_point(alpha = 0.9, size = 2) +
  geom_abline(intercept = 0, slope = 1) +
    # facet_wrap(~) +
    labs(title = paste0(model_name, "\nPredicted vs Observed: Test Set"),
         subtitle = paste0("R-squared: ", R_squared),
         x = "Predicted",
         y = "Observed",
         col = "Absolute Deviation") +
  theme_bw() +
  theme(aspect.ratio = 0.9, panel.grid.minor.x = element_blank(), legend.title = element_text(size = 10, face="bold"), legend.text = element_text(size = 9), plot.title = element_text(size=12, face="bold"), axis.title=element_text(size=10, face="bold"), axis.text.x = element_text(angle = 0), legend.position = "none") +
  # scale_x_continuous(expand = c(0,0)) +
  # scale_y_continuous(expand = c(0,0)) + 
  coord_equal() + scale_color_viridis_c(direction = -1)

  return (plot)
}

# Function to plot residuals
residuals_plot <- function(predicted_val, residual_val, model_name = "", MAE, RMSE, ...) {

  plot <- ggplot(mapping = aes(x = predicted_val, y = residual_val, col = abs(residual_val))) +
  geom_point(alpha = 0.9, size = 2) +
  geom_abline(intercept = 0, slope = 0) +
    # facet_wrap(~) +
    labs(
       title = paste0(model_name, "\nResiduals: Test Set"),
       subtitle = paste0("RMSE: ", RMSE, ", MAE: ", round(MAE, 3)),
       x = "Predicted",
       y = "Residual",
       col = "Absolute Deviation"
       ) +
  theme_bw() +
  theme(aspect.ratio = 0.9, panel.grid.minor.x = element_blank(), legend.title = element_text(size = 10, face="bold"), legend.text = element_text(size = 9), plot.title = element_text(size=12, face="bold"), axis.title=element_text(size=10, face="bold"), axis.text.x = element_text(angle = 0), legend.position = "none") +
  # scale_x_continuous(expand = c(0,0)) +
  # scale_y_continuous(expand = c(0,0)) +
  coord_equal() + scale_color_viridis_c(direction = -1)

  return (plot)
}
### Grid Search (GS)

# Make predictions on test set
test_set$XGBoost_Linear_GS <- predict(GS_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_GS_residual <- test_set$Strength - test_set$XGBoost_Linear_GS

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_GS, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_GS, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_GS, obs = test_set$Strength), 6)

GS_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_GS_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_GS, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_GS_residual, R_squared = R_squared, model_name = "Grid Search")
XGBoost_Linear_GS_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_GS, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_GS_residual, MAE = MAE, RMSE = RMSE, model_name = "Grid Search")
### Random Search

# Make predictions on test set
test_set$XGBoost_Linear_RS <- predict(RS_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_RS_residual <- test_set$Strength - test_set$XGBoost_Linear_RS

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_RS, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_RS, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_RS, obs = test_set$Strength), 6)

RS_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_RS_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_RS, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_RS_residual, R_squared = R_squared, model_name = "Random Search")
XGBoost_Linear_RS_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_RS, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_RS_residual, MAE = MAE, RMSE = RMSE, model_name = "Random Search")
### Genetic Algorithm (GA)

# Make predictions on test set
test_set$XGBoost_Linear_GA <- predict(GA_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_GA_residual <- test_set$Strength - test_set$XGBoost_Linear_GA

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_GA, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_GA, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_GA, obs = test_set$Strength), 6)

GA_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_GA_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_GA, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_GA_residual, R_squared = R_squared, model_name = "Genetic Algorithm")
XGBoost_Linear_GA_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_GA, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_GA_residual, MAE = MAE, RMSE = RMSE, model_name = "Genetic Algorithm")
### Differential evolution (DE)

# Make predictions on test set
test_set$XGBoost_Linear_DE <- predict(DE_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_DE_residual <- test_set$Strength - test_set$XGBoost_Linear_DE

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_DE, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_DE, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_DE, obs = test_set$Strength), 6)

DE_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_DE_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_DE, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_DE_residual, R_squared = R_squared, model_name = "Differential Evolution")
XGBoost_Linear_DE_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_DE, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_DE_residual, MAE = MAE, RMSE = RMSE, model_name = "Differential Evolution")
### Particle Swarm Optimization (PSO)

# Make predictions on test set
test_set$XGBoost_Linear_PSO <- predict(PSO_XGBoost_Linear_model, test_set)
# Calculate Residuals on test set
test_set$XGBoost_Linear_PSO_residual <- test_set$Strength - test_set$XGBoost_Linear_PSO

# Calculate test set R-squared, RMSE, MAE
R_squared <- round(cor(test_set$XGBoost_Linear_PSO, test_set$Strength), 4)
RMSE <- signif(RMSE(pred = test_set$XGBoost_Linear_PSO, obs = test_set$Strength, na.rm = T), 6)
MAE <- signif(MAE(pred = test_set$XGBoost_Linear_PSO, obs = test_set$Strength), 6)

PSO_Test_Set_Statistics <- c(RMSE, MAE, R_squared)

# Plot predicted vs observed values and residuals
XGBoost_Linear_PSO_pred_obs <- predicted_observed_plot(predicted_val = test_set$XGBoost_Linear_PSO, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_PSO_residual, R_squared = R_squared, model_name = "Particle Swarm Optimisation")
XGBoost_Linear_PSO_residuals <- residuals_plot(predicted_val = test_set$XGBoost_Linear_PSO, observed_val = test_set$Strength, residual_val = test_set$XGBoost_Linear_PSO_residual, MAE = MAE, RMSE = RMSE, model_name = "Particle Swarm Optimisation")
# Create summary table
Summary_Table_Test <- rbind(
  GS_Test_Set_Statistics,
  RS_Test_Set_Statistics,
  GA_Test_Set_Statistics,
  DE_Test_Set_Statistics,
  PSO_Test_Set_Statistics, deparse.level = 0 
  ) %>% data.frame()

colnames(Summary_Table_Test) <- c("RMSE", "MAE", "R-squared")
Summary_Table_Test <- Summary_Table_Test %>% add_column(Method = c("Grid Search", "Random Search", "Genetic Algorithm", "Differential Evolution", "Particle Swarm Optimization"), .before = 1)

# Print table
Summary_Table_Test %>% 
  kable(align = "c", caption = "Test Set Statistics of XGBoost Models.") %>% 
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "center") %>%
  kableExtra::footnote(general = paste0(""), general_title = "\n ")
Test Set Statistics of XGBoost Models.
Method RMSE MAE R-squared
Grid Search 5.38333 3.03600 0.9432
Random Search 5.61246 3.21112 0.9385
Genetic Algorithm 5.15301 3.08178 0.9483
Differential Evolution 4.93595 2.80724 0.9529
Particle Swarm Optimization 5.74515 3.07984 0.9354

Based on the test set statistics, Differential Evolution obtained the lowest RMSE value (4.936).

g <- gridExtra::grid.arrange(XGBoost_Linear_GS_pred_obs, XGBoost_Linear_GS_residuals, 
                             XGBoost_Linear_RS_pred_obs, XGBoost_Linear_RS_residuals, 
                             XGBoost_Linear_GA_pred_obs, XGBoost_Linear_GA_residuals, 
                             XGBoost_Linear_DE_pred_obs, XGBoost_Linear_DE_residuals, 
                             XGBoost_Linear_PSO_pred_obs, XGBoost_Linear_PSO_residuals, 
                             ncol = 2)

7 Summary

The use of population-based search methods such as genetic algorithms, differential evolution and particle swarm optimization shows some potential for finding the optimal values of the hyperparameters of regression models.

The use of Differential Evolution resulted in the lowest test set RMSE value of all the methods tested.

Due to their high computational demands, these methods may be more suitable for models with several hyperparameters trained on relatively small datasets and/or models that are quick to train. Performing a coarse grid search across a wide range of values beforehand may also help to narrow the search bounds, as sketched below, and make the overall process more efficient.
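
For example, a coarse preliminary grid (illustrative values, not run in this document) could be used to narrow the bounds before launching a population-based search:

# Sketch: coarse preliminary grid to narrow lower/upper bounds before a
# population-based search (81 combinations; may take a while to run)
coarse_grid <- expand.grid(
  nrounds = c(50, 250, 1000),
  eta     = 10^c(-4, -2, 0),
  lambda  = 10^c(-3, -1, 1),
  alpha   = 10^c(-3, -1, 1)
)

set.seed(1)
coarse_model <- caret::train(Strength ~ ., data = training_set,
                             method = "xgbLinear",
                             trControl = train_control,
                             metric = "RMSE",
                             tuneGrid = coarse_grid)
coarse_model$bestTune  # use as a guide for tighter search bounds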