1 Introduction

1.1 Objective

The objective of this document is to compare various search methods to find a concrete mixture with the highest predicted compressive strength.

1.2 Prediction of Compressive Strength

The compressive strength was predicted with neural networks using model averaging (see avNNet). The dataset used to train the predictive model comes from the research paper "Modeling of strength of high performance concrete using artificial neural networks" by I-Cheng Yeh, published in Cement and Concrete Research, Vol. 28, No. 12, pp. 1797-1808 (1998). The compressive strength of concrete was predicted from its age and composition.

The dataset can be downloaded from the UCI Machine Learning Repository. The final model was tuned using the caret package.

The data contains 1030 examples and the following features:

Input Variable: Cement (kg in a m³ mixture)
Input Variable: Blast Furnace Slag (kg in a m³ mixture)
Input Variable: Fly Ash (kg in a m³ mixture)
Input Variable: Water (kg in a m³ mixture)
Input Variable: Superplasticizer (kg in a m³ mixture)
Input Variable: Coarse Aggregate (kg in a m³ mixture)
Input Variable: Fine Aggregate (kg in a m³ mixture)
Input Variable: Age (days)
Output Variable: Concrete compressive strength (MPa)

1.3 Search Methods

Seven search methods are tested:

Grid search (GS)
Random search (RS)
Simulated annealing (SA)
Genetic algorithm (GA)
Islands genetic algorithm (GAISL)
Differential evolution (DE)
Particle swarm optimization (PSO)

Grid and random search will be used as benchmarks for comparing the other search methods.

2 Install and Load Packages

The pacman package is used to install and load all necessary packages.

# Install and load packages
if (!require(pacman)) {install.packages("pacman", verbose = F, quiet = T)} else require(pacman, quietly = T)
suppressWarnings(pacman::p_load(plyr, caret, tidyverse, tidyselect, readr, readxl, parallel, doParallel, gridExtra, pso, GA, GenSA, DEoptim, GGally, ggfortify, broom, knitr, kableExtra, install = T))

3 Importing the dataset

The dataset is downloaded from the UCI Machine Learning Repository:

# Download the dataset (uncomment to download it on the first run)
# download.file(url = "http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/Concrete_Data.xls", destfile = "Concrete_Data.xls", method = "curl", quiet = TRUE)

# Import Data
concrete_data <- read_xls(path = "Concrete_Data.xls", sheet = 1)

4 Exploratory Data Analysis

Check the structure of the dataset:

# Check structure of the dataset
glimpse(concrete_data)

Observations: 1,030
Variables: 9
$ `Cement (component 1)(kg in a m^3 mixture)`             <dbl> 540.0,...
$ `Blast Furnace Slag (component 2)(kg in a m^3 mixture)` <dbl> 0.0, 0...
$ `Fly Ash (component 3)(kg in a m^3 mixture)`            <dbl> 0, 0, ...
$ `Water  (component 4)(kg in a m^3 mixture)`             <dbl> 162, 1...
$ `Superplasticizer (component 5)(kg in a m^3 mixture)`   <dbl> 2.5, 2...
$ `Coarse Aggregate  (component 6)(kg in a m^3 mixture)`  <dbl> 1040.0...
$ `Fine Aggregate (component 7)(kg in a m^3 mixture)`     <dbl> 676.0,...
$ `Age (day)`                                             <dbl> 28, 28...
$ `Concrete compressive strength(MPa, megapascals)`       <dbl> 79.986...

# Rename variables
colnames(concrete_data) <- c("Cement", "Slag", "Ash", "Water", "Superplasticizer", "Coarse_Aggregate", "Fine_Aggregate", "Age", "Strength")

ingredients <- c("Cement", "Slag", "Ash", "Water", "Superplasticizer", "Coarse_Aggregate", "Fine_Aggregate")

The quantities of the concrete components were converted to proportions of the total mixture mass, so each value lies between 0 and 1 and the components of every mixture sum to 1.

# Recalculate composition as proportions
concrete_data[, ingredients] <- t(apply(X = concrete_data[, ingredients], MARGIN = 1, FUN = function(x) {x/sum(x)}))

# Check the structure of the rescaled dataset
glimpse(concrete_data)

Observations: 1,030
Variables: 9
$ Cement           <dbl> 0.223094, 0.221720, 0.149170, 0.149170, 0.085...
$ Slag             <dbl> 0.000000, 0.000000, 0.063930, 0.063930, 0.056...
$ Ash              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ Water            <dbl> 0.066928, 0.066516, 0.102288, 0.102288, 0.082...
$ Superplasticizer <dbl> 0.0010328, 0.0010265, 0.0000000, 0.0000000, 0...
$ Coarse_Aggregate <dbl> 0.42966, 0.43318, 0.41812, 0.41812, 0.42047, ...
$ Fine_Aggregate   <dbl> 0.27928, 0.27756, 0.26649, 0.26649, 0.35476, ...
$ Age              <dbl> 28, 28, 270, 365, 360, 90, 365, 28, 28, 28, 9...
$ Strength         <dbl> 79.9861, 61.8874, 40.2695, 41.0528, 44.2961, ...
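As a quick sanity check on the rescaling, the first observation can be recomputed by hand. The sketch below copies the kg values of the first row from the glimpse() output above; the object names (first_mix_kg, first_mix_prop) are illustrative and not part of the original analysis.

```r
# First observation in kg per m^3 of mixture, copied from the raw glimpse() output
first_mix_kg <- c(Cement = 540, Slag = 0, Ash = 0, Water = 162,
                  Superplasticizer = 2.5, Coarse_Aggregate = 1040,
                  Fine_Aggregate = 676)

# Divide each component by the total mass of the mixture
first_mix_prop <- first_mix_kg / sum(first_mix_kg)

round(first_mix_prop, 6)  # Cement should match the 0.223094 shown above
sum(first_mix_prop)       # proportions of a mixture add up to 1
```

The cement and water proportions obtained this way agree with the first entries of the rescaled Cement and Water columns.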

Print statistics for each variable and check for missing (NA) values:

concrete_data %>% 
  signif(5) %>% 
  gather(key = "Feature", value = "Quantity") %>% 
  dplyr::group_by(Feature) %>% 
  summarise(
    `NA` = sum(is.na(Quantity)),
    Min = min(Quantity, na.rm = T),
    `1st Quartile` = quantile(Quantity, probs = 0.25, na.rm = T),
    Median = median(Quantity, na.rm = T),
    Mean = mean(Quantity, na.rm = T),
    `3rd Quartile` = quantile(Quantity, probs = 0.75, na.rm = T),
    Max = max(Quantity, na.rm = T)
  ) %>% 
  kable(digits = 4, align = "c") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Feature	Min	1st Quartile	Median	Mean	3rd Quartile	Max
Age	1.0000	7.0000	28.0000	45.6621	56.0000	365.0000
Ash	0.0000	0.0000	0.0000	0.0232	0.0503	0.0888
Cement	0.0448	0.0821	0.1153	0.1196	0.1492	0.2254
Coarse_Aggregate	0.3459	0.3923	0.4205	0.4152	0.4376	0.4798
Fine_Aggregate	0.2480	0.3112	0.3305	0.3301	0.3541	0.4142
Slag	0.0000	0.0000	0.0095	0.0316	0.0620	0.1503
Strength	2.3318	23.7075	34.4430	35.8178	46.1365	82.5990
Superplasticizer	0.0000	0.0000	0.0027	0.0026	0.0043	0.0131
Water	0.0514	0.0695	0.0786	0.0777	0.0839	0.1122

There are no NA values in the dataset. However, the input variables have different ranges of values.

# Plot histograms of the features
concrete_data %>% 
  gather(key = Variable, value = Value) %>% 
  ggplot() +
    geom_histogram(aes(x = Value), bins = 20, fill = "#08519c") +
    facet_wrap(~Variable, scales='free') +
    theme_bw() +
    theme(aspect.ratio = 0.5, axis.title = element_blank(), panel.grid = element_blank())

5 Obtaining Optimal Concrete Mixtures

The aim is to find the concrete mixture with the highest predicted compressive strength. Several search methods will be tested and compared for this purpose.

The compressive strength will be predicted by a model previously trained on the data: model-averaged neural networks fitted with the caret package.

Since compressive strength generally increases with age and we are not interested in optimizing this variable, all predictions will be made with the concrete age fixed at 28 days.

# Import predictive model
avNNet_model_final <- readRDS(file = "Models/avNNet_model.rds")

# Define minimum and maximum values for each input
margin <- 0.05 
Cement_min_max <- c(min(concrete_data$Cement)*(1-margin), 
                    max(concrete_data$Cement)*(1+margin)) %>% round(4)
Slag_min_max <- c(min(concrete_data$Slag)*(1-margin), 
                  max(concrete_data$Slag)*(1+margin)) %>% round(4)
Ash_min_max <- c(min(concrete_data$Ash)*(1-margin), 
                 max(concrete_data$Ash)*(1+margin)) %>% round(4)
Superplasticizer_min_max <- c(min(concrete_data$Superplasticizer)*(1-margin),
                              max(concrete_data$Superplasticizer)*(1+margin)) %>% round(4)
Coarse_Aggregate_min_max <- c(min(concrete_data$Coarse_Aggregate)*(1-margin),
                              max(concrete_data$Coarse_Aggregate)*(1+margin)) %>% round(4)
Fine_Aggregate_min_max <- c(min(concrete_data$Fine_Aggregate)*(1-margin),
                            max(concrete_data$Fine_Aggregate)*(1+margin)) %>% round(4)

lower_limits <- c(Cement_min_max[1], Slag_min_max[1], Ash_min_max[1], Superplasticizer_min_max[1], Coarse_Aggregate_min_max[1], Fine_Aggregate_min_max[1])
upper_limits <- c(Cement_min_max[2], Slag_min_max[2], Ash_min_max[2], Superplasticizer_min_max[2], Coarse_Aggregate_min_max[2], Fine_Aggregate_min_max[2])

# Set fixed value for aging
days_aging <- 28

# Set minimum and maximum acceptable amount of water in each mixture
maximum_water <- (max(concrete_data$Water)*1.05) %>% round(4) 
minimum_water <- (min(concrete_data$Water)*0.95) %>% round(4)

n_best_solutions <- 5 # Number of best solutions to keep (for selected methods)

# Optional: Create a starting point for the genetic algorithm
starting_point <- sapply(X = concrete_data, FUN = mean) %>% t() %>% data.frame() %>% select("Cement", "Slag", "Ash", "Superplasticizer", "Coarse_Aggregate", "Fine_Aggregate") %>% as.matrix()
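The limits defined above widen the observed range of each ingredient by a 5% margin. The sketch below reproduces that arithmetic for two ingredients, using the rounded minima and maxima from the summary table in Section 4; because those inputs are rounded, other ingredients may differ from the true limits in the last digit. The data frame name (observed) is illustrative.

```r
margin <- 0.05

# Observed minima and maxima, copied (rounded) from the summary statistics table
observed <- data.frame(
  Ingredient = c("Cement", "Coarse_Aggregate"),
  Min = c(0.0448, 0.3459),
  Max = c(0.2254, 0.4798)
)

# Shrink the minimum and expand the maximum by the margin, then round
observed$Lower <- round(observed$Min * (1 - margin), 4)
observed$Upper <- round(observed$Max * (1 + margin), 4)

observed
```

The resulting limits match the minimum and maximum values of these two ingredients in the search grid summarised in the next section.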

5.1 Grid Search

Grid search is a blind search method that reduces the space of solutions to a regular, multi-dimensional grid of candidate points. The main downside of this method is its high computational cost for problems with several dimensions, because the number of grid points grows exponentially with the number of variables. This can be offset by increasing the grid step (i.e. using fewer points per variable).
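To make the cost concrete, the short sketch below counts the grid points for six ingredients at a few resolutions. The resolutions other than 18 are arbitrary values chosen for illustration.

```r
# Number of levels per ingredient (18 is the value used in this document)
levels_per_ingredient <- c(5, 10, 18, 25)
n_ingredients <- 6

# A full factorial grid has levels^dimensions points
grid_sizes <- levels_per_ingredient^n_ingredients

data.frame(levels = levels_per_ingredient, combinations = grid_sizes)
```

With 18 levels per ingredient the full grid has 18^6 = 34,012,224 points before any infeasible mixtures are removed.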

To perform a grid search, a search grid with a wide range of values for each ingredient was created. To exclude unfeasible solutions, combinations with too much or too little water were excluded from the grid.

5.1.1 Create Search Grid

# Create search grid
Search_Grid <- expand.grid(
  Cement = seq(Cement_min_max[1], Cement_min_max[2], length.out = 18),
  Slag = seq(Slag_min_max[1], Slag_min_max[2], length.out = 18),
  Ash = seq(Ash_min_max[1], Ash_min_max[2], length.out = 18),
  Superplasticizer = seq(Superplasticizer_min_max[1], Superplasticizer_min_max[2], length.out = 18),
  Coarse_Aggregate = seq(Coarse_Aggregate_min_max[1], Coarse_Aggregate_min_max[2], length.out = 18),
  Fine_Aggregate = seq(Fine_Aggregate_min_max[1], Fine_Aggregate_min_max[2], length.out = 18)
  ) 

# Set threshold for minimum and maximum solids: remove solutions with too little or too much water
minimum_solids <- 1-maximum_water 
maximum_solids <- 1-minimum_water

# Remove solutions with solids content below minimum threshold (too much water) or above maximum threshold (not enough water)
Search_Grid <- Search_Grid[rowSums(Search_Grid) >= minimum_solids & 
                           rowSums(Search_Grid) <= maximum_solids, ] 

# Add column for water content
Search_Grid$Water <- 1-rowSums(Search_Grid)
Search_Grid$Age <- 28

# Tabulate summary statistics of search grid
Search_Grid %>% 
  signif(5) %>% 
  gather(key = "Feature", value = "Quantity") %>% 
  dplyr::group_by(Feature) %>% 
  summarise(
    Min = min(Quantity, na.rm = T),
    `1st Quartile` = quantile(Quantity, probs = 0.25, na.rm = T),
    Median = median(Quantity, na.rm = T),
    Mean = mean(Quantity, na.rm = T),
    `3rd Quartile` = quantile(Quantity, probs = 0.75),
    Max = max(Quantity, na.rm = T)
  ) %>% 
  kable(digits = 4, align = "c") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Feature	Min	1st Quartile	Median	Mean	3rd Quartile	Max
Age	28.0000	28.0000	28.0000	28.0000	28.0000	28.0000
Ash	0.0000	0.0165	0.0384	0.0408	0.0659	0.0933
Cement	0.0426	0.0654	0.0997	0.1115	0.1454	0.2367
Coarse_Aggregate	0.3286	0.3492	0.3904	0.3939	0.4317	0.5038
Fine_Aggregate	0.2356	0.2590	0.2942	0.3053	0.3411	0.4349
Slag	0.0000	0.0279	0.0557	0.0612	0.0929	0.1579
Superplasticizer	0.0000	0.0032	0.0065	0.0068	0.0106	0.0138
Water	0.0488	0.0633	0.0791	0.0805	0.0969	0.1178

The search grid contains 5.35 million different solutions: the subset of the 18^6 (about 34 million) grid combinations that satisfy the water limits.

5.1.2 Generate Predictions

Use the model to generate predictions for each concrete mixture in the grid:

# Make predictions on search grid
GS_T0 <- Sys.time() # record start time

Search_Grid$Strength <- predict(avNNet_model_final, Search_Grid)

GS_T1 <- Sys.time() # record end time
(GS_Time <-  GS_T1 - GS_T0)

Time difference of 1.6414 mins

# Save the n best solutions of each model into a single dataframe
avNNet_GS <- Search_Grid %>%
  arrange(desc(Strength)) %>%
  .[1:n_best_solutions, ] %>%  # select n best solutions
  mutate(Method = "Grid Search") %>% 
  select(Method, Strength, everything())

avNNet_GS %>% 
  arrange(desc(Strength)) %>% 
  knitr::kable(caption = "Summary of the best solutions", align = "c", digits = 4, col.names = gsub("_", " ", colnames(avNNet_GS))) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Summary of the best solutions
Method	Strength	Cement	Slag	Ash	Superplasticizer	Coarse Aggregate	Fine Aggregate	Water	Age
Grid Search	93.807	0.2253	0.0464	0.0055	0.0024	0.3595	0.3059	0.0549	28
Grid Search	93.615	0.2253	0.0464	0.0110	0.0024	0.3595	0.3059	0.0494	28
Grid Search	92.516	0.2367	0.0557	0.0000	0.0024	0.3492	0.3059	0.0500	28
Grid Search	92.454	0.2367	0.0464	0.0055	0.0024	0.3595	0.2942	0.0552	28
Grid Search	92.360	0.2253	0.0557	0.0000	0.0024	0.3595	0.3059	0.0511	28

The grid search obtained a concrete mixture with a maximal predicted compressive strength of 93.807 MPa.

5.2 Random Search

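To perform a random search, a set of candidate solutions with randomly generated values for each variable is created. For each variable, 18 values are drawn with the runif() function and then combined with expand.grid(), so the candidates form a randomly spaced grid rather than fully independent samples. Only one cycle of random search was executed.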
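For comparison, a random search can also draw every candidate independently. The sketch below is an alternative to the approach used in this document, not the method that produced the results reported here. The bounds and water limits are copied (rounded) from the search-grid summary in Section 5.1.1, and the sample size n_candidates is an arbitrary assumption.

```r
set.seed(1)

# Bounds for the six solid ingredients (rounded, from the search-grid summary)
lower <- c(Cement = 0.0426, Slag = 0, Ash = 0, Superplasticizer = 0,
           Coarse_Aggregate = 0.3286, Fine_Aggregate = 0.2356)
upper <- c(Cement = 0.2367, Slag = 0.1579, Ash = 0.0933, Superplasticizer = 0.0138,
           Coarse_Aggregate = 0.5038, Fine_Aggregate = 0.4349)

n_candidates <- 10000  # hypothetical sample size

# One independent uniform draw per ingredient and per candidate
candidates <- as.data.frame(mapply(
  function(lo, hi) runif(n_candidates, lo, hi),
  lower, upper
))

# Water is obtained by difference; keep only mixtures within the water limits
candidates$Water <- 1 - rowSums(candidates)
feasible <- candidates[candidates$Water >= 0.0488 & candidates$Water <= 0.1178, ]

nrow(feasible)  # number of feasible candidates left to score with the model
```

Each row of feasible could then be passed to predict() together with a fixed Age column, exactly as is done for the grid.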

5.2.1 Create Random Solutions

Create random solutions of a wide range of ingredient combinations to perform a random search. To exclude unfeasible solutions, combinations with too much or too little water content were excluded.

set.seed(1)
Random_Solutions <- expand.grid(
  Cement = runif(18, Cement_min_max[1], Cement_min_max[2]),
  Slag = runif(18, Slag_min_max[1], Slag_min_max[2]),
  Ash = runif(18, Ash_min_max[1], Ash_min_max[2]),
  Superplasticizer = runif(18, Superplasticizer_min_max[1], Superplasticizer_min_max[2]),
  Coarse_Aggregate = runif(18, Coarse_Aggregate_min_max[1], Coarse_Aggregate_min_max[2]),
  Fine_Aggregate = runif(18, Fine_Aggregate_min_max[1], Fine_Aggregate_min_max[2])
  )

# Set threshold for minimum and maximum solids: remove solutions with too little or too much water
minimum_solids <- 1-maximum_water 
maximum_solids <- 1-minimum_water

# Remove solutions with solids content below minimum threshold (too much water) or above maximum threshold (not enough water)
Random_Solutions <- Random_Solutions[rowSums(Random_Solutions) >= minimum_solids & 
                                     rowSums(Random_Solutions) <= maximum_solids, ] 

# Add column for water content
Random_Solutions$Water <- 1-rowSums(Random_Solutions)
Random_Solutions$Age <- 28

# Tabulate summary statistics
Random_Solutions %>% 
  signif(5) %>% 
  gather(key = "Feature", value = "Quantity") %>% 
  dplyr::group_by(Feature) %>% 
  summarise(
    Min = min(Quantity, na.rm = T),
    `1st Quartile` = quantile(Quantity, probs = 0.25, na.rm = T),
    Median = median(Quantity, na.rm = T),
    Mean = mean(Quantity, na.rm = T),
    `3rd Quartile` = quantile(Quantity, probs = 0.75),
    Max = max(Quantity, na.rm = T)
  ) %>% 
  kable(digits = 4, align = "c") %>% 
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Feature	Min	1st Quartile	Median	Mean	3rd Quartile	Max
Age	28.0000	28.0000	28.0000	28.0000	28.0000	28.0000
Ash	0.0022	0.0384	0.0494	0.0469	0.0683	0.0804
Cement	0.0546	0.0817	0.0941	0.1121	0.1392	0.2351
Coarse_Aggregate	0.3499	0.3641	0.3871	0.3930	0.4048	0.4969
Fine_Aggregate	0.2474	0.2781	0.2895	0.3065	0.3309	0.4334
Slag	0.0021	0.0294	0.0537	0.0564	0.0761	0.1476
Superplasticizer	0.0010	0.0041	0.0056	0.0063	0.0091	0.0126
Water	0.0488	0.0620	0.0769	0.0789	0.0948	0.1178

There are 4.59 million different random solutions.

5.2.2 Generate Predictions

Use the model to generate predictions for each concrete mixture:

# Make predictions 
RS_T0 <- Sys.time() # record start time

Random_Solutions$Strength <- predict(avNNet_model_final, Random_Solutions)

RS_T1 <- Sys.time() # record end time
(RS_Time <-  RS_T1 - RS_T0)

Time difference of 1.4248 mins

# Save the n best solutions of each model into a single dataframe
avNNet_RS <- Random_Solutions %>%
  arrange(desc(Strength)) %>%
  .[1:n_best_solutions, ] %>%  # select n best solutions
  mutate(Method = "Random Search") %>% 
  select(Method, Strength, everything())

# Tabulate best solutions
avNNet_RS %>% 
  arrange(desc(Strength)) %>% 
  knitr::kable(caption = "Summary of the best solutions", align = "c", digits = 4, col.names = gsub("_", " ", colnames(avNNet_RS))) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Summary of the best solutions
Method	Strength	Cement	Slag	Ash	Superplasticizer	Coarse Aggregate	Fine Aggregate	Water	Age
Random Search	90.205	0.2260	0.0537	0.0101	0.0014	0.3537	0.306	0.0492	28
Random Search	89.937	0.2260	0.0422	0.0101	0.0014	0.3641	0.306	0.0503	28
Random Search	89.775	0.2260	0.0537	0.0101	0.0012	0.3537	0.306	0.0494	28
Random Search	89.535	0.2260	0.0537	0.0101	0.0010	0.3537	0.306	0.0496	28
Random Search	89.013	0.2189	0.0422	0.0101	0.0014	0.3641	0.306	0.0574	28

The random search obtained a concrete mixture with a maximal predicted compressive strength of 90.205 MPa.

5.3 Simulated Annealing

Simulated annealing (SA) is a local search method, i.e. a method that focuses its attention on the local neighbourhood of a starting point. The method is inspired by the heat treatment of metals (annealing), where a material is heated above its recrystallization temperature to increase its ductility and then cooled under controlled conditions.

The SA algorithm begins from a starting point (randomly generated or provided) and a specified initial temperature, which controls the probability of accepting inferior solutions. The temperature decreases at each successive iteration, so moves to worse or very different solutions become less probable.
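The role of the temperature can be illustrated with the classical Metropolis acceptance rule for a minimisation problem. This is a textbook sketch only: the GenSA package implements a generalized variant of simulated annealing, so its internal acceptance rule may differ. The function name acceptance_probability is illustrative.

```r
# Classical SA acceptance rule (minimisation):
# delta = f(candidate) - f(current); improvements are always accepted,
# worse candidates are accepted with probability exp(-delta / temperature)
acceptance_probability <- function(delta, temperature) {
  ifelse(delta <= 0, 1, exp(-delta / temperature))
}

# A candidate that is worse by 5 units, evaluated at decreasing temperatures
temperatures <- c(100, 10, 1)
p_accept <- acceptance_probability(delta = 5, temperature = temperatures)

data.frame(temperature = temperatures, p_accept = round(p_accept, 4))
```

The same inferior candidate is accepted with probability of about 0.95 at a temperature of 100, about 0.61 at 10 and below 0.01 at 1, which is why the search becomes increasingly greedy as it cools.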

The GenSA package will be used to optimize the concrete mixture using simulated annealing. The main search parameters for this implementation of SA are:

the initial temperature (temperature)
the maximum number of iterations (maxit)

Increasing these parameters can improve the results on complex optimization problems.

# Set parameter settings for search algorithm
max_iter <- 500 # maximum number of iterations
pop_size <- 10 # number of starting points (one SA run per point)

We will use the dataset to generate a set of 10 starting points. These will be selected by creating a subset of the 10 most dissimilar solutions from a randomly selected solution.

# Randomly select one sample from the data as a starting point
set.seed(1)
start_index <- sample(x = 1:nrow(concrete_data), size = 1)
start_point <- concrete_data[start_index, ]

# Select the n most dissimilar observations from the starting point
n_observations <- pop_size
index_starting_observations <- caret::maxDissim(a = start_point, b = concrete_data, n = n_observations)
starting_observations <- concrete_data[index_starting_observations, ]

# Remove water from subset
starting_observations <- starting_observations %>% dplyr::select(-Water)

Because all proportions in a solution must add up to one, the search is run over the solid ingredients only, and the proportion of water is determined by subtracting the sum of the other ingredients' proportions from 1.
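As a worked example of this rule, the sketch below recomputes the water content of the best grid-search mixture from Section 5.1.2 and checks it against the water limits. The solid proportions are the rounded values from that table, and the limits are read off the Water row of the search-grid summary, so they are approximations of minimum_water and maximum_water.

```r
# Solid proportions of the best grid-search mixture (rounded to 4 decimals)
solids <- c(Cement = 0.2253, Slag = 0.0464, Ash = 0.0055,
            Superplasticizer = 0.0024, Coarse_Aggregate = 0.3595,
            Fine_Aggregate = 0.3059)

# Water is whatever remains after the solids
water <- 1 - sum(solids)

# Assumed limits, approximating minimum_water and maximum_water
min_water <- 0.0488
max_water <- 0.1178

# A solution is feasible only if its water content lies within the limits
is_feasible <- water >= min_water && water <= max_water

c(water = round(water, 4), feasible = is_feasible)
```

The result, about 0.055, differs from the tabulated 0.0549 only because the inputs are rounded.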

In order to assess each solution, a custom function to predict the compressive strength of each solution is created.

# Create custom function for assessing solutions
eval_function <- function(x, model, min_water, max_water, age = 28) {

  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]; x5 <- x[5]; x6 <- x[6]
  
  # Create dataframe with proportion of each solid component
  solution_df <- data.frame(Cement = x1, 
                            Slag = x2, 
                            Ash = x3, 
                            Superplasticizer = x4, 
                            Coarse_Aggregate = x5, 
                            Fine_Aggregate = x6)
  
  # Calculate proportion of water
  solution_df$Water <- 1-rowSums(solution_df) # Water = 1-sum(solids)
  
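  # Score only feasible solutions: water within the acceptable range and proportions summing to 1
  # (compared with a tolerance; infeasible solutions get the death-penalty score in the else branch)
  if(solution_df$Water >= min_water && solution_df$Water <= max_water && abs(rowSums(solution_df) - 1) < 1e-8) {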

    # Add pre-defined age to temporary solution
    solution_df$Age <- age
    
    return(-predict(model, solution_df)) # maximize strength
    
  } else {
    
    return(0)
  }

}

set.seed(1)
SA_output <- starting_observations
SA_output$Water <- NA
SA_output$Strength <- NA
i_max <- NA
trace_SA_max <- NA

SA_T0 <- Sys.time() # record start time

# Repeat the process of finding the optimal solution for each starting observation
for(i in 1:nrow(SA_output)) {
  results <- GenSA::GenSA(
                          par = SA_output[i, 1:6] %>% as.matrix(),
                          fn = eval_function, 
                          lower = lower_limits,
                          upper = upper_limits,
                          control = list(
                            maxit = max_iter/pop_size, 
                            verbose = F),
                          model = avNNet_model_final,
                          min_water = minimum_water,
                          max_water = maximum_water
                          )
  
  # Save the predictions
  SA_output$Strength[i] <- abs(results$value)
  # Save the input variables
  SA_output[i, 1:6] <- results$par

  if (SA_output$Strength[i] == max(SA_output$Strength, na.rm = T)) {
    i_max <- i
    trace_SA_max <- results$trace.mat
  }
}

SA_T1 <- Sys.time() # record end time
(SA_Time <-  SA_T1 - SA_T0)

Time difference of 48.935 secs

# Save parameters of n best solutions
avNNet_SA <- SA_output %>% 
  arrange(desc(Strength)) %>% 
  .[1:n_best_solutions, ] %>%  
  mutate(Method = "Simulated Annealing", 
         Water = 1 - (Cement + Slag + Ash + Superplasticizer + Coarse_Aggregate + Fine_Aggregate), Age = days_aging) %>% 
  select(Method, Strength, everything()) 

# Tabulate best solutions
avNNet_SA %>% 
  arrange(desc(Strength)) %>% 
  knitr::kable(caption = "Summary of the best solutions", align = "c", digits = 4, col.names = gsub("_", " ", colnames(avNNet_SA))) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Summary of the best solutions
Method	Strength	Cement	Slag	Ash	Superplasticizer	Coarse Aggregate	Fine Aggregate	Age	Water
Simulated Annealing	90.599	0.2035	0.0677	0.0000	0.0027	0.3705	0.2975	28	0.0581
Simulated Annealing	89.706	0.2003	0.0883	0.0185	0.0050	0.3289	0.3074	28	0.0516
Simulated Annealing	89.706	0.2003	0.0883	0.0185	0.0050	0.3289	0.3074	28	0.0516
Simulated Annealing	87.805	0.1905	0.0676	0.0134	0.0047	0.3757	0.2885	28	0.0596
Simulated Annealing	87.805	0.1905	0.0676	0.0134	0.0047	0.3757	0.2885	28	0.0596

Simulated annealing obtained a concrete mixture with a maximal predicted compressive strength of 90.599 MPa.

5.4 Genetic Algorithm

Genetic algorithms (GA) are stochastic, population-based search methods inspired by Darwin’s theory of evolution. Genetic algorithms mimic the evolution of living organisms by creating and using a pool of candidate solutions to search for the global optimum. Although population-based methods tend to require more computations than single-state methods (e.g. hill climbing and simulated annealing), they work better as global optimisation methods as they tend to explore a wider range of regions in the search space.

The algorithm starts by creating a random population of feasible solutions. These are created within minimum and maximum values set by the user for each variable (gene). At each iteration (also known as generation) the individuals are assessed using a fitness function. A pre-defined number or percentage of the best individuals are then selected for the next generation (defined by elitism). This selected cohort can be recombined and modified. The probability of these two events is determined by the crossover and mutation rate, respectively.

Each generation of individuals goes through the steps of assessment, selection, crossover and mutation for a predefined number of iterations or until another stopping criterion is met.

The main parameters of a genetic algorithm are:

population size
elitism: number of best individuals kept
maximum number of iterations
fitness function
mutation rate
crossover rate

A random population seed may also be used for reproducibility.
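The generational loop described above can be sketched in a few lines of base R. This is a deliberately simplified illustration on a toy problem, not the algorithm implemented by the GA package: the function names (toy_fitness, run_toy_ga) are hypothetical, and the operators (tournament selection, blend crossover, uniform mutation) are assumptions chosen for brevity.

```r
# Toy fitness function with its maximum (0) at x = (0.5, 0.5)
toy_fitness <- function(x) -sum((x - 0.5)^2)

run_toy_ga <- function(fitness, lower, upper, pop_size = 40, max_iter = 60,
                       elitism = 2, p_crossover = 0.8, p_mutation = 0.1) {
  n_var <- length(lower)
  # Random initial population within the bounds (one individual per row)
  pop <- matrix(runif(pop_size * n_var, lower, upper),
                nrow = pop_size, ncol = n_var, byrow = TRUE)
  best_trace <- numeric(max_iter)

  for (iter in seq_len(max_iter)) {
    # Assessment: sort the population from fittest to least fit
    fit <- apply(pop, 1, fitness)
    ord <- order(fit, decreasing = TRUE)
    pop <- pop[ord, , drop = FALSE]
    best_trace[iter] <- fit[ord][1]

    # Elitism: the best individuals survive unchanged
    new_pop <- pop[seq_len(elitism), , drop = FALSE]

    while (nrow(new_pop) < pop_size) {
      # Selection: two tournaments of size 2 (lower row index = fitter)
      parents <- replicate(2, min(sample(pop_size, 2)))
      child <- pop[parents[1], ]
      # Crossover: random convex blend of the two parents
      if (runif(1) < p_crossover) {
        w <- runif(1)
        child <- w * pop[parents[1], ] + (1 - w) * pop[parents[2], ]
      }
      # Mutation: redraw one randomly chosen gene within its bounds
      if (runif(1) < p_mutation) {
        j <- sample(n_var, 1)
        child[j] <- runif(1, lower[j], upper[j])
      }
      new_pop <- rbind(new_pop, child)
    }
    pop <- new_pop
  }

  fit <- apply(pop, 1, fitness)
  list(solution = pop[which.max(fit), ], value = max(fit), trace = best_trace)
}

set.seed(1)
toy_result <- run_toy_ga(toy_fitness, lower = c(0, 0), upper = c(1, 1))
toy_result$solution
```

Because the elite individuals are copied unchanged, the best fitness recorded in trace never decreases from one generation to the next.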

The GA package developed by Luca Scrucca will be used to optimize the concrete mixture using the genetic algorithm. The package provides the ga() function whose main arguments are:

fitness: function to assess the fitness of the solutions
lower and upper: lower and upper limits for each parameter
popSize: population size
maxiter: maximum number of iterations
pmutation: probability of mutation
pcrossover: probability of crossover
elitism: number of best solutions to survive each generation

# Set parameter settings for search algorithm
max_iter <- 500 # maximum number of iterations
pop_size <- 100 # population size

# Create custom function for assessing solutions
eval_function <- function(x, model) {
  
  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]; x5 <- x[5]; x6 <- x[6]

  # Create dataframe with proportion of each solid component
  solution_df <- data.frame(Cement = x1, 
                            Slag = x2, 
                            Ash = x3, 
                            Superplasticizer = x4, 
                            Coarse_Aggregate = x5, 
                            Fine_Aggregate = x6)
  
  # Calculate proportion of water
  solution_df$Water <- 1-rowSums(solution_df) # Water = 1-sum(solids)
  
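  # Score only feasible solutions: water within the acceptable range and proportions summing to 1
  # (compared with a tolerance; infeasible solutions get the death-penalty score in the else branch)
  if(solution_df$Water >= minimum_water && solution_df$Water <= maximum_water && abs(rowSums(solution_df) - 1) < 1e-8) {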

    # Add pre-defined age to temporary solution
    solution_df$Age <- days_aging
    
    return(predict(model, solution_df))
    
  } else {
    
    return(0)
  }

}

A local search algorithm is incorporated into the GA by setting optim = TRUE.

set.seed(1)
n_cores <- detectCores()-1

GA_T0 <- Sys.time() # record start time

# Run Genetic Algorithm
GA_output <- GA::ga(
  type = "real-valued", 
  fitness = function(x) eval_function(x, model = avNNet_model_final), 
  lower = lower_limits,
  upper = upper_limits,
  popSize = pop_size, # population size
  maxiter = max_iter, # maximum number of iterations
  pmutation = 0.3, # probability of mutation
  elitism = 0.3, # elitism as set for this run (ga() documents this argument as a number of individuals)
  suggestions = starting_point, # Optional: starting point for genetic algorithm
  parallel = n_cores, 
  monitor = FALSE,
  optim = TRUE, # incorporate local search
  optimArgs = list(method = "L-BFGS-B",
                    poptim = 0.20,
                    pressel = 0.5,
                    control = list(fnscale = -1, maxit = 100)), # parameters for local search
  seed = 1
  )

GA_T1 <- Sys.time() # record end time
(GA_Time <-  GA_T1 - GA_T0)

Time difference of 6.0873 mins

# Print summary of GA
summary(GA_output)

-- Genetic Algorithm ------------------- 

GA settings: 
Type                  =  real-valued 
Population size       =  100 
Number of generations =  500 
Elitism               =  0.3 
Crossover probability =  0.8 
Mutation probability  =  0.3 
Search domain = 
          x1     x2     x3     x4     x5     x6
lower 0.0426 0.0000 0.0000 0.0000 0.3286 0.2356
upper 0.2367 0.1579 0.0933 0.0138 0.5038 0.4349
Suggestions = 
       x1       x2       x3        x4      x5      x6
1 0.11955 0.031643 0.023173 0.0026203 0.41517 0.33012

GA results: 
Iterations             = 500 
Fitness function value = 88.7 
Solution = 
          x1       x2      x3       x4      x5      x6
[1,] 0.18084 0.067048 0.01013 0.004546 0.37448 0.30375

GA_summary <- GA_output@summary %>% 
  .[,c("max", "mean", "median")] %>% 
  data.frame() %>% 
  select(Best = max, Mean = mean, Median = median) %>% 
  mutate(Iteration = 1:max_iter) 

# Plot results
GA_summary %>% 
  gather(key = "Parameter", value = "Value", -Iteration) %>% 
  ggplot(mapping = aes(x = Iteration, y = Value, col = Parameter)) +
    geom_line() +
    theme_bw() + 
    theme(aspect.ratio = 0.7) +
    scale_color_brewer(type = "qual", palette = "Set1") +
    labs(x = "Iteration", y = "Compressive Strength (Predicted)", title = "Best predicted compressive strength at each iteration", subtitle = "Results using Genetic Algorithm")

# Save best solution
avNNet_GA <- data.frame(Method = "Genetic Algorithm",
                        Strength = GA_output@fitnessValue,
                        Cement = GA_output@solution[1], 
                        Slag = GA_output@solution[2], 
                        Ash = GA_output@solution[3], 
                        Superplasticizer = GA_output@solution[4], 
                        Coarse_Aggregate = GA_output@solution[5], 
                        Fine_Aggregate = GA_output@solution[6]) %>% 
  mutate(Water = 1 - (Cement + Slag + Ash + Superplasticizer + Coarse_Aggregate + Fine_Aggregate), 
         Age = days_aging)

# Tabulate best solutions
avNNet_GA %>% 
  arrange(desc(Strength)) %>% 
  knitr::kable(caption = "Summary of the best solutions", align = "c", digits = 4, col.names = gsub("_", " ", colnames(avNNet_GA))) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Summary of the best solutions
Method	Strength	Cement	Slag	Ash	Superplasticizer	Coarse Aggregate	Fine Aggregate	Water	Age
Genetic Algorithm	88.7	0.1808	0.067	0.0101	0.0045	0.3745	0.3037	0.0592	28

The genetic algorithm obtained a concrete mixture with a maximal predicted compressive strength of 88.7 MPa.

5.5 Islands Genetic Algorithm

Islands genetic algorithms (GAISL) are a variant of genetic algorithms in which the population of solutions is partitioned into a set of subpopulations (islands). Each island runs its own GA, with individuals occasionally being transferred from one island to another.

The GAISL is run using the gaisl() function from the GA package. In addition to the arguments of the ga() function, there are additional arguments to control the number of islands and the migration rate:

numIslands: number of islands
migrationRate: proportion of individuals that migrate between islands in each exchange
migrationInterval: number of iterations between exchanges
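To show how these arguments interact, the sketch below implements one hypothetical migration step in base R: each island sends copies of its best individuals to the next island in a ring, where they replace the worst individuals. The function name migrate_ring and the replacement policy are assumptions made for illustration; the migration scheme used internally by gaisl() may differ.

```r
# One migration step between islands arranged in a ring (illustrative only)
migrate_ring <- function(islands, fitness, migration_rate = 0.1) {
  n_islands <- length(islands)
  n_migrants <- max(1, round(migration_rate * nrow(islands[[1]])))

  # Pick the best individuals of every island before anything is replaced
  migrants <- lapply(islands, function(pop) {
    fit <- apply(pop, 1, fitness)
    pop[order(fit, decreasing = TRUE)[seq_len(n_migrants)], , drop = FALSE]
  })

  for (i in seq_len(n_islands)) {
    # Island i sends its migrants to the next island (the last wraps to the first)
    target <- if (i == n_islands) 1 else i + 1
    fit <- apply(islands[[target]], 1, fitness)
    worst <- order(fit)[seq_len(n_migrants)]
    islands[[target]][worst, ] <- migrants[[i]]
  }
  islands
}

# Toy setup: 4 islands of 25 individuals with 2 variables each
set.seed(1)
toy_fitness <- function(x) -sum((x - 0.5)^2)
islands <- replicate(4, matrix(runif(50), nrow = 25, ncol = 2), simplify = FALSE)

best_before <- sapply(islands, function(pop) max(apply(pop, 1, toy_fitness)))
islands_after <- migrate_ring(islands, toy_fitness, migration_rate = 0.1)
best_after <- sapply(islands_after, function(pop) max(apply(pop, 1, toy_fitness)))

rbind(best_before, best_after)
```

In a full run this step would be applied once every migrationInterval iterations, with each island evolving independently in between.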

# Set parameter settings for search algorithm
max_iter <- 500 # maximum number of iterations
pop_size <- 100 # population size

A local search algorithm is incorporated into the GAISL algorithm by setting optim = TRUE. Four islands will be used (numIslands = 4) with a migration rate (migrationRate) of 0.1 applied every 10 iterations (migrationInterval = 10).

set.seed(1)
n_cores <- detectCores()-1

GAISL_T0 <- Sys.time() # record start time

# run GAISL algorithm
GAISL_output <- GA::gaisl(
  type = "real-valued", 
  fitness = function(x) eval_function(x, model = avNNet_model_final), 
  lower = lower_limits,
  upper = upper_limits,
  popSize = pop_size, # population size
  maxiter = max_iter, # maximum number of iterations
  pmutation = 0.3, # probability of mutation
  elitism = 0.3, # percentage elitism
  numIslands = 4, 
  migrationRate = 0.1, 
  migrationInterval = 10,
  monitor = F,
  # suggestions = starting_point, # Optional: starting point for genetic algorithm
  parallel = n_cores,
  optim = T, # perform local search
  optimArgs = list(method = "L-BFGS-B",
                    poptim = 0.20,
                    pressel = 0.5,
                    control = list(fnscale = -1, maxit = 100)), # parameters for local search
  seed = 1)

GAISL_T1 <- Sys.time() # record end time
(GAISL_Time <-  GAISL_T1 - GAISL_T0)

Time difference of 5.2167 mins

summary(GAISL_output)

-- Islands Genetic Algorithm ----------- 

GA settings: 
Type                  =  real-valued 
Number of islands     =  4 
Islands pop. size     =  25 
Migration rate        =  0.1 
Migration interval    =  10 
Elitism               =  0.3 
Crossover probability =  0.8 
Mutation probability  =  0.3 
Search domain = 
          x1     x2     x3     x4     x5     x6
lower 0.0426 0.0000 0.0000 0.0000 0.3286 0.2356
upper 0.2367 0.1579 0.0933 0.0138 0.5038 0.4349

GA results: 
Iterations              = 500 
Epochs                  = 50 
Fitness function values = 89.602 89.602 89.602 89.602 
Solutions = 
        x1       x2        x3        x4      x5      x6
[1,] 0.207 0.080108 0.0094798 0.0033201 0.38825 0.26303
[2,] 0.207 0.080108 0.0094798 0.0033201 0.38825 0.26303
[3,] 0.207 0.080108 0.0094798 0.0033201 0.38825 0.26303
[4,] 0.207 0.080108 0.0094798 0.0033201 0.38825 0.26303

# Save trace data from each island
GAISL_summary <- data.frame(Iteration = 1:max_iter,
                            Island_1 = GAISL_output@summary[[1]][,"max"],
                            Island_2 = GAISL_output@summary[[2]][,"max"],
                            Island_3 = GAISL_output@summary[[3]][,"max"],
                            Island_4 = GAISL_output@summary[[4]][,"max"])

# Plot trace data for each island
GAISL_summary %>% 
  gather(key = "Island", value = "Best", -Iteration) %>% 
  ggplot(mapping = aes(x = Iteration, y = Best, col = Island)) +
    geom_line(size = 0.75, alpha = 0.8) +
    theme_bw() + 
    theme(aspect.ratio = 0.5) +
    scale_color_brewer(type = "qual", palette = "Set1") +
    labs(x = "Iteration", y = "Compressive Strength (Predicted)", title = "Best predicted compressive strength at each iteration", subtitle = "Results using Islands Genetic Algorithm (GAISL)")

# Save best solution
avNNet_GAISL <- data.frame(Method = "Islands Genetic Algorithm",
                        Strength = GAISL_output@fitnessValue,
                        Cement = GAISL_output@solution[1], 
                        Slag = GAISL_output@solution[2], 
                        Ash = GAISL_output@solution[3], 
                        Superplasticizer = GAISL_output@solution[4], 
                        Coarse_Aggregate = GAISL_output@solution[5], 
                        Fine_Aggregate = GAISL_output@solution[6]) %>% 
  mutate(Water = 1 - (Cement + Slag + Ash + Superplasticizer + Coarse_Aggregate + Fine_Aggregate), 
         Age = days_aging)


# Tabulate best solution(s)
avNNet_GAISL %>% 
  arrange(desc(Strength)) %>% 
  knitr::kable(caption = "Summary of the best solution", align = "c", digits = 4, col.names = gsub("_", " ", colnames(avNNet_GAISL))) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Summary of the best solution
Method	Strength	Cement	Slag	Ash	Superplasticizer	Coarse Aggregate	Fine Aggregate	Water	Age
Islands Genetic Algorithm	89.602	0.207	0.0801	0.0095	0.0033	0.3882	0.263	0.0488	28

The islands genetic algorithm obtained a concrete mixture with a maximal predicted compressive strength of 89.602 MPa.

5.6 Differential Evolution

Differential Evolution (DE) is a population-based search method for multidimensional real-valued functions. Similar to genetic algorithms, DE uses a population of solutions and creates new candidate solutions from parent solutions.

The main difference between genetic algorithms and differential evolution lies in how new candidate solutions are created. In differential evolution, new solutions are created by differential mutation of the population members. Three candidate solutions (e.g. a, b and c) are randomly selected and a mutant parameter vector is created using a simple arithmetic formula, y = a + F(b - c), where F is a positive scaling factor that typically ranges from 0 to 1. The formula is applied to each dimension of the parameter vector. The resulting mutant vector is then used for recombination.
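The mutation and recombination steps described above can be sketched in a few lines of base R. The example follows the classic "rand/1/bin" scheme (random base vector, one difference vector, binomial crossover). It is a minimal sketch under assumptions: the helper de_trial_vector() and its argument names are hypothetical, and the strategies implemented in DEoptim may differ in detail.

```r
# Hypothetical helper: build one trial vector for individual `i` of `pop`
# pop:    numeric matrix (rows = candidate solutions, columns = parameters)
# weight: scaling factor F in y = a + F(b - c)
# CR:     crossover probability
de_trial_vector <- function(pop, i, weight = 0.8, CR = 0.5) {
  n_dim <- ncol(pop)
  # Pick three distinct members other than the target individual
  idx <- sample(setdiff(seq_len(nrow(pop)), i), 3)
  xa <- pop[idx[1], ]; xb <- pop[idx[2], ]; xc <- pop[idx[3], ]
  mutant <- xa + weight * (xb - xc)   # differential mutation
  # Binomial crossover: each coordinate comes from the mutant with
  # probability CR; one randomly chosen coordinate always does
  take <- runif(n_dim) < CR
  take[sample.int(n_dim, 1)] <- TRUE
  ifelse(take, mutant, pop[i, ])
}

set.seed(1)
pop   <- matrix(runif(10 * 6), nrow = 10)  # 10 candidates, 6 parameters
trial <- de_trial_vector(pop, i = 1)
rbind(target = pop[1, ], trial = trial)
```

In a complete DE loop the trial vector replaces the target individual only if its objective value is at least as good. A trial vector can also fall outside the search bounds, so practical implementations add a bound-handling step that this sketch omits.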

Search for the global minimum of the 2D Ackley function with Differential Evolution. Source: Wikipedia

The DEoptim package will be used for searching optimal concrete mixtures with differential evolution (DE). The package provides the DEoptim() function whose main arguments are:

fn: function to assess the fitness of the solutions
lower and upper: lower and upper limits for each parameter
control: list of control parameters for the search method (population size, probability of crossover, maximum number of iterations, etc.)

# Set parameter settings for search algorithm
max_iter <- 500 # maximum number of iterations
pop_size <- 100 # population size

Before running the search method, a custom objective function is created.

# Create custom function for assessing solutions
eval_function <- function(x, model, min_water, max_water, age = 28) {

  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]; x5 <- x[5]; x6 <- x[6]
  
  # Create dataframe with proportion of each solid component
  solution_df <- data.frame(Cement = x1, 
                            Slag = x2, 
                            Ash = x3, 
                            Superplasticizer = x4, 
                            Coarse_Aggregate = x5, 
                            Fine_Aggregate = x6)
  
  # Calculate proportion of water
  solution_df$Water <- 1-rowSums(solution_df) # Water = 1-sum(solids)
  
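  # Apply a death penalty to solutions with water content outside the acceptable range
  # (the component sum is compared with 1 within a tolerance rather than with exact equality)
  if(solution_df$Water >= min_water && solution_df$Water <= max_water && abs(rowSums(solution_df) - 1) < 1e-8) {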

    # Add pre-defined age to temporary solution
    solution_df$Age <- age
    
    return(-predict(model, solution_df))
    
  } else {
    
    return(0)
  }

}
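The death-penalty logic can be tried out without the fitted model. The sketch below uses a stand-in scoring function in place of avNNet_model_final, so toy_score(), toy_objective() and the water limits of 0.04 and 0.07 are hypothetical values chosen only for illustration. Because the optimizer minimizes and predicted strengths are positive, a returned value of 0 is worse than any feasible solution.

```r
# Hypothetical stand-in for the fitted model: scores a mixture by its cement share
toy_score <- function(mix) 100 * mix[["Cement"]]

# Penalised objective (to be minimised): infeasible mixtures receive 0,
# feasible mixtures receive the negated score
toy_objective <- function(solids, min_water, max_water, tol = 1e-8) {
  water <- 1 - sum(solids)                  # Water = 1 - sum(solids)
  feasible <- water >= min_water - tol && water <= max_water + tol
  if (!feasible) return(0)                  # death penalty
  -toy_score(c(solids, Water = water))
}

ok  <- c(Cement = 0.20, Slag = 0.05, Ash = 0.01, Superplasticizer = 0.003,
         Coarse_Aggregate = 0.38, Fine_Aggregate = 0.30)
dry <- replace(ok, "Coarse_Aggregate", 0.43)  # leaves almost no room for water

toy_objective(ok,  min_water = 0.04, max_water = 0.07)  # feasible: negative score
toy_objective(dry, min_water = 0.04, max_water = 0.07)  # infeasible: 0
```

The small tolerance guards against rounding error in the floating-point sum, for the same reason that 0.1 + 0.2 == 0.3 evaluates to FALSE in R.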

set.seed(1)
n_cores <- detectCores()-1

DE_T0 <- Sys.time() # record start time

# Run differential evolution algorithm
DE_output <- DEoptim::DEoptim(
  fn = eval_function,
  lower = lower_limits,
  upper = upper_limits,
  control = DEoptim.control(
                            NP = pop_size, # population size
                            itermax = max_iter, # maximum number of iterations
                            CR = 0.5, # probability of crossover
                            F = 0.8, # differential weighting factor
                            storepopfreq = 1 , # store every population
                            parallelType = 1, # run parallel processing
                            trace = F
                            ),
  model = avNNet_model_final,
  min_water = minimum_water,
  max_water = maximum_water
  )

DE_T1 <- Sys.time() # record end time
(DE_Time <- DE_T1-DE_T0)

Time difference of 2.6333 mins

# Print search results
summary(DE_output)


***** summary of DEoptim object ***** 
best member   :  0.23665 0.04695 0 0.00223 0.35646 0.30419 
best value    :  -95.67 
after         :  500 generations 
fn evaluated  :  1002 times 
*************************************

DE_solution <- DE_output$optim$bestmem

# Save parameters of best solution
avNNet_DE <- data.frame(Method = "Differential Evolution",
                        Strength = abs(DE_output$optim$bestval),
                        Cement = DE_solution[1], 
                        Slag = DE_solution[2], 
                        Ash = DE_solution[3], 
                        Superplasticizer = DE_solution[4], 
                        Coarse_Aggregate = DE_solution[5], 
                        Fine_Aggregate = DE_solution[6]) %>% 
  mutate(Water = 1 - (Cement + Slag + Ash + Superplasticizer + Coarse_Aggregate + Fine_Aggregate), 
         Age = days_aging)

# Plot results
ggplot(mapping = aes(x = 1:length(DE_output$member$bestvalit), y = abs(DE_output$member$bestvalit))) +
    geom_line(col = "#08519c", size = 1) + 
    theme_bw() +
    theme(aspect.ratio = 0.7) +
    labs(x = "Iteration", y = "Compressive Strength (Predicted)", title = "Best predicted compressive strength at each iteration", subtitle = "Results using Differential Evolution")

# Tabulate best solution(s)
avNNet_DE %>% 
  arrange(desc(Strength)) %>% 
  knitr::kable(caption = "Summary of the best solution", align = "c", digits = 4, col.names = gsub("_", " ", colnames(avNNet_DE))) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Summary of the best solution
Method	Strength	Cement	Slag	Ash	Superplasticizer	Coarse Aggregate	Fine Aggregate	Water	Age
Differential Evolution	95.67	0.2366	0.047	0	0.0022	0.3565	0.3042	0.0535	28

Differential evolution found a concrete mixture with a maximal predicted compressive strength of 95.67 MPa after 500 iterations (DEoptim reports 1002 function evaluations).
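As a quick arithmetic check, the water proportion in the table can be recomputed from the best member printed by summary(DE_output), since water is defined as one minus the sum of the six solid components. The values below are copied from that printed output; the variable names are only illustrative.

```r
# Best member reported by DEoptim (rounded values from the printed summary)
best_member <- c(Cement = 0.23665, Slag = 0.04695, Ash = 0,
                 Superplasticizer = 0.00223,
                 Coarse_Aggregate = 0.35646, Fine_Aggregate = 0.30419)

# Water = 1 - sum(solids)
water <- 1 - sum(best_member)
round(water, 4)  # 0.0535, matching the Water column of the table
```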

5.7 Particle Swarm Optimization

Particle swarm optimization (PSO) is a population-based search method that belongs to the swarm intelligence family of algorithms. Proposed by Kennedy and Eberhart (1995), this method is inspired by the swarm behavior of animals such as bird flocks, fish schools and bee swarms.

The PSO method searches for the optimal solution by using a population of candidate solutions (also known as particles) that iteratively move across the search space. The movement of each particle is guided both by its own best known position and by the best known position found by other particles. As a result, the whole swarm moves in a self-organized way.

Each particle is defined by its:

position
fitness value
velocity
previous best position
previous best position in the neighbourhood

The position of a particle on the next iteration depends on its current position and velocity, while the velocity depends on all of the parameters that define the particle.
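The update described above can be written compactly for the classic PSO formulation, in which the new velocity combines the previous velocity, an attraction towards the particle's own best position and an attraction towards the best position in its neighbourhood. The sketch below is an illustration under assumptions: pso_step() is a hypothetical helper, the whole swarm is treated as a single neighbourhood, and the SPSO2011 variant used later in this section updates velocities differently.

```r
# Hypothetical helper: one classic PSO update for a whole swarm.
# x, v, p_best: matrices (rows = particles, columns = dimensions)
# g_best:       vector holding the best position known to the swarm
# w, c_p, c_g:  inertia and attraction coefficients (illustrative values)
pso_step <- function(x, v, p_best, g_best,
                     w = 1 / (2 * log(2)),
                     c_p = 0.5 + log(2), c_g = 0.5 + log(2)) {
  n     <- length(x)
  r_p   <- matrix(runif(n), nrow = nrow(x))
  r_g   <- matrix(runif(n), nrow = nrow(x))
  g_mat <- matrix(g_best, nrow = nrow(x), ncol = ncol(x), byrow = TRUE)
  v_new <- w * v + c_p * r_p * (p_best - x) + c_g * r_g * (g_mat - x)
  list(x = x + v_new, v = v_new)   # next position = current position + velocity
}

# Toy run: minimise the sphere function f(x) = sum(x^2) in two dimensions
set.seed(1)
f <- function(pos) rowSums(pos^2)
x <- matrix(runif(20 * 2, -5, 5), ncol = 2)
v <- matrix(0, nrow = 20, ncol = 2)
p_best <- x
p_val  <- f(x)
initial_best <- min(p_val)
for (iter in 1:50) {
  g_best <- p_best[which.min(p_val), ]
  step <- pso_step(x, v, p_best, g_best)
  x <- step$x; v <- step$v
  f_x <- f(x)
  improved <- f_x < p_val            # particles that beat their previous best
  p_best[improved, ] <- x[improved, ]
  p_val[improved]    <- f_x[improved]
}
c(initial = initial_best, final = min(p_val))
```

Personal bests are only overwritten when a particle improves, so the best value found by the swarm can never get worse from one iteration to the next.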

A swarm of particles searching for the global minimum. Source: Wikipedia

The pso package will be used to optimize the concrete mixture using particle swarm optimization through the psoptim() function. Before running the search method, a custom objective function is created.

# Set parameter settings for search algorithm
max_iter <- 500 # maximum number of iterations
pop_size <- 100 # population size

# Create custom function for assessing solutions
eval_function <- function(x, model, min_water, max_water, age = 28) {

  x1 <- x[1]; x2 <- x[2]; x3 <- x[3]; x4 <- x[4]; x5 <- x[5]; x6 <- x[6]
  
  # Create dataframe with proportion of each solid component
  solution_df <- data.frame(Cement = x1, 
                            Slag = x2, 
                            Ash = x3, 
                            Superplasticizer = x4, 
                            Coarse_Aggregate = x5, 
                            Fine_Aggregate = x6)
  
  # Calculate proportion of water
  solution_df$Water <- 1-rowSums(solution_df) # Water = 1-sum(solids)
  
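  # Apply a death penalty to solutions with water content outside the acceptable range
  # (the component sum is compared with 1 within a tolerance rather than with exact equality)
  if(solution_df$Water >= min_water && solution_df$Water <= max_water && abs(rowSums(solution_df) - 1) < 1e-8) {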

    # Add pre-defined age to temporary solution
    solution_df$Age <- age
    
    return(-predict(model, solution_df))
    
  } else {
    
    return(0)
  }

}

The psoptim() function from the pso package is used to run the search algorithm. The SPSO2011 method will be used for this search.

set.seed(1)
n_cores <- detectCores()-1

PSO_T0 <- Sys.time() # record start time

# Run particle swarm optimization algorithm
PSO_output <- pso::psoptim(
  par = rep(NA, 6),
  fn = eval_function,
  lower = lower_limits,
  upper = upper_limits,
  control = list(
                trace = 1, #  produce tracing information on the progress of the optimization
                maxit = max_iter, # maximum number of iterations
                REPORT = 1, #  frequency for reports
                trace.stats = T,
                s = pop_size, # Swarm Size,
                maxit.stagnate = round(0.75*max_iter), # maximum number of iterations without improvement
                vectorize = T,
                type = "SPSO2011" # method used
                ),
  model = avNNet_model_final,
  min_water = minimum_water,
  max_water = maximum_water
  )

PSO_T1 <- Sys.time() # record end time
(PSO_Time <- PSO_T1-PSO_T0)

Time difference of 1.8913 mins

PSO_solution <- PSO_output$par

avNNet_PSO <- data.frame(Method = "Particle Swarm Optimization",
                        Strength = abs(PSO_output$value),
                        Cement = PSO_solution[1],
                        Slag = PSO_solution[2],
                        Ash = PSO_solution[3],
                        Superplasticizer = PSO_solution[4],
                        Coarse_Aggregate = PSO_solution[5],
                        Fine_Aggregate = PSO_solution[6]) %>%
  mutate(Water = 1 - (Cement + Slag + Ash + Superplasticizer + Coarse_Aggregate + Fine_Aggregate),
         Age = days_aging)

# Plot results
PSO_summary <- data.frame(
                          Iteration = PSO_output$stats$it,
                          Mean = PSO_output$stats$f %>% sapply(FUN = mean) %>% abs(),
                          Median = PSO_output$stats$f %>% sapply(FUN = median) %>% abs(),
                          Best = PSO_output$stats$error %>% sapply(FUN = min) %>% abs()
                          )

PSO_summary %>% 
  gather(key = "Parameter", value = "Value", -Iteration) %>% 
  ggplot(mapping = aes(x = Iteration, y = abs(Value), col = Parameter)) +
    geom_line(size = 0.7) +
    theme_bw() +
    theme(aspect.ratio = 0.9) +
    labs(x = "Iteration", y = "Compressive Strength (Predicted)", title = "Mean, median and best predicted compressive strength at each iteration", subtitle = "Results using Particle Swarm Optimization") +
    scale_color_brewer(type = "qual", palette = "Set1")

# Tabulate best solution(s)
avNNet_PSO %>% 
  arrange(desc(Strength)) %>% 
  knitr::kable(caption = "Summary of the best solution", align = "c", digits = 4, col.names = gsub("_", " ", colnames(avNNet_PSO))) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Summary of the best solution
Method	Strength	Cement	Slag	Ash	Superplasticizer	Coarse Aggregate	Fine Aggregate	Water	Age
Particle Swarm Optimization	88.695	0.1814	0.0671	0.01	0.0045	0.3743	0.3036	0.0591	28

Particle swarm optimization found a concrete mixture with a maximal predicted compressive strength of 88.695 MPa after 500 iterations and 50000 function evaluations.

6 Analysing Solutions

The table below summarises the values of predicted compressive strength for each method used and their respective concrete mixtures.

predictor_ID <- c("Cement", "Slag", "Ash", "Superplasticizer", "Coarse_Aggregate", "Fine_Aggregate", "Water")

summary_table <- bind_rows(avNNet_GS, avNNet_RS, avNNet_SA, avNNet_GA, avNNet_GAISL, avNNet_DE, avNNet_PSO) 

summary_table %>% 
  arrange(desc(Strength)) %>% 
  knitr::kable(caption = "Summary of the best solutions.", align = "c", digits = 3, col.names = gsub("_", " ", colnames(summary_table))) %>%
  kableExtra::kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "center")

Summary of the best solutions.
Method	Strength	Cement	Slag	Ash	Superplasticizer	Coarse Aggregate	Fine Aggregate	Water	Age
Differential Evolution	95.670	0.237	0.047	0.000	0.002	0.356	0.304	0.054	28
Grid Search	93.807	0.225	0.046	0.005	0.002	0.360	0.306	0.055	28
Grid Search	93.615	0.225	0.046	0.011	0.002	0.360	0.306	0.049	28
Grid Search	92.516	0.237	0.056	0.000	0.002	0.349	0.306	0.050	28
Grid Search	92.454	0.237	0.046	0.005	0.002	0.360	0.294	0.055	28
Grid Search	92.360	0.225	0.056	0.000	0.002	0.360	0.306	0.051	28
Simulated Annealing	90.599	0.204	0.068	0.000	0.003	0.371	0.298	0.058	28
Random Search	90.205	0.226	0.054	0.010	0.001	0.354	0.306	0.049	28
Random Search	89.937	0.226	0.042	0.010	0.001	0.364	0.306	0.050	28
Random Search	89.775	0.226	0.054	0.010	0.001	0.354	0.306	0.049	28
Simulated Annealing	89.706	0.200	0.088	0.018	0.005	0.329	0.307	0.052	28
Simulated Annealing	89.706	0.200	0.088	0.018	0.005	0.329	0.307	0.052	28
Islands Genetic Algorithm	89.602	0.207	0.080	0.009	0.003	0.388	0.263	0.049	28
Random Search	89.535	0.226	0.054	0.010	0.001	0.354	0.306	0.050	28
Random Search	89.013	0.219	0.042	0.010	0.001	0.364	0.306	0.057	28
Genetic Algorithm	88.700	0.181	0.067	0.010	0.005	0.374	0.304	0.059	28
Particle Swarm Optimization	88.695	0.181	0.067	0.010	0.005	0.374	0.304	0.059	28
Simulated Annealing	87.805	0.191	0.068	0.013	0.005	0.376	0.288	0.060	28
Simulated Annealing	87.805	0.191	0.068	0.013	0.005	0.376	0.288	0.060	28

The PCA plot below shows how the best solutions of each search method differ and cluster.

PCA <- prcomp(summary_table[, predictor_ID], center = TRUE, scale. = TRUE) # Age is excluded from the PCA matrix because it has zero variance

autoplot(PCA, 
         data = summary_table, 
         size = "Strength",
         colour = "Method", alpha = 0.7, 
         loadings = TRUE, loadings.label = TRUE, loadings.colour = "grey10", loadings.label.colour = "grey10", loadings.label.size = 3, 
         loadings.label.label = c("Cement", "Slag", "Ash", "Superplasticizer", "Coarse Aggregate", "Fine Aggregate", "Water")) +
  scale_color_brewer(type = "qual", palette = "Set1") +
  labs(title = "PCA plot of best concrete mixtures for each search method", subtitle = "Strength values at 28 days.", size = "Predicted Strength\n(MPa)", caption = NULL)  +
  theme_bw() +
  theme(aspect.ratio = 0.9, legend.text = element_text(size = 8.5), legend.title = element_text(face = "bold", size = 9), axis.title = element_text(size = 9))

The PCA plot above shows that the solutions found by the different search methods differ in composition. Solutions with higher predicted compressive strength tend to contain a larger proportion of cement.

The solution obtained with differential evolution had the highest predicted strength and, based on the PCA plot above, its composition is similar to that of the solutions obtained with grid and random search.
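The comment in the prcomp() call notes that Age is left out of the PCA matrix because it has zero variance. The short sketch below shows what happens otherwise, using a hypothetical toy data frame whose Cement and Water values are taken from four rows of the summary table: scaling a constant column to unit variance is not possible, so prcomp() stops with an error.

```r
# Toy data: two varying components plus a constant Age column
toy <- data.frame(Cement = c(0.237, 0.225, 0.204, 0.181),
                  Water  = c(0.054, 0.055, 0.058, 0.059),
                  Age    = c(28, 28, 28, 28))

# Including the constant column makes scaling fail; capture the error message
with_age <- tryCatch(prcomp(toy, center = TRUE, scale. = TRUE),
                     error = function(e) conditionMessage(e))
with_age

# Dropping zero-variance columns first lets the PCA run
keep <- vapply(toy, function(column) var(column) > 0, logical(1))
pca  <- prcomp(toy[, keep], center = TRUE, scale. = TRUE)
summary(pca)
```

Selecting columns by their variance, rather than by name, keeps the same code usable if further constant columns are added to the table.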

7 References

Cortez, P. (2014). Modern Optimization with R. doi:10.1007/978-3-319-08263-9
Scrucca, L. (2013). GA: A Package for Genetic Algorithms in R. Journal of Statistical Software, 53(4), 1-37. doi:10.18637/jss.v053.i04
Storn, R., Price, K. (1997). Differential evolution - a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341-359. doi:10.1023/A:1008202821328
Mullen, K., Ardia, D., Gil, D., Windover, D., Cline, J. (2011). DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1-26. doi:10.18637/jss.v040.i06
Kennedy, J., Eberhart, R. (1995). Particle Swarm Optimization. Proceedings of IEEE International Conference on Neural Networks, IV, 1942-1948. doi:10.1109/ICNN.1995.488968
Bonyadi, M. R., Michalewicz, Z. (2016). Analysis of Stability, Local Convergence, and Transformation Sensitivity of a Variant of the Particle Swarm Optimization Algorithm. IEEE Transactions on Evolutionary Computation, 20(3), 370-385. doi:10.1109/TEVC.2015.2460753

8 Additional Information

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.1252 
[2] LC_CTYPE=English_United Kingdom.1252   
[3] LC_MONETARY=English_United Kingdom.1252
[4] LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] kableExtra_1.0.1  knitr_1.22        broom_0.5.1      
 [4] ggfortify_0.4.5   GGally_1.4.0      DEoptim_2.2-4    
 [7] GenSA_1.1.7       GA_3.2            pso_1.0.3        
[10] gridExtra_2.3     doParallel_1.0.14 iterators_1.0.10 
[13] foreach_1.4.4     readxl_1.3.0      tidyselect_0.2.5 
[16] forcats_0.4.0     stringr_1.4.0     dplyr_0.8.0.1    
[19] purrr_0.3.0       readr_1.3.1       tidyr_0.8.3      
[22] tibble_2.0.1      tidyverse_1.2.1   caret_6.0-81     
[25] ggplot2_3.1.0     lattice_0.20-38   plyr_1.8.4       
[28] pacman_0.5.1     

loaded via a namespace (and not attached):
 [1] httr_1.4.0         viridisLite_0.3.0  jsonlite_1.6      
 [4] splines_3.5.1      prodlim_2018.04.18 modelr_0.1.4      
 [7] assertthat_0.2.0   highr_0.7          stats4_3.5.1      
[10] cellranger_1.1.0   yaml_2.2.0         ipred_0.9-8       
[13] pillar_1.3.1       backports_1.1.3    glue_1.3.0        
[16] digest_0.6.18      RColorBrewer_1.1-2 rvest_0.3.2       
[19] colorspace_1.4-0   recipes_0.1.4      htmltools_0.3.6   
[22] Matrix_1.2-15      timeDate_3043.102  pkgconfig_2.0.2   
[25] haven_2.1.0        webshot_0.5.1      scales_1.0.0      
[28] gower_0.1.2        lava_1.6.5         proxy_0.4-22      
[31] generics_0.0.2     withr_2.1.2        nnet_7.3-12       
[34] lazyeval_0.2.1     cli_1.0.1          survival_2.43-3   
[37] magrittr_1.5       crayon_1.3.4       evaluate_0.13     
[40] fansi_0.4.0        nlme_3.1-137       MASS_7.3-51.1     
[43] xml2_1.2.0         class_7.3-15       tools_3.5.1       
[46] data.table_1.12.0  hms_0.4.2          munsell_0.5.0     
[49] compiler_3.5.1     rlang_0.3.1        grid_3.5.1        
[52] rstudioapi_0.9.0   labeling_0.3       rmarkdown_1.12    
[55] gtable_0.2.0       ModelMetrics_1.2.2 codetools_0.2-16  
[58] reshape_0.8.8      reshape2_1.4.3     R6_2.4.0          
[61] lubridate_1.7.4    utf8_1.1.4         stringi_1.3.1     
[64] Rcpp_1.0.0         rpart_4.1-13       xfun_0.5

Exploring the Use of Modern Search Methods in R for Maximizing the Compressive Strength of Concrete

Use and Comparison of Local and Population-based Search Methods for Concrete Mixture Optimization

Jean Dos Santos

04 April 2019