Concrete is an integral part of most industrialized societies. It is used to some extent in nearly all structures and in many roads. One of the main properties of interest (besides cost) is the compressive strength of the hardened concrete. The composition of many concretes includes a number of dry ingredients that are mixed with water and then allowed to cure and harden. Given its abundance and critical role in infrastructure, the composition of concrete is important and has been widely studied. The objective of this assignment is to create models that help find potential recipes that maximize compressive strength.
A standard type of experimental setup for this scenario is called a mixture design. Here, lower and upper bounds on the proportion of each ingredient are used to create multiple mixtures that methodically fill the space within those bounds. The ingredients used in the experimental setup were cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate.
There is also an additional non-mixture factor related to compressive strength: the age of the mixture (at testing). Since this is not an ingredient, it is usually referred to as a process factor.
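To make the space-filling idea concrete, a candidate set of mixtures within such bounds could be enumerated as follows (a minimal sketch in base R; the three ingredients, bounds and step sizes are illustrative, not those of the actual experiment):
# hypothetical lower/upper bounds (kg per cubic metre) for three ingredients
candidates <- expand.grid(Cement = seq(150, 500, by = 50),
                          Water  = seq(140, 220, by = 20),
                          FlyAsh = seq(0, 200, by = 50))
# keep only mixtures whose combined mass stays within a plausible limit
candidates <- subset(candidates, Cement + Water + FlyAsh < 800)
head(candidates)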
Create a predictive model for the compressive strength of different concrete mixtures. The trainingdata.csv data set can be used to train the model. The testdata.csv data set contains the mixtures for which the compressive strength needs to be predicted. Write the compressive strength of the mixtures to a file with write.csv such that it contains only the index and the compressive strength. This can be done by creating a data frame with only the index and the compressive strength (say, predictions) and then issuing the command:
write.csv(predictions, file = "predictions.csv", row.names = FALSE)
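For concreteness, the predictions data frame could be built like this (a minimal sketch; idx and pred are hypothetical names for the test-set index column and the predicted strengths):
# idx: test-set index column; pred: predicted compressive strengths
predictions <- data.frame(index = idx, CompressiveStrength = pred)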
Below is a brief summary of the methods I used, including their effect on the predictions.
Decrease in model accuracy:
Increase in model accuracy:
I think it is remarkable that all the adjustments to the training data led to a decrease in model accuracy. In the end, putting all the data into the most complex model gave the best results.
library(doParallel)
# set working directory
setwd("C:/directories/r_directory/pdm")
# start core cluster (leave one core free)
n_cores <- detectCores() - 1
cl <- makeCluster(n_cores)
registerDoParallel(cl)
# import training data
dataset <- read.csv("trainingdata.csv")
# set predictor values
predictors <- dataset[,1:8]
# set response values
response <- dataset[,9]
# import test data
testdata <- read.csv("testdata.csv")
# save index
index_testdata <- testdata[,1]
# test data without index
testdata <- testdata[,2:9]
library(skimr)
library(DataExplorer)
library(ggthemes)
library(ggplot2)
library(GGally)
library(PerformanceAnalytics)
# general overview data set
head(dataset)
## Cement BlastFurnaceSlag FlyAsh Water Superplasticizer CoarseAggregate
## 1 540.0 0.0 0 162 2.5 1055
## 2 332.5 142.5 0 228 0.0 932
## 3 332.5 142.5 0 228 0.0 932
## 4 266.0 114.0 0 228 0.0 932
## 5 380.0 95.0 0 228 0.0 932
## 6 266.0 114.0 0 228 0.0 932
## FineAggregate Age CompressiveStrength
## 1 676 28 61.89
## 2 594 270 40.27
## 3 594 365 41.05
## 4 670 90 47.03
## 5 594 365 43.70
## 6 670 28 45.85
skim(dataset)
| Name | dataset |
| Number of rows | 900 |
| Number of columns | 9 |
| _______________________ | |
| Column type frequency: | |
| numeric | 9 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| Cement | 0 | 1 | 281.14 | 105.76 | 102.00 | 190.70 | 266.10 | 350.00 | 540.0 | ▆▇▆▃▂ |
| BlastFurnaceSlag | 0 | 1 | 73.00 | 85.67 | 0.00 | 0.00 | 21.00 | 142.50 | 359.4 | ▇▂▃▁▁ |
| FlyAsh | 0 | 1 | 54.93 | 64.21 | 0.00 | 0.00 | 0.00 | 118.30 | 200.1 | ▇▁▂▂▁ |
| Water | 0 | 1 | 181.58 | 21.37 | 121.80 | 164.90 | 185.00 | 192.00 | 247.0 | ▁▅▇▂▁ |
| Superplasticizer | 0 | 1 | 6.25 | 5.97 | 0.00 | 0.00 | 6.50 | 10.30 | 32.2 | ▇▇▁▁▁ |
| CoarseAggregate | 0 | 1 | 973.29 | 78.09 | 801.00 | 932.00 | 968.00 | 1029.40 | 1145.0 | ▃▅▇▅▂ |
| FineAggregate | 0 | 1 | 773.16 | 80.84 | 594.00 | 728.68 | 778.90 | 824.00 | 992.6 | ▂▃▇▃▁ |
| Age | 0 | 1 | 45.61 | 63.36 | 1.00 | 7.00 | 28.00 | 56.00 | 365.0 | ▇▁▁▁▁ |
| CompressiveStrength | 0 | 1 | 35.55 | 16.69 | 2.33 | 23.61 | 33.95 | 45.37 | 82.6 | ▅▇▇▃▂ |
summary(dataset)
## Cement BlastFurnaceSlag FlyAsh Water
## Min. :102.0 Min. : 0.0 Min. : 0.00 Min. :121.8
## 1st Qu.:190.7 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.:164.9
## Median :266.1 Median : 21.0 Median : 0.00 Median :185.0
## Mean :281.1 Mean : 73.0 Mean : 54.93 Mean :181.6
## 3rd Qu.:350.0 3rd Qu.:142.5 3rd Qu.:118.30 3rd Qu.:192.0
## Max. :540.0 Max. :359.4 Max. :200.10 Max. :247.0
## Superplasticizer CoarseAggregate FineAggregate Age
## Min. : 0.000 Min. : 801.0 Min. :594.0 Min. : 1.00
## 1st Qu.: 0.000 1st Qu.: 932.0 1st Qu.:728.7 1st Qu.: 7.00
## Median : 6.500 Median : 968.0 Median :778.9 Median : 28.00
## Mean : 6.246 Mean : 973.3 Mean :773.2 Mean : 45.61
## 3rd Qu.:10.300 3rd Qu.:1029.4 3rd Qu.:824.0 3rd Qu.: 56.00
## Max. :32.200 Max. :1145.0 Max. :992.6 Max. :365.00
## CompressiveStrength
## Min. : 2.33
## 1st Qu.:23.61
## Median :33.95
## Mean :35.55
## 3rd Qu.:45.37
## Max. :82.60
# pair plot
ggpairs(dataset)
# histograms
plot_histogram(dataset, ggtheme=theme_few())
# box plots
boxplot(dataset, col = "blue")
# scatter plots by CompressiveStrength
plot_scatterplot(dataset, by = "CompressiveStrength", ggtheme=theme_few())
# correlation plot
plot_correlation(dataset, ggtheme=theme_few())
# count zeros per column, and express them as a percentage of the total number of rows
zero_count <- colSums(dataset==0)
zero_count
## Cement BlastFurnaceSlag FlyAsh Water
## 0 417 490 0
## Superplasticizer CoarseAggregate FineAggregate Age
## 327 0 0 0
## CompressiveStrength
## 0
zero_pct <- colSums(dataset==0)/nrow(dataset)*100
zero_pct
## Cement BlastFurnaceSlag FlyAsh Water
## 0.00000 46.33333 54.44444 0.00000
## Superplasticizer CoarseAggregate FineAggregate Age
## 36.33333 0.00000 0.00000 0.00000
## CompressiveStrength
## 0.00000
# skewness
skewness(dataset)
## Cement BlastFurnaceSlag FlyAsh Water Superplasticizer
## Skewness 0.5247352 0.7951561 0.5148602 0.05496505 0.896554
## CoarseAggregate FineAggregate Age CompressiveStrength
## Skewness -0.01441195 -0.2375712 3.224019 0.4450269
plot_qq(dataset, ggtheme=theme_few())
library(caret)
library(mlbench)
library(tidyverse)
library(dplyr)
library(dlookr)
# check if there is near zero variance
nzv <- nearZeroVar(predictors, saveMetrics = TRUE)
nzv
## freqRatio percentUnique zeroVar nzv
## Cement 1.125000 29.555556 FALSE FALSE
## BlastFurnaceSlag 15.444444 19.222222 FALSE FALSE
## FlyAsh 30.625000 16.333333 FALSE FALSE
## Water 2.191489 20.777778 FALSE FALSE
## Superplasticizer 10.218750 11.777778 FALSE FALSE
## CoarseAggregate 1.225000 30.000000 FALSE FALSE
## FineAggregate 1.000000 32.222222 FALSE FALSE
## Age 3.066116 1.555556 FALSE FALSE
# no near-zero variance, so we don't use nzv as a preProc method (see the preProcess sketch after the correlation check)
# check if there are linear dependencies
lindep <- findLinearCombos(predictors)
lindep
## $linearCombos
## list()
##
## $remove
## NULL
#no linear dependencies
# check for significant pairwise correlations at a .75 cutoff, to confirm what we saw in the plots earlier
cor <- cor(predictors)
cor
## Cement BlastFurnaceSlag FlyAsh Water
## Cement 1.00000000 -0.27613504 -0.40435177 -0.07700660
## BlastFurnaceSlag -0.27613504 1.00000000 -0.31911721 0.09937857
## FlyAsh -0.40435177 -0.31911721 1.00000000 -0.23894626
## Water -0.07700660 0.09937857 -0.23894626 1.00000000
## Superplasticizer 0.08073602 0.05986934 0.36711238 -0.65179537
## CoarseAggregate -0.10723333 -0.28199425 -0.01368126 -0.18203813
## FineAggregate -0.23137835 -0.26633897 0.07629013 -0.45991660
## Age 0.08642213 -0.04364486 -0.16027293 0.27899737
## Superplasticizer CoarseAggregate FineAggregate Age
## Cement 0.08073602 -0.107233332 -0.23137835 0.086422133
## BlastFurnaceSlag 0.05986934 -0.281994254 -0.26633897 -0.043644863
## FlyAsh 0.36711238 -0.013681262 0.07629013 -0.160272926
## Water -0.65179537 -0.182038132 -0.45991660 0.278997375
## Superplasticizer 1.00000000 -0.266468752 0.22047749 -0.192093136
## CoarseAggregate -0.26646875 1.000000000 -0.18317900 -0.004315883
## FineAggregate 0.22047749 -0.183178995 1.00000000 -0.157180793
## Age -0.19209314 -0.004315883 -0.15718079 1.000000000
high_cor <- findCorrelation(cor, cutoff = .75)
high_cor
## integer(0)
# no features exceed the .75 correlation cutoff, confirming the conclusion from the correlation plot. We will not use corr as a preProc method and we will keep all the features
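Had either filter been needed, both can be requested directly from caret's preProcess (a minimal sketch, not run here because no predictors were flagged):
# "nzv" drops near-zero-variance predictors; "corr" drops one of each highly
# correlated pair at the given cutoff
pp <- preProcess(predictors, method = c("nzv", "corr"), cutoff = .75)
predictors_filtered <- predict(pp, predictors)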
# feature importance using Recursive Feature Elimination (RFE)
set.seed(123)
ctrl <- rfeControl(functions = rfFuncs,
                   method = "cv",
                   number = 10,
                   verbose = FALSE)
rfProfile <- rfe(x = predictors,
y = response,
sizes = c(1:8),
rfeControl = ctrl)
rfProfile
##
## Recursive feature selection
##
## Outer resampling method: Cross-Validated (10 fold)
##
## Resampling performance over subset size:
##
## Variables RMSE Rsquared MAE RMSESD RsquaredSD MAESD Selected
## 1 12.820 0.4061 10.190 1.0823 0.10511 0.9413
## 2 8.992 0.7101 7.035 0.9194 0.06239 0.7053
## 3 7.292 0.8216 5.638 0.8096 0.05095 0.3940
## 4 6.527 0.8729 5.070 0.6500 0.03824 0.2964
## 5 6.401 0.8896 5.021 0.6837 0.03395 0.3438
## 6 5.189 0.9125 3.858 0.7156 0.02927 0.2802 *
## 7 5.390 0.9093 3.986 0.7370 0.03216 0.2763
## 8 5.552 0.9065 4.138 0.7181 0.03302 0.2689
##
## The top 5 variables (out of 6):
## Age, Cement, Water, FineAggregate, BlastFurnaceSlag
predictors(rfProfile)
## [1] "Age" "Cement" "Water" "FineAggregate"
## [5] "BlastFurnaceSlag" "Superplasticizer"
plot(rfProfile, type = c("g", "o"))
# based on the RFE, Age, Cement, Water, FineAggregate and BlastFurnaceSlag are the most important features (six features were selected in total)
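As an aside, a model restricted to the RFE-selected subset could be fit as follows (a minimal sketch, not part of the original analysis; rfSubsetFit is a hypothetical name):
# refit a random forest using only the features selected by RFE
top_features <- predictors(rfProfile)
rfSubsetFit <- train(x = predictors[, top_features],
                     y = response,
                     method = "rf",
                     trControl = trainControl(method = "cv", number = 10))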
# set training parameters
ctrl <- trainControl(method = "repeatedcv",
number = 20,
repeats = 20)
ctrl_cv <- trainControl(method = "cv",
number = 20)
ctrl_xgb <- trainControl(method = "cv",
number = 20,
search = "grid")
# neural networks
set.seed(123)
nnetGrid <- expand.grid(.decay = c(0, 0.01, .1),
.size = c(1:10),
.bag = FALSE)
nnetGrid
## .decay .size .bag
## 1 0.00 1 FALSE
## 2 0.01 1 FALSE
## 3 0.10 1 FALSE
## 4 0.00 2 FALSE
## 5 0.01 2 FALSE
## 6 0.10 2 FALSE
## 7 0.00 3 FALSE
## 8 0.01 3 FALSE
## 9 0.10 3 FALSE
## 10 0.00 4 FALSE
## 11 0.01 4 FALSE
## 12 0.10 4 FALSE
## 13 0.00 5 FALSE
## 14 0.01 5 FALSE
## 15 0.10 5 FALSE
## 16 0.00 6 FALSE
## 17 0.01 6 FALSE
## 18 0.10 6 FALSE
## 19 0.00 7 FALSE
## 20 0.01 7 FALSE
## 21 0.10 7 FALSE
## 22 0.00 8 FALSE
## 23 0.01 8 FALSE
## 24 0.10 8 FALSE
## 25 0.00 9 FALSE
## 26 0.01 9 FALSE
## 27 0.10 9 FALSE
## 28 0.00 10 FALSE
## 29 0.01 10 FALSE
## 30 0.10 10 FALSE
nnetFit <- train(predictors,
response,
method = "avNNet",
tuneGrid = nnetGrid,
trControl = ctrl_cv,
linout = TRUE,
trace = FALSE,
MaxNWts = 10 * (ncol(predictors) + 1) + 10 + 1,
maxit = 500,
preProc = c("center", "scale"))
nnetFit
## Model Averaged Neural Network
##
## 900 samples
## 8 predictor
##
## Pre-processing: centered (8), scaled (8)
## Resampling: Cross-Validated (20 fold)
## Summary of sample sizes: 855, 855, 856, 856, 856, 854, ...
## Resampling results across tuning parameters:
##
## decay size RMSE Rsquared MAE
## 0.00 1 9.381062 0.6911776 7.002066
## 0.00 2 6.740511 0.8377439 5.191630
## 0.00 3 6.208132 0.8611464 4.778825
## 0.00 4 5.802836 0.8799794 4.447783
## 0.00 5 5.524865 0.8935214 4.179284
## 0.00 6 5.388293 0.8970321 4.054678
## 0.00 7 5.298037 0.9024527 3.871275
## 0.00 8 10.456101 0.8627619 4.721305
## 0.00 9 5.118550 0.9067706 3.693407
## 0.00 10 6.258228 0.8607140 4.074554
## 0.01 1 9.416876 0.6920283 7.095914
## 0.01 2 6.897797 0.8300779 5.300253
## 0.01 3 6.183066 0.8647398 4.757632
## 0.01 4 5.729807 0.8837714 4.434294
## 0.01 5 5.309166 0.9004170 4.040946
## 0.01 6 5.293512 0.9008021 4.029641
## 0.01 7 5.204859 0.9044074 3.862407
## 0.01 8 4.952935 0.9132856 3.681698
## 0.01 9 4.878235 0.9163351 3.656338
## 0.01 10 4.995252 0.9110841 3.787124
## 0.10 1 9.383546 0.6909523 7.029851
## 0.10 2 6.924770 0.8274929 5.358764
## 0.10 3 6.239736 0.8600777 4.815771
## 0.10 4 5.696332 0.8843682 4.352744
## 0.10 5 5.446549 0.8950531 4.152041
## 0.10 6 5.329909 0.8996663 4.011057
## 0.10 7 5.106442 0.9083157 3.872992
## 0.10 8 5.014059 0.9106457 3.752738
## 0.10 9 4.951479 0.9131644 3.716294
## 0.10 10 4.918789 0.9145168 3.683385
##
## Tuning parameter 'bag' was held constant at a value of FALSE
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were size = 9, decay = 0.01 and bag = FALSE.
plot(nnetFit)
plot(varImp(nnetFit))
# random forests
set.seed(123)
rfGrid <- data.frame(mtry = 1:ncol(predictors))
rfGrid
## mtry
## 1 1
## 2 2
## 3 3
## 4 4
## 5 5
## 6 6
## 7 7
## 8 8
rfFit <- train(x = predictors,
               y = response,
               method = "rf",
               tuneGrid = rfGrid,
               ntree = 2000,
               importance = TRUE,
               trControl = ctrl_cv)
rfFit
## Random Forest
##
## 900 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (20 fold)
## Summary of sample sizes: 855, 855, 856, 856, 856, 854, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 1 7.264047 0.8594038 5.738772
## 2 5.320665 0.9150545 3.962373
## 3 4.886979 0.9219525 3.554419
## 4 4.774547 0.9223704 3.442779
## 5 4.743583 0.9217818 3.391820
## 6 4.736136 0.9213526 3.367641
## 7 4.744854 0.9206342 3.362591
## 8 4.759586 0.9198757 3.372045
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 6.
plot(rfFit)
rfImp <- varImp(rfFit, competes = FALSE)
plot(rfImp)
# gradient boosting machines
set.seed(123)
gbmGrid <- expand.grid(interaction.depth = seq(1, 7, by = 2),
n.trees = seq(100, 2000, by = 50),
n.minobsinnode = 11,
shrinkage = c(0.01, 0.1))
gbmGrid
## interaction.depth n.trees n.minobsinnode shrinkage
## 1 1 100 11 0.01
## 2 3 100 11 0.01
## 3 5 100 11 0.01
## 4 7 100 11 0.01
## 5 1 150 11 0.01
## 6 3 150 11 0.01
## 7 5 150 11 0.01
## 8 7 150 11 0.01
## 9 1 200 11 0.01
## 10 3 200 11 0.01
## 11 5 200 11 0.01
## 12 7 200 11 0.01
## 13 1 250 11 0.01
## 14 3 250 11 0.01
## 15 5 250 11 0.01
## 16 7 250 11 0.01
## 17 1 300 11 0.01
## 18 3 300 11 0.01
## 19 5 300 11 0.01
## 20 7 300 11 0.01
## 21 1 350 11 0.01
## 22 3 350 11 0.01
## 23 5 350 11 0.01
## 24 7 350 11 0.01
## 25 1 400 11 0.01
## 26 3 400 11 0.01
## 27 5 400 11 0.01
## 28 7 400 11 0.01
## 29 1 450 11 0.01
## 30 3 450 11 0.01
## 31 5 450 11 0.01
## 32 7 450 11 0.01
## 33 1 500 11 0.01
## 34 3 500 11 0.01
## 35 5 500 11 0.01
## 36 7 500 11 0.01
## 37 1 550 11 0.01
## 38 3 550 11 0.01
## 39 5 550 11 0.01
## 40 7 550 11 0.01
## 41 1 600 11 0.01
## 42 3 600 11 0.01
## 43 5 600 11 0.01
## 44 7 600 11 0.01
## 45 1 650 11 0.01
## 46 3 650 11 0.01
## 47 5 650 11 0.01
## 48 7 650 11 0.01
## 49 1 700 11 0.01
## 50 3 700 11 0.01
## 51 5 700 11 0.01
## 52 7 700 11 0.01
## 53 1 750 11 0.01
## 54 3 750 11 0.01
## 55 5 750 11 0.01
## 56 7 750 11 0.01
## 57 1 800 11 0.01
## 58 3 800 11 0.01
## 59 5 800 11 0.01
## 60 7 800 11 0.01
## 61 1 850 11 0.01
## 62 3 850 11 0.01
## 63 5 850 11 0.01
## 64 7 850 11 0.01
## 65 1 900 11 0.01
## 66 3 900 11 0.01
## 67 5 900 11 0.01
## 68 7 900 11 0.01
## 69 1 950 11 0.01
## 70 3 950 11 0.01
## 71 5 950 11 0.01
## 72 7 950 11 0.01
## 73 1 1000 11 0.01
## 74 3 1000 11 0.01
## 75 5 1000 11 0.01
## 76 7 1000 11 0.01
## 77 1 1050 11 0.01
## 78 3 1050 11 0.01
## 79 5 1050 11 0.01
## 80 7 1050 11 0.01
## 81 1 1100 11 0.01
## 82 3 1100 11 0.01
## 83 5 1100 11 0.01
## 84 7 1100 11 0.01
## 85 1 1150 11 0.01
## 86 3 1150 11 0.01
## 87 5 1150 11 0.01
## 88 7 1150 11 0.01
## 89 1 1200 11 0.01
## 90 3 1200 11 0.01
## 91 5 1200 11 0.01
## 92 7 1200 11 0.01
## 93 1 1250 11 0.01
## 94 3 1250 11 0.01
## 95 5 1250 11 0.01
## 96 7 1250 11 0.01
## 97 1 1300 11 0.01
## 98 3 1300 11 0.01
## 99 5 1300 11 0.01
## 100 7 1300 11 0.01
## 101 1 1350 11 0.01
## 102 3 1350 11 0.01
## 103 5 1350 11 0.01
## 104 7 1350 11 0.01
## 105 1 1400 11 0.01
## 106 3 1400 11 0.01
## 107 5 1400 11 0.01
## 108 7 1400 11 0.01
## 109 1 1450 11 0.01
## 110 3 1450 11 0.01
## 111 5 1450 11 0.01
## 112 7 1450 11 0.01
## 113 1 1500 11 0.01
## 114 3 1500 11 0.01
## 115 5 1500 11 0.01
## 116 7 1500 11 0.01
## 117 1 1550 11 0.01
## 118 3 1550 11 0.01
## 119 5 1550 11 0.01
## 120 7 1550 11 0.01
## 121 1 1600 11 0.01
## 122 3 1600 11 0.01
## 123 5 1600 11 0.01
## 124 7 1600 11 0.01
## 125 1 1650 11 0.01
## 126 3 1650 11 0.01
## 127 5 1650 11 0.01
## 128 7 1650 11 0.01
## 129 1 1700 11 0.01
## 130 3 1700 11 0.01
## 131 5 1700 11 0.01
## 132 7 1700 11 0.01
## 133 1 1750 11 0.01
## 134 3 1750 11 0.01
## 135 5 1750 11 0.01
## 136 7 1750 11 0.01
## 137 1 1800 11 0.01
## 138 3 1800 11 0.01
## 139 5 1800 11 0.01
## 140 7 1800 11 0.01
## 141 1 1850 11 0.01
## 142 3 1850 11 0.01
## 143 5 1850 11 0.01
## 144 7 1850 11 0.01
## 145 1 1900 11 0.01
## 146 3 1900 11 0.01
## 147 5 1900 11 0.01
## 148 7 1900 11 0.01
## 149 1 1950 11 0.01
## 150 3 1950 11 0.01
## 151 5 1950 11 0.01
## 152 7 1950 11 0.01
## 153 1 2000 11 0.01
## 154 3 2000 11 0.01
## 155 5 2000 11 0.01
## 156 7 2000 11 0.01
## 157 1 100 11 0.10
## 158 3 100 11 0.10
## 159 5 100 11 0.10
## 160 7 100 11 0.10
## 161 1 150 11 0.10
## 162 3 150 11 0.10
## 163 5 150 11 0.10
## 164 7 150 11 0.10
## 165 1 200 11 0.10
## 166 3 200 11 0.10
## 167 5 200 11 0.10
## 168 7 200 11 0.10
## 169 1 250 11 0.10
## 170 3 250 11 0.10
## 171 5 250 11 0.10
## 172 7 250 11 0.10
## 173 1 300 11 0.10
## 174 3 300 11 0.10
## 175 5 300 11 0.10
## 176 7 300 11 0.10
## 177 1 350 11 0.10
## 178 3 350 11 0.10
## 179 5 350 11 0.10
## 180 7 350 11 0.10
## 181 1 400 11 0.10
## 182 3 400 11 0.10
## 183 5 400 11 0.10
## 184 7 400 11 0.10
## 185 1 450 11 0.10
## 186 3 450 11 0.10
## 187 5 450 11 0.10
## 188 7 450 11 0.10
## 189 1 500 11 0.10
## 190 3 500 11 0.10
## 191 5 500 11 0.10
## 192 7 500 11 0.10
## 193 1 550 11 0.10
## 194 3 550 11 0.10
## 195 5 550 11 0.10
## 196 7 550 11 0.10
## 197 1 600 11 0.10
## 198 3 600 11 0.10
## 199 5 600 11 0.10
## 200 7 600 11 0.10
## 201 1 650 11 0.10
## 202 3 650 11 0.10
## 203 5 650 11 0.10
## 204 7 650 11 0.10
## 205 1 700 11 0.10
## 206 3 700 11 0.10
## 207 5 700 11 0.10
## 208 7 700 11 0.10
## 209 1 750 11 0.10
## 210 3 750 11 0.10
## 211 5 750 11 0.10
## 212 7 750 11 0.10
## 213 1 800 11 0.10
## 214 3 800 11 0.10
## 215 5 800 11 0.10
## 216 7 800 11 0.10
## 217 1 850 11 0.10
## 218 3 850 11 0.10
## 219 5 850 11 0.10
## 220 7 850 11 0.10
## 221 1 900 11 0.10
## 222 3 900 11 0.10
## 223 5 900 11 0.10
## 224 7 900 11 0.10
## 225 1 950 11 0.10
## 226 3 950 11 0.10
## 227 5 950 11 0.10
## 228 7 950 11 0.10
## 229 1 1000 11 0.10
## 230 3 1000 11 0.10
## 231 5 1000 11 0.10
## 232 7 1000 11 0.10
## 233 1 1050 11 0.10
## 234 3 1050 11 0.10
## 235 5 1050 11 0.10
## 236 7 1050 11 0.10
## 237 1 1100 11 0.10
## 238 3 1100 11 0.10
## 239 5 1100 11 0.10
## 240 7 1100 11 0.10
## 241 1 1150 11 0.10
## 242 3 1150 11 0.10
## 243 5 1150 11 0.10
## 244 7 1150 11 0.10
## 245 1 1200 11 0.10
## 246 3 1200 11 0.10
## 247 5 1200 11 0.10
## 248 7 1200 11 0.10
## 249 1 1250 11 0.10
## 250 3 1250 11 0.10
## 251 5 1250 11 0.10
## 252 7 1250 11 0.10
## 253 1 1300 11 0.10
## 254 3 1300 11 0.10
## 255 5 1300 11 0.10
## 256 7 1300 11 0.10
## 257 1 1350 11 0.10
## 258 3 1350 11 0.10
## 259 5 1350 11 0.10
## 260 7 1350 11 0.10
## 261 1 1400 11 0.10
## 262 3 1400 11 0.10
## 263 5 1400 11 0.10
## 264 7 1400 11 0.10
## 265 1 1450 11 0.10
## 266 3 1450 11 0.10
## 267 5 1450 11 0.10
## 268 7 1450 11 0.10
## 269 1 1500 11 0.10
## 270 3 1500 11 0.10
## 271 5 1500 11 0.10
## 272 7 1500 11 0.10
## 273 1 1550 11 0.10
## 274 3 1550 11 0.10
## 275 5 1550 11 0.10
## 276 7 1550 11 0.10
## 277 1 1600 11 0.10
## 278 3 1600 11 0.10
## 279 5 1600 11 0.10
## 280 7 1600 11 0.10
## 281 1 1650 11 0.10
## 282 3 1650 11 0.10
## 283 5 1650 11 0.10
## 284 7 1650 11 0.10
## 285 1 1700 11 0.10
## 286 3 1700 11 0.10
## 287 5 1700 11 0.10
## 288 7 1700 11 0.10
## 289 1 1750 11 0.10
## 290 3 1750 11 0.10
## 291 5 1750 11 0.10
## 292 7 1750 11 0.10
## 293 1 1800 11 0.10
## 294 3 1800 11 0.10
## 295 5 1800 11 0.10
## 296 7 1800 11 0.10
## 297 1 1850 11 0.10
## 298 3 1850 11 0.10
## 299 5 1850 11 0.10
## 300 7 1850 11 0.10
## 301 1 1900 11 0.10
## 302 3 1900 11 0.10
## 303 5 1900 11 0.10
## 304 7 1900 11 0.10
## 305 1 1950 11 0.10
## 306 3 1950 11 0.10
## 307 5 1950 11 0.10
## 308 7 1950 11 0.10
## 309 1 2000 11 0.10
## 310 3 2000 11 0.10
## 311 5 2000 11 0.10
## 312 7 2000 11 0.10
gbmFit <- train(x = predictors,
y = response,
method = "gbm",
tuneGrid = gbmGrid,
trControl = ctrl,
verbose = FALSE)
gbmFit
## Stochastic Gradient Boosting
##
## 900 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (20 fold, repeated 20 times)
## Summary of sample sizes: 855, 855, 856, 856, 856, 854, ...
## Resampling results across tuning parameters:
##
## shrinkage interaction.depth n.trees RMSE Rsquared MAE
## 0.01 1 100 13.666520 0.6551054 10.900110
## 0.01 1 150 12.613273 0.6830760 10.068949
## 0.01 1 200 11.779890 0.6972649 9.427925
## 0.01 1 250 11.111855 0.7122930 8.893464
## 0.01 1 300 10.552272 0.7300101 8.428726
## 0.01 1 350 10.071879 0.7464196 8.030580
## 0.01 1 400 9.656905 0.7608141 7.685632
## 0.01 1 450 9.292617 0.7737359 7.382511
## 0.01 1 500 8.969311 0.7851264 7.114585
## 0.01 1 550 8.678086 0.7950275 6.873445
## 0.01 1 600 8.418381 0.8035855 6.659639
## 0.01 1 650 8.185273 0.8108008 6.461801
## 0.01 1 700 7.974631 0.8171946 6.283892
## 0.01 1 750 7.784333 0.8227216 6.122466
## 0.01 1 800 7.614436 0.8274067 5.978000
## 0.01 1 850 7.464265 0.8312918 5.850963
## 0.01 1 900 7.328040 0.8347681 5.738070
## 0.01 1 950 7.207202 0.8377441 5.637400
## 0.01 1 1000 7.097933 0.8404494 5.547481
## 0.01 1 1050 7.001105 0.8428097 5.469410
## 0.01 1 1100 6.914179 0.8450386 5.400450
## 0.01 1 1150 6.835279 0.8471380 5.339248
## 0.01 1 1200 6.763299 0.8491083 5.283400
## 0.01 1 1250 6.698125 0.8509298 5.232141
## 0.01 1 1300 6.638306 0.8526741 5.184085
## 0.01 1 1350 6.584007 0.8542949 5.139143
## 0.01 1 1400 6.533514 0.8558289 5.097311
## 0.01 1 1450 6.485838 0.8573181 5.057114
## 0.01 1 1500 6.441891 0.8587193 5.018472
## 0.01 1 1550 6.400730 0.8600526 4.981941
## 0.01 1 1600 6.361558 0.8613289 4.947148
## 0.01 1 1650 6.324935 0.8625174 4.914859
## 0.01 1 1700 6.291395 0.8636311 4.885884
## 0.01 1 1750 6.257794 0.8647462 4.856314
## 0.01 1 1800 6.225380 0.8658365 4.828284
## 0.01 1 1850 6.195533 0.8668396 4.801954
## 0.01 1 1900 6.168129 0.8677547 4.777961
## 0.01 1 1950 6.142243 0.8685957 4.755182
## 0.01 1 2000 6.117107 0.8694574 4.733181
## 0.01 3 100 11.476781 0.7470971 9.173866
## 0.01 3 150 10.057987 0.7784653 8.023591
## 0.01 3 200 9.028011 0.8028283 7.200153
## 0.01 3 250 8.257895 0.8213980 6.594992
## 0.01 3 300 7.666751 0.8357258 6.125281
## 0.01 3 350 7.213305 0.8464275 5.755245
## 0.01 3 400 6.862768 0.8547287 5.456521
## 0.01 3 450 6.590829 0.8612759 5.219178
## 0.01 3 500 6.378977 0.8666179 5.026619
## 0.01 3 550 6.206140 0.8712990 4.863850
## 0.01 3 600 6.061343 0.8754620 4.723662
## 0.01 3 650 5.938052 0.8791947 4.601442
## 0.01 3 700 5.831261 0.8825375 4.494099
## 0.01 3 750 5.736366 0.8855863 4.398001
## 0.01 3 800 5.652335 0.8883150 4.312914
## 0.01 3 850 5.577834 0.8907611 4.237319
## 0.01 3 900 5.510820 0.8929844 4.170542
## 0.01 3 950 5.450256 0.8949961 4.110789
## 0.01 3 1000 5.395854 0.8968095 4.057207
## 0.01 3 1050 5.346542 0.8984504 4.008285
## 0.01 3 1100 5.301998 0.8999478 3.963774
## 0.01 3 1150 5.260633 0.9013342 3.923439
## 0.01 3 1200 5.222805 0.9026026 3.885654
## 0.01 3 1250 5.187647 0.9037870 3.851005
## 0.01 3 1300 5.154718 0.9048872 3.818475
## 0.01 3 1350 5.122984 0.9059343 3.787267
## 0.01 3 1400 5.094057 0.9068873 3.758504
## 0.01 3 1450 5.066414 0.9078034 3.730842
## 0.01 3 1500 5.040295 0.9086745 3.705878
## 0.01 3 1550 5.015242 0.9095019 3.680782
## 0.01 3 1600 4.992754 0.9102513 3.659092
## 0.01 3 1650 4.970866 0.9109811 3.638221
## 0.01 3 1700 4.948907 0.9117050 3.617057
## 0.01 3 1750 4.928597 0.9123701 3.598054
## 0.01 3 1800 4.909402 0.9130089 3.579657
## 0.01 3 1850 4.891609 0.9135860 3.561766
## 0.01 3 1900 4.873949 0.9141629 3.544988
## 0.01 3 1950 4.857290 0.9147048 3.528855
## 0.01 3 2000 4.840937 0.9152515 3.513576
## 0.01 5 100 10.559654 0.7869783 8.422737
## 0.01 5 150 9.044945 0.8149391 7.210264
## 0.01 5 200 8.013034 0.8353634 6.399440
## 0.01 5 250 7.284967 0.8505883 5.819730
## 0.01 5 300 6.765496 0.8618104 5.384891
## 0.01 5 350 6.391794 0.8702927 5.051467
## 0.01 5 400 6.113163 0.8771524 4.790720
## 0.01 5 450 5.898335 0.8828520 4.580586
## 0.01 5 500 5.726549 0.8877347 4.408275
## 0.01 5 550 5.584822 0.8919571 4.263674
## 0.01 5 600 5.467020 0.8955642 4.143718
## 0.01 5 650 5.365305 0.8987756 4.040511
## 0.01 5 700 5.278189 0.9015538 3.951869
## 0.01 5 750 5.203752 0.9039379 3.876106
## 0.01 5 800 5.136340 0.9061113 3.809596
## 0.01 5 850 5.076626 0.9080489 3.750265
## 0.01 5 900 5.023541 0.9097585 3.698860
## 0.01 5 950 4.975168 0.9113134 3.651673
## 0.01 5 1000 4.931358 0.9127281 3.609521
## 0.01 5 1050 4.890689 0.9140303 3.570739
## 0.01 5 1100 4.854839 0.9151876 3.537632
## 0.01 5 1150 4.820342 0.9162929 3.505709
## 0.01 5 1200 4.789538 0.9172859 3.476596
## 0.01 5 1250 4.760771 0.9182015 3.450087
## 0.01 5 1300 4.733875 0.9190606 3.425018
## 0.01 5 1350 4.709072 0.9198499 3.402279
## 0.01 5 1400 4.686273 0.9205721 3.381239
## 0.01 5 1450 4.663345 0.9213067 3.360723
## 0.01 5 1500 4.641330 0.9220054 3.341102
## 0.01 5 1550 4.620639 0.9226566 3.322452
## 0.01 5 1600 4.600773 0.9232849 3.304958
## 0.01 5 1650 4.581923 0.9238826 3.288066
## 0.01 5 1700 4.564533 0.9244181 3.272624
## 0.01 5 1750 4.548865 0.9249051 3.258558
## 0.01 5 1800 4.533205 0.9253905 3.244754
## 0.01 5 1850 4.517527 0.9258834 3.230630
## 0.01 5 1900 4.503087 0.9263376 3.217277
## 0.01 5 1950 4.488720 0.9267790 3.205055
## 0.01 5 2000 4.474157 0.9272319 3.192294
## 0.01 7 100 10.041819 0.8138495 8.015326
## 0.01 7 150 8.481977 0.8361965 6.776675
## 0.01 7 200 7.467280 0.8532296 5.961306
## 0.01 7 250 6.777678 0.8663061 5.393915
## 0.01 7 300 6.300029 0.8762426 4.971094
## 0.01 7 350 5.962009 0.8840055 4.652550
## 0.01 7 400 5.710217 0.8903938 4.404960
## 0.01 7 450 5.517159 0.8956369 4.209906
## 0.01 7 500 5.363273 0.9000430 4.052576
## 0.01 7 550 5.237406 0.9037713 3.926177
## 0.01 7 600 5.132814 0.9069429 3.819864
## 0.01 7 650 5.046161 0.9095850 3.731905
## 0.01 7 700 4.970531 0.9119290 3.655322
## 0.01 7 750 4.905565 0.9139559 3.590203
## 0.01 7 800 4.849534 0.9156892 3.535911
## 0.01 7 850 4.799360 0.9172602 3.487956
## 0.01 7 900 4.754030 0.9186663 3.445635
## 0.01 7 950 4.713601 0.9199192 3.408599
## 0.01 7 1000 4.676275 0.9210802 3.373873
## 0.01 7 1050 4.642902 0.9221102 3.342772
## 0.01 7 1100 4.610896 0.9231098 3.313855
## 0.01 7 1150 4.581561 0.9240154 3.287416
## 0.01 7 1200 4.553670 0.9248831 3.262394
## 0.01 7 1250 4.528602 0.9256516 3.239945
## 0.01 7 1300 4.503637 0.9264183 3.217620
## 0.01 7 1350 4.481271 0.9271193 3.198131
## 0.01 7 1400 4.459539 0.9277703 3.178788
## 0.01 7 1450 4.439335 0.9283903 3.160744
## 0.01 7 1500 4.420429 0.9289724 3.143770
## 0.01 7 1550 4.402666 0.9294954 3.127099
## 0.01 7 1600 4.384941 0.9300227 3.111219
## 0.01 7 1650 4.368778 0.9305082 3.095954
## 0.01 7 1700 4.353111 0.9309850 3.081759
## 0.01 7 1750 4.337178 0.9314558 3.067502
## 0.01 7 1800 4.323119 0.9318726 3.054624
## 0.01 7 1850 4.309069 0.9322904 3.041862
## 0.01 7 1900 4.295091 0.9327032 3.029550
## 0.01 7 1950 4.282014 0.9330952 3.017206
## 0.01 7 2000 4.269354 0.9334735 3.005996
## 0.10 1 100 7.066337 0.8396175 5.517732
## 0.10 1 150 6.431465 0.8583025 4.999250
## 0.10 1 200 6.118111 0.8688041 4.728490
## 0.10 1 250 5.925450 0.8756672 4.563115
## 0.10 1 300 5.800108 0.8802354 4.449750
## 0.10 1 350 5.709749 0.8836028 4.363592
## 0.10 1 400 5.642262 0.8860642 4.299499
## 0.10 1 450 5.583663 0.8883094 4.247942
## 0.10 1 500 5.538416 0.8899870 4.205484
## 0.10 1 550 5.497250 0.8914992 4.165832
## 0.10 1 600 5.467134 0.8926840 4.137389
## 0.10 1 650 5.436025 0.8938329 4.106521
## 0.10 1 700 5.409284 0.8948276 4.079563
## 0.10 1 750 5.389421 0.8956096 4.060712
## 0.10 1 800 5.371604 0.8962965 4.043509
## 0.10 1 850 5.348599 0.8971093 4.023129
## 0.10 1 900 5.333036 0.8977493 4.006564
## 0.10 1 950 5.316400 0.8984013 3.993238
## 0.10 1 1000 5.306270 0.8987512 3.980394
## 0.10 1 1050 5.292625 0.8992656 3.967062
## 0.10 1 1100 5.281080 0.8996747 3.954370
## 0.10 1 1150 5.268421 0.9001361 3.943372
## 0.10 1 1200 5.262407 0.9003550 3.935463
## 0.10 1 1250 5.249760 0.9008182 3.924264
## 0.10 1 1300 5.245265 0.9009617 3.919312
## 0.10 1 1350 5.239526 0.9012138 3.911851
## 0.10 1 1400 5.231996 0.9015164 3.905540
## 0.10 1 1450 5.220723 0.9019317 3.898143
## 0.10 1 1500 5.213992 0.9021510 3.889985
## 0.10 1 1550 5.208490 0.9023749 3.886010
## 0.10 1 1600 5.199746 0.9026593 3.877420
## 0.10 1 1650 5.197730 0.9027752 3.873181
## 0.10 1 1700 5.192467 0.9029650 3.868750
## 0.10 1 1750 5.187184 0.9031622 3.863822
## 0.10 1 1800 5.185281 0.9032526 3.861641
## 0.10 1 1850 5.178522 0.9035221 3.855724
## 0.10 1 1900 5.173185 0.9036802 3.851525
## 0.10 1 1950 5.169971 0.9038424 3.848815
## 0.10 1 2000 5.162622 0.9041475 3.841168
## 0.10 3 100 5.469965 0.8935541 4.110766
## 0.10 3 150 5.116505 0.9057135 3.766024
## 0.10 3 200 4.912146 0.9126823 3.581519
## 0.10 3 250 4.784843 0.9169434 3.462063
## 0.10 3 300 4.687229 0.9201160 3.375329
## 0.10 3 350 4.609170 0.9226274 3.304036
## 0.10 3 400 4.544139 0.9246628 3.250289
## 0.10 3 450 4.490430 0.9264022 3.203538
## 0.10 3 500 4.444126 0.9278545 3.165182
## 0.10 3 550 4.404236 0.9290896 3.128428
## 0.10 3 600 4.363404 0.9303121 3.094755
## 0.10 3 650 4.336233 0.9311255 3.069342
## 0.10 3 700 4.309303 0.9319312 3.047166
## 0.10 3 750 4.284892 0.9326847 3.028099
## 0.10 3 800 4.261016 0.9333998 3.009906
## 0.10 3 850 4.238128 0.9340857 2.992411
## 0.10 3 900 4.222802 0.9345374 2.978883
## 0.10 3 950 4.201946 0.9351286 2.961686
## 0.10 3 1000 4.183039 0.9356892 2.947786
## 0.10 3 1050 4.166973 0.9361434 2.933935
## 0.10 3 1100 4.156366 0.9364907 2.922927
## 0.10 3 1150 4.142264 0.9369015 2.911497
## 0.10 3 1200 4.126254 0.9373285 2.898417
## 0.10 3 1250 4.118437 0.9375404 2.890676
## 0.10 3 1300 4.107853 0.9378472 2.881702
## 0.10 3 1350 4.101030 0.9380341 2.875392
## 0.10 3 1400 4.089167 0.9383224 2.864676
## 0.10 3 1450 4.083406 0.9385038 2.858361
## 0.10 3 1500 4.072651 0.9387885 2.849262
## 0.10 3 1550 4.065400 0.9390033 2.841541
## 0.10 3 1600 4.057730 0.9392113 2.832617
## 0.10 3 1650 4.051363 0.9393623 2.826580
## 0.10 3 1700 4.045395 0.9395699 2.821649
## 0.10 3 1750 4.044156 0.9395739 2.818845
## 0.10 3 1800 4.038861 0.9397549 2.813236
## 0.10 3 1850 4.034443 0.9398499 2.808963
## 0.10 3 1900 4.030638 0.9399713 2.804195
## 0.10 3 1950 4.027842 0.9400279 2.801621
## 0.10 3 2000 4.024769 0.9401271 2.797304
## 0.10 5 100 5.034595 0.9088243 3.694425
## 0.10 5 150 4.758379 0.9179949 3.439827
## 0.10 5 200 4.585626 0.9235025 3.285723
## 0.10 5 250 4.475932 0.9269322 3.187707
## 0.10 5 300 4.381672 0.9297944 3.105669
## 0.10 5 350 4.314372 0.9318725 3.047069
## 0.10 5 400 4.257906 0.9335693 3.001173
## 0.10 5 450 4.209904 0.9349909 2.959170
## 0.10 5 500 4.174407 0.9360637 2.924560
## 0.10 5 550 4.145280 0.9368662 2.896924
## 0.10 5 600 4.115156 0.9377069 2.870903
## 0.10 5 650 4.085346 0.9385295 2.844407
## 0.10 5 700 4.067767 0.9390199 2.827928
## 0.10 5 750 4.050826 0.9394967 2.809247
## 0.10 5 800 4.037966 0.9398198 2.796835
## 0.10 5 850 4.026852 0.9401125 2.785431
## 0.10 5 900 4.018786 0.9403712 2.775263
## 0.10 5 950 4.009202 0.9406232 2.765515
## 0.10 5 1000 4.000686 0.9408470 2.756278
## 0.10 5 1050 3.994534 0.9410176 2.747719
## 0.10 5 1100 3.989532 0.9411346 2.742564
## 0.10 5 1150 3.985342 0.9412306 2.733707
## 0.10 5 1200 3.980246 0.9413931 2.728132
## 0.10 5 1250 3.979727 0.9413756 2.724443
## 0.10 5 1300 3.974031 0.9415448 2.717135
## 0.10 5 1350 3.975728 0.9415199 2.715761
## 0.10 5 1400 3.971615 0.9416226 2.711335
## 0.10 5 1450 3.968586 0.9416967 2.706696
## 0.10 5 1500 3.966840 0.9417641 2.702839
## 0.10 5 1550 3.963532 0.9418597 2.700126
## 0.10 5 1600 3.966621 0.9417777 2.699232
## 0.10 5 1650 3.965503 0.9418077 2.693930
## 0.10 5 1700 3.966328 0.9417702 2.693546
## 0.10 5 1750 3.967436 0.9417408 2.690555
## 0.10 5 1800 3.965903 0.9417760 2.687753
## 0.10 5 1850 3.967347 0.9417213 2.688377
## 0.10 5 1900 3.967195 0.9417066 2.685244
## 0.10 5 1950 3.963634 0.9418031 2.681715
## 0.10 5 2000 3.963537 0.9417834 2.678780
## 0.10 7 100 4.791694 0.9170049 3.468091
## 0.10 7 150 4.536828 0.9250729 3.245165
## 0.10 7 200 4.389111 0.9296456 3.116329
## 0.10 7 250 4.277937 0.9330143 3.020017
## 0.10 7 300 4.197062 0.9353746 2.947913
## 0.10 7 350 4.134860 0.9371745 2.892307
## 0.10 7 400 4.093696 0.9383219 2.852477
## 0.10 7 450 4.058564 0.9393228 2.820127
## 0.10 7 500 4.029676 0.9401106 2.790373
## 0.10 7 550 4.010132 0.9406704 2.769735
## 0.10 7 600 3.992169 0.9411283 2.751660
## 0.10 7 650 3.977431 0.9415334 2.731740
## 0.10 7 700 3.968707 0.9417521 2.721572
## 0.10 7 750 3.963295 0.9418939 2.712830
## 0.10 7 800 3.957281 0.9420724 2.703369
## 0.10 7 850 3.953283 0.9421194 2.695623
## 0.10 7 900 3.950373 0.9422095 2.689001
## 0.10 7 950 3.949391 0.9422335 2.681343
## 0.10 7 1000 3.941443 0.9424583 2.673505
## 0.10 7 1050 3.941453 0.9424711 2.669269
## 0.10 7 1100 3.940261 0.9425079 2.666278
## 0.10 7 1150 3.939780 0.9425429 2.662768
## 0.10 7 1200 3.941778 0.9424621 2.661057
## 0.10 7 1250 3.941306 0.9424762 2.656433
## 0.10 7 1300 3.940841 0.9424906 2.655260
## 0.10 7 1350 3.938082 0.9425546 2.649270
## 0.10 7 1400 3.939006 0.9425158 2.647333
## 0.10 7 1450 3.941716 0.9424365 2.647498
## 0.10 7 1500 3.940881 0.9424714 2.642444
## 0.10 7 1550 3.944226 0.9423854 2.644032
## 0.10 7 1600 3.945520 0.9423610 2.642675
## 0.10 7 1650 3.948077 0.9422745 2.643473
## 0.10 7 1700 3.946046 0.9423124 2.639839
## 0.10 7 1750 3.947414 0.9422707 2.639811
## 0.10 7 1800 3.948256 0.9422531 2.639432
## 0.10 7 1850 3.951906 0.9421578 2.639203
## 0.10 7 1900 3.950592 0.9422017 2.636294
## 0.10 7 1950 3.950885 0.9421827 2.634344
## 0.10 7 2000 3.957035 0.9420202 2.636694
##
## Tuning parameter 'n.minobsinnode' was held constant at a value of 11
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were n.trees = 1350, interaction.depth =
## 7, shrinkage = 0.1 and n.minobsinnode = 11.
plot(gbmFit)
# extreme gradient boosting
set.seed(123)
xgbGrid <- expand.grid(max_depth = c(3, 5, 7),
nrounds = (1:10)*70,
eta = 0.3,
gamma = 0,
subsample = 1,
min_child_weight = 1,
colsample_bytree = 0.6)
xgbFit <- train(x = predictors,
y = response,
method = "xgbTree",
trControl = ctrl_xgb,
tuneGrid = xgbGrid)
xgbFit
## eXtreme Gradient Boosting
##
## 900 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (20 fold)
## Summary of sample sizes: 855, 855, 856, 856, 856, 854, ...
## Resampling results across tuning parameters:
##
## max_depth nrounds RMSE Rsquared MAE
## 3 70 4.548472 0.9258530 3.319408
## 3 140 4.193718 0.9361447 2.986624
## 3 210 4.026018 0.9409575 2.814816
## 3 280 3.957934 0.9426267 2.753618
## 3 350 3.915965 0.9437347 2.705431
## 3 420 3.885122 0.9446051 2.670619
## 3 490 3.880767 0.9447276 2.661558
## 3 560 3.866485 0.9450859 2.638666
## 3 630 3.858587 0.9453751 2.630057
## 3 700 3.857600 0.9454180 2.628142
## 5 70 4.182646 0.9368702 2.910676
## 5 140 4.043065 0.9407298 2.742854
## 5 210 4.013970 0.9415822 2.692528
## 5 280 4.013485 0.9416043 2.681317
## 5 350 4.008992 0.9416721 2.668952
## 5 420 4.009771 0.9416212 2.666352
## 5 490 4.011120 0.9415464 2.662795
## 5 560 4.013019 0.9414771 2.661483
## 5 630 4.013576 0.9414463 2.661160
## 5 700 4.018622 0.9412778 2.664534
## 7 70 4.269227 0.9347829 2.839591
## 7 140 4.243791 0.9354771 2.798705
## 7 210 4.230965 0.9358187 2.784395
## 7 280 4.228003 0.9358945 2.781854
## 7 350 4.227725 0.9358949 2.781597
## 7 420 4.227896 0.9358805 2.781953
## 7 490 4.228015 0.9358739 2.781983
## 7 560 4.228072 0.9358722 2.781918
## 7 630 4.228072 0.9358722 2.781918
## 7 700 4.228072 0.9358722 2.781918
##
## Tuning parameter 'eta' was held constant at a value of 0.3
##
## Tuning parameter 'gamma' was held constant at a value of 0
##
## Tuning parameter 'colsample_bytree' was held constant at a value of 0.6
##
## Tuning parameter 'min_child_weight' was held constant at a value of 1
##
## Tuning parameter 'subsample' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nrounds = 700, max_depth = 3, eta
## = 0.3, gamma = 0, colsample_bytree = 0.6, min_child_weight = 1 and subsample
## = 1.
plot(xgbFit)
# cubist
set.seed(123)
cbGrid <- expand.grid(committees = c(1:10, 20, 50, 75, 123),
neighbors = c(0, 1, 5, 9))
cbGrid
## committees neighbors
## 1 1 0
## 2 2 0
## 3 3 0
## 4 4 0
## 5 5 0
## 6 6 0
## 7 7 0
## 8 8 0
## 9 9 0
## 10 10 0
## 11 20 0
## 12 50 0
## 13 75 0
## 14 123 0
## 15 1 1
## 16 2 1
## 17 3 1
## 18 4 1
## 19 5 1
## 20 6 1
## 21 7 1
## 22 8 1
## 23 9 1
## 24 10 1
## 25 20 1
## 26 50 1
## 27 75 1
## 28 123 1
## 29 1 5
## 30 2 5
## 31 3 5
## 32 4 5
## 33 5 5
## 34 6 5
## 35 7 5
## 36 8 5
## 37 9 5
## 38 10 5
## 39 20 5
## 40 50 5
## 41 75 5
## 42 123 5
## 43 1 9
## 44 2 9
## 45 3 9
## 46 4 9
## 47 5 9
## 48 6 9
## 49 7 9
## 50 8 9
## 51 9 9
## 52 10 9
## 53 20 9
## 54 50 9
## 55 75 9
## 56 123 9
cubistFit <- train(predictors,
                   response,
                   method = "cubist",
                   tuneGrid = cbGrid,
                   trControl = ctrl,
                   preProc = "BoxCox")
cubistFit
## Cubist
##
## 900 samples
## 8 predictor
##
## Pre-processing: Box-Cox transformation (5)
## Resampling: Cross-Validated (20 fold, repeated 20 times)
## Summary of sample sizes: 855, 855, 856, 856, 856, 854, ...
## Resampling results across tuning parameters:
##
## committees neighbors RMSE Rsquared MAE
## 1 0 5.757070 0.8814646 4.124494
## 1 5 5.012061 0.9084429 3.407996
## 1 9 5.231181 0.9008091 3.601144
## 10 0 5.112298 0.9067381 3.768867
## 10 5 4.418290 0.9285085 3.036641
## 10 9 4.623403 0.9223662 3.211760
## 20 0 5.060287 0.9085702 3.739905
## 20 5 4.383041 0.9294864 3.017908
## 20 9 4.586975 0.9234415 3.188513
##
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were committees = 20 and neighbors = 5.
plot(cubistFit, auto.key = list(columns = 4, lines = TRUE))
plot(varImp(cubistFit))
# summary output of all models
resamples_cv <- resamples(list("Neural Net" = nnetFit,
"RF" = rfFit,
"XGB" = xgbFit))
resamples_rcv <- resamples(list("GBM" = gbmFit,
"Cubist" = cubistFit))
summary(resamples_rcv)
##
## Call:
## summary.resamples(object = resamples_rcv)
##
## Models: GBM, Cubist
## Number of resamples: 400
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## GBM 1.663437 2.360057 2.604865 2.649270 2.912577 3.974157 0
## Cubist 1.804833 2.673383 2.955732 3.017908 3.326497 4.624829 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## GBM 2.172531 3.234712 3.753491 3.938082 4.479314 7.459319 0
## Cubist 2.287482 3.641158 4.226884 4.383041 4.919362 7.725057 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## GBM 0.8080794 0.9298449 0.9508306 0.9425546 0.9635267 0.9857974 0
## Cubist 0.7644881 0.9145283 0.9381322 0.9294864 0.9528790 0.9835193 0
summary(resamples_cv)
##
## Call:
## summary.resamples(object = resamples_cv)
##
## Models: Neural Net, RF, XGB
## Number of resamples: 20
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Neural Net 2.809372 3.365640 3.647068 3.656338 3.875043 4.317641 0
## RF 2.598364 2.790693 3.421066 3.367641 3.669695 4.301928 0
## XGB 1.861652 2.327385 2.546302 2.628142 2.740838 3.687222 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Neural Net 3.496230 4.282147 4.969973 4.878235 5.461264 5.911077 0
## RF 3.442241 4.066484 4.391610 4.736136 5.314887 6.810052 0
## XGB 2.851883 3.263398 3.626988 3.857600 4.023123 6.223857 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Neural Net 0.8742637 0.8956146 0.9195148 0.9163351 0.9326244 0.9491027 0
## RF 0.8117342 0.9068681 0.9303838 0.9213526 0.9442502 0.9633361 0
## XGB 0.8473354 0.9421326 0.9545396 0.9454180 0.9620434 0.9658337 0
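As an aside, caret also ships lattice plotting methods for resamples objects, so the same comparison can be inspected visually (a minimal sketch):
# box-and-whisker and dot plots of the resampled RMSE distributions
bwplot(resamples_cv, metric = "RMSE")
dotplot(resamples_rcv, metric = "RMSE")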
library(iml)
predictor = Predictor$new(xgbFit, data = predictors, y = response)
imp = FeatureImp$new(predictor, loss = "rmse")
plot(imp)
imp$results
## feature importance.05 importance importance.95 permutation.error
## 1 Age 9.600921 9.646738 9.889071 14.750397
## 2 Cement 8.778072 8.909604 9.212497 13.623278
## 3 Water 5.260389 5.526491 5.639696 8.450311
## 4 BlastFurnaceSlag 4.470963 4.521810 4.687091 6.914098
## 5 FineAggregate 2.542928 2.597556 2.702018 3.971807
## 6 Superplasticizer 2.515451 2.575576 2.608171 3.938198
## 7 CoarseAggregate 2.237426 2.307224 2.328720 3.527874
## 8 FlyAsh 1.616057 1.621778 1.655662 2.479788
ale = FeatureEffect$new(predictor, feature = "Age")
ale$plot()
interact = Interaction$new(predictor)
plot(interact)
interact = Interaction$new(predictor, feature = "Age")
plot(interact)
tree = TreeSurrogate$new(predictor, maxdepth = 2)
plot(tree)
lime.explain = LocalModel$new(predictor, x.interest = predictors[1,])
lime.explain$results
## beta x.recoded effect x.original feature
## Cement 0.04050841 540.0 21.8745395 540 Cement
## Superplasticizer 0.46646311 2.5 1.1661578 2.5 Superplasticizer
## Age 0.03379576 28.0 0.9462812 28 Age
## feature.value
## Cement Cement=540
## Superplasticizer Superplasticizer=2.5
## Age Age=28
plot(lime.explain)
shapley = Shapley$new(predictor, x.interest = predictors[1,])
shapley$plot()
# evaluate the chosen model (XGB) on the training data
predictions <- predict(xgbFit, predictors)
plot(response, predictions, col = "blue")
abline(0,1, col = "red")
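The fit shown in the plot can be quantified with caret's postResample; note these are resubstitution metrics on the training data, so they will look optimistic compared with the CV estimates above (a minimal sketch):
# RMSE, R-squared and MAE of the training-set predictions
postResample(pred = predictions, obs = response)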
# predict XGB on the test data
predictions <- predict(xgbFit, testdata)
# add index and export results to csv
final_pred <- data.frame(index = index_testdata, CompressiveStrength = predictions)
write.csv(final_pred, file = "predictions.csv", row.names = FALSE)
# stop core cluster
stopCluster(cl)
Comments
General overview: there are no missing values in any of the columns.
Histograms: the data are not normally distributed; we should consider transforming the features towards normality.
Box plots: there are outliers in the predictors BlastFurnaceSlag, Water, Superplasticizer, FineAggregate and Age. Removing the rows with outliers would discard around 10% of the data, so we will try substituting them with the mean and check the results.
Scatter plots: based on the scatter plots, the feature Cement seems to be the most correlated with the response CompressiveStrength; we will confirm this with a correlation overview and plot.
Correlation plot: we can observe a high positive correlation between the response and Cement. Age and Superplasticizer are the other two features most strongly related to the response. There is also a strong negative correlation between Superplasticizer and Water.
Count zeros: based on the plots we see many zeros in the features BlastFurnaceSlag, FlyAsh and Superplasticizer. These are genuine zeros (the ingredient is simply absent from those mixtures) rather than missing values, but we should explore them further.
Skewness: skewness is clearly visible, especially in the features with many zeros. Age has some extreme outliers at the top. We should transform the distributions towards normality when preprocessing, as sketched below.
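A minimal sketch of that preprocessing idea: Box-Cox requires strictly positive values, so for the zero-heavy ingredient columns a Yeo-Johnson transformation is the safer choice in caret's preProcess (whether it actually helps the models above would still have to be checked).
# transform the eight predictors towards normality; Yeo-Johnson, unlike
# Box-Cox, is defined for zero and negative values
pp_yj <- preProcess(dataset[, 1:8], method = "YeoJohnson")
transformed <- predict(pp_yj, dataset[, 1:8])
plot_histogram(transformed, ggtheme = theme_few())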