Concrete Strength Prediction
Overview
The purpose of this project is to predict concrete compressive strength by comparing models built with several different algorithms. MAE (mean absolute error) and R-squared are used to evaluate the performance of each model.
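For reference, both metrics can be computed by hand. A minimal sketch of what the MLmetrics functions used later in this report calculate:
mae <- function(pred, actual) mean(abs(pred - actual))   # MLmetrics::MAE
r_squared <- function(pred, actual) {                    # MLmetrics::R2_Score
  1 - sum((actual - pred)^2) / sum((actual - mean(actual))^2)
}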
Data Processing and Modelling Flow
Libraries and Data Importing
Data Preparation and EDA
Modelling
Evaluation
Libraries and Data Importing
Libraries Used
library(tidyverse) # data manipulating
library(olsrr) # outliers plotting and removing
library(ggplot2) # data plotting
library(plotly) # interactive plotting
library(ggthemes) # plot themes
library(corrplot) # correlation plotting
library(caret) # modelling
library(keras) # Neural net. modelling
library(tensorflow) # Neural net. modelling
library(MLmetrics) # model evaluation (MAE & R-squared)
library(randomForest) # Random Forest modelling
library(xgboost) # extreme gradient boosting modelling
library(lmtest) # assumption test
library(car) # assumption test
Data Importing
data <- read.csv("data/data-train.csv")
str(data)
## 'data.frame': 825 obs. of 9 variables:
## $ cement : num 540 540 332 332 199 ...
## $ slag : num 0 0 142 142 132 ...
## $ flyash : num 0 0 0 0 0 0 0 0 0 0 ...
## $ water : num 162 162 228 228 192 228 228 228 192 192 ...
## $ super_plast: num 2.5 2.5 0 0 0 0 0 0 0 0 ...
## $ coarse_agg : num 1040 1055 932 932 978 ...
## $ fine_agg : num 676 676 594 594 826 ...
## $ age : int 28 28 270 365 360 365 28 28 90 28 ...
## $ strength : num 80 61.9 40.3 41 44.3 ...
head(data)
## cement slag flyash water super_plast coarse_agg fine_agg age strength
## 1 540.0 0.0 0 162 2.5 1040.0 676.0 28 79.99
## 2 540.0 0.0 0 162 2.5 1055.0 676.0 28 61.89
## 3 332.5 142.5 0 228 0.0 932.0 594.0 270 40.27
## 4 332.5 142.5 0 228 0.0 932.0 594.0 365 41.05
## 5 198.6 132.4 0 192 0.0 978.4 825.5 360 44.30
## 6 380.0 95.0 0 228 0.0 932.0 594.0 365 43.70
So we have 8 variables suspected to affect the strength of the concrete.
cement: the amount of cement (kg) in a m3 mixture
slag: the amount of blast furnace slag (kg) in a m3 mixture
flyash: the amount of fly ash (kg) in a m3 mixture
water: the amount of water (kg) in a m3 mixture
super_plast: the amount of superplasticizer (kg) in a m3 mixture
coarse_agg: the amount of coarse aggregate (kg) in a m3 mixture
fine_agg: the amount of fine aggregate (kg) in a m3 mixture
age: the number of resting days before the compressive strength measurement
strength: concrete compressive strength in MPa
Data Preparation and EDA
Data Preparation
NA Checking
data %>%
is.na() %>%
colSums()
## cement slag flyash water super_plast coarse_agg
## 0 0 0 0 0 0
## fine_agg age strength
## 0 0 0
There are no NAs in the data.
Outliers Checking
Studentized Residuals vs Leverage Plot
In this step, the outliers will be eliminated by plotting the observations into a chart that separates them into different zones.
outliers <- lm(strength ~. , data=data)
d <- ols_plot_resid_lev(outliers)
eliminate <- d$leverage$observation
eliminate
## [1] 1 11 12 32 41 66 69 72 77 93 106 126 298 299 316 318 332
## [18] 391 396 397 414 425 426 600 615
The plot above shows each observation and its status: outlier, leverage point, or both. The ones that will be eliminated are the outliers (green points) and the points that are both outliers and leverage points at the same time (purple points).
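Under the hood, the plot combines studentized residuals with leverage values. A rough sketch of how the zones can be reconstructed, using commonly assumed thresholds (olsrr's exact cutoffs may differ):
res <- rstudent(outliers)       # studentized residuals
lev <- hatvalues(outliers)      # leverage values
p <- length(coef(outliers))     # number of model parameters
n <- nrow(data)
which(abs(res) > 2)             # assumed outlier threshold: |residual| > 2
which(lev > 2 * p / n)          # assumed leverage threshold: 2p/n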
data_no_outliers <- data[-eliminate, ]
data_no_outliers %>%
count()
## # A tibble: 1 x 1
## n
## <int>
## 1 800
The 25 observations identified as outliers have been removed.
Near Zero Variance
data_no_outliers %>%
nearZeroVar()
## integer(0)
No variable has near-zero variance.
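For transparency, nearZeroVar can also report the metrics it checks: a predictor is flagged only when the frequency ratio of its most common value to its second most common value exceeds freqCut (default 95/5) and its percentage of unique values falls below uniqueCut (default 10).
data_no_outliers %>%
  nearZeroVar(saveMetrics = TRUE)   # shows freqRatio and percentUnique per variable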
Cross Validation
set.seed(111)
index1 <- sample(nrow(data_no_outliers), nrow(data_no_outliers)*0.8)
index2 <- sample(nrow(data), nrow(data)*0.8)
train1 <- data_no_outliers[index1, ]
test1 <- data_no_outliers[-index1,]
train2 <- data[index2,]
test2 <- data[-index2,]
train3 <- train1 %>%
select(-9) %>%
scale() %>%
data.frame() %>%
mutate(strength = train1$strength)
Each dataset is split into a training set (80%) and a test set (20%); train3 is a scaled copy of train1 for later use.
Exploratory Data Analysis
Correlation Among Variables
corrplot(cor(data_no_outliers))
strength is correlated with all the other variables; the correlation is strongest with cement and weakest with slag, flyash, and coarse_agg.
ggplot(data_no_outliers, aes(strength, cement)) +
geom_jitter(aes(col = super_plast, size = age)) +
labs(title = "Highest Correlated Variables on Strength", x= "Strength", y= 'Cement')+
theme(plot.title = element_text(hjust = 0.5))
The plot shows that concrete strength tends to increase with the amount of cement, while the amount of superplasticizer and the age at measurement don't show a strong visible effect here.
Variable Range
range_plot <- ggplot(data = data_no_outliers %>% gather(key = "Variable", value = "Value"), aes(Variable, Value)) +
geom_boxplot(aes(col = Variable))+
scale_color_calc()+
theme(axis.text.x = element_text(angle = 90),
axis.title.x = element_blank())
ggplotly(range_plot)
The variables are not on the same scale, so I will scale them for the neural network model. Some variables also contain outliers; I will try eliminating them to see whether it improves the models' performance.
Modelling
Multiple Linear Regression
Data Without Outliers
lm.all1 <- lm(strength ~., train1)
lm.none1 <- lm(strength ~ 1, train1)
lm_forward1 <- stats::step(lm.all1, scope = list(lower = lm.none1, upper = lm.all1), direction = "backward") # backward elimination, despite the object name
## Start: AIC=2906.65
## strength ~ cement + slag + flyash + water + super_plast + coarse_agg +
## fine_agg + age
##
## Df Sum of Sq RSS AIC
## <none> 58395 2906.6
## - water 1 394 58789 2909.0
## - fine_agg 1 698 59093 2912.3
## - coarse_agg 1 755 59150 2912.9
## - super_plast 1 841 59236 2913.8
## - flyash 1 4999 63394 2957.2
## - slag 1 8411 66806 2990.8
## - cement 1 16409 74804 3063.1
## - age 1 32843 91237 3190.2
summary(lm_forward1)
##
## Call:
## lm(formula = strength ~ cement + slag + flyash + water + super_plast +
## coarse_agg + fine_agg + age, data = train1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.2147 -5.9223 0.6697 6.6704 20.5073
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -62.219802 30.626863 -2.032 0.04262 *
## cement 0.132941 0.009984 13.316 < 2e-16 ***
## slag 0.112282 0.011777 9.534 < 2e-16 ***
## flyash 0.108975 0.014827 7.350 6.18e-13 ***
## water -0.096174 0.046594 -2.064 0.03942 *
## super_plast 0.329303 0.109212 3.015 0.00267 **
## coarse_agg 0.030700 0.010748 2.856 0.00443 **
## fine_agg 0.034172 0.012439 2.747 0.00618 **
## age 0.121366 0.006442 18.838 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.62 on 631 degrees of freedom
## Multiple R-squared: 0.6627, Adjusted R-squared: 0.6585
## F-statistic: 155 on 8 and 631 DF, p-value: < 2.2e-16
The model using the data without outliers returns an adjusted R-squared of 0.6585, meaning it explains around 65.9% of the variance in strength.
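As a sanity check, the adjusted R-squared can be reconstructed from the multiple R-squared in the summary above (n = 640 training rows, p = 8 predictors):
r2 <- 0.6627; n <- 640; p <- 8
1 - (1 - r2) * (n - 1) / (n - p - 1)   # ~0.6585, matching the summary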
Data With Outliers
lm.all2 <- lm(strength ~., train2)
lm.none2 <- lm(strength ~ 1, train2)
lm_forward2 <- stats::step(lm.all2,scope = list(lower = lm.none2, upper = lm.all2),direction = "backward")
## Start: AIC=3080.9
## strength ~ cement + slag + flyash + water + super_plast + coarse_agg +
## fine_agg + age
##
## Df Sum of Sq RSS AIC
## - fine_agg 1 2 68393 3078.9
## - coarse_agg 1 66 68457 3079.5
## <none> 68391 3080.9
## - super_plast 1 1001 69392 3088.5
## - water 1 1401 69792 3092.3
## - flyash 1 1775 70166 3095.8
## - slag 1 4811 73202 3123.8
## - cement 1 11117 79508 3178.3
## - age 1 33891 102282 3344.5
##
## Step: AIC=3078.92
## strength ~ cement + slag + flyash + water + super_plast + coarse_agg +
## age
##
## Df Sum of Sq RSS AIC
## - coarse_agg 1 153 68547 3078.4
## <none> 68393 3078.9
## - super_plast 1 1019 69412 3086.7
## - flyash 1 4183 72576 3116.1
## - water 1 4223 72616 3116.5
## - slag 1 17794 86187 3229.5
## - age 1 33911 102304 3342.7
## - cement 1 42562 110955 3396.3
##
## Step: AIC=3078.4
## strength ~ cement + slag + flyash + water + super_plast + age
##
## Df Sum of Sq RSS AIC
## <none> 68547 3078.4
## - super_plast 1 867 69413 3084.7
## - flyash 1 4101 72647 3114.7
## - water 1 6444 74991 3135.7
## - slag 1 18116 86662 3231.2
## - age 1 34242 102788 3343.8
## - cement 1 42858 111405 3396.9
summary(lm_forward2)
##
## Call:
## lm(formula = strength ~ cement + slag + flyash + water + super_plast +
## age, data = train2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.500 -6.829 0.820 7.100 35.126
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.279867 5.388139 5.249 2.08e-07 ***
## cement 0.104878 0.005190 20.206 < 2e-16 ***
## slag 0.078350 0.005964 13.137 < 2e-16 ***
## flyash 0.060646 0.009703 6.250 7.40e-10 ***
## water -0.212573 0.027131 -7.835 1.91e-14 ***
## super_plast 0.315416 0.109771 2.873 0.00419 **
## age 0.126275 0.006992 18.061 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.25 on 653 degrees of freedom
## Multiple R-squared: 0.6222, Adjusted R-squared: 0.6187
## F-statistic: 179.2 on 6 and 653 DF, p-value: < 2.2e-16
The model using the data with outliers returns an adjusted R-squared of 0.6187, so only around 61.9% of the variance is explained.
The data without outliers gives the better result, with an adjusted R-squared of 0.6585. The least significant variables are fine_agg and coarse_agg.
Data Without fine_agg and coarse_agg
lm_final <- lm(formula = strength ~ cement + slag + flyash + water + super_plast + age, data = data_no_outliers)
summary(lm_final)
##
## Call:
## lm(formula = strength ~ cement + slag + flyash + water + super_plast +
## age, data = data_no_outliers)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.8319 -6.3289 0.8218 6.7174 20.8786
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.807058 4.481772 6.204 8.83e-10 ***
## cement 0.106631 0.004476 23.825 < 2e-16 ***
## slag 0.084388 0.005235 16.119 < 2e-16 ***
## flyash 0.071462 0.008086 8.837 < 2e-16 ***
## water -0.215335 0.022555 -9.547 < 2e-16 ***
## super_plast 0.259034 0.088313 2.933 0.00345 **
## age 0.117982 0.005713 20.650 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.664 on 793 degrees of freedom
## Multiple R-squared: 0.6526, Adjusted R-squared: 0.65
## F-statistic: 248.3 on 6 and 793 DF, p-value: < 2.2e-16
Dropping fine_agg and coarse_agg doesn't really affect the model performance.
Assumption Test
- Normality Residuals
H0 = Residuals normally distributed
H1 = Residuals not normally distributed
shapiro.test(lm_final$residuals)
##
## Shapiro-Wilk normality test
##
## data: lm_final$residuals
## W = 0.99171, p-value = 0.0001849
p-value < 0.05, so we reject the null hypothesis: the residuals are likely not normally distributed.
- Homoscedasticity
H0 = Homoscedastic
H1 = Heteroscedastic
bptest(lm_final)
##
## studentized Breusch-Pagan test
##
## data: lm_final
## BP = 124.16, df = 6, p-value < 2.2e-16
p-value < 0.05, so we reject the null hypothesis: the residuals are likely heteroscedastic.
- Multicollinearity
vif(lm_final)
## cement slag flyash water super_plast age
## 1.858746 1.717994 2.301958 1.912341 2.396542 1.098473
All VIF values are low, so there is no multicollinearity among the variables.
The regression model fails the normality-of-residuals and homoscedasticity assumption tests.
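As a visual complement to the two tests above, base R can plot the same diagnostics directly; a minimal sketch:
qqnorm(lm_final$residuals); qqline(lm_final$residuals)   # normality check
plot(lm_final$fitted.values, lm_final$residuals,
     xlab = "Fitted values", ylab = "Residuals")         # constant-variance check
abline(h = 0, lty = 2)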
Predicting Data Test
predict_regression <- predict(lm_final, test1)
MAE(y_pred = predict_regression, test1$strength)
## [1] 7.82459
R2_Score(y_pred = predict_regression, test1$strength)
## [1] 0.6304002
The model's performance on the test data doesn't change drastically from its training performance.
Multiple Linear Regression (Scaled Data)
This section shows how the model's performance changes if the data is standardized (converted to Z-scores) before training.
scaled <- data_no_outliers %>%
select(-9) %>%
scale(center = T, scale = T) %>%
data.frame() %>%
mutate(strength = data_no_outliers$strength)
head(scaled)
## cement slag flyash water super_plast coarse_agg
## 1 2.5108298 -0.8456622 -0.8526166 -0.9199052 -0.6193728 1.06734907
## 2 0.5184145 0.8192025 -0.8526166 2.2287599 -1.0365291 -0.53494220
## 3 0.5184145 0.8192025 -0.8526166 2.2287599 -1.0365291 -0.53494220
## 4 -0.7672936 0.7012016 -0.8526166 0.5113062 -1.0365291 0.06949938
## 5 0.9745096 0.2642476 -0.8526166 2.2287599 -1.0365291 -0.53494220
## 6 0.9745096 0.2642476 -0.8526166 2.2287599 -1.0365291 -0.53494220
## fine_agg age strength
## 1 -1.2418160 -0.2830425 61.89
## 2 -2.2527707 3.5756309 40.27
## 3 -2.2527707 5.0903994 41.05
## 4 0.6013269 5.0106747 44.30
## 5 -2.2527707 5.0903994 43.70
## 6 -2.2527707 -0.2830425 36.45
lm.all_s <- lm(strength ~., scaled)
lm.none_s <- lm(strength ~ 1, scaled)
lm_forward_s <- stats::step(lm.all_s,scope = list(lower = lm.none_s, upper = lm.all_s),direction = "backward")
## Start: AIC=3630.74
## strength ~ cement + slag + flyash + water + super_plast + coarse_agg +
## fine_agg + age
##
## Df Sum of Sq RSS AIC
## <none> 73169 3630.7
## - water 1 609 73779 3635.4
## - fine_agg 1 736 73905 3636.7
## - coarse_agg 1 861 74031 3638.1
## - super_plast 1 1183 74353 3641.6
## - flyash 1 5519 78688 3686.9
## - slag 1 10443 83612 3735.5
## - cement 1 19497 92666 3817.7
## - age 1 39966 113136 3977.4
summary(lm_forward_s)
##
## Call:
## lm(formula = strength ~ cement + slag + flyash + water + super_plast +
## coarse_agg + fine_agg + age, data = scaled)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.4446 -6.2657 0.7483 6.8796 20.1230
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.4919 0.3400 104.375 < 2e-16 ***
## cement 13.4819 0.9286 14.518 < 2e-16 ***
## slag 9.5479 0.8986 10.625 < 2e-16 ***
## flyash 6.5213 0.8443 7.724 3.42e-14 ***
## water -2.2418 0.8735 -2.566 0.010461 *
## super_plast 2.0927 0.5851 3.576 0.000369 ***
## coarse_agg 2.2853 0.7488 3.052 0.002352 **
## fine_agg 2.5324 0.8980 2.820 0.004923 **
## age 7.4346 0.3577 20.786 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.618 on 791 degrees of freedom
## Multiple R-squared: 0.6568, Adjusted R-squared: 0.6533
## F-statistic: 189.2 on 8 and 791 DF, p-value: < 2.2e-16
Scaling the data doesn't really change the model's performance; the adjusted R-squared is still around 0.65.
shapiro.test(lm_forward_s$residuals)
##
## Shapiro-Wilk normality test
##
## data: lm_forward_s$residuals
## W = 0.99041, p-value = 4.48e-05
bptest(lm_forward_s)
##
## studentized Breusch-Pagan test
##
## data: lm_forward_s
## BP = 119.43, df = 8, p-value < 2.2e-16
vif(lm_forward_s)
## cement slag flyash water super_plast coarse_agg
## 7.448834 6.974866 6.157367 6.590915 2.957384 4.843735
## fine_agg age
## 6.965873 1.105007
After scaling, the residuals are still not normally distributed and are still likely heteroscedastic.
Random Forest
Default Random Forest
set.seed(111)
random_forest1 <- randomForest(strength ~. , train1, importance = T)
random_forest1
##
## Call:
## randomForest(formula = strength ~ ., data = train1, importance = T)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 2
##
## Mean of squared residuals: 31.26905
## % Var explained: 88.44
- Model Performance on Data Train
predict_rf_train <- predict(random_forest1, train1)
MAE(y_pred = predict_rf_train, train1$strength)
## [1] 2.317671
R2_Score(y_pred = predict_rf_train, train1$strength)
## [1] 0.9669058
The MAE on the training data is around 2.3 and the R-squared is around 0.97.
- Model Performance on Data Test
predict_rf_test <- predict(random_forest1, test1)
MAE(y_pred = predict_rf_test, test1$strength)
## [1] 4.023457
R2_Score(y_pred = predict_rf_test, test1$strength)
## [1] 0.8983492
The MAE on the test data is around 4 and the R-squared is around 0.90. The model is slightly overfit.
Repeated Cross-Validation Random Forest
set.seed(111)
ctrl <- trainControl(method = "repeatedcv", number = 4, repeats = 3)
random_forest_kf <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T)
- Model Performance on Data Train
predict_rf_kf_train <- predict(random_forest_kf, train1)
MAE(y_pred = predict_rf_kf_train, train1$strength)
## [1] 1.566825
R2_Score(y_pred = predict_rf_kf_train, train1$strength)
## [1] 0.9827747
The MAE on the training data is around 1.6 and the R-squared is around 0.98.
- Model Performance on Data Test
predict_rf_kf <- predict(random_forest_kf, test1)
MAE(y_pred = predict_rf_kf, test1$strength)
## [1] 3.261497
R2_Score(y_pred = predict_rf_kf, test1$strength)
## [1] 0.9202383
The model is a bit overfit: the MAE increases and the R-squared decreases when the model is applied to the test data. Still, the test performance remains good.
Random Forest Tuning
- Best mtry
set.seed(111)
tuneGrid <- expand.grid(.mtry = c(1 : 10))
ctrl <- trainControl(method = "repeatedcv", number = 4, repeats = 3, search = 'grid')
random_forest2 <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T,tuneGrid = tuneGrid)
# saveRDS(random_forest2, "rf2.rds")
random_forest2
## Random Forest
##
## 640 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (4 fold, repeated 3 times)
## Summary of sample sizes: 480, 480, 480, 480, 480, 480, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 1 7.958314 0.8153242 6.414368
## 2 6.164759 0.8821002 4.765663
## 3 5.652414 0.8947230 4.325177
## 4 5.484685 0.8973927 4.154928
## 5 5.405860 0.8984021 4.077201
## 6 5.388267 0.8980253 4.046609
## 7 5.409674 0.8965995 4.030283
## 8 5.422302 0.8955358 4.025745
## 9 5.419954 0.8957245 4.028254
## 10 5.434759 0.8951116 4.032395
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 6.
The best mtry found is mtry = 6.
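For context, mtry is the number of predictors sampled at each split. randomForest's default for regression is floor(p/3), which explains the mtry = 2 used by the default model earlier:
floor((ncol(train1) - 1) / 3)   # 8 predictors -> default mtry of 2, vs. the tuned 6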
- Best maxnodes
store_maxnode <- list()
best_mtry <- random_forest2$bestTune$mtry
tuneGrid <- expand.grid(.mtry = best_mtry)
for (maxnodes in c(1:100)) {
set.seed(111)
random_forest_maxnode <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T,tuneGrid = tuneGrid, maxnodes=maxnodes)
current_iteration <- toString(maxnodes)
store_maxnode[[current_iteration]] <- random_forest_maxnode
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
## trainInfo, : There were missing values in resampled performance measures.
results_mtry <- resamples(store_maxnode)
summary(results_mtry)
##
## Call:
## summary.resamples(object = results_mtry)
##
## Models: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100
## Number of resamples: 12
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1 12.666669 12.995524 13.193038 13.186415 13.355179 13.752310 0
## 2 10.171481 10.474636 10.693555 10.634693 10.756417 10.923110 0
## 3 9.101885 9.244209 9.405757 9.421281 9.538627 9.939129 0
## 4 8.024406 8.322673 8.367425 8.392554 8.495123 8.705707 0
## 5 7.394535 8.030953 8.107454 8.120594 8.318161 8.503510 0
## 6 7.114220 7.558034 7.780871 7.687996 7.846101 7.887980 0
## 7 6.890933 7.293917 7.421221 7.371506 7.501225 7.680152 0
## 8 6.512300 6.801516 6.946343 6.965048 7.166065 7.345797 0
## 9 6.309483 6.705903 6.920341 6.894946 7.081168 7.345600 0
## 10 6.198228 6.582422 6.824111 6.795517 6.996180 7.268973 0
## 11 6.100529 6.458748 6.622204 6.642086 6.894813 7.093801 0
## 12 6.024407 6.284404 6.455337 6.500283 6.764215 7.004216 0
## 13 5.913935 6.144421 6.316851 6.372718 6.632119 6.893290 0
## 14 5.867660 5.963510 6.183110 6.244747 6.479317 6.669929 0
## 15 5.765542 5.889977 6.100119 6.134167 6.393459 6.556502 0
## 16 5.638974 5.730334 6.020144 6.042091 6.352518 6.489975 0
## 17 5.666873 5.704515 5.954202 6.012070 6.276911 6.497406 0
## 18 5.571789 5.677412 5.929209 5.963620 6.295535 6.384823 0
## 19 5.516792 5.681283 5.873326 5.928576 6.223814 6.386239 0
## 20 5.409041 5.628655 5.806666 5.866561 6.153505 6.298202 0
## 21 5.370908 5.533031 5.740987 5.788124 6.068034 6.198409 0
## 22 5.381251 5.477308 5.654169 5.736114 5.999666 6.136142 0
## 23 5.202445 5.417435 5.612349 5.666648 5.948135 6.047422 0
## 24 5.207579 5.355883 5.530700 5.595757 5.873799 6.002781 0
## 25 5.160788 5.239322 5.440457 5.529261 5.829926 5.968398 0
## 26 5.119127 5.191570 5.385031 5.483363 5.790990 5.917810 0
## 27 5.093141 5.148216 5.350832 5.432488 5.718006 5.860115 0
## 28 5.053126 5.136042 5.295209 5.394125 5.680989 5.898031 0
## 29 4.959091 5.056356 5.241158 5.342099 5.678498 5.807180 0
## 30 4.946212 5.044145 5.239593 5.326086 5.613230 5.879230 0
## 31 4.910383 4.993552 5.121770 5.279710 5.571692 5.791972 0
## 32 4.885233 4.990003 5.168119 5.266257 5.538423 5.771463 0
## 33 4.843924 4.969014 5.145939 5.241639 5.516042 5.720623 0
## 34 4.822969 4.944618 5.123267 5.223775 5.521362 5.697963 0
## 35 4.826840 4.942619 5.101210 5.200237 5.486573 5.625277 0
## 36 4.785507 4.905010 5.071653 5.172401 5.462067 5.619668 0
## 37 4.772202 4.914899 5.056745 5.151731 5.474289 5.547927 0
## 38 4.691883 4.840109 5.017435 5.088899 5.403289 5.498950 0
## 39 4.670342 4.856162 4.976594 5.079156 5.401778 5.471810 0
## 40 4.652541 4.792655 4.960224 5.045054 5.355923 5.489133 0
## 41 4.622214 4.768251 4.915047 5.014754 5.324580 5.430625 0
## 42 4.600108 4.762203 4.901859 4.980907 5.259310 5.415109 0
## 43 4.601816 4.693703 4.873921 4.948189 5.226197 5.369362 0
## 44 4.522711 4.686678 4.828089 4.922851 5.245488 5.353357 0
## 45 4.517616 4.648941 4.818116 4.906494 5.211935 5.400384 0
## 46 4.526070 4.630574 4.785879 4.879258 5.168391 5.314796 0
## 47 4.481192 4.606158 4.742571 4.858899 5.156524 5.306319 0
## 48 4.469115 4.563425 4.726130 4.832116 5.102239 5.291006 0
## 49 4.440228 4.560494 4.717978 4.827503 5.106607 5.277192 0
## 50 4.441386 4.570162 4.731089 4.812412 5.073447 5.263677 0
## 51 4.433728 4.502573 4.683289 4.779823 5.059858 5.206425 0
## 52 4.402053 4.547297 4.689525 4.782247 5.037458 5.220126 0
## 53 4.382385 4.505463 4.665385 4.758336 5.036435 5.227634 0
## 54 4.337971 4.494666 4.658797 4.752005 5.027187 5.213345 0
## 55 4.364542 4.486329 4.647746 4.735595 5.027309 5.180411 0
## 56 4.319522 4.480286 4.592337 4.722420 4.983428 5.189955 0
## 57 4.278988 4.438286 4.624596 4.699841 4.981055 5.170773 0
## 58 4.321579 4.422587 4.563887 4.676461 4.919356 5.151519 0
## 59 4.299203 4.457851 4.590038 4.674686 4.918785 5.099639 0
## 60 4.268393 4.426847 4.581769 4.655625 4.904139 5.136081 0
## 61 4.283657 4.396298 4.520773 4.630035 4.874511 5.090278 0
## 62 4.243713 4.385730 4.567961 4.621459 4.869324 5.050587 0
## 63 4.164378 4.388424 4.499959 4.592910 4.854946 5.043356 0
## 64 4.241563 4.352694 4.502561 4.593531 4.828921 5.043903 0
## 65 4.152874 4.358556 4.501801 4.574758 4.799520 5.051070 0
## 66 4.205855 4.340855 4.442574 4.562868 4.806395 5.056077 0
## 67 4.136898 4.295353 4.469851 4.545895 4.812491 5.081667 0
## 68 4.124913 4.297788 4.441756 4.537675 4.807989 5.044352 0
## 69 4.115238 4.287112 4.471304 4.526133 4.796951 4.991888 0
## 70 4.122742 4.245180 4.432436 4.500125 4.764194 5.028199 0
## 71 4.117454 4.235462 4.405831 4.488677 4.776870 4.945304 0
## 72 4.071259 4.272995 4.412844 4.493588 4.750259 4.955720 0
## 73 4.062165 4.222887 4.400049 4.472445 4.721100 4.982357 0
## 74 4.060909 4.233915 4.396619 4.478708 4.742407 5.041518 0
## 75 4.033545 4.228637 4.415532 4.467001 4.743941 4.962287 0
## 76 4.027037 4.219513 4.365998 4.447076 4.712918 4.970647 0
## 77 4.059415 4.222140 4.375001 4.450156 4.674393 4.953568 0
## 78 4.033601 4.170918 4.368557 4.427638 4.670258 4.913441 0
## 79 3.987496 4.198061 4.309559 4.416280 4.675902 4.958982 0
## 80 4.000992 4.180655 4.315187 4.404320 4.662328 4.922232 0
## 81 3.980601 4.195610 4.334954 4.413239 4.653522 4.949693 0
## 82 3.961377 4.168369 4.315541 4.397979 4.628895 4.945933 0
## 83 4.002679 4.141012 4.297002 4.382038 4.617546 4.902053 0
## 84 3.930606 4.139527 4.303488 4.365889 4.631989 4.956668 0
## 85 3.949089 4.130276 4.325573 4.363868 4.631449 4.875245 0
## 86 3.961272 4.143420 4.300112 4.366095 4.611778 4.847535 0
## 87 3.984316 4.146281 4.277953 4.364538 4.613116 4.918584 0
## 88 3.921470 4.108607 4.262835 4.336548 4.603196 4.875903 0
## 89 3.866333 4.123096 4.285827 4.331684 4.587848 4.860816 0
## 90 3.915706 4.105556 4.268747 4.332452 4.581236 4.888070 0
## 91 3.878066 4.052429 4.246684 4.300459 4.551998 4.875767 0
## 92 3.905276 4.088103 4.228480 4.304828 4.557614 4.877182 0
## 93 3.865533 4.097687 4.250310 4.310552 4.559023 4.870027 0
## 94 3.861083 4.081623 4.239510 4.297851 4.551399 4.832682 0
## 95 3.835763 4.053458 4.218177 4.281541 4.529269 4.827723 0
## 96 3.883920 4.055087 4.192463 4.275389 4.530928 4.877318 0
## 97 3.844942 4.064955 4.171494 4.268695 4.495071 4.875706 0
## 98 3.866718 4.051556 4.226803 4.283218 4.543325 4.871091 0
## 99 3.802983 4.069778 4.201711 4.257006 4.499345 4.774153 0
## 100 3.828103 4.020013 4.155446 4.250378 4.504486 4.831153 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1 15.466636 16.254002 16.582784 16.444122 16.775774 17.010099 0
## 2 12.571988 13.006443 13.117113 13.101487 13.237757 13.471237 0
## 3 11.470620 11.749286 11.870169 11.895124 12.011540 12.502383 0
## 4 9.988759 10.153556 10.350199 10.359161 10.587345 10.767276 0
## 5 9.264779 9.925914 10.157601 10.126326 10.334320 10.623855 0
## 6 8.889302 9.361275 9.731150 9.608509 9.822578 10.007264 0
## 7 8.525405 9.019966 9.227114 9.179015 9.392308 9.667874 0
## 8 8.086174 8.502377 8.724773 8.727429 9.089979 9.159121 0
## 9 7.904711 8.416259 8.667839 8.670389 9.057733 9.195531 0
## 10 7.765302 8.330775 8.622198 8.560066 8.903160 9.027605 0
## 11 7.537720 8.099951 8.420050 8.340163 8.652283 8.830029 0
## 12 7.483404 7.935223 8.183675 8.156572 8.419310 8.595278 0
## 13 7.325306 7.778664 8.004290 7.981177 8.253140 8.434551 0
## 14 7.262570 7.591620 7.806433 7.815439 8.109333 8.303583 0
## 15 7.117733 7.455334 7.688935 7.664960 7.943787 8.129016 0
## 16 7.022705 7.223735 7.616663 7.542355 7.813485 8.065500 0
## 17 7.007242 7.201288 7.572881 7.521876 7.755430 8.090000 0
## 18 6.941957 7.181410 7.549654 7.474403 7.773929 7.951273 0
## 19 6.867721 7.218426 7.515755 7.433732 7.646954 7.994999 0
## 20 6.764491 7.114181 7.432897 7.354881 7.605868 7.837008 0
## 21 6.690131 6.975157 7.334300 7.251381 7.479169 7.703927 0
## 22 6.682376 6.960799 7.178582 7.179703 7.381459 7.635797 0
## 23 6.464127 6.855955 7.100330 7.091095 7.335765 7.603144 0
## 24 6.454549 6.814392 6.992712 7.005822 7.194016 7.503902 0
## 25 6.402850 6.696083 6.861062 6.918686 7.179365 7.425321 0
## 26 6.345678 6.649856 6.802355 6.864159 7.085503 7.377449 0
## 27 6.295976 6.529831 6.774677 6.804983 7.025546 7.353292 0
## 28 6.248834 6.500376 6.708228 6.746628 6.937583 7.307328 0
## 29 6.107021 6.411681 6.636138 6.681598 6.924690 7.293610 0
## 30 6.111920 6.373589 6.659991 6.677138 6.891269 7.283388 0
## 31 6.061892 6.307445 6.565500 6.619847 6.852203 7.261238 0
## 32 6.088675 6.312891 6.629810 6.610806 6.799516 7.225483 0
## 33 5.985415 6.305044 6.583604 6.579447 6.805791 7.124182 0
## 34 5.998075 6.267327 6.575732 6.564938 6.791591 7.183330 0
## 35 5.992544 6.280674 6.507356 6.529146 6.759120 7.125274 0
## 36 5.965409 6.272432 6.445081 6.499174 6.733106 7.071178 0
## 37 5.939549 6.249749 6.460354 6.475161 6.686233 6.998080 0
## 38 5.838409 6.203366 6.413490 6.411872 6.610348 6.966266 0
## 39 5.830851 6.189099 6.350901 6.392440 6.604062 6.937923 0
## 40 5.806353 6.100505 6.318009 6.358219 6.626974 6.895563 0
## 41 5.759565 6.072325 6.271804 6.317879 6.565528 6.893512 0
## 42 5.727898 6.052438 6.258735 6.285608 6.509485 6.842959 0
## 43 5.697110 5.986323 6.220246 6.241770 6.469828 6.788742 0
## 44 5.651271 5.940340 6.205522 6.224073 6.469441 6.838156 0
## 45 5.650803 5.921667 6.201312 6.204523 6.434354 6.811490 0
## 46 5.624203 5.892822 6.139200 6.168747 6.414601 6.721599 0
## 47 5.601953 5.892359 6.093925 6.145005 6.398363 6.730218 0
## 48 5.568893 5.816539 6.080580 6.123008 6.329813 6.800079 0
## 49 5.625418 5.846871 6.043678 6.113604 6.352208 6.745974 0
## 50 5.542810 5.800115 6.054067 6.102584 6.306135 6.745983 0
## 51 5.538706 5.802510 6.079262 6.082011 6.253516 6.736671 0
## 52 5.491540 5.803607 6.032943 6.066724 6.263994 6.661955 0
## 53 5.532536 5.750591 6.000592 6.056855 6.288359 6.743044 0
## 54 5.444690 5.758897 6.014092 6.040774 6.243117 6.728374 0
## 55 5.452044 5.747694 5.959691 6.022185 6.272249 6.686504 0
## 56 5.444681 5.786113 5.939387 6.010935 6.241174 6.666152 0
## 57 5.398209 5.728745 5.976009 6.000873 6.254880 6.694358 0
## 58 5.433470 5.695431 5.907833 5.964497 6.163940 6.667252 0
## 59 5.427512 5.718410 5.928146 5.963239 6.139353 6.580709 0
## 60 5.432328 5.649462 5.903595 5.943438 6.167475 6.633523 0
## 61 5.381898 5.663174 5.870160 5.913015 6.130634 6.545875 0
## 62 5.358244 5.638055 5.911484 5.917433 6.136299 6.555146 0
## 63 5.275057 5.649530 5.838235 5.888343 6.122383 6.540047 0
## 64 5.347438 5.594809 5.837157 5.883788 6.107074 6.516209 0
## 65 5.242110 5.580711 5.829436 5.863451 6.072932 6.574025 0
## 66 5.305702 5.603758 5.784027 5.856073 6.100602 6.548610 0
## 67 5.251391 5.549863 5.829825 5.844826 6.060570 6.596828 0
## 68 5.237867 5.549047 5.786066 5.829910 6.097882 6.527706 0
## 69 5.231058 5.535939 5.831615 5.818870 6.033483 6.464024 0
## 70 5.223185 5.493593 5.766552 5.799466 6.042635 6.527281 0
## 71 5.210201 5.500081 5.766930 5.781435 6.018921 6.471239 0
## 72 5.184857 5.528135 5.745331 5.791621 6.035250 6.447643 0
## 73 5.168853 5.496733 5.759968 5.773773 6.026107 6.496485 0
## 74 5.156172 5.493555 5.748430 5.769290 5.991157 6.541438 0
## 75 5.125492 5.511783 5.729800 5.759447 5.993634 6.470104 0
## 76 5.129924 5.471741 5.712022 5.737176 5.942724 6.473712 0
## 77 5.166867 5.468895 5.694331 5.741809 5.969410 6.430601 0
## 78 5.125097 5.400185 5.692206 5.728364 6.003469 6.377183 0
## 79 5.065146 5.441384 5.670976 5.706058 5.925016 6.475636 0
## 80 5.094669 5.453233 5.684722 5.708589 5.927317 6.392300 0
## 81 5.056865 5.442923 5.680070 5.718640 5.988600 6.481351 0
## 82 5.085210 5.446055 5.684768 5.714282 5.968620 6.446876 0
## 83 5.117250 5.431497 5.640986 5.696038 5.919684 6.422837 0
## 84 5.050299 5.426223 5.657686 5.671361 5.847612 6.466838 0
## 85 5.041862 5.398018 5.679718 5.666406 5.888763 6.408117 0
## 86 5.045781 5.404946 5.659534 5.672369 5.904578 6.370977 0
## 87 5.071333 5.399438 5.639908 5.670839 5.912175 6.437857 0
## 88 4.992232 5.372019 5.626461 5.638184 5.881693 6.357368 0
## 89 4.974990 5.377809 5.656422 5.647612 5.855009 6.370423 0
## 90 5.002926 5.379314 5.658140 5.648100 5.857846 6.411636 0
## 91 4.988343 5.346669 5.616797 5.616146 5.797319 6.403248 0
## 92 4.986712 5.344414 5.603417 5.616645 5.802193 6.390019 0
## 93 4.949694 5.346775 5.602415 5.616462 5.843503 6.375822 0
## 94 4.946363 5.341115 5.604223 5.604187 5.836748 6.304004 0
## 95 4.928236 5.314388 5.583834 5.595709 5.846012 6.318686 0
## 96 4.963340 5.329992 5.577529 5.594812 5.802353 6.389542 0
## 97 4.947491 5.322424 5.555363 5.595398 5.821693 6.425617 0
## 98 4.963774 5.292333 5.589912 5.600757 5.795882 6.406710 0
## 99 4.880196 5.301502 5.561850 5.577153 5.836840 6.300001 0
## 100 4.899368 5.301065 5.516014 5.567853 5.833135 6.354759 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1 NA NA NA NaN NA NA 12
## 2 0.3553758 0.5316139 0.5810859 0.5621771 0.6157512 0.7124356 0
## 3 0.4660146 0.5327102 0.5464086 0.5670108 0.6087630 0.7131239 0
## 4 0.5685701 0.6407787 0.6836624 0.6716929 0.7052046 0.7654066 0
## 5 0.5693533 0.6360218 0.6784487 0.6707867 0.6963010 0.7801829 0
## 6 0.6227102 0.6745742 0.7145986 0.7049032 0.7262817 0.7913541 0
## 7 0.6604722 0.7039195 0.7390618 0.7259546 0.7448411 0.8004765 0
## 8 0.6751406 0.7293157 0.7511868 0.7471083 0.7701440 0.8150852 0
## 9 0.6747781 0.7307053 0.7494750 0.7481385 0.7767587 0.8182324 0
## 10 0.6817835 0.7356853 0.7585851 0.7544928 0.7809418 0.8241716 0
## 11 0.6969897 0.7527282 0.7678425 0.7674374 0.7869087 0.8342393 0
## 12 0.7022342 0.7640442 0.7819859 0.7770574 0.7964124 0.8334445 0
## 13 0.7148654 0.7769868 0.7891231 0.7863389 0.8039888 0.8434326 0
## 14 0.7352540 0.7843988 0.7949653 0.7951334 0.8137987 0.8455897 0
## 15 0.7419473 0.7935337 0.8017165 0.8020167 0.8223545 0.8485637 0
## 16 0.7562613 0.7951692 0.8051739 0.8081283 0.8318270 0.8505556 0
## 17 0.7643783 0.7935163 0.8058581 0.8088436 0.8304203 0.8517947 0
## 18 0.7566673 0.8002028 0.8079664 0.8111317 0.8339958 0.8526439 0
## 19 0.7650095 0.8019798 0.8109141 0.8132694 0.8323420 0.8577975 0
## 20 0.7664534 0.8073062 0.8157949 0.8175678 0.8363295 0.8620077 0
## 21 0.7711929 0.8144358 0.8175462 0.8222931 0.8410253 0.8642500 0
## 22 0.7724564 0.8183531 0.8244841 0.8258674 0.8439814 0.8637715 0
## 23 0.7781906 0.8216239 0.8276440 0.8298632 0.8476516 0.8738470 0
## 24 0.7851034 0.8253068 0.8326339 0.8338968 0.8484324 0.8732278 0
## 25 0.7835278 0.8291534 0.8401447 0.8380401 0.8533038 0.8750347 0
## 26 0.7889409 0.8323175 0.8413771 0.8401928 0.8556624 0.8783109 0
## 27 0.8003230 0.8330826 0.8430215 0.8429856 0.8583631 0.8788945 0
## 28 0.8050977 0.8352228 0.8463349 0.8455258 0.8586528 0.8814809 0
## 29 0.8035712 0.8374606 0.8502221 0.8484514 0.8631018 0.8870025 0
## 30 0.8080709 0.8368105 0.8482472 0.8483802 0.8630566 0.8871600 0
## 31 0.8116047 0.8379638 0.8518008 0.8505822 0.8656588 0.8886664 0
## 32 0.8126005 0.8394751 0.8490356 0.8513775 0.8658621 0.8878082 0
## 33 0.8126689 0.8421628 0.8513778 0.8523644 0.8654877 0.8904840 0
## 34 0.8128223 0.8419093 0.8514302 0.8531193 0.8681557 0.8905296 0
## 35 0.8145160 0.8452042 0.8542208 0.8548890 0.8676141 0.8911974 0
## 36 0.8166559 0.8465039 0.8578832 0.8561236 0.8680520 0.8920782 0
## 37 0.8138574 0.8500415 0.8576397 0.8569988 0.8674724 0.8925557 0
## 38 0.8185384 0.8508302 0.8585171 0.8597384 0.8726464 0.8956457 0
## 39 0.8191536 0.8532040 0.8613150 0.8606550 0.8708940 0.8961393 0
## 40 0.8205846 0.8536683 0.8619955 0.8615665 0.8754886 0.8966585 0
## 41 0.8237950 0.8544560 0.8643843 0.8632358 0.8754188 0.8980824 0
## 42 0.8277752 0.8564187 0.8659423 0.8648431 0.8790691 0.8993575 0
## 43 0.8302300 0.8583346 0.8664968 0.8664717 0.8790840 0.8999528 0
## 44 0.8296866 0.8564740 0.8670205 0.8668535 0.8831138 0.9013633 0
## 45 0.8289437 0.8583172 0.8694492 0.8680568 0.8830744 0.9020106 0
## 46 0.8323994 0.8618853 0.8707390 0.8694861 0.8823526 0.9025898 0
## 47 0.8327267 0.8599934 0.8729769 0.8702459 0.8827439 0.9034120 0
## 48 0.8376071 0.8584855 0.8730320 0.8710765 0.8853444 0.9042117 0
## 49 0.8358789 0.8597220 0.8743291 0.8714436 0.8870161 0.9023614 0
## 50 0.8389547 0.8603616 0.8726333 0.8716439 0.8866779 0.9049917 0
## 51 0.8419517 0.8604495 0.8740084 0.8727245 0.8852330 0.9053249 0
## 52 0.8413996 0.8631122 0.8743077 0.8732416 0.8865595 0.9066726 0
## 53 0.8386256 0.8606646 0.8741618 0.8736754 0.8890823 0.9055890 0
## 54 0.8416552 0.8616950 0.8761151 0.8743615 0.8879243 0.9091747 0
## 55 0.8391514 0.8633617 0.8766670 0.8748452 0.8884414 0.9073924 0
## 56 0.8408018 0.8637007 0.8782611 0.8754938 0.8863970 0.9082039 0
## 57 0.8404610 0.8658167 0.8764874 0.8759836 0.8883902 0.9102767 0
## 58 0.8453413 0.8644829 0.8786615 0.8772460 0.8924827 0.9088652 0
## 59 0.8472079 0.8682529 0.8779231 0.8775193 0.8906943 0.9093083 0
## 60 0.8453384 0.8654515 0.8786645 0.8779595 0.8943880 0.9084916 0
## 61 0.8467237 0.8691419 0.8804080 0.8792525 0.8928542 0.9098727 0
## 62 0.8457729 0.8686723 0.8784291 0.8790338 0.8937238 0.9113696 0
## 63 0.8462661 0.8690397 0.8812381 0.8799722 0.8947076 0.9139931 0
## 64 0.8476681 0.8700031 0.8814034 0.8802404 0.8951376 0.9104881 0
## 65 0.8491696 0.8696865 0.8813780 0.8810536 0.8946896 0.9145454 0
## 66 0.8472686 0.8690959 0.8833176 0.8811315 0.8950914 0.9115304 0
## 67 0.8498139 0.8677762 0.8813530 0.8815968 0.8968548 0.9139532 0
## 68 0.8481227 0.8705064 0.8825581 0.8820567 0.8961643 0.9144121 0
## 69 0.8519540 0.8720514 0.8809492 0.8824080 0.8975689 0.9147929 0
## 70 0.8513292 0.8702583 0.8839389 0.8831761 0.8985530 0.9148137 0
## 71 0.8522718 0.8730012 0.8840907 0.8840232 0.8982276 0.9149224 0
## 72 0.8503925 0.8728253 0.8848462 0.8836068 0.8986651 0.9162657 0
## 73 0.8515365 0.8722780 0.8840341 0.8843985 0.8996749 0.9173734 0
## 74 0.8537785 0.8694639 0.8841254 0.8842204 0.8987510 0.9170192 0
## 75 0.8540693 0.8734242 0.8851601 0.8849116 0.8990751 0.9183854 0
## 76 0.8564032 0.8724323 0.8859806 0.8856975 0.9003750 0.9180545 0
## 77 0.8539086 0.8737777 0.8865691 0.8854249 0.8987792 0.9158453 0
## 78 0.8522086 0.8749575 0.8861577 0.8858395 0.9009660 0.9183126 0
## 79 0.8567629 0.8723976 0.8873079 0.8866617 0.9016161 0.9189873 0
## 80 0.8563860 0.8756476 0.8865698 0.8865471 0.8997373 0.9181920 0
## 81 0.8521307 0.8728108 0.8871073 0.8861656 0.8995245 0.9199105 0
## 82 0.8534849 0.8743344 0.8866648 0.8862607 0.9004343 0.9188570 0
## 83 0.8569996 0.8747999 0.8884931 0.8870344 0.8995704 0.9171836 0
## 84 0.8596946 0.8757804 0.8880005 0.8879764 0.8996355 0.9202597 0
## 85 0.8587317 0.8760967 0.8862931 0.8880883 0.9030902 0.9199343 0
## 86 0.8570934 0.8765749 0.8877665 0.8877778 0.9008505 0.9199563 0
## 87 0.8570892 0.8746729 0.8883795 0.8878677 0.9022920 0.9191473 0
## 88 0.8586336 0.8770482 0.8884854 0.8890743 0.9023103 0.9217356 0
## 89 0.8597849 0.8767616 0.8877296 0.8887483 0.9008787 0.9221986 0
## 90 0.8599164 0.8756657 0.8873177 0.8888206 0.9023069 0.9215736 0
## 91 0.8615889 0.8781280 0.8892474 0.8900939 0.9024703 0.9217348 0
## 92 0.8625955 0.8762253 0.8896783 0.8897324 0.9031238 0.9212640 0
## 93 0.8607586 0.8774487 0.8899641 0.8899865 0.9032064 0.9226214 0
## 94 0.8606682 0.8788526 0.8895659 0.8904931 0.9049399 0.9235878 0
## 95 0.8600912 0.8784980 0.8901526 0.8906271 0.9033033 0.9234815 0
## 96 0.8624230 0.8775882 0.8905696 0.8904766 0.9032791 0.9222995 0
## 97 0.8612935 0.8764298 0.8915705 0.8906040 0.9027021 0.9228581 0
## 98 0.8622274 0.8766053 0.8903308 0.8903151 0.9037348 0.9220020 0
## 99 0.8600106 0.8795097 0.8912030 0.8913267 0.9035608 0.9252951 0
## 100 0.8605627 0.8768052 0.8932481 0.8914790 0.9057399 0.9239863 0
The higher the maxnodes, the better the performance, although the last ten values return almost identical results. Let's try maxnodes = 90.
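Rather than eyeballing the long summary, the best maxnodes value can also be pulled out programmatically from the resample scores caret stores on each model; a short sketch:
mean_rmse <- sapply(store_maxnode, function(m) mean(m$resample$RMSE, na.rm = TRUE))
head(sort(mean_rmse))   # maxnodes values with the lowest mean CV RMSE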
- Best nTrees
store_maxtrees <- list()
for (ntree in c(250, 300, 350, 400, 450, 500, 550, 600, 800, 1000)) {
set.seed(111)
random_forest_maxtree <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T,tuneGrid = tuneGrid, ntree = ntree, maxnodes = 90)
key <- toString(ntree)
store_maxtrees[[key]] <- random_forest_maxtree
}
results_tree <- resamples(store_maxtrees)
summary(results_tree)
##
## Call:
## summary.resamples(object = results_tree)
##
## Models: 250, 300, 350, 400, 450, 500, 550, 600, 800, 1000
## Number of resamples: 12
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 250 3.895912 4.082000 4.285613 4.339462 4.592441 4.933105 0
## 300 3.898656 4.087173 4.288965 4.334791 4.576438 4.921247 0
## 350 3.909302 4.090478 4.280679 4.331180 4.576194 4.903621 0
## 400 3.928810 4.095702 4.273540 4.332917 4.583629 4.892261 0
## 450 3.923761 4.099865 4.266525 4.333419 4.586127 4.884295 0
## 500 3.915706 4.105556 4.268747 4.332452 4.581236 4.888070 0
## 550 3.912079 4.118299 4.266197 4.331660 4.578614 4.881327 0
## 600 3.917555 4.119647 4.266863 4.330920 4.578060 4.873924 0
## 800 3.905900 4.121835 4.260137 4.326028 4.560735 4.877705 0
## 1000 3.913228 4.116296 4.252034 4.323004 4.571270 4.867959 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 250 4.994902 5.363187 5.688904 5.666331 5.898793 6.448471 0
## 300 4.983465 5.373398 5.677615 5.658660 5.871945 6.432675 0
## 350 4.985320 5.358902 5.665844 5.649038 5.847199 6.425646 0
## 400 5.013175 5.357417 5.666252 5.650805 5.852934 6.406395 0
## 450 5.012709 5.367422 5.660790 5.648297 5.861439 6.401608 0
## 500 5.002926 5.379314 5.658140 5.648100 5.857846 6.411636 0
## 550 5.000061 5.380619 5.660641 5.647077 5.845445 6.406783 0
## 600 5.002928 5.381998 5.657586 5.644557 5.850677 6.393408 0
## 800 4.999468 5.384560 5.639036 5.637612 5.853339 6.387184 0
## 1000 5.007758 5.380485 5.636893 5.633613 5.844059 6.372587 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 250 0.8577124 0.8746659 0.8859580 0.8880820 0.9021097 0.9219139 0
## 300 0.8591152 0.8752758 0.8864010 0.8883665 0.9016869 0.9221414 0
## 350 0.8605070 0.8755559 0.8869922 0.8887831 0.9022169 0.9218977 0
## 400 0.8602265 0.8755973 0.8869780 0.8886895 0.9025050 0.9210347 0
## 450 0.8598399 0.8759949 0.8872410 0.8887891 0.9025228 0.9211672 0
## 500 0.8599164 0.8756657 0.8873177 0.8888206 0.9023069 0.9215736 0
## 550 0.8606834 0.8760324 0.8871309 0.8888849 0.9024121 0.9216680 0
## 600 0.8601538 0.8764164 0.8872429 0.8889417 0.9026535 0.9215043 0
## 800 0.8596993 0.8764750 0.8882434 0.8892288 0.9026496 0.9214786 0
## 1000 0.8603660 0.8768495 0.8885004 0.8893595 0.9030244 0.9210641 0
The number of trees doesn't really affect the model performance.
random_forest_kf_final <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T, ntree = 500, maxnodes = 100, tuneGrid = tuneGrid)
random_forest_kf_final
## Random Forest
##
## 640 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (4 fold, repeated 3 times)
## Summary of sample sizes: 480, 480, 480, 480, 480, 480, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 5.581129 0.8890584 4.238691
##
## Tuning parameter 'mtry' was held constant at a value of 6
predict_rf_kf_train <- predict(random_forest_kf_final, train1)
predict_rf_kf <- predict(random_forest_kf_final, test1)
- Model Performance on Data Train
MAE(y_pred = predict_rf_kf_train, train1$strength)
## [1] 2.522225
R2_Score(y_pred = predict_rf_kf_train, train1$strength)
## [1] 0.9609133
- Model Performance on Data Test
MAE(y_pred = predict_rf_kf, test1$strength)
## [1] 3.932892
R2_Score(y_pred = predict_rf_kf, test1$strength)
## [1] 0.8997213
The repeated cross-validation model still performs better than the tuned Random Forest.
XGBTree (Extreme Gradient Boosting)
X_train = xgb.DMatrix(as.matrix(train1 %>% select(-strength)))
y_train = train1$strength
X_test = xgb.DMatrix(as.matrix(test1 %>% select(-strength)))
y_test = test1$strength
xgb_trcontrol = trainControl(
method = "repeatedcv",
number = 4,
repeats = 3,
allowParallel = TRUE,
verboseIter = FALSE,
returnData = FALSE
)
set.seed(111)
xgb_model = caret::train(
X_train, y_train,
trControl = xgb_trcontrol,
method = "xgbTree",
importance = T
)
predict_xgbt_train <- predict(xgb_model, train1)
predict_xgbt <- predict(xgb_model, test1)
- Model Performance on Data Train
MAE(y_pred = predict_xgbt_train, train1$strength)
## [1] 1.286544
R2_Score(y_pred = predict_xgbt_train, train1$strength)
## [1] 0.988718
- Model Performance on Data Test
MAE(y_pred = predict_xgbt, test1$strength)
## [1] 2.673547
R2_Score(y_pred = predict_xgbt, test1$strength)
## [1] 0.9412361
The model's performance is better than that of the random forest with repeated k-fold cross-validation.
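caret tunes xgbTree over a default grid of hyperparameters (number of boosting rounds, tree depth, learning rate, and so on); the winning combination can be inspected directly:
xgb_model$bestTune   # hyperparameters selected by the grid search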
Neural Network
Data Preprocess
train_scaled <- train1 %>%
select(-9) %>%
scale() %>%
data.frame() %>%
mutate(strength = train1$strength)
test_scaled <- test1 %>%
select(-9) %>%
scale() %>%
data.frame() %>%
mutate(strength = test1$strength)
train_matrix <- data.matrix(train_scaled)
test_matrix <- data.matrix(test_scaled)
train_x <- train_matrix[,-9]
train_y <- train_matrix[,9]
test_x <- test_matrix[,-9]
test_y <- test_matrix[,9]
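One caveat: the test predictors above are scaled with the test set's own means and standard deviations. A common alternative, sketched below, reuses the training-set parameters so new data is transformed exactly as the model saw during training:
ctr <- colMeans(train1[, -9])        # training means
sds <- apply(train1[, -9], 2, sd)    # training standard deviations
test_scaled_alt <- test1 %>%
  select(-9) %>%
  scale(center = ctr, scale = sds) %>%
  data.frame() %>%
  mutate(strength = test1$strength)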
Model Design
model_nn <- keras_model_sequential()
model_nn %>%
layer_dense(input_shape = c(8),
units = 512,
activation = "relu",
kernel_regularizer = regularizer_l2(l=0.001)) %>%
layer_dense(units = 256,
activation = "relu",
kernel_regularizer = regularizer_l2(l=0.001)) %>%
layer_dense(units = 128,
activation = "relu",
kernel_regularizer = regularizer_l2(l=0.001)) %>%
layer_dense(units = 64,
activation = "relu",
kernel_regularizer = regularizer_l2(l=0.001)) %>%
layer_dense( units = 1)
model_nn %>%
compile(loss = "mse",
optimizer_adamax(lr = 0.0005),
metrics = c("mae"))
hist <- model_nn %>% fit(train_x,
                         train_y,
                         epochs = 200)
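A hedged variant: adding a validation split and early stopping lets training halt once validation loss stops improving, instead of committing to 200 epochs up front (the patience value here is an assumption):
hist <- model_nn %>% fit(
  train_x, train_y,
  epochs = 200,
  validation_split = 0.2,
  callbacks = list(callback_early_stopping(monitor = "val_loss",
                                           patience = 20,
                                           restore_best_weights = TRUE)))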
pred_train <- model_nn %>%
predict(train_x)
pred_test <- model_nn %>%
predict(test_x)
- Model Performance on Data Train
MAE(pred_train,train_y)
## [1] 1.997067
R2_Score(pred_train,train_y)
## [1] 0.973989
- Model Performance on Data Test
MAE(pred_test,test_y)
## [1] 3.647482
R2_Score(pred_test,test_y)
## [1] 0.907411
The model's performance on the test data is still not as good as the XGBTree model's.
Evaluation
The XGBTree model returns the best performance among all the models created.
- Model Performance on Data Train
predict_xgb_train <- predict(xgb_model, train1)
MAE(y_pred = predict_xgb_train, train1$strength)
## [1] 1.286544
R2_Score(y_pred = predict_xgb_train, train1$strength)
## [1] 0.988718
- Model Performance on Data Test
predict_xgb_test <- predict(xgb_model, test1)
MAE(y_pred = predict_xgb_test, test1$strength)
## [1] 2.673547
R2_Score(y_pred = predict_xgb_test, test1$strength)
## [1] 0.9412361
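Collecting the test-set scores reported throughout this report into one table makes the comparison explicit (numbers copied from the sections above):
recap <- data.frame(
  model = c("Linear regression", "Random forest (default)",
            "Random forest (repeated CV)", "Random forest (tuned)",
            "XGBoost", "Neural network"),
  MAE = c(7.82, 4.02, 3.26, 3.93, 2.67, 3.65),
  R2  = c(0.630, 0.898, 0.920, 0.899, 0.941, 0.907))
recap[order(recap$MAE), ]   # XGBoost comes out on top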
- Variable Importance
x <- varImp(xgb_model)
var <- x$importance
var$varname <- colnames(train1[, as.numeric(rownames(var)) +1])
var
## Overall varname
## 7 100.0000000 age
## 0 99.3055253 cement
## 3 26.0828847 water
## 1 12.3470518 slag
## 4 9.5222047 super_plast
## 6 8.7157322 fine_agg
## 5 0.5344267 coarse_agg
## 2 0.0000000 flyash
Age, cement, and water are the most important variables for predicting concrete strength.
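The same ranking can be plotted; a minimal sketch using the var data frame built above:
ggplot(var, aes(x = reorder(varname, Overall), y = Overall)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Importance", title = "XGBoost Variable Importance")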