Concrete Strength Prediction
Overview
The purpose of this project is to predict concrete compressive strength by comparing models built with several different algorithms. MAE (mean absolute error) and R-squared are used to evaluate the performance of each model.
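For reference, both metrics can be computed by hand. A minimal sketch of what the MLmetrics functions used later in this report calculate:
mae <- function(pred, actual) mean(abs(pred - actual))   # MLmetrics::MAE
r_squared <- function(pred, actual) {                    # MLmetrics::R2_Score
  1 - sum((actual - pred)^2) / sum((actual - mean(actual))^2)
}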
Data Processing and Modelling Flow
Libraries and Data Importing
Data Preparation and EDA
Modelling
Evaluation
Libraries and Data Importing
Libraries Used
library(tidyverse) # data manipulating
library(olsrr) # outliers plotting and removing
library(ggplot2) # data plotting
library(plotly) # interactive plotting
library(ggthemes) # plot themes
library(corrplot) # correlation plotting
library(caret) # modelling
library(keras) # Neural net. modelling
library(tensorflow) # Neural net. modelling
library(MLmetrics) # model evaluation (MAE & R-squared)
library(randomForest) # Random Forest modelling
library(xgboost) # extreme gradient boosting modelling
library(lmtest) # assumption test
library(car) # assumption test
Data Importing
data <- read.csv("data/data-train.csv")
str(data)
## 'data.frame': 825 obs. of 9 variables:
## $ cement : num 540 540 332 332 199 ...
## $ slag : num 0 0 142 142 132 ...
## $ flyash : num 0 0 0 0 0 0 0 0 0 0 ...
## $ water : num 162 162 228 228 192 228 228 228 192 192 ...
## $ super_plast: num 2.5 2.5 0 0 0 0 0 0 0 0 ...
## $ coarse_agg : num 1040 1055 932 932 978 ...
## $ fine_agg : num 676 676 594 594 826 ...
## $ age : int 28 28 270 365 360 365 28 28 90 28 ...
## $ strength : num 80 61.9 40.3 41 44.3 ...
head(data)
## cement slag flyash water super_plast coarse_agg fine_agg age strength
## 1 540.0 0.0 0 162 2.5 1040.0 676.0 28 79.99
## 2 540.0 0.0 0 162 2.5 1055.0 676.0 28 61.89
## 3 332.5 142.5 0 228 0.0 932.0 594.0 270 40.27
## 4 332.5 142.5 0 228 0.0 932.0 594.0 365 41.05
## 5 198.6 132.4 0 192 0.0 978.4 825.5 360 44.30
## 6 380.0 95.0 0 228 0.0 932.0 594.0 365 43.70
So we have 8 variables suspected to affect the strength of the concrete.
cement: the amount of cement (kg) in a m3 mixture
slag: the amount of blast furnace slag (kg) in a m3 mixture
flyash: the amount of fly ash (kg) in a m3 mixture
water: the amount of water (kg) in a m3 mixture
super_plast: the amount of superplasticizer (kg) in a m3 mixture
coarse_agg: the amount of coarse aggregate (kg) in a m3 mixture
fine_agg: the amount of fine aggregate (kg) in a m3 mixture
age: the number of resting days before the compressive strength measurement
strength: concrete compressive strength in MPa
Data Preparation and EDA
Data Preparation
NA Checking
data %>%
is.na() %>%
colSums()
## cement slag flyash water super_plast coarse_agg
## 0 0 0 0 0 0
## fine_agg age strength
## 0 0 0
There are no NAs in the data.
Outliers Checking
Studentized Residuals vs Leverage Plot
In this step, the outliers will be eliminated by plotting the observations into a chart that separates them into different zones.
outliers <- lm(strength ~. , data=data)
d <- ols_plot_resid_lev(outliers)
eliminate <- d$leverage$observation
eliminate
## [1] 1 11 12 32 41 66 69 72 77 93 106 126 298 299 316 318 332
## [18] 391 396 397 414 425 426 600 615
The plot above shows each observation and its status: outlier, leverage point, or both. The ones that will be eliminated are the outliers (green points) and the points that are both outliers and leverage points at the same time (purple points).
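Under the hood, the plot combines studentized residuals with leverage values. A rough sketch of how the zones can be reconstructed, using commonly assumed thresholds (olsrr's exact cutoffs may differ):
res <- rstudent(outliers)       # studentized residuals
lev <- hatvalues(outliers)      # leverage values
p <- length(coef(outliers))     # number of model parameters
n <- nrow(data)
which(abs(res) > 2)             # assumed outlier threshold: |residual| > 2
which(lev > 2 * p / n)          # assumed leverage threshold: 2p/n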
data_no_outliers <- data[-eliminate, ]
data_no_outliers %>%
count()
## # A tibble: 1 x 1
## n
## <int>
## 1 800
The 25 observations identified as outliers have been removed.
Near Zero Variance
data_no_outliers %>%
nearZeroVar()
## integer(0)
No variable has near-zero variance.
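For transparency, nearZeroVar can also report the metrics it checks: a predictor is flagged only when the frequency ratio of its most common value to its second most common value exceeds freqCut (default 95/5) and its percentage of unique values falls below uniqueCut (default 10).
data_no_outliers %>%
  nearZeroVar(saveMetrics = TRUE)   # shows freqRatio and percentUnique per variable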
Cross Validation
set.seed(111)
index1 <- sample(nrow(data_no_outliers), nrow(data_no_outliers)*0.8)
index2 <- sample(nrow(data), nrow(data)*0.8)
train1 <- data_no_outliers[index1, ]
test1 <- data_no_outliers[-index1,]
train2 <- data[index2,]
test2 <- data[-index2,]
train3 <- train1 %>%
select(-9) %>%
scale() %>%
data.frame() %>%
mutate(strength = train1$strength)
Each dataset is split into a training set (80%) and a test set (20%); train3 is a scaled copy of train1 for later use.
Exploratory Data Analysis
Correlation Among Variables
corrplot(cor(data_no_outliers))
strength is correlated with all the other variables; the correlation is strongest with cement and weakest with slag, flyash, and coarse_agg.
ggplot(data_no_outliers, aes(strength, cement)) +
geom_jitter(aes(col = super_plast, size = age)) +
labs(title = "Highest Correlated Variables on Strength", x= "Strength", y= 'Cement')+
theme(plot.title = element_text(hjust = 0.5))
The plot shows that concrete strength tends to increase with the amount of cement, while the amount of superplasticizer and the age at measurement don't show a strong visible effect here.
Variable Range
range_plot <- ggplot(data = data_no_outliers %>% gather(key = "Variable", value = "Value"), aes(Variable, Value)) +
geom_boxplot(aes(col = Variable))+
scale_color_calc()+
theme(axis.text.x = element_text(angle = 90),
axis.title.x = element_blank())
ggplotly(range_plot)
The variables are not on the same scale, so I will scale them for the neural network model. Some variables also contain outliers; I will try eliminating them to see whether it improves the models' performance.
Modelling
Multiple Linear Regression
Data Without Outliers
lm.all1 <- lm(strength ~., train1)
lm.none1 <- lm(strength ~ 1, train1)
lm_forward1 <- stats::step(lm.all1, scope = list(lower = lm.none1, upper = lm.all1), direction = "backward") # backward elimination, despite the object name
## Start: AIC=2906.65
## strength ~ cement + slag + flyash + water + super_plast + coarse_agg +
## fine_agg + age
##
## Df Sum of Sq RSS AIC
## <none> 58395 2906.6
## - water 1 394 58789 2909.0
## - fine_agg 1 698 59093 2912.3
## - coarse_agg 1 755 59150 2912.9
## - super_plast 1 841 59236 2913.8
## - flyash 1 4999 63394 2957.2
## - slag 1 8411 66806 2990.8
## - cement 1 16409 74804 3063.1
## - age 1 32843 91237 3190.2
summary(lm_forward1)
##
## Call:
## lm(formula = strength ~ cement + slag + flyash + water + super_plast +
## coarse_agg + fine_agg + age, data = train1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.2147 -5.9223 0.6697 6.6704 20.5073
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -62.219802 30.626863 -2.032 0.04262 *
## cement 0.132941 0.009984 13.316 < 2e-16 ***
## slag 0.112282 0.011777 9.534 < 2e-16 ***
## flyash 0.108975 0.014827 7.350 6.18e-13 ***
## water -0.096174 0.046594 -2.064 0.03942 *
## super_plast 0.329303 0.109212 3.015 0.00267 **
## coarse_agg 0.030700 0.010748 2.856 0.00443 **
## fine_agg 0.034172 0.012439 2.747 0.00618 **
## age 0.121366 0.006442 18.838 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.62 on 631 degrees of freedom
## Multiple R-squared: 0.6627, Adjusted R-squared: 0.6585
## F-statistic: 155 on 8 and 631 DF, p-value: < 2.2e-16
The model using the data without outliers returns an adjusted R-squared of 0.6585, meaning it explains around 65.9% of the variance in strength.
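As a sanity check, the adjusted R-squared can be reconstructed from the multiple R-squared in the summary above (n = 640 training rows, p = 8 predictors):
r2 <- 0.6627; n <- 640; p <- 8
1 - (1 - r2) * (n - 1) / (n - p - 1)   # ~0.6585, matching the summary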
Data With Outliers
lm.all2 <- lm(strength ~., train2)
lm.none2 <- lm(strength ~ 1, train2)
lm_forward2 <- stats::step(lm.all2,scope = list(lower = lm.none2, upper = lm.all2),direction = "backward")
## Start: AIC=3080.9
## strength ~ cement + slag + flyash + water + super_plast + coarse_agg +
## fine_agg + age
##
## Df Sum of Sq RSS AIC
## - fine_agg 1 2 68393 3078.9
## - coarse_agg 1 66 68457 3079.5
## <none> 68391 3080.9
## - super_plast 1 1001 69392 3088.5
## - water 1 1401 69792 3092.3
## - flyash 1 1775 70166 3095.8
## - slag 1 4811 73202 3123.8
## - cement 1 11117 79508 3178.3
## - age 1 33891 102282 3344.5
##
## Step: AIC=3078.92
## strength ~ cement + slag + flyash + water + super_plast + coarse_agg +
## age
##
## Df Sum of Sq RSS AIC
## - coarse_agg 1 153 68547 3078.4
## <none> 68393 3078.9
## - super_plast 1 1019 69412 3086.7
## - flyash 1 4183 72576 3116.1
## - water 1 4223 72616 3116.5
## - slag 1 17794 86187 3229.5
## - age 1 33911 102304 3342.7
## - cement 1 42562 110955 3396.3
##
## Step: AIC=3078.4
## strength ~ cement + slag + flyash + water + super_plast + age
##
## Df Sum of Sq RSS AIC
## <none> 68547 3078.4
## - super_plast 1 867 69413 3084.7
## - flyash 1 4101 72647 3114.7
## - water 1 6444 74991 3135.7
## - slag 1 18116 86662 3231.2
## - age 1 34242 102788 3343.8
## - cement 1 42858 111405 3396.9
summary(lm_forward2)
##
## Call:
## lm(formula = strength ~ cement + slag + flyash + water + super_plast +
## age, data = train2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.500 -6.829 0.820 7.100 35.126
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.279867 5.388139 5.249 2.08e-07 ***
## cement 0.104878 0.005190 20.206 < 2e-16 ***
## slag 0.078350 0.005964 13.137 < 2e-16 ***
## flyash 0.060646 0.009703 6.250 7.40e-10 ***
## water -0.212573 0.027131 -7.835 1.91e-14 ***
## super_plast 0.315416 0.109771 2.873 0.00419 **
## age 0.126275 0.006992 18.061 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.25 on 653 degrees of freedom
## Multiple R-squared: 0.6222, Adjusted R-squared: 0.6187
## F-statistic: 179.2 on 6 and 653 DF, p-value: < 2.2e-16
The model using the data with outliers returns an adjusted R-squared of 0.6187, so only around 61.9% of the variance is explained.
The data without outliers gives the better result, with an adjusted R-squared of 0.6585. The least significant variables are fine_agg and coarse_agg.
Data Without fine_agg and coarse_agg
lm_final <- lm(formula = strength ~ cement + slag + flyash + water + super_plast + age, data = data_no_outliers)
summary(lm_final)
##
## Call:
## lm(formula = strength ~ cement + slag + flyash + water + super_plast +
## age, data = data_no_outliers)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.8319 -6.3289 0.8218 6.7174 20.8786
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.807058 4.481772 6.204 8.83e-10 ***
## cement 0.106631 0.004476 23.825 < 2e-16 ***
## slag 0.084388 0.005235 16.119 < 2e-16 ***
## flyash 0.071462 0.008086 8.837 < 2e-16 ***
## water -0.215335 0.022555 -9.547 < 2e-16 ***
## super_plast 0.259034 0.088313 2.933 0.00345 **
## age 0.117982 0.005713 20.650 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.664 on 793 degrees of freedom
## Multiple R-squared: 0.6526, Adjusted R-squared: 0.65
## F-statistic: 248.3 on 6 and 793 DF, p-value: < 2.2e-16
Dropping fine_agg and coarse_agg doesn't really affect the model performance.
Assumption Test
- Normality Residuals
H0 = Residuals normally distributed
H1 = Residuals not normally distributed
shapiro.test(lm_final$residuals)
##
## Shapiro-Wilk normality test
##
## data: lm_final$residuals
## W = 0.99171, p-value = 0.0001849
p-value < 0.05, so we reject the null hypothesis: the residuals are likely not normally distributed.
- Homoscedasticity
H0 = Homoscedastic
H1 = Heteroscedastic
bptest(lm_final)
##
## studentized Breusch-Pagan test
##
## data: lm_final
## BP = 124.16, df = 6, p-value < 2.2e-16
p-value < 0.05, so we reject the null hypothesis: the residuals are likely heteroscedastic.
- Multicollinearity
vif(lm_final)
## cement slag flyash water super_plast age
## 1.858746 1.717994 2.301958 1.912341 2.396542 1.098473
All VIF values are low, so there is no multicollinearity among the variables.
The regression model fails the normality-of-residuals and homoscedasticity assumption tests.
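As a visual complement to the two tests above, base R can plot the same diagnostics directly; a minimal sketch:
qqnorm(lm_final$residuals); qqline(lm_final$residuals)   # normality check
plot(lm_final$fitted.values, lm_final$residuals,
     xlab = "Fitted values", ylab = "Residuals")         # constant-variance check
abline(h = 0, lty = 2)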
Predicting Data Test
predict_regression <- predict(lm_final, test1)
MAE(y_pred = predict_regression, test1$strength)
## [1] 7.82459
R2_Score(y_pred = predict_regression, test1$strength)
## [1] 0.6304002
The model's performance on the test data doesn't change drastically from its training performance.
Multiple Linear Regression (Scaled Data)
This section shows how the model's performance changes if the data is standardized (converted to Z-scores) before training.
scaled <- data_no_outliers %>%
select(-9) %>%
scale(center = T, scale = T) %>%
data.frame() %>%
mutate(strength = data_no_outliers$strength)
head(scaled)
## cement slag flyash water super_plast coarse_agg
## 1 2.5108298 -0.8456622 -0.8526166 -0.9199052 -0.6193728 1.06734907
## 2 0.5184145 0.8192025 -0.8526166 2.2287599 -1.0365291 -0.53494220
## 3 0.5184145 0.8192025 -0.8526166 2.2287599 -1.0365291 -0.53494220
## 4 -0.7672936 0.7012016 -0.8526166 0.5113062 -1.0365291 0.06949938
## 5 0.9745096 0.2642476 -0.8526166 2.2287599 -1.0365291 -0.53494220
## 6 0.9745096 0.2642476 -0.8526166 2.2287599 -1.0365291 -0.53494220
## fine_agg age strength
## 1 -1.2418160 -0.2830425 61.89
## 2 -2.2527707 3.5756309 40.27
## 3 -2.2527707 5.0903994 41.05
## 4 0.6013269 5.0106747 44.30
## 5 -2.2527707 5.0903994 43.70
## 6 -2.2527707 -0.2830425 36.45
lm.all_s <- lm(strength ~., scaled)
lm.none_s <- lm(strength ~ 1, scaled)
lm_forward_s <- stats::step(lm.all_s,scope = list(lower = lm.none_s, upper = lm.all_s),direction = "backward")
## Start: AIC=3630.74
## strength ~ cement + slag + flyash + water + super_plast + coarse_agg +
## fine_agg + age
##
## Df Sum of Sq RSS AIC
## <none> 73169 3630.7
## - water 1 609 73779 3635.4
## - fine_agg 1 736 73905 3636.7
## - coarse_agg 1 861 74031 3638.1
## - super_plast 1 1183 74353 3641.6
## - flyash 1 5519 78688 3686.9
## - slag 1 10443 83612 3735.5
## - cement 1 19497 92666 3817.7
## - age 1 39966 113136 3977.4
summary(lm_forward_s)
##
## Call:
## lm(formula = strength ~ cement + slag + flyash + water + super_plast +
## coarse_agg + fine_agg + age, data = scaled)
##
## Residuals:
## Min 1Q Median 3Q Max
## -27.4446 -6.2657 0.7483 6.8796 20.1230
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.4919 0.3400 104.375 < 2e-16 ***
## cement 13.4819 0.9286 14.518 < 2e-16 ***
## slag 9.5479 0.8986 10.625 < 2e-16 ***
## flyash 6.5213 0.8443 7.724 3.42e-14 ***
## water -2.2418 0.8735 -2.566 0.010461 *
## super_plast 2.0927 0.5851 3.576 0.000369 ***
## coarse_agg 2.2853 0.7488 3.052 0.002352 **
## fine_agg 2.5324 0.8980 2.820 0.004923 **
## age 7.4346 0.3577 20.786 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.618 on 791 degrees of freedom
## Multiple R-squared: 0.6568, Adjusted R-squared: 0.6533
## F-statistic: 189.2 on 8 and 791 DF, p-value: < 2.2e-16
Scaling the data doesn't really change the model's performance; the adjusted R-squared is still around 0.65.
shapiro.test(lm_forward_s$residuals)
##
## Shapiro-Wilk normality test
##
## data: lm_forward_s$residuals
## W = 0.99041, p-value = 4.48e-05
bptest(lm_forward_s)
##
## studentized Breusch-Pagan test
##
## data: lm_forward_s
## BP = 119.43, df = 8, p-value < 2.2e-16
vif(lm_forward_s)
## cement slag flyash water super_plast coarse_agg
## 7.448834 6.974866 6.157367 6.590915 2.957384 4.843735
## fine_agg age
## 6.965873 1.105007
After scaling, the residuals are still not normally distributed and are still likely heteroscedastic.
Random Forest
Default Random Forest
set.seed(111)
random_forest1 <- randomForest(strength ~. , train1, importance = T)
random_forest1
##
## Call:
## randomForest(formula = strength ~ ., data = train1, importance = T)
## Type of random forest: regression
## Number of trees: 500
## No. of variables tried at each split: 2
##
## Mean of squared residuals: 31.26905
## % Var explained: 88.44
- Model Performance on Data Train
predict_rf_train <- predict(random_forest1, train1)
MAE(y_pred = predict_rf_train, train1$strength)
## [1] 2.317671
R2_Score(y_pred = predict_rf_train, train1$strength)
## [1] 0.9669058
The MAE on the training data is around 2.3 and the R-squared is around 0.97.
- Model Performance on Data Test
predict_rf_test <- predict(random_forest1, test1)
MAE(y_pred = predict_rf_test, test1$strength)
## [1] 4.023457
R2_Score(y_pred = predict_rf_test, test1$strength)
## [1] 0.8983492
The MAE on the test data is around 4 and the R-squared is around 0.90. The model is slightly overfit.
Repeated Cross-Validation Random Forest
set.seed(111)
ctrl <- trainControl(method = "repeatedcv", number = 4, repeats = 3)
random_forest_kf <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T)
- Model Performance on Data Train
predict_rf_kf_train <- predict(random_forest_kf, train1)
MAE(y_pred = predict_rf_kf_train, train1$strength)
## [1] 1.566825
R2_Score(y_pred = predict_rf_kf_train, train1$strength)
## [1] 0.9827747
The MAE on the training data is around 1.6 and the R-squared is around 0.98.
- Model Performance on Data Test
predict_rf_kf <- predict(random_forest_kf, test1)
MAE(y_pred = predict_rf_kf, test1$strength)
## [1] 3.261497
R2_Score(y_pred = predict_rf_kf, test1$strength)
## [1] 0.9202383
The model is a bit overfit: the MAE increases and the R-squared decreases when the model is applied to the test data. Still, the test performance remains good.
Random Forest Tuning
- Best mtry
set.seed(111)
tuneGrid <- expand.grid(.mtry = c(1 : 10))
ctrl <- trainControl(method = "repeatedcv", number = 4, repeats = 3, search = 'grid')
random_forest2 <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T,tuneGrid = tuneGrid)
# saveRDS(random_forest2, "rf2.rds")
random_forest2
## Random Forest
##
## 640 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (4 fold, repeated 3 times)
## Summary of sample sizes: 480, 480, 480, 480, 480, 480, ...
## Resampling results across tuning parameters:
##
## mtry RMSE Rsquared MAE
## 1 7.958314 0.8153242 6.414368
## 2 6.164759 0.8821002 4.765663
## 3 5.652414 0.8947230 4.325177
## 4 5.484685 0.8973927 4.154928
## 5 5.405860 0.8984021 4.077201
## 6 5.388267 0.8980253 4.046609
## 7 5.409674 0.8965995 4.030283
## 8 5.422302 0.8955358 4.025745
## 9 5.419954 0.8957245 4.028254
## 10 5.434759 0.8951116 4.032395
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was mtry = 6.
The best mtry found is mtry = 6.
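For context, mtry is the number of predictors sampled at each split. randomForest's default for regression is floor(p/3), which explains the mtry = 2 used by the default model earlier:
floor((ncol(train1) - 1) / 3)   # 8 predictors -> default mtry of 2, vs. the tuned 6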
- Best maxnodes
store_maxnode <- list()
best_mtry <- random_forest2$bestTune$mtry
tuneGrid <- expand.grid(.mtry = best_mtry)
for (maxnodes in c(1:100)) {
set.seed(111)
random_forest_maxnode <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T,tuneGrid = tuneGrid, maxnodes=maxnodes)
current_iteration <- toString(maxnodes)
store_maxnode[[current_iteration]] <- random_forest_maxnode
}
## Warning in nominalTrainWorkflow(x = x, y = y, wts = weights, info =
## trainInfo, : There were missing values in resampled performance measures.
results_mtry <- resamples(store_maxnode)
summary(results_mtry)
##
## Call:
## summary.resamples(object = results_mtry)
##
## Models: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100
## Number of resamples: 12
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1 12.666669 12.995524 13.193038 13.186415 13.355179 13.752310 0
## 2 10.171481 10.474636 10.693555 10.634693 10.756417 10.923110 0
## 3 9.101885 9.244209 9.405757 9.421281 9.538627 9.939129 0
## 4 8.024406 8.322673 8.367425 8.392554 8.495123 8.705707 0
## 5 7.394535 8.030953 8.107454 8.120594 8.318161 8.503510 0
## 6 7.114220 7.558034 7.780871 7.687996 7.846101 7.887980 0
## 7 6.890933 7.293917 7.421221 7.371506 7.501225 7.680152 0
## 8 6.512300 6.801516 6.946343 6.965048 7.166065 7.345797 0
## 9 6.309483 6.705903 6.920341 6.894946 7.081168 7.345600 0
## 10 6.198228 6.582422 6.824111 6.795517 6.996180 7.268973 0
## 11 6.100529 6.458748 6.622204 6.642086 6.894813 7.093801 0
## 12 6.024407 6.284404 6.455337 6.500283 6.764215 7.004216 0
## 13 5.913935 6.144421 6.316851 6.372718 6.632119 6.893290 0
## 14 5.867660 5.963510 6.183110 6.244747 6.479317 6.669929 0
## 15 5.765542 5.889977 6.100119 6.134167 6.393459 6.556502 0
## 16 5.638974 5.730334 6.020144 6.042091 6.352518 6.489975 0
## 17 5.666873 5.704515 5.954202 6.012070 6.276911 6.497406 0
## 18 5.571789 5.677412 5.929209 5.963620 6.295535 6.384823 0
## 19 5.516792 5.681283 5.873326 5.928576 6.223814 6.386239 0
## 20 5.409041 5.628655 5.806666 5.866561 6.153505 6.298202 0
## 21 5.370908 5.533031 5.740987 5.788124 6.068034 6.198409 0
## 22 5.381251 5.477308 5.654169 5.736114 5.999666 6.136142 0
## 23 5.202445 5.417435 5.612349 5.666648 5.948135 6.047422 0
## 24 5.207579 5.355883 5.530700 5.595757 5.873799 6.002781 0
## 25 5.160788 5.239322 5.440457 5.529261 5.829926 5.968398 0
## 26 5.119127 5.191570 5.385031 5.483363 5.790990 5.917810 0
## 27 5.093141 5.148216 5.350832 5.432488 5.718006 5.860115 0
## 28 5.053126 5.136042 5.295209 5.394125 5.680989 5.898031 0
## 29 4.959091 5.056356 5.241158 5.342099 5.678498 5.807180 0
## 30 4.946212 5.044145 5.239593 5.326086 5.613230 5.879230 0
## 31 4.910383 4.993552 5.121770 5.279710 5.571692 5.791972 0
## 32 4.885233 4.990003 5.168119 5.266257 5.538423 5.771463 0
## 33 4.843924 4.969014 5.145939 5.241639 5.516042 5.720623 0
## 34 4.822969 4.944618 5.123267 5.223775 5.521362 5.697963 0
## 35 4.826840 4.942619 5.101210 5.200237 5.486573 5.625277 0
## 36 4.785507 4.905010 5.071653 5.172401 5.462067 5.619668 0
## 37 4.772202 4.914899 5.056745 5.151731 5.474289 5.547927 0
## 38 4.691883 4.840109 5.017435 5.088899 5.403289 5.498950 0
## 39 4.670342 4.856162 4.976594 5.079156 5.401778 5.471810 0
## 40 4.652541 4.792655 4.960224 5.045054 5.355923 5.489133 0
## 41 4.622214 4.768251 4.915047 5.014754 5.324580 5.430625 0
## 42 4.600108 4.762203 4.901859 4.980907 5.259310 5.415109 0
## 43 4.601816 4.693703 4.873921 4.948189 5.226197 5.369362 0
## 44 4.522711 4.686678 4.828089 4.922851 5.245488 5.353357 0
## 45 4.517616 4.648941 4.818116 4.906494 5.211935 5.400384 0
## 46 4.526070 4.630574 4.785879 4.879258 5.168391 5.314796 0
## 47 4.481192 4.606158 4.742571 4.858899 5.156524 5.306319 0
## 48 4.469115 4.563425 4.726130 4.832116 5.102239 5.291006 0
## 49 4.440228 4.560494 4.717978 4.827503 5.106607 5.277192 0
## 50 4.441386 4.570162 4.731089 4.812412 5.073447 5.263677 0
## 51 4.433728 4.502573 4.683289 4.779823 5.059858 5.206425 0
## 52 4.402053 4.547297 4.689525 4.782247 5.037458 5.220126 0
## 53 4.382385 4.505463 4.665385 4.758336 5.036435 5.227634 0
## 54 4.337971 4.494666 4.658797 4.752005 5.027187 5.213345 0
## 55 4.364542 4.486329 4.647746 4.735595 5.027309 5.180411 0
## 56 4.319522 4.480286 4.592337 4.722420 4.983428 5.189955 0
## 57 4.278988 4.438286 4.624596 4.699841 4.981055 5.170773 0
## 58 4.321579 4.422587 4.563887 4.676461 4.919356 5.151519 0
## 59 4.299203 4.457851 4.590038 4.674686 4.918785 5.099639 0
## 60 4.268393 4.426847 4.581769 4.655625 4.904139 5.136081 0
## 61 4.283657 4.396298 4.520773 4.630035 4.874511 5.090278 0
## 62 4.243713 4.385730 4.567961 4.621459 4.869324 5.050587 0
## 63 4.164378 4.388424 4.499959 4.592910 4.854946 5.043356 0
## 64 4.241563 4.352694 4.502561 4.593531 4.828921 5.043903 0
## 65 4.152874 4.358556 4.501801 4.574758 4.799520 5.051070 0
## 66 4.205855 4.340855 4.442574 4.562868 4.806395 5.056077 0
## 67 4.136898 4.295353 4.469851 4.545895 4.812491 5.081667 0
## 68 4.124913 4.297788 4.441756 4.537675 4.807989 5.044352 0
## 69 4.115238 4.287112 4.471304 4.526133 4.796951 4.991888 0
## 70 4.122742 4.245180 4.432436 4.500125 4.764194 5.028199 0
## 71 4.117454 4.235462 4.405831 4.488677 4.776870 4.945304 0
## 72 4.071259 4.272995 4.412844 4.493588 4.750259 4.955720 0
## 73 4.062165 4.222887 4.400049 4.472445 4.721100 4.982357 0
## 74 4.060909 4.233915 4.396619 4.478708 4.742407 5.041518 0
## 75 4.033545 4.228637 4.415532 4.467001 4.743941 4.962287 0
## 76 4.027037 4.219513 4.365998 4.447076 4.712918 4.970647 0
## 77 4.059415 4.222140 4.375001 4.450156 4.674393 4.953568 0
## 78 4.033601 4.170918 4.368557 4.427638 4.670258 4.913441 0
## 79 3.987496 4.198061 4.309559 4.416280 4.675902 4.958982 0
## 80 4.000992 4.180655 4.315187 4.404320 4.662328 4.922232 0
## 81 3.980601 4.195610 4.334954 4.413239 4.653522 4.949693 0
## 82 3.961377 4.168369 4.315541 4.397979 4.628895 4.945933 0
## 83 4.002679 4.141012 4.297002 4.382038 4.617546 4.902053 0
## 84 3.930606 4.139527 4.303488 4.365889 4.631989 4.956668 0
## 85 3.949089 4.130276 4.325573 4.363868 4.631449 4.875245 0
## 86 3.961272 4.143420 4.300112 4.366095 4.611778 4.847535 0
## 87 3.984316 4.146281 4.277953 4.364538 4.613116 4.918584 0
## 88 3.921470 4.108607 4.262835 4.336548 4.603196 4.875903 0
## 89 3.866333 4.123096 4.285827 4.331684 4.587848 4.860816 0
## 90 3.915706 4.105556 4.268747 4.332452 4.581236 4.888070 0
## 91 3.878066 4.052429 4.246684 4.300459 4.551998 4.875767 0
## 92 3.905276 4.088103 4.228480 4.304828 4.557614 4.877182 0
## 93 3.865533 4.097687 4.250310 4.310552 4.559023 4.870027 0
## 94 3.861083 4.081623 4.239510 4.297851 4.551399 4.832682 0
## 95 3.835763 4.053458 4.218177 4.281541 4.529269 4.827723 0
## 96 3.883920 4.055087 4.192463 4.275389 4.530928 4.877318 0
## 97 3.844942 4.064955 4.171494 4.268695 4.495071 4.875706 0
## 98 3.866718 4.051556 4.226803 4.283218 4.543325 4.871091 0
## 99 3.802983 4.069778 4.201711 4.257006 4.499345 4.774153 0
## 100 3.828103 4.020013 4.155446 4.250378 4.504486 4.831153 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1 15.466636 16.254002 16.582784 16.444122 16.775774 17.010099 0
## 2 12.571988 13.006443 13.117113 13.101487 13.237757 13.471237 0
## 3 11.470620 11.749286 11.870169 11.895124 12.011540 12.502383 0
## 4 9.988759 10.153556 10.350199 10.359161 10.587345 10.767276 0
## 5 9.264779 9.925914 10.157601 10.126326 10.334320 10.623855 0
## 6 8.889302 9.361275 9.731150 9.608509 9.822578 10.007264 0
## 7 8.525405 9.019966 9.227114 9.179015 9.392308 9.667874 0
## 8 8.086174 8.502377 8.724773 8.727429 9.089979 9.159121 0
## 9 7.904711 8.416259 8.667839 8.670389 9.057733 9.195531 0
## 10 7.765302 8.330775 8.622198 8.560066 8.903160 9.027605 0
## 11 7.537720 8.099951 8.420050 8.340163 8.652283 8.830029 0
## 12 7.483404 7.935223 8.183675 8.156572 8.419310 8.595278 0
## 13 7.325306 7.778664 8.004290 7.981177 8.253140 8.434551 0
## 14 7.262570 7.591620 7.806433 7.815439 8.109333 8.303583 0
## 15 7.117733 7.455334 7.688935 7.664960 7.943787 8.129016 0
## 16 7.022705 7.223735 7.616663 7.542355 7.813485 8.065500 0
## 17 7.007242 7.201288 7.572881 7.521876 7.755430 8.090000 0
## 18 6.941957 7.181410 7.549654 7.474403 7.773929 7.951273 0
## 19 6.867721 7.218426 7.515755 7.433732 7.646954 7.994999 0
## 20 6.764491 7.114181 7.432897 7.354881 7.605868 7.837008 0
## 21 6.690131 6.975157 7.334300 7.251381 7.479169 7.703927 0
## 22 6.682376 6.960799 7.178582 7.179703 7.381459 7.635797 0
## 23 6.464127 6.855955 7.100330 7.091095 7.335765 7.603144 0
## 24 6.454549 6.814392 6.992712 7.005822 7.194016 7.503902 0
## 25 6.402850 6.696083 6.861062 6.918686 7.179365 7.425321 0
## 26 6.345678 6.649856 6.802355 6.864159 7.085503 7.377449 0
## 27 6.295976 6.529831 6.774677 6.804983 7.025546 7.353292 0
## 28 6.248834 6.500376 6.708228 6.746628 6.937583 7.307328 0
## 29 6.107021 6.411681 6.636138 6.681598 6.924690 7.293610 0
## 30 6.111920 6.373589 6.659991 6.677138 6.891269 7.283388 0
## 31 6.061892 6.307445 6.565500 6.619847 6.852203 7.261238 0
## 32 6.088675 6.312891 6.629810 6.610806 6.799516 7.225483 0
## 33 5.985415 6.305044 6.583604 6.579447 6.805791 7.124182 0
## 34 5.998075 6.267327 6.575732 6.564938 6.791591 7.183330 0
## 35 5.992544 6.280674 6.507356 6.529146 6.759120 7.125274 0
## 36 5.965409 6.272432 6.445081 6.499174 6.733106 7.071178 0
## 37 5.939549 6.249749 6.460354 6.475161 6.686233 6.998080 0
## 38 5.838409 6.203366 6.413490 6.411872 6.610348 6.966266 0
## 39 5.830851 6.189099 6.350901 6.392440 6.604062 6.937923 0
## 40 5.806353 6.100505 6.318009 6.358219 6.626974 6.895563 0
## 41 5.759565 6.072325 6.271804 6.317879 6.565528 6.893512 0
## 42 5.727898 6.052438 6.258735 6.285608 6.509485 6.842959 0
## 43 5.697110 5.986323 6.220246 6.241770 6.469828 6.788742 0
## 44 5.651271 5.940340 6.205522 6.224073 6.469441 6.838156 0
## 45 5.650803 5.921667 6.201312 6.204523 6.434354 6.811490 0
## 46 5.624203 5.892822 6.139200 6.168747 6.414601 6.721599 0
## 47 5.601953 5.892359 6.093925 6.145005 6.398363 6.730218 0
## 48 5.568893 5.816539 6.080580 6.123008 6.329813 6.800079 0
## 49 5.625418 5.846871 6.043678 6.113604 6.352208 6.745974 0
## 50 5.542810 5.800115 6.054067 6.102584 6.306135 6.745983 0
## 51 5.538706 5.802510 6.079262 6.082011 6.253516 6.736671 0
## 52 5.491540 5.803607 6.032943 6.066724 6.263994 6.661955 0
## 53 5.532536 5.750591 6.000592 6.056855 6.288359 6.743044 0
## 54 5.444690 5.758897 6.014092 6.040774 6.243117 6.728374 0
## 55 5.452044 5.747694 5.959691 6.022185 6.272249 6.686504 0
## 56 5.444681 5.786113 5.939387 6.010935 6.241174 6.666152 0
## 57 5.398209 5.728745 5.976009 6.000873 6.254880 6.694358 0
## 58 5.433470 5.695431 5.907833 5.964497 6.163940 6.667252 0
## 59 5.427512 5.718410 5.928146 5.963239 6.139353 6.580709 0
## 60 5.432328 5.649462 5.903595 5.943438 6.167475 6.633523 0
## 61 5.381898 5.663174 5.870160 5.913015 6.130634 6.545875 0
## 62 5.358244 5.638055 5.911484 5.917433 6.136299 6.555146 0
## 63 5.275057 5.649530 5.838235 5.888343 6.122383 6.540047 0
## 64 5.347438 5.594809 5.837157 5.883788 6.107074 6.516209 0
## 65 5.242110 5.580711 5.829436 5.863451 6.072932 6.574025 0
## 66 5.305702 5.603758 5.784027 5.856073 6.100602 6.548610 0
## 67 5.251391 5.549863 5.829825 5.844826 6.060570 6.596828 0
## 68 5.237867 5.549047 5.786066 5.829910 6.097882 6.527706 0
## 69 5.231058 5.535939 5.831615 5.818870 6.033483 6.464024 0
## 70 5.223185 5.493593 5.766552 5.799466 6.042635 6.527281 0
## 71 5.210201 5.500081 5.766930 5.781435 6.018921 6.471239 0
## 72 5.184857 5.528135 5.745331 5.791621 6.035250 6.447643 0
## 73 5.168853 5.496733 5.759968 5.773773 6.026107 6.496485 0
## 74 5.156172 5.493555 5.748430 5.769290 5.991157 6.541438 0
## 75 5.125492 5.511783 5.729800 5.759447 5.993634 6.470104 0
## 76 5.129924 5.471741 5.712022 5.737176 5.942724 6.473712 0
## 77 5.166867 5.468895 5.694331 5.741809 5.969410 6.430601 0
## 78 5.125097 5.400185 5.692206 5.728364 6.003469 6.377183 0
## 79 5.065146 5.441384 5.670976 5.706058 5.925016 6.475636 0
## 80 5.094669 5.453233 5.684722 5.708589 5.927317 6.392300 0
## 81 5.056865 5.442923 5.680070 5.718640 5.988600 6.481351 0
## 82 5.085210 5.446055 5.684768 5.714282 5.968620 6.446876 0
## 83 5.117250 5.431497 5.640986 5.696038 5.919684 6.422837 0
## 84 5.050299 5.426223 5.657686 5.671361 5.847612 6.466838 0
## 85 5.041862 5.398018 5.679718 5.666406 5.888763 6.408117 0
## 86 5.045781 5.404946 5.659534 5.672369 5.904578 6.370977 0
## 87 5.071333 5.399438 5.639908 5.670839 5.912175 6.437857 0
## 88 4.992232 5.372019 5.626461 5.638184 5.881693 6.357368 0
## 89 4.974990 5.377809 5.656422 5.647612 5.855009 6.370423 0
## 90 5.002926 5.379314 5.658140 5.648100 5.857846 6.411636 0
## 91 4.988343 5.346669 5.616797 5.616146 5.797319 6.403248 0
## 92 4.986712 5.344414 5.603417 5.616645 5.802193 6.390019 0
## 93 4.949694 5.346775 5.602415 5.616462 5.843503 6.375822 0
## 94 4.946363 5.341115 5.604223 5.604187 5.836748 6.304004 0
## 95 4.928236 5.314388 5.583834 5.595709 5.846012 6.318686 0
## 96 4.963340 5.329992 5.577529 5.594812 5.802353 6.389542 0
## 97 4.947491 5.322424 5.555363 5.595398 5.821693 6.425617 0
## 98 4.963774 5.292333 5.589912 5.600757 5.795882 6.406710 0
## 99 4.880196 5.301502 5.561850 5.577153 5.836840 6.300001 0
## 100 4.899368 5.301065 5.516014 5.567853 5.833135 6.354759 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1 NA NA NA NaN NA NA 12
## 2 0.3553758 0.5316139 0.5810859 0.5621771 0.6157512 0.7124356 0
## 3 0.4660146 0.5327102 0.5464086 0.5670108 0.6087630 0.7131239 0
## 4 0.5685701 0.6407787 0.6836624 0.6716929 0.7052046 0.7654066 0
## 5 0.5693533 0.6360218 0.6784487 0.6707867 0.6963010 0.7801829 0
## 6 0.6227102 0.6745742 0.7145986 0.7049032 0.7262817 0.7913541 0
## 7 0.6604722 0.7039195 0.7390618 0.7259546 0.7448411 0.8004765 0
## 8 0.6751406 0.7293157 0.7511868 0.7471083 0.7701440 0.8150852 0
## 9 0.6747781 0.7307053 0.7494750 0.7481385 0.7767587 0.8182324 0
## 10 0.6817835 0.7356853 0.7585851 0.7544928 0.7809418 0.8241716 0
## 11 0.6969897 0.7527282 0.7678425 0.7674374 0.7869087 0.8342393 0
## 12 0.7022342 0.7640442 0.7819859 0.7770574 0.7964124 0.8334445 0
## 13 0.7148654 0.7769868 0.7891231 0.7863389 0.8039888 0.8434326 0
## 14 0.7352540 0.7843988 0.7949653 0.7951334 0.8137987 0.8455897 0
## 15 0.7419473 0.7935337 0.8017165 0.8020167 0.8223545 0.8485637 0
## 16 0.7562613 0.7951692 0.8051739 0.8081283 0.8318270 0.8505556 0
## 17 0.7643783 0.7935163 0.8058581 0.8088436 0.8304203 0.8517947 0
## 18 0.7566673 0.8002028 0.8079664 0.8111317 0.8339958 0.8526439 0
## 19 0.7650095 0.8019798 0.8109141 0.8132694 0.8323420 0.8577975 0
## 20 0.7664534 0.8073062 0.8157949 0.8175678 0.8363295 0.8620077 0
## 21 0.7711929 0.8144358 0.8175462 0.8222931 0.8410253 0.8642500 0
## 22 0.7724564 0.8183531 0.8244841 0.8258674 0.8439814 0.8637715 0
## 23 0.7781906 0.8216239 0.8276440 0.8298632 0.8476516 0.8738470 0
## 24 0.7851034 0.8253068 0.8326339 0.8338968 0.8484324 0.8732278 0
## 25 0.7835278 0.8291534 0.8401447 0.8380401 0.8533038 0.8750347 0
## 26 0.7889409 0.8323175 0.8413771 0.8401928 0.8556624 0.8783109 0
## 27 0.8003230 0.8330826 0.8430215 0.8429856 0.8583631 0.8788945 0
## 28 0.8050977 0.8352228 0.8463349 0.8455258 0.8586528 0.8814809 0
## 29 0.8035712 0.8374606 0.8502221 0.8484514 0.8631018 0.8870025 0
## 30 0.8080709 0.8368105 0.8482472 0.8483802 0.8630566 0.8871600 0
## 31 0.8116047 0.8379638 0.8518008 0.8505822 0.8656588 0.8886664 0
## 32 0.8126005 0.8394751 0.8490356 0.8513775 0.8658621 0.8878082 0
## 33 0.8126689 0.8421628 0.8513778 0.8523644 0.8654877 0.8904840 0
## 34 0.8128223 0.8419093 0.8514302 0.8531193 0.8681557 0.8905296 0
## 35 0.8145160 0.8452042 0.8542208 0.8548890 0.8676141 0.8911974 0
## 36 0.8166559 0.8465039 0.8578832 0.8561236 0.8680520 0.8920782 0
## 37 0.8138574 0.8500415 0.8576397 0.8569988 0.8674724 0.8925557 0
## 38 0.8185384 0.8508302 0.8585171 0.8597384 0.8726464 0.8956457 0
## 39 0.8191536 0.8532040 0.8613150 0.8606550 0.8708940 0.8961393 0
## 40 0.8205846 0.8536683 0.8619955 0.8615665 0.8754886 0.8966585 0
## 41 0.8237950 0.8544560 0.8643843 0.8632358 0.8754188 0.8980824 0
## 42 0.8277752 0.8564187 0.8659423 0.8648431 0.8790691 0.8993575 0
## 43 0.8302300 0.8583346 0.8664968 0.8664717 0.8790840 0.8999528 0
## 44 0.8296866 0.8564740 0.8670205 0.8668535 0.8831138 0.9013633 0
## 45 0.8289437 0.8583172 0.8694492 0.8680568 0.8830744 0.9020106 0
## 46 0.8323994 0.8618853 0.8707390 0.8694861 0.8823526 0.9025898 0
## 47 0.8327267 0.8599934 0.8729769 0.8702459 0.8827439 0.9034120 0
## 48 0.8376071 0.8584855 0.8730320 0.8710765 0.8853444 0.9042117 0
## 49 0.8358789 0.8597220 0.8743291 0.8714436 0.8870161 0.9023614 0
## 50 0.8389547 0.8603616 0.8726333 0.8716439 0.8866779 0.9049917 0
## 51 0.8419517 0.8604495 0.8740084 0.8727245 0.8852330 0.9053249 0
## 52 0.8413996 0.8631122 0.8743077 0.8732416 0.8865595 0.9066726 0
## 53 0.8386256 0.8606646 0.8741618 0.8736754 0.8890823 0.9055890 0
## 54 0.8416552 0.8616950 0.8761151 0.8743615 0.8879243 0.9091747 0
## 55 0.8391514 0.8633617 0.8766670 0.8748452 0.8884414 0.9073924 0
## 56 0.8408018 0.8637007 0.8782611 0.8754938 0.8863970 0.9082039 0
## 57 0.8404610 0.8658167 0.8764874 0.8759836 0.8883902 0.9102767 0
## 58 0.8453413 0.8644829 0.8786615 0.8772460 0.8924827 0.9088652 0
## 59 0.8472079 0.8682529 0.8779231 0.8775193 0.8906943 0.9093083 0
## 60 0.8453384 0.8654515 0.8786645 0.8779595 0.8943880 0.9084916 0
## 61 0.8467237 0.8691419 0.8804080 0.8792525 0.8928542 0.9098727 0
## 62 0.8457729 0.8686723 0.8784291 0.8790338 0.8937238 0.9113696 0
## 63 0.8462661 0.8690397 0.8812381 0.8799722 0.8947076 0.9139931 0
## 64 0.8476681 0.8700031 0.8814034 0.8802404 0.8951376 0.9104881 0
## 65 0.8491696 0.8696865 0.8813780 0.8810536 0.8946896 0.9145454 0
## 66 0.8472686 0.8690959 0.8833176 0.8811315 0.8950914 0.9115304 0
## 67 0.8498139 0.8677762 0.8813530 0.8815968 0.8968548 0.9139532 0
## 68 0.8481227 0.8705064 0.8825581 0.8820567 0.8961643 0.9144121 0
## 69 0.8519540 0.8720514 0.8809492 0.8824080 0.8975689 0.9147929 0
## 70 0.8513292 0.8702583 0.8839389 0.8831761 0.8985530 0.9148137 0
## 71 0.8522718 0.8730012 0.8840907 0.8840232 0.8982276 0.9149224 0
## 72 0.8503925 0.8728253 0.8848462 0.8836068 0.8986651 0.9162657 0
## 73 0.8515365 0.8722780 0.8840341 0.8843985 0.8996749 0.9173734 0
## 74 0.8537785 0.8694639 0.8841254 0.8842204 0.8987510 0.9170192 0
## 75 0.8540693 0.8734242 0.8851601 0.8849116 0.8990751 0.9183854 0
## 76 0.8564032 0.8724323 0.8859806 0.8856975 0.9003750 0.9180545 0
## 77 0.8539086 0.8737777 0.8865691 0.8854249 0.8987792 0.9158453 0
## 78 0.8522086 0.8749575 0.8861577 0.8858395 0.9009660 0.9183126 0
## 79 0.8567629 0.8723976 0.8873079 0.8866617 0.9016161 0.9189873 0
## 80 0.8563860 0.8756476 0.8865698 0.8865471 0.8997373 0.9181920 0
## 81 0.8521307 0.8728108 0.8871073 0.8861656 0.8995245 0.9199105 0
## 82 0.8534849 0.8743344 0.8866648 0.8862607 0.9004343 0.9188570 0
## 83 0.8569996 0.8747999 0.8884931 0.8870344 0.8995704 0.9171836 0
## 84 0.8596946 0.8757804 0.8880005 0.8879764 0.8996355 0.9202597 0
## 85 0.8587317 0.8760967 0.8862931 0.8880883 0.9030902 0.9199343 0
## 86 0.8570934 0.8765749 0.8877665 0.8877778 0.9008505 0.9199563 0
## 87 0.8570892 0.8746729 0.8883795 0.8878677 0.9022920 0.9191473 0
## 88 0.8586336 0.8770482 0.8884854 0.8890743 0.9023103 0.9217356 0
## 89 0.8597849 0.8767616 0.8877296 0.8887483 0.9008787 0.9221986 0
## 90 0.8599164 0.8756657 0.8873177 0.8888206 0.9023069 0.9215736 0
## 91 0.8615889 0.8781280 0.8892474 0.8900939 0.9024703 0.9217348 0
## 92 0.8625955 0.8762253 0.8896783 0.8897324 0.9031238 0.9212640 0
## 93 0.8607586 0.8774487 0.8899641 0.8899865 0.9032064 0.9226214 0
## 94 0.8606682 0.8788526 0.8895659 0.8904931 0.9049399 0.9235878 0
## 95 0.8600912 0.8784980 0.8901526 0.8906271 0.9033033 0.9234815 0
## 96 0.8624230 0.8775882 0.8905696 0.8904766 0.9032791 0.9222995 0
## 97 0.8612935 0.8764298 0.8915705 0.8906040 0.9027021 0.9228581 0
## 98 0.8622274 0.8766053 0.8903308 0.8903151 0.9037348 0.9220020 0
## 99 0.8600106 0.8795097 0.8912030 0.8913267 0.9035608 0.9252951 0
## 100 0.8605627 0.8768052 0.8932481 0.8914790 0.9057399 0.9239863 0
The higher the maxnodes, the better the performance, although the last ten values return almost identical results. Let's try maxnodes = 90.
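Rather than eyeballing the long summary, the best maxnodes value can also be pulled out programmatically from the resample scores caret stores on each model; a short sketch:
mean_rmse <- sapply(store_maxnode, function(m) mean(m$resample$RMSE, na.rm = TRUE))
head(sort(mean_rmse))   # maxnodes values with the lowest mean CV RMSE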
- Best nTrees
store_maxtrees <- list()
for (ntree in c(250, 300, 350, 400, 450, 500, 550, 600, 800, 1000)) {
set.seed(111)
random_forest_maxtree <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T,tuneGrid = tuneGrid, ntree = ntree, maxnodes = 90)
key <- toString(ntree)
store_maxtrees[[key]] <- random_forest_maxtree
}
results_tree <- resamples(store_maxtrees)
summary(results_tree)
##
## Call:
## summary.resamples(object = results_tree)
##
## Models: 250, 300, 350, 400, 450, 500, 550, 600, 800, 1000
## Number of resamples: 12
##
## MAE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 250 3.895912 4.082000 4.285613 4.339462 4.592441 4.933105 0
## 300 3.898656 4.087173 4.288965 4.334791 4.576438 4.921247 0
## 350 3.909302 4.090478 4.280679 4.331180 4.576194 4.903621 0
## 400 3.928810 4.095702 4.273540 4.332917 4.583629 4.892261 0
## 450 3.923761 4.099865 4.266525 4.333419 4.586127 4.884295 0
## 500 3.915706 4.105556 4.268747 4.332452 4.581236 4.888070 0
## 550 3.912079 4.118299 4.266197 4.331660 4.578614 4.881327 0
## 600 3.917555 4.119647 4.266863 4.330920 4.578060 4.873924 0
## 800 3.905900 4.121835 4.260137 4.326028 4.560735 4.877705 0
## 1000 3.913228 4.116296 4.252034 4.323004 4.571270 4.867959 0
##
## RMSE
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 250 4.994902 5.363187 5.688904 5.666331 5.898793 6.448471 0
## 300 4.983465 5.373398 5.677615 5.658660 5.871945 6.432675 0
## 350 4.985320 5.358902 5.665844 5.649038 5.847199 6.425646 0
## 400 5.013175 5.357417 5.666252 5.650805 5.852934 6.406395 0
## 450 5.012709 5.367422 5.660790 5.648297 5.861439 6.401608 0
## 500 5.002926 5.379314 5.658140 5.648100 5.857846 6.411636 0
## 550 5.000061 5.380619 5.660641 5.647077 5.845445 6.406783 0
## 600 5.002928 5.381998 5.657586 5.644557 5.850677 6.393408 0
## 800 4.999468 5.384560 5.639036 5.637612 5.853339 6.387184 0
## 1000 5.007758 5.380485 5.636893 5.633613 5.844059 6.372587 0
##
## Rsquared
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 250 0.8577124 0.8746659 0.8859580 0.8880820 0.9021097 0.9219139 0
## 300 0.8591152 0.8752758 0.8864010 0.8883665 0.9016869 0.9221414 0
## 350 0.8605070 0.8755559 0.8869922 0.8887831 0.9022169 0.9218977 0
## 400 0.8602265 0.8755973 0.8869780 0.8886895 0.9025050 0.9210347 0
## 450 0.8598399 0.8759949 0.8872410 0.8887891 0.9025228 0.9211672 0
## 500 0.8599164 0.8756657 0.8873177 0.8888206 0.9023069 0.9215736 0
## 550 0.8606834 0.8760324 0.8871309 0.8888849 0.9024121 0.9216680 0
## 600 0.8601538 0.8764164 0.8872429 0.8889417 0.9026535 0.9215043 0
## 800 0.8596993 0.8764750 0.8882434 0.8892288 0.9026496 0.9214786 0
## 1000 0.8603660 0.8768495 0.8885004 0.8893595 0.9030244 0.9210641 0
The number of trees doesn't really affect the model performance.
random_forest_kf_final <- caret::train(strength ~., train1 , method = "rf", trControl = ctrl, importance = T, ntree = 500, maxnodes = 100, tuneGrid = tuneGrid)
random_forest_kf_final
## Random Forest
##
## 640 samples
## 8 predictor
##
## No pre-processing
## Resampling: Cross-Validated (4 fold, repeated 3 times)
## Summary of sample sizes: 480, 480, 480, 480, 480, 480, ...
## Resampling results:
##
## RMSE Rsquared MAE
## 5.581129 0.8890584 4.238691
##
## Tuning parameter 'mtry' was held constant at a value of 6
predict_rf_kf_train <- predict(random_forest_kf_final, train1)
predict_rf_kf <- predict(random_forest_kf_final, test1)
- Model Performance on Data Train
MAE(y_pred = predict_rf_kf_train, train1$strength)
## [1] 2.522225
R2_Score(y_pred = predict_rf_kf_train, train1$strength)
## [1] 0.9609133
- Model Performance on Data Test
MAE(y_pred = predict_rf_kf, test1$strength)
## [1] 3.932892
R2_Score(y_pred = predict_rf_kf, test1$strength)
## [1] 0.8997213
The repeated cross-validation model still performs better than the tuned Random Forest.
XGBTree (Extreme Gradient Boosting)
X_train = xgb.DMatrix(as.matrix(train1 %>% select(-strength)))
y_train = train1$strength
X_test = xgb.DMatrix(as.matrix(test1 %>% select(-strength)))
y_test = test1$strength
xgb_trcontrol = trainControl(
method = "repeatedcv",
number = 4,
repeats = 3,
allowParallel = TRUE,
verboseIter = FALSE,
returnData = FALSE
)
set.seed(111)
xgb_model = caret::train(
X_train, y_train,
trControl = xgb_trcontrol,
method = "xgbTree",
importance = T
)
predict_xgbt_train <- predict(xgb_model, train1)
predict_xgbt <- predict(xgb_model, test1)
- Model Performance on Data Train
MAE(y_pred = predict_xgbt_train, train1$strength)
## [1] 1.286544
R2_Score(y_pred = predict_xgbt_train, train1$strength)
## [1] 0.988718
- Model Performance on Data Test
MAE(y_pred = predict_xgbt, test1$strength)
## [1] 2.673547
R2_Score(y_pred = predict_xgbt, test1$strength)
## [1] 0.9412361
The model's performance is better than that of the random forest with repeated k-fold cross-validation.
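caret tunes xgbTree over a default grid of hyperparameters (number of boosting rounds, tree depth, learning rate, and so on); the winning combination can be inspected directly:
xgb_model$bestTune   # hyperparameters selected by the grid search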
Neural Network
Data Preprocess
train_scaled <- train1 %>%
select(-9) %>%
scale() %>%
data.frame() %>%
mutate(strength = train1$strength)
test_scaled <- test1 %>%
select(-9) %>%
scale() %>%
data.frame() %>%
mutate(strength = test1$strength)
train_matrix <- data.matrix(train_scaled)
test_matrix <- data.matrix(test_scaled)
train_x <- train_matrix[,-9]
train_y <- train_matrix[,9]
test_x <- test_matrix[,-9]
test_y <- test_matrix[,9]
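One caveat: the test predictors above are scaled with the test set's own means and standard deviations. A common alternative, sketched below, reuses the training-set parameters so new data is transformed exactly as the model saw during training:
ctr <- colMeans(train1[, -9])        # training means
sds <- apply(train1[, -9], 2, sd)    # training standard deviations
test_scaled_alt <- test1 %>%
  select(-9) %>%
  scale(center = ctr, scale = sds) %>%
  data.frame() %>%
  mutate(strength = test1$strength)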
Model Design
model_nn <- keras_model_sequential()
model_nn %>%
layer_dense(input_shape = c(8),
units = 512,
activation = "relu",
kernel_regularizer = regularizer_l2(l=0.001)) %>%
layer_dense(units = 256,
activation = "relu",
kernel_regularizer = regularizer_l2(l=0.001)) %>%
layer_dense(units = 128,
activation = "relu",
kernel_regularizer = regularizer_l2(l=0.001)) %>%
layer_dense(units = 64,
activation = "relu",
kernel_regularizer = regularizer_l2(l=0.001)) %>%
layer_dense( units = 1)
model_nn %>%
compile(loss = "mse",
optimizer_adamax(lr = 0.0005),
metrics = c("mae"))
hist <- model_nn %>% fit(train_x,
                         train_y,
                         epochs = 200)
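A hedged variant: adding a validation split and early stopping lets training halt once validation loss stops improving, instead of committing to 200 epochs up front (the patience value here is an assumption):
hist <- model_nn %>% fit(
  train_x, train_y,
  epochs = 200,
  validation_split = 0.2,
  callbacks = list(callback_early_stopping(monitor = "val_loss",
                                           patience = 20,
                                           restore_best_weights = TRUE)))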
pred_train <- model_nn %>%
predict(train_x)
pred_test <- model_nn %>%
predict(test_x)
- Model Performance on Data Train
MAE(pred_train,train_y)
## [1] 1.997067
R2_Score(pred_train,train_y)
## [1] 0.973989
- Model Performance on Data Test
MAE(pred_test,test_y)
## [1] 3.647482
R2_Score(pred_test,test_y)
## [1] 0.907411
The model's performance on the test data is still not as good as the XGBTree model's.
Evaluation
The XGBTree model returns the best performance among all the models created.
- Model Performance on Data Train
predict_xgb_train <- predict(xgb_model, train1)
MAE(y_pred = predict_xgb_train, train1$strength)
## [1] 1.286544
R2_Score(y_pred = predict_xgb_train, train1$strength)
## [1] 0.988718
- Model Performance on Data Test
predict_xgb_test <- predict(xgb_model, test1)
MAE(y_pred = predict_xgb_test, test1$strength)
## [1] 2.673547
R2_Score(y_pred = predict_xgb_test, test1$strength)
## [1] 0.9412361
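Collecting the test-set scores reported throughout this report into one table makes the comparison explicit (numbers copied from the sections above):
recap <- data.frame(
  model = c("Linear regression", "Random forest (default)",
            "Random forest (repeated CV)", "Random forest (tuned)",
            "XGBoost", "Neural network"),
  MAE = c(7.82, 4.02, 3.26, 3.93, 2.67, 3.65),
  R2  = c(0.630, 0.898, 0.920, 0.899, 0.941, 0.907))
recap[order(recap$MAE), ]   # XGBoost comes out on top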
- Variable Importance
x <- varImp(xgb_model)
var <- x$importance
var$varname <- colnames(train1[, as.numeric(rownames(var)) +1])
var
## Overall varname
## 7 100.0000000 age
## 0 99.3055253 cement
## 3 26.0828847 water
## 1 12.3470518 slag
## 4 9.5222047 super_plast
## 6 8.7157322 fine_agg
## 5 0.5344267 coarse_agg
## 2 0.0000000 flyash
Age, cement, and water are the most important variables for predicting concrete strength.
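The same ranking can be plotted; a minimal sketch using the var data frame built above:
ggplot(var, aes(x = reorder(varname, Overall), y = Overall)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Importance", title = "XGBoost Variable Importance")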