library(MASS) #for obtaining Boston Housing Data
library(rpart) #for regression trees
library(rpart.plot) #for plotting regression trees
library(ipred) #for bagging
library(randomForest) #for random forest
library(dplyr) #for manipulation
library(gbm) #for boosting
library(tidyverse) #for tidying the data
library(kableExtra) #for table presentation
set.seed(2)
sample_index <- sample(nrow(Boston),nrow(Boston)*0.75)
boston_train <- Boston[sample_index,]
boston_test <- Boston[-sample_index,]
I would like to predict the median value of owner-occupied homes, medv. I start by regressing it on the other predictor variables, using Multiple Linear Regression to predict its values.
I have used a combination of forward and backward stepwise variable selection to find the best subset of the 13 predictor variables.
The linear model identifies the following variables as significant in explaining the variability in median housing value.
# fitting linear regression
nullmodel=lm(medv~1, data=boston_train)
fullmodel=lm(medv~., data=boston_train)
#using AIC to find the best subsets of predictors
#using a combination of forward and backward stepwise variable selection
model_step_s <- step(nullmodel, scope=list(lower=nullmodel,
upper=fullmodel), direction='both')
## Start: AIC=1686.05
## medv ~ 1
##
## Df Sum of Sq RSS AIC
## + lstat 1 17428.0 14811 1393.3
## + rm 1 15543.6 16696 1438.7
## + indus 1 8413.4 23826 1573.4
## + ptratio 1 6985.8 25253 1595.5
## + tax 1 6911.4 25328 1596.6
## + nox 1 6005.6 26234 1609.9
## + crim 1 5096.6 27143 1622.8
## + age 1 4651.6 27588 1629.0
## + rad 1 4555.2 27684 1630.3
## + zn 1 4260.8 27978 1634.3
## + black 1 3250.9 28988 1647.8
## + dis 1 2238.1 30001 1660.8
## + chas 1 588.4 31651 1681.1
## <none> 32239 1686.0
##
## Step: AIC=1393.27
## medv ~ lstat
##
## Df Sum of Sq RSS AIC
## + rm 1 2993.9 11817 1309.7
## + ptratio 1 1647.8 13164 1350.6
## + dis 1 511.8 14299 1381.9
## + chas 1 298.4 14513 1387.5
## + black 1 219.1 14592 1389.6
## + age 1 217.9 14593 1389.7
## + crim 1 193.7 14618 1390.3
## + tax 1 176.3 14635 1390.7
## + indus 1 136.6 14675 1391.8
## + zn 1 128.3 14683 1392.0
## <none> 14811 1393.3
## + rad 1 15.5 14796 1394.9
## + nox 1 4.5 14807 1395.2
## - lstat 1 17428.0 32239 1686.0
##
## Step: AIC=1309.68
## medv ~ lstat + rm
##
## Df Sum of Sq RSS AIC
## + ptratio 1 1061.5 10756 1276.0
## + crim 1 348.4 11469 1300.3
## + tax 1 310.2 11507 1301.6
## + black 1 309.7 11508 1301.6
## + chas 1 252.4 11565 1303.5
## + dis 1 222.1 11595 1304.5
## + rad 1 142.2 11675 1307.1
## + indus 1 68.0 11749 1309.5
## <none> 11817 1309.7
## + zn 1 45.4 11772 1310.2
## + nox 1 10.7 11807 1311.3
## + age 1 7.6 11810 1311.4
## - rm 1 2993.9 14811 1393.3
## - lstat 1 4878.3 16696 1438.7
##
## Step: AIC=1276.01
## medv ~ lstat + rm + ptratio
##
## Df Sum of Sq RSS AIC
## + dis 1 343.3 10413 1265.7
## + black 1 245.1 10511 1269.3
## + chas 1 239.2 10517 1269.5
## + crim 1 166.6 10589 1272.1
## <none> 10756 1276.0
## + tax 1 45.7 10710 1276.4
## + age 1 31.2 10725 1276.9
## + nox 1 14.1 10742 1277.5
## + zn 1 13.2 10743 1277.5
## + rad 1 0.9 10755 1278.0
## + indus 1 0.0 10756 1278.0
## - ptratio 1 1061.5 11817 1309.7
## - rm 1 2407.6 13164 1350.6
## - lstat 1 3880.4 14636 1390.8
##
## Step: AIC=1265.72
## medv ~ lstat + rm + ptratio + dis
##
## Df Sum of Sq RSS AIC
## + nox 1 497.6 9915 1249.2
## + black 1 334.1 10079 1255.4
## + crim 1 275.0 10138 1257.6
## + tax 1 203.0 10210 1260.3
## + indus 1 187.1 10226 1260.8
## + chas 1 158.1 10255 1261.9
## + zn 1 64.4 10348 1265.4
## + age 1 63.3 10349 1265.4
## <none> 10413 1265.7
## + rad 1 24.7 10388 1266.8
## - dis 1 343.3 10756 1276.0
## - ptratio 1 1182.6 11595 1304.5
## - rm 1 2069.6 12482 1332.4
## - lstat 1 4062.5 14475 1388.6
##
## Step: AIC=1249.16
## medv ~ lstat + rm + ptratio + dis + nox
##
## Df Sum of Sq RSS AIC
## + black 1 204.92 9710.1 1243.2
## + chas 1 199.92 9715.1 1243.4
## + crim 1 192.93 9722.1 1243.7
## + zn 1 61.08 9854.0 1248.8
## <none> 9915.0 1249.2
## + tax 1 25.24 9889.8 1250.2
## + indus 1 24.80 9890.2 1250.2
## + rad 1 18.68 9896.4 1250.4
## + age 1 4.49 9910.6 1251.0
## - nox 1 497.61 10412.7 1265.7
## - dis 1 826.80 10741.8 1277.5
## - ptratio 1 1354.56 11269.6 1295.7
## - rm 1 2030.00 11945.0 1317.8
## - lstat 1 2812.63 12727.7 1341.8
##
## Step: AIC=1243.24
## medv ~ lstat + rm + ptratio + dis + nox + black
##
## Df Sum of Sq RSS AIC
## + chas 1 174.50 9535.6 1238.4
## + crim 1 133.74 9576.4 1240.0
## + zn 1 74.53 9635.6 1242.3
## + rad 1 72.25 9637.9 1242.4
## <none> 9710.1 1243.2
## + indus 1 13.75 9696.4 1244.7
## + age 1 13.25 9696.9 1244.7
## + tax 1 1.89 9708.2 1245.2
## - black 1 204.92 9915.0 1249.2
## - nox 1 368.44 10078.6 1255.4
## - dis 1 800.64 10510.8 1271.3
## - ptratio 1 1275.78 10985.9 1288.0
## - rm 1 2081.27 11791.4 1314.8
## - lstat 1 2590.29 12300.4 1330.9
##
## Step: AIC=1238.37
## medv ~ lstat + rm + ptratio + dis + nox + black + chas
##
## Df Sum of Sq RSS AIC
## + crim 1 119.71 9415.9 1235.6
## + zn 1 74.80 9460.8 1237.4
## + rad 1 62.06 9473.6 1237.9
## <none> 9535.6 1238.4
## + age 1 17.22 9518.4 1239.7
## + indus 1 16.31 9519.3 1239.7
## + tax 1 1.85 9533.8 1240.3
## - chas 1 174.50 9710.1 1243.2
## - black 1 179.51 9715.1 1243.4
## - nox 1 407.76 9943.4 1252.2
## - dis 1 741.75 10277.4 1264.8
## - ptratio 1 1258.30 10793.9 1283.3
## - rm 1 2080.55 11616.2 1311.2
## - lstat 1 2419.79 11955.4 1322.1
##
## Step: AIC=1235.58
## medv ~ lstat + rm + ptratio + dis + nox + black + chas + crim
##
## Df Sum of Sq RSS AIC
## + rad 1 200.99 9214.9 1229.4
## + zn 1 111.59 9304.3 1233.1
## <none> 9415.9 1235.6
## + age 1 23.98 9391.9 1236.6
## + indus 1 17.20 9398.7 1236.9
## + tax 1 7.84 9408.1 1237.3
## - crim 1 119.71 9535.6 1238.4
## - black 1 128.18 9544.1 1238.7
## - chas 1 160.47 9576.4 1240.0
## - nox 1 362.93 9778.8 1247.9
## - dis 1 775.83 10191.7 1263.6
## - ptratio 1 1094.59 10510.5 1275.3
## - lstat 1 2121.41 11537.3 1310.6
## - rm 1 2149.29 11565.2 1311.5
##
## Step: AIC=1229.41
## medv ~ lstat + rm + ptratio + dis + nox + black + chas + crim +
## rad
##
## Df Sum of Sq RSS AIC
## + tax 1 219.59 8995.3 1222.3
## + zn 1 71.17 9143.7 1228.5
## <none> 9214.9 1229.4
## + indus 1 32.28 9182.6 1230.1
## + age 1 12.39 9202.5 1230.9
## - chas 1 133.22 9348.1 1232.8
## - black 1 200.84 9415.8 1235.6
## - rad 1 200.99 9415.9 1235.6
## - crim 1 258.65 9473.6 1237.9
## - nox 1 522.65 9737.6 1248.3
## - dis 1 827.53 10042.4 1260.0
## - ptratio 1 1294.50 10509.4 1277.2
## - rm 1 1857.31 11072.2 1297.0
## - lstat 1 2190.54 11405.5 1308.2
##
## Step: AIC=1222.26
## medv ~ lstat + rm + ptratio + dis + nox + black + chas + crim +
## rad + tax
##
## Df Sum of Sq RSS AIC
## + zn 1 119.28 8876.1 1219.2
## <none> 8995.3 1222.3
## + age 1 12.50 8982.8 1223.7
## + indus 1 0.28 8995.0 1224.2
## - chas 1 107.82 9103.2 1224.8
## - black 1 178.91 9174.2 1227.7
## - tax 1 219.59 9214.9 1229.4
## - crim 1 265.92 9261.3 1231.3
## - nox 1 383.61 9378.9 1236.1
## - rad 1 412.74 9408.1 1237.3
## - dis 1 852.97 9848.3 1254.6
## - ptratio 1 1252.79 10248.1 1269.7
## - rm 1 1676.30 10671.6 1285.0
## - lstat 1 2129.44 11124.8 1300.8
##
## Step: AIC=1219.21
## medv ~ lstat + rm + ptratio + dis + nox + black + chas + crim +
## rad + tax + zn
##
## Df Sum of Sq RSS AIC
## <none> 8876.1 1219.2
## + age 1 4.38 8871.7 1221.0
## + indus 1 0.01 8876.1 1221.2
## - chas 1 106.31 8982.4 1221.7
## - zn 1 119.28 8995.3 1222.3
## - black 1 174.12 9050.2 1224.6
## - tax 1 267.69 9143.7 1228.5
## - crim 1 294.04 9170.1 1229.6
## - nox 1 328.81 9204.9 1231.0
## - rad 1 428.02 9304.1 1235.0
## - ptratio 1 829.04 9705.1 1251.0
## - dis 1 961.09 9837.1 1256.2
## - rm 1 1538.72 10414.8 1277.8
## - lstat 1 2113.60 10989.7 1298.2
(model_summary <- summary(model_step_s))
##
## Call:
## lm(formula = medv ~ lstat + rm + ptratio + dis + nox + black +
## chas + crim + rad + tax + zn, data = boston_train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.8197 -2.7675 -0.7004 1.6232 26.5551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.013831 6.097676 5.578 4.72e-08 ***
## lstat -0.532238 0.056934 -9.348 < 2e-16 ***
## rm 3.971642 0.497929 7.976 1.94e-14 ***
## ptratio -0.899235 0.153590 -5.855 1.06e-08 ***
## dis -1.389423 0.220409 -6.304 8.34e-10 ***
## nox -15.703142 4.258821 -3.687 0.000261 ***
## black 0.008873 0.003307 2.683 0.007623 **
## chas 2.379287 1.134825 2.097 0.036712 *
## crim -0.134317 0.038522 -3.487 0.000548 ***
## rad 0.347270 0.082549 4.207 3.26e-05 ***
## tax -0.014500 0.004358 -3.327 0.000967 ***
## zn 0.036694 0.016523 2.221 0.026978 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.918 on 367 degrees of freedom
## Multiple R-squared: 0.7247, Adjusted R-squared: 0.7164
## F-statistic: 87.82 on 11 and 367 DF, p-value: < 2.2e-16
The in-sample MSE comes out to be 24.185.
predicted_val <- predict(object = model_step_s, newdata = boston_test)
lin_train_mse <- round((model_summary$sigma)^2,3)
lin_test_mse <- round(mean((predicted_val - boston_test$medv)^2),3)
The MSE for the test data comes out to be 18.102.
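All of the models in this post are judged on this same quantity. A small helper function, a hypothetical convenience of mine since the code here computes it inline, makes the calculation explicit:
# Hypothetical helper: mean squared error between observed and predicted values
mse <- function(actual, predicted) {
  mean((actual - predicted)^2)
}
# e.g. mse(boston_test$medv, predict(model_step_s, newdata = boston_test))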
Next, I try to improve the accuracy further with other modelling techniques and compare them on the basis of mean squared error (MSE). Let's look at Regression Trees first.
boston_rpart <- rpart(formula = medv ~ ., data = boston_train, cp = 0.00001)
plotcp(boston_rpart)
printcp(boston_rpart)
##
## Regression tree:
## rpart(formula = medv ~ ., data = boston_train, cp = 1e-05)
##
## Variables actually used in tree construction:
## [1] age black crim dis indus lstat nox ptratio
## [9] rm tax
##
## Root node error: 32239/379 = 85.064
##
## n= 379
##
## CP nsplit rel error xerror xstd
## 1 0.44674924 0 1.00000 1.00591 0.094565
## 2 0.16631896 1 0.55325 0.57745 0.057263
## 3 0.06378413 2 0.38693 0.43737 0.050097
## 4 0.05587702 3 0.32315 0.41961 0.051179
## 5 0.03423855 4 0.26727 0.39771 0.048341
## 6 0.02516202 5 0.23303 0.34989 0.042127
## 7 0.01634600 6 0.20787 0.29575 0.037785
## 8 0.01297229 7 0.19152 0.27925 0.038362
## 9 0.00783403 8 0.17855 0.25512 0.037736
## 10 0.00656300 9 0.17072 0.24918 0.037665
## 11 0.00610276 10 0.16415 0.25175 0.039508
## 12 0.00550626 11 0.15805 0.25050 0.039488
## 13 0.00458964 12 0.15255 0.25058 0.039451
## 14 0.00398972 13 0.14796 0.24447 0.039346
## 15 0.00398407 14 0.14397 0.24051 0.039233
## 16 0.00398106 15 0.13998 0.24051 0.039233
## 17 0.00237774 16 0.13600 0.23626 0.039234
## 18 0.00219110 17 0.13362 0.23229 0.039074
## 19 0.00197157 18 0.13143 0.23060 0.038846
## 20 0.00179284 20 0.12749 0.23126 0.038837
## 21 0.00177033 21 0.12570 0.22897 0.038833
## 22 0.00131561 22 0.12393 0.23014 0.038860
## 23 0.00122621 23 0.12261 0.23355 0.039440
## 24 0.00122604 24 0.12138 0.23383 0.039439
## 25 0.00114331 25 0.12016 0.23433 0.039440
## 26 0.00102879 26 0.11901 0.23482 0.039617
## 27 0.00100222 27 0.11799 0.23342 0.039626
## 28 0.00096814 28 0.11698 0.23254 0.039599
## 29 0.00092994 29 0.11602 0.23411 0.039856
## 30 0.00058257 30 0.11509 0.23662 0.041150
## 31 0.00038713 31 0.11450 0.23731 0.041148
## 32 0.00027305 32 0.11412 0.23787 0.041164
## 33 0.00001000 33 0.11384 0.23787 0.041164
The cp value with the minimum cross-validated error is 0.0017703.
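As an alternative to refitting from scratch, the large tree grown above could be pruned back to that complexity value; a minimal sketch, assuming the cp = 0.00001 fit is still in memory:
# Sketch: pick the cp with the minimum cross-validated error and prune to it
cp_table <- boston_rpart$cptable
cp_opt <- cp_table[which.min(cp_table[, "xerror"]), "CP"]
boston_pruned <- prune(boston_rpart, cp = cp_opt)
Below, the tree is instead refit with cp = 0.01, which gives a smaller tree that is easier to plot.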
boston_rpart <- rpart(formula = medv ~ ., data = boston_train, cp = 0.01)
prp(boston_rpart, digits = 4, extra = 1)
boston_train_pred_tree = predict(boston_rpart)
boston_test_pred_tree = predict(boston_rpart, boston_test)
reg_train_mse <- round(mean((boston_train_pred_tree - boston_train$medv)^2),3)
reg_test_mse <- round(mean((boston_test_pred_tree - boston_test$medv)^2),3)
The in-sample MSE comes out to be 15.188, whereas the test-set MSE comes out to be 16.155.
Linear regression models fail in situations where the relationship between features and outcome is nonlinear or where features interact with each other.
Next, I will use Bagging, which is a general approach that uses bootstrapping in conjunction with any regression model to construct an ensemble.
Bagging models provide several advantages over models that are not bagged. First, bagging effectively reduces the variance of a prediction through its aggregation process. For models that produce an unstable prediction, like regression trees, aggregating over many versions of the training data actually reduces the variance in the prediction and, hence, makes the prediction more stable.
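To make this aggregation idea concrete, here is a hand-rolled sketch of bagged regression trees (illustration only; the models that follow use ipred's bagging, and the number of bootstrap samples here is an arbitrary choice):
# Illustrative sketch: bag B regression trees by hand and average their predictions
B <- 50                                      # arbitrary number of bootstrap samples
preds <- matrix(NA, nrow(boston_test), B)
for (b in 1:B) {
  idx <- sample(nrow(boston_train), replace = TRUE)   # bootstrap resample
  tree <- rpart(medv ~ ., data = boston_train[idx, ])
  preds[, b] <- predict(tree, newdata = boston_test)
}
bagged_pred <- rowMeans(preds)               # ensemble prediction = average over trees
mean((boston_test$medv - bagged_pred)^2)     # test MSE of the hand-rolled ensemble
The ipred implementation below does this properly and also lets us tune the number of bootstrap samples (nbagg) via its out-of-bag error.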
ntree <- seq(10, 200, 10)
oob_error <- rep(0, length(ntree))
for(i in 1:length(ntree)){
  set.seed(2)
  # fit once with coob = TRUE and keep the out-of-bag (root mean squared) error
  boston.bag <- bagging(medv~., data = boston_train, nbagg = ntree[i], coob = TRUE)
  oob_error[i] <- boston.bag$err
}
plot(ntree, oob_error, type = 'l', col=2, lwd=2, xaxt="n")
axis(1, at = ntree, las=1)
Building the final model with 70 trees:
boston_bag<- bagging(medv~., data = boston_train, nbagg= 70)
boston_train_bag_tree = predict(boston_bag)
boston_bag_pred<- predict(boston_bag, newdata = boston_test)
boston_bag_oob<- bagging(medv~., data = boston_train, coob=T, nbagg= 70)
bag_train_mse <- round(mean((boston_train_bag_tree - boston_train$medv)^2),3)
bag_test_mse <- round(mean((boston_test$medv-boston_bag_pred)^2),3)
The in-sample MSE comes out to be 18.043, whereas the test-set MSE comes out to be 8.828.
Thus, compared to a single regression tree, the test MSE is significantly reduced.
Another advantage of bagging models is that they can provide their own internal estimate of predictive performance, which correlates well with cross-validation estimates and test-set estimates. The out-of-bag (OOB) estimate, reported here as a root mean squared error, comes out to be 4.362.
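For reference, that estimate is read straight off the ipred fit created above; the err component holds the OOB root mean squared error when coob = TRUE:
boston_bag_oob$err      # OOB RMSE reported by bagging(..., coob = TRUE)
boston_bag_oob$err^2    # squared, for a rough comparison with the MSE values above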
The trees in bagging are not completely independent of each other, since all of the original predictors are considered at every split of every tree. Reducing the correlation among trees, known as de-correlating, is the next logical step towards improving on bagging. For this we use Random Forest, where each split of each tree considers only a random subset of k predictors.
By default, for regression, k = p/3.
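With 13 predictors this default works out to 4 variables per split; a quick sketch of the calculation randomForest uses for regression:
p <- ncol(boston_train) - 1    # 13 predictors (medv excluded)
max(floor(p / 3), 1)           # default mtry for regression: 4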
boston_rf<- randomForest(medv~., data = boston_train, importance=TRUE)
boston_rf_train<- predict(boston_rf)
boston_rf_pred<- predict(boston_rf, boston_test)
#boston_rf
We can see the important variables:
boston_rf$importance
##            %IncMSE IncNodePurity
## crim 8.8444623 2063.6917
## zn 0.5836399 213.8538
## indus 7.2368757 2111.4653
## chas 0.4285816 176.8836
## nox 13.2467309 2745.0711
## rm 32.8841101 8848.1181
## age 3.6841547 904.0701
## dis 8.3369999 2147.3566
## rad 1.6011582 327.4526
## tax 3.6387667 975.9978
## ptratio 4.7672099 1447.7767
## black 1.7108403 668.1993
## lstat 59.7962858 8632.2048
We can further look at the OOB error, i.e. the MSE for every ensemble size considered. We observe that the error stabilizes at around 300 trees.
plot(boston_rf$mse, type='l', col=2, lwd=2, xlab = "ntree", ylab = "OOB Error")
We can also compare the test-set error with the OOB error for each value of mtry.
oob.err<- rep(0, 13)
test.err<- rep(0, 13)
for(i in 1:13){
fit<- randomForest(medv~., data = boston_train, mtry=i)
oob.err[i]<- fit$mse[500]
test.err[i]<- mean((boston_test$medv-predict(fit, boston_test))^2)
cat(i, " ")
}
## 1 2 3 4 5 6 7 8 9 10 11 12 13
matplot(cbind(test.err, oob.err), pch=15, col = c("red", "blue"),
type = "b", ylab = "MSE", xlab = "mtry")
legend("topright", legend = c("test Error", "OOB Error"),
pch = 15, col = c("red", "blue"))
The optimal number of predictor variables to consider at each split is approximately 6.
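The value mtry ≈ 6 is read off the plot; the same choice can also be pulled programmatically from the errors collected in the loop above:
which.min(test.err)    # mtry with the lowest test-set MSE
which.min(oob.err)     # mtry with the lowest OOB MSE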
Final model after obtaining the tuned parameters:
boston_rf<- randomForest(medv~., data = boston_train, importance=TRUE, ntree = 300, mtry = 6)
boston_rf_train<- predict(boston_rf)
boston_rf_pred<- predict(boston_rf, boston_test)
rf_train_mse <- round(mean((boston_train$medv-boston_rf_train)^2),3)
rf_test_mse <- round(mean((boston_test$medv-boston_rf_pred)^2),3)
The training-sample MSE comes out to be 12.173, and the test-sample MSE comes out to be 5.538.
The minimum OOB error comes out to be 5.638.
So, with Random Forest, a set of de-correlated trees is grown and combined into a strong ensemble. While it is a great technique for improving prediction performance, there is another powerful technique known as Boosting. Boosting works in a similar way, except that the trees are grown sequentially: each tree is grown using information from previously grown trees. Boosting does not involve bootstrap sampling; instead, each tree is fit on a modified version of the original data set.
The motivation for boosting was a procedure that combines the outputs of many “weak” classifiers to produce a powerful “committee.”
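A toy sketch of this sequential idea, purely illustrative (gbm adds gradient steps, shrinkage tuning and subsampling on top of it, and the tree depth, learning rate and tree count below are arbitrary choices), fits each new shallow tree to the residuals left by the ensemble so far:
# Illustrative sketch of boosting: shallow trees fit sequentially to residuals
n_trees <- 100                               # arbitrary, for illustration only
shrink  <- 0.1                               # learning rate
work    <- boston_train                      # copy whose medv will hold residuals
fit_sum <- rep(0, nrow(boston_train))
for (b in 1:n_trees) {
  tree <- rpart(medv ~ ., data = work, control = rpart.control(maxdepth = 2))
  step <- shrink * predict(tree)             # shrunken fit of the current tree
  fit_sum <- fit_sum + step
  work$medv <- work$medv - step              # residuals become the next target
}
mean((boston_train$medv - fit_sum)^2)        # training MSE of the toy ensemble
The gbm fit below implements the full gradient boosting machine, with a much smaller shrinkage and many more trees.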
boston.boost<- gbm(medv~., data = boston_train, distribution = "gaussian",
n.trees = 10000, shrinkage = 0.01, interaction.depth = 8)
summary(boston.boost)
##             var    rel.inf
## lstat lstat 37.9517771
## rm rm 27.2130163
## dis dis 9.4770881
## nox nox 5.6450559
## crim crim 5.4662302
## age age 4.3412137
## black black 3.2406271
## ptratio ptratio 2.3356085
## tax tax 1.6850969
## indus indus 1.2366370
## rad rad 0.6928438
## chas chas 0.5990241
## zn zn 0.1157814
We observe that lstat is the most important variable here.
We can also visualize how the test error changes with the number of trees.
ntree<- seq(100, 10000, 100)
predmat<- predict(boston.boost, newdata = boston_test, n.trees = ntree)
err<- apply((predmat-boston_test$medv)^2, 2, mean)
plot(ntree, err, type = 'l', col=2, lwd=2, xlab = "n.trees", ylab = "Test MSE")
abline(h=min(err), lty=2)
Based on this, I refit the model with 2,000 trees:
boston.boost<- gbm(medv~., data = boston_train, distribution = "gaussian",
n.trees = 2000, shrinkage = 0.01, interaction.depth = 8)
summary(boston.boost)
##             var    rel.inf
## lstat lstat 36.67095769
## rm rm 30.97032287
## dis dis 8.75665950
## nox nox 5.39464792
## crim crim 4.85895807
## age age 4.09968917
## black black 2.44298478
## ptratio ptratio 2.31132672
## tax tax 1.76325406
## indus indus 1.30891230
## rad rad 0.81304258
## chas chas 0.53494510
## zn zn 0.07429925
boston.boost.pred.train <- predict(boston.boost, n.trees = 2000)
boston.boost.pred.test <- predict(boston.boost, boston_test, n.trees = 2000)
boost_train_mse <- round(mean((boston_train$medv-boston.boost.pred.train)^2),3)
boost_test_mse <- round(mean((boston_test$medv-boston.boost.pred.test)^2),3)
The training-set MSE comes out to be 1.276, and the test-set MSE comes out to be 5.365.
However, gradient boosting machines can be susceptible to over-fitting, since the learner employed, even in its weakly defined learning capacity, is tasked with optimally fitting the gradient. Boosting follows a greedy strategy, choosing the optimal weak learner at each stage of the algorithm. Although this yields the best improvement at the current stage, it does not guarantee the best global model and can over-fit the training data.
There are further improvements that can be made to the boosting mechanism.
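One such improvement, sketched below under the assumption that the same formula and data are reused, is to let gbm cross-validate the number of trees and stop at the iteration suggested by gbm.perf, rather than reading the test-error curve by eye:
# Sketch: choose n.trees by 5-fold cross-validation
boston.boost.cv <- gbm(medv ~ ., data = boston_train, distribution = "gaussian",
                       n.trees = 10000, shrinkage = 0.01, interaction.depth = 8,
                       cv.folds = 5)
best_iter <- gbm.perf(boston.boost.cv, method = "cv")   # CV-optimal number of trees
boost_cv_pred <- predict(boston.boost.cv, boston_test, n.trees = best_iter)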
Linear Regression
At first, I used multiple linear regression to predict the response. I performed variable selection using the stepwise method, with AIC as the selection criterion.
Decision Trees
Trees were built using CART to improve prediction accuracy. They are best suited when the relationships between the variables are non-linear.
Bagging
While decision trees are easy to interpret, they tend to overfit (low bias, high variance). If we prune them back to control overfitting, prediction accuracy is compromised. To improve prediction accuracy we use Bootstrap Aggregating (Bagging): we fit a tree to each of many bootstrap samples and average them as an ensemble, which reduces the variance.
Random Forest
Bagging helps reduce variance, but because the bagged trees are highly correlated with one another, the reduction is limited. Random Forest provides a further improvement by de-correlating the trees, considering only a random subset of predictors at each split, which reduces variance significantly more than Bagging.
Boosting
Boosting is another ensemble technique. Unlike Random Forest and Bagging, where a set of independently grown trees is averaged, boosting grows trees sequentially: each tree is fit to the residuals of the previously fit trees and thereby improves on them. Of the methods tried here, it gives the best prediction performance.
model = factor(c("Linear Regression", "Decision Tree", "Bagging",
"Random Forest", "Boosting"),
levels=c("Linear Regression", "Decision Tree", "Bagging",
"Random Forest", "Boosting"))
train_mse <- c(lin_train_mse,
reg_train_mse,
bag_train_mse,
rf_train_mse,
boost_train_mse)
test_mse <- c(lin_test_mse,
reg_test_mse,
bag_test_mse,
rf_test_mse,
boost_test_mse)
table <- data.frame(model=model,
train_mse = train_mse,
test_mse = test_mse)
kable(table)
| model | train_mse | test_mse |
|---|---|---|
| Linear Regression | 24.185 | 18.102 |
| Decision Tree | 15.188 | 16.155 |
| Bagging | 18.043 | 8.828 |
| Random Forest | 12.173 | 5.538 |
| Boosting | 1.276 | 5.365 |
Thus, we see that the prediction accuracy on the test set improves with each successive technique: the minimum test MSE is obtained with Boosting, followed by Random Forest and then Bagging.