Previously on STAT412:
Naive Bayes Classifier
Decision Tree
Random Forest
Today’s focus is on XGBoost
Ensemble Learning
Ensemble learning is a machine learning paradigm where multiple models (often called “weak learners”) are trained to solve the same problem and combined to get better results. The main hypothesis is that when weak models are correctly combined we can obtain more accurate and/or robust models.
We can mention three major kinds of meta-algorithms that aim at combining weak learners:
bagging, which often considers homogeneous weak learners, learns them independently from each other in parallel, and combines them following some kind of deterministic averaging process
boosting, which often considers homogeneous weak learners, learns them sequentially in a very adaptive way (each base model depends on the previous ones), and combines them following a deterministic strategy
stacking, which often considers heterogeneous weak learners, learns them in parallel, and combines them by training a meta-model that outputs a prediction based on the different weak models' predictions
Boosting
Several supervised machine learning methods are built on a single predictive model (e.g., linear regression, penalized models, naive Bayes, support vector machines). Alternatively, approaches such as bagging and random forests build an ensemble of models in which each individual model predicts the outcome and the ensemble simply averages the predicted values. The family of boosting methods is based on a different, constructive strategy of ensemble formation.
The main idea of boosting is to add new models to the ensemble sequentially. At each particular iteration, a new weak, base-learner model is trained with respect to the error of the whole ensemble learnt so far.
Let’s discuss each component of the previous sentence in closer detail because they are important.
Base-learning models: Boosting is a framework that iteratively improves any weak learning model. Many gradient boosting implementations allow you to plug in various classes of weak learners. In practice, however, boosted algorithms almost always use decision trees as the base learner.
Training weak models: A weak model is one whose error rate is only slightly better than random guessing. The idea behind boosting is that each sequential step fits a simple weak model to slightly improve on the remaining errors. With regard to decision trees, shallow trees are the weak learners; commonly, trees with only 1-6 splits are used. Combining many weak models (versus strong ones) has a few benefits:
Speed: Constructing weak models is computationally cheap.
Accuracy improvement: Weak models allow the algorithm to learn slowly, making minor adjustments in the areas where it does not yet perform well. In general, statistical approaches that learn slowly tend to perform well.
Avoids overfitting: Because each model in the ensemble makes only a small incremental improvement, we can stop the learning process as soon as overfitting is detected (typically by cross-validation).
Note: In general, in terms of model performance, we have the following hierarchy:
\[ \text{Boosting} > \text{Random Forest} > \text{Bagging} > \text{Single Tree} \]
There are several boosting techniques, such as AdaBoost, Gradient Boosting, XGBoost, and LightGBM. In this part of the recitation, we'll focus mainly on XGBoost.
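To make the sequential idea concrete, here is a toy sketch of boosting for regression: small rpart stumps are fitted one after another to the current residuals and added to the ensemble with a small learning rate. The helper boost_toy and the simulated data are purely illustrative and do not appear elsewhere in this recitation.

library(rpart)

# Toy boosting: sequentially fit shallow trees to the current residuals
boost_toy <- function(x, y, n_trees = 50, eta = 0.1, depth = 1) {
  pred  <- rep(mean(y), length(y))            # start from the overall mean
  trees <- vector("list", n_trees)
  for (m in seq_len(n_trees)) {
    r <- y - pred                             # residuals = negative gradient of the squared loss
    trees[[m]] <- rpart(r ~ x, control = rpart.control(maxdepth = depth))
    pred <- pred + eta * predict(trees[[m]])  # take a small step toward the residuals
  }
  list(trees = trees, fitted = pred)
}

set.seed(1)
x <- runif(200, 0, 10)
y <- sin(x) + rnorm(200, sd = 0.3)
fit <- boost_toy(x, y)
mean((y - fit$fitted)^2)   # training error shrinks as more weak trees are added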
XGBoost
XGBoost stands for Extreme Gradient Boosting; it is a specific implementation of gradient boosting that uses more accurate approximations to find the best tree model. It employs a number of clever tricks that make it exceptionally successful, particularly with structured (tabular) data. The most important is the use of second-order gradients, i.e. second partial derivatives of the loss function (similar to Newton's method), which give more information about the shape of the loss surface and how to reach its minimum: whereas regular gradient boosting relies on the first-order gradient of the loss alone, XGBoost works with a second-order Taylor approximation.
It also applies advanced regularization (L1 and L2 penalties on the leaf weights), which improves model generalization.
XGBoost has additional advantages: training is very fast and can be parallelized and distributed across clusters.
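In symbols, at iteration \(t\) XGBoost works with a second-order Taylor approximation of its regularized objective, where \(g_i\) and \(h_i\) are the first and second derivatives of the loss \(\ell\) with respect to the current prediction:
\[ \mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i \, f_t(x_i) + \tfrac{1}{2} h_i \, f_t(x_i)^2 \right] + \Omega(f_t), \qquad g_i = \frac{\partial \, \ell\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}}, \quad h_i = \frac{\partial^2 \, \ell\!\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}, \]
where \(f_t\) is the new tree and \(\Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2\) penalizes the number of leaves \(T\) and the leaf weights \(w\) (an L1 penalty \(\alpha \lVert w \rVert_1\) can be added as well).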
Analysis
XGBoost for Classification
## Rows: 777
## Columns: 18
## $ Private <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes…
## $ Apps <dbl> 1660, 2186, 1428, 417, 193, 587, 353, 1899, 1038, 582, 173…
## $ Accept <dbl> 1232, 1924, 1097, 349, 146, 479, 340, 1720, 839, 498, 1425…
## $ Enroll <dbl> 721, 512, 336, 137, 55, 158, 103, 489, 227, 172, 472, 484,…
## $ Top10perc <dbl> 23, 16, 22, 60, 16, 38, 17, 37, 30, 21, 37, 44, 38, 44, 23…
## $ Top25perc <dbl> 52, 29, 50, 89, 44, 62, 45, 68, 63, 44, 75, 77, 64, 73, 46…
## $ F.Undergrad <dbl> 2885, 2683, 1036, 510, 249, 678, 416, 1594, 973, 799, 1830…
## $ P.Undergrad <dbl> 537, 1227, 99, 63, 869, 41, 230, 32, 306, 78, 110, 44, 638…
## $ Outstate <dbl> 7440, 12280, 11250, 12960, 7560, 13500, 13290, 13868, 1559…
## $ Room.Board <dbl> 3300, 6450, 3750, 5450, 4120, 3335, 5720, 4826, 4400, 3380…
## $ Books <dbl> 450, 750, 400, 450, 800, 500, 500, 450, 300, 660, 500, 400…
## $ Personal <dbl> 2200, 1500, 1165, 875, 1500, 675, 1500, 850, 500, 1800, 60…
## $ PhD <dbl> 70, 29, 53, 92, 76, 67, 90, 89, 79, 40, 82, 73, 60, 79, 36…
## $ Terminal <dbl> 78, 30, 66, 97, 72, 73, 93, 100, 84, 41, 88, 91, 84, 87, 6…
## $ S.F.Ratio <dbl> 18.1, 12.2, 12.9, 7.7, 11.9, 9.4, 11.5, 13.7, 11.3, 11.5, …
## $ perc.alumni <dbl> 12, 16, 30, 37, 2, 11, 26, 37, 23, 15, 31, 41, 21, 32, 26,…
## $ Expend <dbl> 7041, 10527, 8735, 19016, 10922, 9727, 8861, 11487, 11644,…
## $ Grad.Rate <dbl> 60, 56, 54, 59, 15, 55, 63, 73, 80, 52, 73, 76, 74, 68, 55…
You can see some details about the dataset above. It has 777 observations and 18 variables, 17 of which are numeric; the remaining one, Private, is categorical and is our response variable.
There are many R packages that implement the XGBoost algorithm; in this example, we'll use the xgboost package. The dataset is kept small so that the example stays light, but XGBoost is built to handle huge datasets very efficiently.
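The code that loaded the data is not shown above; a minimal sketch, assuming the College data from the ISLR package (which matches the 777 × 18 structure in the glimpse output), would be:

library(ISLR)       # assumed source: the College data (777 x 18)
library(tidyverse)

ml_data <- College  # 'Private' (No/Yes) is the first column and our response
glimpse(ml_data)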
Before the application part, let’s divide our data into two sets.
# Partition into training and test data
set.seed(42)
index <- createDataPartition(ml_data$Private, p = 0.8, list = FALSE)
train_data <- ml_data[index, ]
test_data <- ml_data[-index, ]

XGBoost works only with numeric matrices, so any non-numeric features must be converted to numeric (e.g., one-hot encoded) before training. Luckily, all of our features are numeric except the response. Now we can train our model. Be careful! This is the most critical part of the process for the quality of our model.
The easiest way to work with xgboost is the xgboost() function. The most important arguments are:
data: a matrix of the training data
label: the response variable in numeric format (for binary classification, 0 and 1)
objective: defines which learning task should be trained; here binary:logistic for binary classification. For a regression problem use reg:squarederror, and for multiclass classification use multi:softprob (or multi:softmax).
nrounds: the number of boosting iterations
max.depth: the maximum depth of each tree
nthread: the number of CPU threads to use
We know that XGBoost, like most other algorithms, works best when its hyperparameters are tuned for optimal performance. The algorithm requires us to define the booster, objective, learning rate, and other parameters. You'll need to spend most of your time on this step; it's imperative that you understand your data and use cross-validation.
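For example, xgb.cv() from the xgboost package runs k-fold cross-validation over the boosting rounds. The sketch below reuses the same training matrix and labels as the model fitted next, but the call itself is not part of the original analysis:

library(xgboost)

set.seed(1)
cv <- xgb.cv(data = as.matrix(train_data[, -1]),
             label = as.numeric(train_data$Private) - 1,
             objective = "binary:logistic",
             max_depth = 3,
             nrounds = 50,
             nfold = 5,                    # 5-fold cross-validation
             early_stopping_rounds = 10,   # stop when the held-out logloss stops improving
             verbose = FALSE)
cv$best_iteration                          # a data-driven choice for nrounds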
set.seed(1)
xgboost_model <- xgboost(data = as.matrix(train_data[, -1]),
label = as.numeric(train_data$Private)-1,
max_depth = 3,
objective = "binary:logistic",
nrounds = 10,
verbose = FALSE)
xgboost_model
## ##### xgb.Booster
## raw: 15.2 Kb
## call:
## xgb.train(params = params, data = dtrain, nrounds = nrounds,
## watchlist = watchlist, verbose = verbose, print_every_n = print_every_n,
## early_stopping_rounds = early_stopping_rounds, maximize = maximize,
## save_period = save_period, save_name = save_name, xgb_model = xgb_model,
## callbacks = callbacks, max_depth = 3, objective = "binary:logistic")
## params (as set within xgb.train):
## max_depth = "3", objective = "binary:logistic", validate_parameters = "TRUE"
## xgb.attributes:
## niter
## callbacks:
## cb.evaluation.log()
## # of features: 17
## niter: 10
## nfeatures : 17
## evaluation_log:
## iter train_logloss
## <num> <num>
## 1 0.4829681
## 2 0.3649178
## ---
## 9 0.1129882
## 10 0.1020342
With predict(), we can use this model to make predictions on the test data. Here, I'll feed the resulting class predictions directly to the confusionMatrix() function (a sketch of that call appears after the output below).
data <- predict(xgboost_model, as.matrix(test_data[, -1]), type = "response")
data <- as.factor(ifelse(data > 0.5, 1, 0))

## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 33 6
## 1 9 107
##
## Accuracy : 0.9032
## 95% CI : (0.8454, 0.9448)
## No Information Rate : 0.729
## P-Value [Acc > NIR] : 7.156e-08
##
## Kappa : 0.7494
##
## Mcnemar's Test P-Value : 0.6056
##
## Sensitivity : 0.7857
## Specificity : 0.9469
## Pos Pred Value : 0.8462
## Neg Pred Value : 0.9224
## Prevalence : 0.2710
## Detection Rate : 0.2129
## Detection Prevalence : 0.2516
## Balanced Accuracy : 0.8663
##
## 'Positive' Class : 0
##
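The confusionMatrix() call itself is hidden above; a minimal sketch, assuming caret's confusionMatrix() with the test labels recoded to 0/1 exactly as the training labels were, would be:

library(caret)

truth <- as.factor(as.numeric(test_data$Private) - 1)  # 0 = "No", 1 = "Yes"
confusionMatrix(data, reference = truth)               # 'data' holds the 0/1 predictions above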
After constructing your model with xgboost, you can draw a variable importance plot (VIP). The purpose of this plot is to represent the importance of each feature of the model at a glance. There are several functions from different packages for this plot (one alternative is sketched below).
#view variable importance plot
mat <- xgb.importance(feature_names = colnames(train_data[, -1]), model = xgboost_model)
xgb.plot.importance(mat)

As you can see, F.Undergrad is the most important feature in the xgboost model. On the other hand, Top25perc, Accept, and Personal are not that important. Thus, you can rebuild your model by omitting those unimportant features.
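For example, the vip package (not loaded anywhere above, so this is only a sketch) offers a ggplot2-based alternative:

library(vip)
vip(xgboost_model, num_features = 10)   # top 10 features as a ggplot2 bar chart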
Parameter tuning in xgboost is time consuming. You can consider alternative packages, such as caret, for tuning xgboost models. A simple example is shown below.
library(caret)

param_grid <- expand.grid(
  nrounds = c(10, 12),         # number of boosting iterations
  max_depth = c(3, 4),         # maximum depth of each decision tree
  eta = c(0.01, 0.1),          # learning rate: how much the weights are updated at each boosting iteration
  gamma = c(0, 0.1, 0.2),      # minimum loss reduction required to make a further partition on a leaf node
  colsample_bytree = c(0.7),   # fraction of features (columns) randomly sampled for each tree
  min_child_weight = c(1, 5),  # minimum sum of instance weight (hessian) needed in a child node
  subsample = c(0.7)           # fraction of samples (observations) randomly sampled for each tree
)

xgb_train_control <- trainControl(method = "cv", number = 5)

xgb_model <- train(x = as.matrix(train_data[, -1]),
                   y = train_data$Private,  # caret expects a factor response for classification
                   method = "xgbTree",
                   trControl = xgb_train_control,
                   tuneGrid = param_grid)

print(xgb_model$bestTune)
XGBoost for Regression
Let's consider the Prestige data that we studied in the previous recitation. Let's recall the data set.
|  | education | income | women | prestige | census | type |
|---|---|---|---|---|---|---|
| gov.administrators | 13.11 | 12351 | 11.16 | 68.8 | 1113 | prof |
| general.managers | 12.26 | 25879 | 4.02 | 69.1 | 1130 | prof |
| accountants | 12.77 | 9271 | 15.70 | 63.4 | 1171 | prof |
| purchasing.officers | 11.42 | 8865 | 9.11 | 56.8 | 1175 | prof |
| chemists | 14.62 | 8403 | 11.68 | 73.5 | 2111 | prof |
| physicists | 15.64 | 11030 | 5.13 | 77.6 | 2113 | prof |
## [1] 102 6
The data set has 102 observations and 6 variables. You can see the names of the variables and their corresponding classes below.
Data dictionary:
education: average education of occupational incumbents, years, in 1971
income: average income of incumbents, dollars, in 1971
women: percentage of incumbents who are women
prestige: Pineo-Porter prestige score for occupation, from a social survey conducted in the mid-1960s
census: Canadian Census occupational code
type: a factor for type of occupation with levels bc (Blue Collar), prof (Professional, Managerial and Technical) and wc (White Collar)
## 'data.frame': 102 obs. of 6 variables:
## $ education: num 13.1 12.3 12.8 11.4 14.6 ...
## $ income : int 12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
## $ women : num 11.16 4.02 15.7 9.11 11.68 ...
## $ prestige : num 68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
## $ census : int 1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
## $ type : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...
## education income women prestige
## Min. : 6.380 Min. : 611 Min. : 0.000 Min. :14.80
## 1st Qu.: 8.445 1st Qu.: 4106 1st Qu.: 3.592 1st Qu.:35.23
## Median :10.540 Median : 5930 Median :13.600 Median :43.60
## Mean :10.738 Mean : 6798 Mean :28.979 Mean :46.83
## 3rd Qu.:12.648 3rd Qu.: 8187 3rd Qu.:52.203 3rd Qu.:59.27
## Max. :15.970 Max. :25879 Max. :97.510 Max. :87.20
## census type
## Min. :1113 bc :44
## 1st Qu.:3120 prof:31
## Median :5135 wc :23
## Mean :5402 NA's: 4
## 3rd Qu.:8312
## Max. :9517
We have some missing values (four NA's in type), and we'll simply drop those rows for this study.
## education income women prestige
## Min. : 6.380 Min. : 1656 Min. : 0.000 Min. :17.30
## 1st Qu.: 8.445 1st Qu.: 4250 1st Qu.: 3.268 1st Qu.:35.38
## Median :10.605 Median : 6036 Median :14.475 Median :43.60
## Mean :10.795 Mean : 6939 Mean :28.986 Mean :47.33
## 3rd Qu.:12.755 3rd Qu.: 8226 3rd Qu.:52.203 3rd Qu.:59.90
## Max. :15.970 Max. :25879 Max. :97.510 Max. :87.20
## census type
## Min. :1113 bc :44
## 1st Qu.:3116 prof:31
## Median :5132 wc :23
## Mean :5400
## 3rd Qu.:8328
## Max. :9517
Missing values are omitted.
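The cleaning code is not shown; a minimal sketch, assuming the Prestige data from the car package and a simple na.omit(), would be:

library(car)               # provides the Prestige data (via carData)
data <- na.omit(Prestige)  # drops the 4 rows with a missing 'type', leaving 98 rows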
Now, we can partition the data into train and test sets.
## [1] 98 6
set.seed(123)
trainindex<-sample(1:98,round(0.8*98))
train<-data[trainindex,]
test <- data[-trainindex, ]

Now we are ready to build a boosting model. Please note that the caret package offers three xgboost-based methods: 'xgbDART', 'xgbLinear', and 'xgbTree'. All of them can be used for either classification or regression problems.
In this tutorial, let’s move on with xgbTree.
| model | parameter | label | forReg | forClass | probModel |
|---|---|---|---|---|---|
| xgbTree | nrounds | # Boosting Iterations | TRUE | TRUE | TRUE |
| xgbTree | max_depth | Max Tree Depth | TRUE | TRUE | TRUE |
| xgbTree | eta | Shrinkage | TRUE | TRUE | TRUE |
| xgbTree | gamma | Minimum Loss Reduction | TRUE | TRUE | TRUE |
| xgbTree | colsample_bytree | Subsample Ratio of Columns | TRUE | TRUE | TRUE |
| xgbTree | min_child_weight | Minimum Sum of Instance Weight | TRUE | TRUE | TRUE |
| xgbTree | subsample | Subsample Percentage | TRUE | TRUE | TRUE |
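A table like the one above can be generated with caret's modelLookup(); the exact call is not shown in the original, so this is an assumption:

library(caret)
modelLookup("xgbTree")   # lists the tunable parameters of the xgbTree method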
The trainControl function generates parameters that further control how models are created. Printing the resulting object shows all of its settings:
train.control <- trainControl( method = "repeatedcv", repeats = 3, number = 10, search = 'grid')
train.control
## $method
## [1] "repeatedcv"
##
## $number
## [1] 10
##
## $repeats
## [1] 3
##
## $search
## [1] "grid"
##
## $p
## [1] 0.75
##
## $initialWindow
## NULL
##
## $horizon
## [1] 1
##
## $fixedWindow
## [1] TRUE
##
## $skip
## [1] 0
##
## $verboseIter
## [1] FALSE
##
## $returnData
## [1] TRUE
##
## $returnResamp
## [1] "final"
##
## $savePredictions
## [1] FALSE
##
## $classProbs
## [1] FALSE
##
## $summaryFunction
## function (data, lev = NULL, model = NULL)
## {
## if (is.character(data$obs))
## data$obs <- factor(data$obs, levels = lev)
## postResample(data[, "pred"], data[, "obs"])
## }
## <bytecode: 0x00000283e00695b0>
## <environment: namespace:caret>
##
## $selectionFunction
## [1] "best"
##
## $preProcOptions
## $preProcOptions$thresh
## [1] 0.95
##
## $preProcOptions$ICAcomp
## [1] 3
##
## $preProcOptions$k
## [1] 5
##
## $preProcOptions$freqCut
## [1] 19
##
## $preProcOptions$uniqueCut
## [1] 10
##
## $preProcOptions$cutoff
## [1] 0.9
##
##
## $sampling
## NULL
##
## $index
## NULL
##
## $indexOut
## NULL
##
## $indexFinal
## NULL
##
## $timingSamps
## [1] 0
##
## $predictionBounds
## [1] FALSE FALSE
##
## $seeds
## [1] NA
##
## $adaptive
## $adaptive$min
## [1] 5
##
## $adaptive$alpha
## [1] 0.05
##
## $adaptive$method
## [1] "gls"
##
## $adaptive$complete
## [1] TRUE
##
##
## $trim
## [1] FALSE
##
## $allowParallel
## [1] TRUE
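The train() call that produced the output below is hidden. A minimal sketch, assuming the formula interface, the train.control object defined above, and caret's default tuning grid (which matches the combinations of eta, max_depth, colsample_bytree, subsample, and nrounds reported below), with xgb_reg as a hypothetical name:

set.seed(123)                    # any seed; for reproducibility only
xgb_reg <- train(prestige ~ .,
                 data = train,
                 method = "xgbTree",
                 trControl = train.control)
xgb_reg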
## eXtreme Gradient Boosting
##
## 78 samples
## 5 predictor
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 70, 70, 70, 70, 70, 70, ...
## Resampling results across tuning parameters:
##
## eta max_depth colsample_bytree subsample nrounds RMSE Rsquared
## 0.3 1 0.6 0.50 50 7.752333 0.8370037
## 0.3 1 0.6 0.50 100 8.015271 0.8199954
## 0.3 1 0.6 0.50 150 8.187536 0.8123232
## 0.3 1 0.6 0.75 50 7.552155 0.8474650
## 0.3 1 0.6 0.75 100 7.700261 0.8395046
## 0.3 1 0.6 0.75 150 7.885341 0.8298171
## 0.3 1 0.6 1.00 50 7.562948 0.8416019
## 0.3 1 0.6 1.00 100 7.776726 0.8329751
## 0.3 1 0.6 1.00 150 7.988729 0.8235724
## 0.3 1 0.8 0.50 50 7.648696 0.8353407
## 0.3 1 0.8 0.50 100 7.957648 0.8210009
## 0.3 1 0.8 0.50 150 8.170910 0.8127447
## 0.3 1 0.8 0.75 50 7.557307 0.8415985
## 0.3 1 0.8 0.75 100 7.857606 0.8286172
## 0.3 1 0.8 0.75 150 8.189405 0.8137349
## 0.3 1 0.8 1.00 50 7.549337 0.8424114
## 0.3 1 0.8 1.00 100 7.878907 0.8285814
## 0.3 1 0.8 1.00 150 8.086710 0.8196157
## 0.3 2 0.6 0.50 50 7.869123 0.8345575
## 0.3 2 0.6 0.50 100 8.133372 0.8203450
## 0.3 2 0.6 0.50 150 8.257531 0.8146044
## 0.3 2 0.6 0.75 50 7.766373 0.8349972
## 0.3 2 0.6 0.75 100 8.082333 0.8214051
## 0.3 2 0.6 0.75 150 8.171637 0.8185383
## 0.3 2 0.6 1.00 50 7.972773 0.8286888
## 0.3 2 0.6 1.00 100 8.215664 0.8197436
## 0.3 2 0.6 1.00 150 8.316742 0.8157639
## 0.3 2 0.8 0.50 50 7.718295 0.8393723
## 0.3 2 0.8 0.50 100 7.875644 0.8293285
## 0.3 2 0.8 0.50 150 8.016627 0.8274856
## 0.3 2 0.8 0.75 50 7.586260 0.8407252
## 0.3 2 0.8 0.75 100 7.795302 0.8320091
## 0.3 2 0.8 0.75 150 7.932628 0.8258335
## 0.3 2 0.8 1.00 50 7.818485 0.8333674
## 0.3 2 0.8 1.00 100 8.072236 0.8209317
## 0.3 2 0.8 1.00 150 8.156141 0.8171898
## 0.3 3 0.6 0.50 50 7.720348 0.8379868
## 0.3 3 0.6 0.50 100 7.961974 0.8287066
## 0.3 3 0.6 0.50 150 8.025675 0.8257836
## 0.3 3 0.6 0.75 50 8.057500 0.8211377
## 0.3 3 0.6 0.75 100 8.185507 0.8144843
## 0.3 3 0.6 0.75 150 8.228133 0.8128665
## 0.3 3 0.6 1.00 50 7.800971 0.8312237
## 0.3 3 0.6 1.00 100 7.842201 0.8293366
## 0.3 3 0.6 1.00 150 7.858624 0.8283794
## 0.3 3 0.8 0.50 50 8.160567 0.8181445
## 0.3 3 0.8 0.50 100 8.300231 0.8138446
## 0.3 3 0.8 0.50 150 8.392770 0.8103206
## 0.3 3 0.8 0.75 50 7.625124 0.8439644
## 0.3 3 0.8 0.75 100 7.761710 0.8378224
## 0.3 3 0.8 0.75 150 7.770609 0.8375130
## 0.3 3 0.8 1.00 50 7.792008 0.8309176
## 0.3 3 0.8 1.00 100 7.858128 0.8286100
## 0.3 3 0.8 1.00 150 7.876888 0.8278376
## 0.4 1 0.6 0.50 50 7.949183 0.8362333
## 0.4 1 0.6 0.50 100 8.097466 0.8301564
## 0.4 1 0.6 0.50 150 8.436288 0.8178521
## 0.4 1 0.6 0.75 50 7.743761 0.8270844
## 0.4 1 0.6 0.75 100 7.884670 0.8203779
## 0.4 1 0.6 0.75 150 8.205818 0.8093017
## 0.4 1 0.6 1.00 50 8.057334 0.8232600
## 0.4 1 0.6 1.00 100 8.342742 0.8088843
## 0.4 1 0.6 1.00 150 8.543378 0.7998819
## 0.4 1 0.8 0.50 50 8.172878 0.8248439
## 0.4 1 0.8 0.50 100 8.273292 0.8158022
## 0.4 1 0.8 0.50 150 8.511259 0.8037771
## 0.4 1 0.8 0.75 50 7.796722 0.8284525
## 0.4 1 0.8 0.75 100 8.275050 0.8071278
## 0.4 1 0.8 0.75 150 8.500491 0.7966827
## 0.4 1 0.8 1.00 50 7.685630 0.8354141
## 0.4 1 0.8 1.00 100 7.968479 0.8239433
## 0.4 1 0.8 1.00 150 8.196255 0.8144187
## 0.4 2 0.6 0.50 50 8.428225 0.8030814
## 0.4 2 0.6 0.50 100 8.779375 0.7909881
## 0.4 2 0.6 0.50 150 8.953197 0.7795901
## 0.4 2 0.6 0.75 50 7.975749 0.8186646
## 0.4 2 0.6 0.75 100 8.108609 0.8123720
## 0.4 2 0.6 0.75 150 8.214081 0.8079324
## 0.4 2 0.6 1.00 50 7.961918 0.8184069
## 0.4 2 0.6 1.00 100 8.166591 0.8091396
## 0.4 2 0.6 1.00 150 8.225765 0.8056856
## 0.4 2 0.8 0.50 50 8.161019 0.8220064
## 0.4 2 0.8 0.50 100 8.429348 0.8084968
## 0.4 2 0.8 0.50 150 8.606653 0.8024146
## 0.4 2 0.8 0.75 50 7.936306 0.8202314
## 0.4 2 0.8 0.75 100 8.110384 0.8119163
## 0.4 2 0.8 0.75 150 8.173843 0.8096364
## 0.4 2 0.8 1.00 50 7.895045 0.8284976
## 0.4 2 0.8 1.00 100 8.062923 0.8226782
## 0.4 2 0.8 1.00 150 8.140647 0.8194252
## 0.4 3 0.6 0.50 50 7.821404 0.8377507
## 0.4 3 0.6 0.50 100 7.932114 0.8325645
## 0.4 3 0.6 0.50 150 8.076532 0.8254944
## 0.4 3 0.6 0.75 50 7.769051 0.8284678
## 0.4 3 0.6 0.75 100 7.872527 0.8239171
## 0.4 3 0.6 0.75 150 7.884213 0.8232936
## 0.4 3 0.6 1.00 50 7.799050 0.8263725
## 0.4 3 0.6 1.00 100 7.845549 0.8239333
## 0.4 3 0.6 1.00 150 7.852862 0.8235760
## 0.4 3 0.8 0.50 50 8.104270 0.8304077
## 0.4 3 0.8 0.50 100 8.205482 0.8252899
## 0.4 3 0.8 0.50 150 8.265548 0.8231424
## 0.4 3 0.8 0.75 50 7.957764 0.8248469
## 0.4 3 0.8 0.75 100 8.022601 0.8199958
## 0.4 3 0.8 0.75 150 7.996814 0.8215958
## 0.4 3 0.8 1.00 50 7.894760 0.8266699
## 0.4 3 0.8 1.00 100 7.918315 0.8260777
## 0.4 3 0.8 1.00 150 7.923373 0.8259441
## MAE
## 6.569946
## 6.699075
## 6.865963
## 6.461797
## 6.493482
## 6.565247
## 6.538796
## 6.585536
## 6.663360
## 6.572011
## 6.699939
## 6.845253
## 6.351256
## 6.501742
## 6.745407
## 6.549843
## 6.711613
## 6.817804
## 6.629649
## 6.750524
## 6.845219
## 6.537164
## 6.707817
## 6.722335
## 6.750328
## 6.845069
## 6.904649
## 6.410945
## 6.461442
## 6.553715
## 6.300494
## 6.423482
## 6.503863
## 6.547676
## 6.615540
## 6.635876
## 6.372196
## 6.515309
## 6.574282
## 6.588381
## 6.662916
## 6.682900
## 6.497673
## 6.468126
## 6.470875
## 6.725149
## 6.791477
## 6.890304
## 6.240829
## 6.301835
## 6.318129
## 6.414138
## 6.439506
## 6.449135
## 6.776529
## 6.889290
## 7.106158
## 6.520009
## 6.555447
## 6.742598
## 6.977977
## 7.103711
## 7.181817
## 6.827643
## 6.840188
## 6.966932
## 6.631681
## 6.980508
## 7.094701
## 6.521408
## 6.675830
## 6.775570
## 6.999318
## 7.242151
## 7.418802
## 6.629591
## 6.701313
## 6.830977
## 6.605572
## 6.679783
## 6.720112
## 6.716098
## 6.937560
## 7.083846
## 6.647584
## 6.802193
## 6.852267
## 6.408433
## 6.462858
## 6.505561
## 6.513028
## 6.597156
## 6.685818
## 6.412956
## 6.473415
## 6.489057
## 6.446347
## 6.439062
## 6.443325
## 6.730861
## 6.811883
## 6.887423
## 6.455723
## 6.498299
## 6.496352
## 6.471984
## 6.477547
## 6.480408
##
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning
## parameter 'min_child_weight' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nrounds = 50, max_depth = 1, eta
## = 0.3, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1 and subsample
## = 1.
Let’s calculate MSE on test data.
## Warning in pred - test[, 4]: longer object length is not a multiple of shorter
## object length
## [1] "MSE for XGBoost: 472.222"
Remember that the MSE for Random Forest was 468.735 and the MSE for linear regression was 483.118, both obtained in the previous recitation. Thus, Random Forest outperforms XGBoost here. However, if we tune the hyperparameters more carefully, XGBoost may yet beat the random forest.