Previously on STAT412:

  • Naive Bayes Classifier

  • Decision Tree

  • Random Forest

Today’s focus is on XGBoost

XGBoost

Ensemble Learning

Ensemble learning is a machine learning paradigm where multiple models (often called “weak learners”) are trained to solve the same problem and combined to get better results. The main hypothesis is that when weak models are correctly combined we can obtain more accurate and/or robust models.

We can mention three major kinds of meta-algorithms that aim at combining weak learners:

  • bagging, which often considers homogeneous weak learners, learns them independently of each other in parallel, and combines them following some kind of deterministic averaging process

  • boosting, which often considers homogeneous weak learners, learns them sequentially in a very adaptive way (each base model depends on the previous ones), and combines them following a deterministic strategy

  • stacking, which often considers heterogeneous weak learners, learns them in parallel, and combines them by training a meta-model to output a prediction based on the different weak models' predictions

Boosting

Several supervised machine learning methods are founded on a single predictive model (e.g. linear regression, penalized models, naive Bayes, support vector machines). Alternatively, approaches such as bagging and random forests build an ensemble of models in which each individual model predicts the outcome and the ensemble simply averages the predicted values. The family of boosting methods is based on a different, constructive strategy of ensemble formation.

The main idea of boosting is to add new models to the ensemble sequentially. At each iteration, a new weak base-learner model is trained with respect to the error of the whole ensemble learned so far.

Let's discuss each component of this idea in closer detail, because they are important.

Base-learning models: Boosting is a framework that iteratively improves any weak learning model. Many gradient boosting implementations allow you to plug in various classes of weak learners. In practice, however, boosted algorithms almost always use decision trees as the base learner.

Training weak models: A weak model is one whose error rate is only slightly better than random guessing. The idea behind boosting is that each step fits a simple weak model that slightly improves on the remaining errors. With regard to decision trees, shallow trees represent a weak learner; commonly, trees with only 1-6 splits are used. Combining many weak models (versus strong ones) has a few benefits (a toy boosting loop illustrating this sequential fitting is sketched after this list):

  • Speed: Constructing weak models is computationally cheap.

  • Accuracy improvement: Weak models allow the algorithm to learn slowly, making minor adjustments in new areas where it does not yet perform well. In general, statistical approaches that learn slowly tend to perform well.

  • Avoids overfitting: Because each model in the ensemble makes only a small incremental improvement, we can stop the learning process as soon as overfitting is detected (typically by using cross-validation).
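
To make this sequential idea concrete, here is a tiny hand-rolled boosting loop for a toy regression problem: each round fits a depth-1 rpart tree (a stump) to the current residuals and adds a damped version of its predictions to the ensemble. This is only an illustrative sketch (the simulated data, the learning rate of 0.1, and the 50 rounds are arbitrary choices), not the algorithm used by any particular package.

library(rpart)

set.seed(1)
n   <- 200
x   <- runif(n, 0, 10)
y   <- sin(x) + rnorm(n, sd = 0.3)   # toy regression data
dat <- data.frame(x = x, y = y)

pred <- rep(mean(y), n)              # start from a constant prediction
eta  <- 0.1                          # learning rate (shrinkage)

for (m in 1:50) {
  dat$res <- y - pred                # current residuals (negative gradient of squared loss)
  stump   <- rpart(res ~ x, data = dat, control = rpart.control(maxdepth = 1))
  pred    <- pred + eta * predict(stump, newdata = dat)  # add the damped weak learner
}

mean((y - pred)^2)                   # training error shrinks as rounds are added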

Note: In general, in terms of model performance, we have the following hierarchy:

\[ \text{Boosting} > \text{Random Forest} > \text{Bagging} > \text{Single Tree} \]

There are several boosting techniques, such as AdaBoost, Gradient Boosting, XGBoost, and LightGBM. In this part of the recitation, we'll focus mainly on XGBoost.

XGBoost

XGBoost stands for Extreme Gradient Boosting; it is a specific implementation of the gradient boosting method that uses more accurate approximations to find the best tree model. It employs a number of nifty tricks that make it exceptionally successful, particularly with structured data. The most important is the use of second-order gradients, i.e. second partial derivatives of the loss function (similar to Newton's method), which provide more information about the direction of the gradient and how to reach the minimum of the loss function. While regular gradient boosting uses the loss of the base model (e.g. a decision tree) as a proxy for minimizing the error of the overall model, XGBoost uses a second-order approximation of the loss.

It also uses advanced regularization (L1 and L2), which improves model generalization.

XGBoost has additional advantages: training is very fast and can be parallelized / distributed across clusters.
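
To make the second-order idea a bit more precise: in the standard XGBoost formulation, at boosting round \(t\) the loss is replaced by its second-order Taylor expansion around the current prediction, and an explicit complexity penalty is added, so the new tree \(f_t\) is chosen to minimize

\[ \mathcal{L}^{(t)} \approx \sum_{i=1}^{n} \left[ g_i \, f_t(x_i) + \tfrac{1}{2} h_i \, f_t(x_i)^2 \right] + \Omega(f_t), \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2, \]

where \(g_i\) and \(h_i\) are the first and second derivatives of the loss with respect to the current prediction \(\hat{y}_i^{(t-1)}\), \(T\) is the number of leaves of the tree, and \(w_j\) are the leaf weights; an optional L1 term \(\alpha \sum_j |w_j|\) can be added to \(\Omega\). Here \(\lambda\) and \(\alpha\) are the L2 and L1 regularization mentioned above, and \(\gamma\) additionally penalizes the number of leaves.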

Analysis

XGBoost for Classification

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ISLR)
ml_data <- College
ml_data %>%glimpse()
## Rows: 777
## Columns: 18
## $ Private     <fct> Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes, Yes…
## $ Apps        <dbl> 1660, 2186, 1428, 417, 193, 587, 353, 1899, 1038, 582, 173…
## $ Accept      <dbl> 1232, 1924, 1097, 349, 146, 479, 340, 1720, 839, 498, 1425…
## $ Enroll      <dbl> 721, 512, 336, 137, 55, 158, 103, 489, 227, 172, 472, 484,…
## $ Top10perc   <dbl> 23, 16, 22, 60, 16, 38, 17, 37, 30, 21, 37, 44, 38, 44, 23…
## $ Top25perc   <dbl> 52, 29, 50, 89, 44, 62, 45, 68, 63, 44, 75, 77, 64, 73, 46…
## $ F.Undergrad <dbl> 2885, 2683, 1036, 510, 249, 678, 416, 1594, 973, 799, 1830…
## $ P.Undergrad <dbl> 537, 1227, 99, 63, 869, 41, 230, 32, 306, 78, 110, 44, 638…
## $ Outstate    <dbl> 7440, 12280, 11250, 12960, 7560, 13500, 13290, 13868, 1559…
## $ Room.Board  <dbl> 3300, 6450, 3750, 5450, 4120, 3335, 5720, 4826, 4400, 3380…
## $ Books       <dbl> 450, 750, 400, 450, 800, 500, 500, 450, 300, 660, 500, 400…
## $ Personal    <dbl> 2200, 1500, 1165, 875, 1500, 675, 1500, 850, 500, 1800, 60…
## $ PhD         <dbl> 70, 29, 53, 92, 76, 67, 90, 89, 79, 40, 82, 73, 60, 79, 36…
## $ Terminal    <dbl> 78, 30, 66, 97, 72, 73, 93, 100, 84, 41, 88, 91, 84, 87, 6…
## $ S.F.Ratio   <dbl> 18.1, 12.2, 12.9, 7.7, 11.9, 9.4, 11.5, 13.7, 11.3, 11.5, …
## $ perc.alumni <dbl> 12, 16, 30, 37, 2, 11, 26, 37, 23, 15, 31, 41, 21, 32, 26,…
## $ Expend      <dbl> 7041, 10527, 8735, 19016, 10922, 9727, 8861, 11487, 11644,…
## $ Grad.Rate   <dbl> 60, 56, 54, 59, 15, 55, 63, 73, 80, 52, 73, 76, 74, 68, 55…

You can see some details about the dataset. It has 777 observations and 18 variables: 17 of them are numeric, and the single categorical variable, Private, is our response.

There are many packages in R that implement the XGBoost algorithm. In this example, we'll use the xgboost package.

This dataset is quite small, which keeps the example light, but XGBoost is built to handle huge datasets very efficiently.

Before the application part, let’s divide our data into two sets.

library(caret)
## Warning: package 'caret' was built under R version 4.3.3
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
# Partition into training and test data
set.seed(42)
index <- createDataPartition(ml_data$Private, p = 0.8, list = FALSE)
train_data <- ml_data[index, ]
test_data  <- ml_data[-index, ]

XGBoost works only with numeric matrices. Therefore, you need to convert any non-numeric features into numeric form first. Luckily, all of our features are numeric except the response. Now we can train our model. Be careful: this is the most critical part of the process for the quality of our model.
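
For reference, if the data did contain categorical predictors, one common approach (a sketch only; not needed for this particular dataset) is to one-hot encode them with model.matrix() before passing the matrix to xgboost:

# Sketch: expand any factor predictors into dummy columns (here all predictors
# are already numeric, so this returns essentially the same matrix)
X_train <- model.matrix(Private ~ . - 1, data = train_data)  # "- 1" drops the intercept column
y_train <- as.numeric(train_data$Private) - 1                # "No"/"Yes" factor -> 0/1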

The easiest way to work with xgboost is the xgboost() function. The most important arguments are

  • data: a matrix of the training data

  • label: the response variable in numeric format (0 and 1 for binary classification)

  • objective: the learning task to be trained; here it is binary classification ("binary:logistic"). For a regression problem use "reg:squarederror", and for multiclass classification use "multi:softprob".

  • nrounds: the number of boosting iterations

  • max.depth: the maximum depth of each tree

  • nthread: the number of CPU threads to use

We know that XGBoost, like most other algorithms, works best when its hyperparameters are tuned for optimal performance. The algorithm requires that we define the booster, the objective, the learning rate, and other parameters. You'll need to spend most of your time on this step; it's imperative that you understand your data and use cross-validation.
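
For example, cross-validation can be used to choose a sensible number of boosting rounds before fitting the final model. The sketch below uses xgb.cv() from the xgboost package; the specific settings (nfold = 5, an upper bound of 50 rounds, early stopping after 5 non-improving rounds) are illustrative assumptions rather than values used later in this recitation.

library(xgboost)

# Cross-validated choice of nrounds (illustrative settings)
dtrain <- xgb.DMatrix(data  = as.matrix(train_data[, -1]),
                      label = as.numeric(train_data$Private) - 1)

set.seed(1)
cv_res <- xgb.cv(params = list(objective = "binary:logistic", max_depth = 3),
                 data    = dtrain,
                 nrounds = 50,               # upper bound on boosting iterations
                 nfold   = 5,                # 5-fold cross-validation
                 early_stopping_rounds = 5,  # stop when the CV metric stops improving
                 verbose = FALSE)

cv_res$best_iteration                        # a reasonable value for nrounds in the final fit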

#install.packages("xgboost")
library(xgboost)
## Warning: package 'xgboost' was built under R version 4.3.3
## 
## Attaching package: 'xgboost'
## The following object is masked from 'package:dplyr':
## 
##     slice
set.seed(1)
xgboost_model <- xgboost(data = as.matrix(train_data[, -1]), 
                         label = as.numeric(train_data$Private)-1,
                         max_depth = 3, 
                         objective = "binary:logistic", 
                         nrounds = 10, 
                         verbose = FALSE)
xgboost_model
## ##### xgb.Booster
## raw: 15.2 Kb 
## call:
##   xgb.train(params = params, data = dtrain, nrounds = nrounds, 
##     watchlist = watchlist, verbose = verbose, print_every_n = print_every_n, 
##     early_stopping_rounds = early_stopping_rounds, maximize = maximize, 
##     save_period = save_period, save_name = save_name, xgb_model = xgb_model, 
##     callbacks = callbacks, max_depth = 3, objective = "binary:logistic")
## params (as set within xgb.train):
##   max_depth = "3", objective = "binary:logistic", validate_parameters = "TRUE"
## xgb.attributes:
##   niter
## callbacks:
##   cb.evaluation.log()
## # of features: 17 
## niter: 10
## nfeatures : 17 
## evaluation_log:
##      iter train_logloss
##     <num>         <num>
##         1     0.4829681
##         2     0.3649178
## ---                    
##         9     0.1129882
##        10     0.1020342

With predict(), we can use this model to make predictions on test data. Here, I’ll be feeding this directly to the confusionMatrix function.

pred_prob  <- predict(xgboost_model, as.matrix(test_data[, -1]))   # predicted probabilities
pred_class <- as.factor(ifelse(pred_prob > 0.5, 1, 0))             # classify with a 0.5 cutoff
confusionMatrix(pred_class, reference = as.factor(as.numeric(test_data$Private) - 1))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0  33   6
##          1   9 107
##                                           
##                Accuracy : 0.9032          
##                  95% CI : (0.8454, 0.9448)
##     No Information Rate : 0.729           
##     P-Value [Acc > NIR] : 7.156e-08       
##                                           
##                   Kappa : 0.7494          
##                                           
##  Mcnemar's Test P-Value : 0.6056          
##                                           
##             Sensitivity : 0.7857          
##             Specificity : 0.9469          
##          Pos Pred Value : 0.8462          
##          Neg Pred Value : 0.9224          
##              Prevalence : 0.2710          
##          Detection Rate : 0.2129          
##    Detection Prevalence : 0.2516          
##       Balanced Accuracy : 0.8663          
##                                           
##        'Positive' Class : 0               
## 

After fitting your model with xgboost, you can draw a variable importance plot (VIP). The purpose of this plot is to give an easy overview of how important each feature is to the model. There are several functions from different packages for drawing it.

# view variable importance plot
mat <- xgb.importance(feature_names = colnames(train_data[, -1]), model = xgboost_model)
xgb.plot.importance(mat)

As you can see, F.Undergrad is the most important feature in the xgboost model. On the other hand, Top25perc, Accept, and Personal are not that important. Thus, you could rebuild your model omitting those unimportant features, as sketched below.
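
For instance, a refit without those three features might look like the following sketch; whether dropping them actually helps should, of course, be checked on the test set.

# Sketch: drop the least important predictors suggested by the importance plot
drop_vars     <- c("Top25perc", "Accept", "Personal")
reduced_train <- train_data[, !(colnames(train_data) %in% drop_vars)]
reduced_test  <- test_data[,  !(colnames(test_data)  %in% drop_vars)]

set.seed(1)
xgboost_reduced <- xgboost(data  = as.matrix(reduced_train[, -1]),  # Private is still column 1
                           label = as.numeric(reduced_train$Private) - 1,
                           max_depth = 3,
                           objective = "binary:logistic",
                           nrounds   = 10,
                           verbose   = FALSE)

pred_reduced <- ifelse(predict(xgboost_reduced, as.matrix(reduced_test[, -1])) > 0.5, 1, 0)
confusionMatrix(as.factor(pred_reduced),
                reference = as.factor(as.numeric(test_data$Private) - 1))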

Parameter tuning in xgboost is time-consuming. You can also consider alternative packages, such as caret, for tuning xgboost models. A simple example is shown below.

library(caret)
param_grid <- expand.grid(
  nrounds          = c(10, 12),      # number of boosting iterations
  max_depth        = c(3, 4),        # maximum depth of each tree
  eta              = c(0.01, 0.1),   # learning rate: how much the model is updated each iteration
  gamma            = c(0, 0.1, 0.2), # minimum loss reduction required for a further split
  colsample_bytree = c(0.7),         # fraction of features (columns) sampled for each tree
  min_child_weight = c(1, 5),        # minimum sum of instance weight (hessian) in a child node
  subsample        = c(0.7))         # fraction of observations sampled for each tree
xgb_train_control <- trainControl(method = "cv", number = 5)
xgb_model <- train(x = as.matrix(train_data[, -1]),
                   y = train_data$Private,   # caret expects a factor response for classification
                   method = "xgbTree",
                   trControl = xgb_train_control,
                   tuneGrid = param_grid)
print(xgb_model$bestTune)

XGBoost for Regression

Let's consider the Prestige data that we studied in the previous recitation. Let's recall the data set.

library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
## The following object is masked from 'package:purrr':
## 
##     some
data(Prestige)
head(Prestige)
                    education income women prestige census type
gov.administrators      13.11  12351 11.16     68.8   1113 prof
general.managers        12.26  25879  4.02     69.1   1130 prof
accountants             12.77   9271 15.70     63.4   1171 prof
purchasing.officers     11.42   8865  9.11     56.8   1175 prof
chemists                14.62   8403 11.68     73.5   2111 prof
physicists              15.64  11030  5.13     77.6   2113 prof
dim(Prestige)
## [1] 102   6

The data set has 102 observations and 6 variables. You can see the name of the variables and their corresponding class below.

Data dictionary:

education: average education of occupational incumbents, years, in 1971

income: average income of incumbents, dollars, in 1971

women: percentage of incumbents who are women

prestige: Pineo-Porter prestige score for occupation, from a social survey conducted in the mid-1960s

census: Canadian Census occupational code

type: a factor for type of occupation with levels bc (Blue Collar), prof (Professional, Managerial and Technical) and wc (White Collar)

str(Prestige)
## 'data.frame':    102 obs. of  6 variables:
##  $ education: num  13.1 12.3 12.8 11.4 14.6 ...
##  $ income   : int  12351 25879 9271 8865 8403 11030 8258 14163 11377 11023 ...
##  $ women    : num  11.16 4.02 15.7 9.11 11.68 ...
##  $ prestige : num  68.8 69.1 63.4 56.8 73.5 77.6 72.6 78.1 73.1 68.8 ...
##  $ census   : int  1113 1130 1171 1175 2111 2113 2133 2141 2143 2153 ...
##  $ type     : Factor w/ 3 levels "bc","prof","wc": 2 2 2 2 2 2 2 2 2 2 ...
summary(Prestige)
##    education          income          women           prestige    
##  Min.   : 6.380   Min.   :  611   Min.   : 0.000   Min.   :14.80  
##  1st Qu.: 8.445   1st Qu.: 4106   1st Qu.: 3.592   1st Qu.:35.23  
##  Median :10.540   Median : 5930   Median :13.600   Median :43.60  
##  Mean   :10.738   Mean   : 6798   Mean   :28.979   Mean   :46.83  
##  3rd Qu.:12.648   3rd Qu.: 8187   3rd Qu.:52.203   3rd Qu.:59.27  
##  Max.   :15.970   Max.   :25879   Max.   :97.510   Max.   :87.20  
##      census       type   
##  Min.   :1113   bc  :44  
##  1st Qu.:3120   prof:31  
##  Median :5135   wc  :23  
##  Mean   :5402   NA's: 4  
##  3rd Qu.:8312            
##  Max.   :9517

We have some missing values; for this study we will simply omit them.

data<-na.omit(Prestige)
summary(data)
##    education          income          women           prestige    
##  Min.   : 6.380   Min.   : 1656   Min.   : 0.000   Min.   :17.30  
##  1st Qu.: 8.445   1st Qu.: 4250   1st Qu.: 3.268   1st Qu.:35.38  
##  Median :10.605   Median : 6036   Median :14.475   Median :43.60  
##  Mean   :10.795   Mean   : 6939   Mean   :28.986   Mean   :47.33  
##  3rd Qu.:12.755   3rd Qu.: 8226   3rd Qu.:52.203   3rd Qu.:59.90  
##  Max.   :15.970   Max.   :25879   Max.   :97.510   Max.   :87.20  
##      census       type   
##  Min.   :1113   bc  :44  
##  1st Qu.:3116   prof:31  
##  Median :5132   wc  :23  
##  Mean   :5400            
##  3rd Qu.:8328            
##  Max.   :9517

Missing values are omitted.

Now, we can partition the data into train and test sets.

dim(data)
## [1] 98  6
set.seed(123)
trainindex<-sample(1:98,round(0.8*98))
train<-data[trainindex,]
test<-data[-trainindex,]

Now, we are ready to build a boosting model. Please note that the caret package offers three xgboost-based methods: 'xgbDART', 'xgbLinear', and 'xgbTree'. All of them can be used for either classification or regression problems.

In this tutorial, let's move on with xgbTree.

modelLookup("xgbTree") #lets see tuneable parameters for decision tree
  model        parameter                          label forReg forClass probModel
xgbTree          nrounds          # Boosting Iterations   TRUE     TRUE      TRUE
xgbTree        max_depth                 Max Tree Depth   TRUE     TRUE      TRUE
xgbTree              eta                      Shrinkage   TRUE     TRUE      TRUE
xgbTree            gamma         Minimum Loss Reduction   TRUE     TRUE      TRUE
xgbTree colsample_bytree     Subsample Ratio of Columns   TRUE     TRUE      TRUE
xgbTree min_child_weight Minimum Sum of Instance Weight   TRUE     TRUE      TRUE
xgbTree        subsample           Subsample Percentage   TRUE     TRUE      TRUE

The function trainControl generates parameters that further control how models are created. Here we use repeated 10-fold cross-validation (3 repeats) with a grid search; printing the object shows all the settings:

train.control <- trainControl( method = "repeatedcv", repeats = 3, number = 10, search = 'grid')
train.control
## $method
## [1] "repeatedcv"
## 
## $number
## [1] 10
## 
## $repeats
## [1] 3
## 
## $search
## [1] "grid"
## 
## $p
## [1] 0.75
## 
## $initialWindow
## NULL
## 
## $horizon
## [1] 1
## 
## $fixedWindow
## [1] TRUE
## 
## $skip
## [1] 0
## 
## $verboseIter
## [1] FALSE
## 
## $returnData
## [1] TRUE
## 
## $returnResamp
## [1] "final"
## 
## $savePredictions
## [1] FALSE
## 
## $classProbs
## [1] FALSE
## 
## $summaryFunction
## function (data, lev = NULL, model = NULL) 
## {
##     if (is.character(data$obs)) 
##         data$obs <- factor(data$obs, levels = lev)
##     postResample(data[, "pred"], data[, "obs"])
## }
## <bytecode: 0x00000283e00695b0>
## <environment: namespace:caret>
## 
## $selectionFunction
## [1] "best"
## 
## $preProcOptions
## $preProcOptions$thresh
## [1] 0.95
## 
## $preProcOptions$ICAcomp
## [1] 3
## 
## $preProcOptions$k
## [1] 5
## 
## $preProcOptions$freqCut
## [1] 19
## 
## $preProcOptions$uniqueCut
## [1] 10
## 
## $preProcOptions$cutoff
## [1] 0.9
## 
## 
## $sampling
## NULL
## 
## $index
## NULL
## 
## $indexOut
## NULL
## 
## $indexFinal
## NULL
## 
## $timingSamps
## [1] 0
## 
## $predictionBounds
## [1] FALSE FALSE
## 
## $seeds
## [1] NA
## 
## $adaptive
## $adaptive$min
## [1] 5
## 
## $adaptive$alpha
## [1] 0.05
## 
## $adaptive$method
## [1] "gls"
## 
## $adaptive$complete
## [1] TRUE
## 
## 
## $trim
## [1] FALSE
## 
## $allowParallel
## [1] TRUE
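
The chunk that actually fits xgb_fit is not shown above; judging from the printed output below (78 samples, 5 predictors, repeated 10-fold cross-validation, and caret's default xgbTree tuning grid), it was presumably a call along these lines:

set.seed(123)                    # seed is an assumption; the original call is not shown
xgb_fit <- train(prestige ~ .,   # predict prestige from the remaining 5 variables
                 data      = train,
                 method    = "xgbTree",
                 trControl = train.control)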
xgb_fit
## eXtreme Gradient Boosting 
## 
## 78 samples
##  5 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times) 
## Summary of sample sizes: 70, 70, 70, 70, 70, 70, ... 
## Resampling results across tuning parameters:
## 
##   eta  max_depth  colsample_bytree  subsample  nrounds  RMSE      Rsquared 
##   0.3  1          0.6               0.50        50      7.752333  0.8370037
##   0.3  1          0.6               0.50       100      8.015271  0.8199954
##   0.3  1          0.6               0.50       150      8.187536  0.8123232
##   0.3  1          0.6               0.75        50      7.552155  0.8474650
##   0.3  1          0.6               0.75       100      7.700261  0.8395046
##   0.3  1          0.6               0.75       150      7.885341  0.8298171
##   0.3  1          0.6               1.00        50      7.562948  0.8416019
##   0.3  1          0.6               1.00       100      7.776726  0.8329751
##   0.3  1          0.6               1.00       150      7.988729  0.8235724
##   0.3  1          0.8               0.50        50      7.648696  0.8353407
##   0.3  1          0.8               0.50       100      7.957648  0.8210009
##   0.3  1          0.8               0.50       150      8.170910  0.8127447
##   0.3  1          0.8               0.75        50      7.557307  0.8415985
##   0.3  1          0.8               0.75       100      7.857606  0.8286172
##   0.3  1          0.8               0.75       150      8.189405  0.8137349
##   0.3  1          0.8               1.00        50      7.549337  0.8424114
##   0.3  1          0.8               1.00       100      7.878907  0.8285814
##   0.3  1          0.8               1.00       150      8.086710  0.8196157
##   0.3  2          0.6               0.50        50      7.869123  0.8345575
##   0.3  2          0.6               0.50       100      8.133372  0.8203450
##   0.3  2          0.6               0.50       150      8.257531  0.8146044
##   0.3  2          0.6               0.75        50      7.766373  0.8349972
##   0.3  2          0.6               0.75       100      8.082333  0.8214051
##   0.3  2          0.6               0.75       150      8.171637  0.8185383
##   0.3  2          0.6               1.00        50      7.972773  0.8286888
##   0.3  2          0.6               1.00       100      8.215664  0.8197436
##   0.3  2          0.6               1.00       150      8.316742  0.8157639
##   0.3  2          0.8               0.50        50      7.718295  0.8393723
##   0.3  2          0.8               0.50       100      7.875644  0.8293285
##   0.3  2          0.8               0.50       150      8.016627  0.8274856
##   0.3  2          0.8               0.75        50      7.586260  0.8407252
##   0.3  2          0.8               0.75       100      7.795302  0.8320091
##   0.3  2          0.8               0.75       150      7.932628  0.8258335
##   0.3  2          0.8               1.00        50      7.818485  0.8333674
##   0.3  2          0.8               1.00       100      8.072236  0.8209317
##   0.3  2          0.8               1.00       150      8.156141  0.8171898
##   0.3  3          0.6               0.50        50      7.720348  0.8379868
##   0.3  3          0.6               0.50       100      7.961974  0.8287066
##   0.3  3          0.6               0.50       150      8.025675  0.8257836
##   0.3  3          0.6               0.75        50      8.057500  0.8211377
##   0.3  3          0.6               0.75       100      8.185507  0.8144843
##   0.3  3          0.6               0.75       150      8.228133  0.8128665
##   0.3  3          0.6               1.00        50      7.800971  0.8312237
##   0.3  3          0.6               1.00       100      7.842201  0.8293366
##   0.3  3          0.6               1.00       150      7.858624  0.8283794
##   0.3  3          0.8               0.50        50      8.160567  0.8181445
##   0.3  3          0.8               0.50       100      8.300231  0.8138446
##   0.3  3          0.8               0.50       150      8.392770  0.8103206
##   0.3  3          0.8               0.75        50      7.625124  0.8439644
##   0.3  3          0.8               0.75       100      7.761710  0.8378224
##   0.3  3          0.8               0.75       150      7.770609  0.8375130
##   0.3  3          0.8               1.00        50      7.792008  0.8309176
##   0.3  3          0.8               1.00       100      7.858128  0.8286100
##   0.3  3          0.8               1.00       150      7.876888  0.8278376
##   0.4  1          0.6               0.50        50      7.949183  0.8362333
##   0.4  1          0.6               0.50       100      8.097466  0.8301564
##   0.4  1          0.6               0.50       150      8.436288  0.8178521
##   0.4  1          0.6               0.75        50      7.743761  0.8270844
##   0.4  1          0.6               0.75       100      7.884670  0.8203779
##   0.4  1          0.6               0.75       150      8.205818  0.8093017
##   0.4  1          0.6               1.00        50      8.057334  0.8232600
##   0.4  1          0.6               1.00       100      8.342742  0.8088843
##   0.4  1          0.6               1.00       150      8.543378  0.7998819
##   0.4  1          0.8               0.50        50      8.172878  0.8248439
##   0.4  1          0.8               0.50       100      8.273292  0.8158022
##   0.4  1          0.8               0.50       150      8.511259  0.8037771
##   0.4  1          0.8               0.75        50      7.796722  0.8284525
##   0.4  1          0.8               0.75       100      8.275050  0.8071278
##   0.4  1          0.8               0.75       150      8.500491  0.7966827
##   0.4  1          0.8               1.00        50      7.685630  0.8354141
##   0.4  1          0.8               1.00       100      7.968479  0.8239433
##   0.4  1          0.8               1.00       150      8.196255  0.8144187
##   0.4  2          0.6               0.50        50      8.428225  0.8030814
##   0.4  2          0.6               0.50       100      8.779375  0.7909881
##   0.4  2          0.6               0.50       150      8.953197  0.7795901
##   0.4  2          0.6               0.75        50      7.975749  0.8186646
##   0.4  2          0.6               0.75       100      8.108609  0.8123720
##   0.4  2          0.6               0.75       150      8.214081  0.8079324
##   0.4  2          0.6               1.00        50      7.961918  0.8184069
##   0.4  2          0.6               1.00       100      8.166591  0.8091396
##   0.4  2          0.6               1.00       150      8.225765  0.8056856
##   0.4  2          0.8               0.50        50      8.161019  0.8220064
##   0.4  2          0.8               0.50       100      8.429348  0.8084968
##   0.4  2          0.8               0.50       150      8.606653  0.8024146
##   0.4  2          0.8               0.75        50      7.936306  0.8202314
##   0.4  2          0.8               0.75       100      8.110384  0.8119163
##   0.4  2          0.8               0.75       150      8.173843  0.8096364
##   0.4  2          0.8               1.00        50      7.895045  0.8284976
##   0.4  2          0.8               1.00       100      8.062923  0.8226782
##   0.4  2          0.8               1.00       150      8.140647  0.8194252
##   0.4  3          0.6               0.50        50      7.821404  0.8377507
##   0.4  3          0.6               0.50       100      7.932114  0.8325645
##   0.4  3          0.6               0.50       150      8.076532  0.8254944
##   0.4  3          0.6               0.75        50      7.769051  0.8284678
##   0.4  3          0.6               0.75       100      7.872527  0.8239171
##   0.4  3          0.6               0.75       150      7.884213  0.8232936
##   0.4  3          0.6               1.00        50      7.799050  0.8263725
##   0.4  3          0.6               1.00       100      7.845549  0.8239333
##   0.4  3          0.6               1.00       150      7.852862  0.8235760
##   0.4  3          0.8               0.50        50      8.104270  0.8304077
##   0.4  3          0.8               0.50       100      8.205482  0.8252899
##   0.4  3          0.8               0.50       150      8.265548  0.8231424
##   0.4  3          0.8               0.75        50      7.957764  0.8248469
##   0.4  3          0.8               0.75       100      8.022601  0.8199958
##   0.4  3          0.8               0.75       150      7.996814  0.8215958
##   0.4  3          0.8               1.00        50      7.894760  0.8266699
##   0.4  3          0.8               1.00       100      7.918315  0.8260777
##   0.4  3          0.8               1.00       150      7.923373  0.8259441
##   MAE     
##   6.569946
##   6.699075
##   6.865963
##   6.461797
##   6.493482
##   6.565247
##   6.538796
##   6.585536
##   6.663360
##   6.572011
##   6.699939
##   6.845253
##   6.351256
##   6.501742
##   6.745407
##   6.549843
##   6.711613
##   6.817804
##   6.629649
##   6.750524
##   6.845219
##   6.537164
##   6.707817
##   6.722335
##   6.750328
##   6.845069
##   6.904649
##   6.410945
##   6.461442
##   6.553715
##   6.300494
##   6.423482
##   6.503863
##   6.547676
##   6.615540
##   6.635876
##   6.372196
##   6.515309
##   6.574282
##   6.588381
##   6.662916
##   6.682900
##   6.497673
##   6.468126
##   6.470875
##   6.725149
##   6.791477
##   6.890304
##   6.240829
##   6.301835
##   6.318129
##   6.414138
##   6.439506
##   6.449135
##   6.776529
##   6.889290
##   7.106158
##   6.520009
##   6.555447
##   6.742598
##   6.977977
##   7.103711
##   7.181817
##   6.827643
##   6.840188
##   6.966932
##   6.631681
##   6.980508
##   7.094701
##   6.521408
##   6.675830
##   6.775570
##   6.999318
##   7.242151
##   7.418802
##   6.629591
##   6.701313
##   6.830977
##   6.605572
##   6.679783
##   6.720112
##   6.716098
##   6.937560
##   7.083846
##   6.647584
##   6.802193
##   6.852267
##   6.408433
##   6.462858
##   6.505561
##   6.513028
##   6.597156
##   6.685818
##   6.412956
##   6.473415
##   6.489057
##   6.446347
##   6.439062
##   6.443325
##   6.730861
##   6.811883
##   6.887423
##   6.455723
##   6.498299
##   6.496352
##   6.471984
##   6.477547
##   6.480408
## 
## Tuning parameter 'gamma' was held constant at a value of 0
## Tuning
##  parameter 'min_child_weight' was held constant at a value of 1
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were nrounds = 50, max_depth = 1, eta
##  = 0.3, gamma = 0, colsample_bytree = 0.8, min_child_weight = 1 and subsample
##  = 1.

Let's calculate the MSE on the test data.

pred <- predict(xgb_fit, newdata = test[, -4])   # column 4 is prestige, the response
test_xgb_mse <- mean((pred - test[, 4])^2)
paste("MSE for XGBoost:", round(test_xgb_mse, 3))
## [1] "MSE for XGBoost: 472.222"

Recall that the MSE for Random Forest was 468.735 and the MSE for linear regression was 483.118, both obtained in the previous recitation. Thus, Random Forest outperforms XGBoost here. However, if we tune the hyperparameters, XGBoost may give better results than Random Forest, for example as sketched below.
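
A wider search can be run by handing train() an explicit tuneGrid instead of caret's default grid; the values below are purely illustrative, not tuned recommendations.

# Illustrative custom grid for the regression model (values are examples only)
xgb_grid <- expand.grid(nrounds          = c(50, 100, 200),
                        max_depth        = c(1, 2, 3),
                        eta              = c(0.05, 0.1, 0.3),
                        gamma            = 0,
                        colsample_bytree = c(0.6, 0.8),
                        min_child_weight = 1,
                        subsample        = c(0.75, 1))

set.seed(123)
xgb_fit_tuned <- train(prestige ~ ., data = train, method = "xgbTree",
                       trControl = train.control, tuneGrid = xgb_grid)

pred_tuned <- predict(xgb_fit_tuned, newdata = test)
mean((pred_tuned - test$prestige)^2)   # compare with the test MSE above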