The Close price was extracted as a numeric vector to serve as the target variable (Y). Separating the predictors
from the target gives the clear distinction between independent and dependent variables that a supervised
learning setup requires. The same procedure was applied to both the training and test sets so that the model is
evaluated on consistently structured data. This structure allows XGBoost to learn the relationship between past
values and the current stock price.
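The idea of predicting the current price from past values can be illustrated with a minimal base-R sketch. It assumes the predictors are simple lagged copies of the Close series; the helper name make_lag_data, the number of lags, and the toy prices are hypothetical and are not taken from this study, whose actual feature set may differ.

```r
# Hypothetical helper (not from the study): turn a numeric price series
# into a lagged feature matrix (X) and a target vector (Y).
# The number of lags is an assumption for illustration.
make_lag_data <- function(close, n_lags = 3) {
  stopifnot(is.numeric(close), length(close) > n_lags)
  # embed() returns a matrix whose first column is the value at time t
  # and whose k-th column is the value at time t - (k - 1).
  m <- embed(close, n_lags + 1)
  x <- m[, -1, drop = FALSE]
  colnames(x) <- paste0("lag_", seq_len(n_lags))
  list(x = x, y = m[, 1])
}

# Toy prices for illustration only (not the study's data)
close <- c(100, 101, 103, 102, 105, 107)
lagged <- make_lag_data(close, n_lags = 2)
print(lagged$x)  # predictors: previous one and two closing prices
print(lagged$y)  # target: current closing price
```

Each row pairs a closing price with the values that preceded it, so the first n_lags observations are lost because they have no complete history.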
Training XGBoost Model
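library(xgboost)

# Convert the feature matrices and targets into XGBoost's DMatrix format.
# dtest needs its labels so the watchlist can report test error.
dtrain <- xgb.DMatrix(data = x_train, label = y_train)
dtest <- xgb.DMatrix(data = x_test, label = y_test)

# Squared-error regression objective, learning rate, and maximum tree depth
params <- list(
  objective = "reg:squarederror",
  eta = 0.1,
  max_depth = 8
)

set.seed(3274)
# Train for up to 200 rounds; stop early if the error on the last
# watchlist entry (test) has not improved for 50 consecutive rounds.
model_xgb <- xgb.train(
  params = params,
  data = dtrain,
  nrounds = 200,
  watchlist = list(train = dtrain, test = dtest),
  early_stopping_rounds = 50,
  print_every_n = 10
)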
The training and test datasets were converted into XGBoost's DMatrix format, the library's optimized internal
data structure, to make training efficient. The training set (dtrain) contains the feature matrix x_train and the
target y_train, while the test set (dtest) contains x_test and y_test for model evaluation. The model parameters
specify a squared-error regression objective (reg:squarederror), a learning rate of eta = 0.1 that scales the
contribution of each new tree, and max_depth = 8, which caps the depth of each tree to limit model complexity
and reduce the risk of overfitting. The model was trained for up to 200 boosting rounds while training and test
performance were monitored through a watchlist. Early stopping with a patience of 50 rounds halts training
once the test error has not improved for 50 consecutive rounds, avoiding unnecessary iterations. A fixed seed
was set to make the results reproducible.
Evaluating XGBoost Model Performance
Table 15
Model Metrics    Values
RMSE             4.22
MAPE             22.92
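The XGBoost model achieved an RMSE of 4.22 and a MAPE of 22.92 (Table 15). RMSE measures the typical size
of prediction errors in the units of the closing price, while MAPE expresses the average absolute error as a
percentage of the actual values. Unlike traditional statistical models, XGBoost does not admit AIC or BIC,
because it is an ensemble of decision trees rather than a parametric model fitted by maximum likelihood; there
is no likelihood function or well-defined parameter count to plug into these criteria. Its performance is
therefore evaluated with predictive accuracy metrics rather than goodness-of-fit criteria.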
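For reference, RMSE is the square root of the mean squared difference between actual and predicted values, and MAPE is the mean of |actual - predicted| / |actual|, commonly multiplied by 100 to give a percentage. The base-R sketch below implements both definitions; the function names and toy values are hypothetical, and it is an assumption that the figures in Table 15 were computed exactly this way (some sources report MAPE as a fraction rather than a percentage).

```r
# Root mean squared error: same units as the target (price units here).
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Mean absolute percentage error, expressed in percent.
# Undefined when any actual value is zero, so guard against that.
mape <- function(actual, predicted) {
  stopifnot(all(actual != 0))
  100 * mean(abs((actual - predicted) / actual))
}

# Toy values for illustration only (not the study's data)
actual    <- c(100, 200, 400)
predicted <- c(110, 190, 400)
print(rmse(actual, predicted))  # sqrt((100 + 100 + 0) / 3)
print(mape(actual, predicted))  # 100 * mean(c(0.10, 0.05, 0))
```

In the setting of this section, test-set predictions could be obtained with pred <- predict(model_xgb, dtest) and passed to these functions together with y_test.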
Compared with the SARIMA model, which achieved a lower MAPE, the results suggest that SARIMA provides
more accurate forecasts for this dataset. A likely reason is that SARIMA explicitly models time series structure
such as autocorrelation and seasonality. Overall, while XGBoost offers flexibility, SARIMA demonstrates
superior predictive performance in this study.
Forecasting
In this section, both the SARIMA and XGBoost models are used to perform out-of-sample forecasting on the test
dataset to evaluate their predictive performance on unseen data. Although the SARIMA model demonstrated