Forecasting Closing Stock Prices from 1996 to 2020  
Forecasting Closing Stock Prices using  
ARIMA, SARIMA, and XGBoost  
Bautista, Benedict D.  
Table of contents  
2
2
2
2
2
3
3
Loading Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Loading the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Filtering Variables of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Splitting into Train-Test sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Plotting Train-Test sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
4
4
5
6
7
8
Monthly Trend of Closing Stock Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Seasonality of Closing Stock Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Lag Plot for Closing Stock Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Plotting Decomposed Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
8
9
9
ARIMA and SARIMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  
Partial / Autocorrelation Plot of Closing Stock Prices . . . . . . . . . . . . . . . . . . . . . . .  
Evaluating Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10  
Partial / Autocorrelation Plot of Differenced Values . . . . . . . . . . . . . . . . . . . . . . . 10  
Plotting Close and Differenced Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12  
Training the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13  
Evaluating Time Series Model Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13  
Residuals of the Best Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14  
Ljung-Box Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14  
Fitting XGBoost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15  
Creating Lags as Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15  
Splitting Train and Test set for XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15  
Plotting Response Variable for XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15  
Separating Features and Response Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16  
Training XGBoost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17  
Evaluating XGBoost Model Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17  
17  
Forecasting using SARIMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18  
Forecasting using XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19  
20  
1
Introduction  
The dataset provided by Devarash Raval on kaggle (you can click this for the dataset) , it was created to predict  
market recession inspired by assignment notebook in an online course. However, in this paper the dataset  
will be utilized to serve as the dataset for time series modelling using Seasonal Autoregressive Intergrated  
Moving Average (SARIMA) Model, Autoregressive Intergrated Moving Average (ARIMA) Model, and eXtreme  
Gradient Boosting (XGBoost) Model to predict future Closing stock prices, performing in-sample forecasting  
and ex-ante forecasting. Furthermore, these models will be compared by its predictive power using several  
metrics that will be specified later on this paper.  
Data Pre-processing  
Loading Libraries  
library(fpp3)  
library(here)  
library(tidyverse)  
library(xgboost)  
library(pander) # for quarto purposes only  
library(patchwork) # for data viz arrangements  
library(knitr) # for tidy tables  
Loading the Data  
Table 1  
Preview of the data  
Date  
Open  
High  
Low  
Close  
Adj Close  
Volume  
1996-01-01  
1996-02-01  
1996-03-01  
1996-04-01  
1996-05-01  
1996-06-01  
1996-07-01  
1996-08-01  
26.62  
29.88  
30.25  
30.25  
30.62  
31.62  
27.25  
27.75  
30.12  
32.88  
32  
32.12  
32.12  
31.75  
29.25  
29.5  
26.62  
29.88  
28.25  
29.38  
29.75  
26.88  
24.88  
26.5  
29.38  
30.25  
30.38  
30.62  
31.5  
27.12  
27.88  
27  
20.75  
21.37  
21.46  
21.64  
22.26  
19.21  
19.74  
19.12  
41074000  
46490100  
36573500  
34821300  
27064600  
33389300  
26284100  
20525500  
Filtering Variables of Interest  
Table 2  
Preview of aggregated dataset  
Date  
Close  
1996 Jan  
1996 Feb  
1996 Mar  
1996 Apr  
1996 May  
1996 Jun  
1996 Jul  
29.38  
30.25  
30.38  
30.62  
31.5  
27.12  
27.88  
27  
1996 Aug  
2
         
Splitting into Train-Test sets  
Table 3  
Preview of training dataset  
Date  
Close  
1996 Jan  
1996 Feb  
1996 Mar  
1996 Apr  
1996 May  
1996 Jun  
1996 Jul  
29.38  
30.25  
30.38  
30.62  
31.5  
27.12  
27.88  
27  
1996 Aug  
Table 4  
Preview of testing dataset  
Date  
Close  
2015 Jan  
2015 Feb  
2015 Mar  
2015 Apr  
2015 May  
2015 Jun  
2015 Jul  
12.78  
13.02  
10.96  
13.02  
11.86  
10.66  
7.06  
2015 Aug  
6.95  
Plotting Train-Test sets  
Figure 1  
Splitting Train-Test sets  
50  
40  
30  
20  
10  
2000 Jan  
2010 Jan  
2020 Jan  
Date [1M]  
3
     
Both the training and test data sets were graphically represented and are displayed in Figure 1 above,  
demonstrating how the data were divided into two sets for the purpose of building the prediction model.  
The training set, which consisted of the past data records between January 1996 and December 2014, was  
used for the construction of the prediction model, while the test data set, which consisted of the most recent  
data values between January 2015 and November 2020, was used to evaluate the accuracy of predictions.  
Exploratory Data Analysis  
This section aims to examine the underlying structure of the dataset through a comprehensive exploration  
of its key characteristics and visual representations. By systematically analyzing the behavior of the time  
series, this part seeks to provide a clearer understanding of its overall pattern and the range within which  
the data is expected to vary. To achieve this, several analytical techniques will be utilized, including time  
series decomposition to separate trend, seasonality, and irregular components, as well as seasonality checks  
to identify recurring patterns over time. In addition, lag plots will be used to assess potential relationships  
between observations at different time intervals, while Autocorrelation Function (ACF) and Partial Autocor-  
relation Function (PACF) plots will help in identifying the presence of correlation structures within the series.  
Through these methods, the section intends to uncover meaningful insights into the data’s behavior, which  
will serve as a foundation for further modeling and analysis.  
Monthly Trend of Closing Stock Prices  
Figure 2  
Graph of Closing Stock Prices  
50  
40  
30  
20  
10  
2000 Jan  
2010 Jan  
Date [1M]  
2020 Jan  
Figure 2 illustrates the monthly trend of closing stock prices spanning from 1996 to 2020. The data indicate  
a steady upward trend from 1996 until 2010. This is followed by a noticeable decline between 2011 and  
the latter part of 2015, which may be attributed to economic issues during that period. Furthermore, from  
2016 to 2020, the series demonstrates a recovery phase, with closing stock prices starting to have an upward  
trend again.  
4
     
Seasonality of Closing Stock Prices  
Figure 3  
Seasonality Plot for Closing Stock Prices  
2010  
2011  
50  
40  
30  
20  
10  
2007  
2009  
2008  
2012  
2006  
1996  
2005  
2020  
2004  
2003  
2014  
2015  
Jan  
Feb Mar  
Apr  
Apr  
May  
May  
Jun  
Jul  
Jul  
Aug  
Aug  
Sept  
Sept  
Oct  
Oct  
Nov  
Nov  
Dec  
Date  
Jan  
Feb  
Mar  
Jun  
Dec  
50  
40  
30  
20  
10  
Date  
From Figure 3 a seasonality plot grouped by year, with the months shown along the x-axis. A closer examina-  
tion reveals no consistent or repeating pattern across different years, indicating that strong seasonality is  
not present in the data. Although the subseries plot shows some limited seasonal effects may appear within  
specific ranges, these are not sustained throughout the series. Overall, the data exhibit noticeable trends  
rather than clear seasonal patterns.  
5
   
Lag Plot for Closing Stock Prices  
Figure 4  
Lag Plot (k = 12)  
lag 1  
lag 2  
lag 3  
lag 4  
50  
40  
30  
20  
10  
season  
Jan  
Feb  
Mar  
Apr  
May  
Jun  
Jul  
lag 5  
lag 6  
lag 7  
lag 8  
50  
40  
30  
20  
10  
Aug  
Sept  
Oct  
Nov  
Dec  
lag 9  
lag 10  
lag 11  
lag 12  
50  
40  
30  
20  
10  
10 20 30 40 50  
10 20 30 40 50  
10 20 30 40 50  
10 20 30 40 50  
lag(Close, n)  
The twelve lag plots presented in Figure 4 exhibit a strong linear relationship across all twelve lags. This  
pattern typically indicates a high degree of autocorrelation, suggesting that the closing stock prices are  
influenced by their previous values. It is evident that from lag 1 through lag 12, a linear trend remains observ-  
able; however, the pattern becomes slightly more dispersed as the lag increases. This gradual weakening of  
linearity implies that the predictive influence of past closing prices slightly declines as the time gap becomes  
larger.  
6
   
Decomposition  
Table 5  
Decomposition Using STL  
.model  
Date  
Close  
trend  
season_year  
remainder  
season_adjust  
stl  
stl  
stl  
stl  
stl  
stl  
stl  
stl  
1996 Jan  
1996 Feb  
1996 Mar  
1996 Apr  
1996 May  
1996 Jun  
1996 Jul  
29.38  
30.25  
30.38  
30.62  
31.5  
27.12  
27.88  
27  
30.42  
30.09  
29.75  
29.42  
29.09  
28.75  
28.41  
28.05  
-0.2178  
0.2521  
-0.2142  
0.7301  
1.035  
-0.05651  
-0.8301  
-0.9024  
-0.8257  
-0.08838  
0.8351  
0.4731  
1.379  
-1.568  
0.2922  
-0.1452  
29.59  
30  
30.59  
29.89  
30.46  
27.18  
28.71  
27.9  
1996 Aug  
Figure 5  
Decomposed Closing Stock Prices  
STL decomposition  
Close = trend + season_year + remainder  
50  
40  
30  
20  
10  
50  
40  
30  
20  
10  
2
1
0
−1  
−2  
10  
5
0
−5  
−10  
2000 Jan  
2010 Jan  
Date  
2020 Jan  
Figure 5 presents the decomposed time series of the closing stock prices. Decomposition was first employed  
to examine the behavior of the series by isolating its individual components. Upon decomposition, it becomes  
evident that the seasonal component is not constant over time, as it varies from year to year. This is par-  
ticularly noticeable when the trend and remainder components are removed, revealing fluctuations in the  
seasonal pattern. In contrast, when the seasonal and remainder components are removed, the underlying  
trend and cyclical movements of the data become more apparent. Given this non-constant seasonality, STL  
decomposition is therefore the more appropriate method for analyzing the time series.  
7
   
Plotting Decomposed Data  
Figure 6  
Line Graphs of each Component  
Trend VS Actual  
50  
Seasonally Adjusted VS Actual  
50  
40  
30  
20  
10  
40  
30  
20  
10  
2000 Jan  
2010 Jan  
Date [1M]  
2020 Jan  
2000 Jan  
2010 Jan  
2020 Jan  
Date [1M]  
Decomposed Components  
50  
40  
30  
20  
10  
2000 Jan  
2010 Jan  
Date [1M]  
2020 Jan  
In Figure 6, a comparison between the seasonally trend adjusted data vs the actual series, seasonally adjusted  
data vs the actual series reveals that once periodic effects are removed, the price action aligns closely with  
the smoothed trend line. This high degree of alignment confirms that the primary driver of the asset’s value  
is its long-term growth trajectory rather than cyclical seasonal variations.  
Model Fitting  
This section is structured into two main components: the application of traditional Seasonal Autoregressive  
Integrated Moving Average (S/ARIMA) models and the implementation of a machine learning approach,  
specifically eXtreme Gradient Boosting (XGBoost), for forecasting. Each modeling approach will be sys-  
tematically developed and assessed to compare their performance in capturing the dynamics of the time  
series.  
8
     
To evaluate the effectiveness of the models, several performance metrics will be employed, including Mean  
Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) for measuring forecast accuracy,  
as well as Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for model selection  
and goodness-of-fit. The dataset will be divided into training and testing subsets, where the training set  
will be used to fit and calibrate the models, while the testing set will be utilized to generate forecasts and  
assess predictive performance through out-of-sample evaluation. This approach ensures a more reliable  
comparison of the models’ generalization capabilities in forecasting unseen data.  
ARIMA and SARIMA  
Partial / Autocorrelation Plot of Closing Stock Prices  
Figure 7  
PACF and ACF Plot  
1.00  
0.75  
0.50  
0.25  
0.00  
0
6
12  
18  
24  
lag [1M]  
1.00  
0.75  
0.50  
0.25  
0.00  
0
6
12  
18  
24  
lag [1M]  
The Autocorrelation Function (ACF) plot presented in Figure 7 displays significant spikes that follow a gradual,  
decaying pattern. The absence of a “scalloping” or wave-like structure further confirms that the series lacks a  
distinct seasonal component. This persistent, slow decay is a primary diagnostic indicator of non-stationarity,  
reflecting a strong trend component within the closing stock prices. Conversely the Partial Autocorrelation  
Function (PACF) plot presented a significant spike at lag 1 and lag 12 and nothing further beyond lag 12.  
This could indicate that there is a relationship between the present closing stock prices to the value of the  
9
     
prices 12 months ago. Collectively, these plots confirm that the series is non-stationary and requires further  
differencing to achieve stationarity before modelling.  
Evaluating Stationarity  
Table 6  
Unitroot KPSS Test  
.model  
kpss_stat  
kpss_pvalue  
arima  
1.676  
1.676  
1.676  
1.676  
0.01  
0.01  
0.01  
0.01  
sarima1  
sarima2  
sarima3  
KPSS test was employed to evaluate the stationarity of the time series. As it turns out, the series is not  
stationary for the four time series models. This statistic affirms the presence of significant trend component.  
Partial / Autocorrelation Plot of Differenced Values  
10  
   
Figure 8  
Differencing: PACF and ACF Plot  
0.1  
0.0  
−0.1  
0
6
12  
18  
24  
lag [1M]  
0.1  
0.0  
−0.1  
0
6
12  
18  
24  
lag [1M]  
11  
 
The ACF plot from Figure 7 of the original series exhibits a slow and gradual decay, with significant correlations  
across many lags, indicating the presence of non-stationarity and a strong trend component. Similarly,  
the PACF shows a dominant spike at lag 1, which is typical of non-stationary behavior rather than a true  
autoregressive structure. After applying first-order differencing, the ACF and PACF plots from Figure 8 show a  
substantial change, with most autocorrelations falling within the confidence bounds and no evident long-term  
persistence. This suggests that the differenced series is stationary, making it suitable for ARIMA modeling.  
The absence of a clear cutoff in the PACF and only minor spikes in the ACF indicate that low-order AR or MA  
components may be sufficient to capture the remaining dependence structure.  
Plotting Close and Differenced Values  
Figure 9  
Comparing Differenced Values and Original Series  
50  
40  
30  
20  
10  
2000 Jan  
2010 Jan  
Date [1M]  
2020 Jan  
10  
5
0
−5  
−10  
−15  
2000 Jan  
2010 Jan  
Date [1M]  
2020 Jan  
Upon differencing, as shown in Figure 9, the most immediate observation is the stabilization of the variance  
across time. In contrast to the original series, which exhibits pronounced upward and downward trends,  
the differenced series fluctuates around a relatively constant mean close to zero. This indicates that the  
trend component present in the original data has been effectively removed through first-order differencing.  
Additionally, the original time series displays long-term movements, specifically during those interval with  
sharp decline and rapid increase suggesting non-stationary behavior.  
12  
   
Training the Model  
Table 7  
Fitted Models  
Model  
Order  
sarima1  
sarima2  
sarima3  
arima  
<ARIMA(0,1,1)(0,1,0)[12]>  
<ARIMA(0,0,1)(0,1,1)[12]>  
<ARIMA(0,1,1)(0,1,1)[12]>  
<ARIMA(0,1,2)>  
The model selection process involved fitting multiple candidate models to the training dataset in order to  
identify the most parsimonious yet accurate representation of the time series. As presented in Table 6, four  
models were considered, including a seasonal autoregressive integrated moving average (SARIMA) model  
with drift, specified as ARIMA(0,1,1)(0,1,0)[12], two other SARIMA variants, and a baseline non-seasonal  
ARIMA(0,1,2) model. Model evaluation was primarily based on the Akaike Information Criterion (AIC),  
corrected Akaike Information Criterion (AICc), and Bayesian Information Criterion (BIC), where lower values  
indicate a better balance between goodness-of-fit and model complexity.  
Evaluating Time Series Model Performance  
Table 8  
Model Metrics  
Model  
Variance  
AIC  
BIC  
MAPE  
RMSE  
sarima1  
sarima2  
sarima3  
arima  
18.64  
34.98  
9.029  
8.923  
1248  
1392  
1128  
1150  
1255  
1402  
1138  
1160  
11.71  
15.77  
7.923  
8.255  
4.183  
5.731  
2.905  
2.968  
As shown in Table 8, among the four fitted SARIMA models, SARIMA 3 exhibited the best overall performance  
based on both information criteria and forecast accuracy measures. It obtained the lowest Akaike Information  
Criterion (AIC = 1128) and Bayesian Information Criterion (BIC = 1138), indicating a comparatively better  
balance between model fit and complexity. These criteria assess how well each model fits the training  
data while penalizing unnecessary complexity. In addition, SARIMA 3 also yielded the lowest forecasting  
error, with a mean absolute percentage error (MAPE) of 7.923% and a root mean square error (RMSE) of  
2.905. These results suggest that SARIMA 3 not only provides the best fit to the training dataset but also  
demonstrates strong predictive performance for future values.  
13  
   
Residuals of the Best Model  
Figure 10  
Residual Plots for SARIMA 3  
10  
5
0
−5  
−10  
2000 Jan  
2005 Jan  
Date  
2010 Jan  
2015 Jan  
60  
40  
20  
0
0.2  
0.1  
0.0  
−0.1  
0
6
12  
18  
24  
−15  
−10  
−5  
0
5
10  
lag [1M]  
.resid  
Figure 10 presents the residual diagnostics of the selected SARIMA(3) model. The residuals are approxi-  
mately centered around zero with relatively constant variance, and the histogram suggests approximate  
normality. However, the autocorrelation function (ACF) of the residuals shows a significant spike at lag 12,  
indicating remaining seasonal dependence not fully captured by the model. This suggests that while the  
model adequately describes the general structure of the series, some seasonal dynamics remain unmodeled,  
implying that further refinement of the seasonal component may improve model performance.  
Ljung-Box Test  
Table 9  
Ljung-Box Test for SARIMA 3 Model  
.model  
lb_stat  
lb_pvalue  
arima  
8.094  
11.05  
430.5  
13.78  
0.6196  
0.3538  
0
sarima1  
sarima2  
sarima3  
0.1834  
The Ljung–Box test was conducted to further assess the presence of autocorrelation in the residuals, as  
suggested by the previous diagnostic plot. As shown in Table 9, the SARIMA 3 model yielded a test statistic  
of 13.777 with a p-value of 0.183. Since the p-value exceeds the 0.05 significance level, there is insufficient  
evidence to reject the null hypothesis of no autocorrelation. This indicates that the residuals do not exhibit  
significant autocorrelation, thereby supporting the adequacy of the model and contradicting the initial  
visual suggestion of autocorrelation observed in Figure 10. Therefore we do not reject the null hypothesis,  
and therefore say that the claim of autocorrelated residuals do not have sufficient evidence to prove its  
significance.  
14  
     
Fitting XGBoost Model  
Creating Lags as Features  
Table 10  
Preview of Lags Created for XGBoost  
Date  
Close  
Lag1  
Lag3  
Lag6  
Lag12  
1997 Jan  
1997 Feb  
1997 Mar  
1997 Apr  
1997 May  
1997 Jun  
1997 Jul  
26.88  
28.25  
23.75  
22.38  
25.25  
21.88  
22.81  
22.75  
28.75  
26.88  
28.25  
23.75  
22.38  
25.25  
21.88  
22.81  
26.12  
30  
27.88  
27  
25.12  
26.12  
30  
28.75  
26.88  
28.25  
29.38  
30.25  
30.38  
30.62  
31.5  
27.12  
27.88  
27  
28.75  
26.88  
28.25  
23.75  
22.38  
25.25  
1997 Aug  
In Table 10, lagged variables were constructed and used as input features for the XGBoost model to incorporate  
temporal information from the time series data. Since XGBoost does not inherently account for sequential  
order, these lag features transform the dataset into a supervised learning framework. Lag 1 represents the  
previous month’s closing stock price and captures short-term price movements. Lag 3 reflects the previous  
quarter’s closing price, allowing the model to learn medium-term trends. Lag 6 corresponds to the previous  
semester’s closing price, while lag 12 represents the previous year’s closing price to capture longer-term  
and seasonal patterns. By including these lagged features, the model is able to learn relationships across  
different time scales, improving its ability to predict future stock prices.  
Splitting Train and Test set for XGBoost  
Table 11  
Preview of Train Dataset  
Date  
Close  
Lag1  
Lag3  
Lag6  
Lag12  
1997 Jan  
1997 Feb  
1997 Mar  
1997 Apr  
1997 May  
26.88  
28.25  
23.75  
22.38  
25.25  
28.75  
26.88  
28.25  
23.75  
22.38  
26.12  
30  
28.75  
26.88  
28.25  
27.88  
27  
25.12  
26.12  
30  
29.38  
30.25  
30.38  
30.62  
31.5  
Table 12  
Preview of Test Dataset  
Date  
Close  
Lag1  
Lag3  
Lag6  
Lag12  
2015 Jan  
2015 Feb  
2015 Mar  
2015 Apr  
2015 May  
12.78  
13.02  
10.96  
13.02  
11.86  
10.75  
12.78  
13.02  
10.96  
13.02  
11.87  
11.89  
10.75  
12.78  
13.02  
18.08  
18.39  
14.66  
11.87  
11.89  
19.28  
20.38  
17.83  
17.47  
16.11  
Plotting Response Variable for XGBoost  
Both the training and test datasets are visually presented in Figure 11, illustrating how the data were  
partitioned into two distinct subsets for model development and evaluation. The training set includes  
historical observations from January 1996 to December 2014 and was used to build the prediction model. In  
contrast, the test set comprises the most recent observations from January 2015 to November 2020. This  
subset was reserved to assess the model’s predictive performance on unseen data. The split ensures that the  
model is trained on past information while its accuracy is validated using more recent values.  
15  
       
Figure 11  
Splitting Train-Test for XGBoost  
50  
40  
30  
20  
10  
2000 Jan  
2005 Jan  
2010 Jan  
Date [1M]  
2015 Jan  
2020 Jan  
Separating Features and Response Variable  
Table 13  
Preview of Features Matrix  
Lag1  
Lag3  
Lag6  
Lag12  
Date  
28.75  
26.88  
28.25  
23.75  
22.38  
25.25  
21.88  
22.81  
26.12  
30  
27.88  
27  
25.12  
26.12  
30  
28.75  
26.88  
28.25  
29.38  
30.25  
30.38  
30.62  
31.5  
27.12  
27.88  
27  
324  
325  
326  
327  
328  
329  
330  
331  
28.75  
26.88  
28.25  
23.75  
22.38  
25.25  
Table 14  
Preview of Response Variable  
26.88  
28.25  
23.75  
22.38  
25.25  
21.88  
22.81  
22.75  
In the tables above, the dataset was separated into feature matrices and a response variable to meet the  
input requirements of the XGBoost algorithm. The predictor variables (Lag1, Lag3, Lag6, and Lag12) were  
converted into a numeric matrix to serve as the feature set (X) for both training and testing. Meanwhile, the  
16  
   
Close price was extracted as a numeric vector to act as the target variable (Y). This separation ensures a clear  
distinction between independent variables and the dependent variable in a supervised learning setup. The  
same process was applied to both training and test sets to maintain consistency in model evaluation. Overall,  
this structure allows XGBoost to learn the relationship between past values and the current stock price.  
Training XGBoost Model  
dtrain <- xgb.DMatrix(data = x_train, label = y_train)  
dtest <- xgb.DMatrix(data = x_test)  
params <- list(  
objective  
eta  
= "reg:squarederror",  
= 0.1,  
= 8  
max_depth  
)
set.seed(3274)  
model_xgb <- xgb.train(  
params  
data  
= params,  
= dtrain,  
nrounds  
watchlist  
= 200,  
= list(train = dtrain, test = dtest),  
early_stopping_rounds = 50,  
print_every_n  
= 10)  
The training and test datasets were converted into XGBoost’s DMatrix format to optimize computation and  
model efficiency. The training set (dtrain) contains the feature matrix x_train and target y_train, while the  
test set (dtest) contains x_test and y_test for model evaluation. Model parameters were specified using a  
regression objective (reg:squarederror), with a learning rate (eta = 0.1) controlling update magnitude and  
max_depth = 8 limiting tree complexity to reduce overfitting. The model was trained over 200 boosting  
rounds while monitoring both training and test performance through a watchlist. Early stopping was applied  
with a tolerance of 50 rounds to prevent unnecessary training once performance stopped improving. A fixed  
seed was used to ensure reproducibility of results.  
Evaluating XGBoost Model Performance  
Table 15  
Model Metrics  
Values  
RMSE  
MAPE  
4.22  
22.92  
The model diagnostics for XGBoost yielded an RMSE of 4.22 and a MAPE of 22.92, indicating the average  
magnitude of prediction error. Unlike traditional statistical models, AIC and BIC cannot be computed for  
XGBoost because it is not based on a likelihood or link function but on an ensemble of decision trees. As such,  
its performance is evaluated using predictive accuracy metrics rather than goodness-of-fit criteria.  
When compared to the SARIMA model, which achieved a lower MAPE, the results suggest that SARIMA  
provides more accurate forecasts for the dataset. This indicates that the time series structure, such as  
autocorrelation and seasonality, is better captured by SARIMA. Overall, while XGBoost offers flexibility,  
SARIMA demonstrates superior predictive performance in this study.  
Forecasting  
In this section, both the SARIMA and XGBoost models are used to perform out-of-sample forecasting on the test  
dataset to evaluate their predictive performance on unseen data. Although the SARIMA model demonstrated  
17  
     
better accuracy based on the evaluation metrics, the XGBoost model is still included for comparative analysis.  
This allows for a more comprehensive assessment of both statistical and machine learning approaches within  
the study. Including both models also enhances reproducibility by showing how different methods perform  
under the same conditions. Furthermore, it supports the overall narrative by highlighting the strengths and  
limitations of each approach. SARIMA effectively captures the inherent time series structure, while XGBoost  
relies on feature-based learning. Presenting both results provides a balanced comparison of methodologies.  
Ultimately, this approach strengthens the validity and depth of the analysis.  
Forecasting using SARIMA  
Figure 12  
SARIMA Forecasted Values  
30  
20  
10  
2016 Jan  
2018 Jan  
2020 Jan  
2022 Jan  
Date  
The out-of-sample forecast results presented in Figure 12 a indicate a clear upward movement in the predicted  
values, accompanied by a pronounced seasonal pattern starting from every first month of the year. This  
suggests that the SARIMA model successfully captured both the underlying trend and the recurring seasonal  
fluctuations present in the historical time series. Based on the estimated parameters from the observed data,  
the forecasted values exhibit a gradual increase from approximately 26 to 30 units over the forecast horizon.  
This upward trajectory implies a sustained growth pattern rather than random variation, reinforcing the  
presence of a systematic trend component in the series. The seasonal behavior further indicates that the  
data follows a consistent repeating cycle over time. Overall, the forecast reflects both trend and seasonality,  
suggesting that future values are expected to continue increasing within a stable seasonal structure.  
18  
   
Forecasting using XGBoost  
Figure 13  
XGBoost Forecasted Values  
30  
25  
20  
15  
10  
2016  
2018  
2020  
2022  
Index  
The out-of-sample forecast presented in Figure 13 indicates a generally stable downward movement in the  
predicted closing stock prices generated by the XGBoost model. The forecasted values show a gradual decline,  
ranging approximately from 27 to 24 units, suggesting a slight weakening trend over the forecast horizon.  
Unlike the SARIMA results, the XGBoost forecast does not exhibit a clear seasonal pattern, implying that  
the model did not capture or was not explicitly provided with strong seasonal structure in the data. After  
the initial decline, the values appear to stabilize around 25 units beyond 2022, indicating that the model  
converges toward a relatively steady long-term level. This behavior reflects the model’s reliance on lag-based  
learning rather than explicit decomposition of trend and seasonality. Overall, the forecast suggests a mild  
downward adjustment followed by stabilization in future stock prices.  
19  
   
Codes Used  
library(fpp3)  
library(here)  
library(tidyverse)  
library(xgboost)  
data <- read_csv(here("GOLD.csv"))  
Close <- data %>%  
select(Date, Close)  
Close <- Close %>%  
slice(1:298) %>%  
bind_rows(  
Close %>%  
slice(299:300) %>%  
summarise(  
Date = min(Date),  
Close = mean(Close, na.rm = TRUE)  
)
)
Close <- Close %>%  
mutate(Date = yearmonth(Date)) %>%  
as_tsibble(index = Date)  
Close %>% autoplot() ## plotting the whole dataset  
Close  
data_train <- Close %>%  
filter_index("1996 Jan" ~ "2015 Jan")  
data_test <- Close %>%  
filter_index("2015 Jan" ~ "2020 Jan")  
# eda  
Close %>% gg_season(labels = "right", labels_repel = T)  
Close %>% gg_subseries()  
Close %>% gg_lag(geom = "point", lag = c(1:12))  
decomp <- Close %>%  
model(  
stl = STL(Close)  
)
components(decomp) %>% autoplot()  
components(decomp) %>%  
as_tsibble() %>%  
autoplot((Close), colour = "gray") +  
geom_line(aes(y = trend), colour = "#E63946") #trend vs actual  
components(decomp) %>%  
as_tsibble() %>%  
autoplot((Close), colour = "gray") +  
geom_line(aes(y = season_adjust), colour = "#2A9D8F") ## season_adj vs actual  
data_model %>%  
ACF(Close) %>%  
20  
 
autoplot()  
Close %>%  
PACF(Close) %>%  
autoplot()  
# modelling  
data_test %>% autoplot() +  
geom_line(data = data_train,  
aes(x =Date, y = Close),  
colour = "darkgreen")  
data_train1 <- data_train %>%  
mutate(Close = log(Close))  
model_train <- data_train %>%  
model(  
sarima1 = ARIMA(Close ~ pdq(0,1,0) + PDQ(0,0,1)),  
sarima2 = ARIMA(Close ~ pdq(0,1,1) + PDQ(0,0,1)),  
sarima3 = ARIMA(Close ~ pdq(0,1,1) + PDQ(0,1,1)),  
arima = ARIMA(Close)  
)
glance(model_train) %>% select(.model, sigma2, AIC, BIC)  
resids <- model_train %>% augment()  
resids %>% features(Close, unitroot_kpss)  
resids %>% features(.innov, ljung_box, lag = 12)  
fc <- model_train %>%  
forecast(h = 60)  
fit1 <- model_train %>% pivot_longer(  
everything(),  
names_to = "Model",  
values_to = "Order")  
fit1  
glance(model_train) %>%  
arrange(AICc) %>%  
select(.model:BIC)  
accuracy(fc, data_test[c(1,2)])  
model_train %>% report() # sarima3 for insample sarima1 for out of sample  
# sarima1 and sarima3 based on evaluation  
model_train %>% select(sarima1) %>% report()  
model_train %>% select(sarima1) %>% gg_tsresiduals()  
model_train %>% select(sarima3) %>% report()  
model_train %>% select(sarima3) %>% gg_tsresiduals()  
# forecasting  
# insample  
model_train %>%  
forecast(h = 60) %>%  
filter(.model == "sarima3") %>%  
autoplot(data_train, level = c(90, 95)) +  
geom_line(data = data_test, aes(x = Date, y = Close))  
21  
# out of sample forecast  
model_train %>%  
refit(data_test) %>%  
forecast(h = 24) %>%  
filter(.model == "sarima3") %>%  
autoplot(data_test, level = c(90, 95))  
# ex-ante forecasting  
model_train %>%  
refit(Close) %>%  
forecast(h = 24) %>%  
filter(.model == "sarima1") %>%  
autoplot(Close, level = c(90, 95))  
model_train %>%  
refit(Close) %>%  
forecast(h = 24) %>%  
filter(.model == "sarima3") %>%  
autoplot(Close, level = c(90, 95))  
data_XGB <- Close %>%  
mutate(  
Lag1 = lag(Close, 1),  
Lag3 = lag(Close, 3),  
Lag6 = lag(Close, 6),  
Lag12 = lag(Close, 12)  
) %>% drop_na()  
data_XGB %>%  
head(n = 8) %>%  
pander(caption = "Preview of Lags Created for XGBoost")  
data_XGB <- Close %>%  
mutate(  
Lag1 = lag(Close, 1),  
Lag3 = lag(Close, 3),  
Lag6 = lag(Close, 6),  
Lag12 = lag(Close, 12)  
) %>% drop_na()  
data_XGB %>%  
head(n = 8) %>%  
pander(caption = "Preview of Lags Created for XGBoost")  
n_test <- nrow(data_test)  
train_xgb <- data_XGB %>% head(-n_test)  
test_xgb <- data_XGB %>% tail(n_test)  
train_xgb %>%  
head(n = 5) %>%  
pander(caption = "Preview of Train Dataset")  
test_xgb %>%  
head(n = 5) %>%  
pander(caption = "Preview of Test Dataset")  
data_XGB %>%  
autoplot(Close) +  
geom_line(data = test_xgb, aes(y = Close), colour = "red")  
22  
x_train <- train_xgb %>%  
select(Lag1, Lag3, Lag6, Lag12) %>%  
mutate(across(everything(), as.numeric)) %>%  
as.matrix()  
y_train <- train_xgb %>%  
pull(Close) %>%  
as.numeric()  
x_test <- test_xgb %>%  
select(Lag1, Lag3, Lag6, Lag12) %>%  
mutate(across(everything(), as.numeric)) %>%  
as.matrix()  
y_test <- test_xgb %>%  
pull(Close) %>% as.numeric()  
x_train %>%  
head(n = 8) %>%  
pander(caption = "Preview of Features Matrix")  
y_train %>% as.matrix() %>%  
head(n = 8) %>%  
pander(caption = "Preview of Response Variable")  
dtrain <- xgb.DMatrix(data = x_train, label = y_train)  
dtest <- xgb.DMatrix(data = x_test, label = y_test)  
params <- list(  
objective  
eta  
= "reg:squarederror",  
= 0.1,  
= 8,  
max_depth  
nround  
= 100  
)
set.seed(123)  
model_xgb <- xgb.train(  
params  
data  
= params,  
= dtrain,  
= 200)  
nrounds  
y_pred <- predict(model_xgb, dtest)  
results <- tibble(  
Date  
= test_xgb$Date,  
# replace with your date column name  
Actual = y_test,  
Predicted = y_pred  
)
colnames(results) <- c("Date", "Actual", "Predicted")  
rmse <- sqrt(mean((y_pred - y_test)^2))  
mape <- mean(abs((y_pred - y_test) / y_test)) * 100  
diag <- rbind(rmse, mape)  
rownames(diag) <- c("RMSE", "MAPE")  
colnames(diag) <- ("Values")  
diag %>% pander(caption = "Model Metrics")  
23  
model_train %>%  
refit(data_test) %>%  
forecast(h = 24) %>%  
filter(.model == "sarima3") %>%  
autoplot(data_test, level = NULL)  
n_future <- 24  
future_preds <- numeric(n_future)  
# Correctly extract last row of x_test  
last_vals <- as.numeric(tail(x_test, 1)) # strip names, just get 4 values  
for (i in 1:n_future) {  
d_future <- xgb.DMatrix(data = matrix(last_vals, nrow = 1))  
future_preds[i] <- predict(model_xgb, d_future)  
# Roll lags forward  
new_val  
<- future_preds[i]  
last_vals[1] <- new_val  
# Lag1  
last_vals[2] <- ifelse(i >= 3, future_preds[max(1, i-2)], last_vals[2]) # Lag3  
last_vals[3] <- ifelse(i >= 6, future_preds[max(1, i-5)], last_vals[3]) # Lag6  
last_vals[4] <- ifelse(i >= 12, future_preds[max(1, i-11)], last_vals[4]) # Lag12  
}
# Future dates  
library(zoo)  
library(ggplot2)  
library(lubridate)  
# Convert yearmonth to Date  
library(zoo)  
library(ggplot2)  
library(lubridate)  
# Convert dates to Date class  
test_dates  
<- as.Date(test_xgb$Date)  
future_dates <- seq(max(test_dates) %m+% months(1),  
by = "month",  
length.out = n_future)  
# Build zoo objects — test actual only (no train)  
actual_zoo  
<- zoo(y_test,  
order.by = test_dates)  
forecast_zoo <- zoo(future_preds, order.by = future_dates)  
# Merge  
combined <- merge(Actual = actual_zoo, Forecast = forecast_zoo)  
autoplot(combined, facets = NULL) +  
scale_color_manual(values = c(  
"Actual"  
"Forecast" = "steelblue"  
)) +  
theme(  
legend.position = "none"  
= "black",  
# change this to any color  
# change this to any color  
)
24