Forecasting Closing Stock Prices from 1996 to 2020

Forecasting Closing Stock Prices using

ARIMA, SARIMA, and XGBoost

Bautista, Benedict D.

Table of contents

Introduction

2

Data Pre-processing

2

3

Loading Libraries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Loading the Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Filtering Variables of Interest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Splitting into Train -T e st sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Plotting Train -T e st sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Exploratory Data Analysis

4

5

6

7

8

Monthly Trend of Closing Stock Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Seasonality of Closing Stock Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Lag Plot for Closing Stock Prices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Plotting Decomposed Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Model Fitting

8

9

ARIMA and SARIMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Partial / Autocorrelation Plot of Closing Stock Prices . . . . . . . . . . . . . . . . . . . . . . .

Evaluating Stationarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

Partial / Autocorrelation Plot of Differenced Values . . . . . . . . . . . . . . . . . . . . . . . 10

Plotting Close and Differenced Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

Training the Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Evaluating Time Series Model Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

Residuals of the Best Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Ljung-Box Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

Fitting XGBoost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Creating Lags as Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Splitting Train and Test set for XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Plotting Response Variable for XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

Separating Features and Response Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

Training XGBoost Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Evaluating XGBoost Model Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

Forecasting

17

Forecasting using SARIMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Forecasting using XGBoost . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Codes Used

20

1

Introduction

The dataset provided by Devarash Raval on kaggle (you can click this for the dataset) , it was created to predict

market recession inspired by assignment notebook in an online course. However, in this paper the dataset

will be utilized to serve as the dataset for time series modelling using Seasonal Autoregressive Intergrated

Moving Average (SARIMA) Model, Autoregressive Intergrated Moving Average (ARIMA) Model, and eXtreme

Gradient Boosting (XGBoost) Model to predict future Closing stock prices, performing in-sample forecasting

and ex-ante forecasting. Furthermore, these models will be compared by its predictive power using several

metrics that will be specified later on this paper.

Data Pre-processing

Loading Libraries

library(fpp3)

library(here)

library(tidyverse)

library(xgboost)

library(pander) # for quarto purposes only

library(patchwork) # for data viz arrangements

library(knitr) # for tidy tables

Loading the Data

Table 1

Preview of the data

Date

Open

High

Low

Close

Adj Close

Volume

1996-01-01

1996-02-01

1996-03-01

1996-04-01

1996-05-01

1996-06-01

1996-07-01

1996-08-01

26.62

29.88

30.25

30.62

31.62

27.25

27.75

30.12

32.88

32

32.12

31.75

29.25

29.5

26.62

29.88

28.25

29.38

29.75

26.88

24.88

26.5

29.38

30.25

30.38

30.62

31.5

27.12

27.88

27

20.75

21.37

21.46

21.64

22.26

19.21

19.74

19.12

41074000

46490100

36573500

34821300

27064600

33389300

26284100

20525500

Filtering Variables of Interest

Table 2

Preview of aggregated dataset

Date

Close

1996 Jan

1996 Feb

1996 Mar

1996 Apr

1996 May

1996 Jun

1996 Jul

29.38

30.25

30.38

30.62

31.5

27.12

27.88

27

1996 Aug

2

Splitting into Train-Test sets

Table 3

Preview of training dataset

Date

Close

1996 Jan

1996 Feb

1996 Mar

1996 Apr

1996 May

1996 Jun

1996 Jul

29.38

30.25

30.38

30.62

31.5

27.12

27.88

27

1996 Aug

Table 4

Preview of testing dataset

Date

Close

2015 Jan

2015 Feb

2015 Mar

2015 Apr

2015 May

2015 Jun

2015 Jul

12.78

13.02

10.96

13.02

11.86

10.66

7.06

2015 Aug

6.95

Plotting Train-Test sets

Figure 1

Splitting Train-Test sets

50

40

30

20

10

2000 Jan

2010 Jan

2020 Jan

Date [1M]

3

Both the training and test data sets were graphically represented and are displayed in Figure 1 above,

demonstrating how the data were divided into two sets for the purpose of building the prediction model.

The training set, which consisted of the past data records between January 1996 and December 2014, was

used for the construction of the prediction model, while the test data set, which consisted of the most recent

data values between January 2015 and November 2020, was used to evaluate the accuracy of predictions.

Exploratory Data Analysis

This section aims to examine the underlying structure of the dataset through a comprehensive exploration

of its key characteristics and visual representations. By systematically analyzing the behavior of the time

series, this part seeks to provide a clearer understanding of its overall pattern and the range within which

the data is expected to vary. To achieve this, several analytical techniques will be utilized, including time

series decomposition to separate trend, seasonality, and irregular components, as well as seasonality checks

to identify recurring patterns over time. In addition, lag plots will be used to assess potential relationships

between observations at different time intervals, while Autocorrelation Function (ACF) and Partial Autocor-

relation Function (PACF) plots will help in identifying the presence of correlation structures within the series.

Through these methods, the section intends to uncover meaningful insights into the data’s behavior, which

will serve as a foundation for further modeling and analysis.

Monthly Trend of Closing Stock Prices

Figure 2

Graph of Closing Stock Prices

50

40

30

20

10

2000 Jan

2010 Jan

Date [1M]

2020 Jan

Figure 2 illustrates the monthly trend of closing stock prices spanning from 1996 to 2020. The data indicate

a steady upward trend from 1996 until 2010. This is followed by a noticeable decline between 2011 and

the latter part of 2015, which may be attributed to economic issues during that period. Furthermore, from

2016 to 2020, the series demonstrates a recovery phase, with closing stock prices starting to have an upward

trend again.

4

Seasonality of Closing Stock Prices

Figure 3

Seasonality Plot for Closing Stock Prices

2010

2011

50

40

30

20

10

2007

2009

2008

2012

2006

1996

2005

2020

2004

2003

2014

2015

Jan

Feb Mar

Apr

May

Jun

Jul

Aug

Sept

Oct

Nov

Dec

Date

Jan

Feb

Mar

Jun

Dec

50

40

30

20

10

Date

From Figure 3 a seasonality plot grouped by year, with the months shown along the x-axis. A closer examina-

tion reveals no consistent or repeating pattern across different years, indicating that strong seasonality is

not present in the data. Although the subseries plot shows some limited seasonal effects may appear within

specific ranges, these are not sustained throughout the series. Overall, the data exhibit noticeable trends

rather than clear seasonal patterns.

5

Lag Plot for Closing Stock Prices

Figure 4

Lag Plot (k = 12)

lag 1

lag 2

lag 3

lag 4

50

40

30

20

10

season

Jan

Feb

Mar

Apr

May

Jun

Jul

lag 5

lag 6

lag 7

lag 8

50

40

30

20

10

Aug

Sept

Oct

Nov

Dec

lag 9

lag 10

lag 11

lag 12

50

40

30

20

10

10 20 30 40 50

lag(Close, n)

The twelve lag plots presented in Figure 4 exhibit a strong linear relationship across all twelve lags. This

pattern typically indicates a high degree of autocorrelation, suggesting that the closing stock prices are

influenced by their previous values. It is evident that from lag 1 through lag 12, a linear trend remains observ-

able; however, the pattern becomes slightly more dispersed as the lag increases. This gradual weakening of

linearity implies that the predictive influence of past closing prices slightly declines as the time gap becomes

larger.

6

Decomposition

Table 5

Decomposition Using STL

.model

Date

Close

trend

season_year

remainder

season_adjust

stl

1996 Jan

1996 Feb

1996 Mar

1996 Apr

1996 May

1996 Jun

1996 Jul

29.38

30.25

30.38

30.62

31.5

27.12

27.88

27

30.42

30.09

29.75

29.42

29.09

28.75

28.41

28.05

-0.2178

0.2521

-0.2142

0.7301

1.035

-0.05651

-0.8301

-0.9024

-0.8257

-0.08838

0.8351

0.4731

1.379

-1.568

0.2922

-0.1452

29.59

30

30.59

29.89

30.46

27.18

28.71

27.9

1996 Aug

Figure 5

Decomposed Closing Stock Prices

STL decomposition

Close = trend + season_year + remainder

50

40

30

20

10

50

40

30

20

10

2

1

0

−1

−2

10

5

0

−5

−10

2000 Jan

2010 Jan

Date

2020 Jan

Figure 5 presents the decomposed time series of the closing stock prices. Decomposition was first employed

to examine the behavior of the series by isolating its individual components. Upon decomposition, it becomes

evident that the seasonal component is not constant over time, as it varies from year to year. This is par-

ticularly noticeable when the trend and remainder components are removed, revealing fluctuations in the

seasonal pattern. In contrast, when the seasonal and remainder components are removed, the underlying

trend and cyclical movements of the data become more apparent. Given this non-constant seasonality, STL

decomposition is therefore the more appropriate method for analyzing the time series.

7

Plotting Decomposed Data

Figure 6

Line Graphs of each Component

Trend VS Actual

50

Seasonally Adjusted VS Actual

50

40

30

20

10

40

30

20

10

2000 Jan

2010 Jan

Date [1M]

2020 Jan

2000 Jan

2010 Jan

2020 Jan

Date [1M]

Decomposed Components

50

40

30

20

10

2000 Jan

2010 Jan

Date [1M]

2020 Jan

In Figure 6, a comparison between the seasonally trend adjusted data vs the actual series, seasonally adjusted

data vs the actual series reveals that once periodic effects are removed, the price action aligns closely with

the smoothed trend line. This high degree of alignment confirms that the primary driver of the asset’s value

is its long-term growth trajectory rather than cyclical seasonal variations.

Model Fitting

This section is structured into two main components: the application of traditional Seasonal Autoregressive

Integrated Moving Average (S/ARIMA) models and the implementation of a machine learning approach,

specifically eXtreme Gradient Boosting (XGBoost), for forecasting. Each modeling approach will be sys-

tematically developed and assessed to compare their performance in capturing the dynamics of the time

series.

8

To evaluate the effectiveness of the models, several performance metrics will be employed, including Mean

Absolute Percentage Error (MAPE) and Root Mean Squared Error (RMSE) for measuring forecast accuracy,

as well as Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for model selection

and goodness-of-fit. The dataset will be divided into training and testing subsets, where the training set

will be used to fit and calibrate the models, while the testing set will be utilized to generate forecasts and

assess predictive performance through out-of-sample evaluation. This approach ensures a more reliable

comparison of the models’ generalization capabilities in forecasting unseen data.

ARIMA and SARIMA

Partial / Autocorrelation Plot of Closing Stock Prices

Figure 7

PACF and ACF Plot

1.00

0.75

0.50

0.25

0.00

0

6

12

18

24

lag [1M]

1.00

0.75

0.50

0.25

0.00

0

6

12

18

24

lag [1M]

The Autocorrelation Function (ACF) plot presented in Figure 7 displays significant spikes that follow a gradual,

decaying pattern. The absence of a “scalloping” or wave-like structure further confirms that the series lacks a

distinct seasonal component. This persistent, slow decay is a primary diagnostic indicator of non-stationarity,

reflecting a strong trend component within the closing stock prices. Conversely the Partial Autocorrelation

Function (PACF) plot presented a significant spike at lag 1 and lag 12 and nothing further beyond lag 12.

This could indicate that there is a relationship between the present closing stock prices to the value of the

9

prices 12 months ago. Collectively, these plots confirm that the series is non-stationary and requires further

differencing to achieve stationarity before modelling.

Evaluating Stationarity

Table 6

Unitroot KPSS Test

.model

kpss_stat

kpss_pvalue

arima

1.676

0.01

sarima1

sarima2

sarima3

KPSS test was employed to evaluate the stationarity of the time series. As it turns out, the series is not

stationary for the four time series models. This statistic affirms the presence of significant trend component.

Partial / Autocorrelation Plot of Differenced Values

10

Figure 8

Differencing: PACF and ACF Plot

0.1

0.0

−0.1

0

6

12

18

24

lag [1M]

0.1

0.0

−0.1

0

6

12

18

24

lag [1M]

11

The ACF plot from Figure 7 of the original series exhibits a slow and gradual decay, with significant correlations

across many lags, indicating the presence of non-stationarity and a strong trend component. Similarly,

the PACF shows a dominant spike at lag 1, which is typical of non-stationary behavior rather than a true

autoregressive structure. After applying first-order differencing, the ACF and PACF plots from Figure 8 show a

substantial change, with most autocorrelations falling within the confidence bounds and no evident long-term

persistence. This suggests that the differenced series is stationary, making it suitable for ARIMA modeling.

The absence of a clear cutoff in the PACF and only minor spikes in the ACF indicate that low-order AR or MA

components may be sufficient to capture the remaining dependence structure.

Plotting Close and Differenced Values

Figure 9

Comparing Differenced Values and Original Series

50

40

30

20

10

2000 Jan

2010 Jan

Date [1M]

2020 Jan

10

5

0

−5

−10

−15

2000 Jan

2010 Jan

Date [1M]

2020 Jan

Upon differencing, as shown in Figure 9, the most immediate observation is the stabilization of the variance

across time. In contrast to the original series, which exhibits pronounced upward and downward trends,

the differenced series fluctuates around a relatively constant mean close to zero. This indicates that the

trend component present in the original data has been effectively removed through first-order differencing.

Additionally, the original time series displays long-term movements, specifically during those interval with

sharp decline and rapid increase suggesting non-stationary behavior.

12

Training the Model

Table 7

Fitted Models

Model

Order

sarima1

sarima2

sarima3

arima

<ARIMA(0,1,1)(0,1,0)[12]>

<ARIMA(0,0,1)(0,1,1)[12]>

<ARIMA(0,1,1)(0,1,1)[12]>

<ARIMA(0,1,2)>

The model selection process involved fitting multiple candidate models to the training dataset in order to

identify the most parsimonious yet accurate representation of the time series. As presented in Table 6, four

models were considered, including a seasonal autoregressive integrated moving average (SARIMA) model

with drift, specified as ARIMA(0,1,1)(0,1,0)[12], two other SARIMA variants, and a baseline non-seasonal

ARIMA(0,1,2) model. Model evaluation was primarily based on the Akaike Information Criterion (AIC),

corrected Akaike Information Criterion (AICc), and Bayesian Information Criterion (BIC), where lower values

indicate a better balance between goodness-of-fit and model complexity.

Evaluating Time Series Model Performance

Table 8

Model Metrics

Model

Variance

AIC

BIC

MAPE

RMSE

sarima1

sarima2

sarima3

arima

18.64

34.98

9.029

8.923

1248

1392

1128

1150

1255

1402

1138

1160

11.71

15.77

7.923

8.255

4.183

5.731

2.905

2.968

As shown in Table 8, among the four fitted SARIMA models, SARIMA 3 exhibited the best overall performance

based on both information criteria and forecast accuracy measures. It obtained the lowest Akaike Information

Criterion (AIC = 1128) and Bayesian Information Criterion (BIC = 1138), indicating a comparatively better

balance between model fit and complexity. These criteria assess how well each model fits the training

data while penalizing unnecessary complexity. In addition, SARIMA 3 also yielded the lowest forecasting

error, with a mean absolute percentage error (MAPE) of 7.923% and a root mean square error (RMSE) of

2.905. These results suggest that SARIMA 3 not only provides the best fit to the training dataset but also

demonstrates strong predictive performance for future values.

13

Residuals of the Best Model

Figure 10

Residual Plots for SARIMA 3

10

5

0

−5

−10

2000 Jan

2005 Jan

Date

2010 Jan

2015 Jan

60

40

20

0

0.2

0.1

0.0

−0.1

0

6

12

18

24

−15

−10

−5

0

5

10

lag [1M]

.resid

Figure 10 presents the residual diagnostics of the selected SARIMA(3) model. The residuals are approxi-

mately centered around zero with relatively constant variance, and the histogram suggests approximate

normality. However, the autocorrelation function (ACF) of the residuals shows a significant spike at lag 12,

indicating remaining seasonal dependence not fully captured by the model. This suggests that while the

model adequately describes the general structure of the series, some seasonal dynamics remain unmodeled,

implying that further refinement of the seasonal component may improve model performance.

Ljung-Box Test

Table 9

Ljung-Box Test for SARIMA 3 Model

.model

lb_stat

lb_pvalue

arima

8.094

11.05

430.5

13.78

0.6196

0.3538

0

sarima1

sarima2

sarima3

0.1834

The Ljung–Box test was conducted to further assess the presence of autocorrelation in the residuals, as

suggested by the previous diagnostic plot. As shown in Table 9, the SARIMA 3 model yielded a test statistic

of 13.777 with a p-value of 0.183. Since the p-value exceeds the 0.05 significance level, there is insufficient

evidence to reject the null hypothesis of no autocorrelation. This indicates that the residuals do not exhibit

significant autocorrelation, thereby supporting the adequacy of the model and contradicting the initial

visual suggestion of autocorrelation observed in Figure 10. Therefore we do not reject the null hypothesis,

and therefore say that the claim of autocorrelated residuals do not have sufficient evidence to prove its

significance.

14

Fitting XGBoost Model

Creating Lags as Features

Table 10

Preview of Lags Created for XGBoost

Date

Close

Lag1

Lag3

Lag6

Lag12

1997 Jan

1997 Feb

1997 Mar

1997 Apr

1997 May

1997 Jun

1997 Jul

26.88

28.25

23.75

22.38

25.25

21.88

22.81

22.75

28.75

26.88

28.25

23.75

22.38

25.25

21.88

22.81

26.12

30

27.88

27

25.12

26.12

30

28.75

26.88

28.25

29.38

30.25

30.38

30.62

31.5

27.12

27.88

27

28.75

26.88

28.25

23.75

22.38

25.25

1997 Aug

In Table 10, lagged variables were constructed and used as input features for the XGBoost model to incorporate

temporal information from the time series data. Since XGBoost does not inherently account for sequential

order, these lag features transform the dataset into a supervised learning framework. Lag 1 represents the

previous month’s closing stock price and captures short-term price movements. Lag 3 reflects the previous

quarter’s closing price, allowing the model to learn medium-term trends. Lag 6 corresponds to the previous

semester’s closing price, while lag 12 represents the previous year’s closing price to capture longer-term

and seasonal patterns. By including these lagged features, the model is able to learn relationships across

different time scales, improving its ability to predict future stock prices.

Splitting Train and Test set for XGBoost

Table 11

Preview of Train Dataset

Date

Close

Lag1

Lag3

Lag6

Lag12

1997 Jan

1997 Feb

1997 Mar

1997 Apr

1997 May

26.88

28.25

23.75

22.38

25.25

28.75

26.88

28.25

23.75

22.38

26.12

30

28.75

26.88

28.25

27.88

27

25.12

26.12

30

29.38

30.25

30.38

30.62

31.5

Table 12

Preview of Test Dataset

Date

Close

Lag1

Lag3

Lag6

Lag12

2015 Jan

2015 Feb

2015 Mar

2015 Apr

2015 May

12.78

13.02

10.96

13.02

11.86

10.75

12.78

13.02

10.96

13.02

11.87

11.89

10.75

12.78

13.02

18.08

18.39

14.66

11.87

11.89

19.28

20.38

17.83

17.47

16.11

Plotting Response Variable for XGBoost

Both the training and test datasets are visually presented in Figure 11, illustrating how the data were

partitioned into two distinct subsets for model development and evaluation. The training set includes

historical observations from January 1996 to December 2014 and was used to build the prediction model. In

contrast, the test set comprises the most recent observations from January 2015 to November 2020. This

subset was reserved to assess the model’s predictive performance on unseen data. The split ensures that the

model is trained on past information while its accuracy is validated using more recent values.

15

Figure 11

Splitting Train-Test for XGBoost

50

40

30

20

10

2000 Jan

2005 Jan

2010 Jan

Date [1M]

2015 Jan

2020 Jan

Separating Features and Response Variable

Table 13

Preview of Features Matrix

Lag1

Lag3

Lag6

Lag12

Date

28.75

26.88

28.25

23.75

22.38

25.25

21.88

22.81

26.12

30

27.88

27

25.12

26.12

30

28.75

26.88

28.25

29.38

30.25

30.38

30.62

31.5

27.12

27.88

27

324

325

326

327

328

329

330

331

28.75

26.88

28.25

23.75

22.38

25.25

Table 14

Preview of Response Variable

26.88

28.25

23.75

22.38

25.25

21.88

22.81

22.75

In the tables above, the dataset was separated into feature matrices and a response variable to meet the

input requirements of the XGBoost algorithm. The predictor variables (Lag1, Lag3, Lag6, and Lag12) were

converted into a numeric matrix to serve as the feature set (X) for both training and testing. Meanwhile, the

16

Close price was extracted as a numeric vector to act as the target variable (Y). This separation ensures a clear

distinction between independent variables and the dependent variable in a supervised learning setup. The

same process was applied to both training and test sets to maintain consistency in model evaluation. Overall,

this structure allows XGBoost to learn the relationship between past values and the current stock price.

Training XGBoost Model

dtrain <- xgb.DMatrix(data = x_train, label = y_train)

dtest <- xgb.DMatrix(data = x_test)

params <- list(

objective

eta

= "reg:squarederror",

= 0.1,

= 8

max_depth

)

set.seed(3274)

model_xgb <- xgb.train(

params

data

= params,

= dtrain,

nrounds

watchlist

= 200,

= list(train = dtrain, test = dtest),

early_stopping_rounds = 50,

print_every_n

= 10)

The training and test datasets were converted into XGBoost’s DMatrix format to optimize computation and

model efficiency. The training set (dtrain) contains the feature matrix x_train and target y_train, while the

test set (dtest) contains x_test and y_test for model evaluation. Model parameters were specified using a

regression objective (reg:squarederror), with a learning rate (eta = 0.1) controlling update magnitude and

max_depth = 8 limiting tree complexity to reduce overfitting. The model was trained over 200 boosting

rounds while monitoring both training and test performance through a watchlist. Early stopping was applied

with a tolerance of 50 rounds to prevent unnecessary training once performance stopped improving. A fixed

seed was used to ensure reproducibility of results.

Evaluating XGBoost Model Performance

Table 15

Model Metrics

Values

RMSE

MAPE

4.22

22.92

The model diagnostics for XGBoost yielded an RMSE of 4.22 and a MAPE of 22.92, indicating the average

magnitude of prediction error. Unlike traditional statistical models, AIC and BIC cannot be computed for

XGBoost because it is not based on a likelihood or link function but on an ensemble of decision trees. As such,

its performance is evaluated using predictive accuracy metrics rather than goodness-of-fit criteria.

When compared to the SARIMA model, which achieved a lower MAPE, the results suggest that SARIMA

provides more accurate forecasts for the dataset. This indicates that the time series structure, such as

autocorrelation and seasonality, is better captured by SARIMA. Overall, while XGBoost offers flexibility,

SARIMA demonstrates superior predictive performance in this study.

Forecasting

In this section, both the SARIMA and XGBoost models are used to perform out-of-sample forecasting on the test

dataset to evaluate their predictive performance on unseen data. Although the SARIMA model demonstrated

17

better accuracy based on the evaluation metrics, the XGBoost model is still included for comparative analysis.

This allows for a more comprehensive assessment of both statistical and machine learning approaches within

the study. Including both models also enhances reproducibility by showing how different methods perform

under the same conditions. Furthermore, it supports the overall narrative by highlighting the strengths and

limitations of each approach. SARIMA effectively captures the inherent time series structure, while XGBoost

relies on feature-based learning. Presenting both results provides a balanced comparison of methodologies.

Ultimately, this approach strengthens the validity and depth of the analysis.

Forecasting using SARIMA

Figure 12

SARIMA Forecasted Values

30

20

10

2016 Jan

2018 Jan

2020 Jan

2022 Jan

Date

The out-of-sample forecast results presented in Figure 12 a indicate a clear upward movement in the predicted

values, accompanied by a pronounced seasonal pattern starting from every first month of the year. This

suggests that the SARIMA model successfully captured both the underlying trend and the recurring seasonal

fluctuations present in the historical time series. Based on the estimated parameters from the observed data,

the forecasted values exhibit a gradual increase from approximately 26 to 30 units over the forecast horizon.

This upward trajectory implies a sustained growth pattern rather than random variation, reinforcing the

presence of a systematic trend component in the series. The seasonal behavior further indicates that the

data follows a consistent repeating cycle over time. Overall, the forecast reflects both trend and seasonality,

suggesting that future values are expected to continue increasing within a stable seasonal structure.

18

Forecasting using XGBoost

Figure 13

XGBoost Forecasted Values

30

25

20

15

10

2016

2018

2020

2022

Index

The out-of-sample forecast presented in Figure 13 indicates a generally stable downward movement in the

predicted closing stock prices generated by the XGBoost model. The forecasted values show a gradual decline,

ranging approximately from 27 to 24 units, suggesting a slight weakening trend over the forecast horizon.

Unlike the SARIMA results, the XGBoost forecast does not exhibit a clear seasonal pattern, implying that

the model did not capture or was not explicitly provided with strong seasonal structure in the data. After

the initial decline, the values appear to stabilize around 25 units beyond 2022, indicating that the model

converges toward a relatively steady long-term level. This behavior reflects the model’s reliance on lag-based

learning rather than explicit decomposition of trend and seasonality. Overall, the forecast suggests a mild

downward adjustment followed by stabilization in future stock prices.

19

Codes Used

library(fpp3)

library(here)

library(tidyverse)

library(xgboost)

data <- read_csv(here("GOLD.csv"))

Close <- data %>%

select(Date, Close)

Close <- Close %>%

slice(1:298) %>%

bind_rows(

Close %>%

slice(299:300) %>%

summarise(

Date = min(Date),

Close = mean(Close, na.rm = TRUE)

)

Close <- Close %>%

mutate(Date = yearmonth(Date)) %>%

as_tsibble(index = Date)

Close %>% autoplot() ## plotting the whole dataset

Close

data_train <- Close %>%

filter_index("1996 Jan" ~ "2015 Jan")

data_test <- Close %>%

filter_index("2015 Jan" ~ "2020 Jan")

# eda

Close %>% gg_season(labels = "right", labels_repel = T)

Close %>% gg_subseries()

Close %>% gg_lag(geom = "point", lag = c(1:12))

decomp <- Close %>%

model(

stl = STL(Close)

)

components(decomp) %>% autoplot()

components(decomp) %>%

as_tsibble() %>%

autoplot((Close), colour = "gray") +

geom_line(aes(y = trend), colour = "#E63946") #trend vs actual

components(decomp) %>%

as_tsibble() %>%

autoplot((Close), colour = "gray") +

geom_line(aes(y = season_adjust), colour = "#2A9D8F") ## season_adj vs actual

data_model %>%

ACF(Close) %>%

20

autoplot()

Close %>%

PACF(Close) %>%

autoplot()

# modelling

data_test %>% autoplot() +

geom_line(data = data_train,

aes(x =Date, y = Close),

colour = "darkgreen")

data_train1 <- data_train %>%

mutate(Close = log(Close))

model_train <- data_train %>%

model(

sarima1 = ARIMA(Close ~ pdq(0,1,0) + PDQ(0,0,1)),

sarima2 = ARIMA(Close ~ pdq(0,1,1) + PDQ(0,0,1)),

sarima3 = ARIMA(Close ~ pdq(0,1,1) + PDQ(0,1,1)),

arima = ARIMA(Close)

)

glance(model_train) %>% select(.model, sigma2, AIC, BIC)

resids <- model_train %>% augment()

resids %>% features(Close, unitroot_kpss)

resids %>% features(.innov, ljung_box, lag = 12)

fc <- model_train %>%

forecast(h = 60)

fit1 <- model_train %>% pivot_longer(

everything(),

names_to = "Model",

values_to = "Order")

fit1

glance(model_train) %>%

arrange(AICc) %>%

select(.model:BIC)

accuracy(fc, data_test[c(1,2)])

model_train %>% report() # sarima3 for insample sarima1 for out of sample

# sarima1 and sarima3 based on evaluation

model_train %>% select(sarima1) %>% report()

model_train %>% select(sarima1) %>% gg_tsresiduals()

model_train %>% select(sarima3) %>% report()

model_train %>% select(sarima3) %>% gg_tsresiduals()

# forecasting

# insample

model_train %>%

forecast(h = 60) %>%

filter(.model == "sarima3") %>%

autoplot(data_train, level = c(90, 95)) +

geom_line(data = data_test, aes(x = Date, y = Close))

21

# out of sample forecast

model_train %>%

refit(data_test) %>%

forecast(h = 24) %>%

filter(.model == "sarima3") %>%

autoplot(data_test, level = c(90, 95))

# ex-ante forecasting

model_train %>%

refit(Close) %>%

forecast(h = 24) %>%

filter(.model == "sarima1") %>%

autoplot(Close, level = c(90, 95))

model_train %>%

refit(Close) %>%

forecast(h = 24) %>%

filter(.model == "sarima3") %>%

autoplot(Close, level = c(90, 95))

data_XGB <- Close %>%

mutate(

Lag1 = lag(Close, 1),

Lag3 = lag(Close, 3),

Lag6 = lag(Close, 6),

Lag12 = lag(Close, 12)

) %>% drop_na()

data_XGB %>%

head(n = 8) %>%

pander(caption = "Preview of Lags Created for XGBoost")

data_XGB <- Close %>%

mutate(

Lag1 = lag(Close, 1),

Lag3 = lag(Close, 3),

Lag6 = lag(Close, 6),

Lag12 = lag(Close, 12)

) %>% drop_na()

data_XGB %>%

head(n = 8) %>%

pander(caption = "Preview of Lags Created for XGBoost")

n_test <- nrow(data_test)

train_xgb <- data_XGB %>% head(-n_test)

test_xgb <- data_XGB %>% tail(n_test)

train_xgb %>%

head(n = 5) %>%

pander(caption = "Preview of Train Dataset")

test_xgb %>%

head(n = 5) %>%

pander(caption = "Preview of Test Dataset")

data_XGB %>%

autoplot(Close) +

geom_line(data = test_xgb, aes(y = Close), colour = "red")

22

x_train <- train_xgb %>%

select(Lag1, Lag3, Lag6, Lag12) %>%

mutate(across(everything(), as.numeric)) %>%

as.matrix()

y_train <- train_xgb %>%

pull(Close) %>%

as.numeric()

x_test <- test_xgb %>%

select(Lag1, Lag3, Lag6, Lag12) %>%

mutate(across(everything(), as.numeric)) %>%

as.matrix()

y_test <- test_xgb %>%

pull(Close) %>% as.numeric()

x_train %>%

head(n = 8) %>%

pander(caption = "Preview of Features Matrix")

y_train %>% as.matrix() %>%

head(n = 8) %>%

pander(caption = "Preview of Response Variable")

dtrain <- xgb.DMatrix(data = x_train, label = y_train)

dtest <- xgb.DMatrix(data = x_test, label = y_test)

params <- list(

objective

eta

= "reg:squarederror",

= 0.1,

= 8,

max_depth

nround

= 100

)

set.seed(123)

model_xgb <- xgb.train(

params

data

= params,

= dtrain,

= 200)

nrounds

y_pred <- predict(model_xgb, dtest)

results <- tibble(

Date

= test_xgb$Date,

# replace with your date column name

Actual = y_test,

Predicted = y_pred

)

colnames(results) <- c("Date", "Actual", "Predicted")

rmse <- sqrt(mean((y_pred - y_test)^2))

mape <- mean(abs((y_pred - y_test) / y_test)) * 100

diag <- rbind(rmse, mape)

rownames(diag) <- c("RMSE", "MAPE")

colnames(diag) <- ("Values")

diag %>% pander(caption = "Model Metrics")

23

model_train %>%

refit(data_test) %>%

forecast(h = 24) %>%

filter(.model == "sarima3") %>%

autoplot(data_test, level = NULL)

n_future <- 24

future_preds <- numeric(n_future)

# Correctly extract last row of x_test

last_vals <- as.numeric(tail(x_test, 1)) # strip names, just get 4 values

for (i in 1:n_future) {

d_future <- xgb.DMatrix(data = matrix(last_vals, nrow = 1))

future_preds[i] <- predict(model_xgb, d_future)

# Roll lags forward

new_val

<- future_preds[i]

last_vals[1] <- new_val

# Lag1

last_vals[2] <- ifelse(i >= 3, future_preds[max(1, i-2)], last_vals[2]) # Lag3

last_vals[3] <- ifelse(i >= 6, future_preds[max(1, i-5)], last_vals[3]) # Lag6

last_vals[4] <- ifelse(i >= 12, future_preds[max(1, i-11)], last_vals[4]) # Lag12

}

# Future dates

library(zoo)

library(ggplot2)

library(lubridate)

# Convert yearmonth to Date

library(zoo)

library(ggplot2)

library(lubridate)

# Convert dates to Date class

test_dates

<- as.Date(test_xgb$Date)

future_dates <- seq(max(test_dates) %m+% months(1),

by = "month",

length.out = n_future)

# Build zoo objects — test actual only (no train)

actual_zoo

<- zoo(y_test,

order.by = test_dates)

forecast_zoo <- zoo(future_preds, order.by = future_dates)

# Merge

combined <- merge(Actual = actual_zoo, Forecast = forecast_zoo)

autoplot(combined, facets = NULL) +

scale_color_manual(values = c(

"Actual"

"Forecast" = "steelblue"

)) +

theme(

legend.position = "none"

= "black",

# change this to any color

)

24