About me

Hassan OUKHOUYA
E-mail: hassan_oukhouya@um5.ac.ma
ResearchGate: https://www.researchgate.net/profile/Hassan-Oukhouya
LinkedIn: https://www.linkedin.com/in/hassan-oukhouya-3901b816b/
ORCID iD: https://orcid.org/0000-0002-5058-2008
Upwork: https://www.upwork.com/services/product/time-series-analysis-with-python-or-r-studio-1449669530698514432?ref=project_share
Fiverr: https://www.fiverr.com/hassanoukhouya?up_rollout=true

Introduction

The R Markdown practical report that I will draft will focus on the analysis, modeling, and forecasting of time series of the Morocco Stock Index 20 (MSI 20) by employing various methodologies. The primary objective is to explore and compare the performance of classical methods, specifically ARIMA with the Box-Jenkins methodology, juxtaposed against state-of-the-art techniques using Machine Learning approaches such as Support Vector Regression (SVR) and Multi-Layer Perceptron (MLP).

The Box-Jenkins methodology for ARIMA modeling will be scrutinized in detail, emphasizing its distinct steps, including identification, estimation, and model validation. The approach will underscore the significance of differencing, the selection of orders (p, d, q), and the verification of residuals to ensure the validity of the ARIMA model.

This comprehensive approach will provide a thorough overview of the different methodologies used in time series modeling, offering a clear understanding of the advantages and limitations of each method. Furthermore, it will establish connections between traditional methods and advanced approaches in the analysis and forecasting of time series closing price of the MSI 20.

Import the necessary libraries

The packages used in this series of studies are listed below:

Data selection and cleaning

We collected data from the following three sites :

Import data

Read data file in R

##         Date   Open  Close   High    Low
## 1 2021-01-04 924.78 928.96 928.97 922.30
## 2 2021-01-05 925.99 919.52 926.79 919.52
## 3 2021-01-06 919.52 915.67 921.57 915.01
## 4 2021-01-07 915.67 921.52 924.79 915.67
## 5 2021-01-08 921.52 919.60 924.31 918.90
## 6 2021-01-12 919.60 923.89 923.89 916.78

Data cleaning

## 'data.frame':    590 obs. of  5 variables:
##  $ Date : Date, format: "2021-01-04" "2021-01-05" ...
##  $ Open : num  925 926 920 916 922 ...
##  $ Close: num  929 920 916 922 920 ...
##  $ High : num  929 927 922 925 924 ...
##  $ Low  : num  922 920 915 916 919 ...

##         Date   Open  Close   High    Low
## 1 2021-01-04 924.78 928.96 928.97 922.30
## 2 2021-01-05 925.99 919.52 926.79 919.52
## 3 2021-01-06 919.52 915.67 921.57 915.01
## 4 2021-01-07 915.67 921.52 924.79 915.67
## 5 2021-01-08 921.52 919.60 924.31 918.90
## 6 2021-01-12 919.60 923.89 923.89 916.78

##           Date   Open  Close   High    Low
## 585 2023-04-27 838.25 840.12 843.95 838.25
## 586 2023-04-28 840.12 848.13 848.13 840.12
## 587 2023-05-02 848.13 840.54 852.01 840.54
## 588 2023-05-03 840.54 838.55 841.44 834.71
## 589 2023-05-04 838.55 837.13 838.55 831.77
## 590 2023-05-05 837.13 835.44 840.93 833.74

## [1] 590   5

## [1] 0

The size of the data available is 5 variables (Date, Open, Close, High, Low) with 590 observations representing the years between 04/01/2021 and 05/05/2023. We can see that there are no missing values in the data.

Data analysis, exploration and visualization

Descriptive statistics

We present the descriptive statistics of the daily MSI 20 :

##       nbr.val nbr.null nbr.na      min      max  range        sum    median
## Date      590        0      0 18631.00 19482.00 851.00 11243991.0 19059.500
## Open      590        0      0   775.38  1140.69 365.31   575589.3   982.695
## Close     590        0      0   775.38  1140.69 365.31   575563.4   982.695
## High      590        0      0   797.01  1142.56 345.55   578174.3   985.980
## Low       590        0      0   775.38  1135.86 360.48   573290.4   980.045
##             mean   SE.mean CI.mean.0.95       var   std.dev   coef.var
## Date  19057.6119 10.157295    19.948925 60870.679 246.71984 0.01294600
## Open    975.5752  3.543246     6.958935  7407.210  86.06515 0.08821991
## Close   975.5312  3.550503     6.973188  7437.583  86.24142 0.08840458
## High    979.9565  3.512745     6.899031  7280.233  85.32428 0.08706946
## Low     971.6786  3.552519     6.977147  7446.033  86.29040 0.08880549

This R code portion aims to convert the closing prices of a data set, stored in the dailyMSI20 variable, into a Close_Price time series. This time series is organized by date:

##              [,1]
## 2021-01-04 928.96
## 2021-01-05 919.52
## 2021-01-06 915.67
## 2021-01-07 921.52
## 2021-01-08 919.60
## 2021-01-12 923.89

##              [,1]
## 2023-04-27 840.12
## 2023-04-28 848.13
## 2023-05-02 840.54
## 2023-05-03 838.55
## 2023-05-04 837.13
## 2023-05-05 835.44

Visualize the time series

Plot the closing price of the MSI 20

Here we see the corresponding chart, as produced by chartSeries in the quantmod package.

The time series shows notable peaks and fluctuations, indicating significant variations in value. The volatility of the series also shows variations over time. In addition, there is a general upward trend in closing prices, with the exception of a downward trend from February 2022. No obvious seasonality is observed, indicating the absence of repetitive patterns in the series. The series average appears to be non-constant, suggesting a changing trend over time. The graph indicates that the series is non-stationary, and to confirm its stationarity, further analysis can be carried out using methods such as the Augmented Dickey-Fuller (ADF) test or by examining the autocorrelation function (ACF) and partial autocorrelation function (PACF).

Preprocessing

Stationarity and differentiation of time series

Checking stationarity

The initial step in modeling time series data is to check the stationarity of the series. If stationarity is not present, it is important to transform the non-stationary time series into a stationary one. This step is essential, as many statistical and economic techniques, such as ARIMA and GARCH, are based on the stationarity assumption and can only be applied to stationary time series.

Non-stationary time series exhibit irregular and unpredictable behavior, while stationary time series exhibit mean-reverting characteristics, fluctuating around a constant mean with constant variance. By ensuring stationarity, we can exploit the properties and assumptions associated with stationary time series to effectively utilize various modeling techniques.

Stationarity test of the closing price series using ACF, PACF

Correlograms of the MSI 20 closing price log are shown below for ACF and PACF. In general, the magnitude of autocorrelation should be between 1/4 and 1/3 of the total number of observations.

Analysis of the ACF and PACF graphs of the closing price series reveals some important trends. The ACF chart highlights significant autocorrelation and potential non-stationarity, indicated by the non-exponential decay of values towards zero. To remedy this in the context of financial time series, a logarithmic transformation is often applied to capture the growth rate and stabilize the variance. This transformation can lead to a series with better stationarity. However, it is crucial to consider the specific characteristics of the data and the analytical requirements before applying this transformation. It is also advisable to assess stationarity in greater depth, in particular by means of the ADF test. In summary, the ACF and PACF graphs signal autocorrelation and non-stationarity in the closing price series, suggesting the usefulness of a logarithmic transformation. However, additional diagnostics, such as the ADF test, are recommended to confirm stationarity and guide subsequent modeling steps.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  Close_Price
## Dickey-Fuller = -1.701, Lag order = 8, p-value = 0.7049
## alternative hypothesis: stationary

On the basis of the results, we are unable to reject the null hypothesis of non-stationarity. Consequently, the data do not provide strong evidence to support the presence of stationarity. This suggests that the Close_Price series may exhibit non-stationary behavior.

\[\text{Returns}_{i} = \frac{\log(\text{price}_{i}) - \log(\text{price}_{i-1})}{\log(\text{price}_{i-1})}, \quad i\in[2:590]\]

In this formula :

$\text{Returns}_i$ represents the variable where the result is stored.
$\log(\text{prix}_i)$ calculates the natural logarithm of the variable price.hist.
$[2:590]$ spécifie que nous excluons le premier élément (indice 1) jusqu’au dernier élément (indice 590) specifies that we exclude the first element (index 1) up to the last element (index 590) in the result, where n is the number of elements in the original sequence. This removes the first line of the result, which corresponds to the difference between the first and second elements and is not a valid return.

##                    [,1]
## 2021-01-05 -0.010213886
## 2021-01-06 -0.004195757
## 2021-01-07  0.006368443
## 2021-01-08 -0.002085688
## 2021-01-12  0.004654224
## 2021-01-13 -0.004338919

##                    [,1]
## 2023-04-27  0.002228353
## 2023-04-28  0.009489187
## 2023-05-02 -0.008989383
## 2023-05-03 -0.002370333
## 2023-05-04 -0.001694835
## 2023-05-05 -0.002020843

## An xts object on 2021-01-05 / 2023-05-05 containing: 
##   Data:    double [589, 1]
##   Index:   Date [589] (TZ: "UTC")

## An xts object on 2021-01-04 / 2023-05-05 containing: 
##   Data:    double [590, 1]
##   Index:   Date [590] (TZ: "UTC")

Modeling using the ARIMA model

ARMA models can be extended to non-stationary series by allowing the data series to be differentiated, resulting in ARIMA models. The $ARIMA(p,d,q)$ model is a general non-seasonal model with three parameters: $p$ is the autoregressive order, $d$ is the degree of differentiation and $q$ is the moving average order. The $ARIMA(p,d,q)$ model is expressed mathematically using lag polynomials as follows:

\[\phi_p(B)(1-B)^dY_t=\theta_q(B)\varepsilon_t,\;\;\Big(1-\sum_{i=1}^p\phi_iB^i\Big)(1-B)^dY_t=\Big(1+\sum_{j=1}^q\theta_jB^j\Big)\varepsilon_t\]

Box-Jenkins approach

The Box-Jenkins (1970) methodology for time series analysis, named after statisticians George Box and Gwilym Jenkins, uses ARIMA models to identify the best fit of a time series model to the past values of a time series. The figure below illustrates the four iterative stages of modeling using this approach.

Steps in the Box-Jenkins iterative approach

Model identification: Ensure that the variables are stationary, identify seasonality in the series, and identify which autoregressive or moving-average component should be implemented in the model using the ACF and PACF plots of the series.

Model estimation: Using computational algorithms, find the coefficients that best fit your chosen ARIMA model. Maximum likelihood estimation (MLE) or non-linear least squares estimation are the most common methods.

Model verification: By checking whether the estimated model satisfies the criteria of a stationary univariate process. The residuals must be independent of each other and constant in mean and variance over time; visualizing the ACF and PACF of the residuals can help identify a poor specification. If the estimate is not good enough, we’ll have to go back to the first step and try again. In addition, the estimated model needs to be compared with different ARIMA models to determine which one is suitable for the data. The two most common model selection criteria are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), both defined as follows:

\[AIC=2m-2\ln(\hat{L})\] \[BIC=\ln(n)m-2\ln(\hat{L})\]

where $\hat{L}$ represents the maximum value of the likelihood function for the model, $m$ is the number of parameters estimated by the model, and $n$ is the number of observations (sample size). In practice, $AIC$ and $BIC$ are used in conjunction with the classic criterion: mean square error ($MSE$).

Forecasting: We can use the selected ARIMA model for forecasting if it meets the criteria for a stationary univariate process.

Model identification

ACF and PACF analysis

Time series autocorrelation is used to establish and apply the time domain approach. Consequently, autocorrelation and partial autocorrelation are at the heart of the ARIMA model. The Box Jenkins approach identifies the $\text{ARIMA}(p,d,q)$ model on the basis of ACF and PACF graphs. The ARIMA parameters have three main components: $p$ is the autoregressive (AR) parameter, $d$ is the number of differentiations and $q$ is the moving average (MA) parameter.

## NULL

The ACF and PACF graphs reveal significant patterns in the time series data. In the ACF plot, significant peaks are observed at lags 1 and 2, indicating the presence of autocorrelation at these lags. In addition, the ACF plot shows a cutoff at lag 2 and a damped wave pattern, suggesting that the series can be adequately modeled using an AR process. The PACF graph shows an exponential decay from lag 8 (which is significant) to zero, followed by an alternative pattern. This pattern suggests that the series can also be well represented by an AR process. Based on these observations, it appears that the series can be modeled effectively using an ARMA (autoregressive moving average) model. We suggest $\text{ARMA}(p=1, d=1, q=1)$, $\text{ARMA}(p=1, d=1, q=2)$, $\text{ARMA}(p=2, d=1, q=1)$, $\text{ARMA}(p=2, d=1, q=2)$, where the offset (d) of 1 indicates that the return series has been differentiated.

## 
##  Phillips-Perron Unit Root Test
## 
## data:  returns_MSI20
## Dickey-Fuller Z(alpha) = -531.84, Truncation lag parameter = 6, p-value
## = 0.01
## alternative hypothesis: stationary

## 
##  KPSS Test for Level Stationarity
## 
## data:  returns_MSI20
## KPSS Level = 0.36255, Truncation lag parameter = 6, p-value = 0.0933

The Phillips-Perron (PP) test indicates that the returns_MSI20 return series is stationary, rejecting the unit root hypothesis with a p-value of 0.01.

The KPSS test suggests non-significance of non-stationarity around one level for returns_MSI20, although the p-value of 0.0933 may indicate some uncertainty.

## 
##  Augmented Dickey-Fuller Test
## 
## data:  returns_MSI20
## Dickey-Fuller = -8.4448, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary

## 
## ############################################### 
## # Augmented Dickey-Fuller Test Unit Root Test # 
## ############################################### 
## 
## Test regression none 
## 
## 
## Call:
## lm(formula = z.diff ~ z.lag.1 - 1 + z.diff.lag)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.045280 -0.003468  0.000159  0.003296  0.028929 
## 
## Coefficients:
##            Estimate Std. Error t value Pr(>|t|)    
## z.lag.1    -0.79472    0.05459 -14.559   <2e-16 ***
## z.diff.lag -0.09993    0.04107  -2.433   0.0153 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.007468 on 585 degrees of freedom
## Multiple R-squared:  0.4471, Adjusted R-squared:  0.4452 
## F-statistic: 236.5 on 2 and 585 DF,  p-value: < 2.2e-16
## 
## 
## Value of test-statistic is: -14.5593 
## 
## Critical values for test statistics: 
##       1pct  5pct 10pct
## tau1 -2.58 -1.95 -1.62

We can see that the p-value of the ADF test is < 2.2e-16, and that we have chosen a significance level of 0.01, so the p-value is below the significance level. In this case, we can reject the null hypothesis and conclude that the MSI 20 yield series is stationary.

Selecting the optimal model for ARIMA modeling requires a thorough understanding of series and forecasting expertise. As mentioned in Damodaran Gujarati’s book, ARIMA modeling is considered an art and an iterative process. In this case, the automated method provided by R, using AIC and BIC, will be used to identify the ARIMA(p,d,q) combination with the smallest information criterion. This approach involves fitting several ARIMA models to the data and selecting the one with the lowest AIC and BIC values.

Separate training and test series

Using this iterative approach and taking into account the AIC and BIC criteria, we aim to find the ARIMA model best suited to the given data. We divided the cleaned MSI20 log series into two data sets, one for training and the other for testing. The training data set comprises the first 531 periods, representing $90\%$ of the data. This subset is used to train the model. The test data set contains the remaining 59 observations, representing the remaining $10\%$ of the data. The test data set is used to evaluate the performance of the trained model.

Parameter estimation

To estimate the parameters, we use the selected models; ARIMA(0,1,1), ARIMA(1,1,0), ARIMA(2,1,0), ARIMA(0,1,2), ARIMA(1,1,1), ARIMA(1,1,2), ARIMA(2,1,1), ARIMA(2,1,2). Using the summary function, we obtain parameter estimates for each model element. We can apply a fitting method, such as parameter estimation, which can be “maximum likelihood” (ML) or “conditional sum-of-squares minimization” (CSS). The estimation results are shown below:

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)   
## ar1  0.49937    0.18379  2.7171 0.006585 **
## ma1 -0.38338    0.19092 -2.0081 0.044637 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\[BIC = -2 \times \log(\text{likelihood}) + k \times \log(n)\]

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)   
## ar1  0.56434    0.20093  2.8087 0.004975 **
## ma1 -0.45181    0.21607 -2.0910 0.036524 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Call:
## arima(x = train_MSI20, order = c(1, 1, 1), method = "ML")
## 
## Coefficients:
##          ar1      ma1
##       0.5643  -0.4518
## s.e.  0.2009   0.2161
## 
## sigma^2 estimated as 52.32:  log likelihood = -1800.74,  aic = 3607.47
## 
## Training set error measures:
##                       ME     RMSE      MAE         MPE      MAPE      MASE
## Training set -0.08627774 7.226294 4.952053 -0.01168927 0.5111407 0.9948819
##                     ACF1
## Training set -0.01533004

## 'log Lik.' 3614.02 (df=3)

Plotting the characteristic roots of the ARIMA(1,1,1) model

In the graph below, the red dot on the left corresponds to the root of the polynomial $\varphi(B)$, while the red dot on the right corresponds to the root of $\theta(B)$.

The important point in the previous passage is to set global.par=TRUE.

Using both CSS and ML methods for parameter estimation we find that the coefficients $\hat{\varphi}_1$ and $\hat{\theta}_1$ are statistically significant, suggesting that the ARIMA(1,1,1) model is an appropriate fit for the time series.

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value  Pr(>|z|)    
## ar1  0.31483    0.19427  1.6206 0.1051081    
## ar2 -0.58963    0.20794 -2.8356 0.0045747 ** 
## ma1 -0.24151    0.16053 -1.5045 0.1324594    
## ma2  0.69482    0.20822  3.3369 0.0008471 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Call:
## arima(x = train_MSI20, order = c(2, 1, 2), method = "ML")
## 
## Coefficients:
##          ar1      ar2      ma1     ma2
##       0.3148  -0.5896  -0.2415  0.6948
## s.e.  0.1943   0.2079   0.1605  0.2082
## 
## sigma^2 estimated as 51.68:  log likelihood = -1797.56,  aic = 3605.11
## 
## Training set error measures:
##                      ME     RMSE      MAE         MPE      MAPE     MASE
## Training set -0.1011205 7.182558 4.923816 -0.01367498 0.5075745 0.989209
##                    ACF1
## Training set 0.02328722

## 'log Lik.' 3620.214 (df=5)

Plotting the characteristic roots of the ARIMA(2,1,2) model

In the graph below, the red dot on the left corresponds to the root of the polynomial $\varphi(B)$, while the red dot on the right corresponds to the root of $\theta(B)$.

The ARIMA (2,1,2) model suggests significant autocorrelation at a lag of 2 ($\hat{\varphi}_2$) and significant positive error dependence at a lag of 2 ($\hat{\theta}_2$). Some coefficients are not statistically significant ($\hat{\varphi}_1$ and $\hat{\theta}_1$), which could suggest excessive complexity in the model. It may be worth exploring simpler models.

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)  
## ar1 0.098045   0.043223  2.2683  0.02331 *
## ar2 0.103647   0.043439  2.3861  0.01703 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Call:
## arima(x = train_MSI20, order = c(2, 1, 0), method = "ML")
## 
## Coefficients:
##          ar1     ar2
##       0.0980  0.1036
## s.e.  0.0432  0.0434
## 
## sigma^2 estimated as 52.09:  log likelihood = -1799.58,  aic = 3605.15
## 
## Training set error measures:
##                       ME     RMSE      MAE         MPE      MAPE      MASE
## Training set -0.08740403 7.210452 4.929905 -0.01181901 0.5087557 0.9904324
##                      ACF1
## Training set 0.0003644483

## 'log Lik.' 3611.704 (df=3)

Plotting the characteristic roots of the ARIMA(2,1,0) model

In the graph below, the red dot on the left corresponds to the root of the polynomial $\varphi(B)$, while the red dot on the right corresponds to the root of $\theta(B)$.

The ARIMA (2,1,0) model suggests significant positive autocorrelation at lags of 1 and 2.

## 
## z test of coefficients:
## 
##     Estimate Std. Error z value Pr(>|z|)   
## ma1 0.095003   0.043070  2.2058  0.02740 * 
## ma2 0.116780   0.044096  2.6483  0.00809 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Call:
## arima(x = train_MSI20, order = c(0, 1, 2), method = "ML")
## 
## Coefficients:
##          ma1     ma2
##       0.0950  0.1168
## s.e.  0.0431  0.0441
## 
## sigma^2 estimated as 52.07:  log likelihood = -1799.5,  aic = 3604.99
## 
## Training set error measures:
##                       ME     RMSE      MAE         MPE      MAPE      MASE
## Training set -0.09243934 7.209352 4.926385 -0.01250443 0.5083449 0.9897251
##                     ACF1
## Training set 0.002415873

## 'log Lik.' 3611.543 (df=3)

Plotting the characteristic roots of the ARIMA(0,1,2) model

In the graph below, the red dot on the left corresponds to the root of the polynomial $\varphi(B)$, while the red dot on the right corresponds to the root of $\theta(B)$.

The ARIMA(0,1,2) model suggests a significant dependence of errors on lags of 1 and 2.

Based on the above results, the ARIMA(0,1,2) model fits our series well, as indicated by the significant p-values of the estimated parameters and the low AIC and BIC criteria. Consequently, we can express our general equation model as follows:

\[X_t = X_{t-1} + 0.095\varepsilon_{t-1}+0.117\varepsilon_{t-2}+ \varepsilon_t\]

where $Y_t$ represents the current observation at time $t$ of the logarithm of the closing price of the MSI 20 share, $\varepsilon_t$ represents the error term at time $t$, and the coefficients (0.095, 0.117) correspond to the lagged error terms at the respective time lags. These coefficients capture the dynamic relationship between the current observation and previous observations with specific delay lengths.

Diagnostic verification and forecasting

The Box-Jenkins methodology consists of evaluating the residual, ACF and PACF diagrams and performing the Ljung-Box test to ensure the suitability of the selected model. The diagnostic assumptions are as follows:

Independence: The residuals must be independent, i.e. there must be no correlation or pattern in the residuals.
Normality: The residuals must follow a normal distribution, meaning that the errors are symmetrically distributed around zero.
Equality of variance: Residuals must have a constant variance, which ensures that the dispersion of residuals remains consistent. In addition, the Ljung-Box test provides a further check on the model. This test examines the autocorrelation of time series to determine whether autocorrelations differ significantly from zero. If the test rejects the null hypothesis, it suggests that the data is independent and uncorrelated. Conversely, if the test indicates a significant serial correlation, it may be necessary to modify the model.

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 11.005, df = 8, p-value = 0.2014
## 
## Model df: 2.   Total lags used: 10

The Ljung-Box test indicates that the residuals of your ARIMA(0,1,2) model have no significant autocorrelation (p-value > 5%), suggesting that the model captures the temporal structures of the series well.

Detecting residual outliers

The tsoutliers() function in R’s forecast package is useful for identifying outliers in a time series.

## $index
## [1] 287 294 295 297 338 376 507
## 
## $replacements
## [1]  11.210394   2.544107  10.948620  13.014059  -1.741477  -1.093739 -19.484213

The tsclean() function removes outliers identified in this way and replaces them (and any missing values) with linearly interpolated replacements.

Note: All outliers identified in this way are replaced by linearly interpolated values using neighboring observations, and the process is repeated.

## 
##  Ljung-Box test
## 
## data:  Residuals
## Q* = 289.95, df = 10, p-value < 2.2e-16
## 
## Model df: 0.   Total lags used: 10

The top plot shows that residual errors exhibit a random pattern around a mean of zero and have a consistent, uniform variance, indicating that the assumptions of constant mean and constant variance are reasonably satisfied.

The density diagram and QQ-plot below suggest that the distribution of residual errors follows a normal distribution with a mean of zero. This indicates that the normality assumption is approximately met.

The correlogram at bottom left represents the ACF plot. It indicates the degree of correlation between residual errors at different delays. In this case, the correlogram shows that the residual errors are not significantly autocorrelated. This implies that there is no pattern or systematic trends in the residuals after the model predictors have been taken into account.

Overall, based on these diagnostic plots, the assumptions of zero mean, constant variance, normality, and absence of autocorrelation appear to be reasonably satisfied by the model.

The plot of ACF and PACF residuals shows some significant lags. But in general, we can say that ARIMA(0,1,2) is an adequate model to represent the series.

## 
##  Box-Pierce test
## 
## data:  resid_imputed
## X-squared = 187.32, df = 1, p-value < 2.2e-16

Here we see a p-value of over $1\%$, so we can’t reject the null hypothesis, indicating that the residuals contain no autocorrelation.

## 
##  Jarque Bera Test
## 
## data:  resid_imputed
## X-squared = 0.75552, df = 2, p-value = 0.6854

## 
##  Shapiro-Wilk normality test
## 
## data:  resid_imputed
## W = 0.99693, p-value = 0.4171

Jarque-Bera Test: The high p-value suggests that there is insufficient evidence to reject the null hypothesis of normality of the residuals at a threshold of 0.05, indicating a certain normality in their distribution.

Shapiro-Wilk Normality Test: With a high p-value, no significant evidence is found to reject the null hypothesis of residual normality according to the Shapiro-Wilk test.

Anderson-Darling Normality Test: The relatively high p-value suggests that the residuals show no significant deviation from a normal distribution according to the Anderson-Darling test. There is insufficient evidence to reject the null hypothesis of normality.

The Kolmogorov-Smirnov test suggests that the residuals significantly deviate from a normal distribution.

In practice, the Shapiro-Wilk test is often preferred for normality testing, especially with moderate-sized samples. Given that the p-value for the Shapiro-Wilk test is 0.4171, you might consider that the residuals can be assumed to be approximately normally distributed, though it’s always good to keep in mind that statistical tests have limitations and the decision should also be based on the context of your analysis and the characteristics of the data.

Forecasting

## 2023-02-10 2023-02-13 2023-02-14 2023-02-15 2023-02-16 2023-02-17 
##   863.3658   864.1510   856.3266   842.9247   847.2119   849.6599

## 2023-04-27 2023-04-28 2023-05-02 2023-05-03 2023-05-04 2023-05-05 
##   837.6344   839.6428   849.2266   840.7059   837.3308   836.8592

## Time Series:
## Start = 2063 
## End = 2068 
## Frequency = 1 
## [1] 863.3658 864.1510 856.3266 842.9247 847.2119 849.6599

##            ts_test ts_forecast
## 2023-02-10  864.23    863.3658
## 2023-02-13  857.13    864.1510
## 2023-02-14  844.81    856.3266
## 2023-02-15  848.06    842.9247
## 2023-02-16  848.90    847.2119
## 2023-02-17  857.58    849.6599

##            ts_test ts_forecast
## 2023-04-27  840.12    837.6344
## 2023-04-28  848.13    839.6428
## 2023-05-02  840.54    849.2266
## 2023-05-03  838.55    840.7059
## 2023-05-04  837.13    837.3308
## 2023-05-05  835.44    836.8592

Performance measures

##                        ME      RMSE       MAE         MPE      MAPE      MASE
## Training set  -0.09243934  7.209352  4.926385 -0.01250443 0.5083449 0.9897251
## Test set     -23.28370316 25.474242 23.718416 -2.77102972 2.8208400 4.7650993
##                     ACF1
## Training set 0.002415873
## Test set              NA

Modeling using the SVR model

Preparing input data for the SVR model

The SVR method was fed with one-day lagged values, so that they could be used to predict the next day’s closing price (CloseL1).

##         Date  Close
## 1 2021-01-04 928.96
## 2 2021-01-05 919.52
## 3 2021-01-06 915.67
## 4 2021-01-07 921.52
## 5 2021-01-08 919.60
## 6 2021-01-12 923.89

##         Date  Close CloseL1
## 1 2021-01-04 928.96  928.96
## 2 2021-01-05 919.52  928.96
## 3 2021-01-06 915.67  919.52
## 4 2021-01-07 921.52  915.67
## 5 2021-01-08 919.60  921.52
## 6 2021-01-12 923.89  919.60

Split the MSI 20 into training and test sets

We divide the MSI 20 closing price into two data sets: the training data excludes the first 531 days of MSI 20 closing price (January 04, 2021 - February 09, 2023) and the testing data includes the last 59 days of MSI 20.

Building an SVR model

SVR Background

Support Vector Regression (SVR) is a type of Support Vector Machine (SVM) and a supervised learning algorithm that analyzes data for regression analysis. It predicts real values instead of classes. SVR acknowledges the presence of non-linearity in the data and provides a reliable prediction model (see figure below). SVR is a non-parametric approach, which is one of its main advantages. Additionally, SVR is independent of the underlying distributions of the dependent and independent variables. Conversely, the SVR approach is based on kernel functions.

Time series prediction with SVR model. (Source)

The caret package contains a large number of predictive model interfaces. The models sub-directory contains all the code for each model with which train interfaces, and these can be used as prototypes for your model. For more details on this package, see this link: http://topepo.github.io/caret/using-your-own-model-in-train.html.

Tuning predictive models

**SVR with linear kernel

Here are the key features of the `svmLinear`` method:

Type: Regression, Classification
Adjustment parameters :
- C (Cost)
Packages required: kernlab.

## Support Vector Machines with Linear Kernel 
## 
## 531 samples
##   1 predictor
## 
## Pre-processing: re-scaling to [0, 1] (1) 
## Resampling: Cross-Validated (5 fold, repeated 10 times) 
## Summary of sample sizes: 425, 426, 424, 425, 424, 424, ... 
## Resampling results:
## 
##   RMSE      Rsquared  MAE     
##   7.273545  0.991478  5.007187
## 
## Tuning parameter 'C' was held constant at a value of 1

The results above show that the optimal model is $C=1$, $RMSE= 7.297$, $R^2=0.991$, and $MAE=5.005$.

## Support Vector Machines with Linear Kernel 
## 
## 531 samples
##   1 predictor
## 
## Pre-processing: re-scaling to [0, 1] (1) 
## Resampling: Cross-Validated (5 fold, repeated 10 times) 
## Summary of sample sizes: 426, 424, 425, 425, 424, 424, ... 
## Resampling results across tuning parameters:
## 
##   cost  RMSE      Rsquared   MAE     
##   0.25  7.292999  0.9915029  5.008912
##   0.50  7.290552  0.9915029  5.010184
##   1.00  7.292287  0.9915029  5.009096
## 
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was cost = 0.5.

According to the above results, we observe that the optimal model is $Cost=1$, $RMSE= 7.283$, $R^2=0.991$, and $MAE= 4.995$. We can plot the variation of RMSE as a function of cost as follows:

L2 (dual) regularized SVR with linear kernel

Here are the key features of the svmLinear3 method: * Type: Regression, Classification. * Adjustment parameters :

Cost (Cost)
Loss function
Packages required: LiblineaR.

## L2 Regularized Support Vector Machine (dual) with Linear Kernel 
## 
## 531 samples
##   1 predictor
## 
## Pre-processing: re-scaling to [0, 1] (1) 
## Resampling: Cross-Validated (5 fold, repeated 10 times) 
## Summary of sample sizes: 424, 425, 426, 423, 426, 426, ... 
## Resampling results across tuning parameters:
## 
##   cost  Loss  RMSE        Rsquared   MAE       
##   0.25  L1    849.882220  0.9914037  847.397860
##   0.25  L2      8.350466  0.9914037    6.270459
##   0.50  L1    706.636686  0.9914037  704.747736
##   0.50  L2      7.563535  0.9914037    5.377567
##   1.00  L1    420.214173  0.9914037  419.447487
##   1.00  L2      7.337262  0.9914037    5.105664
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were cost = 1 and Loss = L2.

Based on the above results, we see that the optimal model has a “cost” of 1, $Loss=L2$, $RMSE= $7.332, $R^2=0.992$, and $MAE=5.111$. However, the variation of RMSE as a function of cost for the towing loss function (L1, L2) is presented in the graph as follows:

**SVR with polynomial kernel

Here are the key features of the svmPoly method: * Type: Regression, Classification * Adjustment parameters :

degree (polynomial degree)
scale (scale)
C” (cost)
Packages required: kernlab.

## Support Vector Machines with Polynomial Kernel 
## 
## 531 samples
##   1 predictor
## 
## Pre-processing: re-scaling to [0, 1] (1) 
## Resampling: Cross-Validated (5 fold, repeated 10 times) 
## Summary of sample sizes: 425, 424, 424, 425, 426, 425, ... 
## Resampling results across tuning parameters:
## 
##   degree  scale  C     RMSE       Rsquared   MAE      
##   1       0.001  0.25  71.596402  0.9914825  58.280378
##   1       0.001  0.50  64.969256  0.9914825  52.877510
##   1       0.001  1.00  51.976655  0.9914825  42.296845
##   1       0.010  0.25  19.864121  0.9914825  16.064502
##   1       0.010  0.50  10.467806  0.9914825   7.852665
##   1       0.010  1.00   8.352885  0.9914825   6.007962
##   1       0.100  0.25   7.537287  0.9914825   5.237321
##   1       0.100  0.50   7.363256  0.9914825   5.063914
##   1       0.100  1.00   7.303317  0.9914825   5.010373
##   2       0.001  0.25  64.969015  0.9914824  52.877449
##   2       0.001  0.50  51.976148  0.9914824  42.296719
##   2       0.001  1.00  28.058127  0.9914824  22.837566
##   2       0.010  0.25  10.460864  0.9914774   7.852843
##   2       0.010  0.50   8.345378  0.9914752   6.009232
##   2       0.010  1.00   7.634032  0.9914707   5.322589
##   2       0.100  0.25   7.392855  0.9913846   5.072656
##   2       0.100  0.50   7.358300  0.9913563   5.049142
##   2       0.100  1.00   7.368489  0.9913113   5.065236
##   3       0.001  0.25  58.426758  0.9914823  47.551003
##   3       0.001  0.50  39.380844  0.9914823  32.040472
##   3       0.001  1.00  15.207216  0.9914822  12.064049
##   3       0.010  0.25   8.993905  0.9914695   6.577701
##   3       0.010  0.50   7.826190  0.9914694   5.518147
##   3       0.010  1.00   7.482319  0.9914498   5.183723
##   3       0.100  0.25   7.375896  0.9913524   5.055939
##   3       0.100  0.50   7.326033  0.9913910   5.016826
##   3       0.100  1.00   7.317697  0.9913915   5.007505
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were degree = 1, scale = 0.1 and C = 1.

From the results above, we see that the optimal model has $\text{degree}=1$, $\text{scale}=0.1$, $C=1$, $\text{RMSE}= 7.282$, $R^2=0.992$, and $\text{MAE}=5.005$. However, the variation of RMSE as a function of polynomial degree for the three scales ($0.1\%$, $1\%$, $10\%$) is presented in the graphs as follows:

SVR with radial basis function kernel

Here are the key features of the `svmRadial`` method: * Type: Regression, Classification * Adjustment parameters :

sigma (Sigma)
C (Cost)
Packages required: kernlab.

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 531 samples
##   1 predictor
## 
## Pre-processing: re-scaling to [0, 1] (1) 
## Resampling: Cross-Validated (5 fold, repeated 10 times) 
## Summary of sample sizes: 425, 425, 425, 426, 423, 426, ... 
## Resampling results across tuning parameters:
## 
##   C     RMSE      Rsquared   MAE     
##   0.25  13.24429  0.9714574  6.854114
##   0.50  10.71419  0.9803177  6.088265
##   1.00   9.38420  0.9850166  5.768810
## 
## Tuning parameter 'sigma' was held constant at a value of 15.22811
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 15.22811 and C = 1.

According to the above results, we observe that the optimal model is $=$20.541, $C=1$, $= $9.671, $R^2=0.984$, and $=$5.871. We can plot the variation of RMSE as a function of cost as follows:

SVR with radial basis function kernel

Here are the key features of the svmRadialSigma method: * Type: Regression, Classification * Setting parameters :
* sigma (Sigma)
* C (Cost) * Packages required: kernlab.

## Support Vector Machines with Radial Basis Function Kernel 
## 
## 531 samples
##   1 predictor
## 
## Pre-processing: re-scaling to [0, 1] (1) 
## Resampling: Cross-Validated (5 fold, repeated 10 times) 
## Summary of sample sizes: 424, 425, 425, 426, 424, 424, ... 
## Resampling results across tuning parameters:
## 
##   sigma     C     RMSE       Rsquared   MAE     
##    0.18005  0.25   8.559329  0.9884471  5.813289
##    0.18005  0.50   7.796519  0.9902000  5.311952
##    0.18005  1.00   7.530567  0.9907939  5.150517
##   12.90046  0.25  12.965931  0.9721356  6.789570
##   12.90046  0.50  10.692150  0.9803375  6.080821
##   12.90046  1.00   9.301179  0.9852154  5.734900
##   25.62086  0.25  14.774491  0.9646173  7.495199
##   25.62086  0.50  12.079493  0.9747589  6.535314
##   25.62086  1.00  10.241257  0.9815138  6.025003
## 
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were sigma = 0.18005 and C = 1.

From the above results, we found that the optimal model has $\text{sigma} = 0.179$, $C = 1$, $\text{RMSE}= 7.549$, $R^2=0.991$, and $\text{MAE}=5.155$. However, the variation of RMSE as a function of sigma for three costs (0.25, 0.5, 1) is illustrated in the graph below:

Notes : The cost parameter and the RBF sigma kernel parameter are set in this SVM model. Using tuneLength in this situation will analyze only six values of the kernel parameter, allowing a broad search for the cost parameter as well as a more targeted search for sigma.

Model comparison

Models	Set	Kernel type	Tuning parameters	$R^2$	RMSE	MAE
SVR.model1	Train	Linear	C	0.992	7.297	5.005
SVR.model2	Train	Linear	cost	0.992	7.268	4.999
SVR.model3	Train	Linear with dual	cost, Loss	0.992	7.345	5.108
SVR.model4	Train	Poly	degree, scale, C	0.991	7.303	5.007
SVR.model5	Train	RBF	C	0.982	9.988	5.934
SVR.model6	Train	RBF with sigma	Sigma, C.	0.973	12.536	6.799

From the table above, we can see that the fourth SVR model is the best because it has a low RMSE and MAE, and a tuning parameter degree, scale, $C$. But we’ll use the second linear kernel model, which gives good results in validation prediction accuracy.

Forecasting

From the estimated models, we can make a prediction about the next 59 days of the MSI 20 series.

##           Date  Close CloseL1 SVR_Prediction_poly
## 532 2023-02-10 864.23  865.69            866.4465
## 533 2023-02-13 857.13  864.23            864.9956
## 534 2023-02-14 844.81  857.13            857.9394
## 535 2023-02-15 848.06  844.81            845.6955
## 536 2023-02-16 848.90  848.06            848.9255
## 537 2023-02-17 857.58  848.90            849.7603

##           Date  Close CloseL1 SVR_Prediction_poly
## 585 2023-04-27 840.12  838.25            839.1761
## 586 2023-04-28 848.13  840.12            841.0345
## 587 2023-05-02 840.54  848.13            848.9950
## 588 2023-05-03 838.55  840.54            841.4519
## 589 2023-05-04 837.13  838.55            839.4742
## 590 2023-05-05 835.44  837.13            838.0630

## 2023-02-10 2023-02-13 2023-02-14 2023-02-15 2023-02-16 2023-02-17 
##   866.4465   864.9956   857.9394   845.6955   848.9255   849.7603

## 2023-04-27 2023-04-28 2023-05-02 2023-05-03 2023-05-04 2023-05-05 
##   839.1761   841.0345   848.9950   841.4519   839.4742   838.0630

## Time Series:
## Start = 2063 
## End = 2068 
## Frequency = 1 
## [1] 866.4465 864.9956 857.9394 845.6955 848.9255 849.7603

##            ts_test ts_forecast_SVR
## 2023-02-10  864.23        866.4465
## 2023-02-13  857.13        864.9956
## 2023-02-14  844.81        857.9394
## 2023-02-15  848.06        845.6955
## 2023-02-16  848.90        848.9255
## 2023-02-17  857.58        849.7603

From the graphs above, which show the observations as a black line and the forecast as a red line, we can see that the lines almost coincide, and the forecast follows the test values.

Performance Measures

Let’s calculate the prediction accuracy for the MSI 20 on test and train values.

## [1] 7.295069

## [1] 5.802301

## [1] 0.514012

## [1] 0.5078035

## [1] 4.972625

## [1] 4.294351

## [1] 0.999015

## [1] 1.020161

Modeling with the MLP model

As a result, using Neural Networks (NN) to estimate index of MSI 20 has been proposed as a real alternative. In this study, we will utilize an MLP-NN to predict the next step-ahead of MSI 20. From January 04, 2021 to May 05, 2023, daily data was collected (590 observations). The first 531 must be utilized as training data, while the remaining ones must be used as a testing set.

Building an MLP model

Neural network basics

A simple neural network model

The backpropagation error approach can be used in conjunction with learning rules. The error at the output unit is calculated using the learning rule. This error is then backpropagated to all units, resulting in an error that is proportional to each unit’s contribution to the total error at the output unit. The weight of each connection is then optimized according to the errors of each unit. For a better understanding, the structure of a simple neural network model is illustrated in the diagram below.

Time series forecasting with multi-layer perceptrons (MLPs) and extreme learning machines is simplified with the nnfor package for R.

Univariate forecasting with the `mlp()` function

The main function is mlp(), and in its simplest form, all you have to do is enter a time series to be modeled. nnfor is different from other R neural network implementations in that it includes code for automatic network design with respectable prediction performance, while giving the expert user in-depth control. The automatic specification has been created with parsimony in mind. This not only improves the robustness of the resulting networks, but also reduces training time.

## 2021-01-04 2021-01-05 2021-01-06 2021-01-07 2021-01-08 2021-01-12 
##     928.96     919.52     915.67     921.52     919.60     923.89

## 2023-02-02 2023-02-03 2023-02-06 2023-02-07 2023-02-08 2023-02-09 
##     832.88     832.05     830.06     842.61     860.11     865.69

## Time Series:
## Start = 2024 
## End = 2029 
## Frequency = 1 
## [1] 928.96 919.52 915.67 921.52 919.60 923.89

## MLP fit with 5 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2)
## Forecast combined using the median operator.
## MSE: 45.2617.

The output indicates that the resulting network has 5 hidden nodes, has been trained 20 times and that the different predictions have been combined using the median operator. The mlp() function automatically generates sets of networks, whose training starts with different random initial weights.

Input nodes in gray are autoregressions. If other regressors were included, they would appear in light blue or magenta (seasonality).

##                    ME     RMSE      MAE          MPE      MAPE          ACF1
## Test set -0.006057505 6.727684 4.803544 -0.001275546 0.4957029 -0.0009488031
##          Theil's U
## Test set 0.9197932

The mlp() function accepts several arguments to refine the resulting network. The hd argument defines a fixed number of hidden nodes. If it’s a single number, the neurons are arranged in a single hidden node. If it’s a vector, the neurons are arranged in several layers.

## MLP fit with (4,2,2) hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2)
## Forecast combined using the median operator.
## MSE: 47.436.

##                  ME     RMSE     MAE          MPE     MAPE        ACF1
## Test set 0.01062339 6.887377 4.83535 -0.002754793 0.498704 -0.02192383
##          Theil's U
## Test set 0.8769973

The number of hidden nodes can either be predefined, using the hd argument, or automatically specified, as defined with the hd.auto.type argument. By default, this is hd.auto.type="set" and uses the entry provided in hd (the default value is 5). You can choose hd.auto.type="valid" to test using a validation sample ($25\%$ of the time series), or hd.auto.type="cv" to use 5-fold cross-validation. The number of hidden nodes to be evaluated is set by the hd.max argument.

## MLP fit with 2 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2)
## Forecast combined using the median operator.
## MSE: 51.5619.

##                  ME    RMSE      MAE           MPE      MAPE          ACF1
## Test set 0.01603123 7.18066 4.924758 -0.0009455695 0.5082059 -3.693273e-05
##          Theil's U
## Test set 0.9805411

## MLP fit with 2 hidden nodes and 20 repetitions.
## Series modelled in differences: D1.
## Univariate lags: (1,2)
## Forecast combined using the median operator.
## MSE: 49.0538.

##                  ME    RMSE      MAE         MPE      MAPE        ACF1
## Test set 0.06484902 7.00384 4.889499 0.004928803 0.5052037 -0.03049995
##          Theil's U
## Test set 0.9600274

Forecasting

To produce forecasts, we use the forecast() function, which requires a trained network object and the forecast horizon h.

## 2023-02-07 2023-02-08 2023-02-09 2023-02-10 2023-02-13 2023-02-14 
##     830.06     842.61     860.11     865.69     864.23     857.13

## 2023-04-27 2023-04-28 2023-05-02 2023-05-03 2023-05-04 2023-05-05 
##     838.25     840.12     848.13     840.54     838.55     837.13

## Time Series:
## Start = 2060 
## End = 2065 
## Frequency = 1 
## [1] 830.06 842.61 860.11 865.69 864.23 857.13

## 2023-02-07 2023-02-08 2023-02-09 2023-02-10 2023-02-13 2023-02-14 
##   867.7871   873.5589   869.7875   862.8982   851.2900   853.6949

## 2023-04-27 2023-04-28 2023-05-02 2023-05-03 2023-05-04 2023-05-05 
##   854.4052   847.8478   844.1952   867.7871   873.5589   869.7875

## Time Series:
## Start = 2060 
## End = 2065 
## Frequency = 1 
## [1] 867.7871 873.5589 869.7875 862.8982 851.2900 853.6949

##            ts_test ts_forecast_mlp
## 2023-02-10  865.69        867.7871
## 2023-02-13  864.23        873.5589
## 2023-02-14  857.13        869.7875
## 2023-02-15  844.81        862.8982
## 2023-02-16  848.06        851.2900
## 2023-02-17  848.90        853.6949

##            ts_test ts_forecast_mlp
## 2023-04-27  838.25        849.8612
## 2023-04-28  840.12        843.7778
## 2023-05-02  848.13        845.7549
## 2023-05-03  840.54        854.4052
## 2023-05-04  838.55        847.8478
## 2023-05-05  837.13        844.1952

Performance measures

Let’s calculate the prediction accuracy for MSI 20 on test and train values using the MLP model with (8.4) hidden nodes.

##                   ME     RMSE     MAE          MPE      MAPE        ACF1
## Test set -0.01062339 6.887377 4.83535 -0.002449036 0.4994405 -0.02192383
##          Theil's U
## Test set 0.9436091

## [1] "MASE Value:  4.81710380365813"

## [1] 8.451096

## [1] 0.8533805

## [1] 7.208889

## [1] "MASE Test:  7.33317986046202"

Comparison of ARIMA, SVR, and MLP models

In this subsection, we compare in detail the results obtained by three powerful forecasting models: ARIMA, SVR, and MLP. We explore their unique strengths in time series prediction, particularly those of MSI 20. Our aim is to provide valuable insights into their applicability and effectiveness in order to guide practitioners in choosing the most suitable approach for their forecasting tasks, particularly on MSI 20’s daily closing series.

Models	Sets	MAE	RMSE	MAPE
$\small{\text{ARIMA}(0,1,2)}$
	Train	4.209	7.209	0.508
	Test	23.718	25.474	2.821
$\small{\text{SVR}\;\text{avec}\;k_{\text{Polynomial}}}$
	Train	4.294	5.802	0.508
	Test	4.973	7.295	0.514
$\hat{MLP}_{2}$
	Train	4.012	5.608	0.411
	Test	7.426	8.769	0.879

The SVR model with a Polynomial kernel appears to have consistent performance on the training and test sets, while the MLP model shows high performance on the training set but some degradation on the test set. The ARIMA model shows relatively less accurate performance on both sets. These results suggest that the SVR model with a polynomial kernel may be a more robust option in this particular context of forecasting the closing price of the MSI 20.

Discussion and conclusion

An in-depth exploration of the MSI 20 data initiated our analysis, revealing significant autocorrelation and an upward trend in the daily closing price. Time series analysis techniques, such as trend analysis, decomposition, and correlation analysis, were applied to model the series components. The results indicated non-stationarity with the absence of seasonality.

In the modeling phase, three distinct approaches, namely the ARIMA, SVR, and MLP models, were employed. The SVR model with a polynomial kernel demonstrated the best performance and resilience, followed by the MLP model, while the ARIMA model proved less accurate. A sensitivity analysis was carried out to assess the impact of parameters on forecasting performance.

The use of the selected models for forecasting future values highlighted their ability to capture non-linear complexity and short- to medium-term trends. However, the performance showed a decrease with an increasing forecast horizon, suggesting a preferential adaptation for short- and medium-term forecasts.

In the discussion and conclusion of the project, we found that machine learning techniques can outperform classical methods in understanding the complex dynamics of financial time series. The project also identified the challenges and limitations of each approach, highlighting opportunities for improvement and extension in future work. In sum, it has established a solid foundation for the analysis, modeling and forecasting of financial time series, while providing an in-depth perspective on the advantages and disadvantages of different methodologies.

References

Laboratory of Mathematics, Statistics, and Applications, Faculty of Sciences, Mohammed V University in Rabat-Morocco, hassan_oukhouya@um5.ac.ma ↩︎

Models	Sets	MAE	RMSE	MAPE
\(\small{\text{ARIMA}(0,1,2)}\)
	Train	4.209	7.209	0.508
	Test	23.718	25.474	2.821
\(\small{\text{SVR}\;\text{avec}\;k_{\text{Polynomial}}}\)
	Train	4.294	5.802	0.508
	Test	4.973	7.295	0.514
\(\hat{MLP}_{2}\)
	Train	4.012	5.608	0.411
	Test	7.426	8.769	0.879

Modeling and Forecasting the Morocco Stock Index 20: A Comparative Study of ARIMA, SVR, and MLP Models

Ph.D. Candidate Hassan OUKHOUYA1

December 30, 2023

Introduction

Import the necessary libraries

Data selection and cleaning

Import data

Data cleaning

Data analysis, exploration and visualization

Descriptive statistics

Visualize the time series

Preprocessing

Stationarity and differentiation of time series

Checking stationarity

Modeling using the ARIMA model

Box-Jenkins approach

Model identification

ACF and PACF analysis

Parameter estimation

Plotting the characteristic roots of the ARIMA(1,1,1) model

Plotting the characteristic roots of the ARIMA(2,1,2) model

Plotting the characteristic roots of the ARIMA(2,1,0) model

Plotting the characteristic roots of the ARIMA(0,1,2) model

Diagnostic verification and forecasting

Detecting residual outliers

Forecasting

Performance measures

Modeling using the SVR model

Preparing input data for the SVR model

Split the MSI 20 into training and test sets

Building an SVR model

SVR Background

Tuning predictive models

Forecasting

Performance Measures

Modeling with the MLP model

Building an MLP model

Univariate forecasting with the mlp() function

Forecasting

Performance measures

Comparison of ARIMA, SVR, and MLP models

Discussion and conclusion

References

Ph.D. Candidate Hassan OUKHOUYA¹

Univariate forecasting with the `mlp()` function