The goal of this phase is to produce the best multivariate regression model for forecasting the return on our stock of choice, Microsoft. To do that, we will fit a family of linear regression models and pick the best-performing one.
The dependent variable in our regression model will be the daily returns of Microsoft. The chosen explanatory (independent) variables are the daily returns of other stocks (potential competitors) and of stock market indexes.
Potential regressors in our regression models are the daily returns of Apple, Google, IBM, 3M, the S&P 500 index and the Nasdaq index.
The dataset splitting for the dependent variable (Microsoft daily returns) was done in the previous phase.
The training data set contains daily returns from 2019 and 2020, and the test data set contains only the first six months of 2021.
In order to split the dataset for the potential regressors, we first need to check the stationarity of these time series, which is described in the next section.
In this section, we will check the stationarity of each time series. That means we need to determine whether the mean and variance of the series are constant and do not depend on time.
We will look at a couple of methods for checking stationarity: visual inspection of the series, the autocorrelation function (ACF) plot, the ADF test and the KPSS test.
Let’s see the graph of Apple closing prices for the past two and a half years:
The time series does not look stationary, as there is a visible upward trend.
Now we need to apply the methods described in the introduction to determine whether the time series is stationary.
From the plot above, we can see that almost all lags exceed the confidence interval of the ACF, which indicates strong autocorrelation.
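To make the ACF reading above concrete, here is a minimal Python sketch (the report's own outputs come from R; this is only an illustration). It computes the sample autocorrelation and the approximate 95% white-noise bound of 1.96/sqrt(n). The function names and the trending series are hypothetical and are not the Apple data.

```python
import math

def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (mean removed, common denominator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    acf = []
    for lag in range(max_lag + 1):
        num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
        acf.append(num / denom)
    return acf

def white_noise_bound(n_obs, z=1.96):
    """Approximate 95% band for the ACF of white noise."""
    return z / math.sqrt(n_obs)

# Hypothetical trending "price" series, used only for illustration
prices = [100 + 0.5 * t for t in range(200)]
acf = sample_acf(prices, 10)
bound = white_noise_bound(len(prices))
lags_outside = [lag for lag in range(1, 11) if abs(acf[lag]) > bound]
print(lags_outside)  # lags whose autocorrelation exceeds the band
```

For a trending series like this one, every lag falls outside the band, which is the pattern described for the price series.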
Another test we can conduct is the Augmented Dickey–Fuller (ADF) test, which checks whether the series has a unit root (a series with a stochastic trend has a unit root, which results in a large p-value).
##
## Augmented Dickey-Fuller Test
##
## data: AAPL_prices
## Dickey-Fuller = -2.1979, Lag order = 8, p-value = 0.4946
## alternative hypothesis: stationary
The p-value of the ADF test is high (almost 50%), so we cannot reject the null hypothesis that the series has a unit root.
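For reference, the regression behind an ADF test with a constant and a linear trend can be written as follows. This is a sketch under the assumption that the test was run in that form with the lag order k = 8 shown in the output; the symbols are introduced here for illustration only.

```latex
\Delta y_t = \alpha + \beta t + \gamma\, y_{t-1} + \sum_{i=1}^{k} \delta_i\, \Delta y_{t-i} + \varepsilon_t,
\qquad H_0:\ \gamma = 0 \ \text{(unit root)}, \quad H_1:\ \gamma < 0 \ \text{(stationary)}
```

The reported Dickey-Fuller value is the t-statistic of the estimate of gamma. A value that is not negative enough, as with -2.1979 here, leaves the null hypothesis of a unit root standing.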
Now we can test whether the time series is level or trend stationary using the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test. Here we test the null hypothesis of trend stationarity (a low p-value indicates a series that is not trend stationary, which is consistent with a unit root).
##
## KPSS Test for Trend Stationarity
##
## data: AAPL_prices
## KPSS Trend = 0.74634, Truncation lag parameter = 6, p-value = 0.01
The p-value of the KPSS test is very low (reported as 0.01), so we reject the null hypothesis of trend stationarity, which is consistent with this time series having a unit root.
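As a reminder of what the KPSS statistic measures, the usual formulation for the trend case is sketched below. The notation is an assumption introduced here, with e_t denoting the residuals from regressing y_t on a constant and a linear trend.

```latex
y_t = \xi t + r_t + \varepsilon_t, \qquad r_t = r_{t-1} + u_t, \qquad H_0:\ \sigma_u^2 = 0
\\[6pt]
\eta = \frac{1}{T^2\,\hat{\sigma}^2} \sum_{t=1}^{T} S_t^2, \qquad S_t = \sum_{i=1}^{t} e_i
```

Here the long-run variance estimate in the denominator depends on the truncation lag (6 in the output above). Large values of the statistic lead to rejecting trend stationarity, so the roles of the hypotheses are reversed compared to the ADF test.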
The stock price time series is clearly not stationary, so we need to transform it. One method is to difference the stock prices, i.e. to calculate daily returns.
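The transformation can be sketched in a few lines of Python. This is a minimal illustration that assumes simple (percentage) returns; the report does not say whether simple or log returns were used, and the function name and prices are hypothetical.

```python
def simple_returns(prices):
    """Daily simple returns: r_t = P_t / P_{t-1} - 1 (assumed definition)."""
    return [prices[t] / prices[t - 1] - 1.0 for t in range(1, len(prices))]

# Hypothetical closing prices, not actual market data
prices = [100.0, 110.0, 99.0]
returns = simple_returns(prices)
print(returns)  # one value fewer than the price series
```

Note that the return series is one observation shorter than the price series, because the first price has no predecessor.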
Let’s see the graph of Apple daily returns for the past two and a half years:
Now it looks different and more promising. The time series appears to be stationary.
Let’s verify it.
Now we can see that only a few lags exceed the confidence interval of the ACF (blue dashed line).
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: AAPL_retDaily
## Dickey-Fuller = -7.8133, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
The p-value is around 1%, so we can reject the null hypothesis of a unit root.
And finally, KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: AAPL_retDaily
## KPSS Trend = 0.062712, Truncation lag parameter = 6, p-value = 0.1
The p-value of the KPSS test is 10% or more, so we cannot reject the null hypothesis of trend stationarity, which means that we find no evidence of a unit root.
== NOTE ==
We will repeat the same steps for all explanatory variables.
Let’s see the graph of Google closing prices for the past two and a half years:
The time series does not look stationary, as there is a visible upward trend.
Now we need to apply the methods described in the introduction to determine whether the time series is stationary.
From the plot above, we can see that almost all lags exceed the confidence interval of the ACF.
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: GOOG_prices
## Dickey-Fuller = -0.85307, Lag order = 8, p-value = 0.9569
## alternative hypothesis: stationary
The p-value of the ADF test is really high (around 96%), so we cannot reject the null hypothesis of a unit root.
Performing KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: GOOG_prices
## KPSS Trend = 1.4884, Truncation lag parameter = 6, p-value = 0.01
The p-value of the KPSS test is very low (reported as 0.01), so we reject the null hypothesis of trend stationarity, which is consistent with this time series having a unit root.
Let’s see the graph of Google daily returns for the past two and a half years:
Now it looks different and more promising. The time series appears to be stationary.
Let’s verify it.
Now we can see that only a few lags exceed the confidence interval of the ACF (blue dashed line).
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: GOOG_retDaily
## Dickey-Fuller = -7.1894, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
The p-value is around 1%, so we can reject the null hypothesis of a unit root.
And finally, KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: GOOG_retDaily
## KPSS Trend = 0.037771, Truncation lag parameter = 6, p-value = 0.1
The p-value of the KPSS test is 10% or more, so we cannot reject the null hypothesis of trend stationarity, which means that we find no evidence of a unit root.
Let’s see the graph of IBM closing prices for the past two and a half years:
The time series does not look stationary, as there appears to be some seasonality.
Now we need to apply the methods described in the introduction to determine whether the time series is stationary.
From the plot above, we can see that almost all lags exceed the confidence interval of the ACF.
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: IBM_prices
## Dickey-Fuller = -2.8358, Lag order = 8, p-value = 0.2245
## alternative hypothesis: stationary
The p-value of the ADF test is high (around 22%), so we cannot reject the null hypothesis of a unit root.
Performing KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: IBM_prices
## KPSS Trend = 0.63745, Truncation lag parameter = 6, p-value = 0.01
The p-value of the KPSS test is very low (reported as 0.01), so we reject the null hypothesis of trend stationarity, which is consistent with this time series having a unit root.
Let’s see the graph of IBM daily returns for the past two and a half years:
Now it looks different and more promising. The time series appears to be stationary.
Let’s verify it.
Now we can see that only a few lags exceed the confidence interval of the ACF (blue dashed line).
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: IBM_retDaily
## Dickey-Fuller = -7.3281, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
The p-value is around 1%, so we can reject the null hypothesis of a unit root.
And finally, KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: IBM_retDaily
## KPSS Trend = 0.080941, Truncation lag parameter = 6, p-value = 0.1
The p-value of the KPSS test is 10% or more, so we cannot reject the null hypothesis of trend stationarity, which means that we find no evidence of a unit root.
Let’s see the graph of 3M closing prices for the past two and a half years:
The time series does not look stationary, as there is a visible upward trend.
Now we need to apply the methods described in the introduction to determine whether the time series is stationary.
From the plot above, we can see that almost all lags exceed the confidence interval of the ACF.
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: MMM_prices
## Dickey-Fuller = -1.4804, Lag order = 8, p-value = 0.7982
## alternative hypothesis: stationary
The p-value of the ADF test is really high (around 80%), so we cannot reject the null hypothesis of a unit root.
Performing KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: MMM_prices
## KPSS Trend = 1.5971, Truncation lag parameter = 6, p-value = 0.01
The p-value of the KPSS test is very low (reported as 0.01), so we reject the null hypothesis of trend stationarity, which is consistent with this time series having a unit root.
Let’s see the graph of 3M daily returns for the past two and a half years:
Now it looks different and more promising. The time series appears to be stationary.
Let’s verify it.
Now we can see that only a few lags exceed the confidence interval of the ACF (blue dashed line).
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: MMM_retDaily
## Dickey-Fuller = -8.0249, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
The p-value is around 1%, so we can reject the null hypothesis of a unit root.
And finally, KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: MMM_retDaily
## KPSS Trend = 0.032186, Truncation lag parameter = 6, p-value = 0.1
The p-value of the KPSS test is 10% or more, so we cannot reject the null hypothesis of trend stationarity, which means that we find no evidence of a unit root.
Let’s see the graph of S&P 500 closing prices for the past two and a half years:
The time series does not look stationary, as there is a visible upward trend.
Now we need to apply the methods described in the introduction to determine whether the time series is stationary.
From the plot above, we can see that almost all lags exceed the confidence interval of the ACF.
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: SP500_prices
## Dickey-Fuller = -1.8165, Lag order = 8, p-value = 0.656
## alternative hypothesis: stationary
The p-value of the ADF test is really high (around 66%), so we cannot reject the null hypothesis of a unit root.
Performing KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: SP500_prices
## KPSS Trend = 1.216, Truncation lag parameter = 6, p-value = 0.01
The p-value of the KPSS test is very low (reported as 0.01), so we reject the null hypothesis of trend stationarity, which is consistent with this time series having a unit root.
Let’s see the graph of S&P 500 daily returns for the past two and a half years:
Now it looks different and more promising. The time series appears to be stationary.
Let’s verify it.
Now we can see that only a few lags exceed the confidence interval of the ACF (blue dashed line).
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: SP500_retDaily
## Dickey-Fuller = -6.9, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
The p-value is around 1%, so we can reject the null hypothesis of a unit root.
And finally, KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: SP500_retDaily
## KPSS Trend = 0.050641, Truncation lag parameter = 6, p-value = 0.1
The p-value of the KPSS test is 10% or more, so we cannot reject the null hypothesis of trend stationarity, which means that we find no evidence of a unit root.
Let’s see the graph of Nasdaq closing prices for the past two and a half years:
The time series does not look stationary, as there is a visible upward trend.
Now we need to apply the methods described in the introduction to determine whether the time series is stationary.
From the plot above, we can see that almost all lags exceed the confidence interval of the ACF.
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: Nasdaq_prices
## Dickey-Fuller = -2.01, Lag order = 8, p-value = 0.5741
## alternative hypothesis: stationary
The p-value of the ADF test is really high (around 57%), so we cannot reject the null hypothesis of a unit root.
Performing KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: Nasdaq_prices
## KPSS Trend = 1.4659, Truncation lag parameter = 6, p-value = 0.01
The p-value of the KPSS test is very low (reported as 0.01), so we reject the null hypothesis of trend stationarity, which is consistent with this time series having a unit root.
Let’s see the graph of Nasdaq daily returns for the past two and a half years:
Now it looks different and more promising. The time series appears to be stationary.
Let’s verify it.
Now we can see that only a few lags exceed the confidence interval of the ACF (blue dashed line).
Performing ADF test:
##
## Augmented Dickey-Fuller Test
##
## data: Nasdaq_retDaily
## Dickey-Fuller = -7.2016, Lag order = 8, p-value = 0.01
## alternative hypothesis: stationary
The p-value is around 1%, so we can reject the null hypothesis of a unit root.
And finally, KPSS test:
##
## KPSS Test for Trend Stationarity
##
## data: Nasdaq_retDaily
## KPSS Trend = 0.049269, Truncation lag parameter = 6, p-value = 0.1
The p-value of the KPSS test is 10% or more, so we cannot reject the null hypothesis of trend stationarity, which means that we find no evidence of a unit root.
Before we choose the appropriate regression model, let’s first say a couple of words about linear regression itself and the metrics that will be used.
A linear regression is a statistical model that analyzes the relationship between a response (dependent) variable and one or more explanatory (independent) variables and their interactions.
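For the problem at hand, the largest model considered in this report can be written as below. The return symbols are notation introduced here for illustration; the regressors match the ones in the fitted `lm` formulas.

```latex
r^{\mathrm{MSFT}}_t = \beta_0 + \beta_1 r^{\mathrm{AAPL}}_t + \beta_2 r^{\mathrm{GOOG}}_t + \beta_3 r^{\mathrm{IBM}}_t
 + \beta_4 r^{\mathrm{MMM}}_t + \beta_5 r^{\mathrm{SP500}}_t + \beta_6 r^{\mathrm{Nasdaq}}_t + \varepsilon_t
```

The smaller models in the following sections are obtained by dropping some of the regressors from this equation.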
The most common evaluation metrics for a regression model include R-squared (R2) and the root mean squared error (RMSE).
The problem with these metrics is that they are sensitive to the inclusion of additional variables in the model, even if those variables do not contribute significantly to explaining the outcome. On the training data, adding variables never decreases R2 and never increases RMSE. Therefore, we need more robust metrics in order to make a proper choice.
Regarding R2, there is an adjusted version, called Adjusted R-squared, which penalizes R2 for the number of variables in the model.
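The adjustment is a one-line formula. The minimal Python sketch below (an illustration only; the report's outputs come from R) applies it to the values reported later for the model with all explanatory variables: R-squared 0.8598, six regressors and 498 residual degrees of freedom, which implies 505 observations. The function name is hypothetical.

```python
def adjusted_r2(r2, n_obs, n_regressors):
    """Adjusted R-squared: 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_regressors - 1)

# Values taken from the summary of the model with all explanatory variables
n_regressors = 6
n_obs = 498 + n_regressors + 1  # residual df + slopes + intercept
adj = adjusted_r2(0.8598, n_obs, n_regressors)
print(round(adj, 4))
```

The result agrees with the Adjusted R-squared of 0.8581 printed in that summary, up to rounding of the inputs.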
Additionally, two other metrics are commonly used for model evaluation and selection: the Akaike information criterion (AIC) and the Bayesian information criterion (BIC).
In the next section, we will use Adjusted R2, AIC and BIC for comparing models.
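To show where AIC and BIC values of this size come from, here is a minimal Python sketch under the assumption of Gaussian errors, where the parameter count includes the error variance. It rebuilds both criteria for the model with S&P 500 and Nasdaq reported later (residual standard error 0.008257 on 502 degrees of freedom). The function names are hypothetical, and small differences come from the rounded inputs.

```python
import math

def gaussian_loglik(rss, n_obs):
    """Maximized Gaussian log-likelihood of a linear model with residual sum of squares rss."""
    return -0.5 * n_obs * (math.log(2 * math.pi) + math.log(rss / n_obs) + 1)

def aic_bic(rss, n_obs, n_coef):
    """AIC and BIC; n_coef counts the intercept and slopes, +1 for the error variance."""
    k = n_coef + 1
    loglik = gaussian_loglik(rss, n_obs)
    return -2 * loglik + 2 * k, -2 * loglik + math.log(n_obs) * k

# Values from the summary of the S&P 500 + Nasdaq model
rse, df_resid, n_coef = 0.008257, 502, 3
n_obs = df_resid + n_coef          # 505 observations
rss = rse ** 2 * df_resid          # residual sum of squares recovered from the RSE
aic, bic = aic_bic(rss, n_obs, n_coef)
print(round(aic, 2), round(bic, 2))
```

Both numbers land within a few hundredths of the reported AIC (-3406.568) and BIC (-3389.67). Lower values are better, and BIC penalizes extra parameters more heavily once the sample has more than a handful of observations.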
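The whole dataset (for each stock/index we picked) is divided into two subsets: an in-sample (training) set with the 2019 and 2020 data and an out-of-sample (test) set with the first six months of 2021.
We will choose the appropriate regression model on the in-sample dataset.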
The first linear model that we will try uses all the explanatory variables listed in the introduction section.
Let’s see the metrics of the fitted model:
##
## Call:
## lm(formula = MSFT_daily_ret_training ~ AAPL_daily_ret_training +
## GOOG_daily_ret_training + IBM_daily_ret_training + MMM_daily_ret_training +
## SP500_daily_ret_training + Nasdaq_daily_ret_training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.035740 -0.004692 -0.000467 0.003942 0.039572
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.904e-07 3.633e-04 -0.003 0.9978
## AAPL_daily_ret_training -2.281e-02 3.088e-02 -0.739 0.4604
## GOOG_daily_ret_training 5.782e-02 3.324e-02 1.740 0.0825 .
## IBM_daily_ret_training -2.810e-02 3.130e-02 -0.898 0.3697
## MMM_daily_ret_training -1.131e-01 2.697e-02 -4.193 3.26e-05 ***
## SP500_daily_ret_training -3.275e-04 1.000e-01 -0.003 0.9974
## Nasdaq_daily_ret_training 1.217e+00 9.958e-02 12.225 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.008085 on 498 degrees of freedom
## Multiple R-squared: 0.8598, Adjusted R-squared: 0.8581
## F-statistic: 509.1 on 6 and 498 DF, p-value: < 2.2e-16
## AIC: -3423.834
## BIC: -3390.037
From the results above, we can see that only two variables are statistically significant (p-value lower than 5%): 3M and Nasdaq daily returns. For these two we can reject the null hypothesis and state that their coefficients are not 0.
The F-statistic is high and its p-value is practically zero, which is further evidence that at least one coefficient is not equal to 0.
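The overall F-statistic can be recovered from R-squared alone. The short Python sketch below (an illustration; the function name is hypothetical) uses the values printed in the summary above: R-squared 0.8598, six regressors and 498 residual degrees of freedom.

```python
def f_statistic(r2, n_regressors, df_resid):
    """Overall F-statistic: (R2 / p) / ((1 - R2) / residual df)."""
    return (r2 / n_regressors) / ((1.0 - r2) / df_resid)

# Values from the summary of the model with all explanatory variables
f_all = f_statistic(0.8598, 6, 498)
print(round(f_all, 1))
```

The result is close to the reported 509.1; the small gap comes from R-squared being rounded to four decimals in the printout.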
Adjusted R-squared is quite high (85.8%), which indicates a good fit.
The residual standard error (also considered a measure of the quality of a linear regression fit) is really low.
We can also see that both AIC and BIC are really low (negative); these values will be used for comparison with the other models.
Let’s now include only S&P500 and Nasdaq daily returns.
##
## Call:
## lm(formula = MSFT_daily_ret_training ~ SP500_daily_ret_training +
## Nasdaq_daily_ret_training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.034970 -0.004828 -0.000531 0.003808 0.037992
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.768e-06 3.701e-04 0.013 0.98973
## SP500_daily_ret_training -2.449e-01 7.835e-02 -3.126 0.00187 **
## Nasdaq_daily_ret_training 1.364e+00 7.387e-02 18.470 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.008257 on 502 degrees of freedom
## Multiple R-squared: 0.8526, Adjusted R-squared: 0.852
## F-statistic: 1452 on 2 and 502 DF, p-value: < 2.2e-16
## AIC: -3406.568
## BIC: -3389.67
From the results above, we can see that both coefficients are statistically significant (p-value lower than 5%). We can reject the null hypothesis and state that these two coefficients are not 0.
The F-statistic is high and its p-value is practically zero, which is further evidence that at least one coefficient is not equal to 0.
Adjusted R-squared is quite high (85.2%), which indicates a good fit.
The residual standard error is really low.
We can also see that both AIC and BIC are really low (negative); these values will be used for comparison with the other models.
Only competitor companies (daily returns) are now explanatory variables:
##
## Call:
## lm(formula = MSFT_daily_ret_training ~ AAPL_daily_ret_training +
## GOOG_daily_ret_training + IBM_daily_ret_training + MMM_daily_ret_training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.041875 -0.005631 -0.000348 0.005148 0.048221
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0001203 0.0004792 0.251 0.802
## AAPL_daily_ret_training 0.3891183 0.0292190 13.317 < 2e-16 ***
## GOOG_daily_ret_training 0.4415925 0.0351632 12.558 < 2e-16 ***
## IBM_daily_ret_training 0.1812611 0.0347953 5.209 2.77e-07 ***
## MMM_daily_ret_training -0.0555939 0.0328250 -1.694 0.091 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01068 on 500 degrees of freedom
## Multiple R-squared: 0.7546, Adjusted R-squared: 0.7527
## F-statistic: 384.4 on 4 and 500 DF, p-value: < 2.2e-16
## AIC: -3145.124
## BIC: -3119.777
Quite interesting results. Now all coefficients are statistically significant except 3M (its p-value is around 9%).
The F-statistic is high and its p-value is practically zero, which is further evidence that at least one coefficient is not equal to 0.
Adjusted R-squared is lower than in the previous models (75.3%), which means the fit is a little worse, but it is still a good result.
The residual standard error is higher than in the previous models.
We can also see that both AIC and BIC are low (negative), but they are higher than in the previous models.
Let’s see what happens if we add the Nasdaq index to the previous model as an explanatory variable:
##
## Call:
## lm(formula = MSFT_daily_ret_training ~ AAPL_daily_ret_training +
## GOOG_daily_ret_training + IBM_daily_ret_training + MMM_daily_ret_training +
## Nasdaq_daily_ret_training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.035739 -0.004693 -0.000467 0.003942 0.039574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.361e-07 3.626e-04 -0.003 0.998
## AAPL_daily_ret_training -2.280e-02 3.069e-02 -0.743 0.458
## GOOG_daily_ret_training 5.782e-02 3.318e-02 1.743 0.082 .
## IBM_daily_ret_training -2.815e-02 2.846e-02 -0.989 0.323
## MMM_daily_ret_training -1.131e-01 2.501e-02 -4.522 7.66e-06 ***
## Nasdaq_daily_ret_training 1.217e+00 6.290e-02 19.350 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.008077 on 499 degrees of freedom
## Multiple R-squared: 0.8598, Adjusted R-squared: 0.8584
## F-statistic: 612.1 on 5 and 499 DF, p-value: < 2.2e-16
## AIC: -3425.834
## BIC: -3396.262
This model is similar to the first model (where we included all explanatory variables).
We can see that only two variables are statistically significant (p-value lower than 5%): 3M and Nasdaq daily returns.
For these two we can reject the null hypothesis and state that their coefficients are not 0.
All other metrics (Adjusted R-squared, RSE, AIC, BIC) are the same or really close.
This model is a candidate for the winner.
Let’s try something similar. Instead of Nasdaq, let’s include S&P500 index.
##
## Call:
## lm(formula = MSFT_daily_ret_training ~ AAPL_daily_ret_training +
## GOOG_daily_ret_training + IBM_daily_ret_training + MMM_daily_ret_training +
## SP500_daily_ret_training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.032511 -0.005088 -0.000183 0.004483 0.046873
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0002047 0.0004134 0.495 0.6207
## AAPL_daily_ret_training 0.1712062 0.0301693 5.675 2.36e-08 ***
## GOOG_daily_ret_training 0.2224156 0.0346152 6.425 3.07e-10 ***
## IBM_daily_ret_training -0.0671826 0.0354718 -1.894 0.0588 .
## MMM_daily_ret_training -0.1850166 0.0299808 -6.171 1.40e-09 ***
## SP500_daily_ret_training 0.9470364 0.0720408 13.146 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.00921 on 499 degrees of freedom
## Multiple R-squared: 0.8177, Adjusted R-squared: 0.8159
## F-statistic: 447.8 on 5 and 499 DF, p-value: < 2.2e-16
## AIC: -3293.298
## BIC: -3263.726
From the results above, we can conclude that all coefficients are statistically significant (p-value lower than 5%) except IBM, whose p-value is slightly above 5%. For IBM we cannot reject the null hypothesis, i.e. we cannot rule out that its coefficient is zero.
The F-statistic is high and its p-value is practically zero, which is further evidence that at least one coefficient is not equal to 0.
Adjusted R-squared is quite high (81.6%), which indicates a good fit. However, it is lower than that of the candidate for the winner.
The residual standard error is low.
We can also see that both AIC and BIC are really low (negative); these values will be used for comparison with the other models.
This model is a good candidate for evaluating forecast performance, which will be described in the next section.
Let’s try Google and 3M:
##
## Call:
## lm(formula = MSFT_daily_ret_training ~ GOOG_daily_ret_training +
## MMM_daily_ret_training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.070744 -0.006218 -0.000997 0.005105 0.066000
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0008171 0.0005718 1.429 0.154
## GOOG_daily_ret_training 0.7784551 0.0323281 24.080 < 2e-16 ***
## MMM_daily_ret_training 0.1376758 0.0324538 4.242 2.64e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01282 on 502 degrees of freedom
## Multiple R-squared: 0.6449, Adjusted R-squared: 0.6435
## F-statistic: 455.8 on 2 and 502 DF, p-value: < 2.2e-16
## AIC: -2962.421
## BIC: -2945.522
From the results above, we can see that both coefficients are statistically significant (p-value lower than 5%).
We can reject the null hypothesis and state that these two coefficients are not 0.
The F-statistic is high and its p-value is practically zero, which is further evidence that at least one coefficient is not equal to 0.
Adjusted R-squared is lower (64%), which still indicates a solid fit.
The residual standard error is higher than in the previous models.
Both AIC and BIC are negative, but higher than in the previous models.
Let’s have a look at the model that includes only IBM and Google daily returns:
##
## Call:
## lm(formula = MSFT_daily_ret_training ~ IBM_daily_ret_training +
## GOOG_daily_ret_training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.067191 -0.006041 -0.000683 0.005641 0.063428
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0008201 0.0005531 1.483 0.139
## IBM_daily_ret_training 0.2448062 0.0335209 7.303 1.11e-12 ***
## GOOG_daily_ret_training 0.6986169 0.0339111 20.601 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0124 on 502 degrees of freedom
## Multiple R-squared: 0.6675, Adjusted R-squared: 0.6661
## F-statistic: 503.8 on 2 and 502 DF, p-value: < 2.2e-16
## AIC: -2995.624
## BIC: -2978.726
Both coefficients are statistically significant, so we can reject the null hypothesis that either of them is zero.
The F-statistic is high and its p-value is practically zero, which is further evidence that at least one coefficient is not equal to 0.
Adjusted R-squared is lower (around 67%), but it is acceptable.
AIC and BIC are higher (less negative) than for the models that include an index, which reflects the weaker fit.
In general, we must check the residuals. If the model is adequate, the residuals should behave like white noise.
Let’s perform Ljung-Box tests for residual independence:
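To make the test statistic less of a black box, here is a minimal Python sketch (an illustration; the report's outputs come from R). It computes the Ljung-Box Q statistic and, for even degrees of freedom only, the chi-square upper-tail probability in closed form. It then checks the first result reported below for the model with all explanatory variables (X-squared = 20.168, df = 4). The function names and the alternating residual series are hypothetical.

```python
import math

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_{k=1..h} rho_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((e - mean) ** 2 for e in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum((residuals[t] - mean) * (residuals[t - k] - mean)
                    for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, series_sum = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        series_sum += term
    return math.exp(-half) * series_sum

# Reported statistic for the model with all explanatory variables
p_value = chi2_sf_even_df(20.168, 4)
print(round(p_value, 7))

# Hypothetical strongly autocorrelated residuals give a very large Q
alternating = [1.0, -1.0] * 50
print(ljung_box_q(alternating, 1))
```

The computed p-value matches the reported 0.0004626. The degrees of freedom in the outputs below (for example 4, 14 and 24 for the six-regressor model) are consistent with lags of 10, 20 and 30 reduced by the number of regressors, although the report does not state this explicitly.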
Model with all explanatory variables
##
## Box-Ljung test
##
## data: model_all$residuals
## X-squared = 20.168, df = 4, p-value = 0.0004626
##
## Box-Ljung test
##
## data: model_all$residuals
## X-squared = 30.023, df = 14, p-value = 0.007577
##
## Box-Ljung test
##
## data: model_all$residuals
## X-squared = 37.746, df = 24, p-value = 0.0368
We see low p-values, so we reject the null hypothesis that there is no serial correlation among the residuals and discard this model.
Model with market stock indexes
##
## Box-Ljung test
##
## data: model_indexes$residuals
## X-squared = 19.11, df = 8, p-value = 0.01428
##
## Box-Ljung test
##
## data: model_indexes$residuals
## X-squared = 31.728, df = 18, p-value = 0.02367
##
## Box-Ljung test
##
## data: model_indexes$residuals
## X-squared = 38.583, df = 28, p-value = 0.08787
The situation is similar for this model: the p-values for the two lower lag settings are below 5%, so we discard it as well.
Model with competitors
##
## Box-Ljung test
##
## data: model_competition$residuals
## X-squared = 26.955, df = 6, p-value = 0.0001476
##
## Box-Ljung test
##
## data: model_competition$residuals
## X-squared = 30.746, df = 16, p-value = 0.0145
##
## Box-Ljung test
##
## data: model_competition$residuals
## X-squared = 42.121, df = 26, p-value = 0.02385
Same reason: all p-values are below 5%. We discard this model as well.
Model with competitors and SP500
##
## Box-Ljung test
##
## data: model_competition_sp500$residuals
## X-squared = 14.827, df = 5, p-value = 0.01113
##
## Box-Ljung test
##
## data: model_competition_sp500$residuals
## X-squared = 19.061, df = 15, p-value = 0.211
##
## Box-Ljung test
##
## data: model_competition_sp500$residuals
## X-squared = 32.449, df = 25, p-value = 0.1454
This is a different story. For the two higher lag settings we cannot reject the null hypothesis of no serial correlation, so we accept this model and use it for further analysis.
Model with Google and 3M
##
## Box-Ljung test
##
## data: model_google_3m$residuals
## X-squared = 19.552, df = 8, p-value = 0.01217
##
## Box-Ljung test
##
## data: model_google_3m$residuals
## X-squared = 23.988, df = 18, p-value = 0.1554
##
## Box-Ljung test
##
## data: model_google_3m$residuals
## X-squared = 34.49, df = 28, p-value = 0.1852
We are going to accept this model, as we cannot reject the null hypothesis of no serial correlation for the higher numbers of lags.
Model with IBM and Google
##
## Box-Ljung test
##
## data: model_ibm_google$residuals
## X-squared = 17.682, df = 8, p-value = 0.02374
##
## Box-Ljung test
##
## data: model_ibm_google$residuals
## X-squared = 23.638, df = 18, p-value = 0.1672
##
## Box-Ljung test
##
## data: model_ibm_google$residuals
## X-squared = 34.114, df = 28, p-value = 0.1971
We accept this one as well, as we cannot reject the null hypothesis of no serial correlation for the higher numbers of lags.
There are several other ways that explanatory information might make its way into residuals:
Now let’s run the tests for homoscedasticity on the remaining models:
Model with competitors and SP500
## # A tibble: 1 x 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 9.67 0.0852 5 Koenker (studentised) greater
The p-value is greater than 5%, therefore we can’t reject the null hypothesis of homoscedasticity.
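The "Koenker (studentised)" label in the output refers to the studentized version of the Breusch-Pagan test. A sketch of the idea, with notation introduced here for illustration:

```latex
\hat{e}_t^{\,2} = \gamma_0 + \gamma_1 x_{1,t} + \dots + \gamma_p x_{p,t} + v_t,
\qquad LM = n \, R^2_{\mathrm{aux}} \ \sim\ \chi^2_p \quad \text{under } H_0:\ \gamma_1 = \dots = \gamma_p = 0
```

The squared residuals of the fitted model are regressed on the same explanatory variables, and the statistic is the sample size times the R-squared of that auxiliary regression. The `parameter` column in the output is the number of regressors p (5 for this model, 2 for the two-variable models).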
Model with Google and 3M
## # A tibble: 1 x 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 3.98 0.136 2 Koenker (studentised) greater
Same situation. We can’t reject the presence of homoscedasticity.
Model with IBM and Google
## # A tibble: 1 x 5
## statistic p.value parameter method alternative
## <dbl> <dbl> <dbl> <chr> <chr>
## 1 5.23 0.0732 2 Koenker (studentised) greater
The same holds for this model, although its p-value is only slightly above the significance level.
In this section, we will evaluate the forecast performance of the three models from the previous section.
The competing models are:
Competitors and S&P 500
Google and 3M
Google and IBM
As mentioned in the ARIMA forecasting section, forecast performance is evaluated over the entire testing data set. We will use a rolling scheme to produce the forecasts. Models will be evaluated in terms of the one-period-ahead forecast and the forecast at a horizon of five periods ahead (one trading week).
## ME RMSE MAE MPE MAPE
## Test set -0.0001485647 0.009173761 0.006475195 86.19317 178.0951
## ME RMSE MAE MPE MAPE
## Test set -0.002011397 0.01179252 0.008358576 101.4903 198.0536
## ME RMSE MAE MPE MAPE
## Test set -0.002039885 0.01143915 0.007983145 87.6192 196.3395
Based on the results above, we can see that the Competitors and S&P 500 model is still the best, because its RMSE (Root Mean Squared Error) is the lowest.
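The columns in the accuracy tables above can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical numbers, not the report's forecasts; errors are defined as actual minus forecast, which is an assumption about the convention used.

```python
import math

def forecast_accuracy(actual, predicted):
    """ME, RMSE, MAE and MAPE (in percent) of a set of forecasts."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }

# Hypothetical daily returns and forecasts
actual = [0.010, -0.020, 0.001]
predicted = [0.008, -0.015, 0.004]
metrics = forecast_accuracy(actual, predicted)
print(metrics)
```

The example also shows why MPE and MAPE are so large in the tables: they divide each error by the actual return, and daily returns are often very close to zero. RMSE and MAE do not have this problem, which is one reason to base the comparison on RMSE.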
Besides RMSE, we will use the DM (Diebold-Mariano) test for comparing two models. It checks whether the difference in forecast accuracy between two models is statistically significant or simply due to the specific choice of data in our sample.
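The core of the test can be sketched in Python as follows. This is a simplified one-step-ahead version with squared-error loss and a normal approximation for the p-value; the R output in this report may rely on a different reference distribution and small-sample corrections, so the numbers are not expected to match exactly. The function name and the error series are hypothetical.

```python
import math
from statistics import NormalDist, mean, variance

def dm_statistic(errors_a, errors_b, power=2):
    """Diebold-Mariano statistic for horizon 1: mean loss differential over its standard error."""
    d = [abs(a) ** power - abs(b) ** power for a, b in zip(errors_a, errors_b)]
    return mean(d) / math.sqrt(variance(d) / len(d))

# Hypothetical forecast errors of two competing models
errors_a = [0.001, -0.002, 0.0015, -0.001, 0.002, -0.0005]
errors_b = [0.003, -0.004, 0.002, -0.003, 0.0035, -0.002]

dm = dm_statistic(errors_a, errors_b)
# One-sided alternative "less": model A has the smaller expected loss
p_less = NormalDist().cdf(dm)
print(dm, p_less)
```

A clearly negative statistic with a small p-value under the "less" alternative says that the first model's squared errors are systematically smaller than the second model's.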
Let’s compare the models:
Comparison Competitors + SP500 and IBM + Google
##
## Diebold-Mariano Test
##
## data: errors_1_winnererrors_1_runner_up
## DM = -2.1568, Forecast horizon = 1, Loss function power = 2, p-value =
## 0.01661
## alternative hypothesis: less
Comparison Competitors + SP500 and Google + 3M
##
## Diebold-Mariano Test
##
## data: errors_1_winnererrors_1_third
## DM = -1.5949, Forecast horizon = 1, Loss function power = 2, p-value =
## 0.05682
## alternative hypothesis: less
Based on these tests, at the 5% level the Competitors and S&P 500 model forecasts significantly better than IBM and Google, but we cannot conclude that it is better than Google and 3M (p-value around 5.7%).
However, we will still keep Competitors and S&P 500 as the favorite because of its lower RMSE.
## ME RMSE MAE MPE MAPE
## Test set -0.000173226 0.009169522 0.006456579 86.26248 178.2542
## ME RMSE MAE MPE MAPE
## Test set -0.00201904 0.01178804 0.008353015 101.1574 198.4184
## ME RMSE MAE MPE MAPE
## Test set -0.002055996 0.01143913 0.007979821 87.25014 196.7166
The metric values are pretty much the same as in the one-period-ahead forecast.
But let’s run the Diebold-Mariano tests again.
Comparison Competitors + SP500 and IBM + Google
##
## Diebold-Mariano Test
##
## data: errors_5_winnererrors_5_runner_up
## DM = -2.3348, Forecast horizon = 5, Loss function power = 2, p-value =
## 0.01069
## alternative hypothesis: less
Comparison Competitors + SP500 and Google + 3M
##
## Diebold-Mariano Test
##
## data: errors_5_winnererrors_5_third
## DM = -1.418, Forecast horizon = 5, Loss function power = 2, p-value =
## 0.07952
## alternative hypothesis: less
The situation is similar to the one-period-ahead forecast. Based on the Diebold-Mariano test, we can’t say that Competitors and S&P 500 is a better model than Google and 3M, as the p-value is almost 8%.
After evaluating forecast performance in the previous section, we can conclude that the regression model with competitors and S&P 500 as explanatory variables is the best fit.
Here is a brief overview of the key metrics for that model:
| Metric | Best Regression model |
|---|---|
| AIC | -3293.298 |
| BIC | -3263.726 |
| RMSE (1-period-ahead) | 0.009173761 |
| RMSE (5-period-ahead) | 0.009169522 |
Now it is time to compare these results with those from Phase 2 - Forecasting ARIMA.
The model that proved to be the best fit there, in all of the phases, in all the tests and for all the metrics used, was ARMA(2,3).
We used different error metrics, so we will convert them in order to compare the results.
For the regression evaluation we used RMSE, and for the ARIMA evaluation we used MSFE (mean squared forecast error).
RMSE is the square root of MSFE, so the modified table for the best-fit ARIMA model, ARMA(2,3), is:
| Metric | ARMA(2,3) |
|---|---|
| AIC | -2542.072 |
| BIC | -2512.500 |
| RMSE (1-period-ahead) | 0.0151360 |
| RMSE (5-period-ahead) | 0.0148121 |
The Competitors and S&P 500 regression model is better on each of these metrics.
Therefore, the ultimate winner is the Competitors and S&P 500 regression model!
Statistics and Financial Data Analysis
A work by: Nikola Krivacevic, Aleksandar Milinkovic and Milos Milunovic
Entire forecasting project on github
(https://github.com/mcf-long-short/statistics-stocks-forecasting)