PCA reduces a p-dimensional dataset to an m-dimensional dataset, where p > m: it describes the original data using fewer variables (dimensions) than were initially measured. We project the original time_ser_diff data onto a new, orthogonal basis. Because the resulting principal components are uncorrelated, this removes multicollinearity. R's prcomp() computes PCA with a singular value decomposition of the centered and scaled data matrix (rather than an eigendecomposition of the covariance matrix), because the SVD is numerically more accurate.
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.4645 1.3724 1.0866 0.65482 0.49411 0.36791 0.1731
## Proportion of Variance 0.6074 0.1883 0.1181 0.04288 0.02441 0.01354 0.0030
## Cumulative Proportion 0.6074 0.7957 0.9138 0.95669 0.98111 0.99464 0.9976
## PC8 PC9 PC10
## Standard deviation 0.13519 0.06179 0.03869
## Proportion of Variance 0.00183 0.00038 0.00015
## Cumulative Proportion 0.99947 0.99985 1.00000
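The proportion of variance is the share of the total variance explained by each principal component (its variance divided by the sum of the variances of all components). The components are sorted in decreasing order of importance, so the most important ones always come first; here, PC1 to PC3 together explain about 91 percent of the variance (cumulative proportion 0.9138).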
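The quantities in the summary above can be reproduced by hand. This is a minimal sketch on simulated data: the columns a, b and c are hypothetical stand-ins for the variables in time_ser_diff. It computes the proportion and cumulative proportion of variance from the component standard deviations, checks that they equal the singular values of the scaled matrix divided by sqrt(n − 1), and shows that the scores are uncorrelated.

```r
# Minimal PCA sketch on simulated, correlated data (a, b, c are hypothetical columns)
set.seed(42)
n <- 200
z <- rnorm(n)
X <- cbind(a = z + rnorm(n, sd = 0.1),
           b = 2 * z + rnorm(n, sd = 0.1),   # nearly collinear with a
           c = rnorm(n))

# prcomp() centers and, with scale. = TRUE, scales the data before taking the SVD
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion and cumulative proportion of variance, as printed by summary(pca)
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var  <- cumsum(prop_var)
round(rbind(prop_var, cum_var), 4)

# The same standard deviations come straight from the SVD of the scaled matrix
sdev_svd <- svd(scale(X))$d / sqrt(n - 1)

# The scores (pca$x) are uncorrelated, which is why PCA removes multicollinearity
scores <- pca$x
round(cor(scores), 6)
```

Here the first component absorbs almost all of the shared variation of a and b, so two components are enough to describe the three columns.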
## PC1 PC2 PC3 PC4 PC5 PC6
## [1,] -2.532253 0.9589985 -0.6856539 -0.4194933 0.6971565 -0.9574271
## [2,] -2.417471 1.1089015 -0.9856733 -0.2229814 0.6730305 -0.8642424
## [3,] -2.397437 1.0653462 -1.1664993 -0.3159122 0.5733262 -0.7703319
## [4,] -2.369972 0.9977712 -1.1642373 -0.3433385 0.6511079 -0.7066144
## [5,] -2.267717 1.0412444 -1.4965310 -0.3725687 0.5711167 -0.6006079
## [6,] -2.265047 1.0854381 -1.4703869 -0.5285890 0.5150894 -0.6393254
## [1] "matrix"
Two non-stationary series are cointegrated when they share a common stochastic trend, so that some linear combination of them is stationary. The cointegration test regresses one series on the other and checks whether the residuals are stationary: if they are, the series are cointegrated, and the correlation between them is statistically meaningful rather than a spurious result due to chance. The test is a Dickey-Fuller-type stationarity test on the residuals, with the null hypothesis that the series are not cointegrated; to conclude that they are cointegrated, we need to reject this null.
The concept of cointegrated time series arises from the idea that housing prices, securities' prices, interest rates and other economic indicators return to their long-term average levels after significant short-term movements. Besides the imbalance between the demand for and the supply of houses, housing prices revert to their means because they are highly correlated with inflation, and inflation rates are in turn highly correlated with wages and real disposable income.
Given two series x(t) and y(t), R searches for parameters α, β and ρ such that
$$y(t) = \alpha + \beta\, x(t) + r(t), \qquad r(t) = \rho\, r(t-1) + \epsilon(t)$$
where r(t) is the residual and ϵ(t) is a series of independently and identically distributed (i.i.d.) innovations with mean 0.
If |ρ| < 1, then x(t) and y(t) are cointegrated (i.e., r(t) does not contain a unit root). If |ρ| = 1, then the residual series r(t) has a unit root and follows a random walk.
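The two-step fit described above can be sketched in base R. This is a minimal illustration on simulated data (the true values α = 5, β = 0.9 and ρ = 0.75 are assumptions), not the code behind the output below, and the package that produced that output may estimate ρ differently. The half-life, log(0.5)/log(ρ), is the number of periods it takes for a deviation from equilibrium to shrink by half; plugging in the ρ of 0.7482 reported below gives about 2.39, matching the printed half life.

```r
# Sketch: y(t) = alpha + beta * x(t) + r(t), then r(t) = rho * r(t-1) + eps(t)
set.seed(1)
n <- 500
x <- cumsum(rnorm(n))                                   # random walk, I(1)
r <- as.numeric(arima.sim(list(ar = 0.75), n = n))      # stationary AR(1) residual (assumed rho)
y <- 5 + 0.9 * x + r                                    # assumed alpha = 5, beta = 0.9

# Step 1: cointegrating regression
fit <- lm(y ~ x)
res <- residuals(fit)

# Step 2: AR(1) coefficient of the residuals (no intercept: residuals have mean 0)
rho_hat <- unname(coef(lm(res[-1] ~ res[-n] - 1)))

# Half-life of a deviation from equilibrium: solve rho^h = 0.5 for h
half_life <- log(0.5) / log(rho_hat)
round(c(beta = unname(coef(fit)[2]), rho = rho_hat, half_life = half_life), 3)

# Half-life implied by the rho of 0.7482 reported in the output
log(0.5) / log(0.7482)
```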
## Y[i] = 0.9736 X[i] + 66.8147 + R[i], R[i] = 0.7482 R[i-1] + eps[i], eps ~ N(0,125.2861^2)
## (0.0215) (69.6048) (0.0314)
##
## R[511] = -864.9835 (t = -4.788)
##
## WARNING: The series seem cointegrated but the residuals are not AR(1).
##
## Unit Root Tests of Residuals
## Statistic p-value
## Augmented Dickey Fuller (ADF) -4.276 0.00583
## Phillips-Perron (PP) -133.946 0.00010
## Pantula, Gonzales-Farias and Fuller (PGFF) 0.736 0.00010
## Elliott, Rothenberg and Stock DF-GLS (ERSD) -4.261 0.00014
## Johansen's Trace Test (JOT) -79.007 0.00010
## Schmidt and Phillips Rho (SPR) -86.676 0.00010
##
## Variances
## SD(diff(X)) = 95.289345
## SD(diff(Y)) = 112.092429
## SD(diff(residuals)) = 133.568139
## SD(residuals) = 180.670994
## SD(innovations) = 125.286102
##
## Half life = 2.389727
## R[last] = -864.983531 (t=-4.79)
Unlike the Augmented Dickey-Fuller and Phillips-Perron unit root tests applied to a single series, this test measures the evidence of cointegration through the residuals of a regression between two time series. In this case, I have regressed housing starts on the private houses completed series.
We cannot use the standard ADF critical values on these residuals: because they come from an estimated regression, under the null hypothesis of no cointegration the test statistics do not follow the Dickey-Fuller distributions. Instead, they follow the Phillips-Ouliaris distributions, which is what the Phillips-Ouliaris test uses.
##
## Phillips-Ouliaris Cointegration Test
##
## data: log(time_ser_diff[, c(1, 8)])
## Phillips-Ouliaris demeaned = -94.948, Truncation lag parameter =
## 5, p-value = 0.01
H0: the two series are not cointegrated
Ha: the two series are cointegrated
The PO test rejects the null of no cointegration at the 5 percent level (p-value = 0.01), so the series are cointegrated. With cointegrated series, we can construct a VEC model to better understand the causal relationship between the two variables.
http://blog.mindymallory.com/2018/02/basic-time-series-analysis-the-var-model-explained/
Stationary, I(0), variables have only a short-run relationship, whereas cointegrated I(1) variables also share a long-run relationship. The error correction model (ECM) captures both the short-run and the long-run effects. The first equation below is the ARDL(1,1) model, which presumes a long-run (steady-state) association between x and y; the second is the error correction model derived from it. When deriving the error correction model, we can add more lagged differences of the regressor x to remove serial correlation.
y_t = δ + θ_1* y_(t−1) + δ_0 * x_t + δ_1 * x_(t−1) + ν_t,
Δy_t = −α * [y_(t−1) − β_1 − β_2 * x_(t−1)] + δ_0 * Δx_t + ν_t
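Subtracting y_(t−1) from both sides of the ARDL(1,1) equation and writing δ_0 * x_t = δ_0 * Δx_t + δ_0 * x_(t−1) shows how the error correction parameters map onto the ARDL ones (assuming θ_1 ≠ 1):

$$
\begin{aligned}
\Delta y_t &= \delta - (1-\theta_1)\, y_{t-1} + \delta_0 x_t + \delta_1 x_{t-1} + \nu_t \\
&= \delta - (1-\theta_1)\, y_{t-1} + (\delta_0 + \delta_1)\, x_{t-1} + \delta_0 \Delta x_t + \nu_t \\
&= -\alpha \left[ y_{t-1} - \beta_1 - \beta_2 x_{t-1} \right] + \delta_0 \Delta x_t + \nu_t,
\end{aligned}
$$

$$\alpha = 1 - \theta_1, \qquad \beta_1 = \frac{\delta}{1-\theta_1}, \qquad \beta_2 = \frac{\delta_0 + \delta_1}{1-\theta_1}.$$

So α measures the speed of adjustment toward the long-run relationship y = β_1 + β_2 * x.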
We can construct the ECM for US housing starts (b_t) and housing supply (f_t) as:
Δb_t = −α * [b_(t−1) − β_1 − β_2 * f_(t−1)] + (δ_0 * Δf_t) + [δ_1 * Δf_(t−1)] + ν_t (estimated below)
We can estimate a vector autoregression model of order 1, VAR(1), if both series are I(0). If they are I(1), we can estimate the same equations by taking first differences.
y_t = β_10 + β_11 * y_(t−1) + β_12 * x_(t−1) + ν_(y,t)
x_t = β_20 + β_21 * y_(t−1) + β_22 * x_(t−1) + ν_(x,t)
If the two variables are cointegrated, we have to include the cointegrating relationship in the model; the resulting model is known as the vector error correction model (VECM). The equation below is the cointegrating relationship, whose error term e_t is stationary.
y_t = β_0 + β_1 * x_t + e_t
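The stationarity tests indicate that both series are I(1). To check for cointegration, I have regressed hous_st on house_supply (without an intercept) and tested whether the residuals e_t are stationary:
$$\mathrm{hous\_st}_t = \beta_1\, \mathrm{house\_supply}_t + e_t, \qquad e_t = \mathrm{hous\_st}_t - \beta_1\, \mathrm{house\_supply}_t$$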
##
## Time series regression with "ts" data:
## Start = 1976(6), End = 2010(6)
##
## Call:
## dynlm(formula = hous_st ~ house_supply - 1, data = train_set_diff)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2105.2 -158.2 301.5 714.0 1292.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## house_supply 212.720 5.456 38.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 710 on 408 degrees of freedom
## Multiple R-squared: 0.7884, Adjusted R-squared: 0.7879
## F-statistic: 1520 on 1 and 408 DF, p-value: < 2.2e-16
##
## Time series regression with "ts" data:
## Start = 1976(7), End = 2010(6)
##
## Call:
## dynlm(formula = d(resid_house1_dyn) ~ L(resid_house1_dyn) - 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -782.26 -97.56 14.08 100.22 578.65
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## L(resid_house1_dyn) -0.02980 0.01275 -2.338 0.0199 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 182.1 on 407 degrees of freedom
## Multiple R-squared: 0.01326, Adjusted R-squared: 0.01083
## F-statistic: 5.468 on 1 and 407 DF, p-value: 0.01985
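The two regressions above can be reproduced with base R's lm(). In this minimal sketch, hous_st and house_supply are simulated placeholders (an assumed coefficient of 210 and AR(0.9) errors), not the actual series, and lm() on lagged vectors stands in for dynlm's d() and L() operators.

```r
set.seed(7)
n <- 400
house_supply <- 6 + cumsum(rnorm(n, sd = 0.2))                     # hypothetical I(1) regressor
hous_st <- 210 * house_supply + arima.sim(list(ar = 0.9), n = n, sd = 50)

# Step 1: cointegrating regression without intercept (hous_st ~ house_supply - 1)
step1 <- lm(hous_st ~ house_supply - 1)
e <- as.numeric(residuals(step1))

# Step 2: Dickey-Fuller regression d(e_t) = gamma * e_(t-1) + v_t, no intercept
de    <- diff(e)
e_lag <- e[-n]
step2 <- lm(de ~ e_lag - 1)
tau   <- summary(step2)$coefficients["e_lag", "t value"]

# tau must be compared with Engle-Granger critical values, not with the
# p-value printed by summary(), which assumes a standard t distribution
tau
```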
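The t-statistic on L(resid_house1_dyn) is −2.338. Its printed p-value (0.0199) is based on the standard t distribution, which does not apply to residuals from an estimated cointegrating regression; the statistic has to be compared with Engle-Granger critical values instead. For a cointegrating regression without an intercept, the 5 percent critical value is about −2.76, so this test on its own does not reject the null of no cointegration, even though the Phillips-Ouliaris test above does. Taking the Phillips-Ouliaris result that the series are cointegrated, we can construct a VEC model to better understand the causal relationship between the two variables.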
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -1.81 6.04 -0.300 0.765
## 2 L(resid_house1_dyn) -0.00291 0.00854 -0.341 0.733
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -0.0209 0.0279 -0.748 0.455
## 2 L(resid_house1_dyn) 0.000131 0.0000395 3.32 0.000980
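In the hous_st equation (the first table, whose intercept is on the scale of housing starts), the coefficient on the error correction term e_(t−1) is statistically insignificant (p = 0.733), suggesting that changes in housing supply do not feed back into housing starts.
In the house_supply equation (the second table), the error correction coefficient is positive and statistically significant (p = 0.00098), suggesting that housing supply adjusts to deviations from the long-run relationship, i.e. changes in housing starts influence housing supply.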
## # A tibble: 5 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 a 0.0964 0.0201 4.80 2.21e- 6
## 2 b1 2814. 229. 12.3 1.06e-29
## 3 b2 -216. 36.2 -5.97 5.14e- 9
## 4 d0 -49.2 10.2 -4.83 1.92e- 6
## 5 d1 -9.89 10.5 -0.942 3.47e- 1
We can also use the error correction model to assess whether the two series are cointegrated, by testing the errors of the correction part for stationarity. The errors are estimated as:
e_(t−1) = b_(t−1) − β_1 − β_2 * f_(t−1)
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 L(ehat) -0.00327 0.00386 -0.846 0.398
## 2 L(d(ehat)) -0.138 0.0493 -2.80 0.00537
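To test for cointegration, we compare the t-statistic of the lagged term, t = −0.846, with the Engle-Granger critical values (about −3.37 at the 5 percent level for a cointegrating relationship with an intercept) rather than relying on the standard p-value. Since −0.846 is far above the critical value (and the standard p-value of 0.398 also exceeds 0.05), we fail to reject the null of no cointegration.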
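The Johansen test checks whether two or more time series are cointegrated and, with three or more series, how many cointegrating relationships there are. If the series are cointegrated, a linear combination of them forms a stationary series. The test starts from a VAR(p) model (with an intercept but no trend) of the form: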
x_t = μ + A_1 * x_(t−1) + … + A_p * x_(t−p) + w_t
μ = vector-valued mean of the series, A_i = coefficient matrices for each lag, w_t = multivariate Gaussian noise term with mean zero.
By differencing the series, we can form a Vector Error Correction model (VECM):
Δx_t = μ + A * x_(t−1) + Γ_1 * Δx_(t−1) +…+ Γ_p * Δx_(t−p) + w_t
Δx_t = x_t − x_(t−1) : differencing operator, A = coefficient matrix for the first lag, Γ_i = matrices for each differenced lag.
When the matrix A=0, the series are not cointegrated.
The test is based on an eigenvalue decomposition of A. If r is the rank of A, the Johansen test sequentially tests the hypotheses r = 0, r ≤ 1, …, r ≤ n − 1, where n is the number of time series under test.
H0: r = 0 implies that no cointegration is present. When the rank r > 0, there is a cointegrating relationship between at least two of the time series.
The eigenvalue decomposition also outputs a set of eigenvectors. The components of the eigenvector associated with the largest eigenvalue are used as the coefficients of the linear combination of the time series, and this combination is stationary. We should run the Johansen test on I(1) variables before estimating an ECM: if the series are not cointegrated, we do not need an ECM.
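To make the procedure concrete, here is a minimal base R sketch of the Johansen trace statistic. It is a simplified illustration (constant only, the x_(t−1) form of the VECM, no critical values), not the implementation behind the output below, which appears to come from urca's ca.jo(); johansen_trace() and the simulated series y1, y2 and y3 are hypothetical. It concentrates out the short-run terms, solves the eigenvalue problem, and computes −T Σ log(1 − λ_i) for each null hypothesis r ≤ r0.

```r
johansen_trace <- function(X, K = 2) {
  X   <- as.matrix(X)
  dX  <- diff(X)                                  # row t holds x_(t+1) - x_t
  idx <- K:nrow(dX)                               # rows with K - 1 lagged differences available
  m   <- length(idx)
  Z0  <- dX[idx, , drop = FALSE]                  # Delta x_t
  Z1  <- X[idx, , drop = FALSE]                   # x_(t-1)
  Z2  <- matrix(1, m, 1)                          # constant ...
  for (j in seq_len(K - 1))                       # ... plus Delta x_(t-j), j = 1, ..., K - 1
    Z2 <- cbind(Z2, dX[idx - j, , drop = FALSE])
  q   <- qr(Z2)
  R0  <- qr.resid(q, Z0)                          # short-run terms concentrated out
  R1  <- qr.resid(q, Z1)
  S00 <- crossprod(R0) / m
  S11 <- crossprod(R1) / m
  S01 <- crossprod(R0, R1) / m
  # Symmetric form of |lambda * S11 - S10 S00^-1 S01| = 0 via the Cholesky factor of S11
  Ci  <- solve(chol(S11))
  M   <- t(Ci) %*% t(S01) %*% solve(S00, S01) %*% Ci
  lambda <- eigen(M, symmetric = TRUE, only.values = TRUE)$values
  n <- ncol(X)
  trace <- sapply(0:(n - 1), function(r) -m * sum(log(1 - lambda[(r + 1):n])))
  list(eigenvalues = lambda, trace = data.frame(r_le = 0:(n - 1), trace = trace))
}

set.seed(123)
n_obs <- 500
trend <- cumsum(rnorm(n_obs))                     # common stochastic trend
X <- cbind(y1 = trend + rnorm(n_obs),
           y2 = 0.5 * trend + rnorm(n_obs),       # y1 - 2 * y2 is stationary
           y3 = cumsum(rnorm(n_obs)))             # unrelated random walk
jo <- johansen_trace(X, K = 2)
jo$eigenvalues
jo$trace
```

With one cointegrating relation built in, the statistic for r = 0 should be very large; each statistic would then be compared with tabulated critical values, as in the ca.jo() output.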
##
## ######################
## # Johansen-Procedure #
## ######################
##
## Test type: trace statistic , with linear trend
##
## Eigenvalues (lambda):
## [1] 0.14624891 0.05501741 0.01938735
##
## Values of teststatistic and critical values of test:
##
## test 10pct 5pct 1pct
## r <= 2 | 9.97 6.50 8.18 11.65
## r <= 1 | 38.77 15.66 17.95 23.52
## r = 0 | 119.25 28.71 31.52 37.22
##
## Eigenvectors, normalised to first column:
## (These are the cointegration relations)
##
## hous_st.l2 pvt_house_comp.l2 house_supply.l2
## hous_st.l2 1.0000000 1.0000000 1.000000
## pvt_house_comp.l2 -0.9976467 -0.6774901 -12.345973
## house_supply.l2 0.1172957 0.9884285 3.759475
##
## Weights W:
## (This is the loading matrix)
##
## hous_st.l2 pvt_house_comp.l2 house_supply.l2
## hous_st.d 0.02084387 -0.083670630 0.0010979229
## pvt_house_comp.d 0.21339732 -0.002545864 0.0006017081
## house_supply.d 0.04838887 -0.011902500 -0.0027458166
The largest eigenvalue generated by the test is 0.14624891.
Next, the output shows the trace statistic for the three hypotheses r ≤ 2, r ≤ 1 and r = 0. For r = 0, 119.25 > 31.52 (the 5 percent critical value), so we reject the null hypothesis of no cointegration. In the second test, we test the null hypothesis r ≤ 1 against the alternative r > 1; as 38.77 > 17.95, we reject r ≤ 1 as well. For r ≤ 2, the statistic of 9.97 exceeds the 5 percent critical value (8.18) but not the 1 percent value (11.65). Rejecting r ≤ 2 would imply a full-rank matrix (r = 3), i.e. that all three series are stationary in levels, which contradicts the finding that they are I(1). At the 1 percent level, we therefore conclude that the matrix's rank is 2, and a linear combination of the three time series is stationary.
We can form the linear combination from the components of the eigenvector associated with the largest eigenvalue, 0.14624891. Correspondingly, we use the first column, hous_st.l2, whose components are (1.000000, −0.9976467, 0.1172957), to obtain a stationary series.
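Forming the linear combination is a single matrix product. In this minimal sketch, make_linear_series() and df_tau() are hypothetical helpers, and df_tau() runs a simple Dickey-Fuller regression with an intercept and no augmentation lags (unlike the ADF test reported below, which uses 7 lags). The simulated data are built so that (1, −2, 0) is cointegrating; with the real data, v would be c(1, -0.9976467, 0.1172957) applied to the hous_st, pvt_house_comp and house_supply columns.

```r
# Hypothetical helper: the combination X %*% v as a plain numeric vector
make_linear_series <- function(X, v) as.numeric(as.matrix(X) %*% v)

# Hypothetical helper: t-statistic of gamma in d(z_t) = c + gamma * z_(t-1) + e_t
df_tau <- function(z) {
  dz    <- diff(z)
  z_lag <- z[-length(z)]
  summary(lm(dz ~ z_lag))$coefficients["z_lag", "t value"]
}

set.seed(123)
n_obs <- 500
trend <- cumsum(rnorm(n_obs))
X <- cbind(y1 = trend + rnorm(n_obs),
           y2 = 0.5 * trend + rnorm(n_obs),
           y3 = cumsum(rnorm(n_obs)))

v <- c(1, -2, 0)                         # cointegrating vector by construction
linear_series <- make_linear_series(X, v)
df_tau(linear_series)                    # compare with the DF 5% critical value of about -2.86
```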
##
## Augmented Dickey-Fuller Test
##
## data: linear_series
## Dickey-Fuller = -4.2937, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
The p-value in the Dickey-Fuller test is 0.01 < 0.05. So, we reject the null hypothesis of unit root and conclude that the series formed from the linear combination is stationary.
VAR treats all variables symmetrically: each is modeled as endogenous, and each can affect all the others.
As a generalization of the univariate autoregressive model, a VAR forecasts a vector of time series. The system has one equation per variable, and the right-hand side of each equation contains a constant and lags of all the variables.
If the series are stationary, we can fit the VAR directly to the data and forecast; this is called a “VAR in levels”. Otherwise, we difference the non-stationary data first and then fit the model; the result is called a “VAR in differences”. Using non-stationary variables in levels can result in spurious regression, and differencing remedies the problem. In both cases, the model is estimated by least squares.
Moreover, non-stationary series may be cointegrated, which means that some linear combination of them is stationary. In this scenario, we should build a vector error correction model, i.e. a VAR model with an error correction mechanism.
A VAR in differences can be used when the variables under study are I(1) but not cointegrated:
$$\Delta y_t = \beta_{11}\, \Delta y_{t-1} + \beta_{12}\, \Delta x_{t-1} + \nu_{y,t}, \qquad \Delta x_t = \beta_{21}\, \Delta y_{t-1} + \beta_{22}\, \Delta x_{t-1} + \nu_{x,t}$$
## AIC(n) HQ(n) SC(n) FPE(n)
## 5 4 3 5
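The R output shows the lag length selected by each of the information criteria available in the vars package: AIC and FPE are minimized at p = 5, HQ at p = 4 and SC at p = 3. Note that the model estimated below includes only the first lag of each endogenous variable (hous_st.l1 and income.l1), i.e. it is a VAR(1) for housing starts and income, with a constant, a trend, and fed_fundsR, yield_sp, sec_conL, unempR and CPI as exogenous regressors.
The vars package estimates each equation of the VAR by OLS. The model has the form y_t = A_1 * y_(t−1) + … + A_p * y_(t−p) + C * D_t + u_t,
where y_t is a K × 1 vector of endogenous variables, u_t is a spherical disturbance term of the same dimension, D_t holds the deterministic regressors (such as the constant and trend) and C their coefficients. The coefficient matrices A_1, …, A_p are of dimension K × K.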
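Both steps, choosing p by an information criterion and estimating each equation by OLS, can be sketched in base R. This is a minimal illustration on a simulated VAR(1) (the matrix A1 is an assumption), with a hypothetical helper var_ols(); the AIC used here, log det(Σ_u) + 2pK²/T on a common sample, is an assumed form that omits the penalty for deterministic terms, which is the same for every p and so does not change the selected lag.

```r
# Hypothetical helper: OLS estimate of a VAR(p) with a constant, on the sample
# left after dropping pmax initial observations (so AICs are comparable across p)
var_ols <- function(Y, p, pmax = p) {
  Y <- as.matrix(Y)
  K <- ncol(Y)
  E <- embed(Y, pmax + 1)                        # columns: y_t, y_(t-1), ..., y_(t-pmax)
  y_now <- E[, 1:K, drop = FALSE]
  lags  <- E[, (K + 1):(K * (p + 1)), drop = FALSE]
  Z <- cbind(const = 1, lags)
  B <- qr.solve(Z, y_now)                        # (1 + K*p) x K, one column per equation
  U <- y_now - Z %*% B
  Tn <- nrow(U)
  Sigma <- crossprod(U) / Tn
  list(coef = B, resid = U, aic = log(det(Sigma)) + 2 * p * K^2 / Tn)
}

# Simulated stable VAR(1); A1 is an assumed coefficient matrix
set.seed(99)
n_obs <- 400
A1 <- matrix(c(0.5, 0.1, 0.2, 0.4), 2)
Y <- matrix(0, n_obs, 2)
for (t in 2:n_obs) Y[t, ] <- A1 %*% Y[t - 1, ] + rnorm(2)

aics   <- sapply(1:5, function(p) var_ols(Y, p, pmax = 5)$aic)
p_best <- which.min(aics)
fit    <- var_ols(Y, p_best, pmax = 5)

# With p = 1, the transposed lag coefficients estimate A1
fit1 <- var_ols(Y, 1)
round(t(fit1$coef[-1, ]), 2)
```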
##
## ===========================================================
## Dependent variable:
## ----------------------------
## Housing Starts, Income
## (1) (2)
## -----------------------------------------------------------
## hous_st.l1 0.924*** 0.026**
## (0.019) (0.012)
##
## income.l1 -0.006 0.871***
## (0.034) (0.021)
##
## const 369.634* 642.167***
## (199.954) (122.081)
##
## trend 0.252 3.050***
## (1.280) (0.781)
##
## fed_fundsR -6.722 1.831
## (4.557) (2.782)
##
## yield_sp 3.740 1.359
## (13.131) (8.017)
##
## sec_conL 0.013 0.121***
## (0.048) (0.029)
##
## unempR -8.297 2.309
## (5.497) (3.356)
##
## CPI -1.348 -3.233**
## (2.513) (1.534)
##
## -----------------------------------------------------------
## Observations 510 510
## R2 0.928 1.000
## Adjusted R2 0.926 0.999
## Residual Std. Error (df = 501) 110.069 67.202
## F Statistic (df = 8; 501) 802.658*** 125,643.400***
## ===========================================================
## Note: *p<0.1; **p<0.05; ***p<0.01
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var1
## Chi-squared = 106.87, df = 36, p-value = 5.972e-09
H0: no serial correlation
H1: serial correlation is present
The p-value (5.972e-09) is far below 0.05, so we reject the null hypothesis: the residuals of this model fail the test and are serially correlated.
We implement Granger causality with an F-test on the lags of the other variables. It tests whether the lags of a variable are useful in forecasting housing starts, and vice versa.
For instance, fed_fundsR Granger-causes hous_st if hous_st can be predicted more accurately from the lagged values of both hous_st and fed_fundsR than from the lagged values of hous_st alone. Thus, the Granger causality test examines whether lagged values of one variable improve the forecasts of another variable.
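The idea can be illustrated with two nested single-equation regressions and an F-test. This is a minimal sketch on simulated data in which x helps predict y by construction; the causality() output below uses a system-based version of the test on the fitted VAR, so its degrees of freedom differ from this simplified version.

```r
set.seed(5)
n <- 500
x <- as.numeric(arima.sim(list(ar = 0.5), n = n))
y <- numeric(n)
for (t in 2:n) y[t] <- 0.5 * y[t - 1] + 0.3 * x[t - 1] + rnorm(1)   # lagged x enters y

d <- data.frame(y = y[-1], y_lag = y[-n], x_lag = x[-n])
restricted   <- lm(y ~ y_lag, data = d)           # lagged y only
unrestricted <- lm(y ~ y_lag + x_lag, data = d)   # lagged y and lagged x
gc <- anova(restricted, unrestricted)             # F-test of H0: x does not Granger-cause y
gc
```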
## $Granger
##
## Granger causality H0: hous_st do not Granger-cause income
##
## data: VAR object var1
## F-Test = 4.9632, df1 = 1, df2 = 1002, p-value = 0.02611
##
##
## $Instant
##
## H0: No instantaneous causality between: hous_st and income
##
## data: VAR object var1
## Chi-squared = 4.6197, df = 1, p-value = 0.03161
## $Granger
##
## Granger causality H0: income do not Granger-cause hous_st
##
## data: VAR object var1
## F-Test = 0.027988, df1 = 1, df2 = 1002, p-value = 0.8672
##
##
## $Instant
##
## H0: No instantaneous causality between: income and hous_st
##
## data: VAR object var1
## Chi-squared = 4.6197, df = 1, p-value = 0.03161
Three of the four p-values are below 0.05, although the two instantaneous-causality results are the same test reported twice. We reject the null hypothesis that hous_st does not Granger-cause income (p = 0.026), so past housing starts help predict income, and we reject the null of no instantaneous causality between the two series (p = 0.032).
In contrast, we fail to reject the null hypothesis that income does not Granger-cause hous_st (p = 0.867). This means that past income does not play much role in predicting today's housing starts.
The ARCH-LM test with q lags checks for the presence of ARCH effects at lags 1 to q. It tests whether the coefficients α_1, …, α_q in the equation below are jointly zero:
x^2_t = α_0 + α_1 * x^2_(t-1) +….+ α_q * x^2_(t-q) + ϵ_t
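A univariate version of the test is easy to sketch: regress the squared series on q of its own lags and compare T * R² with a χ²(q) distribution. The helper arch_lm_test() and the simulated ARCH(1) series are hypothetical; the output below comes from a multivariate ARCH test on the VAR residuals, which generalizes this idea.

```r
# Hypothetical helper: LM test of H0: alpha_1 = ... = alpha_q = 0
arch_lm_test <- function(x, q) {
  x2 <- as.numeric(x)^2
  E  <- embed(x2, q + 1)                    # column 1: x2_t; columns 2..q+1: x2_(t-1), ..., x2_(t-q)
  fit  <- lm(E[, 1] ~ E[, -1, drop = FALSE])
  stat <- nrow(E) * summary(fit)$r.squared  # LM statistic T * R^2
  c(statistic = stat, df = q, p.value = pchisq(stat, df = q, lower.tail = FALSE))
}

set.seed(3)
n <- 1000
eps <- numeric(n)
for (t in 2:n) eps[t] <- sqrt(0.2 + 0.5 * eps[t - 1]^2) * rnorm(1)   # ARCH(1) series

res_arch <- arch_lm_test(eps, q = 2)        # ARCH effects present by construction
res_iid  <- arch_lm_test(rnorm(n), q = 2)   # i.i.d. noise for comparison
rbind(arch = res_arch, iid = res_iid)
```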
##
## ARCH (multivariate)
##
## data: Residuals of VAR object var1
## Chi-squared = 144.39, df = 18, p-value < 2.2e-16
When q = 2, we test for ARCH effects jointly at lags 1 and 2, with H0: α_1 = α_2 = 0. As the p-value is very small, we reject the null hypothesis and conclude that ARCH effects are present at lags 1 and 2 jointly. ARCH effects are also present at higher lag orders, implying that the data is conditionally heteroskedastic.
An impulse response function (IRF) shocks one variable, say income, and propagates the shock through the fitted VAR model for a number of periods. Tracing it through the model shows whether the shock affects the other variables in a statistically significant way.
An impulse (shock) to housing starts at time zero has large effects in the next period, and the effects grow over time. The dotted lines show the 95 percent interval estimates of these effects. The vars package also prints the values corresponding to the impulse response graphs.
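For a VAR(1) y_t = A * y_(t−1) + u_t, the orthogonalized response at horizon h is A^h * P, where P is the lower-triangular Cholesky factor of the residual covariance Σ_u. The sketch below assumes a VAR(1) whose coefficients are rounded from the hous_st.l1 and income.l1 estimates above (ignoring the exogenous regressors), and a hypothetical Σ_u built from the residual standard errors with an assumed correlation of 0.2; irf_var1() is a hypothetical helper, not the vars implementation, and it omits the bootstrap intervals.

```r
# Hypothetical helper: orthogonalized impulse responses of a VAR(1)
irf_var1 <- function(A, Sigma, horizon = 10) {
  K <- nrow(A)
  P <- t(chol(Sigma))                       # lower triangular, Sigma = P %*% t(P)
  out <- array(NA_real_, dim = c(horizon + 1, K, K),
               dimnames = list(h = 0:horizon, response = rownames(A), shock = colnames(A)))
  Phi <- diag(K)                            # A^0
  for (h in 0:horizon) {
    out[h + 1, , ] <- Phi %*% P             # responses at horizon h to one-s.d. orthogonal shocks
    Phi <- A %*% Phi                        # A^(h + 1)
  }
  out
}

vars_names <- c("hous_st", "income")
A <- matrix(c(0.924, 0.026, -0.006, 0.871), 2,
            dimnames = list(vars_names, vars_names))          # rows: equations, columns: lagged variables
Sigma <- matrix(c(110^2, 0.2 * 110 * 67, 0.2 * 110 * 67, 67^2), 2)   # hypothetical covariance
ir <- irf_var1(A, Sigma, horizon = 20)
round(ir[c("0", "1", "5", "20"), "income", "hous_st"], 2)     # response of income to a hous_st shock
```

The Cholesky ordering (hous_st first) matters: the first variable's shock can move the second one on impact, but not the other way round.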
Forecast error variance decomposition (FEVD) uses the fitted VAR model to examine how the variables affect one another. For each equation, it splits the forecast error variance at each horizon into the shares coming from unexpected changes (shocks) in each variable of the system.
Variance decomposition helps to interpret the VAR model: it shows how much of the variation in each dependent variable is explained by each of the variables. FEVD shows how a shock to one time series changes the future uncertainty of the other time series in the system. Because this evolves over the forecast horizon, a shock to a time series may be unimportant in the short run but very significant in the long run.
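The shares can be computed from the orthogonalized impulse responses Θ_h = A^h * P: the share of shock k in the h-step forecast error variance of variable j is the sum of squared responses of j to k over horizons 0 to h − 1, divided by the same sum over all shocks. This minimal sketch uses the same assumed VAR(1) coefficients and hypothetical covariance as a stand-in for the fitted model; fevd_var1() is a hypothetical helper, not the vars implementation.

```r
# Hypothetical helper: FEVD of a VAR(1) with Cholesky-orthogonalized shocks
fevd_var1 <- function(A, Sigma, horizon = 10) {
  K <- nrow(A)
  P <- t(chol(Sigma))
  Phi <- diag(K)
  cum_sq <- matrix(0, K, K)                     # running sum of squared orthogonalized responses
  shares <- array(NA_real_, dim = c(horizon, K, K),
                  dimnames = list(h = 1:horizon, variable = rownames(A), shock = colnames(A)))
  for (h in 1:horizon) {
    Theta  <- Phi %*% P                         # responses at horizon h - 1
    cum_sq <- cum_sq + Theta^2
    shares[h, , ] <- cum_sq / rowSums(cum_sq)   # each variable's shares sum to 1
    Phi <- A %*% Phi
  }
  shares
}

vars_names <- c("hous_st", "income")
A <- matrix(c(0.924, 0.026, -0.006, 0.871), 2, dimnames = list(vars_names, vars_names))
Sigma <- matrix(c(110^2, 0.2 * 110 * 67, 0.2 * 110 * 67, 67^2), 2)   # hypothetical covariance
fe <- fevd_var1(A, Sigma, horizon = 20)
round(fe[c(1, 5, 20), "income", ], 3)           # income's forecast error variance by source
```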
In the first plot, we see the FEVD for housing starts. Although we were borderline on whether the federal funds rate Granger-causes housing starts, the FEVD reveals that the magnitude of this effect is tiny anyway, while the contribution of income to housing starts is larger.
In the second plot, we see the FEVD for income, for which the Granger-causality evidence from housing starts and the federal funds rate was borderline.
## #############
## ###Model VECM
## #############
## Full sample size: 409 End sample size: 405
## Number of variables: 5 Number of estimated slope parameters 85
## AIC 6486.702 BIC 6843.048 SSR 5985445
## Cointegrating vector (estimated by ML):
## hous_st income fed_fundsR yield_sp sec_conL
## r1 1 0.891575 49.52901 -79.69626 -2.224757
##
##
## ECT Intercept
## Equation hous_st -0.0223(0.0190) 125.8165(108.1263)
## Equation income 0.0127(0.0103) -48.1334(58.5937)
## Equation fed_fundsR 4.3e-05(9.0e-05) -0.2698(0.5148)
## Equation yield_sp -5.4e-05(3.5e-05) 0.3205(0.1981)
## Equation sec_conL 0.0061(0.0009)*** -33.4027(5.2147)***
## hous_st -1 income -1
## Equation hous_st -0.3824(0.0517)*** 0.0734(0.0941)
## Equation income -0.0091(0.0280) -0.3562(0.0510)***
## Equation fed_fundsR 3.8e-05(0.0002) -2.7e-05(0.0004)
## Equation yield_sp -0.0002(9.5e-05) -0.0001(0.0002)
## Equation sec_conL -0.0074(0.0025)** -0.0062(0.0045)
## fed_fundsR -1 yield_sp -1
## Equation hous_st -22.5685(13.5130). -14.2490(35.1534)
## Equation income -10.3410(7.3227) -30.8697(19.0496)
## Equation fed_fundsR 0.2211(0.0643)*** -1.1347(0.1674)***
## Equation yield_sp 0.0047(0.0248) 0.3397(0.0644)***
## Equation sec_conL -0.3420(0.6517) -0.2964(1.6954)
## sec_conL -1 hous_st -2
## Equation hous_st -0.2307(1.0368) -0.1705(0.0539)**
## Equation income 0.3111(0.5618) 0.0368(0.0292)
## Equation fed_fundsR 0.0040(0.0049) 0.0005(0.0003).
## Equation yield_sp 2.5e-05(0.0019) -0.0002(9.9e-05)*
## Equation sec_conL 0.1392(0.0500)** -0.0008(0.0026)
## income -2 fed_fundsR -2
## Equation hous_st 0.0762(0.0966) -11.5409(13.5994)
## Equation income -0.2787(0.0523)*** 0.2188(7.3695)
## Equation fed_fundsR 0.0003(0.0005) -0.1286(0.0647)*
## Equation yield_sp -0.0002(0.0002) -0.0420(0.0249).
## Equation sec_conL 0.0054(0.0047) -0.1984(0.6559)
## yield_sp -2 sec_conL -2
## Equation hous_st -26.3782(36.1831) 0.6551(1.0374)
## Equation income -18.6359(19.6077) 0.3534(0.5622)
## Equation fed_fundsR 0.4183(0.1723)* -0.0032(0.0049)
## Equation yield_sp -0.3196(0.0663)*** 0.0007(0.0019)
## Equation sec_conL 1.2717(1.7450) 0.0900(0.0500).
## hous_st -3 income -3
## Equation hous_st -0.0117(0.0508) -0.0403(0.0943)
## Equation income 0.0093(0.0275) -0.2130(0.0511)***
## Equation fed_fundsR -0.0002(0.0002) 0.0001(0.0004)
## Equation yield_sp -2.2e-05(9.3e-05) -0.0002(0.0002)
## Equation sec_conL -0.0025(0.0024) 0.0042(0.0045)
## fed_fundsR -3 yield_sp -3
## Equation hous_st -15.6448(12.4366) 28.2351(36.7446)
## Equation income 5.8998(6.7394) 25.0770(19.9119)
## Equation fed_fundsR -0.1635(0.0592)** -0.5129(0.1749)**
## Equation yield_sp 0.0724(0.0228)** 0.1327(0.0673)*
## Equation sec_conL -0.1802(0.5998) 1.5537(1.7721)
## sec_conL -3
## Equation hous_st -0.1031(1.0183)
## Equation income 0.2590(0.5518)
## Equation fed_fundsR -0.0019(0.0048)
## Equation yield_sp 0.0023(0.0019)
## Equation sec_conL 0.1732(0.0491)***
Generalized Autoregressive Conditional Heteroskedastic (GARCH) models are useful for analysing and forecasting the volatility of time series data. The univariate GARCH(1,1) model helps in modeling volatility and its clustering.
Financial time series exhibit volatility clustering: the volatility of the variable changes over time, with turbulent periods followed by turbulent periods and calm periods followed by calm ones. Technically, this behavior is called conditional heteroskedasticity. Because ARMA models do not account for volatility clustering (they are not conditionally heteroskedastic), we need ARCH and GARCH models for predictions.
Such models include the Autoregressive Conditional Heteroskedastic (ARCH) model and the Generalised Autoregressive Conditional Heteroskedastic (GARCH) model. Different forms of volatility, such as sell-offs during a financial crisis, can cause serially correlated heteroskedasticity. The ARCH-LM test above indicates that the time_ser data is conditionally heteroskedastic.
Most GARCH models are estimated by maximum likelihood, and they are typically applied to returns, such as the relative loss or profit from trading stocks in a day. If x_t is the value of housing starts at time t, then r_t = [x_t − x_(t−1)]/x_(t−1) is called the return. We observe large volatility around the 2008 financial crisis, and returns that are mostly noise with short periods of large variability.
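Both the simple return defined above and the log return used in the rest of this section are one-liners in R; the levels below are made up for illustration. The log return log(x_t / x_(t−1)) equals log(1 + r_t), so it is close to r_t when changes are small.

```r
x <- c(100, 110, 99, 102)                 # made-up levels
simple_ret <- diff(x) / head(x, -1)       # r_t = (x_t - x_(t-1)) / x_(t-1)
log_ret    <- diff(log(x))                # log(x_t / x_(t-1)) = log(1 + r_t)
round(cbind(simple_ret, log_ret), 4)
```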
We also calculate the autocorrelations and partial autocorrelations for the log returns.
## An object of class "SampleAutocorrelations"
## Slot *data*:
## An object of class "Lagged1d"
## Slot *data*:
## Lag_0 Lag_1 Lag_2 Lag_3 Lag_4
## 1.000000000 -0.337696071 0.020669913 0.060895883 0.026866374
## Lag_5 Lag_6 Lag_7 Lag_8 Lag_9
## 0.014099809 0.027820931 -0.060173170 0.046525598 0.021256194
## Lag_10 Lag_11 Lag_12 Lag_13 Lag_14
## -0.061764020 0.059106842 -0.112716964 0.129317611 -0.025322512
## Lag_15 Lag_16 Lag_17 Lag_18 Lag_19
## 0.025161904 0.040911800 -0.061328537 0.003417615 0.077785312
## Lag_20 Lag_21 Lag_22 Lag_23 Lag_24
## -0.072004007 0.009488142 0.056320013 0.031118713 -0.142468122
## Lag_25 Lag_26 Lag_27
## 0.077254000 0.047208565 -0.011071597
## Slot n:
## [1] 510
## Slot varnames:
## character(0)
## Slot objectname:
## [1] "x"
## Slot *data*:
## An object of class "Lagged1d"
## Slot *data*:
## Lag_0 Lag_1 Lag_2 Lag_3 Lag_4
## 1.000000000 -0.337696071 -0.105386902 0.037692151 0.073121144
## Lag_5 Lag_6 Lag_7 Lag_8 Lag_9
## 0.060452379 0.057317377 -0.044570006 0.002237136 0.031374239
## Lag_10 Lag_11 Lag_12 Lag_13 Lag_14
## -0.043581610 0.027240530 -0.104315715 0.072712156 0.041212694
## Lag_15 Lag_16 Lag_17 Lag_18 Lag_19
## 0.061616800 0.079374862 -0.044826188 -0.043220838 0.040111390
## Lag_20 Lag_21 Lag_22 Lag_23 Lag_24
## -0.024061621 -0.014164097 0.038575007 0.099078391 -0.136704372
## Lag_25 Lag_26 Lag_27
## -0.002251135 0.072969266 0.037366431
Routine portmanteau tests, such as Ljung-Box, also reject the IID hypothesis.
##
## Ljung-Box test
##
## data: Residuals from ETS(A,N,N)
## Q* = 106.72, df = 22, p-value = 4.226e-13
##
## Model df: 2. Total lags used: 24
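Ljung-Box on the series itself can miss dependence that only shows up in the squares. Applying the same test to the squared series (the idea behind the McLeod-Li test) picks up volatility clustering. This minimal sketch uses base R's Box.test() on a simulated GARCH(1,1)-type series with assumed parameters 1e-4, 0.1 and 0.85; when testing residuals of a fitted ARMA model, Box.test()'s fitdf argument should be set to the number of estimated ARMA coefficients.

```r
set.seed(2019)
n <- 1000
eps <- numeric(n)
s2  <- numeric(n)
s2[1]  <- 1e-4 / (1 - 0.1 - 0.85)           # start at the unconditional variance
eps[1] <- sqrt(s2[1]) * rnorm(1)
for (t in 2:n) {
  s2[t]  <- 1e-4 + 0.1 * eps[t - 1]^2 + 0.85 * s2[t - 1]
  eps[t] <- sqrt(s2[t]) * rnorm(1)
}
lb_levels  <- Box.test(eps,   lag = 10, type = "Ljung-Box")   # serial correlation in returns
lb_squares <- Box.test(eps^2, lag = 10, type = "Ljung-Box")   # serial correlation in squared returns
c(levels = lb_levels$p.value, squares = lb_squares$p.value)
```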
Here we carry out IID tests using the method of Li-McLeod:
## ChiSq DF pvalue
## [1,] 60.76776 5 8.434054e-12
## [2,] 66.36748 10 2.217507e-10
## [3,] 92.61914 20 2.573373e-11
## attr(,"method")
## [1] "LiMcLeod"
Small p-values reject the null hypothesis at the 0.05 level. Rejection of the IID null is often taken to mean that the data are autocorrelated, although dependence in higher moments, such as volatility clustering, can also cause it. I have therefore fit a GARCH-type model, assuming that the log returns are GARCH, and changed the null hypothesis to “garch” (one possible weak white noise hypothesis):
## h Q pval
## [1,] 5 44.44354 1.882400e-08
## [2,] 10 46.52845 1.150233e-06
## [3,] 20 58.15016 1.371371e-05
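The low p-values give reason to reject the hypothesis that the log returns are a GARCH-type white noise process. The returns therefore show autocorrelation beyond what a GARCH process alone explains, so we should also model the conditional mean with an ARMA model.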
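We fit GARCH models, starting with a GARCH(1,1) model with Gaussian innovations. GARCH(1,1) contains one lag of the squared shock (the ARCH term) and one lag of the conditional variance (the GARCH term), analogous to a single autoregressive and moving average lag. The model is:
$$\epsilon_t = \sigma_t\, w_t, \qquad \sigma_t^2 = \alpha_0 + \alpha_1\, \epsilon_{t-1}^2 + \beta_1\, \sigma_{t-1}^2$$
where w_t is i.i.d. with mean 0 and variance 1. It is necessary that α_1 + β_1 < 1; otherwise the variance is not finite and the series becomes unstable.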
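Simulating the recursion makes the stationarity condition concrete: when α_1 + β_1 < 1, the sample variance settles near the unconditional variance α_0 / (1 − α_1 − β_1). In this minimal sketch, simulate_garch11() is a hypothetical helper and the parameters α_0 = 0.05, α_1 = 0.1 and β_1 = 0.85 are assumptions, chosen so that the unconditional variance is 1.

```r
# Hypothetical helper: simulate a Gaussian GARCH(1,1) with a burn-in period
simulate_garch11 <- function(n, alpha0, alpha1, beta1, burn = 500) {
  stopifnot(alpha0 > 0, alpha1 >= 0, beta1 >= 0, alpha1 + beta1 < 1)
  total  <- n + burn
  eps    <- numeric(total)
  sigma2 <- numeric(total)
  sigma2[1] <- alpha0 / (1 - alpha1 - beta1)   # start at the unconditional variance
  eps[1]    <- sqrt(sigma2[1]) * rnorm(1)
  for (t in 2:total) {
    sigma2[t] <- alpha0 + alpha1 * eps[t - 1]^2 + beta1 * sigma2[t - 1]
    eps[t]    <- sqrt(sigma2[t]) * rnorm(1)    # eps_t = sigma_t * w_t
  }
  keep <- (burn + 1):total
  list(eps = eps[keep], sigma2 = sigma2[keep])
}

set.seed(10)
sim <- simulate_garch11(20000, alpha0 = 0.05, alpha1 = 0.1, beta1 = 0.85)
c(sample_var = var(sim$eps), theoretical = 0.05 / (1 - 0.1 - 0.85))
```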
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 1), data = hous_st_approx_return,
## trace = FALSE)
##
## Mean and Variance Equation:
## data ~ garch(1, 1)
## <environment: 0x55d680be9810>
## [data = hous_st_approx_return]
##
## Conditional Distribution:
## norm
##
## Coefficient(s):
## mu omega alpha1 beta1
## 7.2564e-06 6.5094e-05 5.4054e-02 9.3618e-01
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 7.256e-06 3.183e-03 0.002 0.99818
## omega 6.509e-05 5.483e-05 1.187 0.23512
## alpha1 5.405e-02 1.911e-02 2.829 0.00468 **
## beta1 9.362e-01 2.128e-02 43.993 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 573.6862 normalized: 1.124875
##
## Description:
## Thu Feb 28 16:33:06 2019 by user:
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 3.509322 0.1729658
## Shapiro-Wilk Test R W 0.9939851 0.04105379
## Ljung-Box Test R Q(10) 61.96807 1.534529e-09
## Ljung-Box Test R Q(15) 79.20523 9.757384e-11
## Ljung-Box Test R Q(20) 88.16705 1.547757e-10
## Ljung-Box Test R^2 Q(10) 30.61482 0.0006790964
## Ljung-Box Test R^2 Q(15) 44.02928 0.0001088094
## Ljung-Box Test R^2 Q(20) 48.81851 0.0003261925
## LM Arch Test R TR^2 25.59477 0.01224274
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -2.234063 -2.200852 -2.234185 -2.221043
The Ljung-Box diagnostics imply that the standardised residuals and their squares are not IID, and the LM Arch test (p = 0.012) indicates that the model does not fully accommodate the ARCH effects. The Jarque-Bera test does not reject normality (p = 0.173), but the Shapiro-Wilk test does at the 5 percent level (p = 0.041), so the Gaussian assumption is borderline; if it does not hold, we can try another conditional distribution.
Another possible problem is that alpha_1 + beta_1 ≈ 0.990 is very close to 1.
The persistence of a GARCH model measures the rate at which large volatilities decay after a shock. The key statistic in GARCH(1,1) is the sum of two parameters: alpha1 and beta1.
Ideally, alpha_1 + beta_1 < 1, in which case the effect of a volatility shock decays exponentially. If alpha_1 + beta_1 = 1 (integrated GARCH), shocks to volatility persist indefinitely, and if alpha_1 + beta_1 > 1, the volatility predictions are explosive.
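In GARCH(1,1), a shock to the conditional variance decays like (α_1 + β_1)^h, so the half-life of a volatility shock is log(0.5) / log(α_1 + β_1). The hypothetical helper below plugs in the estimates from the two GARCH(1,1) fits reported here: for the Gaussian fit, the sum is about 0.990 and the half-life is roughly 70 periods, while for the skewed Student t fit, the sum is about 0.664 and the half-life is under 2 periods.

```r
# Hypothetical helper: persistence and half-life of a volatility shock in GARCH(1,1)
garch_persistence <- function(alpha1, beta1) {
  p <- alpha1 + beta1
  list(persistence = p,
       half_life = if (p < 1) log(0.5) / log(p) else Inf)   # periods for a shock to halve
}

g_norm <- garch_persistence(5.4054e-02, 9.3618e-01)   # GARCH(1,1), Gaussian innovations
g_sstd <- garch_persistence(0.31156371, 0.35249518)   # GARCH(1,1), skewed Student t
sapply(list(norm = g_norm, sstd = g_sstd), unlist)
```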
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~garch(1, 1), data = hous_st_return, cond.dist = c("sstd"),
## trace = FALSE)
##
## Mean and Variance Equation:
## data ~ garch(1, 1)
## <environment: 0x55d683b7da38>
## [data = hous_st_return]
##
## Conditional Distribution:
## sstd
##
## Coefficient(s):
## mu omega alpha1 beta1 skew shape
## 0.00038759 0.00249668 0.31156371 0.35249518 0.94923888 7.63204945
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu 0.0003876 0.0029247 0.133 0.894571
## omega 0.0024967 0.0006917 3.610 0.000307 ***
## alpha1 0.3115637 0.0945111 3.297 0.000979 ***
## beta1 0.3524952 0.1200780 2.936 0.003330 **
## skew 0.9492389 0.0637039 14.901 < 2e-16 ***
## shape 7.6320494 2.4286593 3.142 0.001675 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 573.4515 normalized: 1.126624
##
## Description:
## Thu Feb 28 16:33:06 2019 by user:
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 39.89308 2.174341e-09
## Shapiro-Wilk Test R W 0.9856362 6.450794e-05
## Ljung-Box Test R Q(10) 50.9149 1.810721e-07
## Ljung-Box Test R Q(15) 64.03935 5.031556e-08
## Ljung-Box Test R Q(20) 75.97338 1.873609e-08
## Ljung-Box Test R^2 Q(10) 4.51276 0.921266
## Ljung-Box Test R^2 Q(15) 50.02022 1.194994e-05
## Ljung-Box Test R^2 Q(20) 51.30893 0.0001434862
## LM Arch Test R TR^2 23.901 0.02098089
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -2.229672 -2.179781 -2.229946 -2.210110
##
## Title:
## GARCH Modelling
##
## Call:
## garchFit(formula = ~aparch(1, 1), data = hous_st_return, cond.dist = c("sstd"),
## trace = FALSE)
##
## Mean and Variance Equation:
## data ~ aparch(1, 1)
## <environment: 0x55d67b7f6570>
## [data = hous_st_return]
##
## Conditional Distribution:
## sstd
##
## Coefficient(s):
## mu omega alpha1 gamma1 beta1 delta
## -0.0012071 0.0288430 0.2863702 0.1519114 0.3885731 1.0395950
## skew shape
## 0.9169429 8.1930362
##
## Std. Errors:
## based on Hessian
##
## Error Analysis:
## Estimate Std. Error t value Pr(>|t|)
## mu -0.001207 0.003660 -0.330 0.741536
## omega 0.028843 0.007884 3.659 0.000254 ***
## alpha1 0.286370 0.078956 3.627 0.000287 ***
## gamma1 0.151911 0.224820 0.676 0.499230
## beta1 0.388573 0.121144 3.208 0.001339 **
## delta 1.039595 0.536613 1.937 0.052705 .
## skew 0.916943 0.075564 12.135 < 2e-16 ***
## shape 8.193036 2.797200 2.929 0.003400 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Log Likelihood:
## 573.9265 normalized: 1.127557
##
## Description:
## Thu Feb 28 16:33:07 2019 by user:
##
##
## Standardised Residuals Tests:
## Statistic p-Value
## Jarque-Bera Test R Chi^2 38.65526 4.03755e-09
## Shapiro-Wilk Test R W 0.985954 8.016392e-05
## Ljung-Box Test R Q(10) 53.03175 7.345548e-08
## Ljung-Box Test R Q(15) 64.66971 3.903785e-08
## Ljung-Box Test R Q(20) 76.41239 1.581817e-08
## Ljung-Box Test R^2 Q(10) 4.479591 0.9231299
## Ljung-Box Test R^2 Q(15) 51.58351 6.617759e-06
## Ljung-Box Test R^2 Q(20) 52.54272 9.485389e-05
## LM Arch Test R TR^2 22.86343 0.02890798
##
## Information Criterion Statistics:
## AIC BIC SIC HQIC
## -2.223680 -2.157158 -2.224164 -2.197596
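Once a GARCH model is fit, we can forecast the returns as well as their volatility. The mean square prediction error depends on the size of the predicted volatility: in the forecasts below from the APARCH model, the standard deviation rises from about 0.065 toward a long-run level of about 0.081, which widens the prediction intervals.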
## meanForecast meanError standardDeviation lowerInterval upperInterval
## 1 -0.001207079 0.06516409 0.06516409 -0.1368431 0.1230186
## 2 -0.001207079 0.07154674 0.07154674 -0.1501283 0.1351862
## 3 -0.001207079 0.07542169 0.07542169 -0.1581938 0.1425732
## 4 -0.001207079 0.07777708 0.07777708 -0.1630964 0.1470634
## 5 -0.001207079 0.07920978 0.07920978 -0.1660786 0.1497947
## 6 -0.001207079 0.08008158 0.08008158 -0.1678932 0.1514566
## 7 -0.001207079 0.08061220 0.08061220 -0.1689976 0.1524682
## 8 -0.001207079 0.08093520 0.08093520 -0.1696699 0.1530839
## 9 -0.001207079 0.08113183 0.08113183 -0.1700792 0.1534588
## 10 -0.001207079 0.08125155 0.08125155 -0.1703284 0.1536870
## 11 -0.001207079 0.08132443 0.08132443 -0.1704801 0.1538259
## 12 -0.001207079 0.08136880 0.08136880 -0.1705725 0.1539105
## 13 -0.001207079 0.08139582 0.08139582 -0.1706287 0.1539620
## 14 -0.001207079 0.08141227 0.08141227 -0.1706629 0.1539934
## 15 -0.001207079 0.08142228 0.08142228 -0.1706838 0.1540125
## 16 -0.001207079 0.08142838 0.08142838 -0.1706965 0.1540241
## 17 -0.001207079 0.08143209 0.08143209 -0.1707042 0.1540312
## 18 -0.001207079 0.08143436 0.08143436 -0.1707089 0.1540355
## 19 -0.001207079 0.08143573 0.08143573 -0.1707118 0.1540381
## 20 -0.001207079 0.08143657 0.08143657 -0.1707135 0.1540397
## 21 -0.001207079 0.08143708 0.08143708 -0.1707146 0.1540407
## 22 -0.001207079 0.08143739 0.08143739 -0.1707152 0.1540413
## 23 -0.001207079 0.08143758 0.08143758 -0.1707156 0.1540416
## 24 -0.001207079 0.08143769 0.08143769 -0.1707158 0.1540418
## 25 -0.001207079 0.08143776 0.08143776 -0.1707160 0.1540420
## 26 -0.001207079 0.08143781 0.08143781 -0.1707161 0.1540421
## 27 -0.001207079 0.08143783 0.08143783 -0.1707161 0.1540421
## 28 -0.001207079 0.08143785 0.08143785 -0.1707162 0.1540421
## 29 -0.001207079 0.08143786 0.08143786 -0.1707162 0.1540422
## 30 -0.001207079 0.08143786 0.08143786 -0.1707162 0.1540422
## 31 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 32 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 33 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 34 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 35 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 36 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 37 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 38 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 39 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 40 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 41 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 42 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 43 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 44 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 45 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 46 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 47 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
## 48 -0.001207079 0.08143787 0.08143787 -0.1707162 0.1540422
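The flattening of the standard deviation column follows from the variance forecast recursion. For a plain GARCH(1,1), E[σ²_(T+h)] = σ̄² + (α_1 + β_1)^(h−1) * (σ²_(T+1) − σ̄²), where σ̄² = α_0 / (1 − α_1 − β_1) is the long-run variance. This minimal sketch uses the coefficients of the skewed Student t GARCH(1,1) fit above and a hypothetical one-step-ahead variance of 0.065²; the table itself comes from the APARCH model, so its values will not match exactly (this GARCH(1,1) fit implies a long-run standard deviation of about 0.086).

```r
# Hypothetical helper: h-step variance forecasts of a GARCH(1,1)
garch11_var_forecast <- function(h, omega, alpha1, beta1, sigma2_next) {
  stopifnot(alpha1 + beta1 < 1)
  lr_var <- omega / (1 - alpha1 - beta1)     # long-run (unconditional) variance
  lr_var + (alpha1 + beta1)^(seq_len(h) - 1) * (sigma2_next - lr_var)
}

omega  <- 0.0024967
alpha1 <- 0.3115637
beta1  <- 0.3524952
sd_path <- sqrt(garch11_var_forecast(48, omega, alpha1, beta1, sigma2_next = 0.065^2))
long_run_sd <- sqrt(omega / (1 - alpha1 - beta1))
round(c(head(sd_path, 3), tail(sd_path, 1), long_run = long_run_sd), 5)
```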