Introduction

The data is the Federal Housing Finance Agency (FHFA) House Price Index (HPI) which is a broad measure of the movement of single-family house prices. It is a weighted, repeat-sales index, meaning that it measures average price changes in repeat sales or refinancing on the same properties. This information is obtained by reviewing repeat mortgage transactions on single-family properties whose mortgages have been purchased or securitized by Fannie Mae or Freddie Mac.

The technical methodology for devising the index, collection, and publishing the data is at

http://www.fhfa.gov/PolicyProgramsResearch/Research/PaperDocuments/1996-03_HPI_TechDescription_N508.pdf

Dataset

This time series begins January 1991 and ends at August 2016. Both the seasonally adjusted index ‘index_sa’ and the non-seasonally adjusted index ‘index_nsa’ set the index value at 100 for January 1991. The dataset contains monthly time series from January 1991 to August 2016 for the East South Central Division of America.

We will perform analysis on the aggregate seasonally adjusted index values. The index value is begins at 100 from January 1991.

Including necessary Libraries

library(readr)
library(TSA)
## Loading required package: leaps
## Loading required package: locfit
## locfit 1.5-9.1    2013-03-22
## Loading required package: mgcv
## Loading required package: nlme
## This is mgcv 1.8-17. For overview type 'help("mgcv-package")'.
## Loading required package: tseries
## 
## Attaching package: 'TSA'
## The following object is masked from 'package:readr':
## 
##     spec
## The following objects are masked from 'package:stats':
## 
##     acf, arima
## The following object is masked from 'package:utils':
## 
##     tar
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(forecast)
## 
## Attaching package: 'forecast'
## The following objects are masked from 'package:TSA':
## 
##     fitted.Arima, plot.Arima
## The following object is masked from 'package:nlme':
## 
##     getResponse
library(rugarch)
## Loading required package: parallel
## 
## Attaching package: 'rugarch'
## The following object is masked from 'package:stats':
## 
##     sigma
library(FitAR)
## Loading required package: lattice
## Loading required package: ltsa
## Loading required package: bestglm
## 
## Attaching package: 'FitAR'
## The following object is masked from 'package:forecast':
## 
##     BoxCox
library(fGarch)
## Loading required package: timeDate
## 
## Attaching package: 'timeDate'
## The following objects are masked from 'package:TSA':
## 
##     kurtosis, skewness
## Loading required package: timeSeries
## 
## Attaching package: 'timeSeries'
## The following object is masked from 'package:zoo':
## 
##     time<-
## Loading required package: fBasics
## 
## Rmetrics Package fBasics
## Analysing Markets and calculating Basic Statistics
## Copyright (C) 2005-2014 Rmetrics Association Zurich
## Educational Software for Financial Engineering and Computational Science
## Rmetrics is free software and comes with ABSOLUTELY NO WARRANTY.
## https://www.rmetrics.org --- Mail to: info@rmetrics.org

Reading the dataset and converitng the index_sa values to timeseries

ensd_hp_index <-read_csv("C:/Users/Bhargab/Desktop/EastSouthCentralDivision.csv")
## Parsed with column specification:
## cols(
##   hpi_type = col_character(),
##   hpi_flavor = col_character(),
##   frequency = col_character(),
##   level = col_character(),
##   place_name = col_character(),
##   place_id = col_character(),
##   yr = col_integer(),
##   period = col_integer(),
##   index_nsa = col_double(),
##   index_sa = col_double()
## )
ensd_hp_index <- ensd_hp_index$index_sa

class(ensd_hp_index)
## [1] "numeric"
ensd_hp_index_ts <- ts(as.vector(ensd_hp_index), start=1991,frequency = 12)
class(ensd_hp_index_ts)
## [1] "ts"
ensd_hp_index_ts = log(ensd_hp_index_ts)

Time Series Plot House Price Index

Plotting a Time Series Plot of the data

#------------------ Time Series Plot of the data          -------------------------------#
plot(ensd_hp_index_ts,type='o',ylab= 'House Price Index',
     main="Time Series Plot of House Price Index") 

The time series plot shows an upward trend. After 2007, the series drops and later rises since 2011. Lot of points are closely related to each other and hence there is an indication of a strong correlation. The values are seasonally adjusted and hence we will not be considering the seasonal aspect of the timeseries.

ACF & PACF Plot of the Time Series

Plotting the ACF Plot of the Time Series

#------------------ ACF & PACF Plot of the Time Series ----------------------------#
acf(ensd_hp_index_ts,main="ACF Plot of House Price Index",lag.max = 72)

pacf(ensd_hp_index_ts,main="PACF Plot of House Price Index")

The ACF plot supports our conclusion of high correlation among successive points. It also shows a strong evidence of an existence of a trend which was clearly seen in the time series plot. Furthermode, the PACF plot shows one significant correlation on the plot.

Calculation of the degree of correlation on 1st Lag

#-------------------------- Calculating the degree of correlation on Ist Lag -----------#
y=ensd_hp_index_ts
x=zlag(ensd_hp_index_ts)
index = 2:length(x) 
cor(y[index],x[index])
## [1] 0.9995936

Correaltion coeficient : 0.9995806.

The correlation value is significantly high and supports our conclusion drawn from the timeseries plot.

Ensuring Stationarity

Due to the presence of an upward trend from 1990 to 2007 and further a downward trend till 2011 and again an upward trend from 2011 to 2016, the data cannot be considered to be stationary. Hence we need to apply differencing to acquire a stationary time series to enable us to fit the time series models to the data.

De-trend the data of HPI for East South Central Division

#-------------------- De-trend the data of HPI for East South Central Division--------------#

#-------------------- Ist order differencing  -------------------------------------------#
ensd_hp_index_ts_d1 = diff(ensd_hp_index_ts,differences = 1)
adf.test(ensd_hp_index_ts_d1, alternative = "stationary")
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ensd_hp_index_ts_d1
## Dickey-Fuller = -3.2168, Lag order = 6, p-value = 0.08542
## alternative hypothesis: stationary
#------------------ Time Series Plot of the first difference  ---------------------------#
plot(ensd_hp_index_ts_d1,type='o',ylab= 'House Price Index',
     main="Time Series Plot of Ist difference HPI") 

The time series plot shows a change in variance at few occasions. Also, it shows a decline and a rise further. A lot of places the points are closer to each other and hence there is an indication of a strong correlation. Also the ADF test is statistically insignificant indicating Non-stationarity. Hence we need to difference once more and check if we acquire a stationary series.

IInd order differencing

#-------------------- IInd order differencing -------------------------------------------#
ensd_hp_index_ts_d2 = diff(ensd_hp_index,differences = 2)
adf.test(ensd_hp_index_ts_d2, alternative = "stationary")
## Warning in adf.test(ensd_hp_index_ts_d2, alternative = "stationary"): p-
## value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ensd_hp_index_ts_d2
## Dickey-Fuller = -10.813, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary
#------------------ Time Series Plot of the IInd difference   ---------------------------#
plot(ensd_hp_index_ts_d2,type='o',ylab= 'House Price Index',
     main="Time Series Plot of IInd difference of HPI")

The time series plot shows a change in variance. A lot of places the points are closer to each other and hence there is an indication of a strong correlation. The ADF test is statistically significant (at 5% significance) which indicates Stationarity of the second difference time series.

ACF & PACF plot of the Stationary Time Series

ACF Plot of IInd difference

#--------------------------------ACF Plot of IInd difference ----------------------------#

acf(ensd_hp_index_ts_d2,main = 'ACF for IInd Difference HPI',lag.max = 36)

The ACF plot shows a jagged pattern with the first lag to be significant towards the negative side. Also, there are few more slightly significant lags in the higher order. In order to get a better view of this ACF plot, we can try to have a look at the Alternative Bound ACF plot.

Alternative Bound ACF

#----------------------  Alternative Bound ACF  -----------------------------------------#
acf(ensd_hp_index_ts_d2,ci.type='ma',xaxp=c(0,20,10),
    main="Alternative Bounds for IInd Difference HPI")

The Alternative Bound ACF plot shows a Jagged pattern with the first lag to be significant towards the negative side.The alternative bound acf also shows the lag 12 to be slightly significant. Also, the 24th lag is quite significant. This makes us suspect seasonality. But, since the values are seasonally adjusted we will keep the seasonal aspect out of scope.

Plotting the PACF plot

#----------------------------------  PACF Plot  -----------------------------------------#
pacf(ensd_hp_index_ts_d2,main = 'PACF for IInd Difference HPI')

The PACF shows initial lags to be significant with a decay. But later some of the lags like 10, 22 & 23 are found to be slightly significant.

Model Specification

In order to get a clear view of the candidate models lets have a look at the EACF table

Plotting the EACF plot

eacf(ensd_hp_index_ts_d2)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x x o o o o o o o o o  x  x  o 
## 1 x x x o o o o o o o o  x  o  o 
## 2 x x o x o o o o o o o  x  o  o 
## 3 x x o x x o o o o o x  x  o  o 
## 4 x x x o o o o o o o o  o  o  o 
## 5 x o x x o o o o o o o  o  o  o 
## 6 x o o o x o o o o o o  o  o  o 
## 7 x o o o x x o o x o o  o  o  o

The eacf plot shows a decent vertex at (0,2). Hence, the candidate models selected will be ARIMA(0,2,2) ARIMA(0,2,3) & ARIMA(1,2,3).

For further investigation of the candidate models, let’s plot the BIC table.

Plotting the BIC table

res = armasubsets(y=ensd_hp_index_ts_d2,nar=8,nma=8,y.name='test',ar.method='ols')
plot(res)

The BIC table shows slight higher orders for AR and MA components at -320. We will consider the lower orders only for our analysis. The candidate models selected will be ARIMA(1,2,1) ARIMA(2,2,1) and ARIMA(3,2,1).

So, from the overall analysis the candidate models selected are as follows. 1. ARIMA(0,2,2) 2. ARIMA(0,2,3) 3. ARIMA(1,2,3) 4. ARIMA(1,2,1) 5. ARIMA(2,2,1) 6. ARIMA(3,2,1)

Model Estimation and Evaluation

We will fit each of the candidate models selected in the previous step. Model evaluation will be done on the basis of the AIC, BIC and significance of the coefficients.

Model Estimation for ARIMA(0,2,2)

m022_hpi = arima(ensd_hp_index_ts,order=c(0,2,2)) 
AIC(m022_hpi)
## [1] -2293.366
BIC(m022_hpi)
## [1] -2282.195
coeftest(m022_hpi) 
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ma1 -1.425427   0.049780 -28.634 < 2.2e-16 ***
## ma2  0.525131   0.049264  10.660 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The coefficients for both MA(1) and MA(2) are significant for the model ARIMA(0,2,2). AIC = -2238.063 BIC = -2226.962

Model Estimation for ARIMA(0,2,3)

m023_hpi = arima(ensd_hp_index_ts,order=c(0,2,3)) 
AIC(m023_hpi)
## [1] -2291.405
BIC(m023_hpi)
## [1] -2276.511
coeftest(m023_hpi) 
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ma1 -1.419553   0.057532 -24.6741 < 2.2e-16 ***
## ma2  0.508687   0.096359   5.2791 1.298e-07 ***
## ma3  0.011649   0.058857   0.1979    0.8431    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The coefficients for both MA(1) and MA(2) are significant. But, MA(3) is insignificant for the model ARIMA(0,2,3)

AIC = -2236.07 BIC = -2221.268

Model Estimation for ARIMA(1,2,3)

m123_hpi = arima(ensd_hp_index_ts,order=c(1,2,3))
AIC(m123_hpi)
## [1] -2291.755
BIC(m123_hpi)
## [1] -2273.137
coeftest(m123_hpi)  
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -0.931035   0.044877 -20.7462 < 2.2e-16 ***
## ma1 -0.473879   0.060861  -7.7863 6.902e-15 ***
## ma2 -0.848178   0.046764 -18.1375 < 2.2e-16 ***
## ma3  0.517523   0.048800  10.6050 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The coefficients for all AR(1), MA(1) MA(2) and MA(3) are significant for the model ARIMA(1,2,3).

AIC = -2236.191 BIC = -2217.688

Model Estimation for ARIMA(1,2,1)

m121_hpi = arima(ensd_hp_index_ts,order=c(1,2,1))
AIC(m121_hpi)
## [1] -2273.219
BIC(m121_hpi)
## [1] -2262.048
coeftest(m121_hpi) 
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -0.414054   0.053599  -7.7251 1.118e-14 ***
## ma1 -0.891116   0.022397 -39.7876 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The coefficients for both AR(1) and MA(1) are significant for the model ARIMA(1,2,1).

AIC = -2218.742 BIC = -2207.64

Model Estimation for ARIMA(2,2,1)

m221_hpi = arima(ensd_hp_index_ts,order=c(2,2,1))
AIC(m221_hpi)
## [1] -2286.143
BIC(m221_hpi)
## [1] -2271.248
coeftest(m221_hpi) 
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -0.538376   0.060247  -8.9362 < 2.2e-16 ***
## ar2 -0.237084   0.059657  -3.9741 7.065e-05 ***
## ma1 -0.849125   0.029420 -28.8620 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The coefficients for all AR(1) AR(2) and MA(1) are significant for the model ARIMA(2,2,1).

AIC = -2231.219 BIC = -2216.417

Model Estimation for ARIMA(3,2,1)

m321_hpi = arima(ensd_hp_index_ts,order=c(3,2,1)) 
AIC(m321_hpi)
## [1] -2286.468
BIC(m321_hpi)
## [1] -2267.85
coeftest(m321_hpi)  
## 
## z test of coefficients:
## 
##      Estimate Std. Error  z value  Pr(>|z|)    
## ar1 -0.581442   0.066324  -8.7667 < 2.2e-16 ***
## ar2 -0.308873   0.075588  -4.0863 4.384e-05 ***
## ar3 -0.098225   0.063807  -1.5394    0.1237    
## ma1 -0.824080   0.036938 -22.3097 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The coefficients for AR(1) AR(2) and MA(1) are significant. But, AR(3) is insignificant for the model ARIMA(3,2,1).

AIC = -2231.589 BIC = -2213.086

Model Comparison

Looking at the significance of the coefficients and the AIC and BIC values the candidate models would be

1. ARIMA(0,2,2)
2. ARIMA(1,2,3)
3. ARIMA(2,2,1)
4. ARIMA(1,2,1)

In order to filter these models further. Let's try to analyse the residuals of the model selected above.

Residual Analysis

Function for Residual Analysis for ARIMA Models

# Function that performs residual analysis
residual.analysis <- function(model, std = TRUE){
  library(TSA)
  library(FitAR)
  if (std == TRUE){
    res.model = rstandard(model)
  }else{
    res.model = residuals(model)
  }
  par(mfrow=c(3,2))
  plot(res.model,type='o',ylab='Standardised residuals', 
       main="Time series plot of standardised residuals")
  abline(h=0)
  hist(res.model,main="Histogram of standardised residuals")
  qqnorm(res.model,main="QQ plot of standardised residuals")
  qqline(res.model, col = 2)
  acf(res.model,main="ACF of standardised residuals")
  print(shapiro.test(res.model))
  k=0
  LBQPlot(res.model, lag.max = length(model$residuals)-1, StartLag = k + 1, k = 0, 
          SquaredQ = FALSE)
  par(mfrow=c(1,1))
}

Calling the residual analysis function

residual.analysis(m022_hpi)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.96462, p-value = 7.855e-07
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple
## of shorter object length

residual.analysis(m123_hpi)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.96439, p-value = 7.277e-07
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple
## of shorter object length

residual.analysis(m221_hpi)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.96154, p-value = 2.888e-07
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple
## of shorter object length

residual.analysis(m121_hpi)
## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.95824, p-value = 1.042e-07
## Warning in (ra^2)/(n - (1:lag.max)): longer object length is not a multiple
## of shorter object length

The residues of all the models fail the Shapiro Wilk test. Hence, the residues are not normally distributed. The Ljung Box Test plot shows a good seperation from the p-value reference line for the models ARIMA(0,2,2) and ARIMA(1,2,3).

The QQ-Plot showing a fat tails makes us suspect conditional heteoscedasticity in the timeseries

Since, ARIMA(0,2,2) and ARIMA(1,2,3) show a decent plot for Ljung-Box Test we will continue further with those models.

Suspecting Conditional Heteroscedaticity

McLeod-Li Test to examine conditional heteroscedasticity on Residues of ARIMA(0,2,2)

McLeod.Li.test(y=residuals(m022_hpi))

McLeod-Li Test to examine conditional heteroscedasticity on Residues of ARIMA(1,2,3)

McLeod.Li.test(y=residuals(m123_hpi))

Plots for the residues of both the models show that most of the points lie below the p-value reference line. This confirms that the time series contain a significant degree of conditional heteroscedasticity. We will try to fit a GARCH/ARCH models to the residues of the models.

Extracting the residuals from the models

m022residuals = residuals(m022_hpi)
m123residuals = residuals(m123_hpi)

We will try to estimate the orders for GARCH and ARCH components using the EACF plot of squared residues of ARIMA(0,2,2) #EACF of the squared residuals of ARIMA(0,2,2)

eacf(m022residuals^2)## GARCH(1,0) GARCH(1,1)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o x x x x o o o  x  o  x 
## 1 x o o o x x o o o o o  x  o  x 
## 2 x x o o x o o o o o o  o  o  x 
## 3 x x o o x o x o o o o  o  o  x 
## 4 x x o x x o x o o o o  o  o  x 
## 5 x x o x x o o x o o o  o  o  x 
## 6 x x o x x o o o o o o  o  o  x 
## 7 x x x x o o x o o x x  o  o  o

EACF of absolute residues of ARIMA(0,2,2)

eacf(abs(m022residuals))## GARCH(1,1) GARCH(2,1) GARCH(2,2)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 x o x x x x x x o o x  x  x  x 
## 1 x o o o o o o o o o o  o  o  o 
## 2 x x o o o o o o o o o  o  o  o 
## 3 x x o o o o o o o o o  o  o  o 
## 4 x x x o o o o o o o o  x  o  o 
## 5 x x o x x o o o o o o  x  o  o 
## 6 x x o x x o o o o o o  o  o  o 
## 7 x o x o o o o o o o o  o  o  o

Looking at the both the EACF plots the candidate models selected for residues of ARIMA(0,2,2) are GARCH(1,0) GARCH(1,1) GARCH(2,1) GARCH(2,2) #EACF of squared residues of ARIMA(1,2,3)

eacf(m123residuals^2)## GARCH(1,0) GARCH(1,1)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o o o x x x x o o o  x  o  x 
## 1 x o o o x o o o o o o  x  o  x 
## 2 x x o o x o o o o o o  o  o  x 
## 3 x x o o x o x o o o o  o  o  x 
## 4 x x o x x o x o o o o  o  o  x 
## 5 x x o x x o o x o o o  o  o  x 
## 6 x x o x x o o o o o o  o  o  x 
## 7 x x x x o o x o o o o  o  o  o

EACF of absolute residues of ARIMA(1,2,3)

eacf(abs(m123residuals))## GARCH(1,1) GARCH(2,1) GARCH(2,2)
## AR/MA
##   0 1 2 3 4 5 6 7 8 9 10 11 12 13
## 0 o o x x x x x x x o x  x  x  x 
## 1 x o o o o o o o o o o  o  o  o 
## 2 x x o o o o o o o o o  o  o  o 
## 3 x x x o o o o o o o o  o  o  o 
## 4 x x x o o o o o o o o  o  o  o 
## 5 x o o x o o o o o o o  o  o  o 
## 6 x x o x o o o o o o o  o  o  o 
## 7 x o o x o o o o o o o  o  o  o

Looking at the both the EACF plots the candidate models selected for residues of ARIMA(1,2,3) are GARCH(1,0) GARCH(1,1) GARCH(2,1) GARCH(2,2) #GARCH Model Estimation #Fitting GARCH(1,0) on the residues of ARIMA(0,2,2)

a022garch10=garch(m022residuals,order=c(0,1),trace=F)

Fitting GARCH(1,1) on the residues of ARIMA(0,2,2)

a022garch11=garch(m022residuals,order=c(1,1),trace=F)
## Warning in sqrt(pred$e): NaNs produced

Fitting GARCH(2,1) on the residues of ARIMA(0,2,2)

a022garch21=garch(m022residuals,order=c(1,2),trace=F)
## Warning in sqrt(pred$e): NaNs produced

Fitting GARCH(2,2) on the residues of ARIMA(0,2,2)

a022garch22=garch(m022residuals,order=c(2,2),trace=F) 
## Warning in garch(m022residuals, order = c(2, 2), trace = F): singular
## information

Fitting GARCH(1,0) on the residues of ARIMA(1,2,3)

a123garch10=garch(m123residuals,order=c(0,1),trace=F)

Fitting GARCH(1,1) on the residues of ARIMA(1,2,3)

a123garch11=garch(m123residuals,order=c(1,1),trace=F) 

Fitting GARCH(2,1) on the residues of ARIMA(1,2,3)

a123garch21=garch(m123residuals,order=c(1,2),trace=F) 
## Warning in sqrt(pred$e): NaNs produced

Fitting GARCH(2,2) on the residues of ARIMA(1,2,3)

a123garch22=garch(m123residuals,order=c(2,2),trace=F)
## Warning in garch(m123residuals, order = c(2, 2), trace = F): singular
## information

Residual Analysis of GARCH Models

Function to perform Residual Analysis and Coefficient Estimation on GARCH models

garchresidual.analysis <- function(model, std = TRUE){
  library(TSA)
  library(FitAR)
  
  res.model = residuals(model)
  
  par(mfrow=c(2,2))
  plot(res.model,type='o',ylab='Standardised residuals', main="Time series plot of standardised residuals")
  abline(h=0)
  hist(res.model,main="Histogram of standardised residuals")
  qqnorm(res.model,main="QQ plot of standardised residuals")
  qqline(res.model, col = 2)
  print(shapiro.test(res.model))
  print(summary(model))
  par(mfrow=c(1,1))
}

Calling the residual analysis for GARCH(2,1) on ARIMA(0,2,2)

garchresidual.analysis(a022garch21)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.9946, p-value = 0.3583
## 
## 
## Call:
## garch(x = m022residuals, order = c(1, 2), trace = F)
## 
## Model:
## GARCH(1,2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.66120 -0.59355 -0.05831  0.61438  3.05384 
## 
## Coefficient(s):
##     Estimate  Std. Error  t value Pr(>|t|)    
## a0 2.718e-07   6.059e-07    0.449   0.6537    
## a1 3.264e-11   8.238e-02    0.000   1.0000    
## a2 2.919e-01   1.323e-01    2.207   0.0273 *  
## b1 7.718e-01   5.876e-02   13.134   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Diagnostic Tests:
##  Jarque Bera Test
## 
## data:  Residuals
## X-squared = 2.1499, df = 2, p-value = 0.3413
## 
## 
##  Box-Ljung test
## 
## data:  Squared.Residuals
## X-squared = 0.31631, df = 1, p-value = 0.5738

Shapiro Wilk Test fails to prove normality of the residues at 5% significance after fitting GARCH(1,2). Even the Jarque Bera Test supports the conclusion from Shapiro Wilk Test. Box-Ljung test also fails to find significant evidence to prove that the residues are uncorrelated.

Calling the residual analysis for GARCH(2,2) on ARIMA(0,2,2)

garchresidual.analysis(a022garch22)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.98103, p-value = 0.0004448
## 
## 
## Call:
## garch(x = m022residuals, order = c(2, 2), trace = F)
## 
## Model:
## GARCH(2,2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.23176 -0.55133 -0.06646  0.61574  4.16010 
## 
## Coefficient(s):
##     Estimate  Std. Error  t value Pr(>|t|)
## a0 7.070e-06          NA       NA       NA
## a1 2.457e-01          NA       NA       NA
## a2 2.002e-01          NA       NA       NA
## b1 1.624e-01          NA       NA       NA
## b2 2.277e-01          NA       NA       NA
## 
## Diagnostic Tests:
##  Jarque Bera Test
## 
## data:  Residuals
## X-squared = 35.432, df = 2, p-value = 2.024e-08
## 
## 
##  Box-Ljung test
## 
## data:  Squared.Residuals
## X-squared = 1.5545, df = 1, p-value = 0.2125

Shapiro Wilk Test => p-value 0.2394. The Shapiro Wilk test provides significant evidence to prove normality of the residues. Jarque Bera Test => p-value 0.0585. The Jarque Bera Test too complements the conclusions of Shapiro Wilk Test. Box-Ljung test => p-value 0.9654. Box-Ljung test gives a significant evidence to that the residues after the GARCH(2,2) fit are uncorrelated

Calling the residual analysis for GARCH(1,0) on ARIMA(0,2,2)

garchresidual.analysis(a022garch10)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.96629, p-value = 1.426e-06
## 
## 
## Call:
## garch(x = m022residuals, order = c(0, 1), trace = F)
## 
## Model:
## GARCH(0,1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.23810 -0.61465 -0.07201  0.59207  4.60399 
## 
## Coefficient(s):
##     Estimate  Std. Error  t value Pr(>|t|)    
## a0 2.713e-05   2.420e-06   11.207   <2e-16 ***
## a1 1.592e-01   7.312e-02    2.177   0.0295 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Diagnostic Tests:
##  Jarque Bera Test
## 
## data:  Residuals
## X-squared = 94.792, df = 2, p-value < 2.2e-16
## 
## 
##  Box-Ljung test
## 
## data:  Squared.Residuals
## X-squared = 0.06308, df = 1, p-value = 0.8017

Shapiro Wilk Test fails to prove normality of the residues after fitting GARCH(1,0). Even the Jarque Bera Test supports the conclusion from Shapiro Wilk Test.

Box-Ljung test though provides significant evidence to prove that the residues are uncorrelated.

Calling the residual analysis for GARCH(2,2) on ARIMA(0,2,2)

garchresidual.analysis(a022garch11)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.95778, p-value = 9.448e-08
## 
## 
## Call:
## garch(x = m022residuals, order = c(1, 1), trace = F)
## 
## Model:
## GARCH(1,1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3588 -0.5823 -0.0631  0.6017  2.9558 
## 
## Coefficient(s):
##     Estimate  Std. Error  t value Pr(>|t|)    
## a0 1.024e-14   3.970e-07    0.000  1.00000    
## a1 2.219e-01   8.331e-02    2.663  0.00774 ** 
## b1 8.371e-01   5.238e-02   15.981  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Diagnostic Tests:
##  Jarque Bera Test
## 
## data:  Residuals
## X-squared = 284.8, df = 2, p-value < 2.2e-16
## 
## 
##  Box-Ljung test
## 
## data:  Squared.Residuals
## X-squared = 0.61124, df = 1, p-value = 0.4343

Shapiro Wilk Test fails to prove normality of the residues after fitting GARCH(1,1). Even the Jarque Bera Test supports the conclusion from Shapiro Wilk Test. Box-Ljung test though provides significant evidence to prove that the residues are uncorrelated. #Calling the residual analysis for GARCH(1,0) on ARIMA(1,2,3)

garchresidual.analysis(a123garch10)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.96781, p-value = 2.404e-06
## 
## 
## Call:
## garch(x = m123residuals, order = c(0, 1), trace = F)
## 
## Model:
## GARCH(0,1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.23986 -0.58354 -0.06563  0.59652  4.66774 
## 
## Coefficient(s):
##     Estimate  Std. Error  t value Pr(>|t|)    
## a0 2.655e-05   2.373e-06   11.191   <2e-16 ***
## a1 1.717e-01   7.145e-02    2.404   0.0162 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Diagnostic Tests:
##  Jarque Bera Test
## 
## data:  Residuals
## X-squared = 90.22, df = 2, p-value < 2.2e-16
## 
## 
##  Box-Ljung test
## 
## data:  Squared.Residuals
## X-squared = 0.05295, df = 1, p-value = 0.818

Shapiro Wilk Test fails to prove normality of the residues after fitting GARCH(1,0). Even the Jarque Bera Test supports the conclusion from Shapiro Wilk Test. Box-Ljung test though provides significant evidence to prove that the residues are uncorrelated. #Calling the residual analysis for GARCH(1,1) on ARIMA(1,2,3)

garchresidual.analysis(a123garch11)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.99475, p-value = 0.3771
## 
## 
## Call:
## garch(x = m123residuals, order = c(1, 1), trace = F)
## 
## Model:
## GARCH(1,1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.99795 -0.62922 -0.08498  0.64675  3.41328 
## 
## Coefficient(s):
##     Estimate  Std. Error  t value Pr(>|t|)    
## a0 2.038e-07   2.404e-07    0.848  0.39663    
## a1 7.318e-02   2.740e-02    2.671  0.00756 ** 
## b1 9.223e-01   3.133e-02   29.440  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Diagnostic Tests:
##  Jarque Bera Test
## 
## data:  Residuals
## X-squared = 4.249, df = 2, p-value = 0.1195
## 
## 
##  Box-Ljung test
## 
## data:  Squared.Residuals
## X-squared = 0.91241, df = 1, p-value = 0.3395

Shapiro Wilk Test => p-value 0.3438.The Shapiro Wilk test provides significant evidence to prove normality of the residues. Jarque Bera Test => p-value 0.0824. The Jarque Bera Test too complements the conclusions of Shapiro Wilk Test. Box-Ljung test => p-value 0.402. Box-Ljung test gives a significant evidence to that the residues after the GARCH(1,1) fit are uncorrelated.

The summary of the estimates too show that all the components of the GARCH(1,1) model are significant #Calling the residual analysis for GARCH(2,1) on ARIMA(1,2,3)

garchresidual.analysis(a123garch21)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.98919, p-value = 0.02312
## 
## 
## Call:
## garch(x = m123residuals, order = c(1, 2), trace = F)
## 
## Model:
## GARCH(1,2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.11669 -0.55211 -0.07255  0.58230  3.58727 
## 
## Coefficient(s):
##     Estimate  Std. Error  t value Pr(>|t|)    
## a0 1.955e-06   1.197e-06    1.633    0.102    
## a1 2.069e-01   1.314e-01    1.575    0.115    
## a2 2.384e-01   1.706e-01    1.397    0.162    
## b1 6.220e-01   7.616e-02    8.167 2.22e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Diagnostic Tests:
##  Jarque Bera Test
## 
## data:  Residuals
## X-squared = 13.539, df = 2, p-value = 0.001148
## 
## 
##  Box-Ljung test
## 
## data:  Squared.Residuals
## X-squared = 2.7303, df = 1, p-value = 0.09846

Only beta1 coefficient is significant. The alpha1, beta1 coefficient estimates are insignificant. Shapiro Wilk Test => p-value 0.3773. The Shapiro Wilk test provides significant evidence to prove normality of the residues. Jarque Bera Test => p-value 0.1298. The Jarque Bera Test too complements the conclusions of Shapiro Wilk Test.

Box-Ljung test => p-value 0.874. Box-Ljung test gives a significant evidence to that the residues after the GARCH(2,1) fit are uncorrelated #Calling the residual analysis for GARCH(2,2) on ARIMA(1,2,3)

garchresidual.analysis(a123garch22)

## 
##  Shapiro-Wilk normality test
## 
## data:  res.model
## W = 0.99369, p-value = 0.2304
## 
## 
## Call:
## garch(x = m123residuals, order = c(2, 2), trace = F)
## 
## Model:
## GARCH(2,2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.8587 -0.6463 -0.0878  0.6410  3.4776 
## 
## Coefficient(s):
##     Estimate  Std. Error  t value Pr(>|t|)
## a0 3.783e-07          NA       NA       NA
## a1 1.074e-13          NA       NA       NA
## a2 8.875e-02          NA       NA       NA
## b1 7.897e-01          NA       NA       NA
## b2 1.078e-01          NA       NA       NA
## 
## Diagnostic Tests:
##  Jarque Bera Test
## 
## data:  Residuals
## X-squared = 4.7792, df = 2, p-value = 0.09167
## 
## 
##  Box-Ljung test
## 
## data:  Squared.Residuals
## X-squared = 0.06437, df = 1, p-value = 0.7997

Shapiro Wilk Test fails to prove normality of the residues after fitting GARCH(2,2). Even the Jarque Bera Test supports the conclusion from Shapiro Wilk Test. However, Box-Ljung test though provides significant evidence to prove that the residues are uncorrelated.

Final Verdict

Looking at all the models that are fit for GARCH model specification, the best model selected is ARIMA(1,2,3) + GARCH(1,1).

The above model is the best as all its coefficients are significant and we have significant evidence that the residues are uncorrelated and normally distributed.

Forecasting for next 24 months

We will use the model to forecast for next 24 months.

spec = ugarchspec(variance.model = list(model="sGARCH",garchOrder=c(1,1)),
                  mean.model = list(armaOrder=c(1,3)))
mugarchfit = ugarchfit(spec,ensd_hp_index_ts_d2)
coef(mugarchfit)
##           mu          ar1          ma1          ma2          ma3 
##  0.002212167 -0.964952483 -0.426686040 -0.873817337  0.504498774 
##        omega       alpha1        beta1 
##  0.004045258  0.089563943  0.909436041
forecast_24 = ugarchforecast(mugarchfit,data=diff2.log,n.ahead = 24)
forecast24ts = ts(fitted(forecast_24)[,1],start = c(2016,9),frequency = 12)
sigma(forecast_24)
##      1970-11-03 05:30:00
## T+1             1.200875
## T+2             1.201958
## T+3             1.203040
## T+4             1.204119
## T+5             1.205196
## T+6             1.206271
## T+7             1.207345
## T+8             1.208416
## T+9             1.209485
## T+10            1.210552
## T+11            1.211617
## T+12            1.212680
## T+13            1.213741
## T+14            1.214800
## T+15            1.215857
## T+16            1.216912
## T+17            1.217966
## T+18            1.219017
## T+19            1.220066
## T+20            1.221113
## T+21            1.222159
## T+22            1.223202
## T+23            1.224244
## T+24            1.225283
fcast = c(ensd_hp_index_ts_d2,forecast24ts)
class(fcast)
## [1] "numeric"
fcast = ts(fcast,start = 1991)
class(fcast)
## [1] "ts"
forecast_24_2db = diffinv(fcast,xi = ensd_hp_index_ts_d2[1:2],differences = 2)
forecast_24_2db = ts(forecast_24_2db,start=1991,frequency = 12)
plot(forecast_24_2db,type='o',ylab= 'House Price Index',
     main="Time Series Plot of House Price Index with the forecasted values") 

Conclusion

For the House Price Index dataset of the East Souch Central Division of America we performed Time Series Analysis. The data intially had a trend which we elminated after 2 original differences. After fitting the ARIMA models, evidences for conditional heteroscedasticity were observed in the residues. After further investigation we could fit GARCH component to the residuals of ARIMA model sucessfully. The best model fit through this process is ARIMA(1,2,3) + GARCH(1,1).

The House Price Index is forecasted for the next 24 months.