Financial Econometrics - HOMEWORK 7

Group Members

  1. Pham Thi Truc Na - MAMAIU20098
  2. Nguyen Ngoc Kim Chi - MAMAIU20034
  3. Nguyen Ngoc Khanh Minh - MAMAIU20026
  4. Le Ngoc Yen - MAMAIU20059

Libraries

library(car)
## Loading required package: carData
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(sandwich)
library(stats)
library(aTSA)
## 
## Attaching package: 'aTSA'
## The following object is masked from 'package:graphics':
## 
##     identify
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
## 
## Attaching package: 'forecast'
## The following object is masked from 'package:aTSA':
## 
##     forecast

Constructing ARMA Models

Building an ARMA model for the house price changes involves three stages: identification, estimation and diagnostic checking. We first check that the series is stationary using the augmented Dickey-Fuller test.

#Import data
library(readxl)
UKHP <- read_excel("E:/FE/UKHP.xls", col_types = c("date", "numeric"))
View(UKHP)
#Create variables: rename the price column and compute the percentage change in house prices
names(UKHP)[2] = "hp"
UKHP$dhp = c(NA, 100*diff(UKHP$hp)/UKHP$hp[1:(nrow(UKHP)-1)])
#Drop the first observation, for which dhp is NA
UKHP = UKHP[-1,]

##1. Check stationarity
library(tseries)
## 
## Attaching package: 'tseries'
## The following objects are masked from 'package:aTSA':
## 
##     adf.test, kpss.test, pp.test
adf.test(UKHP$dhp)
## Warning in adf.test(UKHP$dhp): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  UKHP$dhp
## Dickey-Fuller = -5.1732, Lag order = 6, p-value = 0.01
## alternative hypothesis: stationary

The p-value is at most 0.01 (the warning indicates that the true p-value is smaller than the printed one), so we reject the null hypothesis of a unit root and treat dhp as stationary.

Estimating Autocorrelation Coefficients

The second stage is carried out by looking at the autocorrelation and partial autocorrelation coefficients to identify any structure in the data.

acf(UKHP$dhp, lag.max=12)

pacf(UKHP$dhp, lag.max=12)

-> The ACF dies away rather slowly, while only the first two PACF values seem strongly significant.
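
The visual reading of the two plots can be backed up numerically. The sketch below is only an illustration: it uses a simulated AR(2) series as a stand-in for UKHP$dhp (the coefficients are made up) and flags the lags whose sample ACF or PACF lies outside the approximate 95% band of ±1.96/sqrt(n) that applies under white noise.

```r
# Simulated stand-in for UKHP$dhp (assumption: AR(2) with made-up coefficients)
set.seed(123)
y <- arima.sim(model = list(ar = c(0.25, 0.35)), n = 300)

# Approximate 95% bound for sample autocorrelations under white noise
bound <- qnorm(0.975) / sqrt(length(y))

# Compute the coefficients without drawing the plots
acf_vals  <- as.numeric(acf(y, lag.max = 12, plot = FALSE)$acf)[-1]  # drop lag 0
pacf_vals <- as.numeric(pacf(y, lag.max = 12, plot = FALSE)$acf)

# Lags whose coefficients fall outside the band
sig_acf_lags  <- which(abs(acf_vals)  > bound)
sig_pacf_lags <- which(abs(pacf_vals) > bound)
print(sig_acf_lags)
print(sig_pacf_lags)
```

Replacing y with UKHP$dhp would give the corresponding list of significant lags for the house price changes.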

Using Information Criteria to Decide on Model Orders

Information criteria trade off goodness of fit (measured by the log-likelihood) against a penalty for the number of estimated parameters. The model with the lowest value of AIC or SBIC should be chosen.
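
As a rough check on what AIC() returns for an arima fit, the sketch below recomputes both criteria by hand as -2·logLik + penalty·k, where k counts the estimated coefficients plus the innovation variance. The data are simulated (the ARMA(1,1) coefficients are an assumption for illustration only), and n is taken as the series length, which matches the k = log(nrow(UKHP)) penalty used in this report.

```r
# Simulated stand-in series (assumption: ARMA(1,1) with made-up coefficients)
set.seed(1)
y <- arima.sim(model = list(ar = 0.8, ma = -0.5), n = 300)
fit <- arima(y, order = c(1, 0, 1))

n <- length(y)
k <- length(coef(fit)) + 1          # ar1, ma1, mean, plus the innovation variance

aic_manual  <- -2 * fit$loglik + 2 * k        # AIC penalty: 2 per parameter
sbic_manual <- -2 * fit$loglik + log(n) * k   # SBIC penalty: log(n) per parameter

print(c(manual = aic_manual,  builtin = AIC(fit)))
print(c(manual = sbic_manual, builtin = AIC(fit, k = log(n))))
```

Because log(n) exceeds 2 for any n above 7, SBIC penalises extra parameters more heavily than AIC and tends to pick smaller models.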

ar11 = arima(UKHP$dhp, order= c(1,0,1))
library(lmtest)
library(psych)
## 
## Attaching package: 'psych'
## The following object is masked from 'package:car':
## 
##     logit
coeftest(ar11)
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## ar1        0.822373   0.059625 13.7923 < 2.2e-16 ***
## ma1       -0.541725   0.087676 -6.1787 6.462e-10 ***
## intercept  0.428586   0.141410  3.0308  0.002439 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
AIC(ar11)
## [1] 933.4199
AIC(ar11, k=log(nrow(UKHP)))
## [1] 948.5675
aic_table = array (NA , c (6,6,2) )
for ( ar in 0:5) {
  for ( ma in 0:5) {
    arma = arima ( UKHP$dhp , order = c ( ar ,0, ma ) )
    aic_table [ ar +1, ma +1,1] = AIC ( arma )
    aic_table [ ar +1, ma +1,2] = AIC ( arma , k = log ( nrow ( UKHP ) ) )
  }
}
## Warning in arima(UKHP$dhp, order = c(ar, 0, ma)): possible convergence problem:
## optim gave code = 1

## Warning in arima(UKHP$dhp, order = c(ar, 0, ma)): possible convergence problem:
## optim gave code = 1

## Warning in arima(UKHP$dhp, order = c(ar, 0, ma)): possible convergence problem:
## optim gave code = 1
### AIC values
aic_table[,,1]
##           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
## [1,] 1001.2637 977.4020 935.5083 931.4545 930.2949 929.9505
## [2,]  958.7914 933.4199 925.6487 923.7888 924.4999 929.3072
## [3,]  922.4600 924.4080 926.2427 926.4088 926.1038 925.6015
## [4,]  924.4016 926.4344 928.1834 925.9574 925.4296 918.6900
## [5,]  926.2610 928.2587 914.1016 918.3987 927.4242 918.0929
## [6,]  928.2454 927.6420 923.3094 918.1430 920.1053 927.6499
which.min (aic_table[,,1]) #entry (5,3) represents an ARMA(4,2) model
## [1] 17
### SBIC values
aic_table[,,2]
##           [,1]     [,2]     [,3]     [,4]     [,5]     [,6]
## [1,] 1008.8375 988.7627 950.6558 950.3889 953.0163 956.4588
## [2,]  970.1521 948.5675 944.5832 946.5102 951.0082 959.6024
## [3,]  937.6076 943.3425 948.9641 952.9171 956.3990 959.6836
## [4,]  943.3361 949.1558 954.6917 956.2526 959.5117 956.5590
## [5,]  948.9824 954.7670 944.3968 952.4808 965.2931 959.7488
## [6,]  954.7536 957.9371 957.3914 956.0119 961.7612 973.0927
which.min (aic_table[,,2])
## [1] 3
  • For the ARMA(1,1) model, the AIC is 933.4199 and the SBIC is 948.5675.

  • The tables printed above contain the AIC values (first table) and the SBIC values (second table) for all ARMA(p,q) models with p and q from 0 to 5.

  • Row and column indices start at 1, so the AR and MA orders are the row and column indices minus 1.

  • In this case the criteria choose different models: AIC selects ARMA(4,2) (entry 17, i.e. row 5, column 3), while SBIC selects the smaller ARMA(2,0) (entry 3, i.e. row 3, column 1).
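
The step from the single number returned by which.min to a model order can be made explicit. The following sketch takes the two indices reported in the output (17 and 3) and the 6 x 6 table dimensions, and uses base R's arrayInd to recover the row and column; the variable names are illustrative only.

```r
# Dimensions of each criterion table: AR orders 0..5 by MA orders 0..5
dims <- c(6, 6)

# which.min() counts down the columns, so convert the linear index to (row, column)
aic_pos  <- arrayInd(17, dims)   # index reported for the AIC table
sbic_pos <- arrayInd(3, dims)    # index reported for the SBIC table

# Orders are the row and column positions minus 1
aic_order  <- as.vector(aic_pos)  - 1
sbic_order <- as.vector(sbic_pos) - 1

cat("AIC selects ARMA(",  aic_order[1],  ",", aic_order[2],  ")\n", sep = "")
cat("SBIC selects ARMA(", sbic_order[1], ",", sbic_order[2], ")\n", sep = "")
```

Calling which(aic_table[,,1] == min(aic_table[,,1]), arr.ind = TRUE) on the table itself would return the row and column directly.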

Forecasting Using ARMA model

We compare ARMA(2,0) and ARMA(4,2) to decide which model fits best.

- We first estimate the ARMA(2,0) model on the full sample and check its residuals

arma20 = arima(UKHP$dhp, order=c(2,0,0))
checkresiduals(arma20)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,0,0) with non-zero mean
## Q* = 5.1323, df = 8, p-value = 0.7433
## 
## Model df: 2.   Total lags used: 10
### Forecasting
ar2 = arima ( UKHP$dhp [ UKHP$Month <="2015-12-01"] , order = c (2,0,0) )
dynamic_fc = predict(ar2,n.ahead = 27)

coeftest(ar2)
## 
## z test of coefficients:
## 
##           Estimate Std. Error z value  Pr(>|z|)    
## ar1       0.235302   0.054366  4.3281 1.504e-05 ***
## ar2       0.340531   0.054500  6.2482 4.151e-10 ***
## intercept 0.441567   0.137422  3.2132  0.001313 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
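#arima() reports the process mean as "intercept", so the lags enter as deviations from it
mu = ar2$coef[3]
static_fc = mu + ar2$coef[1]*(UKHP$dhp[299:325] - mu) + ar2$coef[2]*(UKHP$dhp[298:324] - mu)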

- Next, we estimate the ARMA(4,2) model on the full sample and check its residuals

arma42 = arima(UKHP$dhp, order=c(4,0,2))
checkresiduals(arma42)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(4,0,2) with non-zero mean
## Q* = 14.429, df = 4, p-value = 0.006044
## 
## Model df: 6.   Total lags used: 10
### Forecasting
ar42 = arima ( UKHP$dhp [ UKHP$Month <="2015-12-01"] , order = c (4,0,2) )
dynamic_fc = predict(ar42,n.ahead = 27)

coeftest(ar42)
## 
## z test of coefficients:
## 
##             Estimate Std. Error   z value  Pr(>|z|)    
## ar1        1.2147060  0.0537964   22.5797 < 2.2e-16 ***
## ar2       -0.8316660  0.0877442   -9.4783 < 2.2e-16 ***
## ar3       -0.1658027  0.0878208   -1.8880  0.059030 .  
## ar4        0.3850525  0.0538454    7.1511  8.61e-13 ***
## ma1       -0.9886525  0.0098747 -100.1199 < 2.2e-16 ***
## ma2        0.9987283  0.0187421   53.2879 < 2.2e-16 ***
## intercept  0.4431573  0.1428588    3.1021  0.001922 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
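#One-step-ahead (static) forecasts need all four AR lags and both MA terms, so we hold the
#estimated coefficients fixed, filter the full sample, and take observed value minus innovation
ar42_full = arima(UKHP$dhp, order = c(4,0,2), fixed = ar42$coef, transform.pars = FALSE)
static_fc = (UKHP$dhp - residuals(ar42_full))[300:326]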

Calculate in-sample error measures: MAE, MAPE, RMSE

#Accuracy measures for ARMA(2,0)
accuracy(ar2)
##                      ME     RMSE       MAE      MPE     MAPE      MASE
## Training set 0.00105131 1.013463 0.7716421 -20.5119 317.6721 0.7936192
##                       ACF1
## Training set -0.0004932139
#Accuracy measures for ARMA(4,2)
accuracy(ar42)
##                         ME      RMSE       MAE       MPE     MAPE      MASE
## Training set -0.0008804006 0.9790295 0.7440369 -190.0471 490.0513 0.7652277
##                     ACF1
## Training set -0.02676992
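
The main training-set measures can be reproduced from the model residuals. The sketch below is an illustration on simulated data (an AR(2) with made-up coefficients and a mean of 0.4, not the UKHP series) and assumes the usual definitions: RMSE as the root of the mean squared residual, MAE as the mean absolute residual, and MAPE as the mean absolute residual relative to the observed value, in percent.

```r
# Simulated stand-in for UKHP$dhp (assumption: AR(2) around a mean of 0.4)
set.seed(42)
y <- arima.sim(model = list(ar = c(0.25, 0.35)), n = 300) + 0.4
fit <- arima(y, order = c(2, 0, 0))

e <- as.numeric(residuals(fit))   # one-step-ahead in-sample errors
y_obs <- as.numeric(y)

rmse <- sqrt(mean(e^2))
mae  <- mean(abs(e))
mape <- 100 * mean(abs(e / y_obs))

print(c(RMSE = rmse, MAE = mae, MAPE = mape))
```

MAPE divides each error by the observed value, so it becomes very large whenever the series passes close to zero. That is a likely reason for the MAPE values above 300% in the output, and it suggests giving RMSE and MAE more weight for a series such as dhp.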

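We choose ARMA(2,0) as the best model, because:

  • Its residuals resemble white noise: the Ljung-Box test gives a p-value of 0.7433 > 0.05, whereas the ARMA(4,2) residuals reject the white-noise null (p-value = 0.006044).

  • All of its parameters are significant at the 1% level, whereas ar3 in the ARMA(4,2) model is significant only at the 10% level.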
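
The residual check behind this conclusion can also be run with base R alone. This sketch, on a simulated AR(2) series with made-up coefficients, calls Box.test with 10 lags and fitdf = 2, which corresponds to the "Total lags used: 10" and "Model df: 2" lines printed for the ARMA(2,0) model; treating it as equivalent to the test inside checkresiduals is an assumption.

```r
# Simulated stand-in for UKHP$dhp (assumption: AR(2) with made-up coefficients)
set.seed(7)
y <- arima.sim(model = list(ar = c(0.25, 0.35)), n = 300)
fit <- arima(y, order = c(2, 0, 0))

# Ljung-Box test on the residuals; fitdf removes one degree of freedom
# for each estimated AR or MA coefficient
lb <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 2)
print(lb)

# A p-value above 0.05 means the white-noise null is not rejected
white_noise_ok <- lb$p.value > 0.05
print(white_noise_ok)
```

For the ARMA(4,2) model the same call would use fitdf = 6, leaving 4 degrees of freedom as in the output above.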

Forecasting with the best model, ARMA(2,0)

fcast = forecast(ar2, h = 2)
fcast
##     Point Forecast      Lo 80    Hi 80     Lo 95    Hi 95
## 299      0.3400591 -0.9587464 1.638865 -1.646292 2.326411
## 300      0.1804524 -1.1538241 1.514729 -1.860147 2.221052
plot(fcast)