Financial Econometrics - Homework 5

Instructor: Dr. Nguyen Phuong Anh

1 Group Members

Nguyen Minh Quan - MAMAIU19036
Lam Hue Dung - MAMAIU18060
Le Nguyen Dang Khoa - MAMAIU19008

For further discussion, please contact us via email: quannguyenuw@gmail.com.

2 Libraries

library(rio)
library(car)
library(lmtest)
library(sandwich)
library(stats)
library(aTSA)
library(forecast)

3 Linear Regression

url = 'https://raw.githubusercontent.com/QuanNguyenIU/Res_Med_Fin/main/macro.xls'
macro = rio::import(file = url)
head(macro, 6)

##         Date MICROSOFT  SANDP   CPI  INDPRO M1SUPPLY  CCREDIT BMINUSA USTB3M
## 1 1986-03-01  0.095486 238.90 108.8 56.5414    624.3 606.7990    1.50   6.76
## 2 1986-04-01  0.111979 235.52 108.6 56.5654    647.0 614.3669    1.40   6.24
## 3 1986-05-01  0.121528 247.35 108.9 56.6850    645.7 621.9152    1.20   6.33
## 4 1986-06-01  0.106771 250.84 109.5 56.4959    662.8 627.8910    1.21   6.40
## 5 1986-07-01  0.098958 236.12 109.5 56.8096    673.4 633.6083    1.28   6.00
## 6 1986-08-01  0.098958 252.93 109.7 56.7348    678.4 640.5126    1.46   5.69
##   USTB10Y
## 1    7.78
## 2    7.30
## 3    7.71
## 4    7.80
## 5    7.30
## 6    7.17

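# First differences of the macro series; the first row has no predecessor, hence the leading NA
macro$dspread = c(NA, diff(macro$BMINUSA))
macro$dcredit = c(NA, diff(macro$CCREDIT))
macro$dprod = c(NA, diff(macro$INDPRO))
macro$dmoney = c(NA, diff(macro$M1SUPPLY))
# Monthly inflation as the log change in CPI
macro$inflation = c(NA, diff(log(macro$CPI)))
# Change in the term spread (10-year minus 3-month Treasury yield)
macro$rterm = c(NA, diff(macro$USTB10Y - macro$USTB3M))
# Change in inflation, in percent (loses a second observation)
macro$dinflation = c(NA, 100 * diff(macro$inflation))
# Percentage log returns, then excess returns over the monthly T-bill rate (USTB3M / 12)
macro$rsandp = c(NA, 100 * diff(log(macro$SANDP)))
macro$ermsoft = c(NA, 100 * diff(log(macro$MICROSOFT))) - macro$USTB3M / 12
macro$ersandp = macro$rsandp - macro$USTB3M / 12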

lm_msoft = lm(ermsoft ~ ersandp + dprod + dcredit +
                dinflation + dmoney + dspread + rterm, data = macro)
summary(lm_msoft)

## 
## Call:
## lm(formula = ermsoft ~ ersandp + dprod + dcredit + dinflation + 
##     dmoney + dspread + rterm, data = macro)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.075  -4.440  -0.403   4.616  24.480 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.326002   0.475481   2.789  0.00556 ** 
## ersandp      1.280799   0.094354  13.574  < 2e-16 ***
## dprod       -0.303032   0.736881  -0.411  0.68113    
## dcredit     -0.025364   0.027149  -0.934  0.35078    
## dinflation   2.194670   1.264299   1.736  0.08341 .  
## dmoney      -0.006871   0.015568  -0.441  0.65919    
## dspread      2.260064   4.140284   0.546  0.58548    
## rterm        4.733069   1.715814   2.758  0.00609 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.845 on 375 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.3452, Adjusted R-squared:  0.333 
## F-statistic: 28.24 on 7 and 375 DF,  p-value: < 2.2e-16
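
The summary statistics in the output above are linked by simple formulas. The following is a minimal sketch that recomputes the adjusted \(R^2\) and the overall \(F\)-statistic from the rounded values printed by `summary()`; the variable names are ours, and small discrepancies due to rounding are to be expected.

```r
# Values copied from the summary() output above (rounded as printed).
r2   <- 0.3452          # multiple R-squared
k    <- 7               # number of slope coefficients
df_r <- 375             # residual degrees of freedom
n    <- df_r + k + 1    # observations actually used in the fit

# Adjusted R-squared penalises the number of regressors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_r

# Overall F-statistic for H0: all slope coefficients are zero.
f_stat <- (r2 / k) / ((1 - r2) / df_r)
p_val  <- pf(f_stat, k, df_r, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, f_stat = f_stat), 3)
```

Both values agree with the printed output up to rounding.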

Comments:

The multiple \(R^2\) is \(0.3452\) (adjusted \(R^2 = 0.333\)), i.e. the regressors explain about \(34.5\%\) of the variance of the dependent variable.
The regression \(F\)-statistic is \(28.24\) with a \(p\)-value below \(2.2\cdot10^{-16},\) so the null hypothesis that all slope parameters are jointly \(0\) is rejected.
The parameters on ersandp and rterm are significant at the \(1\%\) level, and the parameter on dinflation is significant at the \(10\%\) level. As presented below, we cannot reject the null hypothesis that the parameters on the remaining independent variables (dprod, dcredit, dmoney, dspread) are jointly \(0,\) since the corresponding \(p\)-value is \(0.7986.\)

linearHypothesis(lm_msoft,
                 c('dprod = 0', 'dcredit = 0',
                   'dmoney = 0', 'dspread = 0'))

## Linear hypothesis test
## 
## Hypothesis:
## dprod = 0
## dcredit = 0
## dmoney = 0
## dspread = 0
## 
## Model 1: restricted model
## Model 2: ermsoft ~ ersandp + dprod + dcredit + dinflation + dmoney + dspread + 
##     rterm
## 
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1    379 23180                           
## 2    375 23078  4    101.88 0.4139 0.7986
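
The \(F\)-statistic in the table above can be reproduced by hand from the residual sums of squares. This is a small sketch using the rounded figures printed by `linearHypothesis()`; the variable names are ours.

```r
# Figures copied from the linearHypothesis() table above (rounded as printed).
rss_u <- 23078     # residual sum of squares of the unrestricted model
ssq   <- 101.88    # increase in RSS caused by imposing the restrictions
q     <- 4         # number of restrictions
df_u  <- 375       # residual degrees of freedom of the unrestricted model

# F = ((RSS_r - RSS_u) / q) / (RSS_u / df_u)
f_stat <- (ssq / q) / (rss_u / df_u)
p_val  <- pf(f_stat, q, df_u, lower.tail = FALSE)

round(c(f_stat = f_stat, p_val = p_val), 4)
```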

4 Diagnostic Tests

4.1 Multicollinearity

cor(macro[-(1:2), c('dprod', 'dcredit', 'dinflation',
                    'dmoney', 'dspread', 'rterm')])

##                  dprod      dcredit  dinflation       dmoney     dspread
## dprod       1.00000000  0.094273354 -0.14355079 -0.052514358 -0.05275628
## dcredit     0.09427335  1.000000000 -0.02460369  0.150165099  0.06281801
## dinflation -0.14355079 -0.024603694  1.00000000 -0.093571291 -0.22710010
## dmoney     -0.05251436  0.150165099 -0.09357129  1.000000000  0.17069868
## dspread    -0.05275628  0.062818012 -0.22710010  0.170698675  1.00000000
## rterm      -0.04375067 -0.004029469  0.04160626  0.003800624 -0.01762237
##                   rterm
## dprod      -0.043750669
## dcredit    -0.004029469
## dinflation  0.041606256
## dmoney      0.003800624
## dspread    -0.017622374
## rterm       1.000000000

The largest off-diagonal correlations are \(0.17\) (dmoney and dspread) and \(-0.23\) (dinflation and dspread), both far too small to indicate a multicollinearity problem.

vif(lm_msoft)

##    ersandp      dprod    dcredit dinflation     dmoney    dspread      rterm 
##   1.045330   1.047018   1.039351   1.090154   1.062975   1.132181   1.005900

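All VIF scores are close to \(1\) (the largest is \(1.13\)), well below the usual rule-of-thumb thresholds of \(5\) or \(10,\) so multicollinearity is not a concern.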
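
For intuition, the VIF of a regressor equals \(1/(1-R_j^2),\) where \(R_j^2\) comes from regressing that regressor on all the others. The sketch below illustrates this with base R on simulated data; the data and the names `x1`, `x2`, `x3` are hypothetical and unrelated to the macro dataset.

```r
# Simulated regressors (hypothetical data, for illustration only).
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)   # mildly correlated with x1
x3 <- rnorm(n)              # unrelated to the others
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from the auxiliary regression
# of regressor j on all remaining regressors.
vif_manual <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

round(vif_manual, 3)
```

A VIF of \(1\) means the regressor is uncorrelated with the others; values only slightly above \(1,\) as in our regression, indicate negligible variance inflation.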

4.2 Heteroscedasticity

plot(macro$Date[-(1:2)], lm_msoft$residuals,
     type = 'l', xlab = '', ylab = '')

The plot of the residuals above shows no clear pattern of the variance changing over time.

bptest(formula(lm_msoft), data = macro, studentize = FALSE)

## 
##  Breusch-Pagan test
## 
## data:  formula(lm_msoft)
## BP = 6.3131, df = 7, p-value = 0.5037

bptest(formula(lm_msoft), data = macro, studentize = TRUE)

## 
##  studentized Breusch-Pagan test
## 
## data:  formula(lm_msoft)
## BP = 3.1607, df = 7, p-value = 0.8698

Both the original and the studentized Breusch-Pagan tests return large \(p\)-values (\(0.5037\) and \(0.8698\)), so we do not reject the null hypothesis that the errors have constant variance.
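
The studentized version of the test can be sketched by hand: regress the squared OLS residuals on the regressors and compare \(n R^2\) from that auxiliary regression with a \(\chi^2\) distribution whose degrees of freedom equal the number of regressors. The example below uses simulated data with hypothetical names, not the macro dataset.

```r
# Simulated homoscedastic data (hypothetical, for illustration only).
set.seed(42)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

# Auxiliary regression of the squared residuals on the regressors.
aux <- lm(resid(fit)^2 ~ x1 + x2)

# LM statistic n * R^2, chi-squared with 2 degrees of freedom under H0.
lm_stat <- n * summary(aux)$r.squared
p_val   <- pchisq(lm_stat, df = 2, lower.tail = FALSE)

round(c(lm_stat = lm_stat, p_val = p_val), 4)
```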

4.3 Autocorrelation

dwtest(lm_msoft)

## 
##  Durbin-Watson test
## 
## data:  lm_msoft
## DW = 2.0974, p-value = 0.8176
## alternative hypothesis: true autocorrelation is greater than 0

bgtest(lm_msoft, order = 10)

## 
##  Breusch-Godfrey test for serial correlation of order up to 10
## 
## data:  lm_msoft
## LM test = 4.7666, df = 10, p-value = 0.9062

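Neither the Durbin-Watson test (\(p = 0.8176\)) nor the Breusch-Godfrey test with \(10\) lags (\(p = 0.9062\)) rejects the null hypothesis of no autocorrelation in the residuals. Note that the Durbin-Watson test as run here is only directed against positive first-order autocorrelation, whereas the Breusch-Godfrey test covers all lags up to order \(10.\)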
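
The Durbin-Watson statistic is approximately \(2(1-\hat\rho),\) where \(\hat\rho\) is the first-order autocorrelation of the residuals, so the reported value of \(2.0974\) corresponds to \(\hat\rho \approx -0.05.\) The sketch below computes the statistic by hand on simulated data (hypothetical, not the macro dataset) to show the relationship.

```r
# Simulated regression with independent errors (hypothetical data).
set.seed(7)
n <- 300
x <- rnorm(n)
y <- 2 + 1.5 * x + rnorm(n)
e <- resid(lm(y ~ x))

# Durbin-Watson statistic: sum of squared successive differences
# of the residuals divided by the residual sum of squares.
dw <- sum(diff(e)^2) / sum(e^2)

# First-order residual autocorrelation and the usual approximation.
rho_hat <- sum(e[-1] * e[-n]) / sum(e^2)

round(c(dw = dw, approx = 2 * (1 - rho_hat)), 4)
```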

5 Robust Regression

coeftest(lm_msoft, vcov. = vcovHC(lm_msoft, type = 'HC1'))

## 
## t test of coefficients:
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  1.3260025  0.4590678  2.8885  0.004096 ** 
## ersandp      1.2807988  0.0929615 13.7777 < 2.2e-16 ***
## dprod       -0.3030317  0.6345495 -0.4776  0.633246    
## dcredit     -0.0253637  0.0208151 -1.2185  0.223790    
## dinflation   2.1946700  1.3068027  1.6794  0.093903 .  
## dmoney      -0.0068714  0.0109006 -0.6304  0.528835    
## dspread      2.2600645  3.4278130  0.6593  0.510088    
## rterm        4.7330689  1.7265468  2.7413  0.006412 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

coeftest(lm_msoft, vcov. = NeweyWest(lm_msoft, lag = 6,
                                     adjust = TRUE, prewhite = FALSE))

## 
## t test of coefficients:
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  1.3260025  0.5027048  2.6377  0.008694 ** 
## ersandp      1.2807988  0.0998922 12.8218 < 2.2e-16 ***
## dprod       -0.3030317  0.5216170 -0.5809  0.561625    
## dcredit     -0.0253637  0.0223410 -1.1353  0.256975    
## dinflation   2.1946700  1.3136172  1.6707  0.095614 .  
## dmoney      -0.0068714  0.0109474 -0.6277  0.530596    
## dspread      2.2600645  2.8418132  0.7953  0.426948    
## rterm        4.7330689  1.7585143  2.6915  0.007431 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

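Both sets of robust standard errors (White HC1 and Newey-West with \(6\) lags) lead to the same conclusions as the ordinary standard errors. The coefficient estimates are unchanged by construction, ersandp and rterm remain significant at the \(1\%\) level, and dinflation remains significant only at the \(10\%\) level.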
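
To make explicit what the HC1 correction does, the following sketch computes the heteroscedasticity-consistent covariance matrix \(\frac{n}{n-k}(X'X)^{-1}X'\operatorname{diag}(e_i^2)X(X'X)^{-1}\) with base R only. The data are simulated and all names are hypothetical; it is an illustration of the formula, not a replacement for `vcovHC`.

```r
# Simulated data with heteroscedastic errors (hypothetical).
set.seed(3)
n <- 250
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = exp(0.5 * x))

fit <- lm(y ~ x)
X   <- model.matrix(fit)   # n x k design matrix
e   <- resid(fit)
k   <- ncol(X)

# Sandwich estimator: bread = (X'X)^{-1}, meat = X' diag(e^2) X.
bread  <- solve(crossprod(X))
meat   <- crossprod(X * e)   # row i of X is scaled by e_i
vc_hc1 <- n / (n - k) * bread %*% meat %*% bread
se_hc1 <- sqrt(diag(vc_hc1))

# Compare classical and HC1 standard errors; the coefficients are the same.
round(cbind(ols = sqrt(diag(vcov(fit))), hc1 = se_hc1), 4)
```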

6 ARIMA

url = 'https://raw.githubusercontent.com/QuanNguyenIU/Res_Med_Fin/main/UKHP.xls'
UKHP = rio::import(file = url)
head(UKHP, 6)

##        Month Average House Price
## 1 1991-01-01            53051.72
## 2 1991-02-01            53496.80
## 3 1991-03-01            52892.86
## 4 1991-04-01            53677.44
## 5 1991-05-01            54385.73
## 6 1991-06-01            55107.38

Z = ts(UKHP$`Average House Price`)
# Monthly percentage change in the average house price
DHP = 100 * diff(Z) / stats::lag(Z, -1)
plot(UKHP$Month[-1], DHP, type = 'l',
     xlab = '', ylab = '')

plot(acf(DHP, plot = FALSE)[1:12])

pacf(DHP, lag.max = 12)

The ACF dies away rather slowly, while only the first two PACF values are clearly significant. This pattern is typical of an autoregressive process of order \(2.\)
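
As an illustration of this pattern, the sketch below simulates an AR\((2)\) process and computes its sample ACF and PACF. The coefficients \(0.25\) and \(0.35\) are hypothetical values chosen for the example, not estimates from the house price data.

```r
# Simulate a stationary AR(2) process (hypothetical coefficients).
set.seed(10)
y <- arima.sim(model = list(ar = c(0.25, 0.35)), n = 2000)

# Sample ACF (lag 0 dropped) and PACF for lags 1 to 12.
a <- acf(y, lag.max = 12, plot = FALSE)$acf[-1, 1, 1]
p <- pacf(y, lag.max = 12, plot = FALSE)$acf[, 1, 1]

# Approximate 95% significance band for a white noise series.
band <- qnorm(0.975) / sqrt(length(y))

round(rbind(acf = a, pacf = p), 2)
```

In such a simulation the ACF decays gradually, while the PACF is expected to be large at lags \(1\) and \(2\) and to stay mostly inside the band afterwards.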

adf.test(DHP)

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -11.23    0.01
## [2,]   1  -6.40    0.01
## [3,]   2  -5.78    0.01
## [4,]   3  -5.49    0.01
## [5,]   4  -5.08    0.01
## [6,]   5  -4.43    0.01
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -12.37    0.01
## [2,]   1  -7.19    0.01
## [3,]   2  -6.53    0.01
## [4,]   3  -6.25    0.01
## [5,]   4  -5.84    0.01
## [6,]   5  -5.21    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -12.36    0.01
## [2,]   1  -7.19    0.01
## [3,]   2  -6.53    0.01
## [4,]   3  -6.25    0.01
## [5,]   4  -5.84    0.01
## [6,]   5  -5.21    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

The augmented Dickey-Fuller test rejects the null hypothesis of a unit root for every lag length and every specification (\(p \le 0.01\)), so DHP can be treated as stationary.
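
For reference, the three specifications in the output above correspond to the standard augmented Dickey-Fuller regressions below, written in generic textbook notation. The symbols \(\rho,\) \(\gamma_i,\) \(\mu,\) \(\beta\) and the number of lagged differences \(p\) are our labels, and the exact lag convention used by the package may differ. In each case the null hypothesis of a unit root is \(\rho = 0\) against the alternative \(\rho < 0.\)

\[
\begin{aligned}
\text{Type 1:}\quad & \Delta y_t = \rho\, y_{t-1} + \sum_{i=1}^{p} \gamma_i\, \Delta y_{t-i} + \varepsilon_t \\
\text{Type 2:}\quad & \Delta y_t = \mu + \rho\, y_{t-1} + \sum_{i=1}^{p} \gamma_i\, \Delta y_{t-i} + \varepsilon_t \\
\text{Type 3:}\quad & \Delta y_t = \mu + \beta t + \rho\, y_{t-1} + \sum_{i=1}^{p} \gamma_i\, \Delta y_{t-i} + \varepsilon_t
\end{aligned}
\]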

auto.arima(DHP, max.order = 10)

## Series: DHP 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2    mean
##       0.2361  0.3340  0.4275
## s.e.  0.0521  0.0523  0.1259
## 
## sigma^2 = 0.9756:  log likelihood = -457.23
## AIC=922.46   AICc=922.58   BIC=937.61

The model selected by auto.arima is ARIMA\((2,0,0)\) with a non-zero mean, i.e. an AR\((2)\) model for DHP.
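
Written out with the estimates from the output above, and noting that R reports the mean of the process rather than the regression intercept, the fitted model is

\[
\text{DHP}_t - 0.4275 = 0.2361\,(\text{DHP}_{t-1} - 0.4275) + 0.3340\,(\text{DHP}_{t-2} - 0.4275) + \varepsilon_t,
\qquad \hat\sigma^2_\varepsilon = 0.9756,
\]

or equivalently, with intercept \(0.4275\,(1 - 0.2361 - 0.3340) \approx 0.1838,\)

\[
\text{DHP}_t \approx 0.1838 + 0.2361\,\text{DHP}_{t-1} + 0.3340\,\text{DHP}_{t-2} + \varepsilon_t.
\]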

fit = auto.arima(DHP, max.order = 10)
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,0,0) with non-zero mean
## Q* = 5.1323, df = 7, p-value = 0.6438
## 
## Model df: 3.   Total lags used: 10

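The Ljung-Box test does not reject the null hypothesis of no residual autocorrelation (\(p = 0.6438\)), so the residuals are consistent with a white noise process.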
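
The same kind of check can be run directly with `Box.test` from base R. The sketch below fits an AR\((2)\) to simulated data (hypothetical coefficients and mean) and applies the Ljung-Box test to the residuals; setting `fitdf = 2` to account for the two AR coefficients is our assumption and need not match the degrees of freedom used by `checkresiduals`.

```r
# Simulated AR(2) series with non-zero mean (hypothetical values).
set.seed(5)
y <- arima.sim(model = list(ar = c(0.25, 0.35)), n = 500) + 0.4

# Fit an AR(2) with mean using base R.
fit_sim <- arima(y, order = c(2, 0, 0))

# Ljung-Box test on the residuals with 10 lags; fitdf removes
# the degrees of freedom used by the two AR coefficients.
lb <- Box.test(residuals(fit_sim), lag = 10, type = "Ljung-Box", fitdf = 2)
lb
```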

coeftest(fit)

## 
## z test of coefficients:
## 
##           Estimate Std. Error z value  Pr(>|z|)    
## ar1       0.236107   0.052112  4.5308 5.877e-06 ***
## ar2       0.334036   0.052308  6.3859 1.704e-10 ***
## intercept 0.427546   0.125874  3.3966 0.0006822 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

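All three parameters are significant at the \(0.1\%\) level, so none of the terms should be dropped. Significance alone does not establish goodness of fit; the adequacy of the model rests on the residual diagnostics above.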

fcast = forecast(fit, h = 2)
fcast

##     Point Forecast      Lo 80   Hi 80     Lo 95    Hi 95
## 328      0.1075045 -1.1583314 1.37334 -1.828424 2.043433
## 329      0.4033191 -0.8973213 1.70396 -1.585839 2.392477

plot(fcast)
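
The prediction intervals above can be checked by hand. For an AR\((2)\) model the forecast error variance is \(\sigma^2\) one step ahead and \(\sigma^2(1+\phi_1^2)\) two steps ahead, and the intervals are the point forecasts plus or minus a normal quantile times the standard error. The sketch below uses the rounded estimates printed earlier, so it matches the reported bounds only up to rounding; the variable names are ours.

```r
# Estimates and point forecasts copied from the output above.
sigma2 <- 0.9756                    # innovation variance
phi1   <- 0.2361                    # first AR coefficient
point  <- c(0.1075045, 0.4033191)   # forecasts for h = 1 and h = 2

# Forecast standard errors for horizons 1 and 2.
se <- sqrt(sigma2 * c(1, 1 + phi1^2))

# Normal quantiles for the 80% and 95% intervals.
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)

intervals <- cbind(lo80 = point - z80 * se, hi80 = point + z80 * se,
                   lo95 = point - z95 * se, hi95 = point + z95 * se)
round(intervals, 3)
```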