Financial Econometrics - Homework 5
Instructor: Dr. Nguyen Phuong Anh
1 Group Members
- Nguyen Minh Quan - MAMAIU19036
- Lam Hue Dung - MAMAIU18060
- Le Nguyen Dang Khoa - MAMAIU19008
For further discussion, please contact us via email: quannguyenuw@gmail.com.
2 Libraries
library(rio)
library(car)
library(lmtest)
library(sandwich)
library(stats)
library(aTSA)
library(forecast)
3 Linear Regression
url = 'https://raw.githubusercontent.com/QuanNguyenIU/Res_Med_Fin/main/macro.xls'
macro = rio::import(file = url)
head(macro, 6)
## Date MICROSOFT SANDP CPI INDPRO M1SUPPLY CCREDIT BMINUSA USTB3M
## 1 1986-03-01 0.095486 238.90 108.8 56.5414 624.3 606.7990 1.50 6.76
## 2 1986-04-01 0.111979 235.52 108.6 56.5654 647.0 614.3669 1.40 6.24
## 3 1986-05-01 0.121528 247.35 108.9 56.6850 645.7 621.9152 1.20 6.33
## 4 1986-06-01 0.106771 250.84 109.5 56.4959 662.8 627.8910 1.21 6.40
## 5 1986-07-01 0.098958 236.12 109.5 56.8096 673.4 633.6083 1.28 6.00
## 6 1986-08-01 0.098958 252.93 109.7 56.7348 678.4 640.5126 1.46 5.69
## USTB10Y
## 1 7.78
## 2 7.30
## 3 7.71
## 4 7.80
## 5 7.30
## 6 7.17
macro$dspread = c(NA, diff(macro$BMINUSA))
macro$dcredit = c(NA, diff(macro$CCREDIT))
macro$dprod = c(NA, diff(macro$INDPRO))
macro$dmoney = c(NA, diff(macro$M1SUPPLY))
macro$inflation = c(NA, diff(log(macro$CPI)))
macro$rterm = c(NA, diff(macro$USTB10Y - macro$USTB3M))
macro$dinflation = c(NA, 100 * diff (macro$inflation))
macro$rsandp = c(NA, 100 * diff(log(macro$SANDP)))
macro$ermsoft = c(NA, 100 * diff(log(macro$MICROSOFT))) - macro$USTB3M / 12
macro$ersandp = macro$rsandp - macro$USTB3M / 12
lm_msoft = lm(ermsoft ~ ersandp + dprod + dcredit +
              dinflation + dmoney + dspread + rterm, data = macro)
summary(lm_msoft)
##
## Call:
## lm(formula = ermsoft ~ ersandp + dprod + dcredit + dinflation +
## dmoney + dspread + rterm, data = macro)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.075 -4.440 -0.403 4.616 24.480
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.326002 0.475481 2.789 0.00556 **
## ersandp 1.280799 0.094354 13.574 < 2e-16 ***
## dprod -0.303032 0.736881 -0.411 0.68113
## dcredit -0.025364 0.027149 -0.934 0.35078
## dinflation 2.194670 1.264299 1.736 0.08341 .
## dmoney -0.006871 0.015568 -0.441 0.65919
## dspread 2.260064 4.140284 0.546 0.58548
## rterm 4.733069 1.715814 2.758 0.00609 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.845 on 375 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.3452, Adjusted R-squared: 0.333
## F-statistic: 28.24 on 7 and 375 DF, p-value: < 2.2e-16
Comments:
- The adjusted \(R^2\) is \(0.333,\) i.e. the independent variables explain about \(33.3\%\) of the variability of the dependent variable after adjusting for the number of regressors.
- The regression \(F-\)statistic is \(28.24\) with a negligible \(p-\)value \((<2.2\cdot10^{-16}),\) so we reject the null hypothesis that all slope parameters are jointly \(0.\)
- The parameters on ersandp and rterm are significant at the \(1\%\) level, while the parameter on dinflation is significant only at the \(10\%\) level. As presented below, we cannot reject the null hypothesis that the parameters on the remaining independent variables (dprod, dcredit, dmoney, dspread) are jointly \(0,\) since the corresponding \(p-\)value is \(0.7986.\)
linearHypothesis(lm_msoft,
                 c('dprod = 0', 'dcredit = 0',
                   'dmoney = 0', 'dspread = 0'))
## Linear hypothesis test
##
## Hypothesis:
## dprod = 0
## dcredit = 0
## dmoney = 0
## dspread = 0
##
## Model 1: restricted model
## Model 2: ermsoft ~ ersandp + dprod + dcredit + dinflation + dmoney + dspread +
## rterm
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 379 23180
## 2 375 23078 4 101.88 0.4139 0.7986
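The headline figures above can be re-derived by hand as a sanity check. The sketch below plugs the rounded values printed in the summary and linearHypothesis output into the textbook formulas for the adjusted \(R^2\) and for the restricted-versus-unrestricted \(F-\)statistic. The variable names are our own, and small discrepancies from the printed values are expected because the inputs are rounded.

```r
# Values copied from the printed output above (rounded, so results are approximate).
n  <- 383      # 375 residual degrees of freedom + 8 estimated coefficients
k  <- 7        # number of slope coefficients in the unrestricted model
r2 <- 0.3452   # multiple R-squared

# Adjusted R-squared penalises R-squared for the number of regressors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

q       <- 4        # number of restrictions being tested
ss_diff <- 101.88   # RSS(restricted) - RSS(unrestricted), the "Sum of Sq" column
rss_u   <- 23078    # RSS of the unrestricted model
df_u    <- 375      # residual degrees of freedom of the unrestricted model

# F = [(RSS_R - RSS_U) / q] / [RSS_U / (n - k - 1)]
f_stat  <- (ss_diff / q) / (rss_u / df_u)
p_value <- pf(f_stat, q, df_u, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, F = f_stat, p = p_value), 4)
```

The resulting values should agree with the reported \(0.333,\) \(0.4139\) and \(0.7986\) up to rounding.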
4 Diagnostic Tests
4.1 Multicollinearity
cor(macro[-(1:2), c('dprod', 'dcredit', 'dinflation',
                    'dmoney', 'dspread', 'rterm')])
## dprod dcredit dinflation dmoney dspread
## dprod 1.00000000 0.094273354 -0.14355079 -0.052514358 -0.05275628
## dcredit 0.09427335 1.000000000 -0.02460369 0.150165099 0.06281801
## dinflation -0.14355079 -0.024603694 1.00000000 -0.093571291 -0.22710010
## dmoney -0.05251436 0.150165099 -0.09357129 1.000000000 0.17069868
## dspread -0.05275628 0.062818012 -0.22710010 0.170698675 1.00000000
## rterm -0.04375067 -0.004029469 0.04160626 0.003800624 -0.01762237
## rterm
## dprod -0.043750669
## dcredit -0.004029469
## dinflation 0.041606256
## dmoney 0.003800624
## dspread -0.017622374
## rterm 1.000000000
The largest correlations in absolute value are \(0.17\) (dmoney and dspread) and \(-0.23\) (dinflation and dspread), both small enough to be acceptable.
vif(lm_msoft)
## ersandp dprod dcredit dinflation dmoney dspread rterm
## 1.045330 1.047018 1.039351 1.090154 1.062975 1.132181 1.005900
The VIF scores are all close to \(1,\) so multicollinearity is not a concern.
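For intuition, the VIF of a regressor is \(1/(1-R_j^2),\) where \(R_j^2\) comes from regressing that regressor on all the others. The following is a minimal sketch in base R on simulated data (the variables x1, x2, x3 are hypothetical and not part of the macro data set); it also checks the known identity that the VIFs equal the diagonal of the inverse correlation matrix of the regressors.

```r
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)   # deliberately correlated with x1
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from the auxiliary regression
# of regressor j on all remaining regressors.
vif_manual <- sapply(names(X), function(v) {
  aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
  1 / (1 - summary(aux)$r.squared)
})

# Equivalent route: diagonal of the inverse correlation matrix.
vif_inverse <- diag(solve(cor(X)))

rbind(vif_manual, vif_inverse)
```

A VIF of \(1\) means the regressor is uncorrelated with the others; the values reported above are all close to that lower bound.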
4.2 Heteroscedasticity
plot(macro$Date[-(1:2)], lm_msoft$residuals,
     type = 'l', xlab = '', ylab = '')
The plot of the residuals above shows no clear pattern of changing variance over time.
bptest(formula(lm_msoft), data = macro, studentize = FALSE)
##
## Breusch-Pagan test
##
## data: formula(lm_msoft)
## BP = 6.3131, df = 7, p-value = 0.5037
bptest(formula(lm_msoft), data = macro, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: formula(lm_msoft)
## BP = 3.1607, df = 7, p-value = 0.8698
Both versions of the Breusch\(-\)Pagan test return large \(p-\)values (\(0.5037\) and \(0.8698\)), so we do not reject the null hypothesis that the residuals have constant variance.
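To illustrate what the test computes, here is a minimal base R sketch of the studentized (Koenker) form of the statistic, \(n\cdot R^2\) from an auxiliary regression of the squared residuals on the regressors. It runs on simulated homoscedastic data with hypothetical variables x1 and x2, and we assume it corresponds to the studentize = TRUE variant used above rather than reproducing the function's internals.

```r
set.seed(4)
n  <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)   # constant error variance by construction

fit <- lm(y ~ x1 + x2)
e2  <- residuals(fit)^2

# Auxiliary regression: do the regressors explain the squared residuals?
aux <- lm(e2 ~ x1 + x2)

bp_stat <- n * summary(aux)$r.squared   # LM statistic
bp_df   <- 2                            # number of regressors, excluding the intercept
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = bp_p)
```

Under the null hypothesis of constant variance the statistic is asymptotically \(\chi^2\) with as many degrees of freedom as there are regressors, which is why df = 7 appears in the output above.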
4.3 Autocorrelation
dwtest(lm_msoft)
##
## Durbin-Watson test
##
## data: lm_msoft
## DW = 2.0974, p-value = 0.8176
## alternative hypothesis: true autocorrelation is greater than 0
bgtest(lm_msoft, order = 10)
##
## Breusch-Godfrey test for serial correlation of order up to 10
##
## data: lm_msoft
## LM test = 4.7666, df = 10, p-value = 0.9062
Both the Durbin\(-\)Watson test (\(p-\)value \(0.8176\)) and the Breusch\(-\)Godfrey test (\(p-\)value \(0.9062\)) fail to reject the null hypothesis of no autocorrelation in the residuals.
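The Durbin\(-\)Watson statistic is approximately \(2(1-\hat\rho),\) where \(\hat\rho\) is the first-order autocorrelation of the residuals, so a value near \(2\) indicates little first-order autocorrelation. The sketch below illustrates the relation on a simulated series (the series e is hypothetical, not the regression residuals) and then backs out the \(\hat\rho\) implied by the reported DW of \(2.0974.\)

```r
set.seed(2)
n <- 500
# Simulated AR(1) series standing in for regression residuals.
e <- as.numeric(arima.sim(model = list(ar = 0.3), n = n))

# Durbin-Watson statistic: squared first differences over the sum of squares.
dw <- sum(diff(e)^2) / sum(e^2)

# First-order autocorrelation estimate and the usual approximation.
rho_hat <- sum(e[-1] * e[-n]) / sum(e^2)
c(dw = dw, approx = 2 * (1 - rho_hat))

# Implied first-order autocorrelation for the reported statistic.
rho_implied <- 1 - 2.0974 / 2
rho_implied
```

The implied value is slightly negative and very close to zero, in line with the conclusion above.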
5 Robust Regression
coeftest(lm_msoft, vcov. = vcovHC(lm_msoft, type = 'HC1'))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.3260025 0.4590678 2.8885 0.004096 **
## ersandp 1.2807988 0.0929615 13.7777 < 2.2e-16 ***
## dprod -0.3030317 0.6345495 -0.4776 0.633246
## dcredit -0.0253637 0.0208151 -1.2185 0.223790
## dinflation 2.1946700 1.3068027 1.6794 0.093903 .
## dmoney -0.0068714 0.0109006 -0.6304 0.528835
## dspread 2.2600645 3.4278130 0.6593 0.510088
## rterm 4.7330689 1.7265468 2.7413 0.006412 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
coeftest(lm_msoft, vcov. = NeweyWest(lm_msoft, lag = 6,
                                     adjust = TRUE, prewhite = FALSE))
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.3260025 0.5027048 2.6377 0.008694 **
## ersandp 1.2807988 0.0998922 12.8218 < 2.2e-16 ***
## dprod -0.3030317 0.5216170 -0.5809 0.561625
## dcredit -0.0253637 0.0223410 -1.1353 0.256975
## dinflation 2.1946700 1.3136172 1.6707 0.095614 .
## dmoney -0.0068714 0.0109474 -0.6277 0.530596
## dspread 2.2600645 2.8418132 0.7953 0.426948
## rterm 4.7330689 1.7585143 2.6915 0.007431 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The coefficient estimates are identical to those of the ordinary least squares regression, and the robust standard errors lead to the same conclusions about significance.
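Robust standard errors leave the coefficients untouched and only replace the covariance estimate. As a rough illustration, the following base R sketch builds the heteroscedasticity-consistent "sandwich" estimator with the HC1 small-sample correction, \(\frac{n}{n-k}(X'X)^{-1}X'\,\mathrm{diag}(e_i^2)\,X(X'X)^{-1},\) on simulated heteroscedastic data. The data and names are hypothetical, and the sketch is not meant to reproduce the internals of the sandwich package.

```r
set.seed(3)
n <- 300
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = exp(0.5 * x))   # error variance depends on x

fit <- lm(y ~ x)
X   <- model.matrix(fit)
e   <- residuals(fit)
k   <- ncol(X)

bread <- solve(crossprod(X))   # (X'X)^{-1}
meat  <- crossprod(X * e)      # X' diag(e^2) X, each row of X scaled by its residual

vcov_hc1 <- n / (n - k) * bread %*% meat %*% bread
se_hc1   <- sqrt(diag(vcov_hc1))

# Classical and robust standard errors side by side.
cbind(ols = sqrt(diag(vcov(fit))), hc1 = se_hc1)
```

When the errors are homoscedastic and uncorrelated, as the diagnostic tests above suggest for lm_msoft, the two sets of standard errors should be close.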
6 ARIMA
url = 'https://raw.githubusercontent.com/QuanNguyenIU/Res_Med_Fin/main/UKHP.xls'
UKHP = rio::import(file = url)
head(UKHP, 6)
## Month Average House Price
## 1 1991-01-01 53051.72
## 2 1991-02-01 53496.80
## 3 1991-03-01 52892.86
## 4 1991-04-01 53677.44
## 5 1991-05-01 54385.73
## 6 1991-06-01 55107.38
Z = ts(UKHP$`Average House Price`)
DHP = 100 * diff(Z) / lag(Z, -1)
plot(UKHP$`Month`[-1], DHP, type = 'l',
     xlab = '', ylab = '')
plot(acf(DHP, plot = FALSE)[1:12])
pacf(DHP, lag.max = 12)
The ACF dies away rather slowly, while only the first two PACF values seem strongly significant, a pattern consistent with a low-order autoregressive process.
adf.test(DHP)
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -11.23 0.01
## [2,] 1 -6.40 0.01
## [3,] 2 -5.78 0.01
## [4,] 3 -5.49 0.01
## [5,] 4 -5.08 0.01
## [6,] 5 -4.43 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -12.37 0.01
## [2,] 1 -7.19 0.01
## [3,] 2 -6.53 0.01
## [4,] 3 -6.25 0.01
## [5,] 4 -5.84 0.01
## [6,] 5 -5.21 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -12.36 0.01
## [2,] 1 -7.19 0.01
## [3,] 2 -6.53 0.01
## [4,] 3 -6.25 0.01
## [5,] 4 -5.84 0.01
## [6,] 5 -5.21 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
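The small \(p-\)values (\(\le 0.01\)) returned by the augmented Dickey\(-\)Fuller test at every lag and for every specification reject the null hypothesis of a unit root, suggesting that the series is stationary.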
auto.arima(DHP, max.order = 10)
## Series: DHP
## ARIMA(2,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 mean
## 0.2361 0.3340 0.4275
## s.e. 0.0521 0.0523 0.1259
##
## sigma^2 = 0.9756: log likelihood = -457.23
## AIC=922.46 AICc=922.58 BIC=937.61
The best model selected by auto.arima is ARIMA\((2,0,0)\) with a non-zero mean, i.e. an AR\((2)\) model.
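Written out, the fitted model is \(y_t-\mu=\phi_1(y_{t-1}-\mu)+\phi_2(y_{t-2}-\mu)+\varepsilon_t\) with \(\phi_1=0.2361,\) \(\phi_2=0.3340\) and \(\mu=0.4275,\) taking the reported "mean" to be the process mean \(\mu\) rather than a regression constant. As a quick check under that reading, the sketch below uses the rounded estimates to verify that the AR polynomial has all roots outside the unit circle (the stationarity condition) and to compute the implied constant \(c=\mu(1-\phi_1-\phi_2).\)

```r
# Rounded estimates copied from the auto.arima output above.
phi <- c(0.2361, 0.3340)
mu  <- 0.4275

# Roots of the AR polynomial 1 - phi1 * z - phi2 * z^2.
roots <- polyroot(c(1, -phi))
Mod(roots)   # stationarity requires every modulus to exceed 1

# Constant of the equivalent form y_t = c + phi1 * y_{t-1} + phi2 * y_{t-2} + eps_t.
const <- mu * (1 - sum(phi))
const
```

Both moduli exceed \(1,\) which agrees with the stationarity suggested by the augmented Dickey\(-\)Fuller test.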
fit = auto.arima(DHP, max.order = 10)
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,0) with non-zero mean
## Q* = 5.1323, df = 7, p-value = 0.6438
##
## Model df: 3. Total lags used: 10
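The Ljung\(-\)Box test returns a \(p-\)value of \(0.6438,\) so we do not reject the null hypothesis of no residual autocorrelation; the residuals seem to follow a white noise process.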
coeftest(fit)
##
## z test of coefficients:
##
## Estimate Std. Error z value Pr(>|z|)
## ar1 0.236107 0.052112 4.5308 5.877e-06 ***
## ar2 0.334036 0.052308 6.3859 1.704e-10 ***
## intercept 0.427546 0.125874 3.3966 0.0006822 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
All parameters are significant at the \(0.1\%\) level; together with the white noise residuals, this suggests that the model fits the data reasonably well.
fcast = forecast(fit, h = 2)
fcast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 328 0.1075045 -1.1583314 1.37334 -1.828424 2.043433
## 329 0.4033191 -0.8973213 1.70396 -1.585839 2.392477
plot(fcast)
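The point forecasts of an AR\((2)\) model follow a simple recursion in which unknown future values are replaced by their own forecasts. As an illustration, the sketch below fits an AR\((2)\) with stats::arima to a simulated series (hypothetical data, not the UKHP series) and compares the hand-computed one- and two-step forecasts with predict(). It also checks the upper \(80\%\) bound of the first reported forecast, assuming normal errors and using the rounded sigma^2 of \(0.9756.\)

```r
set.seed(5)
# Simulated AR(2) series with a non-zero mean, standing in for DHP.
y <- as.numeric(arima.sim(model = list(ar = c(0.24, 0.33)), n = 300)) + 0.43
n <- length(y)

fit_sim <- arima(y, order = c(2, 0, 0))
phi1 <- unname(coef(fit_sim)["ar1"])
phi2 <- unname(coef(fit_sim)["ar2"])
mu   <- unname(coef(fit_sim)["intercept"])   # arima() reports the process mean here

# One-step forecast uses the last two observations;
# the two-step forecast substitutes f1 for the unknown y[n + 1].
f1 <- mu + phi1 * (y[n] - mu) + phi2 * (y[n - 1] - mu)
f2 <- mu + phi1 * (f1 - mu)   + phi2 * (y[n] - mu)

pred <- as.numeric(predict(fit_sim, n.ahead = 2)$pred)
rbind(manual = c(f1, f2), predict = pred)

# Upper 80% bound of the first reported forecast: point forecast + z_{0.90} * sigma.
hi80 <- 0.1075045 + qnorm(0.9) * sqrt(0.9756)
hi80
```

The hand-computed bound should match the reported Hi 80 value of \(1.37334\) up to the rounding of sigma^2.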