FINANCIAL
ECONOMETRICS
Homework 6
1 In-class Lab
Multiple Regression Using an APT-Style Model
We work with the macro dataset:
## # A tibble: 385 × 10
## Date MICROSOFT SANDP CPI INDPRO M1SUPPLY CCREDIT BMINUSA
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1986-03-01 00:00:00 0.0955 239. 109. 56.5 624. 607. 1.5
## 2 1986-04-01 00:00:00 0.112 236. 109. 56.6 647 614. 1.4
## 3 1986-05-01 00:00:00 0.122 247. 109. 56.7 646. 622. 1.2
## 4 1986-06-01 00:00:00 0.107 251. 110. 56.5 663. 628. 1.21
## 5 1986-07-01 00:00:00 0.0990 236. 110. 56.8 673. 634. 1.28
## 6 1986-08-01 00:00:00 0.0990 253. 110. 56.7 678. 641. 1.46
## 7 1986-09-01 00:00:00 0.0981 231. 110. 56.8 684. 650. 1.31
## 8 1986-10-01 00:00:00 0.135 244. 110. 57.1 692. 657. 1.38
## 9 1986-11-01 00:00:00 0.173 249. 110. 57.4 709. 657. 1.39
## 10 1986-12-01 00:00:00 0.168 242. 110. 57.9 740. 666. 1.48
## # ℹ 375 more rows
## # ℹ 2 more variables: USTB3M <dbl>, USTB10Y <dbl>
Linear regression models (LRMs) work better with stationary variables, i.e. variables whose statistical properties, such as the mean and variance, do not change over time. When variables are not stationary, it is common to take their first (or, if necessary, second) difference, or the difference of their logarithms, to make them stationary before applying the LRM.
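A minimal sketch of why differencing helps. It uses simulated data rather than the `macro` dataset, and the series are illustrative assumptions. A random walk is non-stationary, but its first difference is just the stationary shock series. The `c(NA, diff(x))` idiom pads the differenced series with one `NA` so it keeps the same length as the data frame column it is stored in.

```r
# Simulated example (not the macro dataset): a random walk and its first difference
set.seed(1)
n <- 200
eps <- rnorm(n)            # stationary shocks with constant mean and variance
walk <- cumsum(eps)        # random walk: its level wanders, so it is non-stationary
dwalk <- c(NA, diff(walk)) # first difference, padded with NA to keep length n

# The first observation is lost to differencing; the rest recovers the shocks
print(length(dwalk))
print(all.equal(dwalk[-1], eps[-1]))
```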
# First differences of the macro factors (padded with NA to keep the row count)
macro$dspread = c(NA, diff(macro$BMINUSA))
macro$dcredit = c(NA, diff(macro$CCREDIT))
macro$dprod = c(NA, diff(macro$INDPRO))
macro$dmoney = c(NA, diff(macro$M1SUPPLY))
macro$inflation = c(NA, diff(log(macro$CPI)))
macro$rterm = c(NA, diff(macro$USTB10Y - macro$USTB3M))
# Change in inflation, scaled by 100
macro$dinflation = c(NA, 100 * diff(macro$inflation))
# Log returns in percent; excess returns subtract the 3-month T-bill rate divided by 12
macro$rsandp = c(NA, 100 * diff(log(macro$SANDP)))
macro$ermsoft = c(NA, 100 * diff(log(macro$MICROSOFT))) - macro$USTB3M / 12
macro$ersandp = macro$rsandp - macro$USTB3M / 12
Then, we run the multiple regression. The explanatory variables go on the right-hand side of the formula, separated by plus signs. After that, we print the summary of the model.
lm_msoft = lm(ermsoft ~ ersandp + dprod + dcredit + dinflation + dmoney +
                dspread + rterm, data = macro)
summary(lm_msoft)
##
## Call:
## lm(formula = ermsoft ~ ersandp + dprod + dcredit + dinflation +
## dmoney + +dspread + rterm, data = macro)
##
## Residuals:
## Min 1Q Median 3Q Max
## -36.075 -4.440 -0.403 4.616 24.480
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.326002 0.475481 2.789 0.00556 **
## ersandp 1.280799 0.094354 13.574 < 2e-16 ***
## dprod -0.303032 0.736881 -0.411 0.68113
## dcredit -0.025364 0.027149 -0.934 0.35078
## dinflation 2.194670 1.264299 1.736 0.08341 .
## dmoney -0.006871 0.015568 -0.441 0.65919
## dspread 2.260064 4.140284 0.546 0.58548
## rterm 4.733069 1.715814 2.758 0.00609 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.845 on 375 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.3452, Adjusted R-squared: 0.333
## F-statistic: 28.24 on 7 and 375 DF, p-value: < 2.2e-16
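To show what `lm()` computes, here is a small sketch on simulated data. It uses three made-up regressors, not the APT factors. The OLS coefficients solve the normal equations \((X'X)b = X'y\), and the "Std. Error" column is the square root of the diagonal of \(s^2 (X'X)^{-1}\), where \(s^2\) is the residual variance. Each "t value" is then the estimate divided by its standard error.

```r
# Simulated data with three regressors (illustrative only, not the macro data)
set.seed(123)
n <- 100
X <- cbind(1, matrix(rnorm(n * 3), nrow = n, ncol = 3))  # first column: intercept
beta_true <- c(1, 1.3, -0.3, 2)
y <- as.vector(X %*% beta_true + rnorm(n))

# OLS estimates from the normal equations (X'X) b = X'y
b_manual <- as.vector(solve(crossprod(X), crossprod(X, y)))

# Standard errors: sqrt of diag(s^2 (X'X)^-1), with s^2 = RSS / (n - number of coefficients)
e_hat <- y - as.vector(X %*% b_manual)
s2 <- sum(e_hat^2) / (n - ncol(X))
se_manual <- sqrt(diag(s2 * solve(crossprod(X))))

# The same model fitted with lm(); each formula term corresponds to a column of X
dat <- data.frame(y = y, x1 = X[, 2], x2 = X[, 3], x3 = X[, 4])
fit <- lm(y ~ x1 + x2 + x3, data = dat)
se_lm <- unname(summary(fit)$coefficients[, "Std. Error"])
print(cbind(b_manual, b_lm = unname(coef(fit)), se_manual, se_lm))
```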
Recall: the regression F-statistic tests the null hypothesis that all of the slope parameters are jointly zero.
- The regression F-statistic takes a value of 28.24.
- The p-value is < 2.2e-16.
\(\implies\) The null hypothesis is rejected.
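As a check, the regression F-statistic can be recovered from \(R^2\) alone: \(F = \dfrac{R^2/k}{(1-R^2)/(T-k-1)}\). Here \(k = 7\) slope coefficients and \(T-k-1 = 375\) residual degrees of freedom. The sketch plugs in the values printed by `summary(lm_msoft)`. Because \(R^2\) is printed to four decimals, it reproduces the reported 28.24 only up to rounding.

```r
# Values taken from the summary(lm_msoft) output above
r2 <- 0.3452       # multiple R-squared
k <- 7             # number of slope coefficients (restrictions under H0)
df_resid <- 375    # residual degrees of freedom

# F = (R^2 / k) / ((1 - R^2) / df_resid), compared with an F(k, df_resid) distribution
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
p_value <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)
print(round(f_stat, 2))
print(p_value)
```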
However, there are a number of parameter estimates that are not significantly different from zero – specifically those on the ‘dprod’, ‘dcredit’, ‘dmoney’ and ‘dspread’ variables (as Pr(>|t|) > 0.05).
\(\implies\) Use an F-test of \(H_0\): the parameters on these four variables are jointly zero.
library(car)
linearHypothesis(lm_msoft, c("dprod = 0", "dcredit = 0", "dmoney = 0", "dspread = 0"))
## Linear hypothesis test
##
## Hypothesis:
## dprod = 0
## dcredit = 0
## dmoney = 0
## dspread = 0
##
## Model 1: restricted model
## Model 2: ermsoft ~ ersandp + dprod + dcredit + dinflation + dmoney + +dspread +
## rterm
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 379 23180
## 2 375 23078 4 101.88 0.4139 0.7986
Under the null hypothesis, the test statistic follows an F(4, 375) distribution. The F-statistic is 0.4139, with a p-value of 0.7986.
\(\implies\) The null hypothesis cannot be rejected: the parameters on the four variables are jointly insignificant.
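The statistic reported by `linearHypothesis()` is the restricted-versus-unrestricted F-test, \(F = \dfrac{(RSS_R - RSS_U)/m}{RSS_U/(T-k)}\), where \(m = 4\) is the number of restrictions. A short sketch using the numbers printed in the output recovers both the F-value and the p-value up to rounding.

```r
# Values taken from the linearHypothesis() output above
ssr_diff <- 101.88  # RSS(restricted) - RSS(unrestricted), the "Sum of Sq" column
rss_u <- 23078      # RSS of the unrestricted model (Model 2)
m <- 4              # number of restrictions
df_resid <- 375     # residual degrees of freedom of the unrestricted model

f_stat <- (ssr_diff / m) / (rss_u / df_resid)
p_value <- pf(f_stat, df1 = m, df2 = df_resid, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```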
The parameters on ‘rterm’ and ‘dinflation’ are significant at the 10% level, so they are excluded from this joint test and both variables are retained in the model.
2 FPT: CAPM & diagnostic tests
In this section, we use FPT and VNINDEX data from 20-03-2019 to 14-02-2023, as follows:
The VNINDEX data:
## # A tibble: 6 × 2
## Date vn_price
## <date> <dbl>
## 1 2023-03-17 1047.
## 2 2023-03-16 1062.
## 3 2023-03-15 1040.
## 4 2023-03-14 1053.
## 5 2023-03-13 1053
## 6 2023-03-10 1056.
The FPT data:
## # A tibble: 6 × 2
## Date fpt_price
## <date> <dbl>
## 1 2023-03-17 79.2
## 2 2023-03-16 80.5
## 3 2023-03-15 78.7
## 4 2023-03-14 78.9
## 5 2023-03-13 79.5
## 6 2023-03-10 80.6
In this exercise, we use the risk-free rate r = 0.03962 (Vietnam 5-year Government Bond) and work with log returns. As in the in-class exercise, we compute excess log returns in order to obtain stationary series.
library(dplyr)
Riskfree = 0.03962 # Vietnam 5-year Government Bond
# Excess log returns in percent: 100 * difference of log prices between consecutive rows, minus Riskfree
FPT$rFPT = c(NA, 100 * diff(log(FPT$fpt_price))) - Riskfree
VNINDEX$rVNINDEX = c(NA, 100 * diff(log(VNINDEX$vn_price))) - Riskfree
data = merge(VNINDEX, FPT, by = 'Date') # VNINDEX ~ x; FPT ~ y
head(data)
## Date vn_price rVNINDEX fpt_price rFPT
## 1 2019-03-20 1006.59 0.38748218 23.21 0.5220579
## 2 2019-03-21 1002.30 2.02891874 23.08 0.7433331
## 3 2019-03-22 981.78 -0.74300125 22.90 0.0477544
## 4 2019-03-25 988.71 1.86366293 22.88 1.4131901
## 5 2019-03-26 970.07 -0.01075194 22.55 -0.7906728
## 6 2019-03-27 969.79 -0.66870158 22.72 -0.6101723
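`diff()` works on row order, not on dates. The small sketch below uses hypothetical prices to show what this means. If a price series is stored newest-first, as the `head()` printouts of the VNINDEX and FPT tibbles suggest, `c(NA, diff(log(p)))` attaches to each date minus the return over the following period. Sorting by `Date` before differencing gives the conventional return from the previous date to the current one.

```r
# Hypothetical prices stored newest-first (illustrative numbers only)
px <- data.frame(Date = as.Date(c("2023-01-04", "2023-01-03", "2023-01-02")),
                 price = c(103, 101, 100))

# Differencing in the stored (descending) order
r_desc <- c(NA, 100 * diff(log(px$price)))
print(data.frame(Date = px$Date, r_desc = r_desc))

# Sorting oldest-first before differencing gives the usual log return
px_asc <- px[order(px$Date), ]
r_asc <- c(NA, 100 * diff(log(px_asc$price)))
print(data.frame(Date = px_asc$Date, r_asc = r_asc))
```

In this toy case, the value `r_desc` attaches to 2 January equals minus the 2-to-3 January return in `r_asc`.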
lm_data = lm(rFPT ~ rVNINDEX, data = data)
summary(lm_data)
##
## Call:
## lm(formula = rFPT ~ rVNINDEX, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0792 -0.6293 0.1782 0.7741 5.1706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1222 0.0414 -2.952 0.00324 **
## rVNINDEX 0.9241 0.0307 30.105 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.308 on 997 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.4762, Adjusted R-squared: 0.4757
## F-statistic: 906.3 on 1 and 997 DF, p-value: < 2.2e-16
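With a single regressor, the CAPM slope has a closed form. The beta is \(\mathrm{cov}(x, y)/\mathrm{var}(x)\), and the intercept (Jensen's alpha in an excess-return regression) is \(\bar y - \hat\beta \bar x\). The sketch checks this against `lm()` on simulated returns. The parameter values are assumptions for illustration and are not estimated from FPT or VNINDEX.

```r
# Simulated market and stock excess returns (illustrative only, not FPT/VNINDEX)
set.seed(2023)
n <- 1000
r_mkt <- rnorm(n, mean = 0.05, sd = 1.2)
r_stock <- -0.1 + 0.9 * r_mkt + rnorm(n, sd = 1.3)

fit <- lm(r_stock ~ r_mkt)

# One-regressor OLS: slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x)
beta_manual <- cov(r_mkt, r_stock) / var(r_mkt)
alpha_manual <- mean(r_stock) - beta_manual * mean(r_mkt)
print(rbind(manual = c(alpha_manual, beta_manual), lm = unname(coef(fit))))
```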
2.1 Assumption 1
This assumption, \(E(u_t) = 0\), always holds because the model includes an intercept.
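The role of the intercept can be seen in a quick simulation with made-up data. With an intercept, the OLS residuals average to zero up to floating-point error, because any non-zero mean of the errors is absorbed into the constant. If the intercept is suppressed, that mean ends up in the residuals instead.

```r
# Simulated data with a true intercept of 2 (illustrative only)
set.seed(99)
n <- 200
x <- rnorm(n)
y <- 2 + x + rnorm(n)

fit_int <- lm(y ~ x)        # model with an intercept
fit_noint <- lm(y ~ x - 1)  # intercept suppressed

print(mean(residuals(fit_int)))    # zero up to floating-point error
print(mean(residuals(fit_noint)))  # roughly 2: the omitted intercept stays in the residuals
```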
2.2 Assumption 2
Testing for Heteroscedasticity
To get a first impression of the properties of the residuals, we plot them.
From the plot, it is hard to see any clear pattern, so we run formal statistical tests. The two tests demonstrated here are the Breusch–Pagan test and the studentized Breusch–Pagan test.
library(lmtest)
bptest(formula(lm_data), data = data, studentize = FALSE)
##
## Breusch-Pagan test
##
## data: formula(lm_data)
## BP = 2.5059, df = 1, p-value = 0.1134
bptest(formula(lm_data), data = data, studentize = TRUE)
##
## studentized Breusch-Pagan test
##
## data: formula(lm_data)
## BP = 1.1328, df = 1, p-value = 0.2872
The results of the first test indicate that the p-value is 0.1134, which is greater than the typical significance level of 0.05. Therefore, we fail to reject the null hypothesis that there is no heteroscedasticity in the model.
The second test, which uses studentized residuals, gives a p-value of 0.2872, also greater than 0.05, so again there is no evidence of heteroscedasticity in the model.
\(\implies\) We can treat the error variance as constant (homoscedastic).
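The studentized (Koenker) Breusch–Pagan statistic can be written as \(T \cdot R^2\). The \(R^2\) comes from an auxiliary regression of the squared OLS residuals on the regressors. The statistic is compared with a \(\chi^2\) distribution whose degrees of freedom equal the number of regressors (here 1). Below is a minimal sketch on simulated data, where the error standard deviation is assumed to rise with x so that the test should reject. To our understanding this is the statistic `bptest(..., studentize = TRUE)` reports, but treat the sketch as an illustration, not a replacement for `lmtest`.

```r
# Simulated heteroscedastic data (illustrative assumption: error sd grows with x)
set.seed(42)
n <- 500
x <- rnorm(n)
y <- 0.5 + 0.9 * x + rnorm(n, sd = exp(0.5 * x))
fit <- lm(y ~ x)

# Auxiliary regression of squared OLS residuals on the regressor
u2 <- residuals(fit)^2
aux <- lm(u2 ~ x)

# Studentized BP statistic: n * R^2, compared with chi-squared(df = number of regressors)
bp_stat <- n * summary(aux)$r.squared
p_value <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
print(c(BP = bp_stat, p = p_value))
```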
2.3 Assumption 3
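We test the null hypothesis of no autocorrelation, \(H_0: Cov(u_i, u_j) = 0\) for \(i \neq j\), using the Durbin–Watson test, which targets first-order autocorrelation.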
dwtest(lm_data)
##
## Durbin-Watson test
##
## data: lm_data
## DW = 2.0039, p-value = 0.5241
## alternative hypothesis: true autocorrelation is greater than 0
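The DW statistic is 2.0039, very close to 2, and the p-value of 0.5241 is well above 0.05, so there is no evidence of positive first-order autocorrelation.
\(\implies\) There is no evidence of autocorrelation in the residuals.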
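For reference, the Durbin–Watson statistic is \(DW = \sum_{t=2}^{T}(\hat u_t - \hat u_{t-1})^2 / \sum_{t=1}^{T}\hat u_t^2 \approx 2(1-\hat\rho)\). Here \(\hat\rho\) is the first-order autocorrelation of the residuals, which is why values near 2 indicate no first-order autocorrelation. The sketch computes it by hand on simulated regressions. It compares white-noise errors with AR(1) errors with coefficient 0.5; both error processes are assumptions chosen for illustration.

```r
# DW = sum of squared successive differences of residuals / sum of squared residuals
dw_stat <- function(e) sum(diff(e)^2) / sum(e^2)

set.seed(7)
n <- 500
x <- rnorm(n)
e_white <- rnorm(n)                                          # independent errors
e_ar <- as.numeric(arima.sim(model = list(ar = 0.5), n = n)) # AR(1) errors, rho = 0.5

y_white <- 1 + 2 * x + e_white
y_ar <- 1 + 2 * x + e_ar
res_white <- residuals(lm(y_white ~ x))
res_ar <- residuals(lm(y_ar ~ x))

dw_white <- dw_stat(res_white)
dw_ar <- dw_stat(res_ar)
rho_ar <- cor(res_ar[-1], res_ar[-n])  # first-order autocorrelation of the residuals
print(c(dw_white = dw_white, dw_ar = dw_ar, approx_ar = 2 * (1 - rho_ar)))
```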
2.4 Assumption 4
The independent variables are non-stochastic.
This assumption always holds.
2.5 Assumption 5
Testing for Normality of the Residuals
library(moments)
skewness(lm_data$residuals)
## [1] -0.7012466
kurtosis(lm_data$residuals)
## [1] 5.42406
hist(lm_data$residuals, main = "")
box()
jarque.test(lm_data$residuals)
##
## Jarque-Bera Normality Test
##
## data: lm_data$residuals
## JB = 326.47, p-value < 2.2e-16
## alternative hypothesis: greater
agostino.test(lm_data$residuals)
##
## D'Agostino skewness test
##
## data: lm_data$residuals
## skew = -0.70125, z = -8.27171, p-value = 2.22e-16
## alternative hypothesis: data have a skewness
anscombe.test(lm_data$residuals)
##
## Anscombe-Glynn kurtosis test
##
## data: lm_data$residuals
## kurt = 5.4241, z = 7.8152, p-value = 5.487e-15
## alternative hypothesis: kurtosis is not equal to 3
The p-values for all three tests are very small (less than 0.05), which indicates strong evidence against the null hypothesis of normality.
\(\implies\) The residuals are not normally distributed
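The Jarque–Bera statistic combines the two moments printed above: \(JB = \dfrac{T}{6}\left(S^2 + \dfrac{(K-3)^2}{4}\right)\), which is \(\chi^2(2)\) under normality. Plugging in the reported skewness, kurtosis and \(T = 999\) residuals (997 residual degrees of freedom plus 2 coefficients) reproduces the reported JB = 326.47 up to rounding.

```r
# Values taken from the skewness(), kurtosis() and summary(lm_data) output above
n <- 999           # number of residuals used in the regression
S <- -0.7012466    # sample skewness of the residuals
K <- 5.42406       # sample kurtosis of the residuals (normal distribution: 3)

jb <- n / 6 * (S^2 + (K - 3)^2 / 4)
p_value <- pchisq(jb, df = 2, lower.tail = FALSE)
print(c(JB = jb, p = p_value))
```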
IN CONCLUSION, THE CAPM MODEL DOES NOT SATISFY ASSUMPTION 5.
Since we find evidence of non-normality, we stay with OLS and either:
- Increase the sample size: by the Central Limit Theorem, the test statistics then follow their appropriate distributions asymptotically, even without normal errors.
- Deal with outliers: often, one or two extreme residuals cause us to reject the normality assumption. An alternative is to use dummy variables to remove the influence of these outliers.
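One way to implement the dummy-variable approach is sketched below on simulated data with one planted outlier; the data and the size of the outlier are hypothetical. It flags the observation with the largest absolute residual and adds an impulse dummy (1 for that observation, 0 elsewhere). The dummy's coefficient absorbs the extreme value, so that observation's residual becomes zero and no longer drives the normality tests.

```r
# Simulated data with one planted extreme observation (illustrative only)
set.seed(5)
n <- 300
x <- rnorm(n)
y <- 0.2 + 0.9 * x + rnorm(n)
y[150] <- y[150] - 12

fit <- lm(y ~ x)
idx <- which.max(abs(residuals(fit)))  # observation with the largest absolute residual

# Impulse dummy: 1 for the outlying observation, 0 elsewhere
d_out <- as.numeric(seq_len(n) == idx)
fit_dummy <- lm(y ~ x + d_out)

print(idx)
print(coef(fit_dummy))
print(residuals(fit_dummy)[idx])  # the dummy fits this observation exactly
```

In practice, each dummy should correspond to an identifiable event rather than being added mechanically, since removing enough observations will always make the residuals look normal.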