FINANCIAL ECONOMETRICS
Homework 6

1 In-class Lab

Multiple Regression Using an APT-Style Model

We work with the macro dataset:

## # A tibble: 385 × 10
##    Date                MICROSOFT SANDP   CPI INDPRO M1SUPPLY CCREDIT BMINUSA
##    <dttm>                  <dbl> <dbl> <dbl>  <dbl>    <dbl>   <dbl>   <dbl>
##  1 1986-03-01 00:00:00    0.0955  239.  109.   56.5     624.    607.    1.5 
##  2 1986-04-01 00:00:00    0.112   236.  109.   56.6     647     614.    1.4 
##  3 1986-05-01 00:00:00    0.122   247.  109.   56.7     646.    622.    1.2 
##  4 1986-06-01 00:00:00    0.107   251.  110.   56.5     663.    628.    1.21
##  5 1986-07-01 00:00:00    0.0990  236.  110.   56.8     673.    634.    1.28
##  6 1986-08-01 00:00:00    0.0990  253.  110.   56.7     678.    641.    1.46
##  7 1986-09-01 00:00:00    0.0981  231.  110.   56.8     684.    650.    1.31
##  8 1986-10-01 00:00:00    0.135   244.  110.   57.1     692.    657.    1.38
##  9 1986-11-01 00:00:00    0.173   249.  110.   57.4     709.    657.    1.39
## 10 1986-12-01 00:00:00    0.168   242.  110.   57.9     740.    666.    1.48
## # ℹ 375 more rows
## # ℹ 2 more variables: USTB3M <dbl>, USTB10Y <dbl>

Linear regression models (LRM) work better with stationary variables, that is, variables whose statistical properties such as mean and variance do not change over time. When the variables are not stationary, it is common to take first or second differences to make them stationary before applying the LRM.

# First differences of the level series
macro$dspread = c(NA, diff(macro$BMINUSA))
macro$dcredit = c(NA, diff(macro$CCREDIT))
macro$dprod = c(NA, diff(macro$INDPRO))
macro$dmoney = c(NA, diff(macro$M1SUPPLY))
# Inflation as the log change in CPI; change in the 10Y-3M term spread
macro$inflation = c(NA, diff(log(macro$CPI)))
macro$rterm = c(NA, diff(macro$USTB10Y - macro$USTB3M))
# Change in inflation and S&P log return, both scaled by 100
macro$dinflation = c(NA, 100 * diff(macro$inflation))
macro$rsandp = c(NA, 100 * diff(log(macro$SANDP)))
# Excess returns over the 3-month T-bill rate divided by 12
macro$ermsoft = c(NA, 100 * diff(log(macro$MICROSOFT))) - macro$USTB3M / 12
macro$ersandp = macro$rsandp - macro$USTB3M / 12
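
In formulas, the transformations above amount to the following, where \(x_t\) stands for any of the level series (this notation is introduced here only for illustration):

\[
\begin{aligned}
\Delta x_t &= x_t - x_{t-1}, \\
\text{inflation}_t &= \ln \text{CPI}_t - \ln \text{CPI}_{t-1}, \\
\text{rsandp}_t &= 100\,(\ln \text{SANDP}_t - \ln \text{SANDP}_{t-1}), \\
\text{ersandp}_t &= \text{rsandp}_t - \text{USTB3M}_t / 12 .
\end{aligned}
\]

The first row of each differenced series is NA because no previous observation exists, and dinflation, being a difference of a difference, has NA in its first two rows.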

We then run the multiple regression: the dependent variable goes on the left of ‘~’ and the explanatory variables on the right, joined by plus signs. Finally, we print the summary of the model.

lm_msoft = lm ( ermsoft ~ ersandp + dprod + dcredit + dinflation + dmoney +
                         +                    dspread + rterm , data = macro )
summary(lm_msoft)
## 
## Call:
## lm(formula = ermsoft ~ ersandp + dprod + dcredit + dinflation + 
##     dmoney + +dspread + rterm, data = macro)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.075  -4.440  -0.403   4.616  24.480 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.326002   0.475481   2.789  0.00556 ** 
## ersandp      1.280799   0.094354  13.574  < 2e-16 ***
## dprod       -0.303032   0.736881  -0.411  0.68113    
## dcredit     -0.025364   0.027149  -0.934  0.35078    
## dinflation   2.194670   1.264299   1.736  0.08341 .  
## dmoney      -0.006871   0.015568  -0.441  0.65919    
## dspread      2.260064   4.140284   0.546  0.58548    
## rterm        4.733069   1.715814   2.758  0.00609 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.845 on 375 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.3452, Adjusted R-squared:  0.333 
## F-statistic: 28.24 on 7 and 375 DF,  p-value: < 2.2e-16

Recall that the regression F-statistic tests the null hypothesis that all of the slope parameters are jointly zero.

  • The regression F-statistic takes a value of 28.24.
  • The p-value is < 2.2e-16.

\(\implies\) The null hypothesis is rejected.
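
Written out, the estimated model and the joint null are as follows, with the coefficient symbols \(\beta_0, \dots, \beta_7\) introduced here for illustration. The F-statistic can be recovered from the reported \(R^2\), up to rounding:

\[
\begin{aligned}
\text{ermsoft}_t &= \beta_0 + \beta_1\,\text{ersandp}_t + \beta_2\,\text{dprod}_t + \beta_3\,\text{dcredit}_t + \beta_4\,\text{dinflation}_t \\
&\quad + \beta_5\,\text{dmoney}_t + \beta_6\,\text{dspread}_t + \beta_7\,\text{rterm}_t + u_t, \\
H_0 &: \beta_1 = \beta_2 = \dots = \beta_7 = 0, \\
F &= \frac{R^2 / 7}{(1 - R^2)/375} = \frac{0.3452 / 7}{0.6548 / 375} \approx 28.24 .
\end{aligned}
\]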

However, there are a number of parameter estimates that are not significantly different from zero – specifically those on the ‘dprod’, ‘dcredit’, ‘dmoney’ and ‘dspread’ variables (as Pr(>|t|) > 0.05).

\(\implies\) Test \(H_0\): the parameters on these four variables are jointly zero using an F-test.

library ( car )
linearHypothesis ( lm_msoft , c ("dprod =0","dcredit =0","dmoney =0","dspread =0") )
## Linear hypothesis test
## 
## Hypothesis:
## dprod = 0
## dcredit = 0
## dmoney = 0
## dspread = 0
## 
## Model 1: restricted model
## Model 2: ermsoft ~ ersandp + dprod + dcredit + dinflation + dmoney + +dspread + 
##     rterm
## 
##   Res.Df   RSS Df Sum of Sq      F Pr(>F)
## 1    379 23180                           
## 2    375 23078  4    101.88 0.4139 0.7986

Under the null hypothesis, the test statistic follows an F(4, 375) distribution. Its value is 0.4139, with a p-value of 0.7986.

\(\implies\) The null hypothesis cannot be rejected.
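
The statistic can be reproduced by hand from the test table. The following is a minimal sketch that uses only the numbers printed by linearHypothesis; the variable names are chosen here for illustration.

```r
# Values copied from the linearHypothesis output
sum_sq   <- 101.88   # increase in RSS when the four restrictions are imposed
rss_u    <- 23078    # RSS of the unrestricted model (rounded as printed)
m        <- 4        # number of restrictions
df_resid <- 375      # residual degrees of freedom of the unrestricted model

# F = (increase in RSS / m) / (unrestricted RSS / residual df)
f_stat  <- (sum_sq / m) / (rss_u / df_resid)
p_value <- pf(f_stat, df1 = m, df2 = df_resid, lower.tail = FALSE)

print(round(f_stat, 4))
print(round(p_value, 4))
```

Because the printed RSS is rounded, the result agrees with the table only to the digits shown.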

The parameters on ‘rterm’ and ‘dinflation’ are significant at the 10% level. Hence they are not included in this F-test and the variables are retained.

2 FPT: CAPM & diagnostic tests

In this section, we use FPT and VNINDEX price data from 20-03-2019 to 14-02-2023:

The data of VNINDEX:

## # A tibble: 6 × 2
##   Date       vn_price
##   <date>        <dbl>
## 1 2023-03-17    1047.
## 2 2023-03-16    1062.
## 3 2023-03-15    1040.
## 4 2023-03-14    1053.
## 5 2023-03-13    1053 
## 6 2023-03-10    1056.

The data of FPT:

## # A tibble: 6 × 2
##   Date       fpt_price
##   <date>         <dbl>
## 1 2023-03-17      79.2
## 2 2023-03-16      80.5
## 3 2023-03-15      78.7
## 4 2023-03-14      78.9
## 5 2023-03-13      79.5
## 6 2023-03-10      80.6

In this exercise, we use the risk-free rate r = 0.03962 (Vietnam 5-year government bond) and take log returns. As in the in-class exercise, we compute excess log returns so that the series are stationary.

library(dplyr)
Riskfree = 0.03962 # Vietnam 5-year government bond
FPT$rFPT = c(NA, 100*diff(log(FPT$fpt_price))) - Riskfree 
VNINDEX$rVNINDEX = c(NA, 100*diff(log(VNINDEX$vn_price))) - Riskfree 

data = merge(VNINDEX, FPT, by = 'Date') #VNINDEX~x; FPT~y

head(data)
##         Date vn_price    rVNINDEX fpt_price       rFPT
## 1 2019-03-20  1006.59  0.38748218     23.21  0.5220579
## 2 2019-03-21  1002.30  2.02891874     23.08  0.7433331
## 3 2019-03-22   981.78 -0.74300125     22.90  0.0477544
## 4 2019-03-25   988.71  1.86366293     22.88  1.4131901
## 5 2019-03-26   970.07 -0.01075194     22.55 -0.7906728
## 6 2019-03-27   969.79 -0.66870158     22.72 -0.6101723
lm_data = lm(rFPT ~ rVNINDEX, data = data)
summary(lm_data)
## 
## Call:
## lm(formula = rFPT ~ rVNINDEX, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.0792 -0.6293  0.1782  0.7741  5.1706 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -0.1222     0.0414  -2.952  0.00324 ** 
## rVNINDEX      0.9241     0.0307  30.105  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.308 on 997 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.4762, Adjusted R-squared:  0.4757 
## F-statistic: 906.3 on 1 and 997 DF,  p-value: < 2.2e-16
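
A point worth checking in the return calculation: the printed heads show the price tables stored newest-first, and diff() works in stored order, so each computed value is the log price of the earlier date minus that of the later date. The sketch below uses three VNINDEX prices from head(data) to show the effect of sorting by date first. It also converts the bond yield to a daily percentage, assuming that 0.03962 is an annual rate and that there are 250 trading days per year. Both assumptions, and all names in the sketch, are illustrative and not part of the original analysis.

```r
# Three prices taken from head(data), arranged newest-first as in the raw tables
toy <- data.frame(
  Date     = as.Date(c("2019-03-22", "2019-03-21", "2019-03-20")),
  vn_price = c(981.78, 1002.30, 1006.59)
)

# diff() in stored (descending) order: log(older price) - log(newer price)
r_stored <- c(NA, 100 * diff(log(toy$vn_price)))

# Sort oldest-first so that diff() gives log(P_t) - log(P_{t-1})
toy_sorted <- toy[order(toy$Date), ]
r_chrono   <- c(NA, 100 * diff(log(toy_sorted$vn_price)))

# Assumption: annual yield spread over 250 trading days, in percent per day
rf_daily_pct  <- 100 * 0.03962 / 250
excess_chrono <- r_chrono - rf_daily_pct

print(r_stored)
print(r_chrono)
```

In this toy example, the value stored against 2019-03-20 in r_stored is the negative of the chronological return for 2019-03-21.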

2.1 Assumption 1

This assumption, \(E(u_t) = 0\), always holds because the regression includes an intercept.
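
For the fitted residuals, this follows from the OLS first-order condition for the intercept. Writing the model as \(y_t = \alpha + \beta x_t + u_t\) (notation introduced here for illustration):

\[
\frac{\partial}{\partial \hat\alpha} \sum_t (y_t - \hat\alpha - \hat\beta x_t)^2 = -2 \sum_t \hat u_t = 0
\quad\Longrightarrow\quad
\frac{1}{T}\sum_t \hat u_t = 0 .
\]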

2.2 Assumption 2

Testing for Heteroscedasticity

To get a first impression of the properties of the residuals, we want to plot them.
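
A residual plot of this kind can be produced as in the minimal sketch below. It runs on simulated data so that it is self-contained; the data and the names x, y and fit are hypothetical and are not the FPT sample.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.9 * x + rnorm(n)   # simulated returns with constant error variance
fit <- lm(y ~ x)

# Residuals in observation order; a funnel shape or clusters of large
# residuals would hint at heteroscedasticity
plot(residuals(fit), type = "l", xlab = "Observation", ylab = "Residual")
abline(h = 0, lty = 2)
```

With the model fitted in this section, the equivalent call would be plot(lm_data$residuals, type = "l").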

From the plot, it is hard to see any clear pattern, so we run formal statistical tests. The two tests demonstrated here are the Breusch–Pagan test and the studentized Breusch–Pagan test.

library(lmtest) # provides bptest() and dwtest()
bptest(formula(lm_data), data = data, studentize = FALSE)
## 
##  Breusch-Pagan test
## 
## data:  formula(lm_data)
## BP = 2.5059, df = 1, p-value = 0.1134
bptest ( formula ( lm_data ) , data = data , studentize = TRUE )
## 
##  studentized Breusch-Pagan test
## 
## data:  formula(lm_data)
## BP = 1.1328, df = 1, p-value = 0.2872

  • The first test gives a p-value of 0.1134, which is greater than the usual significance level of 0.05. Therefore, we fail to reject the null hypothesis of homoscedasticity.

  • The second test, which uses the studentized version of the test statistic, gives a p-value of 0.2872. This is also greater than 0.05, so it likewise shows no evidence of heteroscedasticity.

\(\implies\) We find no evidence against a constant error variance.
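
The studentized statistic has a simple construction: regress the squared OLS residuals on the regressors and multiply the \(R^2\) of that auxiliary regression by the sample size. The following is a minimal sketch on simulated data; the data and names are hypothetical and are not the FPT sample.

```r
set.seed(123)
n <- 500
x <- rnorm(n)
y <- 0.9 * x + rnorm(n)   # homoscedastic errors by construction

fit <- lm(y ~ x)
u2  <- residuals(fit)^2

# Auxiliary regression of the squared residuals on the regressor
aux <- lm(u2 ~ x)

# Studentized statistic: n * R^2, compared with a chi-squared(1) distribution
bp_stat <- n * summary(aux)$r.squared
p_value <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(c(BP = bp_stat, p = p_value))
```

The degrees of freedom equal the number of regressors in the auxiliary regression, which is 1 here, as in the bptest output.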

2.3 Assumption 3

Check the null hypothesis \(H_0: Cov (u_i, u_j) = 0\) for \(i \neq j\).

dwtest(lm_data)
## 
##  Durbin-Watson test
## 
## data:  lm_data
## DW = 2.0039, p-value = 0.5241
## alternative hypothesis: true autocorrelation is greater than 0

The DW statistic of 2.0039 is very close to 2, and the p-value of 0.5241 is well above 0.05, so we cannot reject the null hypothesis.

\(\implies\) There is no evidence of autocorrelation.
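
The statistic is built from the residuals \(\hat u_t\) and is approximately related to their first-order sample autocorrelation \(\hat\rho\) (standard definitions, stated here for reference):

\[
DW = \frac{\sum_{t=2}^{T} (\hat u_t - \hat u_{t-1})^2}{\sum_{t=1}^{T} \hat u_t^2} \approx 2(1 - \hat\rho),
\qquad
\hat\rho \approx 1 - \frac{2.0039}{2} \approx -0.002 .
\]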

2.4 Assumption 4

Independent variables are non-stochastic

We take this assumption to hold.

2.5 Assumption 5

The disturbances are normally distributed: \(u_t \sim N(0, \sigma^2)\)

library(moments)
skewness(lm_data$residuals)
## [1] -0.7012466
kurtosis(lm_data$residuals)
## [1] 5.42406
hist ( lm_data$residuals , main = "")
box ()

jarque.test(lm_data$residuals)
## 
##  Jarque-Bera Normality Test
## 
## data:  lm_data$residuals
## JB = 326.47, p-value < 2.2e-16
## alternative hypothesis: greater
agostino.test(lm_data$residuals)
## 
##  D'Agostino skewness test
## 
## data:  lm_data$residuals
## skew = -0.70125, z = -8.27171, p-value = 2.22e-16
## alternative hypothesis: data have a skewness
anscombe.test(lm_data$residuals)
## 
##  Anscombe-Glynn kurtosis test
## 
## data:  lm_data$residuals
## kurt = 5.4241, z = 7.8152, p-value = 5.487e-15
## alternative hypothesis: kurtosis is not equal to 3

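The p-values of all three tests are far below 0.05, which is strong evidence against the null hypothesis of normality. The residuals are left-skewed (skewness -0.70, versus 0 under normality) and fat-tailed (kurtosis 5.42, versus 3 under normality).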

\(\implies\) The residuals are not normally distributed
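
The Jarque-Bera statistic combines the two moments reported above as \(JB = \frac{T}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\). The following minimal sketch recomputes it from the printed skewness and kurtosis, taking T = 999 residuals (997 residual degrees of freedom plus 2 estimated coefficients); the variable names are chosen here for illustration.

```r
n <- 999          # 997 residual df + 2 estimated coefficients
S <- -0.7012466   # skewness as printed
K <- 5.42406      # kurtosis as printed

# JB = n/6 * (S^2 + (K - 3)^2 / 4), chi-squared with 2 df under normality
jb      <- n / 6 * (S^2 + (K - 3)^2 / 4)
p_value <- pchisq(jb, df = 2, lower.tail = FALSE)

print(round(jb, 2))
```

The result agrees with the JB value of 326.47 reported by jarque.test, and most of it comes from the excess kurtosis term.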

IN CONCLUSION, THE CAPM MODEL DOES NOT SATISFY ASSUMPTION 5.

Having found evidence of non-normality, we can stay with OLS and:

- Increase the sample size (by the Central Limit Theorem, the test statistics then asymptotically follow the appropriate distributions).

- Deal with outliers: often 1 or 2 extreme residuals cause us to reject the normality assumption. An alternative is to use dummy variables to remove these outliers.
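
The dummy-variable approach can be sketched as follows. The example uses simulated data with one planted extreme observation; the data, the position of the outlier and all names are hypothetical, and the choice of which observations to dummy out in the FPT sample is left open.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 0.9 * x + rnorm(n)
y[50] <- y[50] - 8   # plant one extreme observation (hypothetical)

fit <- lm(y ~ x)

# Locate the observation with the largest absolute residual
worst <- which.max(abs(residuals(fit)))

# Dummy equal to 1 for that observation only, 0 elsewhere
d_out <- as.numeric(seq_len(n) == worst)
fit_dummy <- lm(y ~ x + d_out)

# The dummy absorbs the outlier, so its residual is numerically zero
print(residuals(fit_dummy)[worst])
```

The remaining residuals can then be passed to the normality tests again to see whether the rejection was driven by that single observation.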