Data collection

DIS - Jan 1970 - Oct 2019
monthly return = month n price / month n-1 price - 1
excess monthly return = monthly return - risk-free rate
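
As a minimal sketch of the two definitions above, assuming simple (net) monthly returns computed from month-end prices: the prices and risk-free rates below are made up for illustration and are not the DIS data, and the names `price`, `ret` and `ret_ex` are hypothetical.

```r
# Hypothetical month-end prices and monthly risk-free rates (illustration only)
price <- c(100, 105, 102.9, 108.045)
rf    <- c(NA, 0.002, 0.002, 0.002)

# Simple monthly return: price in month n over price in month n-1, minus 1.
# The first month has no previous price, so its return is NA.
ret <- c(NA, price[-1] / price[-length(price)] - 1)

# Excess monthly return: monthly return minus the risk-free rate
ret_ex <- ret - rf

print(round(ret, 4))
print(round(ret_ex, 4))
```

The leading NA is the kind of missing value that lm() drops automatically, which would be consistent with the "1 observation deleted due to missingness" note in the regression output.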

Data for the factors are collected via https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html

rDIS: return of the stock price of Walt Disney Co
rf: risk-free rate
rDIS_ex: excess return of Walt Disney Co
rM_ex: return spread between the capitalization weighted stock market and cash
rSmB: return spread of small minus large stocks in terms of capitalization
rHmL: return spread of cheap minus expensive stocks
MoM: return of the momentum factor, i.e. the top winners minus the bottom losers portfolios
rRmW: return spread of the most profitable firms minus the least profitable
rCmA: return spread of firms that invest conservatively minus aggressively

DISData <- read.table("stock_regression.csv",sep=',',header=TRUE)

rDIS <- DISData$rDIS
rf <- DISData$rf
rDIS_ex <- DISData$rDIS_ex
rM_ex <- DISData$rM_ex
rSmB <- DISData$rSmB
rHmL <- DISData$rHmL
rRmW <- DISData$rRmW
rCmA <- DISData$rCmA
MoM <- DISData$MoM

SP vs DIS

SPData <- read.table("SP.csv",sep=',',header=TRUE)

rSP <- SPData$rSP
temp_rDIS <- rDIS[1:597]

par(mfrow=c(1,2))
plot(rSP, temp_rDIS, main="Returns on DIS vs returns on SP500")

# ls fit
rDISSPl2 <- lsfit(rSP,temp_rDIS)
abline(rDISSPl2)

plot(rDISSPl2$residuals,type="l",main="Residual Plot of LS Regression of rDIS Against rSP")

The returns on DIS and the S&P 500 are positively correlated, and most observations are clustered between -0.1 and 0.1.

One factor model: LS

CAPM:

\(R =\alpha + \beta F +\varepsilon\)

\(\varepsilon \sim N(0,\sigma^2)\)

where \(R\) is the excess return on DIS (rDIS_ex) and \(F\) is the excess return on the market (rM_ex).

Onefactor <- lm(rDIS_ex ~ rM_ex)
summary(Onefactor)

## 
## Call:
## lm(formula = rDIS_ex ~ rM_ex)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27480 -0.03829 -0.00347  0.03416  0.31926 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.003339   0.002789   1.197    0.232    
## rM_ex       1.206888   0.061507  19.622   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06761 on 595 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3929, Adjusted R-squared:  0.3919 
## F-statistic:   385 on 1 and 595 DF,  p-value: < 2.2e-16

cor(rDIS_ex,rM_ex,use = "complete.obs")

## [1] 0.6267942

c(summary(Onefactor)$r.squared, cor(rDIS_ex,rM_ex,use = "complete.obs")^2)

## [1] 0.3928709 0.3928709

The excess market return is significant in explaining the variation in the return of DIS: the p-value of rM_ex is <2e-16, which is less than 5%.

\(\alpha\) is not significantly different from 0 at the 5% significance level, since the p-value for \(\alpha\) is 0.232, which is larger than 5%. This means that \(\alpha\) is as good as 0: the stock has earned a return adequate for the risk taken, in this case relative to the market return.
\(\beta\) is significantly larger than 1: the t statistic for testing \(\beta = 1\) is (1.206888 - 1)/0.061507, which is about 3.36. This means that the stock is more sensitive to market fluctuations than the market itself.
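
The summary() output tests each coefficient against 0, so a claim about \(\beta\) relative to 1 needs a separate calculation. A minimal sketch, using only the estimate, standard error and residual degrees of freedom copied from the output above (the variable names are hypothetical):

```r
# Values copied from the summary(Onefactor) output above
beta_hat <- 1.206888
se_beta  <- 0.061507
df_resid <- 595

# t statistic for H0: beta = 1 (summary() reports the test of H0: beta = 0)
t_stat <- (beta_hat - 1) / se_beta

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = df_resid)

print(round(t_stat, 2))  # 3.36
print(p_val < 0.05)      # TRUE
```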

The one-factor model explains 39.29% of the variation in the DIS returns.

Model diagnostic: LS CAPM model

par(mfrow=c(2,1))
plot(Onefactor$residuals,type="l",main="Residual Plot of LS Regression of rDIS_ex Against rM_ex ")
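# Number of residuals actually used in the fit (lm drops the missing observation)
n <- length(Onefactor$residuals)
# Lag plot: drop the last residual on one axis and the first on the other
plot(Onefactor$residuals[-n],Onefactor$residuals[-1],main="Residual against previous residual")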

#cor(Onefactor$residuals[-n],Onefactor$residuals[-1])
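
The commented-out correlation puts a number on what the lag plot shows. As a small sketch of the indexing, using a made-up residual series (`res` and `m` are hypothetical names, not objects from this report):

```r
# Hypothetical residual series that alternates in sign (illustration only)
res <- c(1, -1, 1, -1, 1, -1)
m <- length(res)

# Pair each residual with its predecessor:
# res[-m] drops the last value, res[-1] drops the first
lag1_cor <- cor(res[-m], res[-1])
print(lag1_cor)
```

Here every residual is the negative of the previous one, so the lag-1 correlation is -1. For roughly independent residuals the value should be close to 0.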

library(MASS)  # provides stdres() and studres()
par(mfrow=c(1,2))
qqnorm(stdres(Onefactor),main="Q-Q Plot of Standardized Residuals")
abline(0,1,col="red")
qqnorm(studres(Onefactor),main="Q-Q Plot of Studentized Residuals")
abline(0,1,col="red")

From the above graphs, we can observe that the residuals from the one-factor model are roughly i.i.d., and the linear model describes the data well. However, the normal Q-Q plots show that both the standardized and the studentized residuals have heavier left and right tails than the normal distribution. It might be better to model the residuals with a heavier-tailed distribution, such as the t distribution.
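
To make "heavier tails" concrete, the sketch below compares the probability of a value more than 3 standard deviations from 0 under the normal distribution and under a t distribution rescaled to unit variance. The choice of 4 degrees of freedom, the helper `tail_share` and the vector `z` are assumptions for illustration, not estimates from the DIS residuals.

```r
# Probability of a standardized value falling more than 3 units from 0
p_normal <- 2 * pnorm(-3)

# Same probability for a t distribution rescaled to unit variance.
# A t variable with df > 2 has variance df / (df - 2), so |Z| > 3 for the
# rescaled variable means |T| > 3 * sqrt(df / (df - 2)) for the raw one.
df_t <- 4  # illustrative choice only
p_t  <- 2 * pt(-3 * sqrt(df_t / (df_t - 2)), df = df_t)

print(signif(p_normal, 3))  # 0.0027
print(p_t > p_normal)       # TRUE: more mass in the tails

# Observed share of standardized residuals beyond +/- k, for comparison
tail_share <- function(z, k = 3) mean(abs(z) > k)

# Hypothetical standardized residuals (illustration only)
z <- c(-3.5, -1.2, -0.4, 0.1, 0.3, 0.8, 1.1, 3.2)
print(tail_share(z))  # 0.25
```

In practice `tail_share()` would be applied to the standardized residuals of the fitted model, and a share well above 0.0027 would support the reading of the Q-Q plots.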

One factor Model: CAPM - LAD regression

library(quantreg)  # provides rq(); tau = 0.5 gives median (LAD) regression
Onefactorlad <- rq(rDIS_ex ~ rM_ex, tau = 0.5)
summary(Onefactorlad)

## 
## Call: rq(formula = rDIS_ex ~ rM_ex, tau = 0.5)
## 
## tau: [1] 0.5
## 
## Coefficients:
##             coefficients lower bd upper bd
## (Intercept) -0.00062     -0.00371  0.00443
## rM_ex        1.16757      1.04451  1.29701

Model diagnostic: LAD CAPM

# model diagnostics: LAD 
par(mfrow=c(2,1))
plot(Onefactorlad$residuals,type="l",main="Residual Plot of LAD Regression of rDIS_ex Against rM_ex ")
plot(Onefactorlad$residuals[-n],Onefactorlad$residuals[-1],main="Residual against previous residual ")

#cor(Onefactorlad$residuals[-n],Onefactorlad$residuals[-1])

We do not use LAD regression in the rest of this report. LAD regression is robust to outliers in the response but remains sensitive to outliers in the regressors (high-leverage points), and because the sample is large, a few unusual observations would not change the results much. The LS model diagnostics show that the residuals are roughly i.i.d., so the LAD regression is not needed; it is included here for illustration only. If we had found outliers, the preferred remedy would be a non-linear formulation, a transformation (for example, taking the log of the monthly return) or the removal of the suspect observations. Since the same data are used throughout this report, the conclusion would be the same for the other models, so LAD regression is not illustrated further.

The LAD estimates of alpha and beta are similar to the LS estimates. LAD regression also has drawbacks: there is no explicit formula for the coefficient estimator, and it lacks the exact distribution theory available for LS. We therefore prefer LS regression.
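
The difference between the two criteria is easiest to see in the intercept-only case, where LS reduces to the sample mean and LAD to the sample median. The sketch below uses made-up numbers with one extreme response value; it does not use rq(), and the names are hypothetical.

```r
# Hypothetical sample with one extreme response value (illustration only)
y <- c(1, 2, 3, 4, 100)

# Intercept-only LS minimises the sum of squared deviations: the sample mean
ls_fit <- mean(y)

# Intercept-only LAD minimises the sum of absolute deviations: the sample median
lad_fit <- median(y)

sum_abs_dev <- function(centre) sum(abs(y - centre))

print(c(LS = ls_fit, LAD = lad_fit))                 # 22 and 3
print(c(sum_abs_dev(ls_fit), sum_abs_dev(lad_fit)))  # 156 and 101
```

The single extreme value pulls the LS fit to 22 while the LAD fit stays at 3. Once a slope is added, the LAD solution no longer has a closed form and has to be found numerically, which is the drawback noted above.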

3 factor model - LS regression

pairs(cbind(rDIS_ex,rM_ex,rSmB,rHmL))

FF3factor <- lm(rDIS_ex ~ rM_ex + rSmB + rHmL)
summary(FF3factor)

## 
## Call:
## lm(formula = rDIS_ex ~ rM_ex + rSmB + rHmL)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27180 -0.03818 -0.00287  0.03382  0.31898 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.002881   0.002820   1.021    0.307    
## rM_ex        1.229253   0.065887  18.657   <2e-16 ***
## rSmB        -0.017823   0.095114  -0.187    0.851    
## rHmL         0.113157   0.098719   1.146    0.252    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06764 on 593 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3943, Adjusted R-squared:  0.3912 
## F-statistic: 128.7 on 3 and 593 DF,  p-value: < 2.2e-16

Fama-French Five factor model - LS regression

pairs(cbind(rDIS_ex,rM_ex,rSmB,rHmL,rRmW,rCmA))

FF5factor <- lm(rDIS_ex ~ rM_ex + rSmB + rHmL + rRmW + rCmA)
summary(FF5factor)

## 
## Call:
## lm(formula = rDIS_ex ~ rM_ex + rSmB + rHmL + rRmW + rCmA)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28471 -0.03715 -0.00214  0.03315  0.32407 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.001007   0.002890   0.348 0.727676    
## rM_ex       1.275013   0.069701  18.293  < 2e-16 ***
## rSmB        0.088953   0.099841   0.891 0.373318    
## rHmL        0.044226   0.133342   0.332 0.740256    
## rRmW        0.452494   0.136627   3.312 0.000983 ***
## rCmA        0.122656   0.205197   0.598 0.550238    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06714 on 591 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.4053, Adjusted R-squared:  0.4003 
## F-statistic: 80.56 on 5 and 591 DF,  p-value: < 2.2e-16

The pairs plot shows a positive linear relationship between rHmL and rCmA, so one of these two factors should be removed if both turn out to be statistically significant.
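
A relationship seen in a pairs plot can be quantified with a correlation and a variance inflation factor (VIF). The sketch below computes the VIF by hand on simulated factors; the data, coefficients and the names `hml`, `cma` and `rmw` are assumptions for illustration and are not the Ken French series.

```r
set.seed(1)  # simulated factors, not the Ken French data
n_obs <- 600
hml <- rnorm(n_obs, sd = 0.03)
cma <- 0.5 * hml + rnorm(n_obs, sd = 0.015)  # built to be positively related to hml
rmw <- rnorm(n_obs, sd = 0.02)

# Pairwise correlation between the two related factors
print(cor(hml, cma))

# VIF of cma: 1 / (1 - R^2), where R^2 comes from regressing cma on the other factors
r2_cma  <- summary(lm(cma ~ hml + rmw))$r.squared
vif_cma <- 1 / (1 - r2_cma)
print(vif_cma)
```

A VIF of 1 means the factor is uncorrelated with the others in the sample; larger values mean its coefficient is estimated less precisely because of the overlap.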

The five-factor model as a whole is statistically significant, as the p-value of the F-test is < 2.2e-16, which is smaller than 5%.

Checking the p-values for each factor in FF-5:

p-value for rM_ex is <2e-16, indicating rM_ex is statistically significant at the 5% level.
p-value for rSmB is 0.373318, indicating rSmB is not statistically significant at the 5% level.
p-value for rHmL is 0.740256, indicating rHmL is not statistically significant at the 5% level.
p-value for rRmW is 0.000983, indicating rRmW is statistically significant at the 5% level.
p-value for rCmA is 0.550238, indicating rCmA is not statistically significant at the 5% level.
Thus, we include rM_ex and rRmW in our own model.
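
The same screening can be done programmatically by reading the p-values from the coefficient table. A minimal sketch on simulated data, where `mkt`, `other` and `y` are hypothetical stand-ins for the factors and the excess return:

```r
set.seed(42)  # simulated data, not the DIS sample
n_obs <- 600
mkt   <- rnorm(n_obs, sd = 0.045)
other <- rnorm(n_obs, sd = 0.03)
y     <- 1.2 * mkt + rnorm(n_obs, sd = 0.07)  # only mkt truly drives y

fit <- lm(y ~ mkt + other)

# Coefficient table of summary(): one row per term, p-values in column "Pr(>|t|)"
p_vals <- coef(summary(fit))[, "Pr(>|t|)"]

# Keep the factors (not the intercept) whose p-value is below 5%
factor_p <- p_vals[names(p_vals) != "(Intercept)"]
selected <- names(factor_p)[factor_p < 0.05]
print(selected)
```

Screening on p-values from a single fit is a simple rule; a factor with no true effect will still pass a 5% test about one time in twenty.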

Carhart four-factor model - LS regression

We also fit the four-factor model, which adds the momentum factor to the three Fama-French factors, to determine whether momentum is statistically significant in explaining the returns of DIS.

pairs(cbind(rDIS_ex,rM_ex,rSmB,rHmL,MoM))

FF4factor <- lm(rDIS_ex ~ rM_ex + rSmB + rHmL + MoM)
summary(FF4factor)

## 
## Call:
## lm(formula = rDIS_ex ~ rM_ex + rSmB + rHmL + MoM)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.27435 -0.03805 -0.00439  0.03536  0.31960 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.003911   0.002874   1.361   0.1741    
## rM_ex        1.204067   0.067268  17.900   <2e-16 ***
## rSmB        -0.022397   0.094974  -0.236   0.8137    
## rHmL         0.068633   0.101657   0.675   0.4998    
## MoM         -0.118556   0.066523  -1.782   0.0752 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06752 on 592 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.3975, Adjusted R-squared:  0.3934 
## F-statistic: 97.64 on 4 and 592 DF,  p-value: < 2.2e-16

The Carhart four-factor model as a whole is statistically significant: the p-value of the F-test is < 2.2e-16, which is smaller than 5%.

The Carhart four-factor model includes the three Fama-French factors, i.e. rM_ex, rSmB and rHmL. Their significance has already been assessed above and the conclusions for these three factors are unchanged, so we focus only on the momentum factor.

The p-value for MoM is 0.0752, so MoM is not statistically significant at the 5% level, and we will not include this factor in the new model. A possible reason is that we use long-term data, whereas momentum is a medium-term effect: in momentum strategies, financial analysts incorporate the 52-week price high/low in their buy/sell recommendations, and stocks with high returns over the past 6-12 months tend to continue to outperform over the next 6-12 months. Over longer horizons, stocks with high long-term returns tend to underperform those with low long-term returns.

New model - LS

pairs(cbind(rDIS_ex,rM_ex,rRmW))

ownmodel <- lm(rDIS_ex ~ rM_ex + rRmW)
summary(ownmodel)

## 
## Call:
## lm(formula = rDIS_ex ~ rM_ex + rRmW)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.28682 -0.03704 -0.00185  0.03248  0.32419 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.001892   0.002804   0.675  0.50023    
## rM_ex       1.256516   0.062980  19.951  < 2e-16 ***
## rRmW        0.407951   0.127640   3.196  0.00147 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.06709 on 594 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.4031, Adjusted R-squared:  0.4011 
## F-statistic: 200.6 on 2 and 594 DF,  p-value: < 2.2e-16

The new model as a whole is statistically significant: the p-value of the F-test is < 2.2e-16, which is smaller than 5%.
p-value for rM_ex is <2e-16, indicating rM_ex is statistically significant at the 5% level.
p-value for rRmW is 0.00147, indicating rRmW is statistically significant at the 5% level.
Both factors remain statistically significant. Although rRmW adds to the one-factor model, its contribution is smaller than that of rM_ex: its coefficient is only about 0.41, compared with 1.26.

Model diagnostics: new model

par(mfrow=c(2,1))
plot(ownmodel$residuals,type="l",main="Residual Plot of LS Regression")

# Number of residuals actually used in the fit (lm drops the missing observation)
n <- length(ownmodel$residuals)
plot(ownmodel$residuals[-n],ownmodel$residuals[-1],main="Residual against previous residual")

#cor(ownmodel$residuals[-n],ownmodel$residuals[-1])

par(mfrow=c(1,2))
qqnorm(stdres(ownmodel),main="Q-Q Plot of Standardized Residuals")
abline(0,1,col="red")
qqnorm(studres(ownmodel),main="Q-Q Plot of Studentized Residuals")
abline(0,1,col="red")

The model diagnostics are similar for all the models we fitted. The standardized/studentized residuals from the new model are roughly i.i.d., and the linear model describes the data well. However, the normal Q-Q plots show that the residuals have heavier left and right tails than the normal distribution.
We compare the models below.

round(c(summary(Onefactor)$r.squared, summary(FF3factor)$r.squared, summary(FF4factor)$r.squared, summary(FF5factor)$r.squared,  summary(ownmodel)$r.squared),5)

## [1] 0.39287 0.39425 0.39748 0.40530 0.40314

The model diagnostics for all of the above models are similar. All of them can be used to estimate DIS returns and all are statistically significant, so we choose the best model by considering the R squared and the order of the model.

The Fama-French five-factor model has the largest R squared, which means that it explains the returns best; however, it exceeds the R squared of the one-factor model by only about 0.012 (0.40530 against 0.39287). The R squared of our own model is not the highest, even though it combines the statistically significant factors; this is because our model is nested in the five-factor model, and R squared cannot decrease when regressors are added. Higher-order models explain more of the return, but there is a trade-off: as the order increases, the model becomes more complicated. Therefore, lower-order models are preferred.
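
The trade-off between fit and model order can be illustrated on simulated data. In the sketch below, `junk` is a regressor that is unrelated to `y` by construction; all names and numbers are assumptions for illustration. Adjusted R squared and a partial F-test are shown as two common order-aware comparisons, not as the method used in this report.

```r
set.seed(7)  # simulated data, not the DIS sample
n_obs <- 600
mkt  <- rnorm(n_obs, sd = 0.045)
junk <- rnorm(n_obs, sd = 0.03)  # unrelated to y by construction
y    <- 1.2 * mkt + rnorm(n_obs, sd = 0.07)

small <- lm(y ~ mkt)
large <- lm(y ~ mkt + junk)

# R squared cannot fall when a regressor is added to a nested LS model
print(summary(large)$r.squared >= summary(small)$r.squared)  # TRUE

# Adjusted R squared penalises the extra parameter, so it can fall
print(c(summary(small)$adj.r.squared, summary(large)$adj.r.squared))

# Partial F-test of the nested models: does the extra regressor help?
print(anova(small, large))
```

The adjusted R squared values already reported in the summaries above can be compared across the fitted models in the same way.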

We choose the one-factor model for the prediction or estimation of stock returns, because the difference in R squared between the one-factor and five-factor models does not justify a much higher-order model. Even though an extra factor, i.e. the profitability factor, is important in determining expected return, including it or not makes little difference to the expected return.

Furthermore, one problem with multi-factor models is that they are atheoretical, meaning that the results are not backed by any theory, whereas the one-factor model is explained by the CAPM. A second problem is data snooping: factors are selected by looking at past data and are then added back into the model, so they may work by chance, with no theory to prove that they will continue to work in the future.

In conclusion, the one factor model CAPM will be chosen.

Stock regression

Charlotte Tse

Can the capital asset pricing model (CAPM) and Fama-French models describe and explain stock returns in the long term?

Introduction