Abstract

In this workshop we will learn the basics of simple regression models in the context of Finance. We will learn how to run a regression model for the Market Model and a regression model to estimate the Capital Asset Pricing Model (CAPM).

Q Simple regression model

A simple regression model is used to understand the linear relationship between two variables, assuming that one variable, the independent variable (IV), can be used as a predictor of the other variable, the dependent variable (DV). In this part we illustrate a simple regression model with the Market Model.

The Market Model states that the expected return of a stock is given by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied by the market return. In mathematical terms:

E[Ri]=α+β(RM)

We can express the same equation using B0 as alpha, and B1 as market beta:

E[Ri]=β0+β1(RM)

We can estimate the alpha and market beta coefficient by running a simple linear regression model specifying that the market return is the independent variable and the stock return is the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model. The market regression model can be expressed as:

r(i,t)=β0+β1∗r(M,t)+εt

Where:

εt is the error at time t. Thanks to the Central Limit Theorem, this error behaves like a normally distributed random variable, εt ∼ N(0, σε); that is, the error term is expected to have mean=0 and a specific standard deviation (also called volatility).

r(i,t) is the return of the stock i at time t.

r(M,t) is the market return at time t

β0 and β1 are coefficients or constants
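As a rough illustration of the regression equation above, the following sketch simulates returns that follow the market model and then recovers the coefficients with lm(). Everything here is hypothetical: the seed, the sample size, the "true" values of b0 and b1 and the volatilities are assumptions chosen only for the example, not values from real stocks.

```r
# Simulated illustration of the market regression model (not real data).
set.seed(123)                                  # arbitrary seed, for reproducibility
n       <- 60                                  # 60 hypothetical monthly observations
true_b0 <- 0.002                               # assumed alpha
true_b1 <- 1.2                                 # assumed market beta
r_M <- rnorm(n, mean = 0.005, sd = 0.04)       # simulated market returns
eps <- rnorm(n, mean = 0, sd = 0.05)           # error term ~ N(0, sigma_eps)
r_i <- true_b0 + true_b1 * r_M + eps           # stock returns implied by the model

# The stock return is the dependent variable, the market return the independent one
sim_reg <- lm(r_i ~ r_M)
coef(sim_reg)                                  # estimates of b0 and b1
```

The estimates will not match the assumed values exactly, since the error term adds noise; with more observations they tend to get closer.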

Data download

Now it’s time to use real data to better understand this model. Download monthly prices for Alfa (ALFAA.MX) and the IPCyC (^MXX) from Yahoo from January 2015 to December 2019. (You must use ALSEA and the IPCyC to construct your own market model.) To download the data and calculate continuously compounded returns, type:

library(quantmod)
getSymbols(c("ALFAA.MX", "^MXX"), from="2015-01-01", to= "2019-12-31", periodicity="monthly", src="yahoo")
r_ALFAA <- na.omit(diff(log(ALFAA.MX$ALFAA.MX.Adjusted)))
r_MXX <- na.omit(diff(log(MXX$MXX.Adjusted)))
all_rets <- merge(r_ALFAA, r_MXX)
colnames(all_rets) <- c("ALFAA", "MXX")
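The diff(log(...)) step above is what produces continuously compounded (cc) returns. A minimal sketch with three made-up prices may help to see how cc returns relate to simple returns; the price values are hypothetical.

```r
# Hypothetical monthly prices, used only to illustrate the return formulas
prices <- c(100, 105, 99.75)

# Simple returns: P_t / P_(t-1) - 1
simple_r <- prices[-1] / prices[-length(prices)] - 1

# Continuously compounded returns: log(P_t) - log(P_(t-1))
cc_r <- diff(log(prices))

# A cc return is the natural log of the growth factor (1 + simple return)
all.equal(cc_r, log(1 + simple_r))

# cc returns add up over time to the cc return of the whole period
all.equal(sum(cc_r), log(prices[3] / prices[1]))
```

The additive property in the last line is one common reason to prefer cc returns in regression models.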

Q Visualize the relationship

Do a scatter plot putting the IPCyC returns as the independent variable (X) and the stock return as the dependent variable (Y). We also add the line that best represents the relationship between the stock returns and the market returns. Type:

plot.default(x=all_rets$MXX,y=all_rets$ALFAA)
abline(lm(all_rets$ALFAA ~ all_rets$MXX),col='blue')

Sometimes graphs can be deceiving. In this case, the ranges of the X axis and the Y axis are different, so it is better to do a graph where both axes cover the same range. We also add the line that best represents the relationship between the stock returns and the market returns. Type:

plot.default(x=all_rets$MXX,y=all_rets$ALFAA, xlim=c(-0.30,0.30), ylim=c(-0.30,0.30))
abline(lm(all_rets$ALFAA ~ all_rets$MXX),col='blue')

WHAT DOES THE PLOT TELL YOU? BRIEFLY EXPLAIN

THE REGRESSION LINE REPRESENTS THE POSITIVE RELATIONSHIP BETWEEN THE MARKET RETURNS AND THE STOCK RETURNS. WHEN THE MARKET RETURNS INCREASE, SO DO ALFA’S, IN ALMOST A 1 TO 1 RELATIONSHIP.

Q Running the market regression model

Using the lm() function, run a simple regression model to see how the monthly returns of the stock are related to the market return. The first argument of the function is a formula: the variable to the left of the ~ is the DEPENDENT VARIABLE (in this case, the stock return), and the variable to the right is the INDEPENDENT VARIABLE, also named the EXPLANATORY VARIABLE (in this case, the market return).

What you will get is called the Market Regression Model. You are trying to examine how the market returns can explain stock returns from Jan 2015 to Dec 2019.

Assign your market model to an object named “reg”.

reg <- lm(r_ALFAA ~ r_MXX)
sumreg<- summary(reg)
sumreg

We can calculate the main sums of squares of a regression model. In the Note “Basics of Linear Regression Models” you can review what these sums of squares are.

For the sum of squares of total deviations from the mean of Y (SST), you can do the following:

Calculate a variable for the mean of the dependent variable Y (in this case, the stock return):

meanY = mean(r_ALFAA)

Calculate a variable with the squared deviations of each value of Y (stock returns) from its mean, and get the sum of these values:

squared_deviations_1 <- (r_ALFAA - meanY)^2
SST = sum(squared_deviations_1)
SST
[1] 0.2491152

For the sum of squares of the regression model (SSRM) you have to use the predicted values of the regression model, also called the fitted values. These values are stored in the fitted.values attribute of the regression object reg we created with the lm function:

fittedY = reg$fitted.values

Now you can get the SSRM with a process similar to the one we followed to get the SST. Remember that you have to get the sum of squared deviations of each fitted value from the mean of Y.

squared_deviations_2 = (fittedY-meanY)^2
SSRM = sum(squared_deviations_2)
SSRM
[1] 0.1008404

In a similar process, you can get the sum of squares for the errors (SSE). To get the SSE you have to get the sum of squares of the difference between the real values of Y (stock return) and the predicted values (fittedY).
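The three sums of squares can be sketched together as follows. The helper sums_of_squares() is a hypothetical function written for this example (it is not part of R), and the data are simulated so that the block runs without downloading anything; the seed and parameters are arbitrary assumptions.

```r
# Hypothetical helper: returns SST, SSRM and SSE for a fitted lm object
sums_of_squares <- function(model) {
  y       <- model.response(model.frame(model))  # observed values of Y
  fittedY <- fitted(model)                       # predicted values of Y
  c(SST  = sum((y - mean(y))^2),                 # total deviations from the mean
    SSRM = sum((fittedY - mean(y))^2),           # deviations explained by the model
    SSE  = sum((y - fittedY)^2))                 # squared errors (residuals)
}

# Simulated data, only for illustration
set.seed(1)
x <- rnorm(50, mean = 0, sd = 0.04)
y <- 0.001 + 1.1 * x + rnorm(50, mean = 0, sd = 0.05)
toy_reg <- lm(y ~ x)

ss <- sums_of_squares(toy_reg)
ss
ss["SSRM"] + ss["SSE"]   # should equal SST, up to rounding
```

For a model with an intercept, SSE is also the sum of the squared residuals, so sum(residuals(toy_reg)^2) should give the same value.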

You can check whether your calculations of the sums of squares are correct by running the anova function as follows:

anova(reg)

In the column Sum Sq you can see the SSRM and the SSE.

RESPOND TO THE FOLLOWING QUESTIONS:

  1. What are the standard errors of the beta coefficients (b0 and b1)? What are they for?

b0 IS -0.0102618 AND ITS STANDARD ERROR IS 0.0066429. b1 IS 1.1688907 AND ITS STANDARD ERROR IS 0.1877383. THE STANDARD ERROR OF A COEFFICIENT IS THE ESTIMATED STANDARD DEVIATION OF THAT COEFFICIENT. THE ESTIMATED COEFFICIENTS WOULD CHANGE FROM ONE SAMPLE TO ANOTHER, SO THE STANDARD ERROR TELLS US HOW MUCH A COEFFICIENT IS EXPECTED TO VARY AROUND ITS MEAN VALUE. THE STANDARD ERRORS ARE USED TO CALCULATE THE T-VALUES AND THE CONFIDENCE INTERVALS OF THE COEFFICIENTS.

  2. What is the total sum of squares (SST)? (provide the result, and explain the formula)

THE SST IS THE SUM OF THE SQUARED DEVIATIONS OF EACH VALUE OF Y (THE STOCK RETURN) FROM ITS MEAN. IT IS EQUAL TO SSRM + SSE.

sumsquares <- anova(reg)
sumsquares
Analysis of Variance Table

Response: r_ALFAA
          Df  Sum Sq  Mean Sq F value    Pr(>F)    
r_MXX      1 0.10084 0.100840  38.765 6.113e-08 ***
Residuals 57 0.14827 0.002601                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

SSE: 0.1482748 SSRM: 0.1008404 SST: 0.2491152

  3. What is the sum of squared errors (SSE)? (provide the result, and explain the formula)

SSE <- SST - SSRM
SSE
[1] 0.1482748

AS WE DID IN THE EXERCISE ABOVE, SSE IS EQUAL TO SST - SSRM = 0.1482748.

  4. What is the sum of squared regression differences (SSR)? (provide the result and explain the formula)

AS WE DID IN THE EXERCISE ABOVE, SSR (CALLED SSRM ABOVE) IS EQUAL TO 0.1008404.

  5. What is the coefficient of determination of the regression (the R-squared)? (provide the result and explain the formula)

WE CAN OBTAIN R-SQUARED DIRECTLY FROM THE REGRESSION MODEL:

summary(reg)

Call:
lm(formula = r_ALFAA ~ r_MXX)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.097701 -0.036862 -0.004467  0.030768  0.146265 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.010262   0.006643  -1.545    0.128    
r_MXX        1.168891   0.187738   6.226 6.11e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.051 on 57 degrees of freedom
Multiple R-squared:  0.4048,    Adjusted R-squared:  0.3944 
F-statistic: 38.77 on 1 and 57 DF,  p-value: 6.113e-08

HERE WE CAN TELL THAT THE MULTIPLE R-SQUARED IS EQUAL TO 0.4048, WHICH IS THE SSRM DIVIDED BY THE SST.
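As a quick check of the formula, the R-squared can be rebuilt from the sums of squares reported in the ANOVA table above. This is only a sketch that types in the numbers already shown in this document; the adjusted R-squared line assumes the usual degrees-of-freedom correction (57 residual degrees of freedom, 58 total).

```r
# Sums of squares taken from the ANOVA output shown above
SSRM <- 0.1008404
SSE  <- 0.1482748
SST  <- SSRM + SSE

# R-squared: proportion of the total variation explained by the model
r_squared <- SSRM / SST
round(r_squared, 4)

# Adjusted R-squared: same idea, using mean squares instead of sums
adj_r_squared <- 1 - (SSE / 57) / (SST / 58)
round(adj_r_squared, 4)
```

Both values should agree with the Multiple R-squared and Adjusted R-squared lines of summary(reg).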

  6. Interpret the results of the beta coefficients (b0 and b1) and their corresponding t-values and p-values with your own words.

BETA ONE IS EQUAL TO 1.1688907, WHICH MEANS THAT ALFA RETURNS HAVE A POSITIVE RELATIONSHIP WITH THE MARKET RETURNS: ON AVERAGE, WHEN THE MARKET RETURN MOVES 1 PERCENTAGE POINT, ALFA’S RETURN MOVES ABOUT 1.17 PERCENTAGE POINTS. SINCE THE BETA ONE COEFFICIENT IS HIGHER THAN 1, ALFA APPEARS RISKIER THAN THE MARKET. ITS T-VALUE IS 6.226 AND ITS P-VALUE IS 0.0000000611, SO THE PROBABILITY OF MAKING AN ERROR WHEN WE REJECT THE NULL HYPOTHESIS (BETA ONE = 0) IS ALMOST 0. BETA ZERO IS EQUAL TO -0.0102618, WITH A T-VALUE OF -1.545 AND A P-VALUE OF 0.128, SO IT IS NOT SIGNIFICANTLY DIFFERENT FROM 0.

  7. Estimate an approximate 95% confidence interval for b0 and b1 and interpret them

AROUND 95% OF THE TIME, B1 WILL BE BETWEEN 0.7934141 AND 1.5443674. ALTHOUGH THE ESTIMATED BETA 1 IS GREATER THAN 1, THIS INTERVAL GOES FROM LESS THAN ONE TO OVER ONE, SO WE DO NOT HAVE ENOUGH EVIDENCE TO SAY THAT ALFA RETURNS ARE RISKIER THAN THE MARKET RETURNS. THE T-VALUE OF THE TEST FOR BETA 1 = 1 IS LESS THAN 2, THUS WE CANNOT REJECT THE NULL HYPOTHESIS THAT STATES THAT BETA 1 EQUALS 1.

AROUND 95% OF THE TIME, BETA 0 WILL BE BETWEEN -0.023 AND 0.003. THE INTERVAL INCLUDES 0, THUS BETA 0 ISN’T SIGNIFICANTLY LESS THAN 0, ALTHOUGH MOST OF THE INTERVAL LIES IN NEGATIVE VALUES. THE P-VALUE OF BETA 0 IS 0.128: IF WE REJECT THE NULL HYPOTHESIS THAT BETA 0 EQUALS 0, THE PROBABILITY THAT THE CONCLUSION WILL BE WRONG IS ABOUT 0.128.
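The intervals above use 2 as a rough critical value. A slightly more precise sketch, typing in the coefficients and standard errors reported for the market model and using the exact critical t-value for 57 degrees of freedom, could look like this. The object names (b, se, ci and so on) are chosen only for this example.

```r
# Estimates and standard errors reported in the summary of the market model
b  <- c(b0 = -0.0102618, b1 = 1.1688907)
se <- c(b0 = 0.0066429,  b1 = 0.1877383)
df <- 57                              # residual degrees of freedom

# Exact critical t-value for a 95% confidence level (close to 2)
t_crit <- qt(0.975, df)

# 95% confidence intervals: estimate +/- t_crit * standard error
ci <- cbind(lower = b - t_crit * se, upper = b + t_crit * se)
ci

# t-test for H0: b1 = 1 (is the stock as risky as the market?)
t_b1_vs_1 <- (b[["b1"]] - 1) / se[["b1"]]
p_b1_vs_1 <- 2 * pt(abs(t_b1_vs_1), df, lower.tail = FALSE)
t_b1_vs_1
p_b1_vs_1
```

If the interval for b1 contains 1, the t-value of this test will be smaller than the critical value in absolute terms, which is the same conclusion reached from the approximate interval.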

Q Estimating the CAPM model for a stock

The CAPM model

The Capital Asset Pricing Model states that the expected return of a stock is given by the risk-free rate plus its beta coefficient multiplied by the market premium return. In mathematical terms:

E[Ri]=Rf+β1(RM−Rf)

We can express the same equation as:

(E[Ri]−Rf)=β1(RM−Rf)

Then, we are saying that the expected value of the premium return of a stock is equal to the premium market return multiplied by its market beta coefficient. You can estimate the beta coefficient of the CAPM using a regression model and using continuously compounded returns instead of simple returns. However, you must include the intercept b0 in the regression equation:

(ri−rf)=β0+β1(rM−rf)+ε

Where ε ∼ N(0, σε); the error is a random shock with an expected mean=0 and a specific standard deviation or volatility. This error represents the result of all factors that influence stock returns, and cannot be explained by the model (by the market).

In the market model, the dependent variable was the stock return and the independent variable was the market return. Unlike the market model, here the dependent variable is the stock return minus the risk-free rate (the stock premium return), and the independent variable is the market premium return, which is equal to the market return minus the risk-free rate. Let’s run this model in R with a couple of stocks.
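Before using real data, a small simulation may help to see what the CAPM regression looks like and where each number is stored in the summary. All values below (seed, sample size, a beta of 1.5, the volatilities) are assumptions made for the example; the premium returns are generated with an intercept of 0, as the CAPM equation implies.

```r
# Simulated premium returns (not real data)
set.seed(42)
n <- 80
mkt_prem   <- rnorm(n, mean = 0.006, sd = 0.04)              # market premium return
stock_prem <- 0 + 1.5 * mkt_prem + rnorm(n, mean = 0, sd = 0.08)  # b0 = 0, b1 = 1.5

capm  <- lm(stock_prem ~ mkt_prem)
coefs <- summary(capm)$coefficients   # matrix with one row per coefficient

coefs[1, ]     # row 1: b0 (estimate, std. error, t value, p-value)
coefs[2, ]     # row 2: b1
coefs[2, 1]    # b1 estimate
coefs[2, 2]    # b1 standard error
```

The [row, column] positions shown here are the same ones used later in this workshop to build confidence intervals from a summary object.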

Data collection

options(scipen=999)
library(quantmod)

Download stock data

Download monthly stock data for Apple, Tesla and the S&P500 from 2014 to Dec, 2020 from Yahoo Finance using the getSymbols function and obtain continuously compounded returns for each.

getSymbols(c("AAPL", "^GSPC", "TSLA"), from="2014-01-01", 
           to="2020-12-01", periodicity="monthly", src="yahoo")
[1] "AAPL"  "^GSPC" "TSLA" 
# Merge the adjusted prices of the three series into one object
prices <- merge(Ad(AAPL), Ad(GSPC), Ad(TSLA))
APPL_r <- na.omit(diff(log(prices$AAPL.Adjusted)))
GSPC_r <- na.omit(diff(log(prices$GSPC.Adjusted)))
TSLA_r <- na.omit(diff(log(prices$TSLA.Adjusted)))

Download risk-free data from the FED

Download the risk-free rate for the US (3-month treasury bills), which is the TB3MS ticker:

getSymbols("TB3MS", src = "FRED")
[1] "TB3MS"

This return is given in percentage and in annual rate. I divide it by 100 and 12 to get a monthly simple rate since I am using monthly rates for the stocks:

rfrate<-TB3MS/100/12

rfrate <- log(1+rfrate)

I used the formula to get cc returns from simple returns, which applies the natural log to the growth factor (1+rfrate).
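The two conversion steps can be followed with a single made-up number. In this sketch the annual rate of 1.2% is hypothetical and is used only to show the arithmetic.

```r
# Hypothetical annualized T-bill rate, expressed in percentage
annual_pct <- 1.2

# Step 1: from annual percentage to monthly simple rate
monthly_simple <- annual_pct / 100 / 12

# Step 2: from monthly simple rate to monthly cc rate
monthly_cc <- log(1 + monthly_simple)

monthly_simple
monthly_cc
```

For rates this small the simple and the cc monthly rates are almost identical; the cc rate is always slightly lower.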

Subsetting the risk-free dataset

Unfortunately, when getSymbols brings data from the FED, it brings all historical values of the series, even though the end date is specified.

Then, I do a sub-setting of the risk-free rate dataset to keep only those months that are equal to the months I brought for the stocks:

rfrate <- rfrate["2014-02-01/2020-12-01"]

Estimating the premium returns

Now you have to generate new variables (columns) for the premium returns for the stocks and the S&P 500. The premium returns will be equal to the returns minus the risk-free rate:

TSLA_Premr <- TSLA_r - rfrate
APPL_Premr <- APPL_r - rfrate
GSPC_Premr <- GSPC_r - rfrate

Q Visualize the relationship

Do a scatter plot putting the S&P500 premium returns as the independent variable (X) and Tesla premium return as the dependent variable (Y). We also add the line that best represents the relationship between the stock premium returns and the market premium returns:

plot.default(x=GSPC_Premr, y=TSLA_Premr)
abline(lm(TSLA_Premr ~ GSPC_Premr),col='blue')

Sometimes graphs can be deceiving. In this case, the ranges of the X axis and the Y axis are different, so it is better to do a graph where both axes cover the same range. We also add the line that best represents the relationship between the stock premium returns and the market premium returns. Type:

plot.default(x=GSPC_Premr, y=TSLA_Premr, ylim=c(-0.6,0.6),xlim=c(-0.6,0.6))
abline(lm(TSLA_Premr ~ GSPC_Premr),col='blue')

WHAT DOES THE PLOT TELL YOU? BRIEFLY EXPLAIN

THE PLOT SHOWS A POSITIVE RELATIONSHIP: WHEN THE MARKET PREMIUM RETURNS INCREASE, SO DO THE STOCK PREMIUM RETURNS, AND WHEN THE MARKET PREMIUM RETURNS DECREASE, SO DO THE STOCK PREMIUM RETURNS. THE SLOPE OF THE LINE APPEARS LARGER THAN 1 SINCE ITS ANGLE IS GREATER THAN 45 DEGREES. THEN, THE RELATIONSHIP IS EXPECTED TO BE GREATER THAN 1 TO 1.

Q Estimating the CAPM model for a stock

Use the premium returns to run the CAPM regression model for each stock.

We start with Tesla:

Tesla_CAPM <-lm(TSLA_Premr ~ GSPC_Premr, na.action=na.omit)
Tesla_s <-summary(Tesla_CAPM)
Tesla_s

To do a rough estimate of the 95% confidence interval for B0:

# Row 1 of the coefficients matrix is B0; column 1 is the estimate, column 2 its standard error
minB0 <- Tesla_s$coefficients[1,1]  - (2* Tesla_s$coefficients[1,2] )
maxB0 <-  Tesla_s$coefficients[1,1]  + (2* Tesla_s$coefficients[1,2] )

cat("The approx. B0 confidence interval goes from", minB0, "to", maxB0)
The approx. B0 confidence interval goes from -0.01294225 to 0.05062803
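
For a more precise interval, I get the critical t-value for a 95% confidence level instead of using 2:

t_critical_value = qt(0.025,Tesla_CAPM$df.residual)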
# I get the absolute value:
t_critical_value = abs(t_critical_value)
t_critical_value
[1] 1.990063

To estimate the 95% confidence interval for B1:

minB1 <- Tesla_s$coefficients[2,1]  - (2* Tesla_s$coefficients[2,2] )
maxB1 <-  Tesla_s$coefficients[2,1]  + (2* Tesla_s$coefficients[2,2] )
cat("The approx. B1 confidence interval goes from", minB1, "to", maxB1)
The approx. B1 confidence interval goes from 0.9959955 to 2.525798

Using the critical t-value instead of 2:

minB1 <- Tesla_s$coefficients[2,1]  - (t_critical_value* Tesla_s$coefficients[2,2] )
maxB1 <-  Tesla_s$coefficients[2,1]  + (t_critical_value* Tesla_s$coefficients[2,2] )

cat("The B1 confidence interval goes from", minB1, "to", maxB1)
The B1 confidence interval goes from 0.9997957 to 2.521998
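The manual calculation with the critical t-value should give the same result as the confint() function of base R. The sketch below checks this on simulated data, so that it runs without any download; the seed and the parameters are arbitrary assumptions, and the object names are chosen only for this example.

```r
# Simulated premium returns (not real data), 82 observations
set.seed(7)
x <- rnorm(82, mean = 0.008, sd = 0.04)
y <- 0.01 + 1.7 * x + rnorm(82, mean = 0, sd = 0.15)
m <- lm(y ~ x)
s <- summary(m)$coefficients

# Manual 95% confidence intervals with the exact critical t-value
t_crit <- qt(0.975, m$df.residual)
manual <- cbind(s[, 1] - t_crit * s[, 2],
                s[, 1] + t_crit * s[, 2])

# Built-in equivalent
builtin <- confint(m, level = 0.95)

manual
builtin
all.equal(unname(manual), unname(builtin))
```

With a fitted model such as the ones in this workshop, calling confint() on the lm object would be a shorter way to get both intervals at once.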

We continue with Apple:

APPL_CAPM <-lm(APPL_Premr ~ GSPC_Premr, na.action=na.omit)
APPL_s <-summary(APPL_CAPM)
APPL_s

Call:
lm(formula = APPL_Premr ~ GSPC_Premr, na.action = na.omit)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.238842 -0.029580 -0.001329  0.039277  0.128609 

Coefficients:
            Estimate Std. Error t value       Pr(>|t|)    
(Intercept) 0.013915   0.006834   2.036          0.045 *  
GSPC_Premr  1.242986   0.164456   7.558 0.000000000059 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.06074 on 80 degrees of freedom
Multiple R-squared:  0.4166,    Adjusted R-squared:  0.4093 
F-statistic: 57.13 on 1 and 80 DF,  p-value: 0.00000000005901

  1. INTERPRET THE RESULTS OF THE COEFFICIENTS (b0 and b1), THEIR STANDARD ERRORS, P-VALUES AND 95% CONFIDENCE INTERVALS.

FOR APPLE, BETA 0 IS 0.013915 WITH A STANDARD ERROR OF 0.006834. AROUND 95% OF THE TIME, BETA 0 WILL BE BETWEEN 0.0002 AND 0.0276. ITS P-VALUE IS 0.045, SO BETA 0 IS SIGNIFICANTLY GREATER THAN 0 AT THE 95% CONFIDENCE LEVEL, ALTHOUGH BY A SMALL MARGIN. BETA 1 IS 1.242986 WITH A STANDARD ERROR OF 0.164456. AROUND 95% OF THE TIME, BETA 1 WILL BE BETWEEN 0.9141 AND 1.5719. ITS P-VALUE IS ALMOST 0, SO APPLE PREMIUM RETURNS ARE SIGNIFICANTLY AND POSITIVELY RELATED TO THE MARKET PREMIUM RETURNS. SINCE THE INTERVAL GOES FROM LESS THAN ONE TO OVER ONE, WE DO NOT HAVE ENOUGH EVIDENCE TO SAY THAT APPLE IS RISKIER THAN THE MARKET. THE T-VALUE OF THE TEST FOR BETA 1 = 1 IS LESS THAN 2, THUS WE CANNOT REJECT THE NULL HYPOTHESIS THAT STATES THAT BETA 1 EQUALS 1.

  2. DO A QUICK RESEARCH ABOUT THE EFFICIENT MARKET HYPOTHESIS. BRIEFLY DESCRIBE WHAT THIS HYPOTHESIS SAYS.

ACCORDING TO RAJEEV DHIR, "THE EFFICIENT MARKET HYPOTHESIS (EMH) MAINTAINS THAT ALL STOCKS ARE PERFECTLY PRICED ACCORDING TO THEIR INHERENT INVESTMENT PROPERTIES, THE KNOWLEDGE OF WHICH ALL MARKET PARTICIPANTS POSSESS EQUALLY." (INVESTOPEDIA, 06.30.21)

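  3. ACCORDING TO THE EFFICIENT MARKET HYPOTHESIS, WHAT IS THE EXPECTED VALUE OF b0 IN THE CAPM REGRESSION MODEL?

I COULDN’T FIND INFORMATION REGARDING THIS QUESTION, BUT I WOULD GO FOR 0, SINCE THE CAPM EQUATION STATES THAT THE EXPECTED PREMIUM RETURN OF A STOCK IS EQUAL TO ITS BETA MULTIPLIED BY THE MARKET PREMIUM RETURN, WITH NO INTERCEPT.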

  4. ACCORDING TO YOUR RESULTS, IS TESLA SIGNIFICANTLY RISKIER THAN THE MARKET? WHAT IS THE t-test YOU NEED TO DO TO RESPOND TO THIS QUESTION? Do the test and provide your interpretation. (Hint: Here you have to change the null hypothesis for b1: H0: b1=1; Ha: b1<>1)

THE BETA 1 COEFFICIENT FOR TESLA IS 1.761, SO TESLA PREMIUM RETURNS APPEAR RISKIER THAN THE MARKET PREMIUM RETURNS, ALTHOUGH WE’LL HAVE TO TEST THIS.

Tesla_t <- (Tesla_s$coefficients[2,1]-1)/Tesla_s$coefficients[2,2]
Tesla_t
[1] 1.989529
pvalue<- 2*pt(Tesla_t, Tesla_CAPM$df.residual, lower.tail=FALSE)
pvalue
[1] 0.05006007
pvalue1tailed = pvalue / 2
pvalue1tailed
[1] 0.02503003

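THE TWO-TAILED P-VALUE IS 0.0501, WHICH IS JUST ABOVE 0.05, WHILE THE ONE-TAILED P-VALUE (Ha: b1>1) IS 0.025. THEN, AT THE 95% CONFIDENCE LEVEL, THE ONE-TAILED TEST PROVIDES STATISTICAL EVIDENCE THAT TESLA PREMIUM RETURNS ARE SIGNIFICANTLY RISKIER THAN THOSE OF THE MARKET, ALTHOUGH THE TWO-TAILED TEST FALLS MARGINALLY SHORT.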

