1 Q Simple regression model

A simple regression model is used to understand the linear relationship between two variables, assuming that one variable, the independent variable (IV), can be used as a predictor of the other variable, the dependent variable (DV). In this part we illustrate a simple regression model with the Market Model.

The Market Model states that the expected return of a stock is given by its alpha coefficient (b0) plus its market beta coefficient (b1) multiplied by the market return. In mathematical terms:

\[ E[R_i] = α + β(R_M) \]

We can express the same equation using \(β_0\) as alpha and \(β_1\) as market beta:

\[ E[R_i] = β_0 + β_1(R_M) \]

We can estimate the alpha and market beta coefficient by running a simple linear regression model specifying that the market return is the independent variable and the stock return is the dependent variable. It is strongly recommended to use continuously compounded returns instead of simple returns to estimate the market regression model. The market regression model can be expressed as:

\[ r_{(i,t)} = β_0 + β_1*r_{(M,t)} + ε_t \]

Where:

\(ε_t\) is the error at time t. Thanks to the Central Limit Theorem, this error behaves like a normally distributed random variable, \(ε_t ∼ N(0, σ_ε)\); the error term is expected to have mean=0 and a specific standard deviation (also called volatility).

\(r_{(i,t)}\) is the return of the stock i at time t.

\(r_{(M,t)}\) is the market return at time t.

\(β_0\) and \(β_1\) are coefficients or constants.
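
To make the roles of \(β_0\), \(β_1\) and \(ε_t\) concrete, here is a minimal sketch with simulated data. The "true" alpha (-0.01), beta (1.2) and the volatilities are made-up values chosen only for illustration; they are not estimates from any real stock.

```r
# Simulated data only: every number below is an assumption for illustration
set.seed(123)
n   <- 60                                  # 60 hypothetical monthly observations
r_M <- rnorm(n, mean = 0.005, sd = 0.04)   # hypothetical market returns
eps <- rnorm(n, mean = 0, sd = 0.05)       # error term with mean 0
r_i <- -0.01 + 1.2 * r_M + eps             # stock returns built from the model

# Estimate beta0 (intercept) and beta1 (slope) with a simple regression
sim_reg <- lm(r_i ~ r_M)
b <- coef(sim_reg)
b
```

The estimated coefficients should land near the values used to build the data, but not exactly on them, because of the random error term.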

1.1 Data download

Now it’s time to use real data to better understand this model. Download monthly prices for Alfa (ALFAA.MX) and the IPCyC (^MXX) from Yahoo Finance from January 2015 to December 2019. (In your own workshop you must use ALSEA and the IPCyC to construct your market model.) The steps are:

# load package quantmod
library(quantmod)

# Download the data
getSymbols(c("ALFAA.MX", "^MXX"), from="2015-01-01", to= "2019-12-31", periodicity="monthly", src="yahoo")

## [1] "ALFAA.MX" "^MXX"

# Calculate continuously compounded returns for the stock and the market index
r_ALFAA <- na.omit(diff(log(ALFAA.MX$ALFAA.MX.Adjusted))) #I dropped the na's
# For the IPC:
r_MXX <- na.omit(diff(log(MXX$MXX.Adjusted)))

# I merge them into the same object using the merge function:
all_rets <- merge(r_ALFAA, r_MXX)

#I renamed the columns:
colnames(all_rets) <- c("ALFAA", "MXX")

# Take a look at your objects!
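
The expression diff(log(prices)) used above is what produces continuously compounded (cc) returns. As a small sketch of why, the following uses a made-up price vector (not real data) and checks that the cc return equals the natural log of one plus the simple return:

```r
# Made-up monthly prices, for illustration only
prices <- c(100, 110, 99, 105)

# Simple returns: percentage change from one month to the next
simple_r <- diff(prices) / head(prices, -1)

# Continuously compounded returns: difference of log prices
cc_r <- diff(log(prices))

# Both definitions are linked by cc = log(1 + simple)
all.equal(cc_r, log(1 + simple_r))

# cc returns add up over time: their sum is the cc return of the whole period
all.equal(sum(cc_r), log(105 / 100))
```

The additivity shown in the last line is one practical reason to prefer cc returns when working with several periods.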

1.2 Q Visualize the relationship

Do a scatter plot putting the IPCyC returns as the independent variable (X) and the stock returns as the dependent variable (Y). We also add the line that best represents the relationship between the stock returns and the market returns. Type:

plot.default(x=all_rets$MXX,y=all_rets$ALFAA)
abline(lm(all_rets$ALFAA ~ all_rets$MXX),col='blue')

# As you see, I indicated that the market returns go on the X axis and 
#   Alfa returns on the Y axis. 
# In the market model, the independent variable is the market return, while
#   the dependent variable is the stock return

Sometimes graphs can be deceiving. In this case, the ranges of the X axis and the Y axis are different, so it is better to do a graph where the X and Y ranges are similar. Here I widen the X axis with the xlim parameter. We also add the line that best represents the relationship between the stock returns and the market returns. Type:

plot.default(x=all_rets$MXX,y=all_rets$ALFAA, xlim=c(-0.30,0.30) )
abline(lm(all_rets$ALFAA ~ all_rets$MXX),col='blue')

WHAT DOES THE PLOT TELL YOU? BRIEFLY EXPLAIN

IN THIS CASE, THE REGRESSION LINE IS THE LINE THAT BEST REPRESENTS THE RELATIONSHIP BETWEEN THE MARKET RETURN AND THE STOCK RETURN.

I SEE A POSITIVE RELATIONSHIP BETWEEN MARKET RETURNS AND ALFA RETURNS. WHEN MARKET RETURNS INCREASE, ALFA RETURNS TEND TO INCREASE. IT SEEMS THAT WHEN THE MARKET RETURNS INCREASE BY 1% OR 1 UNIT, ALFA RETURNS INCREASE BY A LITTLE MORE THAN 1%, SINCE THE ANGLE OF THE LINE SEEMS TO BE A LITTLE HIGHER THAN 45 DEGREES. WHEN THE ANGLE OF THE REGRESSION LINE (WITH RESPECT TO THE X AXIS) IS 45 DEGREES, THE SLOPE=1.
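
The link between the slope and the angle can be checked with a quick calculation, since slope = tan(angle). This sketch uses the slope that the regression in the next section reports (b1 = 1.1688907) and assumes that both axes are drawn with the same scale, which is the only case where the visual angle matches the slope:

```r
# Slope reported later by the market regression model
slope <- 1.1688907

# Angle of the line with respect to the X axis, in degrees
angle_deg <- atan(slope) * 180 / pi
angle_deg

# Reference case: a slope of 1 corresponds to a 45 degree line
atan(1) * 180 / pi
```

The resulting angle is a few degrees above 45, which agrees with the visual impression described above.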

1.3 Q Running the market regression model

Using the lm() function, run a simple regression model to see how the monthly returns of the stock are related with the market return. The first parameter of the function is the DEPENDENT VARIABLE (in this case, the stock return), and the second parameter must be the INDEPENDENT VARIABLE, also named the EXPLANATORY VARIABLE (in this case, the market return).

What you will get is called the Market Regression Model. You are trying to examine how the market returns can explain stock returns from Jan 2015 to Dec 2019.

Assign your market model to an object named “reg”.

# Run the regression with the lm function:
reg <- lm(r_ALFAA ~ r_MXX)
# The first variable is the Dependent variable (the stock return), and 
#   the variable after the ~ is the Independent variable or explanatory
#   variable (the market return)

# I get the summary of the regression output into a variable
sumreg<- summary(reg)
# I display the main results of the regression:
sumreg

## 
## Call:
## lm(formula = r_ALFAA ~ r_MXX)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.097701 -0.036862 -0.004467  0.030768  0.146265 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.010262   0.006643  -1.545    0.128    
## r_MXX        1.168891   0.187738   6.226 6.11e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.051 on 57 degrees of freedom
## Multiple R-squared:  0.4048, Adjusted R-squared:  0.3944 
## F-statistic: 38.77 on 1 and 57 DF,  p-value: 6.113e-08

We can calculate the main sums of squares of a regression model. The note “Basics of Linear Regression Models” reviews what these sums of squares are.

For the sum of squares of total deviations from the mean of Y (SST), you can do the following:

Calculate a variable for the mean of the dependent variable Y (in this case, the stock return):

meanY = mean(r_ALFAA)

Calculate a variable with the squared deviations of each value of Y (stock returns) from its mean, and get the sum of these values:

# Calculate a vector for the squared deviations of each value of stock returns
#   from its mean:
squared_deviations_1 <- (r_ALFAA - meanY)^2
# Now I get the sum of these squared deviations
SST = sum(squared_deviations_1)
SST

## [1] 0.2491151

For the sum of squares of the regression model (SSRM) you have to use the predicted values of the regression model, also called the fitted values. These values are stored in the fitted.values element of the regression object reg we created with the lm function:

fittedY = reg$fitted.values

Now you can get the SSRM with a similar process we followed to get the SST. Remember that you have to get the sum of squared deviations of each fitted value from the mean of Y.

# Calculate a vector for the squared deviations of each fitted value 
#   from the Y mean:
squared_deviations_2 = (fittedY-meanY)^2
# Sum these squared deviations to get SSRM
SSRM = sum(squared_deviations_2)
SSRM

## [1] 0.1008404

In a similar process, you can get the sum of squares for the errors (SSE). To get the SSE you have to get the sum of squares of the difference between the real values of Y (stock return) and the predicted values (fittedY).
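
As a sketch of that calculation, the following uses simulated data so that it runs on its own; the names x, y, fit and the _sim suffixes are hypothetical and not part of the workshop code. With the workshop objects, the same idea would use r_ALFAA in place of y and reg in place of fit.

```r
# Simulated returns, for illustration only
set.seed(42)
x <- rnorm(60, mean = 0, sd = 0.04)
y <- 0.002 + 1.1 * x + rnorm(60, mean = 0, sd = 0.05)
fit <- lm(y ~ x)

# Total, regression and error sums of squares
SST_sim  <- sum((y - mean(y))^2)
SSRM_sim <- sum((fitted(fit) - mean(y))^2)
SSE_sim  <- sum((y - fitted(fit))^2)   # same as sum(residuals(fit)^2)

# The decomposition SST = SSRM + SSE holds up to rounding error
all.equal(SST_sim, SSRM_sim + SSE_sim)
```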

You can check whether your calculations of the sums of squares are correct by running the anova function as follows:

anova(reg)

## Analysis of Variance Table
## 
## Response: r_ALFAA
##           Df  Sum Sq  Mean Sq F value    Pr(>F)    
## r_MXX      1 0.10084 0.100840  38.765 6.113e-08 ***
## Residuals 57 0.14827 0.002601                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In the column Sum Sq you can see the SSRM and the SSE.

RESPOND TO THE FOLLOWING QUESTIONS:

1. What are the standard errors of the beta coefficients? (b0 and b1) What are they for?

THE STANDARD ERROR OF b0 IS THE STANDARD DEVIATION OF b0; THE STANDARD ERROR OF b1 IS THE STANDARD DEVIATION OF b1.

IN THIS CASE, b0 IS EQUAL TO -0.0102618, WHILE ITS STANDARD ERROR IS 0.0066429.

b1 IS 1.1688907, WHILE ITS STANDARD ERROR IS 0.1877383.

THE STANDARD ERROR OF A COEFFICIENT IS THE ESTIMATED STANDARD DEVIATION OF THE COEFFICIENT ESTIMATE. IF WE ESTIMATED THE SAME REGRESSION WITH OTHER SAMPLES (FOR EXAMPLE, OTHER TIME PERIODS), WE WOULD GET DIFFERENT COEFFICIENTS, SO THE STANDARD ERROR GIVES US INFORMATION ABOUT HOW MUCH (PLUS OR MINUS) THE ESTIMATED COEFFICIENT MIGHT VARY FROM SAMPLE TO SAMPLE.

SINCE THE REGRESSION COEFFICIENTS (IN THIS CASE, b0 AND b1) ARE A LINEAR COMBINATION OF RANDOM VARIABLES, ACCORDING TO THE CENTRAL LIMIT THEOREM, THESE COEFFICIENTS WILL BEHAVE LIKE NORMALLY DISTRIBUTED VARIABLES. THEN, WE CAN USE THE STANDARD ERROR OF A COEFFICIENT TO CONSTRUCT ITS 95% CONFIDENCE INTERVAL.

2. What is the total sum of squares (SST) ? (provide the result, and explain the formula)

WE CAN CALCULATE SUM OF SQUARES OF A REGRESSION MODEL USING THE FUNCTION anova. WE NEED TO APPLY THE FUNCTION ANOVA TO A REGRESSION OBJECT. IN THIS CASE WE CAN DO THE FOLLOWING:

sumsquares <- anova(reg)
sumsquares

## Analysis of Variance Table
## 
## Response: r_ALFAA
##           Df  Sum Sq  Mean Sq F value    Pr(>F)    
## r_MXX      1 0.10084 0.100840  38.765 6.113e-08 ***
## Residuals 57 0.14827 0.002601                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

IN THE COLUMN OF Sum Sq WE CAN SEE: A) THE SUM OF SQUARES OF THE REGRESSION MODEL (SSRM). IN THIS CASE, IT IS EQUAL TO 0.1008404. B) THE SUM OF SQUARES OF THE ERRORS OR RESIDUALS (SSE). IN THIS CASE, IT IS EQUAL TO 0.1482748.

THE TOTAL SUM OF SQUARES (SST) IS EQUAL TO SSRM + SSE. IN THE FOLLOWING PARAGRAPHS I EXPLAIN WHAT EACH OF THESE SUMS OF SQUARES IS.

The SST is the total sum of squares of the regression model. If we state a general formula for a regression line as:

\(E[Y_{i}]=b_{0}+b_{1}(X_{i})\)

where i goes from 1 to N observations, then:

\[ SST=\sum_{i=1}^{N}(Y_{i}-\bar{Y})^{2} \]

Then, SST is the sum of all squared distances from each point \(Y_i\) to the mean of Y (\(\bar{Y}\)).

We consider \(\bar{Y}\) as the UNCONDITIONAL mean, since it is independent of the values of X.

With the anova function, the SST will be equal to the sum of the SSRM and SSE. In this case, SST = 0.2491151.

We can decompose SST in two parts: the sum of squared distances that are explained by the regression model (sum of squares of the regression model, SSRM, also written SSR), and the sum of squared distances that are NOT explained by the regression model (sum of squared errors, SSE):

SST = SSRM + SSE

The SSE is the sum of squared errors:

\[ SSE=\sum_{i=1}^{N}(Y_{i}-E[Y_{i}])^{2} \]

The distances from \(Y_i\) to \(E[Y_i]\) are the distances that cannot be explained with the regression model.

The SSRM is the sum of the squared distances from the unconditional mean (\(\bar{Y}\)) to the expected value of Y according to the regression model:

\[ SSRM=\sum_{i=1}^{N}(E[Y_{i}]-\bar{Y})^{2} \]

The coefficient of determination of the regression model R-squared is defined as:

\[ R^{2}=\frac{SSRM}{SST} \]

\(R^{2}\) is the percentage of variance of Y that is explained by the variance of X. In other words, it gives us a % of explanation of the variation of Y given variations of X.

3. What is the sum of squared errors (SSE) ? (provide the result, and explain the formula)

I GET THE SSE FROM THE anova FUNCTION, AS WE DID ABOVE:

# The anova function needs to receive an lm object as parameter. In this case, 
#   reg is a regression object, which contains the market linear regression 
#   for Alfa: 
anova(reg)

## Analysis of Variance Table
## 
## Response: r_ALFAA
##           Df  Sum Sq  Mean Sq F value    Pr(>F)    
## r_MXX      1 0.10084 0.100840  38.765 6.113e-08 ***
## Residuals 57 0.14827 0.002601                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SSE, THE SUM OF SQUARES OF THE RESIDUALS (ERRORS), IS EQUAL TO 0.1482748. I EXPLAINED THE FORMULA ABOVE. SSE IS THE SUM OF ALL SQUARED DEVIATIONS FROM EACH REAL VALUE OF ALFA RETURNS TO THE EXPECTED VALUE OF ALFA RETURNS CALCULATED WITH THE MARKET REGRESSION MODEL.

I CAN ALSO CALCULATE THE SSE USING THE PREVIOUS MANUAL CALCULATIONS. SINCE SST = SSE + SSRM, THEN

SSE = SST - SSRM

THEN:

SSE <- SST - SSRM
SSE

## [1] 0.1482748

WHICH IS THE SAME VALUE THAT IS SHOWN WITH THE anova FUNCTION.

4. What is the sum of squared regression differences (SSR) ? (provide the result and explain the formula)

I GET THE SSRM FROM THE anova FUNCTION, AS WE DID ABOVE:

# The anova function needs to receive an lm object as parameter. In this case, 
#   reg is a regression object, which contains the market linear regression 
#   for Alfa: 
anova(reg)

## Analysis of Variance Table
## 
## Response: r_ALFAA
##           Df  Sum Sq  Mean Sq F value    Pr(>F)    
## r_MXX      1 0.10084 0.100840  38.765 6.113e-08 ***
## Residuals 57 0.14827 0.002601                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

SSRM, THE SUM OF SQUARES OF THE REGRESSION MODEL, IS EQUAL TO 0.1008404. I EXPLAINED THE FORMULA ABOVE. SSRM IS THE SUM OF ALL SQUARED DEVIATIONS FROM THE MEAN OF ALFA RETURNS TO EACH EXPECTED (FITTED) VALUE OF ALFA RETURNS CALCULATED WITH THE REGRESSION MODEL.

5. What is the coefficient of determination of the regression (the R-squared)? (provide the result and explain the formula)

WE CAN GET THE R-SQUARED, WHICH IS THE % OF VARIANCE OF ALFA RETURNS EXPLAINED BY THE MARKET RETURNS, WITH THE SUMMARY OF THE REGRESSION MODEL:

summary(reg)

## 
## Call:
## lm(formula = r_ALFAA ~ r_MXX)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.097701 -0.036862 -0.004467  0.030768  0.146265 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.010262   0.006643  -1.545    0.128    
## r_MXX        1.168891   0.187738   6.226 6.11e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.051 on 57 degrees of freedom
## Multiple R-squared:  0.4048, Adjusted R-squared:  0.3944 
## F-statistic: 38.77 on 1 and 57 DF,  p-value: 6.113e-08

IN THIS CASE, THE R-SQUARED IS EQUAL TO 0.4047941. AS EXPLAINED ABOVE, R-SQUARED IS EQUAL TO:

\[ R^{2}=\frac{SSRM}{SST} \]
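
As a check, plugging in the sums of squares obtained earlier (SSRM = 0.1008404, SSE = 0.1482748, SST = 0.2491151) gives the same value that the regression summary reports, up to rounding:

\[ R^{2}=\frac{SSRM}{SST}=\frac{0.1008404}{0.2491151}\approx 0.4048 \]

Equivalently, since SST = SSRM + SSE:

\[ R^{2}=1-\frac{SSE}{SST}=1-\frac{0.1482748}{0.2491151}\approx 0.4048 \]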

6. Interpret the results of the beta coefficients (b0 and b1) and their corresponding t-values and p-values with your own words.

INTERPRETATION:

Looking at the b1 coefficient, we got b1=1.1688907, meaning that there is a positive relationship between the market returns and the Alfa returns. Alfa is significantly related to the market since b1 is statistically significantly greater than zero. I conclude this since its p-value is 0.0000000611 (6.11e-08).

The p-value is the probability of getting an estimate at least this far from zero if the null hypothesis were true, which in this case is that beta1 is equal to zero (no relationship). Since the p-value is far below 0.05, we reject the null hypothesis.

When running a market regression model, it is interesting to check whether asset returns are RISKIER than, equally risky as, or less risky than the market returns.

In this case I can see that Alfa is riskier than the market (on average) since its b1 coefficient is greater than 1. For each 1% change of the market return, on average, Alfa’s return moves about 1.1688907%. However, we must look at the 95% confidence interval to see whether Alfa is SIGNIFICANTLY RISKIER than the market.

THE t-values ARE DISPLAYED IN THE REGRESSION SUMMARY TO THE RIGHT OF STANDARD ERRORS. IN THIS CASE, THE t-values CALCULATED IN THE REGRESSION MODEL ASSUME THAT THE NULL HYPOTHESIS IS THAT THE REGRESSION COEFFICIENT IS EQUAL TO ZERO. THEN, IN THE CASE OF b1, THE T-VALUE>2, BUT THIS IS TESTING WHETHER b1 IS BIGGER THAN ZERO (NOT BIGGER THAN 1)

7. Estimate an approximate 95% confidence interval for b0 and b1 and interpret them

FOR BETA1:

We can construct an approximate 95% C.I. of b1 using its standard error. In this case, b1 can move between the range calculated as:

[b1 - 2*stderror(b1) .. b1 + 2*stderror(b1)]

In this case, the approximate 95% C.I. would be [0.7934141 to 1.5443674].
This means that around 95% of the time, b1 can move from 0.7934141 to 1.5443674. Then, I cannot say that Alfa returns are significantly riskier (at the 95% confidence level) than the market since its b1 can move 95% of the time from a value less than one, to one, to a value bigger than 1. In this case, even though we got a b1 greater than one, we cannot say that we have enough evidence at the 95% confidence level, to conclude that Alfa is riskier than the market.

Hypothesis test to examine whether the Alfa return is riskier than the market:

H0: mean(b1) = 1 –> Alfa returns are as risky as the market returns

Ha: mean(b1) > 1 –> Alfa returns are riskier than the market returns

t-value = (b1 - 1) / std.error(b1)

t-value = (1.1688907 - 1) / 0.1877383 = 0.8996

To reject the Null hypothesis, the |t-value| has to be bigger or equal to about 2.

In this case, since the t-value of the test is < 2, we cannot reject the null that states that b1=1. Then, although b1>1 on average, it is NOT SIGNIFICANTLY bigger than one (at the 95% confidence level). Then we cannot say that Alfa returns are significantly riskier than market returns.
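
The same test can be sketched in R. The helper coef_t_test below is hypothetical (it is not part of the workshop code or of any package); it takes the estimate, its standard error and the residual degrees of freedom from the regression output above (b1 = 1.1688907, standard error = 0.1877383, 57 degrees of freedom) and tests against any null value:

```r
# Hypothetical helper: t-test of a regression coefficient against a null value
coef_t_test <- function(estimate, std_error, df, null_value = 0) {
  t_value <- (estimate - null_value) / std_error
  # 2-tailed p-value: area in both tails beyond |t|
  p_two_tailed <- 2 * pt(abs(t_value), df, lower.tail = FALSE)
  # 1-tailed p-value for Ha: coefficient > null_value
  p_one_tailed <- pt(t_value, df, lower.tail = FALSE)
  c(t = t_value, p_two_tailed = p_two_tailed, p_one_tailed = p_one_tailed)
}

# H0: b1 = 1 for Alfa, using the numbers from the regression summary
alfa_test <- coef_t_test(1.1688907, 0.1877383, df = 57, null_value = 1)
alfa_test
```

Leaving null_value at its default of 0 reproduces the t-value that summary() reports for the coefficient.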

FOR BETA0:

FOR b0, WE CAN CONSTRUCT THE 95% C.I. IN THE SAME WAY AS THE 95% C.I. OF b1. THE 95% C.I. OF b0 IS ROUGHLY CALCULATED BY SUBTRACTING AND ADDING 2 TIMES ITS OWN STANDARD ERROR.

THE VALUE OF BETA0 (THE VALUE OBTAINED FROM THE REGRESSION) IS EQUAL TO -0.0102618. THEN, ON AVERAGE BETA0 IS LESS THAN ZERO. HOWEVER, WE NEED TO CHECK WHETHER BETA0 IS SIGNIFICANTLY LESS THAN ZERO AT THE 95% CONFIDENCE LEVEL. THEN WE CALCULATE ITS 95% CONFIDENCE INTERVAL:

95% C.I. OF b0 = [-0.0235476 .. 0.003024]

We can say that 95% of the time, beta0 will move from a negative value (-0.0235476) to a positive value (0.003024), so we cannot say that beta0 is significantly less than zero. However, we can see that most of the interval lies in negative values.

Looking at the p-value of beta0, which is 0.1279335, we can see that it is greater than 0.05, so we cannot reject the null hypothesis that beta0 is equal to zero at the 95% confidence level. If beta0 were truly zero, we would still get an estimate at least this far from zero about 12.8% of the time.

2 Q Estimating the CAPM model for a stock

3 The CAPM model

The Capital Asset Pricing Model states that the expected return of a stock is given by the risk-free rate plus its beta coefficient multiplied by the market premium return. In mathematical terms:

\[ E[R_i] = R_f + β_1(R_M − R_f ) \]

We can express the same equation as:

\[ (E[R_i] − R_f ) = β_1(R_M − R_f ) \]

Then, we are saying that the expected value of the premium return of a stock is equal to the premium market return multiplied by its market beta coefficient. You can estimate the beta coefficient of the CAPM using a regression model and using continuously compounded returns instead of simple returns. However, you must include the intercept b0 in the regression equation:

\[ (r_i − r_f ) = β_0 + β_1(r_M − r_f ) + ε \]

Where ε ∼ N(0, \(σ_ε\)); the error is a random shock with an expected mean=0 and a specific standard deviation or volatility. This error represents the result of all factors that influence stock returns, and cannot be explained by the model (by the market).

In the market model, the dependent variable was the stock return and the independent variable was the market return. Unlike the market model, here the dependent variable is the stock return minus the risk-free rate (the stock premium return), and the independent variable is the market premium return, which is equal to the market return minus the risk-free rate. Let’s run this model in R with a couple of stocks.

You can watch this VIDEO to get an idea of how to do it.

4 Data collection

We first clean our environment and load the quantmod package:

# To clear our environment we use the remove function rm:
rm(list=ls())

# To avoid scientific notation for numbers: 
options(scipen=999)

# load package quantmod
library(quantmod)

4.1 Download stock data

Download monthly stock data for Apple, Tesla and the S&P500 from 2014 to Dec, 2020 from Yahoo Finance using the getSymbols function and obtain continuously compounded returns for each.

getSymbols(c("AAPL", "^GSPC", "TSLA"), from="2014-01-01", 
           to="2020-12-01", periodicity="monthly", src="yahoo")

## [1] "AAPL"  "^GSPC" "TSLA"

#I select only the adjusted prices of each stock and merge them together:
prices <- merge(AAPL$AAPL.Adjusted,GSPC$GSPC.Adjusted, TSLA$TSLA.Adjusted)
# Or I can do:
prices <- merge(Ad(AAPL), Ad(GSPC), Ad(TSLA))

# I calculate continuously compounded returns to all columns of the 
#   price object:
APPL_r <- na.omit(diff(log(prices$AAPL.Adjusted)))
GSPC_r <- na.omit(diff(log(prices$GSPC.Adjusted)))
TSLA_r <- na.omit(diff(log(prices$TSLA.Adjusted)))

# I use the na.omit() function to remove NA values (since the first month 
#  is not possible to calculate returns) and select only Adjusted columns.

4.2 Download risk-free data from the FED

Download the risk-free monthly rate for the US (3-month treasury bills), which is the TB3MS ticker:

getSymbols("TB3MS", src = "FRED")

## [1] "TB3MS"

This rate is quoted as an annual percentage. I divide it by 100 and by 12 to get a monthly simple rate, since I am using monthly returns for the stocks:

rfrate<-TB3MS/100/12

Now I get the continuously compounded return from the simple return:

rfrate <- log(1+rfrate)

I used the formula to get cc returns from simple returns, which is applying the natural log to the growth factor (1+rfrate).
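
To see the size of the numbers involved, here is a small sketch of the same two steps with a made-up annual rate of 1.2% (an assumption for illustration, not an actual TB3MS value):

```r
# Made-up annual rate quoted in percent
annual_pct <- 1.2

# Step 1: from annual percentage to monthly simple rate in decimals
monthly_simple <- annual_pct / 100 / 12

# Step 2: from simple rate to continuously compounded rate
monthly_cc <- log(1 + monthly_simple)

c(simple = monthly_simple, cc = monthly_cc)

# Going back: exp(cc) - 1 recovers the simple rate
exp(monthly_cc) - 1
```

For rates this small, the cc rate is only slightly below the simple rate, so the conversion matters more for consistency with the stock returns than for the magnitude.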

4.3 Subsetting the risk-free dataset

Unfortunately, when getSymbols brings data from the FED, it brings all historical values of the series, even though the end date is specified.

Then, I do a sub-setting of the risk-free rate dataset to keep only those months that are equal to the months I brought for the stocks:

rfrate <- rfrate["2014-02-01/2020-12-01"]

I USED GETSYMBOLS TO DOWNLOAD DATA OF THE T-BILLS FROM THE FEDERAL RESERVE. THIS DATA HAS A DIFFERENT FORMAT FROM THE PREVIOUS ONES, SO I HAVE TO PERFORM SOME DATA MANAGEMENT ACTIONS. FIRST, I DIVIDED BY 100 AND BY 12, IN ORDER TO GET THE MONTHLY RATES IN DECIMALS. AFTERWARDS, I TURNED THOSE SIMPLE RETURNS INTO CONTINUOUSLY COMPOUNDED RETURNS BY ADDING 1 AND THEN GETTING THE NATURAL LOGARITHM. FINALLY, I SELECTED A SUBSET OF THE DATA, SINCE WHEN USING FRED AS SOURCE, I CANNOT SELECT PERIODS WITH GETSYMBOLS, SO ALL HISTORICAL DATA IS DOWNLOADED.

4.4 Estimating the premium returns

Now you have to generate new variables (columns) for the premium returns for the stocks and the S&P 500. The premium returns will be equal to the returns minus the risk-free rate:

TSLA_Premr <- TSLA_r - rfrate
APPL_Premr <- APPL_r - rfrate
GSPC_Premr <- GSPC_r - rfrate

5 Q Visualize the relationship

Do a scatter plot putting the S&P500 premium returns as the independent variable (X) and Tesla premium returns as the dependent variable (Y). We also add the line that best represents the relationship between the stock premium returns and the market premium returns:

plot.default(x=GSPC_Premr, y=TSLA_Premr)
abline(lm(TSLA_Premr ~ GSPC_Premr),col='blue')

plot.default(x=GSPC_Premr, y=TSLA_Premr, ylim=c(-0.5,0.5),xlim=c(-0.6,0.6))
abline(lm(TSLA_Premr ~ GSPC_Premr),col='blue')

WHAT DOES THE PLOT TELL YOU? BRIEFLY EXPLAIN

WE CAN SEE A CLEAR POSITIVE RELATIONSHIP BETWEEN THE MARKET PREMIUM RETURNS AND THE STOCK PREMIUM RETURNS. WHEN THE MARKET PREMIUM RETURN INCREASES, IT IS VERY LIKELY THAT THE STOCK PREMIUM RETURN ALSO INCREASES, AND VICE VERSA: WHEN THE MARKET PREMIUM RETURN DECREASES, THE STOCK PREMIUM RETURN IS LIKELY TO DECREASE. THE SLOPE OF THE LINE LOOKS BIGGER THAN 1 SINCE ITS ANGLE IS HIGHER THAN 45 DEGREES. THEN, FOR A 1% INCREASE OF THE MARKET PREMIUM RETURN, IT IS EXPECTED THAT THE STOCK PREMIUM RETURN WILL MOVE A LITTLE MORE THAN 1%.

6 Q Estimating the CAPM model for a stock

Use the premium returns to run the CAPM regression model for each stock.

We start with Tesla:

Tesla_CAPM <-lm(TSLA_Premr ~ GSPC_Premr, na.action=na.omit)

# Note that I added the parameter na.action=na.omit to validate in case some
# of the return series have NA values. NA values will be omitted
# I apply the function summary to the Tesla_CAPM object to get the coefficients and the
# standard errors. I assign the result in the Tesla_s object
Tesla_s <-summary(Tesla_CAPM)
# The summary function shows the results for the B0 and B1 coefficients, their
# standard errors, t and p values.
# The first line shows the results for B0
# The second, the results for B1

Tesla_s

## 
## Call:
## lm(formula = TSLA_Premr ~ GSPC_Premr, na.action = na.omit)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34187 -0.08967 -0.02076  0.08358  0.42657 
## 
## Coefficients:
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  0.01884    0.01589   1.186     0.239    
## GSPC_Premr   1.76090    0.38245   4.604 0.0000154 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1413 on 80 degrees of freedom
## Multiple R-squared:  0.2095, Adjusted R-squared:  0.1996 
## F-statistic:  21.2 on 1 and 80 DF,  p-value: 0.00001538

To do a rough estimate of the 95% confidence interval for B0:

minB0 <- Tesla_s$coefficients[1,1] - (2 * Tesla_s$coefficients[1,2])
maxB0 <- Tesla_s$coefficients[1,1] + (2 * Tesla_s$coefficients[1,2])

cat("The approx. B0 confidence interval goes from", minB0, "to", maxB0)

## The approx. B0 confidence interval goes from -0.01294225 to 0.05062803

ACTUALLY, I CAN CALCULATE THE EXACT 95% CONFIDENCE INTERVAL USING THE T-CRITICAL VALUE NEEDED TO COVER 95% OF THE POSSIBLE VALUES OF BETA0.

INSTEAD OF USING t=2 AS CRITICAL VALUE, WE CALCULATE THE EXACT t-value USING THE STUDENT'S t DISTRIBUTION. IN R, WE CAN USE THE FUNCTION qt, WHICH GIVES US A t-value FOR A SPECIFIC PROBABILITY TO THE LEFT OF THE DISTRIBUTION:

t_critical_value = qt(0.025,Tesla_CAPM$df.residual)
# I get the absolute value:
t_critical_value = abs(t_critical_value)
t_critical_value

## [1] 1.990063

I NEED TO SPECIFY 0.025 SINCE THIS FUNCTION RECEIVES THE AREA (TO THE LEFT) UNDER THE T PROBABILITY DISTRIBUTION. TO LEAVE 95% OF THE AREA IN THE MIDDLE (THE CONFIDENCE INTERVAL) OF A SYMMETRIC DISTRIBUTION, WE NEED 2.5% TO THE LEFT AND 2.5% TO THE RIGHT. I ALSO SPECIFY THE DEGREES OF FREEDOM FOR THE T PROBABILITY DISTRIBUTION: THE DEGREES OF FREEDOM OF THE RESIDUALS OF THE TESLA REGRESSION MODEL, WHICH ARE EQUAL TO THE # OF OBSERVATIONS MINUS THE # OF COEFFICIENTS TO ESTIMATE (IN THIS CASE, 2).
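
The symmetry argument can be checked directly. This sketch uses 80 degrees of freedom, the value reported in the Tesla regression output; the variable names are hypothetical:

```r
# Residual degrees of freedom reported for the Tesla regression
df_resid <- 80

# Critical values that leave 2.5% in each tail
t_left  <- qt(0.025, df_resid)   # negative value
t_right <- qt(0.975, df_resid)   # same magnitude, positive sign
c(t_left, t_right)

# The area between both critical values is 95%
pt(t_right, df_resid) - pt(t_left, df_resid)

# For comparison, the critical value of the normal distribution
qnorm(0.975)
```

Asking qt for 0.975 avoids the abs() step, and the comparison with qnorm shows why 2 is a reasonable rule of thumb: the t critical value is slightly larger than the normal one and gets closer to it as the degrees of freedom grow.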

THE T CRITICAL VALUE FOR THIS REGRESSION IS 1.9900634, WHICH IS VERY CLOSE TO 2, AS EXPECTED. I CAN USE THIS VALUE TO CALCULATE THE EXACT 95% CONFIDENCE INTERVAL USING THE PREVIOUS FORMULA:

minB0 <- Tesla_s$coefficients[1,1] - (t_critical_value * Tesla_s$coefficients[1,2])
maxB0 <- Tesla_s$coefficients[1,1] + (t_critical_value * Tesla_s$coefficients[1,2])

cat("The B0 confidence interval goes from", minB0, "to", maxB0)

## The B0 confidence interval goes from -0.01278433 to 0.05047011

To do a rough estimate of the 95% confidence interval for B1:

minB1 <- Tesla_s$coefficients[2,1]  - (2* Tesla_s$coefficients[2,2] )
maxB1 <-  Tesla_s$coefficients[2,1]  + (2* Tesla_s$coefficients[2,2] )

cat("The approx. B1 confidence interval goes from", minB1, "to", maxB1)

## The approx. B1 confidence interval goes from 0.9959955 to 2.525798

I CAN DO THE SAME PROCESS I FOLLOWED FOR BETA0 TO CALCULATE THE EXACT 95% CONFIDENCE INTERVAL FOR BETA1:

minB1 <- Tesla_s$coefficients[2,1]  - (t_critical_value* Tesla_s$coefficients[2,2] )
maxB1 <-  Tesla_s$coefficients[2,1]  + (t_critical_value* Tesla_s$coefficients[2,2] )

cat("The B1 confidence interval goes from", minB1, "to", maxB1)

## The B1 confidence interval goes from 0.9997957 to 2.521998
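
Base R also provides the confint function, which returns these intervals for all coefficients of an lm object in one call. The sketch below uses simulated data (the names x, y, fit and all numbers are assumptions for illustration) to show that the manual calculation with the t critical value and confint give the same result. With the workshop objects, the call would be confint(Tesla_CAPM, level = 0.95).

```r
# Simulated premium returns, for illustration only
set.seed(7)
x <- rnorm(82, mean = 0.01, sd = 0.04)
y <- 0.02 + 1.7 * x + rnorm(82, mean = 0, sd = 0.14)
fit <- lm(y ~ x)

# Manual 95% confidence intervals with the exact t critical value
est    <- coef(summary(fit))
t_crit <- qt(0.975, fit$df.residual)
manual_ci <- cbind(est[, "Estimate"] - t_crit * est[, "Std. Error"],
                   est[, "Estimate"] + t_crit * est[, "Std. Error"])

# Built-in 95% confidence intervals
builtin_ci <- confint(fit, level = 0.95)
builtin_ci

# Both approaches agree
all.equal(unname(manual_ci), unname(builtin_ci))
```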

Follow the same procedure to get Apple’s CAPM, and answer the following questions after you run your CAPM regression model for both stocks:

I WILL ANSWER THESE QUESTIONS USING TESLA

FOR TESLA:

THE REGRESSION OUTPUT IS THE FOLLOWING:

Tesla_s

## 
## Call:
## lm(formula = TSLA_Premr ~ GSPC_Premr, na.action = na.omit)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34187 -0.08967 -0.02076  0.08358  0.42657 
## 
## Coefficients:
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  0.01884    0.01589   1.186     0.239    
## GSPC_Premr   1.76090    0.38245   4.604 0.0000154 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1413 on 80 degrees of freedom
## Multiple R-squared:  0.2095, Adjusted R-squared:  0.1996 
## F-statistic:  21.2 on 1 and 80 DF,  p-value: 0.00001538

(a) INTERPRET THE RESULTS OF THE COEFFICIENTS (b0 and b1), THEIR STANDARD ERRORS, P-VALUES AND 95% CONFIDENCE INTERVALS.

BETA0 IS THE EXPECTED PREMIUM RETURN OF TESLA WHEN THE MARKET PREMIUM RETURN IS EQUAL TO ZERO. IN THIS CASE, BETA0= 0.0188429. BETA0 IS A MEASURE OF EXCESS PREMIUM RETURNS OF THE STOCK OVER THE MARKET. IN THIS CONTEXT, THIS COEFFICIENT IS ALSO CALLED JENSEN’S ALPHA. HERE, WE CAN SEE THAT THE B0 COEFFICIENT IS POSITIVE, BUT NOT SIGNIFICANTLY DIFFERENT FROM ZERO, SINCE ITS P-VALUE (0.239) IS HIGHER THAN 0.05 AND ITS T-STATISTIC (1.186) IS LOWER THAN 2. ITS 95% CONFIDENCE INTERVAL, FROM -0.01278433 TO 0.05047011, INCLUDES ZERO. THIS MEANS THAT TESLA’S PREMIUM RETURNS AREN’T SIGNIFICANTLY HIGHER THAN THE PREMIUM RETURNS OF THE MARKET.

ON THE OTHER HAND, THE B1 COEFFICIENT IS A MEASURE OF RISK SENSITIVITY. IT TELLS HOW MUCH THE STOCK PREMIUM RETURNS WILL MOVE ON AVERAGE FOR EACH 1% MOVEMENT IN THE MARKET PREMIUM RETURNS. FOR TESLA, WE HAVE A POSITIVE AND SIGNIFICANT RELATION BETWEEN ITS PREMIUM RETURNS AND THE MARKET PREMIUM RETURNS SINCE ITS BETA1= 1.7608969, ITS STANDARD ERROR IS 0.3824507, AND ITS t-VALUE IS 4.6042453 (P-VALUE 0.0000154).

(b) DO A QUICK RESEARCH ABOUT THE EFFICIENT MARKET HYPOTHESIS. BRIEFLY DESCRIBE WHAT THIS HYPOTHESIS SAYS.

THIS HYPOTHESIS STATES THAT SHARE PRICES REFLECT ALL INFORMATION AND ARE ALWAYS TRADED AT THEIR FAIR VALUE ON EXCHANGES, MAKING IT IMPOSSIBLE FOR INVESTORS TO PURCHASE UNDERVALUED STOCKS OR SELL STOCKS FOR INFLATED PRICES. THEREFORE, IT SHOULD BE IMPOSSIBLE TO OUTPERFORM THE OVERALL MARKET AND THE ONLY WAY AN INVESTOR CAN OBTAIN HIGHER RETURNS IS BY PURCHASING RISKIER INVESTMENTS.

(c) ACCORDING TO THE EFFICIENT MARKET HYPOTHESIS, WHAT IS THE EXPECTED VALUE OF b0 in the CAPM REGRESSION MODEL?

ZERO. THIS HYPOTHESIS IMPLIES THAT IT IS IMPOSSIBLE TO FIND A STOCK OR A PORTFOLIO THAT SYSTEMATICALLY OFFERS AN ALPHA BIGGER THAN ZERO.

(d) ACCORDING TO YOUR RESULTS, IS TESLA SIGNIFICANTLY RISKIER THAN THE MARKET? WHAT IS THE t-test YOU NEED TO DO TO ANSWER THIS QUESTION? Do the test and provide your interpretation. (Hint: Here you have to change the null hypothesis for b1: H0: b1=1; Ha: b1<>1)

SINCE THE B1 COEFFICIENT FOR TESLA IS 1.7608969, WE COULD THINK AT FIRST SIGHT THAT ITS PREMIUM RETURNS ARE RISKIER THAN THE MARKET PREMIUM RETURNS. HOWEVER, WE NEED TO TEST WHETHER THIS STATEMENT HOLDS AT THE 95% CONFIDENCE LEVEL. IN ORDER TO DO SO, WE HAVE TO DO THE FOLLOWING HYPOTHESIS TEST:

H0: b1=1

Ha: b1>1

THE t-VALUE OF THIS TEST IS:

\[ t=\frac{(b_{1}-1)}{SD(b_{1})} \]

IN THIS CASE, THE t-value IS:

\[ t=\frac{(1.7608969-1)}{0.3824507} \]

Tesla_t <- (Tesla_s$coefficients[2,1]-1)/Tesla_s$coefficients[2,2]
Tesla_t

## [1] 1.989529

THE t-value OF THIS TEST IS 1.9895292, which is very close to 2!

I CAN GET THE P-VALUE OF THIS t-value USING THE pt FUNCTION WITH THE DEGREES OF FREEDOM OF THE REGRESSION RESIDUALS:

pvalue<- 2*pt(Tesla_t, Tesla_CAPM$df.residual, lower.tail=FALSE)
pvalue

## [1] 0.05006007

THE pt FUNCTION CALCULATES THE 1-TAILED P-VALUE, SO I MULTIPLIED IT BY 2; MOST ECONOMETRIC SOFTWARE REPORTS THE 2-TAILED P-VALUE, WHICH IS DOUBLE THE 1-TAILED P-VALUE. THE REASON IS THAT THE 95% CONFIDENCE INTERVAL LEAVES OUT 2.5% OF THE DISTRIBUTION IN THE LEFT TAIL AND 2.5% IN THE RIGHT TAIL.

IN THIS CASE, THE P-VALUE WAS BARELY HIGHER THAN 0.05. IN THESE CASES, WE CAN USE OUR OWN CRITERIA TO CONSIDER THAT THE P-VALUE IS 0.05. IN THIS CASE, I WOULD CONSIDER THE P-VALUE TO BE 0.05, SO I WOULD REJECT THE NULL HYPOTHESIS AND ACCEPT THE HYPOTHESIS THAT SAYS THAT B1>1. IN OTHER WORDS, I CAN SAY THAT THERE IS ENOUGH STATISTICAL EVIDENCE AT THE 95% CONFIDENCE LEVEL TO SAY THAT TESLA PREMIUM RETURNS ARE SIGNIFICANTLY RISKIER THAN THOSE OF THE MARKET.

SEVERAL FINANCIAL ANALYSTS WOULD CALCULATE THE 1-TAILED P-VALUE TO MAKE THEIR OWN CONCLUSIONS WHEN TESTING FOR THE MARKET RISK OF A STOCK. IN OTHER WORDS, THEY WOULD USE A 90% CONFIDENCE INTERVAL INSTEAD OF A 95% CONFIDENCE INTERVAL.

I CAN CALCULATE THE 1-TAILED P-VALUE BY JUST DIVIDING BY 2 THE ORIGINAL P-VALUE:

pvalue1tailed = pvalue / 2
pvalue1tailed

## [1] 0.02503003

THE DECISION TO USE A 1-TAILED OR 2-TAILED P-VALUE IS A MATTER OF PERSONAL JUSTIFICATION. SOME ANALYSTS JUSTIFY THE USE OF THE 1-TAILED P-VALUE WITH PAST INFORMATION AND THE GENERAL BELIEF THAT THE STOCK YOU ARE ANALYZING IS USUALLY RISKIER THAN THE MARKET.

Workshop 4 SOLUTION, Financial Econometrics I

Introduction to Hypothesis testing

Alberto Dorantes, Ph.D.

March 10, 2020