1 General directions for this Workshop

You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.

At the beginning of the R Notebook write Workshop 6 - Financial Econometrics I and your name (as we did in previous workshop).

You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.

It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.

2 Predictions with multiple regression models

We will use the dataset http://www.apradie.com/datos/datamx2020q4.xlsx, which has quarterly financial data for all Mexican firms from 2000 to 2020. You have to download the market data and merge it with this dataset. You can check the solution of W6 to help you with this part. You have to:

  1. Bring the data
# Load the package readxl
library(readxl)
# Download the excel file from a web site:
download.file("http://www.apradie.com/datos/datamx2020q4.xlsx", "dataw7.xlsx", mode="wb")
dataset <- read_excel("dataw7.xlsx")

# Download market data
library(quantmod)
getSymbols("^MXX", from="2000-01-01", to="2019-12-31", periodicity="monthly", src="yahoo")
## [1] "^MXX"
  1. Merge the market return
# Collapse monthly data to quarterly data using the last value of each quarter

QMXX <- to.quarterly(MXX, indexAt = 'startof')
# The to.quarterly function collapse the dataset from monthly to quarterly
#   taking the values of the last month of each quarter. 
# in the option indexAt I indicate that I want to keep the first month of the 
#   quarter as index. I do this since the panel data uses the first month for 
#   each quarter


# Select the Adjusted price column
QMXX <- Ad(QMXX)
# Change column name
colnames(QMXX) <- c("IPC")
# Obtain market returns
QMXX$IPCrets <- diff(log(QMXX))
# Create a data frame object from QMXX
QMXX.df <- data.frame(quarter=index(QMXX), coredata(QMXX))

# Turn the common column to the same type
dataset$quarter <- as.Date(dataset$quarter)
# Many-to-1 merge
dataset <- merge(dataset, QMXX.df, by="quarter")

# Turn the data frame into panel data 
library(plm)
## Warning: package 'plm' was built under R version 4.1.1
paneldata <- pdata.frame(dataset, index = c("firmcode", "quarter"))
  1. Calculate and winsorize the book-to-market ratio. To calculate book-to-market ratio, use the price (original stock price) instead of price (adjusted stock price), since the adjusted price is not the actual historical stock price.

Install the statar package before working in this section. This package has a better winsorize function. You only specify the minimum and maximum percentile you want to use for the winsorization:

# Keep only active firms
paneldata <- paneldata[paneldata$status == "active", ]

# Calculate bmr
paneldata$bookvalue <- paneldata$totalassets - paneldata$totalliabilities
paneldata$marketvalue<-paneldata$originalhistoricalstockprice*paneldata$sharesoutstanding
paneldata$bmr <- paneldata$bookvalue / paneldata$marketvalue

# Winsorize bmr
# We will use the winsorize function from the statar package.
# This function can work with panel data (the previous function from the robustHD
#   package cannot)
library(statar)
# We only specify the 2 percentiles for the winsorization: 
paneldata$bmr_w <- winsorize(paneldata$bmr, probs = c(0.02,0.98))

Now we will learn how to make predictions for multiple regression models using the predict.lm function. Do the following:

  1. Learn about earnings per share (EPS). Do your research on the Internet or books. In your R script, explain with your own words what is earnings per share and how it can be estimated.

  2. If you divide earnings per share by stock price, what do you get? Explain what might be this new ratio and how do you think differs from the previous earnings per share.

  3. Generate a new variable called eps for earnings per share for all firms and all quarters using the panel dataset. Use the variable ebit as a measure for earnings and sharesoutstanding as number of shares.

# Calculate eps
paneldata$eps <- paneldata$ebit / paneldata$sharesoutstanding
  1. Generate a new variable called epsp earnings per share divided by stock price.

  2. Winsorize this epsp variable and name this winsorized variable as epspw. Find the best way to do this winsorization (to the left and/or to the right)

#Calculate epsp
paneldata$epsp <- paneldata$eps / paneldata$originalhistoricalstockprice
# Winsorize epsp
paneldata$epsp_w <- winsorize(paneldata$epsp, probs = c(0.02,0.98))
## 0.88 % observations replaced at the bottom
## 0.88 % observations replaced at the top
  1. Run the multiple regression model for all firms and for ONLY the last quarter (2019q4) to examine whether the Earnings per Share divided by price (winsorized) and book-to-market ratio (winsorized) have explanatory power for firm returns. INTERPRET your regression model. Make sure you interpret the partial betas as well as their level of significance.
# Calculate cc returns for all returns
paneldata$stockreturn <- diff(log(paneldata$adjustedstockprice))
lastq <- as.data.frame(paneldata[(paneldata$quarter=="2019-10-01"),])

# Construct the regression model: 
#   first write the dependent variable (y), then all explanatory variables
reg1 <- lm(stockreturn ~ epsp_w + bmr_w, data = lastq)
s_reg1 <- summary(reg1)
s_reg1
## 
## Call:
## lm(formula = stockreturn ~ epsp_w + bmr_w, data = lastq)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42027 -0.07916 -0.01485  0.07058  0.62443 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.05782    0.02709   2.134   0.0351 *
## epsp_w      -0.01946    0.14011  -0.139   0.8898  
## bmr_w       -0.03106    0.01915  -1.622   0.1077  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1584 on 110 degrees of freedom
##   (39 observations deleted due to missingness)
## Multiple R-squared:  0.02496,    Adjusted R-squared:  0.007232 
## F-statistic: 1.408 on 2 and 110 DF,  p-value: 0.249
  1. Predict the stock return for a company with a bmrw=0.8, and a epspw=0.050. What is the expected stock return? Do this manually.

  2. Now predict the result using the predict.lm() function, which will output the result of the prediction. If you set the interval argument to “confidence” you will get the 95% confidence interval. Did you get the same prediction than manually? Interpret the 95% confidence interval for the prediction.

new_x <- data.frame(epsp_w=c(0.05), bmr_w=c(0.8))
predict.lm(reg1, newdata = new_x)
##          1 
## 0.03199618
pr_reg1 <- predict.lm(reg1, new_x, interval = "confidence")
pr_reg1
##          fit          lwr        upr
## 1 0.03199618 -0.002305707 0.06629807
# Join both objects in order to have a better perception
pr_reg1.df <- cbind(new_x, pr_reg1)
pr_reg1.df
##   epsp_w bmr_w        fit          lwr        upr
## 1   0.05   0.8 0.03199618 -0.002305707 0.06629807
  1. Run a multiple regression model for all firms and all quarters to examine whether the market return and the BMR provides explanation for firm returns. Estimate the following predictions:
  1. Predict the firm return for a company with a BMR=0.8, holding the market return constant. Interpret the prediction and the 95% Confidence Interval.
# Construct the model using the whole paneldata object
reg2 <- lm(stockreturn ~ IPCrets + bmr_w, data = paneldata)
s_reg2 <- summary(reg2)
s_reg2
## 
## Call:
## lm(formula = stockreturn ~ IPCrets + bmr_w, data = paneldata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26380 -0.07630 -0.00448  0.07449  1.56551 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.032240   0.002972   10.85   <2e-16 ***
## IPCrets      0.830298   0.023419   35.45   <2e-16 ***
## bmr_w       -0.032087   0.002315  -13.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1675 on 7096 degrees of freedom
##   (5061 observations deleted due to missingness)
## Multiple R-squared:  0.1712, Adjusted R-squared:  0.171 
## F-statistic: 733.1 on 2 and 7096 DF,  p-value: < 2.2e-16
new_x2 <- data.frame(bmr_w=0.8, IPCrets = 1)
pr_reg2 <- predict.lm(reg2, new_x2, interval = "confidence")
pr_reg2
##         fit       lwr       upr
## 1 0.8368685 0.7916157 0.8821213
pr_reg2.df <- cbind(new_x2, pr_reg2)
pr_reg2.df
##   bmr_w IPCrets       fit       lwr       upr
## 1   0.8       1 0.8368685 0.7916157 0.8821213
  1. Predict the firm returns for a company when the quarterly market is equal to -2%, -1%, 0%, +1%, and +2%. Check the 95% CI of the predictions. Provide a brief INTERPRETATION of the output and the graph
new_x2b <- data.frame(IPCrets=seq(from=-0.02, to=0.02, by=0.01), bmr_w=1)
pr_reg2b <- predict.lm(reg2, new_x2b, interval = "confidence")
pr_reg2b
##             fit          lwr          upr
## 1 -0.0164523682 -0.020726696 -0.012178040
## 2 -0.0081493921 -0.012259322 -0.004039462
## 3  0.0001535839 -0.003838245  0.004145413
## 4  0.0084565599  0.004532355  0.012380765
## 5  0.0167595360  0.012849856  0.020669215
pr_reg2b.df <- cbind(new_x2b, pr_reg2b) 
colnames(pr_reg2b.df) <- c("IPCrets", "bmr_w", "Stockreturn", "lwr", "upr")
pr_reg2b.df
##   IPCrets bmr_w   Stockreturn          lwr          upr
## 1   -0.02     1 -0.0164523682 -0.020726696 -0.012178040
## 2   -0.01     1 -0.0081493921 -0.012259322 -0.004039462
## 3    0.00     1  0.0001535839 -0.003838245  0.004145413
## 4    0.01     1  0.0084565599  0.004532355  0.012380765
## 5    0.02     1  0.0167595360  0.012849856  0.020669215
library(ggplot2)
ggplot(pr_reg2b.df, aes(x = IPCrets, y=Stockreturn))+
  geom_point(size = 2) + geom_line() +
  geom_errorbar(aes(ymax = upr, ymin=lwr))

library(prediction)
prediction(reg2, at = list(IPCrets=seq(from=-0.02, to=0.02, by=0.01), bmr_w=1))
##  IPCrets bmr_w          x
##    -0.02     1 -0.0164524
##    -0.01     1 -0.0081494
##     0.00     1  0.0001536
##     0.01     1  0.0084566
##     0.02     1  0.0167595

3 Multiple regression models with future values of DV

In the previous part we examined whether the market return and the BMR influence the stock return. We run the models using contemporary values of the variables. What does this mean? We examine whether the BMR and the market return of a period influence the stock return in the SAME period. In multiple regression models in R, we can also examine non-contemporaneous (lagged or future) effects of the independent variables. For example, we can examine whether the BMR and the market return influences future stock return (1 quarter or 4 quarters later). In Finance it is important to examine lag effects of independent variables on the dependent variable. An important reason is that financial statement releases of public firms usualy last for 1 or 2 months after the closing accounting period. For example, the financial statement of Q4 of 2020 can be release up to February or March 2021.

Let’s work with an example.

Which are the lagged variables we can use in R? We can use the lag function and the plm function. The plm stands for panel-data linear model.

lag(variable, #) refers to the LAG number # of the variable. Lag # 1 refers to the value of the variable 1 period ago.

Examples:

lag(bmrw,1) refers to the previous value of book-to-market ratio in the dataset. If the dataset has quarters, then it is the bmr value 1 quarter ago.

lag(bmr,4) refers to the previous value of book-to-market ratio ONE year ago if the dataset has quarterly data.

If you want to go forward (ahead) instead, you can still use the lag function, but you need to specifya negative #. For example:

lag(bmr,-4) refers to the future value of book-to-market ratio ONE year in the future if the dataset has quarterly data.

Using the same panel dataset we used in the previous part, do the following models:

  1. Model 1. Design two regression models to examine whether the BMRw (winsorized) and the market return influence a) the future stock return one quarter later, and b) the future stock return one year later. Consider ONLY active firms. Run the model and briefly INTERPRET your results.

We will use the plm function from the plm package.

# Load the package
library(plm)
# Use the lag() function with -1 indicating to go forward 1 period
# Using negative numbers is like going reverse
model1 <- plm(lag(stockreturn, -1) ~ IPCrets + bmr_w, data=paneldata, model="pooling")
s_model1 <- summary(model1)
s_model1
## Pooling Model
## 
## Call:
## plm(formula = lag(stockreturn, -1) ~ IPCrets + bmr_w, data = paneldata, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 146, T = 1-78, N = 7039
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3348414 -0.0774005 -0.0027878  0.0818477  1.4644899 
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept) 0.0017384  0.0032410  0.5364    0.5917    
## IPCrets     0.2748376  0.0253327 10.8491 < 2.2e-16 ***
## bmr_w       0.0114935  0.0025282  4.5461 5.557e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    237.61
## Residual Sum of Squares: 233.09
## R-Squared:      0.019058
## Adj. R-Squared: 0.018779
## F-statistic: 68.3469 on 2 and 7036 DF, p-value: < 2.22e-16

This model can also be constructed as:

model1a<-plm(stockreturn ~ lag(IPCrets) + lag(bmr_w), data = paneldata, model="pooling")
summary(model1a)
## Pooling Model
## 
## Call:
## plm(formula = stockreturn ~ lag(IPCrets) + lag(bmr_w), data = paneldata, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 146, T = 1-78, N = 7039
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3348414 -0.0774005 -0.0027878  0.0818477  1.4644899 
## 
## Coefficients:
##               Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept)  0.0017384  0.0032410  0.5364    0.5917    
## lag(IPCrets) 0.2748376  0.0253327 10.8491 < 2.2e-16 ***
## lag(bmr_w)   0.0114935  0.0025282  4.5461 5.557e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    237.61
## Residual Sum of Squares: 233.09
## R-Squared:      0.019058
## Adj. R-Squared: 0.018779
## F-statistic: 68.3469 on 2 and 7036 DF, p-value: < 2.22e-16
  1. Model 2. Run the following model:
  1. Run a multiple regression model to examine whether BMRw and EPSPw influence future stock returns one year later. INTERPRET the output of the model.
model2 <- plm(lag(stockreturn, -4) ~ bmr_w + epsp_w, data = paneldata, model="pooling")
s_model2 <- summary(model2)
s_model2
## Pooling Model
## 
## Call:
## plm(formula = lag(stockreturn, -4) ~ bmr_w + epsp_w, data = paneldata, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 126, T = 1-76, N = 4756
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3128451 -0.0740470 -0.0028452  0.0789605  1.4889600 
## 
## Coefficients:
##               Estimate Std. Error t-value Pr(>|t|)  
## (Intercept)  0.0019973  0.0041118  0.4857  0.62717  
## bmr_w        0.0089931  0.0035967  2.5004  0.01244 *
## epsp_w      -0.0066547  0.0343372 -0.1938  0.84634  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    157.48
## Residual Sum of Squares: 157.26
## R-Squared:      0.0013965
## Adj. R-Squared: 0.00097633
## F-statistic: 3.3235 on 2 and 4753 DF, p-value: 0.03611
  1. Using predict.lm() or prediction() functions, predict the stock return of firms when BMRw moves from 0.6 to 1.6, moving by 0.10. Keep EPSPw constant (with no change). Run a plot of these predictions. Interpret the graph.
newx_model2 <- data.frame(bmr_w = seq(from=0.6, to=1.6, by=0.1), epsp_w=mean(paneldata$epsp_w, na.rm=TRUE))
pr1_model2 <- prediction_summary(model=model2, at=newx_model2,level=0.95)
colnames(pr1_model2) <- c("bmr_w","epsp_w", "Predicted_return")
var_b0 <- s_model2$coefficients[1,2]^2
var_b1 <- s_model2$coefficients[2,2]^2
var_b2 <- s_model2$coefficients[3,2]^2
cov_coeff <- cov(matrix(c(s_model2$coefficients[1,1], s_model2$coefficients[2,1],
s_model2$coefficients[3,1])))
pr1_model2$SE <- sqrt(var_b0 + pr1_model2$bmr_w^2*var_b1 +
pr1_model2$epsp_w^2*var_b2 + 2*cov_coeff)
## Warning in var_b0 + pr1_model2$bmr_w^2 * var_b1 + pr1_model2$epsp_w^2 * : Recycling array of length 1 in vector-array arithmetic is deprecated.
##   Use c() or as.vector() instead.
pr1_model2$lwr <- pr1_model2$Predicted_return - 2*pr1_model2$SE
pr1_model2$upr <- pr1_model2$Predicted_return + 2*pr1_model2$SE

pr1_model2
##  bmr_w  epsp_w Predicted_return NA NA NA NA NA      SE      lwr     upr
##    0.6 0.06349         0.006971 NA NA NA NA NA 0.01221 -0.01746 0.03140
##    0.7 0.06349         0.007870 NA NA NA NA NA 0.01228 -0.01670 0.03244
##    0.8 0.06349         0.008769 NA NA NA NA NA 0.01236 -0.01596 0.03349
##    0.9 0.06349         0.009669 NA NA NA NA NA 0.01245 -0.01523 0.03457
##    1.0 0.06349         0.010568 NA NA NA NA NA 0.01255 -0.01453 0.03567
##    1.1 0.06349         0.011467 NA NA NA NA NA 0.01266 -0.01385 0.03678
##    1.2 0.06349         0.012366 NA NA NA NA NA 0.01277 -0.01318 0.03791
##    1.3 0.06349         0.013266 NA NA NA NA NA 0.01290 -0.01253 0.03907
##    1.4 0.06349         0.014165 NA NA NA NA NA 0.01303 -0.01190 0.04023
##    1.5 0.06349         0.015064 NA NA NA NA NA 0.01318 -0.01129 0.04142
##    1.6 0.06349         0.015964 NA NA NA NA NA 0.01333 -0.01069 0.04262
ggplot(pr1_model2, aes(x = bmr_w, y=Predicted_return))+
  geom_point(size = 2) + geom_line() +
  geom_errorbar(aes(ymax = upr, ymin=lwr))

4 Quiz 8 and W8 submission

Go to Canvas and respond Quiz 8. You will have 3 attempts. Questions in this Quiz are related to concepts of the readings related to this Workshop.

The grade of this Workshop will be the following:

  • Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions
  • Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions and/or you did not do all activities and respond to some of the questions.
  • Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop or you completed more but parts of your work is a copy-paste from other workshops.
  • Not submitted (0%)

Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.