Abstract
In this workshop we learn how to do predictions with multiple regression models in the context of Finance. Also, we learn how to run panel-data multiple regression changing the timing of the dependent variable, so we can examine how financial ratios can be related with future stock returns one quarter or one year later.You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.
At the beginning of the R Notebook write Workshop 6 - Financial Econometrics I and your name (as we did in previous workshop).
You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.
It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.
Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.
We will use the dataset http://www.apradie.com/datos/datamx2020q4.xlsx, which has quarterly financial data for all Mexican firms from 2000 to 2020. You have to download the market data and merge it with this dataset. You can check the solution of W6 to help you with this part. You have to:
# Load the package readxl
library(readxl)
# Download the excel file from a web site:
download.file("http://www.apradie.com/datos/datamx2020q4.xlsx", "dataw7.xlsx", mode="wb")
<- read_excel("dataw7.xlsx")
dataset
# Download market data
library(quantmod)
getSymbols("^MXX", from="2000-01-01", to="2019-12-31", periodicity="monthly", src="yahoo")
## [1] "^MXX"
# Collapse monthly data to quarterly data using the last value of each quarter
<- to.quarterly(MXX, indexAt = 'startof')
QMXX # The to.quarterly function collapse the dataset from monthly to quarterly
# taking the values of the last month of each quarter.
# in the option indexAt I indicate that I want to keep the first month of the
# quarter as index. I do this since the panel data uses the first month for
# each quarter
# Select the Adjusted price column
<- Ad(QMXX)
QMXX # Change column name
colnames(QMXX) <- c("IPC")
# Obtain market returns
$IPCrets <- diff(log(QMXX))
QMXX# Create a data frame object from QMXX
<- data.frame(quarter=index(QMXX), coredata(QMXX))
QMXX.df
# Turn the common column to the same type
$quarter <- as.Date(dataset$quarter)
dataset# Many-to-1 merge
<- merge(dataset, QMXX.df, by="quarter")
dataset
# Turn the data frame into panel data
library(plm)
## Warning: package 'plm' was built under R version 4.1.1
<- pdata.frame(dataset, index = c("firmcode", "quarter")) paneldata
Install the statar package before working in this section. This package has a better winsorize function. You only specify the minimum and maximum percentile you want to use for the winsorization:
# Keep only active firms
<- paneldata[paneldata$status == "active", ]
paneldata
# Calculate bmr
$bookvalue <- paneldata$totalassets - paneldata$totalliabilities
paneldata$marketvalue<-paneldata$originalhistoricalstockprice*paneldata$sharesoutstanding
paneldata$bmr <- paneldata$bookvalue / paneldata$marketvalue
paneldata
# Winsorize bmr
# We will use the winsorize function from the statar package.
# This function can work with panel data (the previous function from the robustHD
# package cannot)
library(statar)
# We only specify the 2 percentiles for the winsorization:
$bmr_w <- winsorize(paneldata$bmr, probs = c(0.02,0.98)) paneldata
Now we will learn how to make predictions for multiple regression models using the predict.lm function. Do the following:
Learn about earnings per share (EPS). Do your research on the Internet or books. In your R script, explain with your own words what is earnings per share and how it can be estimated.
If you divide earnings per share by stock price, what do you get? Explain what might be this new ratio and how do you think differs from the previous earnings per share.
Generate a new variable called eps for earnings per share for all firms and all quarters using the panel dataset. Use the variable ebit as a measure for earnings and sharesoutstanding as number of shares.
# Calculate eps
$eps <- paneldata$ebit / paneldata$sharesoutstanding paneldata
Generate a new variable called epsp earnings per share divided by stock price.
Winsorize this epsp variable and name this winsorized variable as epspw. Find the best way to do this winsorization (to the left and/or to the right)
#Calculate epsp
$epsp <- paneldata$eps / paneldata$originalhistoricalstockprice
paneldata# Winsorize epsp
$epsp_w <- winsorize(paneldata$epsp, probs = c(0.02,0.98)) paneldata
## 0.88 % observations replaced at the bottom
## 0.88 % observations replaced at the top
# Calculate cc returns for all returns
$stockreturn <- diff(log(paneldata$adjustedstockprice))
paneldata<- as.data.frame(paneldata[(paneldata$quarter=="2019-10-01"),])
lastq
# Construct the regression model:
# first write the dependent variable (y), then all explanatory variables
<- lm(stockreturn ~ epsp_w + bmr_w, data = lastq)
reg1 <- summary(reg1)
s_reg1 s_reg1
##
## Call:
## lm(formula = stockreturn ~ epsp_w + bmr_w, data = lastq)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.42027 -0.07916 -0.01485 0.07058 0.62443
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.05782 0.02709 2.134 0.0351 *
## epsp_w -0.01946 0.14011 -0.139 0.8898
## bmr_w -0.03106 0.01915 -1.622 0.1077
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1584 on 110 degrees of freedom
## (39 observations deleted due to missingness)
## Multiple R-squared: 0.02496, Adjusted R-squared: 0.007232
## F-statistic: 1.408 on 2 and 110 DF, p-value: 0.249
Predict the stock return for a company with a bmrw=0.8, and a epspw=0.050. What is the expected stock return? Do this manually.
Now predict the result using the predict.lm() function, which will output the result of the prediction. If you set the interval argument to “confidence” you will get the 95% confidence interval. Did you get the same prediction than manually? Interpret the 95% confidence interval for the prediction.
<- data.frame(epsp_w=c(0.05), bmr_w=c(0.8))
new_x predict.lm(reg1, newdata = new_x)
## 1
## 0.03199618
<- predict.lm(reg1, new_x, interval = "confidence")
pr_reg1 pr_reg1
## fit lwr upr
## 1 0.03199618 -0.002305707 0.06629807
# Join both objects in order to have a better perception
<- cbind(new_x, pr_reg1)
pr_reg1.df pr_reg1.df
## epsp_w bmr_w fit lwr upr
## 1 0.05 0.8 0.03199618 -0.002305707 0.06629807
# Construct the model using the whole paneldata object
<- lm(stockreturn ~ IPCrets + bmr_w, data = paneldata)
reg2 <- summary(reg2)
s_reg2 s_reg2
##
## Call:
## lm(formula = stockreturn ~ IPCrets + bmr_w, data = paneldata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.26380 -0.07630 -0.00448 0.07449 1.56551
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.032240 0.002972 10.85 <2e-16 ***
## IPCrets 0.830298 0.023419 35.45 <2e-16 ***
## bmr_w -0.032087 0.002315 -13.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1675 on 7096 degrees of freedom
## (5061 observations deleted due to missingness)
## Multiple R-squared: 0.1712, Adjusted R-squared: 0.171
## F-statistic: 733.1 on 2 and 7096 DF, p-value: < 2.2e-16
<- data.frame(bmr_w=0.8, IPCrets = 1)
new_x2 <- predict.lm(reg2, new_x2, interval = "confidence")
pr_reg2 pr_reg2
## fit lwr upr
## 1 0.8368685 0.7916157 0.8821213
<- cbind(new_x2, pr_reg2)
pr_reg2.df pr_reg2.df
## bmr_w IPCrets fit lwr upr
## 1 0.8 1 0.8368685 0.7916157 0.8821213
<- data.frame(IPCrets=seq(from=-0.02, to=0.02, by=0.01), bmr_w=1)
new_x2b <- predict.lm(reg2, new_x2b, interval = "confidence")
pr_reg2b pr_reg2b
## fit lwr upr
## 1 -0.0164523682 -0.020726696 -0.012178040
## 2 -0.0081493921 -0.012259322 -0.004039462
## 3 0.0001535839 -0.003838245 0.004145413
## 4 0.0084565599 0.004532355 0.012380765
## 5 0.0167595360 0.012849856 0.020669215
<- cbind(new_x2b, pr_reg2b)
pr_reg2b.df colnames(pr_reg2b.df) <- c("IPCrets", "bmr_w", "Stockreturn", "lwr", "upr")
pr_reg2b.df
## IPCrets bmr_w Stockreturn lwr upr
## 1 -0.02 1 -0.0164523682 -0.020726696 -0.012178040
## 2 -0.01 1 -0.0081493921 -0.012259322 -0.004039462
## 3 0.00 1 0.0001535839 -0.003838245 0.004145413
## 4 0.01 1 0.0084565599 0.004532355 0.012380765
## 5 0.02 1 0.0167595360 0.012849856 0.020669215
library(ggplot2)
ggplot(pr_reg2b.df, aes(x = IPCrets, y=Stockreturn))+
geom_point(size = 2) + geom_line() +
geom_errorbar(aes(ymax = upr, ymin=lwr))
library(prediction)
prediction(reg2, at = list(IPCrets=seq(from=-0.02, to=0.02, by=0.01), bmr_w=1))
## IPCrets bmr_w x
## -0.02 1 -0.0164524
## -0.01 1 -0.0081494
## 0.00 1 0.0001536
## 0.01 1 0.0084566
## 0.02 1 0.0167595
In the previous part we examined whether the market return and the BMR influence the stock return. We run the models using contemporary values of the variables. What does this mean? We examine whether the BMR and the market return of a period influence the stock return in the SAME period. In multiple regression models in R, we can also examine non-contemporaneous (lagged or future) effects of the independent variables. For example, we can examine whether the BMR and the market return influences future stock return (1 quarter or 4 quarters later). In Finance it is important to examine lag effects of independent variables on the dependent variable. An important reason is that financial statement releases of public firms usualy last for 1 or 2 months after the closing accounting period. For example, the financial statement of Q4 of 2020 can be release up to February or March 2021.
Let’s work with an example.
Which are the lagged variables we can use in R? We can use the lag function and the plm function. The plm stands for panel-data linear model.
lag(variable, #) refers to the LAG number # of the variable. Lag # 1 refers to the value of the variable 1 period ago.
Examples:
lag(bmrw,1) refers to the previous value of book-to-market ratio in the dataset. If the dataset has quarters, then it is the bmr value 1 quarter ago.
lag(bmr,4) refers to the previous value of book-to-market ratio ONE year ago if the dataset has quarterly data.
If you want to go forward (ahead) instead, you can still use the lag function, but you need to specifya negative #. For example:
lag(bmr,-4) refers to the future value of book-to-market ratio ONE year in the future if the dataset has quarterly data.
Using the same panel dataset we used in the previous part, do the following models:
We will use the plm function from the plm package.
# Load the package
library(plm)
# Use the lag() function with -1 indicating to go forward 1 period
# Using negative numbers is like going reverse
<- plm(lag(stockreturn, -1) ~ IPCrets + bmr_w, data=paneldata, model="pooling")
model1 <- summary(model1)
s_model1 s_model1
## Pooling Model
##
## Call:
## plm(formula = lag(stockreturn, -1) ~ IPCrets + bmr_w, data = paneldata,
## model = "pooling")
##
## Unbalanced Panel: n = 146, T = 1-78, N = 7039
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.3348414 -0.0774005 -0.0027878 0.0818477 1.4644899
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 0.0017384 0.0032410 0.5364 0.5917
## IPCrets 0.2748376 0.0253327 10.8491 < 2.2e-16 ***
## bmr_w 0.0114935 0.0025282 4.5461 5.557e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 237.61
## Residual Sum of Squares: 233.09
## R-Squared: 0.019058
## Adj. R-Squared: 0.018779
## F-statistic: 68.3469 on 2 and 7036 DF, p-value: < 2.22e-16
This model can also be constructed as:
<-plm(stockreturn ~ lag(IPCrets) + lag(bmr_w), data = paneldata, model="pooling")
model1asummary(model1a)
## Pooling Model
##
## Call:
## plm(formula = stockreturn ~ lag(IPCrets) + lag(bmr_w), data = paneldata,
## model = "pooling")
##
## Unbalanced Panel: n = 146, T = 1-78, N = 7039
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.3348414 -0.0774005 -0.0027878 0.0818477 1.4644899
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 0.0017384 0.0032410 0.5364 0.5917
## lag(IPCrets) 0.2748376 0.0253327 10.8491 < 2.2e-16 ***
## lag(bmr_w) 0.0114935 0.0025282 4.5461 5.557e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 237.61
## Residual Sum of Squares: 233.09
## R-Squared: 0.019058
## Adj. R-Squared: 0.018779
## F-statistic: 68.3469 on 2 and 7036 DF, p-value: < 2.22e-16
<- plm(lag(stockreturn, -4) ~ bmr_w + epsp_w, data = paneldata, model="pooling")
model2 <- summary(model2)
s_model2 s_model2
## Pooling Model
##
## Call:
## plm(formula = lag(stockreturn, -4) ~ bmr_w + epsp_w, data = paneldata,
## model = "pooling")
##
## Unbalanced Panel: n = 126, T = 1-76, N = 4756
##
## Residuals:
## Min. 1st Qu. Median 3rd Qu. Max.
## -1.3128451 -0.0740470 -0.0028452 0.0789605 1.4889600
##
## Coefficients:
## Estimate Std. Error t-value Pr(>|t|)
## (Intercept) 0.0019973 0.0041118 0.4857 0.62717
## bmr_w 0.0089931 0.0035967 2.5004 0.01244 *
## epsp_w -0.0066547 0.0343372 -0.1938 0.84634
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Total Sum of Squares: 157.48
## Residual Sum of Squares: 157.26
## R-Squared: 0.0013965
## Adj. R-Squared: 0.00097633
## F-statistic: 3.3235 on 2 and 4753 DF, p-value: 0.03611
<- data.frame(bmr_w = seq(from=0.6, to=1.6, by=0.1), epsp_w=mean(paneldata$epsp_w, na.rm=TRUE))
newx_model2 <- prediction_summary(model=model2, at=newx_model2,level=0.95)
pr1_model2 colnames(pr1_model2) <- c("bmr_w","epsp_w", "Predicted_return")
<- s_model2$coefficients[1,2]^2
var_b0 <- s_model2$coefficients[2,2]^2
var_b1 <- s_model2$coefficients[3,2]^2
var_b2 <- cov(matrix(c(s_model2$coefficients[1,1], s_model2$coefficients[2,1],
cov_coeff $coefficients[3,1])))
s_model2$SE <- sqrt(var_b0 + pr1_model2$bmr_w^2*var_b1 +
pr1_model2$epsp_w^2*var_b2 + 2*cov_coeff) pr1_model2
## Warning in var_b0 + pr1_model2$bmr_w^2 * var_b1 + pr1_model2$epsp_w^2 * : Recycling array of length 1 in vector-array arithmetic is deprecated.
## Use c() or as.vector() instead.
$lwr <- pr1_model2$Predicted_return - 2*pr1_model2$SE
pr1_model2$upr <- pr1_model2$Predicted_return + 2*pr1_model2$SE
pr1_model2
pr1_model2
## bmr_w epsp_w Predicted_return NA NA NA NA NA SE lwr upr
## 0.6 0.06349 0.006971 NA NA NA NA NA 0.01221 -0.01746 0.03140
## 0.7 0.06349 0.007870 NA NA NA NA NA 0.01228 -0.01670 0.03244
## 0.8 0.06349 0.008769 NA NA NA NA NA 0.01236 -0.01596 0.03349
## 0.9 0.06349 0.009669 NA NA NA NA NA 0.01245 -0.01523 0.03457
## 1.0 0.06349 0.010568 NA NA NA NA NA 0.01255 -0.01453 0.03567
## 1.1 0.06349 0.011467 NA NA NA NA NA 0.01266 -0.01385 0.03678
## 1.2 0.06349 0.012366 NA NA NA NA NA 0.01277 -0.01318 0.03791
## 1.3 0.06349 0.013266 NA NA NA NA NA 0.01290 -0.01253 0.03907
## 1.4 0.06349 0.014165 NA NA NA NA NA 0.01303 -0.01190 0.04023
## 1.5 0.06349 0.015064 NA NA NA NA NA 0.01318 -0.01129 0.04142
## 1.6 0.06349 0.015964 NA NA NA NA NA 0.01333 -0.01069 0.04262
ggplot(pr1_model2, aes(x = bmr_w, y=Predicted_return))+
geom_point(size = 2) + geom_line() +
geom_errorbar(aes(ymax = upr, ymin=lwr))
Go to Canvas and respond Quiz 8. You will have 3 attempts. Questions in this Quiz are related to concepts of the readings related to this Workshop.
The grade of this Workshop will be the following:
Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.