1 General directions for this Workshop

You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.

At the beginning of the R Notebook write Workshop 7 - Financial Econometrics I and your name (as we did in the previous workshop).

You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.

It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.

2 Predictions with multiple regression models

2.1 Data collection and management

We will use the dataset http://www.apradie.com/datos/datamx2020q4.xlsx, which has quarterly financial data for all Mexican firms from 2000 to 2020. You have to download the market data and merge it with this dataset. You can check the solution of W6 to help you with this part. You have to:

Bring the panel dataset from a web site, and the market data from Yahoo Finance:

# Load the package readxl
library(readxl)
# Download the excel file from a web site:
download.file("http://www.apradie.com/datos/datamx2020q4.xlsx", "dataw7.xlsx", mode="wb")
dataset <- read_excel("dataw7.xlsx")

# Download market data from Yahoo Finance:
library(quantmod)
getSymbols("^MXX", from="2000-01-01", to="2019-12-31", periodicity="monthly", src="yahoo")

## [1] "^MXX"

Merge the market return

I first have to collapse the market data from monthly to quarterly data since the panel dataset has quarterly data. Once the market data is in quarters, I calculate returns and merge it with the panel dataset that has financial variables of firms:

# Collapse monthly data to quarterly data using the last value of each quarter

QMXX <- to.quarterly(MXX, indexAt = 'startof')
# The to.quarterly function collapses the dataset from monthly to quarterly,
#   taking the values of the last month of each quarter.
# With the indexAt option I indicate that I want to keep the first month of the
#   quarter as the index. I do this since the panel data uses the first month
#   of each quarter as its date


# Select the Adjusted price column
QMXX <- Ad(QMXX)
# Change column name
colnames(QMXX) <- c("IPC")

# I calculate market quarterly continuously compounded returns:
QMXX$IPCrets <- diff(log(QMXX))

# Create a data frame object from QMXX
QMXX.df <- data.frame(quarter=index(QMXX), coredata(QMXX))

# Convert the quarter column to a Date type of column:
dataset$quarter <- as.Date(dataset$quarter)
# I merge the market dataset with the panel dataset: 
dataset <- merge(dataset, QMXX.df, by="quarter")
# This is a many-to-1 type of data merging since the market return is merged N 
#   times according to the N firms in the panel dataset

# Convert the data set from a data frame to a panel data frame using the 
#   pdata.frame function from the plm library:
library(plm)
paneldata <- pdata.frame(dataset, index = c("firmcode", "quarter"))
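
The return calculation and the many-to-1 merge above can be illustrated with a small toy example. The sketch below uses made-up index levels and two hypothetical firms (A and B); none of these numbers come from the workshop dataset. It computes continuously compounded returns as the first difference of the log index and then attaches the same market return to every firm observed in that quarter.

```r
# Toy market index for three quarters (made-up values, for illustration only)
market <- data.frame(
  quarter = as.Date(c("2019-04-01", "2019-07-01", "2019-10-01")),
  IPC     = c(100, 110, 99)
)
# Continuously compounded return: r_t = log(P_t) - log(P_{t-1})
market$IPCrets <- c(NA, diff(log(market$IPC)))

# Toy panel: two hypothetical firms observed in the same three quarters
panel <- data.frame(
  firmcode = rep(c("A", "B"), each = 3),
  quarter  = rep(market$quarter, times = 2),
  price    = c(10, 11, 12, 50, 48, 47)
)

# Many-to-1 merge: each market row is matched to every firm row of that quarter
merged <- merge(panel, market, by = "quarter")
merged <- merged[order(merged$firmcode, merged$quarter), ]
merged
```

Both firms end up with identical IPCrets values in each quarter, which is what the merge of QMXX.df with the panel dataset does at full scale.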

2.2 Data calculations and data cleaning

Calculate and winsorize the book-to-market ratio. To calculate the book-to-market ratio, use the original (historical) stock price instead of the adjusted stock price, since the adjusted price is not the actual historical stock price.

Install the statar package before working in this section. This package has a better winsorize function. You only specify the minimum and maximum percentile you want to use for the winsorization:

# Keep only active firms
paneldata <- paneldata[paneldata$status == "active", ]

# Calculate the book value of equity, that is total assets minus total liabilities:
paneldata$bookvalue <- paneldata$totalassets - paneldata$totalliabilities
# Calculate the market value that is the # of shares times the original stock price:
paneldata$marketvalue<-paneldata$originalhistoricalstockprice*paneldata$sharesoutstanding
# Calculate book-to-market ratio:
paneldata$bmr <- paneldata$bookvalue / paneldata$marketvalue

# Calculate the bmr Winsorized:
# We will use the winsorize function from the statar package.
# This function can work with panel data (the previous function from the robustHD
#   package cannot)
library(statar)
# We only specify the 2 percentiles for the winsorization in both sides of the 
#   distribution: 
paneldata$bmr_w <- winsorize(paneldata$bmr, probs = c(0.02,0.98))
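
To see what a two-sided winsorization does, here is a minimal base-R sketch. The helper winsorize_sketch is hypothetical and is not the statar implementation (the exact percentile rule statar uses may differ); it only illustrates the idea of replacing values beyond the 2nd and 98th percentiles with those percentiles. The data are simulated.

```r
# Base-R sketch of two-sided winsorization (hypothetical helper, not statar's code)
winsorize_sketch <- function(x, probs = c(0.02, 0.98)) {
  # Percentile cutoffs computed on the non-missing values
  cutoffs <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Values below the lower cutoff are raised to it; values above the upper
  # cutoff are lowered to it; missing values stay missing
  pmin(pmax(x, cutoffs[1]), cutoffs[2])
}

set.seed(123)
# Simulated ratio with two extreme values and one missing value
ratio   <- c(rnorm(200, mean = 0.8, sd = 0.3), 25, -12, NA)
ratio_w <- winsorize_sketch(ratio)

range(ratio, na.rm = TRUE)    # dominated by the two extreme values
range(ratio_w, na.rm = TRUE)  # bounded by the 2nd and 98th percentiles
```

The number of observations does not change: extreme values are replaced, not dropped, which is why winsorizing is often preferred to trimming in panel data.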

Now we will learn how to make predictions for multiple regression models using the predict.lm function. Do the following:

1. Learn about earnings per share (EPS). Do your research on the Internet or books. EXPLAIN with your own words what is earnings per share and how it can be estimated.

EARNINGS PER SHARE IS EQUAL TO EARNINGS DIVIDED BY THE # OF SHARES. THE MEASURE FOR EARNINGS IS NET INCOME. HOWEVER, SOME ANALYSTS ALSO USE OTHER OPERATIONAL MEASURES FOR EARNINGS SUCH AS EARNINGS BEFORE INTEREST AND TAXES (EBIT). IF WE WANT TO MEASURE OPERATIONAL EARNINGS AND CALCULATE IT FOR MANY FIRMS, IT IS RECOMMENDED TO USE EBIT AS A MEASURE OF EARNINGS.

THE FORMULA OF EPS USING NET INCOME AS A MEASURE OF EARNINGS IS:

\[ EPS_{t}=\frac{NETINCOME_{t}}{\#OFSHARES_{t}} \]

THE FORMULA FOR EPS USING EBIT AS A MEASURE OF EARNINGS IS:

\[ EPS_{t}=\frac{EBIT_{t}}{\#OFSHARES_{t}} \]

IN A HYPOTHETICAL SCENARIO, IF ALL THE EARNINGS OF A PERIOD t WERE PAID TO THE INVESTORS, THEN EPS WOULD BE HOW MUCH OF THE PERIOD'S EARNINGS IS PAID FOR EACH SHARE OWNED BY INVESTORS.

If you divide earnings per share by stock price, what do you get? Explain what this new ratio might be and how you think it differs from earnings per share.

IF WE WANT TO CALCULATE EPS FOR MORE THAN ONE FIRM, IT IS STRONGLY RECOMMENDED TO CALCULATE EARNINGS PER SHARE DEFLATED BY STOCK PRICE TO MAKE THE MEASURE COMPARABLE AMONG FIRMS.

THE PROBLEM IS THAT EPS IS NOT A STANDARDIZED MEASURE SINCE EACH FIRM HAS DIFFERENT LEVELS OF STOCK PRICES, AND THE STOCK PRICE DOES NOT TELL US ANYTHING ABOUT THE VALUE OF THE FIRM. WE NEED TO MULTIPLY STOCK PRICE TIMES THE # OF SHARES OUTSTANDING TO CALCULATE THE MARKET VALUE OF THE FIRM.

IF WE DIVIDE EPS BY THE STOCK PRICE, WE GET THE AMOUNT OF EARNINGS GENERATED PER $1.00 OF MARKET VALUE.

FOR EXAMPLE, IMAGINE THAT THERE ARE 2 FIRMS THAT COMPETE IN THE INFORMATION TECHNOLOGY INDUSTRY. BOTH ARE ABOUT THE SAME SIZE IN TERMS OF MARKET VALUE, BUT WITH DIFFERENT STOCK PRICES:

FIRM 1: STOCK PRICE = $10.00, # OF SHARES= 100 MILLION; MARKET VALUE = $1,000 MILLION

FIRM 2: STOCK PRICE = $100.00, # OF SHARES= 10 MILLION; MARKET VALUE = $1,000 MILLION

LET’S CALCULATE EPS FOR BOTH FIRMS USING EBIT FOR THE LAST PERIOD. IMAGINE THAT BOTH FIRMS HAD EXACTLY THE SAME EARNINGS, WITH EBIT = $100 MILLION:

FIRM 1: EPS = EBIT / # OF SHARES = $100 MILLION / 100 MILLION = $1.00 PER SHARE

FIRM 2: EPS = EBIT / # OF SHARES = $100 MILLION / 10 MILLION = $10.00 PER SHARE

FIRM 2 HAS AN EPS THAT IS 10 TIMES THE EPS OF FIRM 1! THIS COMPARISON WILL NOT BE CORRECT SINCE BOTH FIRMS HAD THE SAME EARNINGS AND BOTH HAVE THE SAME MARKET VALUE!

LET’S NOW CALCULATE EARNINGS PER SHARE DEFLATED BY PRICE FOR EACH FIRM:

FIRM 1: EPSP = EPS / STOCK PRICE = $1.00 /$10.00 = $0.10

FIRM 2: EPSP = EPS / STOCK PRICE = $10.00 / $100.00 = $0.10

NOW IT MAKES SENSE THAT EPSP IS THE SAME FOR BOTH FIRMS!
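
The two-firm example can be reproduced with a few lines of R. The data frame below simply encodes the hypothetical numbers stated above; the column names are chosen for illustration.

```r
# The two hypothetical firms from the example above
firms <- data.frame(
  firm   = c("FIRM 1", "FIRM 2"),
  price  = c(10, 100),       # stock price in $
  shares = c(100e6, 10e6),   # number of shares outstanding
  ebit   = c(100e6, 100e6)   # EBIT in $
)

firms$marketvalue <- firms$price * firms$shares  # $1,000 million for both firms
firms$eps         <- firms$ebit / firms$shares   # differs by a factor of 10
firms$epsp        <- firms$eps / firms$price     # identical for both firms
firms
```

Note that dividing EPS by price is the same as dividing EBIT by market value, which is why the deflated measure is comparable across firms with different share counts.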

Generate a new variable called eps for earnings per share for all firms and all quarters using the panel dataset. Use the variable ebit as a measure for earnings and sharesoutstanding as number of shares.

# Calculate eps
paneldata$eps <- paneldata$ebit / paneldata$sharesoutstanding

Generate a new variable called epsp, defined as earnings per share divided by stock price.
Winsorize this epsp variable and name the winsorized variable epspw. Find the best way to do this winsorization (to the left and/or to the right).

#Calculate epsp
paneldata$epsp <- paneldata$eps / paneldata$originalhistoricalstockprice
# Winsorize epsp
paneldata$epsp_w <- winsorize(paneldata$epsp, probs = c(0.02,0.98))

## 0.88 % observations replaced at the bottom

## 0.88 % observations replaced at the top

Run the multiple regression model for all firms and for ONLY the last quarter (2019q4) to examine whether the Earnings per Share divided by price (winsorized) and book-to-market ratio (winsorized) have explanatory power for firm returns. INTERPRET your regression model. Make sure you interpret the partial betas as well as their level of significance.

# Calculate continuously compounded (cc) returns for all firms
paneldata$stockreturn <- diff(log(paneldata$adjustedstockprice))
lastq <- as.data.frame(paneldata[(paneldata$quarter=="2019-10-01"),])

# Construct the regression model: 
#   first write the dependent variable (y), then all explanatory variables
reg1 <- lm(stockreturn ~ epsp_w + bmr_w, data = lastq)
s_reg1 <- summary(reg1)
s_reg1

## 
## Call:
## lm(formula = stockreturn ~ epsp_w + bmr_w, data = lastq)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42027 -0.07916 -0.01485  0.07058  0.62443 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.05782    0.02709   2.134   0.0351 *
## epsp_w      -0.01946    0.14011  -0.139   0.8898  
## bmr_w       -0.03106    0.01915  -1.622   0.1077  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1584 on 110 degrees of freedom
##   (39 observations deleted due to missingness)
## Multiple R-squared:  0.02496,    Adjusted R-squared:  0.007232 
## F-statistic: 1.408 on 2 and 110 DF,  p-value: 0.249

INTERPRETATION:

USING DATA FOR THE LAST QUARTER OF 2019 FOR ACTIVE MEXICAN PUBLIC FIRMS, AFTER CONSIDERING THE EFFECT OF BOOK-TO-MARKET RATIO (BMR) ON STOCK RETURNS, THE EFFECT OF EARNINGS PER SHARE DEFLATED BY PRICE (EPSP) IS NEGATIVE, BUT IT IS NOT SIGNIFICANT (P-VALUE=0.89). IN OTHER WORDS, THERE IS NO STATISTICAL EVIDENCE TO SAY THAT THERE IS A RELATIONSHIP BETWEEN EPSP AND STOCK RETURNS.

AFTER CONSIDERING THE EFFECT OF EPSP, THE EFFECT OF BMR IS NEGATIVE, BUT ITS P-VALUE (0.108) IS SLIGHTLY ABOVE THE 10% LEVEL, SO IT IS AT BEST MARGINALLY SIGNIFICANT. IN OTHER WORDS, THERE IS ONLY WEAK STATISTICAL EVIDENCE OF A NEGATIVE RELATIONSHIP BETWEEN BMR AND STOCK RETURN. FOR EACH +1 UNIT CHANGE IN BMR, THE EXPECTED CHANGE IN QUARTERLY STOCK RETURN IS ABOUT -3.1 PERCENTAGE POINTS.

THE INTERPRETATION OF THE BETA0 COEFFICIENT IS THAT WHEN EPSP AND BMR ARE EQUAL TO ZERO, THE EXPECTED STOCK RETURN IS ABOUT 5.8%, AND IT IS SIGNIFICANTLY DIFFERENT FROM ZERO (P-VALUE=0.035). IN THIS CASE, THIS INTERPRETATION DOES NOT MAKE MUCH SENSE SINCE THE ONLY WAY THAT BMR=0 IS THAT THE BOOK VALUE IS EQUAL TO ZERO (THIS TYPICALLY HAPPENS ONLY WHEN A FIRM IS CLOSE TO BANKRUPTCY). IN MULTIPLE REGRESSION WE STILL NEED BETA0 FOR PREDICTION, BUT IN MANY CASES THERE IS NO NEED TO PROVIDE A MEANINGFUL INTERPRETATION OF IT.

Predict the stock return for a company with bmrw=0.8 and epspw=0.05. What is the expected stock return? Do this manually.

THE VALUES OF THE REGRESSION COEFFICIENTS ARE STORED IN THE s_reg1 R OBJECT, IN THE coefficients ELEMENT. WE CAN ACCESS THESE COEFFICIENTS AS FOLLOWS:

# THE BETA COEFFICIENTS AND THEIR STANDARD ERRORS, T-VALUES AND P-VALUES
#   ARE STORED IN THE MATRIX s_reg1$coefficients:

s_reg1$coefficients

##                Estimate Std. Error    t value   Pr(>|t|)
## (Intercept)  0.05781582 0.02709340  2.1339446 0.03506885
## epsp_w      -0.01945593 0.14011492 -0.1388569 0.88981710
## bmr_w       -0.03105855 0.01914788 -1.6220356 0.10765855

# WE CAN ACCESS THE BETA COEFFICIENTS AS FOLLOWS.
# BETA0 IS LOCATED IN THE ROW #1 AND COLUMN #1 [1,1] OF THE MATRIX 
#   coefficients:
BETA0 = s_reg1$coefficients[1,1]

# BETA1 IS LOCATED IN THE ROW #2 AND COLUMN #1 [2,1] OF THE MATRIX coefficients:
BETA1 = s_reg1$coefficients[2,1]

# BETA2 IS LOCATED IN THE ROW #3 AND COLUMN #1 [3,1] OF THE MATRIX coefficients:
BETA2 = s_reg1$coefficients[3,1]

NOW THAT WE HAVE THE VALUES OF THE BETA COEFFICIENTS, I CAN MANUALLY ESTIMATE THE EXPECTED VALUE OF STOCK RETURN WHEN EPSP_W = 0.05 AND BMR_W = 0.80:

EXPECTED STOCK RETURN = BETA0 + BETA1 * EPSP_W + BETA2 * BMR_W

SUBSTITUTING EPSP_W=0.05 AND BMR_W=0.80:

EXPECTED STOCK RETURN = BETA0 + BETA1 * 0.05 + BETA2 * 0.80

WE CAN DO THIS IN R AS FOLLOWS:

stockreturn_prediction = BETA0 + BETA1*0.05 + BETA2 * 0.80
# The prediction for stock return when epsp_w=0.05 and bmr_w=0.80 is:
stockreturn_prediction

## [1] 0.03199618
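
As a check on the arithmetic, the sketch below redoes the calculation with the coefficient values copied from the regression output above, and shows how much each term contributes to the prediction. It does not need the fitted model object.

```r
# Coefficients copied from the s_reg1$coefficients output above
b0 <- 0.05781582    # intercept
b1 <- -0.01945593   # epsp_w
b2 <- -0.03105855   # bmr_w

# Contribution of each term to the predicted return
contrib <- c(intercept = b0, epsp_w = b1 * 0.05, bmr_w = b2 * 0.80)
contrib

# The prediction is the sum of the three contributions
pred <- sum(contrib)
pred
```

Most of the gap between the intercept and the prediction comes from the BMR term; the EPSP term moves the prediction by less than 0.1 percentage points.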

Now predict the result using the predict.lm() function, which will output the result of the prediction. If you set the interval argument to “confidence” you will get the 95% confidence interval. Did you get the same prediction as the manual one? Interpret the 95% confidence interval for the prediction.

I NEED TO CREATE A DATA FRAME WITH THE 2 VALUES FOR THE INDEPENDENT VARIABLES. I CAN DO THIS AS FOLLOWS:

new_x <- data.frame(epsp_w=c(0.05), bmr_w=c(0.8))
# new_x will have 0.05 for epsp_w and 0.80 for bmr_w:
new_x

##   epsp_w bmr_w
## 1   0.05   0.8

NOW I CAN RUN THE predict.lm FUNCTION WITH THE new_x DATA FRAME TO GET THE PREDICTION FOR STOCK RETURN:

stockreturn_prediction = predict.lm(reg1, newdata = new_x)
stockreturn_prediction

##          1 
## 0.03199618

I GOT THE SAME PREDICTION AS THE PREVIOUS MANUAL PREDICTION.

I CAN ALSO CALCULATE THE PREDICTION AND THE 95% CONFIDENCE INTERVAL OF THE PREDICTION AS FOLLOWS:

pr_reg1 <- predict.lm(reg1, new_x, interval = "confidence")
pr_reg1

##          fit          lwr        upr
## 1 0.03199618 -0.002305707 0.06629807

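IN THE pr_reg1 OBJECT I HAVE THE PREDICTION (THE fit VALUE) AND ALSO THE LOWER (lwr) AND THE UPPER (upr) LIMITS OF THE 95% CONFIDENCE INTERVAL OF THE STOCK RETURN PREDICTION. SINCE THE INTERVAL GOES FROM ABOUT -0.2% TO 6.6% AND INCLUDES ZERO, I CANNOT SAY THAT THE EXPECTED RETURN FOR SUCH A FIRM IS SIGNIFICANTLY DIFFERENT FROM ZERO.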

I CAN JOIN THE SPECIFIC VALUES OF THE INDEPENDENT VARIABLES WITH THE PREDICTION AND THE 95% C.I. OF THE PREDICTION:

# Join both objects in order to have a better perception
pr_reg1.df <- cbind(new_x, pr_reg1)
pr_reg1.df

##   epsp_w bmr_w        fit          lwr        upr
## 1   0.05   0.8 0.03199618 -0.002305707 0.06629807
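
To make the meaning of lwr and upr more concrete, the following sketch rebuilds a 95% confidence interval by hand on simulated data (the variables x1, x2 and y are made up and are not the workshop data). It assumes the usual construction: fitted value plus or minus a t critical value times the standard error of the fitted value, which predict() returns when se.fit = TRUE.

```r
# Simulated data, only to illustrate the mechanics
set.seed(1)
n  <- 120
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 0.05 - 0.02 * x1 - 0.03 * x2 + rnorm(n, sd = 0.15)
fit <- lm(y ~ x1 + x2)

new_obs <- data.frame(x1 = 0.05, x2 = 0.8)

# Fitted value, its standard error and the residual degrees of freedom
p     <- predict(fit, newdata = new_obs, se.fit = TRUE)
tcrit <- qt(0.975, df = p$df)
manual_ci <- c(fit = unname(p$fit),
               lwr = unname(p$fit - tcrit * p$se.fit),
               upr = unname(p$fit + tcrit * p$se.fit))

# Built-in intervals: "confidence" is for the expected value of y,
#   "prediction" is for a single new observation and is wider
builtin_ci <- predict(fit, newdata = new_obs, interval = "confidence")
builtin_pi <- predict(fit, newdata = new_obs, interval = "prediction")

manual_ci
builtin_ci
builtin_pi
```

The manual and built-in confidence intervals coincide. The interval reported in this workshop is of the "confidence" type, so it describes the uncertainty about the average return of firms with those characteristics, not the range of returns an individual firm might have.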

Run a multiple regression model for all firms and all quarters to examine whether the market return and the BMR help explain firm returns. Estimate the following predictions:

Predict the firm return for a company with a BMR=0.8, holding the market return equal to its historical mean. Interpret the prediction and the 95% Confidence Interval.

# Construct the model using the whole paneldata object
reg2 <- lm(stockreturn ~ IPCrets + bmr_w, data = paneldata)
s_reg2 <- summary(reg2)
s_reg2

## 
## Call:
## lm(formula = stockreturn ~ IPCrets + bmr_w, data = paneldata)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.26380 -0.07630 -0.00448  0.07449  1.56551 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.032240   0.002972   10.85   <2e-16 ***
## IPCrets      0.830298   0.023419   35.45   <2e-16 ***
## bmr_w       -0.032087   0.002315  -13.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1675 on 7096 degrees of freedom
##   (5061 observations deleted due to missingness)
## Multiple R-squared:  0.1712, Adjusted R-squared:  0.171 
## F-statistic: 733.1 on 2 and 7096 DF,  p-value: < 2.2e-16

avg_market_return = mean(paneldata$IPCrets,na.rm=TRUE)
#The avg market return is:
avg_market_return

## [1] 0.02230853

new_x2 <- data.frame(bmr_w=0.8, IPCrets = avg_market_return)

# I calculate the prediction including its 95% C.I.:

pr_reg2 <- predict.lm(reg2, new_x2, interval = "confidence")
pr_reg2

##          fit       lwr        upr
## 1 0.02509361 0.0211449 0.02904231

pr_reg2.df <- cbind(new_x2, pr_reg2)
pr_reg2.df

##   bmr_w    IPCrets        fit       lwr        upr
## 1   0.8 0.02230853 0.02509361 0.0211449 0.02904231

Predict the firm returns for a company when the quarterly market return is equal to -2%, -1%, 0%, +1%, and +2%.

Check the 95% CI of the predictions. Provide a brief INTERPRETATION of the output and the graph.

I START CREATING A VECTOR WITH VALUES FOR THE DIFFERENT MARKET RETURNS:

IPCrets = seq(from=-0.02, to=0.02, by=0.01)
IPCrets

## [1] -0.02 -0.01  0.00  0.01  0.02

THE seq FUNCTION CREATES A SEQUENCE OF NUMBERS. IN THIS CASE, FROM -0.02 TO +0.02 JUMPING BY 0.01.

I CALCULATE THE MEDIAN OF BMR_W:

median_bmrw = median(paneldata$bmr_w,na.rm = TRUE)
median_bmrw

## [1] 0.6896754

I CALCULATED THE MEDIAN SINCE THE DISTRIBUTION OF BMRW IS NOT QUITE NORMAL. REMEMBER THAT FOR A VARIABLE THAT DOES NOT FOLLOW A NORMAL DISTRIBUTION (FOR EXAMPLE, A SKEWED ONE), THE MEDIAN (THE 50TH PERCENTILE) IS USUALLY A BETTER CENTRAL TENDENCY MEASURE THAN THE MEAN.

NOW I CREATE A DATA FRAME WITH A COLUMN WITH THE DIFFERENT VALUES OF MARKET RETURN (FROM -2% TO +2%), AND WITH A CONSTANT VALUE FOR BMRW EQUAL TO ITS MEDIAN:

new_x2b <- data.frame(IPCrets, bmr_w=median_bmrw)
new_x2b

##   IPCrets     bmr_w
## 1   -0.02 0.6896754
## 2   -0.01 0.6896754
## 3    0.00 0.6896754
## 4    0.01 0.6896754
## 5    0.02 0.6896754

USING THIS DATA FRAME WITH VALUES FOR IPCrets AND BMRW, NOW I ESTIMATE THE STOCK RETURN PREDICTIONS AND THEIR CORRESPONDING 95% C.I. ACCORDING TO THE REGRESSION MODEL:

# I get the predictions according to the regression model stored in reg2:
pr_reg2b <- predict.lm(reg2, new_x2b, interval = "confidence")
# I display the predictions for each combination of market return and bmrw: 
pr_reg2b

##            fit         lwr          upr
## 1 -0.006495126 -0.01091005 -0.002080198
## 2  0.001807850 -0.00244553  0.006061231
## 3  0.010110826  0.00597411  0.014247542
## 4  0.018413802  0.01434500  0.022482600
## 5  0.026716779  0.02266470  0.030768856

# I put together the values for the independent variables and the predictions:
pr_reg2b.df <- cbind(new_x2b, pr_reg2b) 

# I change the name of the columns accordingly:
colnames(pr_reg2b.df) <- c("IPCrets", "bmr_w", "Stockreturn", "lwr", "upr")
pr_reg2b.df

##   IPCrets     bmr_w  Stockreturn         lwr          upr
## 1   -0.02 0.6896754 -0.006495126 -0.01091005 -0.002080198
## 2   -0.01 0.6896754  0.001807850 -0.00244553  0.006061231
## 3    0.00 0.6896754  0.010110826  0.00597411  0.014247542
## 4    0.01 0.6896754  0.018413802  0.01434500  0.022482600
## 5    0.02 0.6896754  0.026716779  0.02266470  0.030768856

I FINALLY PLOT THE PREDICTIONS OF STOCK RETURNS AND THEIR 95% C.I. FOR THE DIFFERENT VALUES OF MARKET RETURN FROM -2% TO +2%:

library(ggplot2)
ggplot(pr_reg2b.df, aes(x = IPCrets, y=Stockreturn))+
  geom_point(size = 2) + geom_line() +
  geom_errorbar(aes(ymax = upr, ymin=lwr))
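
One way to read the table and the plot is to check the slope of the predictions and to see which confidence intervals exclude zero. The sketch below uses the numbers copied from the pr_reg2b output above, so it runs without the fitted model.

```r
# Values copied from the pr_reg2b output above
IPCrets <- seq(from = -0.02, to = 0.02, by = 0.01)
fit <- c(-0.006495126, 0.001807850, 0.010110826, 0.018413802, 0.026716779)
lwr <- c(-0.01091005, -0.00244553, 0.00597411, 0.01434500, 0.02266470)
upr <- c(-0.002080198, 0.006061231, 0.014247542, 0.022482600, 0.030768856)

# Change in predicted stock return per unit of market return;
#   this should match the IPCrets coefficient of reg2 (about 0.83)
slope <- diff(fit) / diff(IPCrets)
slope

# Which predictions have a 95% C.I. that does not contain zero?
excludes_zero <- (lwr > 0) | (upr < 0)
data.frame(IPCrets, fit, excludes_zero)
```

Holding bmr_w at its median, each additional percentage point of market return adds about 0.83 percentage points to the expected stock return. Only the prediction for a market return of -1% has an interval that contains zero; for -2% the expected stock return is significantly negative, and for 0% or more it is significantly positive.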

I CAN DO THE SAME USING THE prediction LIBRARY:

library(prediction)
pr_reg3.df = prediction(reg2, at = list(IPCrets=seq(from=-0.02, to=0.02, by=0.01), bmr_w=median_bmrw))
pr_reg3.df

##  IPCrets  bmr_w         x
##    -0.02 0.6897 -0.006495
##    -0.01 0.6897  0.001808
##     0.00 0.6897  0.010111
##     0.01 0.6897  0.018414
##     0.02 0.6897  0.026717

I GOT THE SAME PREDICTION WITH THE prediction FUNCTION.

3 Multiple regression models with future values of DV

In the previous part we examined whether the market return and the BMR influence the stock return. We ran the models using contemporaneous values of the variables. What does this mean? We examined whether the BMR and the market return of a period influence the stock return in the SAME period. In multiple regression models in R, we can also examine non-contemporaneous (lagged or future) effects of the independent variables. For example, we can examine whether the BMR and the market return influence future stock returns (1 quarter or 4 quarters later). In Finance it is important to examine lagged effects of independent variables on the dependent variable. An important reason is that public firms usually release their financial statements 1 or 2 months after the end of the accounting period. For example, the financial statements of Q4 of 2020 can be released as late as February or March 2021.

Let’s work with an example.

How can we use lagged variables in R? We can use the lag function together with the plm function. The name plm stands for panel-data linear model.

lag(variable, #) refers to the LAG number # of the variable. Lag # 1 refers to the value of the variable 1 period ago.

Examples:

lag(bmrw,1) refers to the previous value of book-to-market ratio in the dataset. If the dataset has quarters, then it is the bmr value 1 quarter ago.

lag(bmr,4) refers to the value of book-to-market ratio ONE year ago if the dataset has quarterly data.

If you want to go forward (ahead) instead, you can still use the lag function, but you need to specify a negative #. For example:

lag(bmr,-4) refers to the future value of book-to-market ratio ONE year in the future if the dataset has quarterly data.
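
The idea of lags and leads within each firm can be sketched in base R with a toy panel. The helper shift_within below is hypothetical and is not how plm implements lag(); it only mimics the convention described above (a positive number looks back, a negative number looks ahead) and assumes the rows are sorted by firm and quarter with no gaps.

```r
# Toy panel: two hypothetical firms, four consecutive quarters each
toy <- data.frame(
  firm    = rep(c("A", "B"), each = 4),
  quarter = rep(1:4, times = 2),
  bmr     = c(1.0, 1.1, 1.2, 1.3, 2.0, 2.1, 2.2, 2.3)
)

# Hypothetical helper: shift a vector k periods (k > 0 lag, k < 0 lead)
shift_within <- function(x, k) {
  n <- length(x)
  if (k >= 0) {
    c(rep(NA, min(k, n)), head(x, max(n - k, 0)))
  } else {
    c(tail(x, max(n + k, 0)), rep(NA, min(-k, n)))
  }
}

# Apply the shift separately for each firm so values never cross firms
toy$bmr_lag1  <- ave(toy$bmr, toy$firm, FUN = function(x) shift_within(x, 1))
toy$bmr_lead1 <- ave(toy$bmr, toy$firm, FUN = function(x) shift_within(x, -1))
toy
```

The first quarter of each firm has no lagged value and the last quarter has no lead, so those cells are NA. This is also why regressions with lagged or future values use fewer observations than contemporaneous ones.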

Using the same panel dataset we used in the previous part, do the following models:

Model 1. Design two regression models to examine whether the BMRw (winsorized) and the market return influence a) the future stock return one quarter later, and b) the future stock return one year later. Consider ONLY active firms. Run the model and briefly INTERPRET your results.

We will use the plm function from the plm package.

# Load the package
library(plm)
# Use the lag() function with -1 to move 1 period forward (a lead)
# A negative number looks ahead in time instead of back
model1 <- plm(lag(stockreturn, -1) ~ IPCrets + bmr_w, data=paneldata, model="pooling")
s_model1 <- summary(model1)
s_model1

## Pooling Model
## 
## Call:
## plm(formula = lag(stockreturn, -1) ~ IPCrets + bmr_w, data = paneldata, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 146, T = 1-78, N = 7039
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3348414 -0.0774005 -0.0027878  0.0818477  1.4644899 
## 
## Coefficients:
##              Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept) 0.0017384  0.0032410  0.5364    0.5917    
## IPCrets     0.2748376  0.0253327 10.8491 < 2.2e-16 ***
## bmr_w       0.0114935  0.0025282  4.5461 5.557e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    237.61
## Residual Sum of Squares: 233.09
## R-Squared:      0.019058
## Adj. R-Squared: 0.018779
## F-statistic: 68.3469 on 2 and 7036 DF, p-value: < 2.22e-16

This model can also be constructed as:

model1a<-plm(stockreturn ~ lag(IPCrets) + lag(bmr_w), data = paneldata, model="pooling")
summary(model1a)

## Pooling Model
## 
## Call:
## plm(formula = stockreturn ~ lag(IPCrets) + lag(bmr_w), data = paneldata, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 146, T = 1-78, N = 7039
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3348414 -0.0774005 -0.0027878  0.0818477  1.4644899 
## 
## Coefficients:
##               Estimate Std. Error t-value  Pr(>|t|)    
## (Intercept)  0.0017384  0.0032410  0.5364    0.5917    
## lag(IPCrets) 0.2748376  0.0253327 10.8491 < 2.2e-16 ***
## lag(bmr_w)   0.0114935  0.0025282  4.5461 5.557e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    237.61
## Residual Sum of Squares: 233.09
## R-Squared:      0.019058
## Adj. R-Squared: 0.018779
## F-statistic: 68.3469 on 2 and 7036 DF, p-value: < 2.22e-16

INTERPRETATION:

CONSIDERING QUARTERLY DATA OF MEXICAN FIRMS FROM 2000 TO 2019, THE EFFECT OF THE MARKET RETURN ON FUTURE STOCK RETURN 1 QUARTER LATER IS POSITIVE AND SIGNIFICANT, AFTER CONSIDERING THE EFFECT OF BOOK-TO-MARKET RATIO. FOR EACH +1 PERCENTAGE POINT MOVEMENT IN THE MARKET RETURN IN ONE QUARTER, IT IS EXPECTED THAT THE STOCK RETURN ONE QUARTER LATER WILL MOVE BY ABOUT +0.27 PERCENTAGE POINTS.

AFTER CONSIDERING THE EFFECT OF THE MARKET RETURN ON FUTURE STOCK RETURN ONE QUARTER LATER, THE EFFECT OF BOOK-TO-MARKET RATIO ON FUTURE STOCK RETURN IS POSITIVE AND SIGNIFICANT. FOR EACH +1 UNIT MOVEMENT OF BMRW IN A QUARTER, IT IS EXPECTED THAT THE STOCK RETURN ONE QUARTER LATER WILL MOVE BY ABOUT +1.15 PERCENTAGE POINTS.

Model 2. Run the following model:

Run a multiple regression model to examine whether BMRw and EPSPw influence future stock returns one year later. INTERPRET the output of the model.

model2 <- plm(lag(stockreturn, -4) ~ bmr_w + epsp_w, data = paneldata, model="pooling")
s_model2 <- summary(model2)
s_model2

## Pooling Model
## 
## Call:
## plm(formula = lag(stockreturn, -4) ~ bmr_w + epsp_w, data = paneldata, 
##     model = "pooling")
## 
## Unbalanced Panel: n = 126, T = 1-76, N = 4756
## 
## Residuals:
##       Min.    1st Qu.     Median    3rd Qu.       Max. 
## -1.3128451 -0.0740470 -0.0028452  0.0789605  1.4889600 
## 
## Coefficients:
##               Estimate Std. Error t-value Pr(>|t|)  
## (Intercept)  0.0019973  0.0041118  0.4857  0.62717  
## bmr_w        0.0089931  0.0035967  2.5004  0.01244 *
## epsp_w      -0.0066547  0.0343372 -0.1938  0.84634  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Total Sum of Squares:    157.48
## Residual Sum of Squares: 157.26
## R-Squared:      0.0013965
## Adj. R-Squared: 0.00097633
## F-statistic: 3.3235 on 2 and 4753 DF, p-value: 0.03611

INTERPRETATION:

CONSIDERING THE EFFECT OF BMRW ON FUTURE STOCK RETURN ONE YEAR LATER, THE EFFECT OF EPSPW ON FUTURE STOCK RETURN ONE YEAR LATER IS NEGATIVE, BUT IT IS NOT SIGNIFICANT. IN OTHER WORDS, IT SEEMS THAT EARNINGS PER SHARE (DEFLATED BY PRICE) OF THE CURRENT QUARTER IS NOT SIGNIFICANTLY RELATED TO STOCK RETURNS 1 YEAR LATER.

CONSIDERING THE EFFECT OF EPSPW ON FUTURE STOCK RETURN ONE YEAR LATER, THE EFFECT OF BMRW ON FUTURE STOCK RETURNS ONE YEAR LATER IS POSITIVE AND SIGNIFICANT. FOR EACH +1 UNIT MOVEMENT IN BMRW, IT IS EXPECTED THAT THE STOCK RETURN ONE YEAR LATER WILL MOVE BY ABOUT +0.90 PERCENTAGE POINTS. THIS SOUNDS WEIRD SINCE HIGH VALUES OF BMRW ARE SUPPOSED NOT TO BE A GOOD SIGN OF THE MARKET PERFORMANCE OF A STOCK, BUT IT MIGHT BE THAT FIRMS WITH HIGH VALUES OF BMRW IN ONE QUARTER IMPLEMENT ACTIONS SO THAT, WITHIN 1 YEAR, THEY SIGNIFICANTLY IMPROVE THEIR PERFORMANCE. THIS IS JUST A GUESS AT A POSSIBLE EXPLANATION OF THIS SIGNIFICANT RELATIONSHIP.

Using predict.lm() or prediction() functions, predict the stock return of firms when BMRw moves from 0.6 to 1.6, moving by 0.10. Keep EPSPw constant (with no change). Run a plot of these predictions. Interpret the graph.

library(prediction)

newx_model2 <- data.frame(bmr_w = seq(from=0.6, to=1.6, by=0.1), epsp_w=mean(paneldata$epsp_w, na.rm=TRUE))
#pr1_model2_1 <- predict.lm(model2, newx_model2, interval = "confidence")

pr1_model2 <- prediction_summary(model=model2, at=newx_model2,level=0.95)
colnames(pr1_model2) <- c("bmr_w","epsp_w", "Predicted_return")
var_b0 <- s_model2$coefficients[1,2]^2
var_b1 <- s_model2$coefficients[2,2]^2
var_b2 <- s_model2$coefficients[3,2]^2
cov_coeff <- cov(matrix(c(s_model2$coefficients[1,1], s_model2$coefficients[2,1],
s_model2$coefficients[3,1])))

pr1_model2$SE <- sqrt(var_b0 + pr1_model2$bmr_w^2*var_b1 +
pr1_model2$epsp_w^2*var_b2 + 2*cov_coeff)

## Warning in var_b0 + pr1_model2$bmr_w^2 * var_b1 + pr1_model2$epsp_w^2 * : Recycling array of length 1 in vector-array arithmetic is deprecated.
##   Use c() or as.vector() instead.

pr1_model2$lwr <- pr1_model2$Predicted_return - 2*pr1_model2$SE
pr1_model2$upr <- pr1_model2$Predicted_return + 2*pr1_model2$SE

pr1_model2

##  bmr_w  epsp_w Predicted_return NA NA NA NA NA      SE      lwr     upr
##    0.6 0.06349         0.006971 NA NA NA NA NA 0.01221 -0.01746 0.03140
##    0.7 0.06349         0.007870 NA NA NA NA NA 0.01228 -0.01670 0.03244
##    0.8 0.06349         0.008769 NA NA NA NA NA 0.01236 -0.01596 0.03349
##    0.9 0.06349         0.009669 NA NA NA NA NA 0.01245 -0.01523 0.03457
##    1.0 0.06349         0.010568 NA NA NA NA NA 0.01255 -0.01453 0.03567
##    1.1 0.06349         0.011467 NA NA NA NA NA 0.01266 -0.01385 0.03678
##    1.2 0.06349         0.012366 NA NA NA NA NA 0.01277 -0.01318 0.03791
##    1.3 0.06349         0.013266 NA NA NA NA NA 0.01290 -0.01253 0.03907
##    1.4 0.06349         0.014165 NA NA NA NA NA 0.01303 -0.01190 0.04023
##    1.5 0.06349         0.015064 NA NA NA NA NA 0.01318 -0.01129 0.04142
##    1.6 0.06349         0.015964 NA NA NA NA NA 0.01333 -0.01069 0.04262

ggplot(pr1_model2, aes(x = bmr_w, y=Predicted_return))+
  geom_point(size = 2) + geom_line() +
  geom_errorbar(aes(ymax = upr, ymin=lwr))
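
A note on the standard errors: the standard error of a fitted value is usually computed as the square root of x'Vx, where x is the row of regressor values (including the 1 for the intercept) and V is the full covariance matrix of the coefficient estimates. The cov() call in the chunk above returns the sample variance of the three point estimates, which is a different quantity, so the SE, lwr and upr columns shown there should be read with caution. The sketch below illustrates the x'Vx calculation on simulated data with lm() (variables x1, x2 and y are made up); it assumes that vcov() returns V for the fitted object, which is the case for lm and is expected, but not verified here, for plm models such as model2.

```r
# Simulated data, only to illustrate the calculation
set.seed(42)
n  <- 200
x1 <- runif(n, min = 0.2, max = 2)
x2 <- rnorm(n, mean = 0.06, sd = 0.05)
y  <- 0.002 + 0.009 * x1 - 0.007 * x2 + rnorm(n, sd = 0.15)
fit <- lm(y ~ x1 + x2)

# Grid of values for x1, holding x2 at its mean
new_x <- data.frame(x1 = seq(from = 0.6, to = 1.6, by = 0.1), x2 = mean(x2))

# Design rows (intercept, x1, x2) and full coefficient covariance matrix
X <- cbind(1, new_x$x1, new_x$x2)
V <- vcov(fit)

# Standard error of each fitted value: sqrt of the diagonal of X V X'
se_manual <- sqrt(rowSums((X %*% V) * X))

# Same quantity as reported by predict() for lm objects
se_builtin <- predict(fit, newdata = new_x, se.fit = TRUE)$se.fit

cbind(new_x, se_manual, se_builtin)
```

Unlike a formula that uses only the three coefficient variances, this version includes the covariances between the estimates, which typically make the standard error smallest near the sample means of the regressors and larger as the prediction point moves away from them.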

Workshop 7 Solution, Financial Econometrics I

Alberto Dorantes, Ph.D.

Oct 6, 2021
