Abstract
In this workshop we will learn useful data management concepts and techniques, and also we will start with the introduction to Multiple Regression Models applied to Financial analysis.You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.
At the beginning of the R Notebook write Workshop 6 - Financial Econometrics I and your name (as we did in previous workshop).
You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.
It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.
Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.
# To clear our environment we use the remove function rm:
rm(list=ls())
# To avoid scientific notation for numbers:
options(scipen=999)
Before we continue to learn more advanced econometric models, it is very important to emphasize the importance of data management in Finance. If we do not have a cleaned and well structured dataset, then our econometric models might not provide the accurate insights we are looking for. Real data sets in Finance are usually not quite cleaned and structured. No matter the quality of econometric models, if the input data is not cleaned or reliable, then the output will not be reliable either. Remember the popular say: garbage in, garbage out.
We will start learning the traditional dataset structures, and then the basics of data manipulation and data cleaning. At the end we will start learning about multiple regression models.
In time-series econometrics there are basically the following dataset structures or types:
## p_stock1 r_stock1 p_stock2 r_stock2
## 2014m1 10 0.02 20 0.01
## 2014m2 11 0.10 21 0.05
## Ticker ROA ATO ProfitMargin
## 1 ALFAA.MX 0.023 0.9 0.15
## 2 AMXL.MX 0.015 1.1 0.10
## Ticker quarter ROA ATO ProfitMargin
## 1 ALFAA.MX 2014q1 0.2 1.1 0.17
## 2 ALFAA.MX 2014q2 0.21 1.2 0.16
## 3 ... ... ... ... ...
## 4 BIMBOA.MX 2014q1 0.15 0.8 0.10
## 5 BIMBOA.MX 2014q2 0.20 0.9 0.05
## 6 ... ... ... ... ...
## 7 BIMBOA.MX 2017q1 0.15 1.1 0.09
In Econometrics, data management skills are very important. Most of the time, before designing and running an econometric model, it is needed to do simple and sophisticated data management. It is very common to merge data sets before we start our econometric analysis.
In this example we will learn how to collapse a dataset and merge one dataset with another considering the data structures we learned in the previous section.
In this case, you have to create a dataset with quarterly information for the IPyC market index using monthly data. Then, you have to merge this dataset with a panel-dataset of financial information for many Mexican stocks. In more detail, you have to do the following:
Download the dataset: http://www.apradie.com/datos/datamx2020q4.xlsx using the readxl package (you have to install this package). This dataset has real quarterly financial data of Mexican rms (from the BMV) for many years. This information comes from Economatica, a leading software and database economic and financial information. This dataset is a panel dataset, which is a combination of cross sectional and time-series dataset. Navigate through the data (using View() function) to learn more about this dataset. Save this dataset with a name.
# Load the package
library(readxl)
# Download the excel file from a web site:
download.file("http://www.apradie.com/datos/datamx2020q4.xlsx",
"dataw6.xlsx", mode="wb")
# The first parameter is the link and the second is a name for the
# local file
# Use the function read_excel()
<- read_excel("dataw6.xlsx") dataset
We need to attach the market return to my dataset. Then, I need to:
Download the monthly market index from Yahoo Finance (getsymbols)
Transform (Collapse) the dataset from monthly to quarterly
Now download the ipyc:
library(quantmod)
getSymbols("^MXX", from="2000-01-01", to= "2019-12-31",
periodicity="monthly", src="yahoo")
## [1] "^MXX"
We need to aggregate (collapse) the dataset from monthly to quarterly, keeping the LAST market index per quarter. We can use the to.quarterly function from quantmod:
<- to.quarterly(MXX,indexAt='startof') QMXX
This function creates an OHLC xts dataset for each quarter. We can have a look of the content:
head(QMXX)
## MXX.Open MXX.High MXX.Low MXX.Close MXX.Volume MXX.Adjusted
## 2000-01-01 7185.71 8417.33 6510.84 7473.25 0 7473.25
## 2000-04-01 7459.75 7639.38 5516.77 6948.33 0 6948.33
## 2000-07-01 6953.73 7548.57 6171.95 6334.56 0 6334.56
## 2000-10-01 6334.64 6502.60 5148.02 5652.19 0 5652.19
## 2001-01-01 5651.35 6615.83 5512.77 5727.89 0 5727.89
## 2001-04-01 5733.67 6895.70 5541.68 6666.17 4888424600 6666.17
We see that the Open, High, Low, Close and Adjusted prices for each quarter were calculated. We only need the Adjusted price to calculate the market returns, so we select only the Close column:
= QMXX$MXX.Adjusted
QMXX # Change the name of the column:
colnames(QMXX) <- "MXXindex"
Now we can calculate cc return of the market
$MXXreturns <- diff(log(QMXX)) QMXX
Now we are almost ready to merge this quarterly data to the panel dataset using merge.
We need to have both datasets with a common column. In this case, the common column is quarter. Both datasets must be of the same class, in this case, data frame class.
The QMXX has the quarter as index, but not as part of a column. We create a data frame with the quarter column equal to its index:
# Create a dataframe with a quarter column for the QMXX
<-data.frame(quarter=index(QMXX),coredata(QMXX))
QMXX.df# I extracted the quarter from the index
# coredata is a function that extract the data of an object
# without extracting formatting
The common column must be of the same type, so we make sure that the column quarter of dataset is a Date type:
$quarter<-as.Date(dataset$quarter) dataset
Now we do a MANY-TO-ONE merge of both datasets:
<-merge(dataset,QMXX.df,by="quarter")
dataset# Now I have a dataset for all firms-quarters, and the
# MXX index and MXX return is merged for each firm
Now we define the dataset as a panel data using the pdata.frame function. You need to install the package “plm”.
library(plm)
<- pdata.frame(dataset, index= c("firmcode","quarter")) datasetp
I need to tell R that the dataset is panel data with the structure firm-quarter.
Before running multiple regressions with this panel dataset I need some data cleanning.
When one or more independent variables of a regression model have very extreme values, the regression coefficients are usually not reliable. The extreme values of a variable are called outliers. The OLS method used for regression is very sensible for outliers. In other words, when you have outliers in an independent variable, your regression coefficients can be biased and not reliable.
In this section we continue using the same data set. You have to do the following:
Learn about the Book-to-market ratio (BMR). As comments, EXPLAIN what the BMR is.
Calculate the BMR for all firm-quarters of the dataset. Consider book value of equity as total assets minus total liabilities. Calculate market value of the fir according to the stock price and the number of shares outstanding:
$bookvalue= datasetp$totalassets - datasetp$totalliabilities
datasetp$marketvalue = datasetp$originalhistoricalstockprice*datasetp$sharesoutstanding datasetp
Check that I used original stock price, not adjusted stock price. In financial markets, the adjusted stock prices are calculated after considering dividend payments and stock splits. A stock split is when a firm decides to divide the value of its stock price by 2, 3 or other multiple with the only purpose to avoid the perception that the stock is expensive. For example, late August 2020 Apple and Tesla decided to do stock split. Apple did a split on a 4-for-1 basis. This means that if the stock price was about USD$400.00 on that day, then its price was reduced to USD$100.00, but they multiplied the number of shares (shares outstanding) by 4 to keep the same market value of the firm. In this historical dataset the shares outstanding is the historical, so we need to use the historical/original stock price without considering stock splits nor dividend payments.
We calculated book value and market value for ALL rms and ALL quarters! Now we can generate the book-to-market ratio for all firms all quarters:
$bmr= datasetp$bookvalue / datasetp$marketvalue datasetp
If bmr=1 means that the firm book value is about the same as firm market value. If that is the case, then the market value has not grown beyond book-value, meaning that the firm has not created value beyond its book value.
If bmr>1 means that the market value is less than book value. So, if bmr>1 means that the firm has significantly lost shareholder’s wealth, and it might incur in bankrupt risk.
Then, what would be the bmr level that all firms are looking for? One of the main purposes of the executives is to MAXIMIZE shareholder’s value. The way to increase shareholder’s value is to increase its market value, and the only way to increase market value is to increase stock price.
Then, the bmr level that all executives prefer is a value much less than 1.
If bmr=0.5 means that the firm market value is the double of its book value. In this case, the main difference between market value and book value is the wealth that the firm has created thanks to its valuable intangible assets such as prestige, high quality, and innovation.
Then, what do you think it might be the relationship between bmr and stock return? Intuitively, we might think that a small value of bmr is a good news, then the stock return might be related in a negative way. If bmr goes down (good news), then the stock return might go up. Then, it might be expected that the relationship between bmr and stock return is linear and negative. Some finance research (like Fama & French, 1995), mainly with US firms has found that the relationship between bmr and future stock return is negative.
However, there are mixed findings about the relationship between bmr and stock returns. Some research (like Piotrosky, 2000) has found that firms with high bmr, but strong financials (like earnings per share) usually provides significant positive stock returns. In this special case, bmr has found to be positively related to stock returns.
Finance research has found that bmr influences earnings per share, which in turn influences current and future stock returns. It is some times that firms with low bmr do not experience significant high stock returns due to the high expectations of investors. This is an interesting finding in finance that is still being researched!
In the following models we will see what happens for Mexican firms.
<-datasetp[datasetp$status=="active",] datasetpa
As you see, we can do sub-setting or selection of rows in a dataset indicating a condition as the row index, and putting nothing in the column index (after the coma)
We do this since the cancelled firms usually have non reliable information and very extreme values.
hist(datasetpa$bmr, main="Histogram of the bmr data",
xlab="bmr", col="dark blue")
boxplot(datasetpa$bmr)
<- boxplot.stats(datasetpa$bmr)$out outlier_values
library(robustHD)
<- winsorize(na.omit(datasetpa$bmr), standarized= "True", prob= 0.98, return= "data") winsbmr
What is winsorization?
Winsorizing is the process where the non-missing values of a variable (x) are ordered
x_1 <= ... <= x_n
and generate a new variable y identical to x except that the h highest and h lowest values
are replaced by the next value counting inwards from the extremes:
y_1, ... , y_h = y_(h + 1)
y_n, ... , y_(n - h + 1) = y_(n - h)
This process is applied to variables with extreme values.
For high values= upper percentile is calculated and that value is assigned to the top values located in a percentile less than the assigned.
For low values= a lower percentile is calculated and that value is assigned to the bottom values that are in a percentile lower than the assigned value.
In this case we winsorized bmr for high and low extreme values.
Now we look at a cleaner bmr data, we can see this graphically making a histogram of the new winsorized data:
hist(winsbmr, main="Histogram of the winsorized bmr data",
xlab="bmr", col="dark blue")
As expected, most of the BMR values are less than 1 for active firms in the BMR, indicating that market value is greater than book value for most of the active firms most of the time.
Once you winsorize BMR, you can safely use it in a regression model.
Using the same dataset we constructed in the previous sections, design the following regression models:
Model 1. A market model for all quarters for ALFAA. Remember the market model is a simple regression model where you use the market returns as the independent variable, and the rm return as the dependent variable. Use the if conditional with the regress command. For example:
#First, I get all of the cc stock returns
$stockreturn <- log(datasetpa$adjustedstockprice
datasetpa/ (lag(datasetpa$adjustedstockprice, 1)))
# Now I do a new object with ALFAA information to make the following
# regression models
<-as.data.frame(datasetpa[(datasetpa$firm=="ALFAA"),])
ALFAAdata# AALFA_ret <- AALFA_info$`Stock return`
#I set the stock returns as the dependent variable, and the market returns as independent:
<- lm(ALFAAdata$stockreturn ~ ALFAAdata$MXXreturns)
reg1 <- summary(reg1)
s_reg1 s_reg1
##
## Call:
## lm(formula = ALFAAdata$stockreturn ~ ALFAAdata$MXXreturns)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32800 -0.07931 -0.00343 0.10590 0.39669
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01143 0.01673 -0.683 0.496
## ALFAAdata$MXXreturns 1.62667 0.17725 9.177 0.0000000000000546 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1445 on 77 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.5224, Adjusted R-squared: 0.5162
## F-statistic: 84.22 on 1 and 77 DF, p-value: 0.0000000000000546
Calculate the confidence intervals:
# To do a rough estimate of the 95% confidence interval for B0:
<- s_reg1$coefficients[1,1]
B0 <- s_reg1$coefficients[1,2]
STDERRORB0 <- B0 - (2* STDERRORB0 )
minB0 <- B0 + (2* STDERRORB0 )
maxB0
cat("The B0 confidence interval goes from about ", minB0, "to", maxB0)
## The B0 confidence interval goes from about -0.04488987 to 0.02202375
# Instead of using the critical value of t=2, I can use the exact t value that
# gives me the 95% confidence intervale:
<- abs(qt(0.025,reg1$df.residual))
t_critical_value t_critical_value
## [1] 1.991254
# Note that the function qt gets the t critical value so that the area under the
# function t from -infinite to the t critical value is equal to 0.025.
# This t-critical value depends on the degrees of freedom, which is equal to
# the number of observations of the regression minus the number of coefficients
# estimated in the regression. This t-critical value will always be close to 2.
# I indicate 0.025 since the confidence interval is 95%, so I have 2.5% in the
# left, and 2.5% to the right. I specify the degrees of freedom calculated in the
# previous model.
# To do an exact estimate of the 95% confidence interval for B0:
<- B0 - (t_critical_value* STDERRORB0 )
minB0 <- B0 + (t_critical_value* STDERRORB0 )
maxB0
cat("The exact B0 confidence interval goes from", minB0, "to", maxB0)
## The exact B0 confidence interval goes from -0.04474357 to 0.02187745
# To estimate the 95% confidence interval for B1:
<- s_reg1$coefficients[2,1]
B1<- s_reg1$coefficients[2,2]
STDERRORB1 <- B1 - (t_critical_value* STDERRORB1 )
minB1 <- B1 + (t_critical_value* STDERRORB1 )
maxB1
cat("The B1 confidence interval goes from", minB1, "to", maxB1)
## The B1 confidence interval goes from 1.273728 to 1.979622
This is a market model for Alfa.
The dependent variable is Alfa quarterly returns, and the independent variable is the market returns. We are using Alfa and market returns from quarter 1 2000 to quarter 4 2019.
What can we say about the market risk of Alfa? its beta1 is 1.6266749, and its 95% confidence interval goes from 1.2737282 to 1.9796216. Then, we can say that 95% of the cases, its beta1 will be bigger than 1. In other words, we can say that there is enough evidence to say that Alfa is significantly riskier than the market, considering Alfa history from 2000 up to 2019.
In other words, we cannot say that Alfa is offering returns systematically below the market returns.
Model 2. Besides the return of the market, there are many variables that affect firm returns. According to finance research in developed countries, book-market ratio is an important variable that is systematically related to rm returns.
Using the same market model for ALFAA, now you have to include the BMR as another explanatory (independent) variable in your market model.
Run this model and INTERPRET the new regression coefficients. Remember that for multiple regression models, the beta coefficients are actually partial coefficients that measure the marginal effect of the independent variable after considering the effect of the other independent variable(s).
<- lm(ALFAAdata$stockreturn ~ ALFAAdata$MXXreturns +
reg2 $bmr)
ALFAAdata
<- summary(reg2)
s_reg2 s_reg2
##
## Call:
## lm(formula = ALFAAdata$stockreturn ~ ALFAAdata$MXXreturns + ALFAAdata$bmr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.30797 -0.09343 -0.01791 0.09929 0.42392
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.02069 0.02666 0.776 0.440
## ALFAAdata$MXXreturns 1.58092 0.17819 8.872 0.000000000000235 ***
## ALFAAdata$bmr -0.02579 0.01676 -1.539 0.128
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1432 on 76 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.5368, Adjusted R-squared: 0.5246
## F-statistic: 44.04 on 2 and 76 DF, p-value: 0.0000000000001985
This is a multiple regression model to explain Alfa returns with the following explanatory variables: a) market returns and b) Alfa’s book-to-market ratio. We are using all quarters that Alfa has been in the Mexican market (from q1 2000 to 2019q4).
Now we have 3 beta coefficients: beta1 (B1), the sensitivity of stock returns related to changes in market returns, beta2 (B2), the sensitivity of stock returns to changes in book-to-markEt ratio, and B0, the expected value of stock return when both independent variables are zero. In this case, B0 does not take the same meaning as in the case of the CAPM since now we included another explanatory variable.
Regarding the effect of the market: I will calculate the 95% confidence interval:
<- s_reg2$coefficients[2,1]
B1<- s_reg2$coefficients[2,2]
STDERRORB1<- B1 - (t_critical_value* STDERRORB1 )
minB1 <- B1 + (t_critical_value* STDERRORB1 )
maxB1
cat("The confidence interval of beta1 goes from", minB1, "to", maxB1)
## The confidence interval of beta1 goes from 1.226091 to 1.935743
After considering the effect of bmr on Alfa quarterly returns, the effect of market returns is still significant and positive since its 95% C.I. goes from 1.2260915 to 1.9357427.
On average, considering no change in book-to-market ratio, when the market quarterly return increases in 1%, Alfa quarterly returns increases in about 1.5809171. This increase can go from 1.2260915 to 1.9357427, so it is significantly bigger than 1. Then, we can say that Alfa stock is significantly riskier than the market after considering the effect of book-to-market ratio.
Regarding the effect of BMR:
- After considering the effect of the market, the effect of BMR on
stock quarterly returns of ALFA is negative BUT NOT SIGNIFICANT.
I will calculate the 95% confidence interval of B2, the beta coefficient of BMR:
<- s_reg2$coefficients[3,1]
B2<- s_reg2$coefficients[3,2]
STDERRORB2<- B2 - (t_critical_value* STDERRORB2 )
minB2 <- B2 + (t_critical_value* STDERRORB2 )
maxB2
cat("The 95%C.I. of beta2 goes from", minB2, "to", maxB2)
## The 95%C.I. of beta2 goes from -0.05915722 to 0.007578538
We can calculate the 90% C.I. instead of the 95% C.I. to see whether at 90% confidence we can say that B2 is negative:
<- s_reg2$coefficients[3,1]
B2<- s_reg2$coefficients[3,2]
STDERRORB2# I calculate the critical t value to get the 90% C.I.
<- abs(qt(0.050,reg2$df.residual))
t_critical_value90
<- B2 - (t_critical_value90* STDERRORB2 )
minB2 <- B2 + (t_critical_value90* STDERRORB2 )
maxB2
cat("The 90%C.I. of beta2 goes from", minB2, "to", maxB2)
## The 90%C.I. of beta2 goes from -0.05369264 to 0.002113959
The 90%C.I. ends on a positive value, so I cannot say that beta2 is significantly negative at the 90% C.I. It is expected that for most stocks this beta2 should be significantly negative since BMR and stock returns usually have a negative relationship.
If we calculate 1-pvalue of beta2, we get the probability that beta2 will be negative. In this case, (1-pvalue)= (1--1.5389992) = 0.9832428. Then, we can say that beta2 is negative with this % of confidence.
Regarding the constant (b0):
I fist calculate b0 and its 95% confidence intervale:
<- s_reg2$coefficients[1,1]
B0 <- s_reg2$coefficients[1,2]
STDERRORB0 <- B0 - (2* STDERRORB0 )
minB0 <- B0 + (2* STDERRORB0 ) maxB0
When the market return is 0 AND BMR is also 0, the expected stock return of Alfa is 0.0206917%. In this case, it is difficult that BMR can be zero. One possibility is that the firm goes to bankrupcy when the book value is close to zero. Another option is that the market value is much bigger than book value, so bmr diminishes to values about 0.1 or less. We can infer that the less the BMR (if market value increases much faster than book-value), the higher will be the expected return of Alfa when the market return is zero.
IT IS ALSO IMPORTANT TO NOTE THAT the beta coefficients are PARTIAL coefficients since its value is a measure of HOW MUCH the independent variable (like BMR) is explaining the dependent variable (stock return) AFTER considering the effect of the the other(s) independent variable(s).
As we can see, the results of this model partially confirms finance research that say that bmr and stock returns are negatively related (the significance of beta2 of bmr was not quite significant). This regression is using time-series structure since we are ONLY using the rows (periods) for ALFA company.
Go to Canvas and respond Quiz 7. You will have 3 attempts. Questions in this Quiz are related to concepts of the readings related to this Workshop.
The grade of this Workshop will be the following:
Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.