• 1 General directions for this Workshop
  • 2 Data management and Introduction to Multiple Regression
  • 3 Data management
    • 3.1 Data set structures
    • 3.2 Introduction to Data management
    • 3.3 Data cleaning - Extreme values
  • 4 Simple and Multiple regression models
  • 5 Quiz 7 and W7 submission

1 General directions for this Workshop

You will work in RStudio. Create an R Notebook document to write whatever is asked in this workshop.

At the beginning of the R Notebook write Workshop 6 - Financial Econometrics I and your name (as we did in previous workshop).

You have to replicate all the steps explained in this workshop, and ALSO you have to do whatever is asked. Any QUESTION or any STEP you need to do will be written in CAPITAL LETTERS. For ANY QUESTION, you have to RESPOND IN CAPITAL LETTERS right after the question.

It is STRONGLY RECOMMENDED that you write your OWN NOTES as if this were your notebook. Your own workshop/notebook will be very helpful for your further study.

Keep saving your .Rmd file, and ONLY SUBMIT the .html version of your .Rmd file.

# To clear our environment we use the remove function rm:
rm(list=ls())
# To avoid scientific notation for numbers: 
options(scipen=999)

2 Data management and Introduction to Multiple Regression

Before we continue to learn more advanced econometric models, it is very important to emphasize the importance of data management in Finance. If we do not have a cleaned and well structured dataset, then our econometric models might not provide the accurate insights we are looking for. Real data sets in Finance are usually not quite cleaned and structured. No matter the quality of econometric models, if the input data is not cleaned or reliable, then the output will not be reliable either. Remember the popular say: garbage in, garbage out.

We will start learning the traditional dataset structures, and then the basics of data manipulation and data cleaning. At the end we will start learning about multiple regression models.

3 Data management

3.1 Data set structures

In time-series econometrics there are basically the following dataset structures or types:

  1. Time-series: in this structure you can have many periods, and information for one “subject” (subject can be company, index, industry, etc). Also, you can have more than one subject, but the information is placed as new column for each subject. For example: the dataset created after running getsymbols:
##        p_stock1 r_stock1 p_stock2 r_stock2
## 2014m1       10     0.02       20     0.01
## 2014m2       11     0.10       21     0.05
  1. Cross-sectional structure: in this structure, you usually have many “subjects”, for ONLY ONE period For example, key performance indicators of Mexican companies for the last quarter of 2016:
##     Ticker   ROA ATO ProfitMargin
## 1 ALFAA.MX 0.023 0.9         0.15
## 2  AMXL.MX 0.015 1.1         0.10
  1. Panel-data structure: it is combination of time-series and cross-sectional. In this structure, we have more than one subject, and for each subject we can have more than one period. Example:
##      Ticker quarter  ROA ATO ProfitMargin
## 1  ALFAA.MX  2014q1  0.2 1.1         0.17
## 2  ALFAA.MX  2014q2 0.21 1.2         0.16
## 3       ...     ...  ... ...          ...
## 4 BIMBOA.MX  2014q1 0.15 0.8         0.10
## 5 BIMBOA.MX  2014q2 0.20 0.9         0.05
## 6       ...     ...  ... ...          ...
## 7 BIMBOA.MX  2017q1 0.15 1.1         0.09

3.2 Introduction to Data management

In Econometrics, data management skills are very important. Most of the time, before designing and running an econometric model, it is needed to do simple and sophisticated data management. It is very common to merge data sets before we start our econometric analysis.

In this example we will learn how to collapse a dataset and merge one dataset with another considering the data structures we learned in the previous section.

In this case, you have to create a dataset with quarterly information for the IPyC market index using monthly data. Then, you have to merge this dataset with a panel-dataset of financial information for many Mexican stocks. In more detail, you have to do the following:

Download the dataset: http://www.apradie.com/datos/datamx2020q4.xlsx using the readxl package (you have to install this package). This dataset has real quarterly financial data of Mexican rms (from the BMV) for many years. This information comes from Economatica, a leading software and database economic and financial information. This dataset is a panel dataset, which is a combination of cross sectional and time-series dataset. Navigate through the data (using View() function) to learn more about this dataset. Save this dataset with a name.

# Load the package
library(readxl)
# Download the excel file from a web site:
download.file("http://www.apradie.com/datos/datamx2020q4.xlsx",
              "dataw6.xlsx", mode="wb")
# The first parameter is the link and the second is a name for the
#  local file

# Use the function read_excel()
dataset <- read_excel("dataw6.xlsx")

We need to attach the market return to my dataset. Then, I need to:

  1. Download the monthly market index from Yahoo Finance (getsymbols)

  2. Transform (Collapse) the dataset from monthly to quarterly

Now download the ipyc:

library(quantmod)
getSymbols("^MXX", from="2000-01-01", to= "2019-12-31",
            periodicity="monthly", src="yahoo")
## [1] "^MXX"

We need to aggregate (collapse) the dataset from monthly to quarterly, keeping the LAST market index per quarter. We can use the to.quarterly function from quantmod:

QMXX <- to.quarterly(MXX,indexAt='startof')

This function creates an OHLC xts dataset for each quarter. We can have a look of the content:

head(QMXX)
##            MXX.Open MXX.High MXX.Low MXX.Close MXX.Volume MXX.Adjusted
## 2000-01-01  7185.71  8417.33 6510.84   7473.25          0      7473.25
## 2000-04-01  7459.75  7639.38 5516.77   6948.33          0      6948.33
## 2000-07-01  6953.73  7548.57 6171.95   6334.56          0      6334.56
## 2000-10-01  6334.64  6502.60 5148.02   5652.19          0      5652.19
## 2001-01-01  5651.35  6615.83 5512.77   5727.89          0      5727.89
## 2001-04-01  5733.67  6895.70 5541.68   6666.17 4888424600      6666.17

We see that the Open, High, Low, Close and Adjusted prices for each quarter were calculated. We only need the Adjusted price to calculate the market returns, so we select only the Close column:

QMXX = QMXX$MXX.Adjusted
# Change the name of the column:
colnames(QMXX) <- "MXXindex"

Now we can calculate cc return of the market

QMXX$MXXreturns <- diff(log(QMXX)) 

Now we are almost ready to merge this quarterly data to the panel dataset using merge.

We need to have both datasets with a common column. In this case, the common column is quarter. Both datasets must be of the same class, in this case, data frame class.

The QMXX has the quarter as index, but not as part of a column. We create a data frame with the quarter column equal to its index:

# Create a dataframe with a quarter column for the QMXX
QMXX.df<-data.frame(quarter=index(QMXX),coredata(QMXX))
# I extracted the quarter from the index
# coredata is a function that extract the data of an object
#   without extracting formatting

The common column must be of the same type, so we make sure that the column quarter of dataset is a Date type:

dataset$quarter<-as.Date(dataset$quarter)

Now we do a MANY-TO-ONE merge of both datasets:

dataset<-merge(dataset,QMXX.df,by="quarter")
# Now I have a dataset for all firms-quarters, and the
# MXX index and MXX return is merged for each firm

Now we define the dataset as a panel data using the pdata.frame function. You need to install the package “plm”.

library(plm)
datasetp <- pdata.frame(dataset, index= c("firmcode","quarter"))

I need to tell R that the dataset is panel data with the structure firm-quarter.

Before running multiple regressions with this panel dataset I need some data cleanning.

3.3 Data cleaning - Extreme values

When one or more independent variables of a regression model have very extreme values, the regression coefficients are usually not reliable. The extreme values of a variable are called outliers. The OLS method used for regression is very sensible for outliers. In other words, when you have outliers in an independent variable, your regression coefficients can be biased and not reliable.

In this section we continue using the same data set. You have to do the following:

Learn about the Book-to-market ratio (BMR). As comments, EXPLAIN what the BMR is.

Calculate the BMR for all firm-quarters of the dataset. Consider book value of equity as total assets minus total liabilities. Calculate market value of the fir according to the stock price and the number of shares outstanding:

datasetp$bookvalue= datasetp$totalassets - datasetp$totalliabilities
datasetp$marketvalue =  datasetp$originalhistoricalstockprice*datasetp$sharesoutstanding

Check that I used original stock price, not adjusted stock price. In financial markets, the adjusted stock prices are calculated after considering dividend payments and stock splits. A stock split is when a firm decides to divide the value of its stock price by 2, 3 or other multiple with the only purpose to avoid the perception that the stock is expensive. For example, late August 2020 Apple and Tesla decided to do stock split. Apple did a split on a 4-for-1 basis. This means that if the stock price was about USD$400.00 on that day, then its price was reduced to USD$100.00, but they multiplied the number of shares (shares outstanding) by 4 to keep the same market value of the firm. In this historical dataset the shares outstanding is the historical, so we need to use the historical/original stock price without considering stock splits nor dividend payments.

We calculated book value and market value for ALL rms and ALL quarters! Now we can generate the book-to-market ratio for all firms all quarters:

datasetp$bmr= datasetp$bookvalue / datasetp$marketvalue

If bmr=1 means that the firm book value is about the same as firm market value. If that is the case, then the market value has not grown beyond book-value, meaning that the firm has not created value beyond its book value.

If bmr>1 means that the market value is less than book value. So, if bmr>1 means that the firm has significantly lost shareholder’s wealth, and it might incur in bankrupt risk.

Then, what would be the bmr level that all firms are looking for? One of the main purposes of the executives is to MAXIMIZE shareholder’s value. The way to increase shareholder’s value is to increase its market value, and the only way to increase market value is to increase stock price.

Then, the bmr level that all executives prefer is a value much less than 1.

If bmr=0.5 means that the firm market value is the double of its book value. In this case, the main difference between market value and book value is the wealth that the firm has created thanks to its valuable intangible assets such as prestige, high quality, and innovation.

Then, what do you think it might be the relationship between bmr and stock return? Intuitively, we might think that a small value of bmr is a good news, then the stock return might be related in a negative way. If bmr goes down (good news), then the stock return might go up. Then, it might be expected that the relationship between bmr and stock return is linear and negative. Some finance research (like Fama & French, 1995), mainly with US firms has found that the relationship between bmr and future stock return is negative.

However, there are mixed findings about the relationship between bmr and stock returns. Some research (like Piotrosky, 2000) has found that firms with high bmr, but strong financials (like earnings per share) usually provides significant positive stock returns. In this special case, bmr has found to be positively related to stock returns.

Finance research has found that bmr influences earnings per share, which in turn influences current and future stock returns. It is some times that firms with low bmr do not experience significant high stock returns due to the high expectations of investors. This is an interesting finding in finance that is still being researched!

In the following models we will see what happens for Mexican firms.

  1. Keep ONLY active firms in the dataset. You can do this by typing:
datasetpa<-datasetp[datasetp$status=="active",]

As you see, we can do sub-setting or selection of rows in a dataset indicating a condition as the row index, and putting nothing in the column index (after the coma)

We do this since the cancelled firms usually have non reliable information and very extreme values.

  1. Run the histogram command to see the distribution of bmr.
hist(datasetpa$bmr, main="Histogram of the bmr data", 
     xlab="bmr", col="dark blue")

  1. Plot the outliers to identify the observations for bmr. You can type
boxplot(datasetpa$bmr)

outlier_values <- boxplot.stats(datasetpa$bmr)$out 
  1. Install the package robustHD, which has the winsorize function that performs the winsorization process. With your words EXPLAIN what is winsorization.
library(robustHD)
  1. Apply the winsor command to BMR at the 2 percentile level. Comment what you did with the winsor command. Type the following:
winsbmr <- winsorize(na.omit(datasetpa$bmr), standarized= "True", prob= 0.98, return= "data")

What is winsorization?

Winsorizing is the process where the non-missing values of a variable (x) are ordered

    x_1 <= ... <= x_n

and generate a new variable y identical to x except that the h highest and h lowest values 
are replaced by the next value counting inwards from the extremes:

    y_1, ... , y_h = y_(h + 1)

    y_n, ... , y_(n - h + 1) = y_(n - h)
    

This process is applied to variables with extreme values.

For high values= upper percentile is calculated and that value is assigned to the top values located in a percentile less than the assigned.

For low values= a lower percentile is calculated and that value is assigned to the bottom values that are in a percentile lower than the assigned value.

In this case we winsorized bmr for high and low extreme values.

Now we look at a cleaner bmr data, we can see this graphically making a histogram of the new winsorized data:

hist(winsbmr, main="Histogram of the winsorized bmr data", 
     xlab="bmr", col="dark blue")

As expected, most of the BMR values are less than 1 for active firms in the BMR, indicating that market value is greater than book value for most of the active firms most of the time.

Once you winsorize BMR, you can safely use it in a regression model.

4 Simple and Multiple regression models

Using the same dataset we constructed in the previous sections, design the following regression models:

Model 1. A market model for all quarters for ALFAA. Remember the market model is a simple regression model where you use the market returns as the independent variable, and the rm return as the dependent variable. Use the if conditional with the regress command. For example:

#First, I get all of the cc stock returns
datasetpa$stockreturn <- log(datasetpa$adjustedstockprice
        / (lag(datasetpa$adjustedstockprice, 1)))

# Now I do a new object with ALFAA information to make the following 
# regression models

ALFAAdata<-as.data.frame(datasetpa[(datasetpa$firm=="ALFAA"),])
# AALFA_ret <- AALFA_info$`Stock return`

#I set the stock returns as the dependent variable, and the market returns as independent:

reg1 <- lm(ALFAAdata$stockreturn ~ ALFAAdata$MXXreturns)
s_reg1<- summary(reg1)
s_reg1
## 
## Call:
## lm(formula = ALFAAdata$stockreturn ~ ALFAAdata$MXXreturns)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32800 -0.07931 -0.00343  0.10590  0.39669 
## 
## Coefficients:
##                      Estimate Std. Error t value           Pr(>|t|)    
## (Intercept)          -0.01143    0.01673  -0.683              0.496    
## ALFAAdata$MXXreturns  1.62667    0.17725   9.177 0.0000000000000546 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1445 on 77 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.5224, Adjusted R-squared:  0.5162 
## F-statistic: 84.22 on 1 and 77 DF,  p-value: 0.0000000000000546

Calculate the confidence intervals:

# To do a rough estimate of the 95% confidence interval for B0:
B0 <- s_reg1$coefficients[1,1]
STDERRORB0 <- s_reg1$coefficients[1,2]
minB0 <- B0  - (2* STDERRORB0 )
maxB0 <- B0  + (2* STDERRORB0 )

cat("The B0 confidence interval goes from about ", minB0, "to", maxB0)
## The B0 confidence interval goes from about  -0.04488987 to 0.02202375
# Instead of using the critical value of t=2, I can use the exact t value that
#  gives me the 95% confidence intervale:
t_critical_value <- abs(qt(0.025,reg1$df.residual))
t_critical_value
## [1] 1.991254
# Note that the function qt gets the t critical value so that the area under the
# function t from -infinite to the t critical value is equal to 0.025.
# This t-critical value depends on the degrees of freedom, which is equal to 
#  the number of observations of the regression minus the number of coefficients
#  estimated in the regression. This t-critical value will always be close to 2.

# I indicate 0.025 since the confidence interval is 95%, so I have 2.5% in the
# left, and 2.5% to the right. I specify the degrees of freedom calculated in the
#  previous model.

# To do an exact estimate of the 95% confidence interval for B0:
minB0 <- B0  - (t_critical_value* STDERRORB0 )
maxB0 <- B0  + (t_critical_value* STDERRORB0 )

cat("The exact B0 confidence interval goes from", minB0, "to", maxB0)
## The exact B0 confidence interval goes from -0.04474357 to 0.02187745
# To estimate the 95% confidence interval for B1:
B1<- s_reg1$coefficients[2,1]
STDERRORB1 <- s_reg1$coefficients[2,2]
minB1 <- B1  - (t_critical_value* STDERRORB1 )
maxB1 <- B1  + (t_critical_value* STDERRORB1 )

cat("The B1 confidence interval goes from", minB1, "to", maxB1)
## The B1 confidence interval goes from 1.273728 to 1.979622

This is a market model for Alfa.

The dependent variable is Alfa quarterly returns, and the independent variable is the market returns. We are using Alfa and market returns from quarter 1 2000 to quarter 4 2019.

  • There is statistical evidence to say that the market returns are linearly and positively related to Alfa’s returns since its beta1 is significantly greater than zero. On average, when the market moves in 1%, Alfa returns will move in about 1.6266749%.

What can we say about the market risk of Alfa? its beta1 is 1.6266749, and its 95% confidence interval goes from 1.2737282 to 1.9796216. Then, we can say that 95% of the cases, its beta1 will be bigger than 1. In other words, we can say that there is enough evidence to say that Alfa is significantly riskier than the market, considering Alfa history from 2000 up to 2019.

  • Even though beta0 is negative, there is no statistical evidence to say that beta0 is significantly less than 0 since its 95% C.I. can move from -0.0447436 to 0.0218774.

In other words, we cannot say that Alfa is offering returns systematically below the market returns.

Model 2. Besides the return of the market, there are many variables that affect firm returns. According to finance research in developed countries, book-market ratio is an important variable that is systematically related to rm returns.

Using the same market model for ALFAA, now you have to include the BMR as another explanatory (independent) variable in your market model.

Run this model and INTERPRET the new regression coefficients. Remember that for multiple regression models, the beta coefficients are actually partial coefficients that measure the marginal effect of the independent variable after considering the effect of the other independent variable(s).

reg2 <- lm(ALFAAdata$stockreturn ~ ALFAAdata$MXXreturns + 
             ALFAAdata$bmr)

s_reg2 <- summary(reg2)
s_reg2
## 
## Call:
## lm(formula = ALFAAdata$stockreturn ~ ALFAAdata$MXXreturns + ALFAAdata$bmr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.30797 -0.09343 -0.01791  0.09929  0.42392 
## 
## Coefficients:
##                      Estimate Std. Error t value          Pr(>|t|)    
## (Intercept)           0.02069    0.02666   0.776             0.440    
## ALFAAdata$MXXreturns  1.58092    0.17819   8.872 0.000000000000235 ***
## ALFAAdata$bmr        -0.02579    0.01676  -1.539             0.128    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1432 on 76 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.5368, Adjusted R-squared:  0.5246 
## F-statistic: 44.04 on 2 and 76 DF,  p-value: 0.0000000000001985

This is a multiple regression model to explain Alfa returns with the following explanatory variables: a) market returns and b) Alfa’s book-to-market ratio. We are using all quarters that Alfa has been in the Mexican market (from q1 2000 to 2019q4).

Now we have 3 beta coefficients: beta1 (B1), the sensitivity of stock returns related to changes in market returns, beta2 (B2), the sensitivity of stock returns to changes in book-to-markEt ratio, and B0, the expected value of stock return when both independent variables are zero. In this case, B0 does not take the same meaning as in the case of the CAPM since now we included another explanatory variable.

Regarding the effect of the market: I will calculate the 95% confidence interval:

B1<- s_reg2$coefficients[2,1]
STDERRORB1<- s_reg2$coefficients[2,2]
minB1 <- B1  - (t_critical_value* STDERRORB1  )
maxB1 <- B1  + (t_critical_value* STDERRORB1 )

cat("The confidence interval of beta1 goes from", minB1, "to", maxB1)
## The confidence interval of beta1 goes from 1.226091 to 1.935743
  • After considering the effect of bmr on Alfa quarterly returns, the effect of market returns is still significant and positive since its 95% C.I. goes from 1.2260915 to 1.9357427.

  • On average, considering no change in book-to-market ratio, when the market quarterly return increases in 1%, Alfa quarterly returns increases in about 1.5809171. This increase can go from 1.2260915 to 1.9357427, so it is significantly bigger than 1. Then, we can say that Alfa stock is significantly riskier than the market after considering the effect of book-to-market ratio.

Regarding the effect of BMR:

  - After considering the effect of the market, the effect of BMR on
    stock quarterly returns of ALFA is negative BUT NOT SIGNIFICANT. 

 I will calculate the 95% confidence interval of B2, the beta coefficient of BMR:
B2<- s_reg2$coefficients[3,1]
STDERRORB2<- s_reg2$coefficients[3,2]
minB2 <- B2  - (t_critical_value* STDERRORB2  )
maxB2 <- B2  + (t_critical_value* STDERRORB2 )

cat("The 95%C.I. of beta2 goes from", minB2, "to", maxB2)
## The 95%C.I. of beta2 goes from -0.05915722 to 0.007578538
  • After considering the effect of the market return on Alfa quarterly returns, the effect of BMR is negative, but not significant at the 95% C.I since its 95% C.I. goes from -0.0591572 to 0.0075785. We can see that most of the values of this range are negative.

We can calculate the 90% C.I. instead of the 95% C.I. to see whether at 90% confidence we can say that B2 is negative:

B2<- s_reg2$coefficients[3,1]
STDERRORB2<- s_reg2$coefficients[3,2]
# I calculate the critical t value to get the 90% C.I. 
t_critical_value90 <- abs(qt(0.050,reg2$df.residual))

minB2 <- B2  - (t_critical_value90* STDERRORB2  )
maxB2 <- B2  + (t_critical_value90* STDERRORB2 )

cat("The 90%C.I. of beta2 goes from", minB2, "to", maxB2)
## The 90%C.I. of beta2 goes from -0.05369264 to 0.002113959
The 90%C.I. ends on a positive value, so I cannot say that beta2 is significantly negative at the 90% C.I. It is expected that for most stocks this beta2 should be significantly negative since BMR and stock returns usually have a negative relationship. 

If we calculate 1-pvalue of beta2, we get the probability that beta2 will be negative. In this case, (1-pvalue)= (1--1.5389992) = 0.9832428. Then, we can say that beta2 is negative with this % of confidence.
  • On average, considering no change in market return, when the book-to-market ratio increases in 1%, Alfa quarterly returns changes in about -0.0257893. This increase can go from -0.0536926 to 0.002114, so it is not significantly less than 0 (although it was close to be significant).

Regarding the constant (b0):

I fist calculate b0 and its 95% confidence intervale:

B0 <- s_reg2$coefficients[1,1]
STDERRORB0 <- s_reg2$coefficients[1,2]
minB0 <- B0  - (2* STDERRORB0 )
maxB0 <- B0  + (2* STDERRORB0 )
  • When the market return is 0 AND BMR is also 0, the expected stock return of Alfa is 0.0206917%. In this case, it is difficult that BMR can be zero. One possibility is that the firm goes to bankrupcy when the book value is close to zero. Another option is that the market value is much bigger than book value, so bmr diminishes to values about 0.1 or less. We can infer that the less the BMR (if market value increases much faster than book-value), the higher will be the expected return of Alfa when the market return is zero.

    IT IS ALSO IMPORTANT TO NOTE THAT the beta coefficients are PARTIAL coefficients since its value is a measure of HOW MUCH the independent variable (like BMR) is explaining the dependent variable (stock return) AFTER considering the effect of the the other(s) independent variable(s).

    As we can see, the results of this model partially confirms finance research that say that bmr and stock returns are negatively related (the significance of beta2 of bmr was not quite significant). This regression is using time-series structure since we are ONLY using the rows (periods) for ALFA company.

5 Quiz 7 and W7 submission

Go to Canvas and respond Quiz 7. You will have 3 attempts. Questions in this Quiz are related to concepts of the readings related to this Workshop.

The grade of this Workshop will be the following:

  • Complete (100%): If you submit an ORIGINAL and COMPLETE HTML file with all the activities, with your notes, and with your OWN RESPONSES to questions
  • Incomplete (75%): If you submit an ORIGINAL HTML file with ALL the activities but you did NOT RESPOND to the questions and/or you did not do all activities and respond to some of the questions.
  • Very Incomplete (10%-70%): If you complete from 10% to 75% of the workshop or you completed more but parts of your work is a copy-paste from other workshops.
  • Not submitted (0%)

Remember that you have to submit your .html file through Canvas BEFORE NEXT CLASS.