1 Data collection

You have to import the datasets. We will download these datasets from a web site. Here is the code:

# To avoid scientific notation:
options(scipen=999)

#Read Excel
library(readxl)

# Download the panel dataset from an online excel file:
download.file("http://www.apradie.com/datos/us2020q.xlsx",
              "us2020q.xlsx", mode="wb")
# The first parameter is the link and the second is a name for the
#  local file in the folder in your computer

#Download the file with a list of all US firms:
download.file("http://www.apradie.com/datos/usfirms.xlsx",
              "usfirms.xlsx", mode="wb")

Now we import (into our R environment) the panel quarterly data of all US firms:

uspanel <- read_excel("us2020q.xlsx",sheet = "data")

Now we import the data dictionary for the variables of this dataset:

dictionary <- read_excel("us2020q.xlsx",sheet = "data dictionary")

Now we import that list of all public US firms:

usfirms <- read_excel("usfirms.xlsx")

2 Data management to incorporate market return:

Now download the S&P500 from Yahoo Finance:

library(quantmod)
getSymbols("^GSPC", from="2010-01-01", to= "2020-12-31",
            periodicity="monthly", src="yahoo")
## [1] "^GSPC"

We need to aggregate (collapse) the dataset from monthly to quarterly, keeping the LAST market index per quarter. We can use the to.quarterly function from quantmod:

QSP500 <- to.quarterly(GSPC,indexAt='startof')

This function creates an OHLC xts dataset for each quarter. We can have a look of the content:

head(QSP500)
##            GSPC.Open GSPC.High GSPC.Low GSPC.Close  GSPC.Volume GSPC.Adjusted
## 2010-01-01   1116.56   1180.69  1044.50    1169.43 279192470000       1169.43
## 2010-04-01   1171.23   1219.80  1028.33    1030.71 354511440000       1030.71
## 2010-07-01   1031.10   1157.16  1010.91    1141.20 260105810000       1141.20
## 2010-10-01   1143.49   1262.60  1131.87    1257.64 257671870000       1257.64
## 2011-01-01   1257.62   1344.07  1249.05    1325.83 240896240000       1325.83
## 2011-04-01   1329.48   1370.58  1258.07    1320.64 245196520000       1320.64

We see that the Open, High, Low, Close and Adjusted prices for each quarter were calculated. We only need the Adjusted price to calculate the market returns, so we select only the Close column:

QSP500 = QSP500$GSPC.Adjusted
# Change the name of the column:
colnames(QSP500) <- "SP500index"

Now we can calculate cc return of the market

QSP500$SPreturns <- diff(log(QSP500$SP500index),lag=4) 

Now we are almost ready to merge this quarterly data to the panel dataset using merge.

We need to have both datasets with a common column. In this case, the common column is quarter. Both datasets must be of the same class, in this case, data frame class.

The QMXX has the quarter as index, but not as part of a column. We create a data frame with the quarter column equal to its index:

# Create a dataframe with a quarter column for the QMXX
QSP500.df<-data.frame(q=index(QSP500),coredata(QSP500))
# I extracted the quarter from the index
# coredata is a function that extract the data of an object
#   without extracting formatting

The common column must be of the same type, so we make sure that the column quarter of dataset is a Date type:

Now we do a MANY-TO-ONE merge of both datasets:

uspanel$q<-as.Date(uspanel$q)

uspanel<-merge(uspanel,QSP500.df,by="q")