Abstract: In this workshop we continue practicing with descriptive statistics, data understanding and financial data calculations. In addition, we introduce what is Machine Learning, and we do a review of regression models and application in Finance.
We will work with a panel data of real US public firms. This is the dataset you will use for the Problem Situation (Final Project) of this class. First, we will do descriptive statistics and visualization with the purpose of understanding the US financial market. In addition, we will start learning about machine learning and do a review of regression models.
You will download 3 datasets:
· usfirms.xlsx: List of US public firms with general information of each firm · us2020q.xlsx /data: Panel data with historical annual financial data for all US public firms. This is in an Excel sheet called “data” in the us2020q.xlsx Excel workbook. · us2020q.xlsx/data dictionary: dictionary for each variable in the panel dataset. This is in an Excel sheet called “dictionary data” in the us2020q.xlsx Excel workbook.
You have to manage these datasets to respond the following questions. Hint: it is recommended to merge the main variables of the list of firms with the panel data.
You have to import the datasets. We will download these datasets from a web site. Here is the code:
options(scipen=999)
library(readxl)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(PerformanceAnalytics)
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
##
## first, last
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
download.file("http://www.apradie.com/datos/us2020q.xlsx",
"us2020q.xlsx", mode="wb")
download.file("http://www.apradie.com/datos/usfirms.xlsx",
"usfirms.xlsx", mode="wb")
Now we import (into our R environment) the panel quarterly data of all US firms:
uspanel <- read_excel("us2020q.xlsx",sheet = "data")
Now we import the data dictionary for the variables of this dataset:
dictionary <- read_excel("us2020q.xlsx",sheet = "data dictionary")
Now we import that list of all public US firms:
usfirms <- read_excel("usfirms.xlsx")
As a quick check of the imported data, write the code do the following query:
Do an online check, find the most recent 10-K report (annual financial statement) of Apple, and compare 2020 anual sales with what you got. You must get the same amount for the anual sales reported at the end of fiscal year (for Apple, the fiscal year is Q3).
AAPL_Revenue_TotalAssets = uspanel[uspanel$firm == "AAPL" &
uspanel$year >= 2019, c(1:3,7)]
AAPL_Revenue_TotalAssets
Before doing basic descriptive statistics for the US market, you do some data management algorithms. To do this, you have to select firm-quarters with fiscalmonth=12, year=2020 and only active firms. Save the result in a new dataset called panel2020.
USF_USP_m <- merge(usfirms, uspanel, by = "firm")
panel2020 = USF_USP_m[USF_USP_m$status == "active" &
USF_USP_m$fiscalmonth == 12 &
USF_USP_m$year == 2020, c(1,12:45) ]
panel2020
In the panel dataset we do not have general information of each firm such as firm name, industry, status (active or cancelled), etc. So, we start by integrating this information into the panel data:
In this dataset, we have quarterly financial data. For income-statement variables, the information is the cumulative amount from Q1 to the corresponding quarter of the fiscal year. For balance-sheet variables, the information is the value of the variable at the end of the quarter (since the beginning when the firm was created).
The fiscal month variable indicates the following: if fiscalmonth=12, then that means that the corresponding quarter is the end-of-the-fiscal-year quarter.
panel2020_2 <- merge(usfirms, panel2020, by = "firm")
panel2020_2
Using the panel2020 dataset, you have to do the following descriptive statistics and respond the following questions.
You have to calculate market capitalization (marketcap) in this dataset. You can calculate a new column as follows:
marketcap = original stock price * shareoutstanding.
marketcap <- (panel2020_2$originalprice * panel2020_2$sharesoutstanding)
panel2020_2 <- cbind(panel2020_2, marketcap)
This is the market value of the firm in each quarter. We use the original stock price (before stock splits and dividend adjustments) since the # of shares outstanding is the historical # of shares.
panel2020_3 <- (panel2020_2[,c("totalassets", "marketcap")])
panel2020_3 <- na.omit(panel2020_3)
summary(panel2020_3)
## totalassets marketcap
## Min. : 266 Min. : 2
## 1st Qu.: 353334 1st Qu.: 293888
## Median : 1718393 Median : 1392349
## Mean : 16123109 Mean : 11874836
## 3rd Qu.: 6396786 3rd Qu.: 5357797
## Max. :3386071000 Max. :1980644694
INTERPRETACIÓN:
CON RELACIÓN A LOS ACTIVOS TOTALES PODEMOS CONCLUIR:
·CERCA DEL 25% DE LAS FIRMAS REGISTRADAS EN ESTADOS UNIDOS, CUENTAN CON UNA COLUMNA DE ACTIVOS POR USD 353 MILES DE MILLONES O MENOS; 50% CUENTAN CON UNA COLUMNA DE ACTIVOS POR USD 1.718 MILES DE MILLONES O MENOS; Y, 75% CUENTAN CON UNA COLUMNA DE ACTIVOS POR USD 6.3 MILES DE MILLONES O MENOS. · LA EMPRESA CON MAYOR NÚMERO DE ACTIVOS BAJO GESTIÓN, MANTIENE 3,386,071,000 USD
RESPECTO A LA CAPITALIZACIÓN DE MERCADO, PODEMOS CONCLUIR:
·LA MEDIANA CAPITALIZACIÓN DE LAS EMPRESAS EN EL MERCADO ESTADOUNIDENSE ES DE USD 1.392 MILES DE MILLONES. ·LA EMPRESA CON MAYOR CAPITALIZACIÓN DE MERCADO ES VALORADA EN USD 1,980,644,694; SEGURAMENTE, APPLE.
industry_panel2020 <- unique(panel2020_2$naics1)
panel2020_2 %>%
filter(year =='2020') %>%
group_by(naics1) %>%
summarize(firms = n(),
totalassets_median <- median(totalassets, na.rm = TRUE),
marketcap_median <- median(marketcap, na.rm = TRUE),
assets_percentil25 <- quantile(totalassets, probs = (0.25)))
Top10_TotalAssets <- (panel2020_2) %>%
select("CompanyName","naics1","totalassets") %>%
arrange(desc(totalassets) )
Top10_TotalAssets
Top10_MarketCap <- (panel2020_2) %>%
select("CompanyName","naics1","marketcap") %>%
arrange(desc(marketcap) )
Top10_MarketCap
hist(panel2020_2$marketcap)
INTERPRETACIÓN:
MÁS DE LAS 3,000 EMPRESAS REGISTRADAS EN EE.UU. ARROJAN UNA CAPITALIZACIÓN DE MERCADO POR DEBAJO DE 500,000,000 USD. MENOS DE 500 EMPRESAS REPORTAN UN MARKET CAP SUPERIOR A 250,000,000 USD. EL GRÁFICO SE HA VISTO EXTENDIDO POR EL MARKET CAP DE APPLE, CUYO VALOR ASCIENDE HASTA CERCA DE LOS 2 BILLONES DE USD.
Administrative_and_Support_and_Waste_Management_and_Remediation_Services <- panel2020_2[panel2020_2$naics1 =="Administrative and Support and Waste Management and Remediation Services",]
chart.Boxplot(Administrative_and_Support_and_Waste_Management_and_Remediation_Services$marketcap)
Agriculture_Forestry_Fishing_and_Hunting <- panel2020_2[panel2020_2$naics1 =="Agriculture, Forestry, Fishing and Hunting",]
chart.Boxplot(Agriculture_Forestry_Fishing_and_Hunting$marketcap)
Arts_Entertainment_and_Recreation <- panel2020_2[panel2020_2$naics1 =="Arts, Entertainment, and Recreation",]
chart.Boxplot(Arts_Entertainment_and_Recreation$marketcap)
Manufacturing <- panel2020_2[panel2020_2$naics1 =="Manufacturing",]
chart.Boxplot(Manufacturing$marketcap)
Retail_Trade <- panel2020_2[panel2020_2$naics1 =="Retail Trade",]
chart.Boxplot(Retail_Trade$marketcap)
Finance_and_Insurance <- panel2020_2[panel2020_2$naics1 =="Finance and Insurance",]
chart.Boxplot(Finance_and_Insurance$marketcap)
Professional_Scientific_and_Technical_Services <- panel2020_2[panel2020_2$naics1 =="Professional, Scientific, and Technical Services",]
chart.Boxplot(Professional_Scientific_and_Technical_Services$marketcap)
Construction <- panel2020_2[panel2020_2$naics1 =="Construction",]
chart.Boxplot(Construction$marketcap)
Educational_Services <- panel2020_2[panel2020_2$naics1 =="Educational Services",]
chart.Boxplot(Educational_Services$marketcap)
Health_Care_and_Social_Assistance <- panel2020_2[panel2020_2$naics1 =="Health Care and Social Assistance",]
chart.Boxplot(Health_Care_and_Social_Assistance$marketcap)
Information <- panel2020_2[panel2020_2$naics1 =="Information",]
chart.Boxplot(Information$marketcap)
Mining_Quarrying_and_Oil_and_Gas_Extraction <- panel2020_2[panel2020_2$naics1 =="Mining, Quarrying, and Oil and Gas Extraction",]
chart.Boxplot(Mining_Quarrying_and_Oil_and_Gas_Extraction$marketcap)
Other_Services <- panel2020_2[panel2020_2$naics1 =="Other Services (except Public Administration)",]
chart.Boxplot(Other_Services$marketcap)
Real_Estate_and_Rental_and_Leasing <- panel2020_2[panel2020_2$naics1 =="Real Estate and Rental and Leasing",]
chart.Boxplot(Real_Estate_and_Rental_and_Leasing$marketcap)
Transportation_and_Warehousing <- panel2020_2[panel2020_2$naics1 =="Transportation and Warehousing",]
chart.Boxplot(Transportation_and_Warehousing$marketcap)
Utilities <- panel2020_2[panel2020_2$naics1 =="Utilities",]
chart.Boxplot(Utilities$marketcap)
Wholesale_Trade <- panel2020_2[panel2020_2$naics1 =="Wholesale Trade",]
chart.Boxplot(Wholesale_Trade$marketcap)
Using the plm package, indicate R that uspanel is a panel data, which has firm column for the subjects and quarter (q) for the time variable. Keep the same name of uspanel.
library(plm)
##
## Attaching package: 'plm'
## The following objects are masked from 'package:dplyr':
##
## between, lag, lead
uspanel <- pdata.frame(uspanel, index= c("firm","q"))
You have to write the code to calculate the following financial variables and financial ratios for all firms-quarters of US firms.
Using the original panel dataset (uspanel), create columns for the following variables:
· Gross profit (grossprofit): Revenue - Cost of good Sold · Earnings before interest and taxes (ebit): Gross profit - Sales & general administrative expenses · Net Income (netincome): ebit - financial expenses - income taxes · Annual market return: use adjusted stock price and remember that you have quarterly data. · Market capitalization: (marketcap): original stock price * shareoutstanding.
This is the market value of the firm in each quarter. We use the original stock price (before stock splits and dividend adjustments) since the # of shares outstanding is the historical # of shares.
uspanel$grossprofit = (uspanel$revenue - uspanel$cogs)
uspanel$ebit = (uspanel$grossprofit - uspanel$sgae)
uspanel$netincome = (uspanel$ebit - uspanel$finexp - uspanel$incometax)
uspanel$Market_Cap = (uspanel$originalprice * uspanel$sharesoutstanding)
Using the same panel dataset (uspanel), create columns for the following financial ratios:
· Operational Return on Assets (roabit): roabit divided by total assets at the beginning of the period. Total assets of the beginning of the year is actually the total assets of ONE year ago.
\[roabit = (ebit/totalassets)\]
Here you have to use the lag function of the plm package to get value of total assets one year ago, which is 4 quarters ago.
Return on Assets (roa)
Operational Earnings per share (oeps): ebit / sharesoutstanding
Operational epsebit deflated by stock price (oepsp) : eps_ebit / original stock price
Book-to-market ratio (bmr): book value / market value. Use total assets as book value of the firm-quarter.
** Do your own research and briefly explain what is earnings per share deflated by price, and book-to-market ratio **
uspanel$roabit = (uspanel$ebit/uspanel$totalassets)
roabit_1 = ifelse(uspanel$roabit>
plm::lag(uspanel$roabit, 4),1,0)
uspanel$roa = (uspanel$netincome/uspanel$totalassets)
roa_1 = ifelse(uspanel$roa>
plm::lag(uspanel$roa, 4),1,0)
uspanel$OEPS = uspanel$ebit / uspanel$sharesoutstanding
uspanel$OEPSP = uspanel$OEPS / uspanel$originalprice
uspanel$BMR = uspanel$totalassets / (uspanel$originalprice*uspanel$sharesoutstanding)
EPS: Earnings per share (EPS) is calculated as a company’s profit divided by the outstanding shares of its common stock. The resulting number serves as an indicator of a company’s profitability.
BOOK TO MARKET RATIO: According to investopedia, the book-to-market ratio helps investors find a company’s value by comparing the firm’s book value to its market value. A high book-to-market ratio might mean that the market is valuing the company’s equity cheaply compared to its book value. Many investors are familiar with the price-to-book ratio, which is simply the inverse of the book-to-market ratio formula.
Using the uspanel dataset you have to run a multiple pulled regression model to examine whether the operational earnings per share deflated by price (peps) and operational ROA (roabit) are related to annual market returns 1 quarter later.
Interpret the regression coefficients of this model.
Pay attention in class for the hints to write the code to run this model. Hint: you have to use the lm function.
Do a quick research about what is Machine Learning.
De acuerdo con BBVA, en su artículo “Machine learning: ¿qué es y cómo funciona?”, el ‘machine learning’ –aprendizaje automático– es una rama de la inteligencia artificial que permite que las máquinas aprendan sin ser expresamente programadas para ello. Una habilidad indispensable para hacer sistemas capaces de identificar patrones entre los datos para hacer predicciones. Esta tecnología está presente en un sinfín de aplicaciones como las recomendaciones de Netflix o Spotify, las respuestas inteligentes de Gmail o el habla de Siri y Alexa.
“En definitiva, el ‘machine learning’ es un maestro del reconocimiento de patrones, y es capaz de convertir una muestra de datos en un programa informático capaz de extraer inferencias de nuevos conjuntos de datos para los que no ha sido entrenado previamente”, explica José Luis Espinoza, científico de datos de BBVA México.
Referencia: https://www.bbva.com/es/machine-learning-que-es-y-como-funciona/