# To avoid scientific notation:
options(scipen=999)
#Download the csv files and import them into your R environment:
download.file("https://www.apradie.com/datos/dataus2024.csv","dataus2024.csv")
trying URL 'https://www.apradie.com/datos/dataus2024.csv'
Content type 'text/csv' length 48648943 bytes (46.4 MB)
==================================================
downloaded 46.4 MB
uspanel = read.csv("dataus2024.csv")
download.file("https://www.apradie.com/datos/firmsus2024.csv","firmsus2024.csv")
trying URL 'https://www.apradie.com/datos/firmsus2024.csv'
Content type 'text/csv' length 747426 bytes (729 KB)
==================================================
downloaded 729 KB
usfirms <- read.csv("firmsus2024.csv")
install.packages("dplyr")
trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-x86_64/contrib/4.4/dplyr_1.1.4.tgz'
Content type 'application/x-gzip' length 1604793 bytes (1.5 MB)
==================================================
downloaded 1.5 MB
The downloaded binary packages are in
/var/folders/yq/4b07mdgj29vgsjn25yz5n7zr0000gn/T//RtmpSYaLBr/downloaded_packages
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Challenge 1: CALCULATE QUARTERLY REVENUE AND EBIT FROM YTD AMOUNTS
uspanel <- uspanel %>%
arrange(firm, year, fiscalmonth) %>%
mutate (EBIT = revenue - cogs - sgae) %>%
group_by(firm, year) %>%
mutate(qrevenue = if_else(fiscalmonth == 3, revenue, revenue -lag(revenue, default = 0)),
qEBIT = if_else(fiscalmonth == 3, EBIT, EBIT - lag(EBIT,default = 0))) %>%
ungroup()
INTERPRETATION:first I get the uspanel where it’s contained all the necessaries variables, and then I arrange the important variables which are the firm, the year and the fiscalmonth, later, I do a new variable which is EBIT, this variable is anual to make more short and easy the process, and by the new variable (anual EBIT), I a group the firm and the year, and then I make the quarter revenue and the quarter EBIT, in the both I make a conditional that is if_else, to handle the special case for the first quarter, where the year-to-date value is the same as the quartely value. I make that the fiscalmonth it’s equal to 3 for March, represents only the activity of that quarter, and then by putting the “,revenue” or “,EBIT” condition assigns the full YTD amount as the quartely revenue and EBIT.In the other part of the condition, represents the current and all previous quarter of the YTD values, and to isolate just the current quarter’s amoount, we need to substract the previous quarter’s YTD value.
Challenge 2:GROWTH OF THE US FINANCIAL MARKET
uspanel <- uspanel %>%
mutate(marketvalue= originalprice * sharesoutstanding) %>%
mutate(netincome = EBIT + otherincome + extraordinaryitems - finexp - incometax)
TA <- uspanel %>%
group_by(year) %>%
summarise(
Total_MarketValue = sum(marketvalue, na.rm =TRUE),
Total_Revenue= sum(revenue, na.rm=TRUE),
Total_NetIncome= sum(netincome, na.rm = TRUE)
)
TA
library(ggplot2)
ggplot(TA, aes(x=year)) +
geom_line(aes(y=Total_MarketValue, color="Total Market Value")) +
geom_line(aes(y=Total_Revenue, color= "Total Revenue")) +
geom_line(aes(y=Total_NetIncome, color= "Total Net Income")) +
labs(
tittle= "U.S. Financial Growth over Time",
x= "Year", y= "Total value",
color= "Variable"
) +
theme_minimal()
library(tidyr)
TA_long <- TA %>%
pivot_longer(cols=c(Total_MarketValue, Total_Revenue, Total_NetIncome),
names_to = "Variable",
values_to = "Value")
ggplot(TA_long, aes(x=year, y= Value, fill=Variable)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title="U.S. Annual Market Value, Revenue and Net Income Growth",
x="year", y="Total value",
fill="Variable"
) +
theme_minimal() +
scale_fill_manual(values=c("Total Market Value" = "blue",
"Total Revenue" = "red",
"Total Net Income" = "orange")) +
theme(axis.text.x = element_text(angle=45, hjust = 39))
The bar chart helps to compare the annual values of each variable in a single graph, as well as the line chart helps to compare variables and see how they behave over time.
Challenge 3: UN A LOGIT MODEL TO PREDICT PROBABILITY OF BEATING THE INDUSTRY MEDIAN RETURN
install.packages("statar")
trying URL 'https://cran.rstudio.com/bin/macosx/big-sur-x86_64/contrib/4.4/statar_0.7.6.tgz'
Content type 'application/x-gzip' length 189074 bytes (184 KB)
==================================================
downloaded 184 KB
The downloaded binary packages are in
/var/folders/yq/4b07mdgj29vgsjn25yz5n7zr0000gn/T//RtmpSYaLBr/downloaded_packages
library(statar)
Registered S3 method overwritten by 'data.table':
method from
print.data.table
uspanel1 <- uspanel %>%
mutate(oepsp = (qEBIT/sharesoutstanding) / originalprice) %>%
mutate(oepspw = winsorize(oepsp, probs = c(0.01,0.99)))
0.37 % observations replaced at the bottom
0.37 % observations replaced at the top
uspanel1 <- uspanel %>%
arrange(firm, year, q) %>%
group_by(firm, year) %>%
mutate(log_return = log(adjprice / lag(adjprice))) %>%
summarise(r = sum(log_return, na.rm = TRUE), .groups = "drop")
uspanel1 <- uspanel1 %>%
group_by(firm, year) %>%
mutate(median_IR= median(r, na.rm=TRUE)) %>%
ungroup() %>%
mutate(HigherReturn=if_else(r > median_IR, 1, 0))
uspanel2 <- merge(uspanel, uspanel1)
uspanel2 <- uspanel2 %>%
mutate(oepsp = (EBIT / sharesoutstanding) / originalprice) %>%
mutate(oepspw = winsorize(oepsp, probs = c(0.01, 0.99)))
0.37 % observations replaced at the bottom
0.37 % observations replaced at the top
model <- glm(HigherReturn ~ oepspw, data = uspanel2, family = binomial)
Warning: glm.fit: algorithm did not converge
summary(model)
Call:
glm(formula = HigherReturn ~ oepspw, family = binomial, data = uspanel2)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -26.5660685236736356 785.2021808009682218 -0.034 0.973
oepspw -0.0000000000001839 3384.3166091260095527 0.000 1.000
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 0.0000000000 on 207386 degrees of freedom
Residual deviance: 0.0000012032 on 207385 degrees of freedom
(349794 observations deleted due to missingness)
AIC: 4
Number of Fisher Scoring iterations: 25
Challenge 4: INTERPRET WITH YOUR WORDS THE LOGIT MODEL
In this logit model we can observe that in the intercept we have a value of -26.566….., which represents the log-odds of Higher Return when oepspw is 0, in this case, we have a low value in the intercept that indicates, without the oepspw efect, the probability of a higher return is nearly inexistant. Also, in the oepspw we have a estimate of -0.0000…… which is a value very closely to 0, which implicate that oepspw have a insignificant efect in HigherReturn, every growth of each unit in oepspw reduce the log-odds of a higher return. In the intercept in the PZ give us a value of -0.034 and the P value is 0.973, a higher p value indicates that the intercept it’s not significant, and in the oepspw in the PZ it’s 0, and in P value is 1.00 which indicates that it’s not significant. Finally, we can analize that the highers standard errors represents a uncertainty over the estimated coeficients.