This project builds an ETL pipeline around a financial data API. I pulled data from the Financial Modeling Prep API and loaded it into a MySQL database, using R and a handful of R libraries as the interface to both the API and the database. All packages are listed in the first section, Library Import. The report was created as an accompanying HTML document, lives in my GitHub, and may be periodically updated.
I decided to explore US regional banks. I built a function that accesses the API, converts the JSON response to an R data frame, cleans the data, and preps it for the SQL database. A second function inserts that data frame into the MySQL database. Once the data was in SQL, queries were executed from R and the results were displayed in datatables, which give users more flexibility than traditional print statements. SQL window functions were then used to explore aggregated behavior, and some of those functions were generalized to handle any numeric column. Overall, the functions are there to demonstrate an understanding of SQL and of how to perform data manipulation.
In the section below, the API is accessed and the JSON data is converted and inserted into a MySQL database. After reading through the subsections below, please see the window functions section to the right for further exploration.
rm(list=ls())
library(httr)
library(jsonlite)
library(lubridate)
library(tidyverse)
library(RMariaDB)
library(DT)
library(knitr)
library(kableExtra)
pkgs <- c("httr", "jsonlite", "lubridate", "tidyverse", "RMariaDB", "DT", "knitr", "kableExtra")
pack_versions <- data.frame(
  "Package Name" = pkgs,
  "Version" = vapply(pkgs, function(p) as.character(packageVersion(p)), character(1), USE.NAMES = FALSE),
  check.names = FALSE
)
kable(pack_versions)
Package Name | Version |
---|---|
httr | 1.4.0 |
jsonlite | 1.6 |
lubridate | 1.7.4 |
tidyverse | 1.2.1 |
RMariaDB | 1.0.6 |
DT | 0.5 |
knitr | 1.22 |
kableExtra | 1.0.1 |
json_to_df
build_table
update_table
### Accesses the financial data API and returns a data frame ready to be inserted into SQL
### Takes a company's stock ticker as an input string
json_to_df <- function(tag)
{
## Build url
url <- paste("https://financialmodelingprep.com/api/v3/financials/income-statement/",tag,"?period=quarter",sep="")
headers = c('Upgrade-Insecure-Requests' = '1')
params = list(`datatype` = 'json')
## Make request
result <- GET(url = url, httr::add_headers(.headers=headers), query = params)
result<- rawToChar(result$content)
df <- as.data.frame(fromJSON(result)[2])
## Fix table column names- remove financials. and special char "."
colnames(df) <- gsub("financials.|\\.","",colnames(df))
## Convert Date to datetime/ rename date as reportdate
df$Report_Date <- as.Date(df$date, '%Y-%m-%d')
df <- df %>%
select(-date)
## Build tag column to identify stock ticker
df$Company <- tag
## Build primary key column convert financial data to numeric
df$Id <- paste(as.character(df$Report_Date),df$Company,sep="-")
cols.num <- colnames(df)[1:31]
df[,cols.num] <- sapply(df[cols.num],as.numeric)
## Data check for NAs
table(is.na(df))
return(df)
}
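As a quick spot check (a hypothetical call, assuming the API responds and the ticker is valid), the returned data frame can be inspected before it ever touches SQL:
usb_df <- json_to_df("USB")
## Inspect a few of the cleaned numeric columns and the constructed primary key
str(usb_df[, c("Revenue", "NetIncome", "Report_Date", "Company", "Id")])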
### Function initializes table creation in MySQL
### Takes a stock ticker (character) and a table name (character)
### If the table already exists, it will tell you to use a different name or use the update function instead
build_table <- function(tag,tablename){
## import df from json api call function
df <- json_to_df(tag)
## grab credentials from credential file
db_credentials<-"C:\\Users\\justin\\Desktop\\xmedia.cnf"
my_sql_db<-"xmedia"
## make connection
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
## Build table from df
tryCatch(dbWriteTable(my_conn, value = df,
name = tablename,
overwrite =FALSE,
row.names = FALSE) ,error= function(e){
print("table can not be overwritten and already exists. Please use update table function or change name")})
## Set primary key to Companytag+Date
res <- dbSendQuery(my_conn, paste("ALTER TABLE",tablename,"ADD CONSTRAINT websites_pk
PRIMARY KEY (`Id`(40)) ;"))
## Disconnect
dbClearResult(res)
dbDisconnect(my_conn)
}
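If you want to confirm the primary key took effect, a quick check along these lines should work (a hypothetical snippet, reusing the same credential file as the functions above and assuming the table is named financials as in the calls below):
db_credentials <- "C:\\Users\\justin\\Desktop\\xmedia.cnf"
my_sql_db <- "xmedia"
my_conn <- dbConnect(RMariaDB::MariaDB(), default.file = db_credentials, group = my_sql_db)
## The Id column should be listed under Key_name 'PRIMARY'
dbGetQuery(my_conn, "SHOW INDEX FROM financials;")
dbDisconnect(my_conn)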
### Loops through a list of stock tickers and updates the SQL DB
### Will not insert a company's data if any of its rows already exist in the SQL DB (primary key conflict)
update_table <- function(tags,tablename){
for (tag in tags){
## set error checker
skip_to_next <- FALSE
## import df from json api call function
df <- json_to_df(tag)
## grab credentials from credential file
db_credentials<-"C:\\Users\\justin\\Desktop\\xmedia.cnf"
my_sql_db<-"xmedia"
## make connection
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
## Insert df into SQL. Catch primary key conflicts (duplicate data), print that the ticker wasn't updated, and move on in the loop
tryCatch(dbWriteTable(my_conn, value = df,
name = tablename,
overwrite= FALSE,
append = TRUE,
row.names = FALSE),error= function(e){skip_to_next <<- TRUE})
if(skip_to_next) { print(paste("Company",tag, "data already exists in DB"))
gc()
dbDisconnect(my_conn)
next }
gc()
dbDisconnect(my_conn)
print(paste("db was updated correctly with: ",tag ))
}
}
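Because inserts that violate the primary key raise an error, re-running update_table() with tickers that are already loaded simply hits the tryCatch branch and prints the "already exists" message for each one, leaving the table as it was. For example (hypothetical re-run):
## update_table(c("WFC", "PNC"), "financials")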
top_10_banks <- c('WFC', 'PNC', 'BBT', 'STI', 'KEY', 'MTB', 'HBAN', 'ZION', 'CMA', 'FITB')
## build a table in sql
build_table('USB',"financials")
update_table(top_10_banks,"financials")
## [1] "db was updated correctly with: WFC"
## [1] "db was updated correctly with: PNC"
## [1] "db was updated correctly with: BBT"
## [1] "db was updated correctly with: STI"
## [1] "db was updated correctly with: KEY"
## [1] "db was updated correctly with: MTB"
## [1] "db was updated correctly with: HBAN"
## [1] "db was updated correctly with: ZION"
## [1] "db was updated correctly with: CMA"
## [1] "db was updated correctly with: FITB"
## Practice some queries against the DB
db_credentials<-"C:\\Users\\justin\\Desktop\\xmedia.cnf"
my_sql_db<-"xmedia"
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
## print out description of table
print(dbGetQuery(my_conn, "DESCRIBE financials;"))
## Field Type Null Key Default Extra
## 1 Revenue double YES <NA>
## 2 RevenueGrowth double YES <NA>
## 3 CostofRevenue double YES <NA>
## 4 GrossProfit double YES <NA>
## 5 RDExpenses double YES <NA>
## 6 SGAExpense double YES <NA>
## 7 OperatingExpenses double YES <NA>
## 8 OperatingIncome double YES <NA>
## 9 InterestExpense double YES <NA>
## 10 EarningsbeforeTax double YES <NA>
## 11 IncomeTaxExpense double YES <NA>
## 12 NetIncomeNonControllingint double YES <NA>
## 13 NetIncomeDiscontinuedops double YES <NA>
## 14 NetIncome double YES <NA>
## 15 PreferredDividends double YES <NA>
## 16 NetIncomeCom double YES <NA>
## 17 EPS double YES <NA>
## 18 EPSDiluted double YES <NA>
## 19 WeightedAverageShsOut double YES <NA>
## 20 WeightedAverageShsOutDil double YES <NA>
## 21 DividendperShare double YES <NA>
## 22 GrossMargin double YES <NA>
## 23 EBITDAMargin double YES <NA>
## 24 EBITMargin double YES <NA>
## 25 ProfitMargin double YES <NA>
## 26 FreeCashFlowmargin double YES <NA>
## 27 EBITDA double YES <NA>
## 28 EBIT double YES <NA>
## 29 ConsolidatedIncome double YES <NA>
## 30 EarningsBeforeTaxMargin double YES <NA>
## 31 NetProfitMargin double YES <NA>
## 32 Report_Date date YES <NA>
## 33 Company text YES <NA>
## 34 Id text NO PRI <NA>
##
## FALSE TRUE
## 15105 59
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 1032373 55.2 2006685 107.2 1548993 82.8
## Vcells 1817225 13.9 8388608 64.0 2995184 22.9
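One more quick sanity check on the load before moving on (a hypothetical query, reusing the open connection): counting the quarters stored per company confirms every ticker made it in.
dbGetQuery(my_conn, "SELECT Company, COUNT(*) AS quarters FROM financials GROUP BY Company ORDER BY quarters DESC;")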
I look at ten of the largest publicly traded banks by market capitalization. Window functions are useful for comparing values across different periods. The main window functions used are LAG, DENSE_RANK, and a running SUM over a window frame.
In my queries, LAG is used to create a growth column. Quarterly data can suffer from seasonality, so LAG with an offset of 4 compares each quarter with the same quarter one year earlier (YOY, year over year); for example, LAG(Revenue, 4) returns the Q1 2018 value when the current row is Q1 2019. A growth formula is then applied to measure the change. Next, DENSE_RANK ranks the results of the LAG query; this builds a ranking based on growth and can be switched between industry-level and company-level quarterly performance rankings. The last function aggregates to yearly revenue and builds a rolling sum over the current year plus the three preceding years; it is there to demonstrate how one would apply a cumulative-sum-style window function query.
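Before the function definitions, here is a tiny worked example of the growth formula the queries use (the numbers are made up for illustration):
## If last year's Q1 revenue was 100 and this year's Q1 revenue is 110, then
round((100 - 110) / 100 * -1, 3)
## [1] 0.1
## i.e. 10% year-over-year growth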
### Uses Lag to build Year over year quarterly growth for desired columns
yty_growth <- function(statistic,table_name){
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",statistic,", LAG(",statistic, ",4) OVER( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic," FROM ",table_name,") SELECT Company,Report_Date,",statistic, ",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,"*-1,3) AS Growth FROM Last_Year;",sep="" )
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
### Uses DENSE_RANK: first builds a growth percentage for any numeric column, then ranks those growth rates
### Function inputs: statistic (column name), table name, and a partition flag (default 0)
### If partition is set to 1, the function ranks each company's quarterly performance against its own history
### If partition is left at 0, the function ranks growth across the whole group of banks
yty_ranking <- function(statistic,table_name,partition=0){
##Build connection
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
##Build query without partition
if (partition==0){
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",
statistic,", LAG(",statistic, ",4) OVER( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic,
" FROM ",table_name," ),", statistic,"_table AS (SELECT Company,Report_Date,",statistic,
",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,
"*-1,3) AS Growth FROM Last_Year) SELECT *, CASE WHEN Growth IS NOT NULL then DENSE_RANK() OVER ( ORDER BY Growth desc) end) AS ranked_Growth FROM ",statistic,"_table;",sep="")
## Send query request and display result
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
##Build query with partition
else {
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",statistic,", LAG(",statistic, ",4) OVER( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic," FROM ",table_name," ),", statistic,"_table AS (SELECT Company,Report_Date,",statistic,",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,"*-1,3) AS Growth FROM Last_Year) SELECT *, (CASE WHEN Growth IS NOT NULL then DENSE_RANK() OVER ( PARTITION BY Company ORDER BY Growth desc) end) AS ranked_Growth FROM ",statistic,"_table;",sep="")
## Send query request and display result
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
}
## Aggregates quarterly data to yearly totals and builds a rolling sum over a four-year window (labeled Three_Year_Running_Avg)
## The function takes no inputs; it's there to show how to build a moving-window aggregate, similar to a cumulative sum for customers
yearly_moving_avg <- function()
{
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
query <- paste(
" WITH yearly_revenue AS (SELECT Company, Year(Report_Date) AS years, sum(revenue) AS revenues FROM financials GROUP BY Company,",
"Year(Report_Date) ORDER BY Company,Year(Report_Date)) SELECT *, CASE WHEN years not in (2009,2010) then sum(revenues) OVER",
"(ORDER BY Company, years rows between 3 preceding and current row) else null end AS 'Three_Year_Running_Avg' FROM yearly_revenue ORDER BY Company,years;", sep="" )
## Send query request and display result
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
### Uses Lag to build Year over year quarterly Growth for desired columns
yty_growth <- function(statistic,table_name){
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",statistic,", LAG(",statistic, ",4) Over( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic," FROM ",table_name,") SELECT Company,Report_Date,",statistic, ",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,"*-1,3) AS Growth FROM Last_Year;",sep="" )
print(query)
res <- dbGetQuery(my_conn,query)
write.csv(res, file = paste("yty_growth",statistic,".csv",sep="" ))
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
## call function
yty_growth("Revenue","financials")
## [1] "WITH Last_Year AS (SELECT Company,Report_Date,Revenue, LAG(Revenue,4) Over( PARTITION BY Company ORDER BY Report_Date) Last_Year_Revenue FROM financials) SELECT Company,Report_Date,Revenue,Last_Year_Revenue, ROUND((Last_Year_Revenue-Revenue)/Last_Year_Revenue*-1,3) AS Growth FROM Last_Year;"
yty_growth <- function(statistic,table_name){
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",statistic,", LAG(",statistic, ",4) Over( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic," FROM ",table_name,") SELECT Company,Report_Date,",statistic, ",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,"*-1,3) AS growth FROM Last_Year;",sep="" )
print(query)
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
## call function
yty_growth("OperatingExpenses","financials")
## [1] "WITH Last_Year AS (SELECT Company,Report_Date,OperatingExpenses, LAG(OperatingExpenses,4) Over( PARTITION BY Company ORDER BY Report_Date) Last_Year_OperatingExpenses FROM financials) SELECT Company,Report_Date,OperatingExpenses,Last_Year_OperatingExpenses, ROUND((Last_Year_OperatingExpenses-OperatingExpenses)/Last_Year_OperatingExpenses*-1,3) AS growth FROM Last_Year;"
yty_ranking <- function(statistic,table_name,partition=0){
##Build connection
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
##Build query without partition
if (partition==0){
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",
statistic,", LAG(",statistic, ",4) Over( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic,
" FROM ",table_name," ),", statistic,"_table AS (SELECT Company,Report_Date,",statistic,
",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,
"*-1,3) AS growth FROM Last_Year) SELECT *, (CASE WHEN growth IS NOT NULL then DENSE_RANK() OVER ( ORDER BY growth desc) end) AS ranked_growth FROM ",statistic,"_table;",sep="")
print(query)
## Send query request and display result
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
##Build query with partition
else {
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",statistic,", LAG(",statistic, ",4) Over( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic," FROM ",table_name," ),", statistic,"_table AS (SELECT Company,Report_Date,",statistic,",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,"*-1,3) AS growth FROM Last_Year) SELECT *, (CASE WHEN growth IS NOT NULL then DENSE_RANK() OVER ( PARTITION BY Company ORDER BY growth desc) end) AS ranked_growth FROM ",statistic,"_table;",sep="")
print(query)
## Send query request and display result
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
}
## call function
yty_ranking("Revenue","financials")
## [1] "WITH Last_Year AS (SELECT Company,Report_Date,Revenue, LAG(Revenue,4) Over( PARTITION BY Company ORDER BY Report_Date) Last_Year_Revenue FROM financials ),Revenue_table AS (SELECT Company,Report_Date,Revenue,Last_Year_Revenue, ROUND((Last_Year_Revenue-Revenue)/Last_Year_Revenue*-1,3) AS growth FROM Last_Year) SELECT *, (CASE WHEN growth IS NOT NULL then DENSE_RANK() OVER ( ORDER BY growth desc) end) AS ranked_growth FROM Revenue_table;"
The same function as above, this time called with partition = 1 so growth rates are ranked within each company.
yty_ranking <- function(statistic,table_name,partition=0){
##Build connection
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
##Build query without partition
if (partition==0){
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",
statistic,", LAG(",statistic, ",4) OVER( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic,
" FROM ",table_name," ),", statistic,"_table AS (SELECT Company,Report_Date,",statistic,
",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,
"*-1,3) AS growth FROM Last_Year) SELECT *, (case when growth is not null then dense_rank() OVER ( ORDER BY growth desc) end) AS ranked_growth FROM ",statistic,"_table;",sep="")
print(query)
## Send query request and display result
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
##Build query with partition
else {
query <- paste("WITH Last_Year AS (SELECT Company,Report_Date,",statistic,", LAG(",statistic, ",4) OVER( PARTITION BY Company ORDER BY Report_Date) Last_Year_",statistic," FROM ",table_name," ),", statistic,"_table AS (SELECT Company,Report_Date,",statistic,",Last_Year_",statistic,", ROUND((Last_Year_",statistic,"-",statistic,")/Last_Year_",statistic,"*-1,3) AS growth FROM Last_Year) SELECT *, (case when growth is not null then dense_rank() OVER ( PARTITION BY Company ORDER BY growth desc) end) AS ranked_growth FROM ",statistic,"_table;",sep="")
print(query)
## Send query request and display result
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
}
## call function
yty_ranking("Revenue","financials",partition=1)
## [1] "WITH Last_Year AS (SELECT Company,Report_Date,Revenue, LAG(Revenue,4) OVER( PARTITION BY Company ORDER BY Report_Date) Last_Year_Revenue FROM financials ),Revenue_table AS (SELECT Company,Report_Date,Revenue,Last_Year_Revenue, ROUND((Last_Year_Revenue-Revenue)/Last_Year_Revenue*-1,3) AS growth FROM Last_Year) SELECT *, (case when growth is not null then dense_rank() OVER ( PARTITION BY Company ORDER BY growth desc) end) AS ranked_growth FROM Revenue_table;"
yearly_moving_avg <- function()
{
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
query <- paste(
" WITH yearly_revenue AS (SELECT Company, Year(Report_Date) AS years, sum(revenue) AS revenues FROM financials GROUP BY Company,",
"Year(Report_Date) ORDER BY Company,Year(Report_Date)) SELECT *, CASE WHEN years NOT IN (2009,2010) THEN SUM(revenues) OVER",
"(ORDER BY Company, years ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) ELSE NULL END AS 'Three_Year_Running_Avg' FROM yearly_revenue ORDER BY Company,years;", sep="" )
## Send query request and display result
res <- dbGetQuery(my_conn,query)
datatable(res)
gc()
dbDisconnect(my_conn)
return(datatable(res))
}
## call function
yearly_moving_avg()
This section is a very quick overview of R's ability to pull in data and create tables and graphs. We simply explore the revenue column with some quick visualizations for EDA. Little time was invested here overall, but I will gladly show more if asked.
my_conn<-dbConnect(RMariaDB::MariaDB(),
default.file=db_credentials,
group=my_sql_db)
query <- "select revenue,RevenueGrowth, company, Report_date from financials"
## Send query request and display result
res <- dbGetQuery(my_conn,query)
revenue_df <- data.frame(res)
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 1036662 55.4 2006685 107.2 1548993 82.8
## Vcells 1837535 14.1 8388608 64.0 2995184 22.9
Table of Revenue and Growth of Big Banks
Table of Total Revenue From 2009-2019 of Big Banks
company | total_revenue |
---|---|
WFC | 795758000000 |
USB | 184094000000 |
PNC | 161516000000 |
BBT | 91973000000 |
STI | 76026762000 |
FITB | 60527000000 |
KEY | 44725000000 |
MTB | 43827760000 |
HBAN | 29365887000 |
CMA | 26165000000 |
ZION | 21116056000 |
Table of Average Quarterly Revenue From 2009-2019 of Big Banks
company | avg_quarterly_revenue |
---|---|
WFC | 19893950000 |
USB | 4490097561 |
PNC | 3939414634 |
BBT | 2243243902 |
STI | 1900669050 |
FITB | 1513175000 |
MTB | 1095694000 |
KEY | 1090853659 |
HBAN | 716241146 |
CMA | 638170732 |
ZION | 527901400 |
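For reference, both summary tables above can be reproduced from revenue_df with a short dplyr pipeline (a sketch; the exact chunk that built the kable tables is not echoed in this report):
revenue_df %>%
  group_by(company) %>%
  summarise(total_revenue = sum(revenue, na.rm = TRUE),
            avg_quarterly_revenue = mean(revenue, na.rm = TRUE)) %>%
  arrange(desc(total_revenue))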
Table of Cumulative Sum of Revenue From 2009-2019 of Big Banks
We can see that Wells Fargo is far larger than the rest, which distorts our comparisons, so let's exclude Wells Fargo.
The cumulative sum gives us a picture of overall strength in the market, and it appears some banks broke away around 2011, but looking at quarterly revenue could prove more fruitful.
The quarterly revenues tell a much clearer story: almost all of the banks have increased revenue since 2009. We can also explore growth rates like the ones we built in SQL. The RevenueGrowth column was already present in the API data, so we can compare it against a custom-built R growth calculation; the two agree to within .001 in all but one case (a rounding difference, see the table output below). The R code is printed below to show how little code is required.
revenue_df <- revenue_df %>%
arrange(company, Report_date) %>%
group_by(company) %>%
mutate(last_year = lag(revenue,4)) %>%
ungroup() %>%
mutate(growth=round((last_year-revenue)/last_year*-1,3))
## Growth table
datatable(revenue_df)
## Compare growth table to tables existing growth rate
table((revenue_df$RevenueGrowth - revenue_df$growth)<.001)
##
## FALSE TRUE
## 1 365
Then we can graph the growth rate.
The graph shows some intense swings in the growth rates of HBAN and KEY. My assumption is that they have relatively small market caps, or that they suffered prolonged damage from the recession. We could explore their market caps relative to the other banks. Another option is to filter the data from 2012 forward, which removes the huge post-recession outliers and gives a better sense of the typical swings. We could also pull out all of the negative-growth quarters and look for patterns (a quick sketch of both subsets appears below), and we would likely want individual graphs of these banks as well.
company | sum_revenue |
---|---|
PNC | 42044000000 |
USB | 38519000000 |
BBT | 18535000000 |
STI | 15827762000 |
FITB | 14279000000 |
KEY | 8878000000 |
MTB | 8688000000 |
CMA | 5386000000 |
HBAN | 4532340000 |
ZION | 4147812000 |
Graphing HBAN and KEY only
Getting only negative growth rates
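A sketch of how both subsets above could be pulled from revenue_df (hypothetical code; the report's own plotting chunks are not echoed):
## HBAN and KEY only, ready to pass to ggplot for individual graphs
hban_key <- revenue_df %>% filter(company %in% c("HBAN", "KEY"))
## Quarters with negative year-over-year growth
negative_growth <- revenue_df %>% filter(growth < 0) %>% arrange(growth)
datatable(negative_growth)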
library(plotly)
library(gganimate)
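## NOTE: cumsum_revnue is built in a chunk that is not echoed above. A minimal
## reconstruction from revenue_df might look like this (my assumption; the column
## name cum_sum_revenue_in_billions is chosen to match the code below):
cumsum_revnue <- revenue_df %>%
  arrange(company, Report_date) %>%
  group_by(company) %>%
  mutate(cum_sum_revenue_in_billions = cumsum(revenue) / 1e9) %>%
  ungroup()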
cumsum_revnue <- cumsum_revnue %>%
ungroup() %>%
arrange(Report_date) %>%
mutate(company= as.factor(company)) %>%
filter(year(Report_date)<2019) %>%
group_by(Report_date) %>%
mutate(ordering = min_rank(-cum_sum_revenue_in_billions * 1.0)) %>%
ungroup()
# animated barplot Race for cumulative revenue:
my_animation <- cumsum_revnue %>%
ggplot( aes(x=ordering,group = company,fill=company)) +
# geom_bar(stat='identity')+
geom_tile(aes(y =cum_sum_revenue_in_billions/2 ,
height = cum_sum_revenue_in_billions,
width = 0.9), alpha = 0.9) +
geom_text(aes(y = cum_sum_revenue_in_billions, label = company), vjust = -0.5) +
# text in x-axis (requires clip = "off" in coord_cartesian)
geom_text(aes(y = 0, label = company), vjust = 2) +
ylab("Cumulative Revenue Race pre 2019") +
coord_cartesian(clip = "off", expand = FALSE) +
labs(title=paste("CUMULATIVE Revenue From {closest_state}"), x = "") +
theme(plot.title = element_text(hjust = 1, size = 22),
axis.ticks.x = element_blank(),
axis.text.x = element_blank()) +
transition_states(Report_date,
transition_length = 2, state_length = 1)
animate(my_animation, 200, fps = 10, duration = 25, width = 800, height = 600, renderer = gifski_renderer("animation/this.gif") )
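Note that animate() with gifski_renderer() requires the gifski package to be installed, and the output path "animation/this.gif" assumes an animation/ folder already exists (or is created beforehand) in the project directory.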