Homework 2 is all about using ggplot2. You will use tech_co_cstat.dta (or .zip) data set you have used previously in Homework 1. You are aware of its structure and the meanings of the variables. Recall that you explored it in Homework 1. Knowing your data set well before you start exploring it is absolutely essential for data science.
d1 <- read_csv("tech_financials.csv", show_col_types = FALSE) %>%
filter(sale > 0) %>%
mutate(conm = stringr::str_to_title(conm), # Converts the string to title case
datadate = lubridate::ymd(datadate)) # Convert datadate into a date variable
View(d1)
Create a bar graph of the average sales of each company using the variable sale such that the bars are arranged in an ascending order. I’m giving you the code to prepare the data set to make the plot:
d1_1 <- d1 %>%
group_by(conm) %>%
summarize(avg_sale = mean(sale), .groups = "drop")
View(d1)
d1_1 has the appropriate variable you need to create the bar plot. Notice that I have used .groups = "drop" inside summarize(). This makes sure that the data you will use in ggplot() is not grouped.
d1_1 <- d1 %>%
group_by(conm) %>%
summarize(avg_sale = mean(sale),.groups = "drop") %>%
ggplot(aes(x = avg_sale, y = reorder(conm,(avg_sale))))+
geom_bar(stat = 'identity')+
labs(x = "Average Sale in $millions", y = "Company")+
scale_x_continuous(labels = scales::dollar_format(prefix = "$"))
d1_1
Modify the plot in Q1 to add text labels to the bars. Note that I used hjust = -0.2 and size = 3 for the text labels.
d1_2 <- d1 %>%
group_by(conm) %>%
summarize(avg_sale = mean(sale),.groups = "drop") %>%
ggplot(aes(x = avg_sale, y = reorder(conm,(avg_sale))))+
geom_bar(stat = 'identity', size = 3)+
geom_text(aes(label = round(avg_sale/1000), hjust=-0.2)) +
labs(x = "Average Sale in $millions", y = "Company")+
scale_x_continuous(labels = scales::dollar_format(prefix = "$"))
d1_2
In finance, it is widely believed that companies with more R&D prefer lower debt. Let’s explore whether we can observe this relationship in our data. Using mutate, first create these two variables. (I am giving you the exact formulas to put inside mutate().)
debt_ratio = (dlc + replace_na(dltt, 0)) / at
rnd_int = xrd / sale
Next, create a scatterplot with debt_ratio on the X axis and rnd_int on the Y axis.
d1_3 <- d1 %>% mutate(debtratio = (dlc + replace_na(dltt,0))/at)%>%
mutate(rnd_int=xrd/sale)%>%
ggplot(data = ., aes(x = debtratio, y = rnd_int)) +
geom_point(colour = "black", size = 2, pch = 21, fill = "grey")+
geom_smooth(method = lm, se = TRUE, fullrange = TRUE) +
labs(x = "Debt Ratio", y = "R&D to Sales Ratio")
d1_3
Profit margin is simply profits divided by sales. Compare profit margins of the following six companies - Apple, Tesla, Meta, Twitter, Amazon, and Qualcomm - over the full sample period. Use fyear on the X axis. fyear is the fiscal year.
Here I give you the code to get the data set in the required form. First, note that I am using the variable tic to filter certain “ticker symbols”, which are the IDs used by stock markets for companies. I am doing this simply to save on typing rather than writing out the entire company names! You could also use gvkey as it is a company identifier. But gvkey are not intuitive. Ticker symbols can help you guess (in most cases) what a stock is.
As you are using fyear for plotting a time series, we have to make sure that fyear is indeed interpreted by ggplot2 as a time variable. However, it’s not that straightforward. This is because fiscal years, unlike calendar years, don’t all end exactly on the same day! I know it sounds insane but that’s true. Think about this like school years in different school districts. If the fiscal year ends in different months, how can we create a valid comparison among these companies? Indeed, the variable datadate, which is the fiscal year end date, is not the same for all the companies for any given fiscal year. Luckily we are dealing with annual data and so we can artificially choose to pick a common year end date for the sake of making the plot. Note that this is not the right thing to do for statistical or financial analysis! This simply helps us in making a meaningful plot. As such, I am setting the year end date for all the fiscal years to December 31st. Below, paste function will create strings in “yyyy-mm-dd” format with mm being 12 and dd being 31 as show below. Next, as.Date() function from base R will convert it into an actual date format!
d1_4 <- d1 %>%
filter(tic %in% c("AAPL", "TSLA", "META", "TWTR", "AMZN", "QCOM")) %>%
mutate(pr_margin = oibdp / sale,
fyear = as.Date(paste(fyear, "12", "31", sep = "-")))
d1_4
## # A tibble: 71 x 982
## gvkey datadate fyear indfmt consol popsrc datafmt tic cusip conm
## <chr> <date> <date> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 001690 2010-09-30 2010-12-31 INDL C D STD AAPL 037833~ Appl~
## 2 001690 2011-09-30 2011-12-31 INDL C D STD AAPL 037833~ Appl~
## 3 001690 2012-09-30 2012-12-31 INDL C D STD AAPL 037833~ Appl~
## 4 001690 2013-09-30 2013-12-31 INDL C D STD AAPL 037833~ Appl~
## 5 001690 2014-09-30 2014-12-31 INDL C D STD AAPL 037833~ Appl~
## 6 001690 2015-09-30 2015-12-31 INDL C D STD AAPL 037833~ Appl~
## 7 001690 2016-09-30 2016-12-31 INDL C D STD AAPL 037833~ Appl~
## 8 001690 2017-09-30 2017-12-31 INDL C D STD AAPL 037833~ Appl~
## 9 001690 2018-09-30 2018-12-31 INDL C D STD AAPL 037833~ Appl~
## 10 001690 2019-09-30 2019-12-31 INDL C D STD AAPL 037833~ Appl~
## # ... with 61 more rows, and 972 more variables: acctchg <chr>, acctstd <chr>,
## # acqmeth <chr>, adrr <lgl>, ajex <dbl>, ajp <dbl>, bspr <lgl>, compst <chr>,
## # curcd <chr>, curncd <chr>, currtr <dbl>, curuscn <lgl>, final <chr>,
## # fyr <dbl>, ismod <dbl>, ltcm <dbl>, ogm <lgl>, pddur <dbl>, scf <dbl>,
## # src <dbl>, stalt <lgl>, udpl <lgl>, upd <dbl>, apdedate <dbl>, fdate <dbl>,
## # pdate <dbl>, acchg <dbl>, acco <lgl>, accrt <lgl>, acdo <dbl>, aco <dbl>,
## # acodo <dbl>, acominc <dbl>, acox <dbl>, acoxar <lgl>, acqao <dbl>, ...
Now use d1_4 to create the following plot.
d1_41 <- d1_4 %>%
ggplot(aes(x = fyear,y = pr_margin)) +
geom_line() +theme_bw()+
geom_hline(yintercept = 0, color="red")+
labs(x = "Fiscal Year", y = "Profit Margin" )+
scale_y_continuous(labels = c("-100%", "-50%","0%","50%"))
d1_41 <- d1_41 + facet_wrap(~conm)
print(d1_41)
Tesla is the largest car manufacturer in the world by market value. But what about sales? Let’s compare sales and market value over the 10/11 years period in our sample.
First create a data frame that you can use to create a plot where you can compare sales and market value in the same plot. This requires rearranging the data into “long” form, where we will stack Tesla’s sales and market value on top of each other.
Here is the code to create such a data set. Please read it carefully to understand all the steps.
d1_5 <- d1 %>%
filter(conm == "Tesla Inc") %>%
mutate(mkt_val = prcc_f * cshpri) %>% # Create market value
select(conm, datadate, mkt_val, sale) %>%
pivot_longer(cols = c(mkt_val, sale),
names_to = "fin_var",
values_to = "fin_value")
Print first few rows of d1_5 by using head() function to understand what this data set is.
Now using d1_5, create the following plot using datadate on the X axis:
d1_5
## # A tibble: 24 x 4
## conm datadate fin_var fin_value
## <chr> <date> <chr> <dbl>
## 1 Tesla Inc 2010-12-31 mkt_val 2481.
## 2 Tesla Inc 2010-12-31 sale 117.
## 3 Tesla Inc 2011-12-31 mkt_val 2867.
## 4 Tesla Inc 2011-12-31 sale 204.
## 5 Tesla Inc 2012-12-31 mkt_val 3636.
## 6 Tesla Inc 2012-12-31 sale 413.
## 7 Tesla Inc 2013-12-31 mkt_val 17964.
## 8 Tesla Inc 2013-12-31 sale 2013.
## 9 Tesla Inc 2014-12-31 mkt_val 27699.
## 10 Tesla Inc 2014-12-31 sale 3198.
## # ... with 14 more rows
d1_51 <- d1_5 %>%
mutate(datadate = as.Date(paste(datadate, "12", "31", sep = "-")))%>%
ggplot(aes(datadate,fin_value, color = fin_var)) +
geom_line(size = 1.5) +theme(legend.position = "top")+
labs(x = "Date", y =" " )+
scale_y_continuous(labels = scales::dollar_format(prefix = "$"))+
scale_color_discrete("Financial variable")+
ggtitle("Tesla's Sales and Market Value (Million USD)")+
theme(plot.title = element_text(face = "bold"))
d1_51
When the time variable is discrete, we can also show a time trend using a bar plot. This is quite common in practice. fyear is an integer so we can use it as a discrete variable and create a bar plot of profits for Facebook and IBM as shown below. Manually change the fill of bars using the following colors: c("#5cc9f5", "#b131a2")
d1_6 <- d1%>%filter(conm =="Meta Platforms Inc"| conm =="Microsoft Corp")%>%
ggplot()+
geom_col(aes(x=fyear,y = oibdp,fill = conm))+
facet_wrap(~conm, nrow = 2)+
theme(legend.position = "top", legend.title = element_blank())+
scale_fill_manual(values = alpha(c("#5cc9f5", "#b131a2"), 5))+
scale_x_continuous(breaks = c(2010, 2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022), labels=c("2010","2011","2012","2013","2014","2015","2016","2017","2018","2019","2020","2021", "2022"))+
labs(y = "Profits in $million", x = "Fiscal Year")
d1_6
Use Mark Zuckerberg’s cutout to create the following visualization. You are free to position the picture anywhere and in any size you want. Just don’t cover the bars.
d1_7 <- d1%>%filter(conm =="Meta Platforms Inc")%>%
ggplot()+
geom_col(aes(x=fyear,y = oibdp,fill = conm))+
theme(legend.position = "top", legend.title = element_blank())+
scale_fill_manual(values = alpha(c("#5cc9f5"), 5))+
scale_x_continuous(breaks = c(2010, 2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022), labels=c("2010","2011","2012","2013","2014","2015","2016","2017","2018","2019","2020","2021", "2022"))+
labs(y = "Profits in $million", x = "Fiscal Year")+
annotation_raster(readPNG("mark-zuckerberg-celebrity-mask.png"),
xmin = 2010, xmax = 2013, ymin = 22500, ymax = 35000, interpolate = T)
d1_7
You don’t have to actually make this plot but it’s fun to make it! I have shared the cutouts of Musk and Cook in case you want to give it a shot.
d1_8 <- d1 %>%
filter(conm == "Tesla Inc"|conm =="Meta Platforms Inc"| conm =="Apple Inc") %>%
mutate(mkt_val = prcc_f * cshpri) %>% # Create market value
mutate(datadate = as.Date(paste(datadate, "12", "31", sep = "-")))%>%
ggplot(aes(datadate,mkt_val, color = conm)) +
geom_line(size = 1.5) +theme(legend.position = "top")+
labs(x = " ", y ="Market Value in $million " )+
scale_y_continuous(labels = scales::dollar_format(prefix = "$"))
#annotation_raster(readPNG("tim_cook_headshot_glasses.jpg"),
# xmin = 2022, xmax = 2023, ymin = 22500, ymax = 35000, interpolate = T)
d1_8