Homework 2 is all about using ggplot2. You will use tech_co_cstat.dta (or .zip) data set you have used previously in Homework 1. You are aware of its structure and the meanings of the variables. Recall that you explored it in Homework 1. Knowing your data set well before you start exploring it is absolutely essential for data science.

d1 <- read_csv("tech_financials.csv", show_col_types = FALSE) %>% 
  filter(sale > 0) %>% 
  mutate(conm = stringr::str_to_title(conm), # Converts the string to title case
         datadate = lubridate::ymd(datadate)) # Convert datadate into a date variable
View(d1)

Q1 (3 points)

Create a bar graph of the average sales of each company using the variable sale such that the bars are arranged in an ascending order. I’m giving you the code to prepare the data set to make the plot:

d1_1 <- d1 %>%
  group_by(conm) %>% 
  summarize(avg_sale = mean(sale), .groups = "drop")
View(d1)

d1_1 has the appropriate variable you need to create the bar plot. Notice that I have used .groups = "drop" inside summarize(). This makes sure that the data you will use in ggplot() is not grouped.

d1_1 <- d1 %>% 
              group_by(conm) %>% 
              summarize(avg_sale = mean(sale),.groups = "drop") %>% 
              ggplot(aes(x = avg_sale, y = reorder(conm,(avg_sale))))+ 
                geom_bar(stat = 'identity')+
            labs(x = "Average Sale in $millions", y = "Company")+
            scale_x_continuous(labels = scales::dollar_format(prefix = "$"))
d1_1

Q2 (1 point)

Modify the plot in Q1 to add text labels to the bars. Note that I used hjust = -0.2 and size = 3 for the text labels.

d1_2 <- d1 %>% 
              group_by(conm) %>% 
              summarize(avg_sale = mean(sale),.groups = "drop") %>% 
              ggplot(aes(x = avg_sale, y = reorder(conm,(avg_sale))))+ 
                geom_bar(stat = 'identity', size = 3)+
            geom_text(aes(label = round(avg_sale/1000), hjust=-0.2)) +
            labs(x = "Average Sale in $millions", y = "Company")+
            scale_x_continuous(labels = scales::dollar_format(prefix = "$"))
            
                
d1_2

Q3 (2 points)

In finance, it is widely believed that companies with more R&D prefer lower debt. Let’s explore whether we can observe this relationship in our data. Using mutate, first create these two variables. (I am giving you the exact formulas to put inside mutate().)

debt_ratio = (dlc + replace_na(dltt, 0)) / at

rnd_int = xrd / sale

Next, create a scatterplot with debt_ratio on the X axis and rnd_int on the Y axis.

d1_3 <- d1 %>% mutate(debtratio = (dlc + replace_na(dltt,0))/at)%>%
                mutate(rnd_int=xrd/sale)%>%
                ggplot(data = ., aes(x = debtratio, y = rnd_int)) +
                geom_point(colour = "black", size = 2, pch = 21, fill = "grey")+
                geom_smooth(method = lm, se = TRUE, fullrange = TRUE) +
                labs(x = "Debt Ratio", y = "R&D to Sales Ratio")
 

            
d1_3

Q4 (2 points)

Profit margin is simply profits divided by sales. Compare profit margins of the following six companies - Apple, Tesla, Meta, Twitter, Amazon, and Qualcomm - over the full sample period. Use fyear on the X axis. fyear is the fiscal year.

Here I give you the code to get the data set in the required form. First, note that I am using the variable tic to filter certain “ticker symbols”, which are the IDs used by stock markets for companies. I am doing this simply to save on typing rather than writing out the entire company names! You could also use gvkey as it is a company identifier. But gvkey are not intuitive. Ticker symbols can help you guess (in most cases) what a stock is.

As you are using fyear for plotting a time series, we have to make sure that fyear is indeed interpreted by ggplot2 as a time variable. However, it’s not that straightforward. This is because fiscal years, unlike calendar years, don’t all end exactly on the same day! I know it sounds insane but that’s true. Think about this like school years in different school districts. If the fiscal year ends in different months, how can we create a valid comparison among these companies? Indeed, the variable datadate, which is the fiscal year end date, is not the same for all the companies for any given fiscal year. Luckily we are dealing with annual data and so we can artificially choose to pick a common year end date for the sake of making the plot. Note that this is not the right thing to do for statistical or financial analysis! This simply helps us in making a meaningful plot. As such, I am setting the year end date for all the fiscal years to December 31st. Below, paste function will create strings in “yyyy-mm-dd” format with mm being 12 and dd being 31 as show below. Next, as.Date() function from base R will convert it into an actual date format!

d1_4 <- d1 %>% 
  filter(tic %in% c("AAPL", "TSLA", "META", "TWTR", "AMZN", "QCOM")) %>% 
  mutate(pr_margin = oibdp / sale,
         fyear = as.Date(paste(fyear, "12", "31", sep = "-")))
d1_4
## # A tibble: 71 x 982
##    gvkey  datadate   fyear      indfmt consol popsrc datafmt tic   cusip   conm 
##    <chr>  <date>     <date>     <chr>  <chr>  <chr>  <chr>   <chr> <chr>   <chr>
##  1 001690 2010-09-30 2010-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
##  2 001690 2011-09-30 2011-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
##  3 001690 2012-09-30 2012-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
##  4 001690 2013-09-30 2013-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
##  5 001690 2014-09-30 2014-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
##  6 001690 2015-09-30 2015-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
##  7 001690 2016-09-30 2016-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
##  8 001690 2017-09-30 2017-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
##  9 001690 2018-09-30 2018-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
## 10 001690 2019-09-30 2019-12-31 INDL   C      D      STD     AAPL  037833~ Appl~
## # ... with 61 more rows, and 972 more variables: acctchg <chr>, acctstd <chr>,
## #   acqmeth <chr>, adrr <lgl>, ajex <dbl>, ajp <dbl>, bspr <lgl>, compst <chr>,
## #   curcd <chr>, curncd <chr>, currtr <dbl>, curuscn <lgl>, final <chr>,
## #   fyr <dbl>, ismod <dbl>, ltcm <dbl>, ogm <lgl>, pddur <dbl>, scf <dbl>,
## #   src <dbl>, stalt <lgl>, udpl <lgl>, upd <dbl>, apdedate <dbl>, fdate <dbl>,
## #   pdate <dbl>, acchg <dbl>, acco <lgl>, accrt <lgl>, acdo <dbl>, aco <dbl>,
## #   acodo <dbl>, acominc <dbl>, acox <dbl>, acoxar <lgl>, acqao <dbl>, ...

Now use d1_4 to create the following plot.

d1_41 <- d1_4 %>% 
  ggplot(aes(x = fyear,y = pr_margin)) +
  geom_line() +theme_bw()+
  geom_hline(yintercept = 0, color="red")+
  labs(x = "Fiscal Year", y = "Profit Margin" )+
   scale_y_continuous(labels = c("-100%", "-50%","0%","50%"))
d1_41 <- d1_41 + facet_wrap(~conm)




print(d1_41)

Q5 (3 points)

Tesla is the largest car manufacturer in the world by market value. But what about sales? Let’s compare sales and market value over the 10/11 years period in our sample.

First create a data frame that you can use to create a plot where you can compare sales and market value in the same plot. This requires rearranging the data into “long” form, where we will stack Tesla’s sales and market value on top of each other.

Here is the code to create such a data set. Please read it carefully to understand all the steps.

d1_5 <- d1 %>% 
  filter(conm == "Tesla Inc") %>% 
  mutate(mkt_val = prcc_f * cshpri) %>% # Create market value
  select(conm, datadate, mkt_val, sale) %>% 
  pivot_longer(cols = c(mkt_val, sale), 
               names_to = "fin_var", 
               values_to = "fin_value")

Print first few rows of d1_5 by using head() function to understand what this data set is.

Now using d1_5, create the following plot using datadate on the X axis:

d1_5 
## # A tibble: 24 x 4
##    conm      datadate   fin_var fin_value
##    <chr>     <date>     <chr>       <dbl>
##  1 Tesla Inc 2010-12-31 mkt_val     2481.
##  2 Tesla Inc 2010-12-31 sale         117.
##  3 Tesla Inc 2011-12-31 mkt_val     2867.
##  4 Tesla Inc 2011-12-31 sale         204.
##  5 Tesla Inc 2012-12-31 mkt_val     3636.
##  6 Tesla Inc 2012-12-31 sale         413.
##  7 Tesla Inc 2013-12-31 mkt_val    17964.
##  8 Tesla Inc 2013-12-31 sale        2013.
##  9 Tesla Inc 2014-12-31 mkt_val    27699.
## 10 Tesla Inc 2014-12-31 sale        3198.
## # ... with 14 more rows
d1_51 <- d1_5 %>%
  mutate(datadate = as.Date(paste(datadate, "12", "31", sep = "-")))%>%
  ggplot(aes(datadate,fin_value, color = fin_var)) +
  geom_line(size = 1.5) +theme(legend.position = "top")+
  labs(x = "Date", y =" " )+
  scale_y_continuous(labels = scales::dollar_format(prefix = "$"))+
  scale_color_discrete("Financial variable")+
  ggtitle("Tesla's Sales and Market Value (Million USD)")+
  theme(plot.title = element_text(face = "bold"))


d1_51

Q6 (2 points)

When the time variable is discrete, we can also show a time trend using a bar plot. This is quite common in practice. fyear is an integer so we can use it as a discrete variable and create a bar plot of profits for Facebook and IBM as shown below. Manually change the fill of bars using the following colors: c("#5cc9f5", "#b131a2")

d1_6 <- d1%>%filter(conm =="Meta Platforms Inc"| conm =="Microsoft Corp")%>%
            ggplot()+
            geom_col(aes(x=fyear,y = oibdp,fill = conm))+
            facet_wrap(~conm, nrow = 2)+
            theme(legend.position = "top", legend.title = element_blank())+
            scale_fill_manual(values = alpha(c("#5cc9f5", "#b131a2"), 5))+
            scale_x_continuous(breaks = c(2010, 2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022),                 labels=c("2010","2011","2012","2013","2014","2015","2016","2017","2018","2019","2020","2021",                                                                                                            "2022"))+
             labs(y = "Profits in $million", x = "Fiscal Year")
  
  
d1_6

Q7 (2 points)

Use Mark Zuckerberg’s cutout to create the following visualization. You are free to position the picture anywhere and in any size you want. Just don’t cover the bars.

d1_7 <- d1%>%filter(conm =="Meta Platforms Inc")%>%
            ggplot()+
            geom_col(aes(x=fyear,y = oibdp,fill = conm))+
            theme(legend.position = "top", legend.title = element_blank())+
            scale_fill_manual(values = alpha(c("#5cc9f5"), 5))+
            scale_x_continuous(breaks = c(2010, 2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022),                 labels=c("2010","2011","2012","2013","2014","2015","2016","2017","2018","2019","2020","2021",                                                                                                            "2022"))+
             labs(y = "Profits in $million", x = "Fiscal Year")+
             annotation_raster(readPNG("mark-zuckerberg-celebrity-mask.png"),
                    xmin = 2010, xmax = 2013, ymin = 22500, ymax = 35000, interpolate = T) 
  
  
  
d1_7

Fun plot! (Not graded)

You don’t have to actually make this plot but it’s fun to make it! I have shared the cutouts of Musk and Cook in case you want to give it a shot.

d1_8 <- d1 %>% 
          filter(conm == "Tesla Inc"|conm =="Meta Platforms Inc"| conm =="Apple Inc") %>% 
          mutate(mkt_val = prcc_f * cshpri) %>% # Create market value
          mutate(datadate = as.Date(paste(datadate, "12", "31", sep = "-")))%>%
          ggplot(aes(datadate,mkt_val, color = conm)) +
          geom_line(size = 1.5) +theme(legend.position = "top")+
          labs(x = " ", y ="Market Value in $million " )+
          scale_y_continuous(labels = scales::dollar_format(prefix = "$"))
          #annotation_raster(readPNG("tim_cook_headshot_glasses.jpg"),
           #         xmin = 2022, xmax = 2023, ymin = 22500, ymax = 35000, interpolate = T) 

d1_8