This homework uses tech_financials.csv which we used in weeks 2 and 3.

Read tech_financials.csv into your R session using read.csv() function. Store the resulting object in d1.

library(tidyverse)

Load files

d1 = read.csv("tech_financials.csv")

Finally, before you begin, include only the rows with sale > 0.

d2 <- filter(d1, sale > 0)

This homework consists of 8 questions. Q1 carries 1 point. Q2 through Q8 carry two points each. We use d2 as the initial input. Your objective is to reproduce the output shown in the PDF file for Q1 through Q9.

Q1

Print a data frame with the medians of cogs, emp, and xrd.

cogs is Cost of goods sold (USD million) emp is Number of employees (In thousands) xrd is R&D expenses (USD million)

d3 = select(.data = d1, cogs, emp, xrd)
summarize(d3, 
          cogs_median = median(cogs),
          emp_median = median(emp, na.rm = TRUE),
          xrd_median = median(xrd))
##   cogs_median emp_median xrd_median
## 1        9348       34.6     3838.5

Q2

Print a data frame with the means of sale, oibdp, and xrd for Apple, Meta, and Tesla. For this, you will need to follow these steps:

  1. Filter only the observations pertaining to Apple, Meta, and Tesla
  2. Group by conm
  3. Summarize sale, oibdp, and xrd to get their means
  4. Output it as a data frame by using as.data.frame() function.

In Step 3, you may optionally use across() inside summarize(). This function will save you writing more code. Read more about it here: https://dplyr.tidyverse.org/reference/across.html

d1a<- d1 %>% 
  select(conm, sale, oibdp, xrd) %>% 
filter(conm == "APPLE INC" |
       conm == "META PLATFORMS INC" |
         conm == "TESLA INC")
summarize(group_by(d1, conm),
          sale_mean = mean(sale),
          oibdp_mean = mean(oibdp),
          xrd_mean = mean(xrd))
## # A tibble: 10 × 4
##    conm                   sale_mean oibdp_mean xrd_mean
##    <chr>                      <dbl>      <dbl>    <dbl>
##  1 ADVANCED MICRO DEVICES     6857        761.    1451.
##  2 ALPHABET INC             104849.     33849.   15244 
##  3 AMAZON.COM INC           174747.     15560.   19988.
##  4 APPLE INC                210633.     67246.    9910 
##  5 META PLATFORMS INC        37313.     18267.    7623 
##  6 MICROSOFT CORP           106529.     44964.   14041.
##  7 NETFLIX INC               11400.      1584.     919.
##  8 QUALCOMM INC              22139.      7043.    5017.
##  9 TESLA INC                 13346.      1431.     929.
## 10 TWITTER INC                2271.       246.     628.

Q3

Round all the numeric variables in the above data frame to 1 decimal place. Output as a data frame using as.data.frame() function.

For rounding, you will have to use mutate, across, and where functions from dplyr package. Check https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/ for more information.

d1b<- d1 %>%
   select(conm, sale, oibdp, xrd) %>% 
filter(conm == "APPLE INC" |
       conm == "META PLATFORMS INC" |
         conm == "TESLA INC")
summarize(group_by(d1, conm),
          sale_mean =round( mean(sale),1),
          oibdp_mean =round( mean(oibdp),1),
          xrd_mean =round( mean(xrd),1))
## # A tibble: 10 × 4
##    conm                   sale_mean oibdp_mean xrd_mean
##    <chr>                      <dbl>      <dbl>    <dbl>
##  1 ADVANCED MICRO DEVICES     6857        761.    1451.
##  2 ALPHABET INC             104849.     33849.   15244 
##  3 AMAZON.COM INC           174747.     15560.   19988.
##  4 APPLE INC                210633.     67246.    9910 
##  5 META PLATFORMS INC        37313.     18267.    7623 
##  6 MICROSOFT CORP           106529.     44964.   14041.
##  7 NETFLIX INC               11400.      1584      919 
##  8 QUALCOMM INC              22139.      7043.    5017.
##  9 TESLA INC                 13346.      1431.     929.
## 10 TWITTER INC                2271.       246.     628.

Q4

Many advertising values are missing. The missing code in R is NA. We can get the total number of missing values for advertising quite easily by running the following function:

sum(is.na(d2$xad))
## [1] 28

In the finance literature, a common (but incorrect) practice is to assume that the missing advertising is 0. We will use this adjustment to xad and create a new variable adv and save it in a new object d3.

The first six values of d3 when xad is NA are as follows:

d1<- mutate(d1, 
            adv = 0, xad = NA)
d3 = select(.data = d1, conm, datadate, xad, adv)
head(d3)
##                     conm datadate xad adv
## 1 ADVANCED MICRO DEVICES 20101231  NA   0
## 2 ADVANCED MICRO DEVICES 20111231  NA   0
## 3 ADVANCED MICRO DEVICES 20121231  NA   0
## 4 ADVANCED MICRO DEVICES 20131231  NA   0
## 5 ADVANCED MICRO DEVICES 20141231  NA   0
## 6 ADVANCED MICRO DEVICES 20151231  NA   0

Q5

Using d3, create the following variables and print first 8 rows for Netflix and the new columns along with conm and datadate:

  1. Return on assets (roa) = oibdp / at
  2. Free cash flow (fcf) = oancf / che
  3. Strategic emphasis (strat_emph) = (adv - xrd) / at
d3<- mutate(d1,
       roa = oibdp / at,
       fcf = oancf / che,
       strat_emph = adv - xrd / at) 
d4 <- d3 %>% 
  select(conm, datadate, roa, fcf, strat_emph) %>% 
filter(conm == "NETFLIX INC")
       summarize(group_by(d4, conm))
## # A tibble: 1 × 1
##   conm       
##   <chr>      
## 1 NETFLIX INC
head(d4,8)
##          conm datadate        roa         fcf  strat_emph
## 1 NETFLIX INC 20101231 0.32140984  0.78884491 -0.16631146
## 2 NETFLIX INC 20111231 0.13971574  0.39822966 -0.08439767
## 3 NETFLIX INC 20121231 0.02405838  0.03043132 -0.08291762
## 4 NETFLIX INC 20131231 0.05112569  0.08149833 -0.06997960
## 5 NETFLIX INC 20141231 0.06471568  0.01024746 -0.06693274
## 6 NETFLIX INC 20151231 0.03607896 -0.32433208 -0.06378479
## 7 NETFLIX INC 20161231 0.03218765 -0.85015533 -0.06271601
## 8 NETFLIX INC 20171231 0.04789367 -0.63268781 -0.05537223

Q6

You want to know how many profitable years each of the sample company experienced. For this follow these steps:

  1. Create an indicator variable (dummy variable) called profit_ind such that when oipbd > 0 this variable is 1. Otherwise it is 0.
  2. Group by company names
  3. Summarize profit_ind by taking its sum. Also, get the total number of observations for each company.
 d9 <- d3 %>% 
  select(conm,oibdp) %>% 
mutate(profit_ind=ifelse(oibdp>0,1,0)) %>% 
  group_by(conm) %>% 
       summarize(profit_years=sum(profit_ind), total_years=n())

Q7

Find the average annual stock returns of all the companies. Follow these steps:

  1. Arrange the data set by conm and datadate.
  2. Group by conm
  3. Calculate stock return stk_ret by taking the difference between mkvalt (Market value of equity) and its lag and then divide the difference by the lag of mkvalt
  4. Summarize to get the mean of the stock returns stk_ret_mean.
  5. Display the average stock returns in percentage format.
d5<-d1 %>% 
  arrange(d1,conm, datadate) 
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
d10 <- d3 %>%
  select(conm, datadate, mkvalt) %>% 
  group_by(conm) %>% 
  arrange(datadate, .by_group = T) %>% 
  mutate(stk_ret=mkvalt/ lag(mkvalt)-1) %>% 
  summarise(stk_ret_mean=round(mean(stk_ret, na.rm=T) *100,1)) %>%
  mutate(stk_ret_mean=label_percent()(stk_ret_mean/100))

d10
## # A tibble: 10 × 2
##    conm                   stk_ret_mean
##    <chr>                  <chr>       
##  1 ADVANCED MICRO DEVICES 69.1%       
##  2 ALPHABET INC           25.4%       
##  3 AMAZON.COM INC         37.2%       
##  4 APPLE INC              27.0%       
##  5 META PLATFORMS INC     40.0%       
##  6 MICROSOFT CORP         21.8%       
##  7 NETFLIX INC            58.2%       
##  8 QUALCOMM INC           9.1%        
##  9 TESLA INC              131.7%      
## 10 TWITTER INC            6.0%

Q8

In many statistical and machine learning applications, we use scaled variables instead of the original variables. A scaled variable is typically created by subtracting the sample mean of the variable from the variable and dividing it by its standard deviation. There is a scale() function in base R which can directly do it.

You want to create a scaled variable for sale but separately for each company. Therefore, you can’t use the mean and standard deviation of sale for the entire sample. Instead, you have to calculate these statistics for each company separately and then create a scaled variable. Follow these steps:

  1. Group by conm
  2. Summarize sale to get the mean (sale_mean) and the standard deviation (sale_sd)
  3. Assign this data frame to d3_sum
  4. Join d3 and d3_sum by conm
  5. Create sale_scaled by subtracting sale_mean from sale and dividing this difference by sale_sd

Print the first 10 rows for Twitter with conm, sale, sale_scaled, sale_mean, and sale_sd using as.data.frame()

d3_sum = d1 %>% 
  group_by(conm) %>% 
  summarize(sale_mean = mean(sale), sale_sd = sd(sale))
d8 = inner_join(d1, d3_sum, by = "conm")
d3_sum<- d8 %>% 
  mutate(d8,
            sale_scaled = sale - sale_mean / sale_sd) %>%
  select(conm, sale, sale_scaled, sale_mean, sale_sd) %>% 
filter(conm == "TWITTER INC")
head(d3_sum, 10)
##           conm     sale sale_scaled sale_mean  sale_sd
## 1  TWITTER INC  106.313    104.8445  2270.692 1546.258
## 2  TWITTER INC  316.933    315.4645  2270.692 1546.258
## 3  TWITTER INC  664.890    663.4215  2270.692 1546.258
## 4  TWITTER INC 1403.002   1401.5335  2270.692 1546.258
## 5  TWITTER INC 2218.032   2216.5635  2270.692 1546.258
## 6  TWITTER INC 2529.619   2528.1505  2270.692 1546.258
## 7  TWITTER INC 2443.299   2441.8305  2270.692 1546.258
## 8  TWITTER INC 3042.359   3040.8905  2270.692 1546.258
## 9  TWITTER INC 3459.329   3457.8605  2270.692 1546.258
## 10 TWITTER INC 3716.349   3714.8805  2270.692 1546.258