This homework uses tech_financials.csv which we used in
weeks 2 and 3.
Read tech_financials.csv into your R session using
read.csv() function. Store the resulting object in
d1.
library(tidyverse)
d1 = read.csv("tech_financials.csv")
Finally, before you begin, include only the rows with
sale > 0.
d2 <- filter(d1, sale > 0)
This homework consists of 8 questions. Q1 carries 1 point. Q2 through Q8
carry two points each. We use d2 as the initial input. Your
objective is to reproduce the output shown in the PDF file for Q1
through Q9.
Print a data frame with the medians of cogs,
emp, and xrd.
cogs is Cost of goods sold (USD million)
emp is Number of employees (In thousands) xrd
is R&D expenses (USD million)
d3 = select(.data = d1, cogs, emp, xrd)
summarize(d3,
cogs_median = median(cogs),
emp_median = median(emp, na.rm = TRUE),
xrd_median = median(xrd))
## cogs_median emp_median xrd_median
## 1 9348 34.6 3838.5
Print a data frame with the means of sale,
oibdp, and xrd for Apple, Meta, and Tesla. For
this, you will need to follow these steps:
conmsale, oibdp, and
xrd to get their meansas.data.frame()
function.In Step 3, you may optionally use
across() inside summarize(). This function
will save you writing more code. Read more about it here: https://dplyr.tidyverse.org/reference/across.html
d1a<- d1 %>%
select(conm, sale, oibdp, xrd) %>%
filter(conm == "APPLE INC" |
conm == "META PLATFORMS INC" |
conm == "TESLA INC")
summarize(group_by(d1, conm),
sale_mean = mean(sale),
oibdp_mean = mean(oibdp),
xrd_mean = mean(xrd))
## # A tibble: 10 × 4
## conm sale_mean oibdp_mean xrd_mean
## <chr> <dbl> <dbl> <dbl>
## 1 ADVANCED MICRO DEVICES 6857 761. 1451.
## 2 ALPHABET INC 104849. 33849. 15244
## 3 AMAZON.COM INC 174747. 15560. 19988.
## 4 APPLE INC 210633. 67246. 9910
## 5 META PLATFORMS INC 37313. 18267. 7623
## 6 MICROSOFT CORP 106529. 44964. 14041.
## 7 NETFLIX INC 11400. 1584. 919.
## 8 QUALCOMM INC 22139. 7043. 5017.
## 9 TESLA INC 13346. 1431. 929.
## 10 TWITTER INC 2271. 246. 628.
Round all the numeric variables in the above data frame to 1 decimal
place. Output as a data frame using as.data.frame()
function.
For rounding, you will have to use mutate,
across, and where functions from
dplyr package. Check https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/
for more information.
d1b<- d1 %>%
select(conm, sale, oibdp, xrd) %>%
filter(conm == "APPLE INC" |
conm == "META PLATFORMS INC" |
conm == "TESLA INC")
summarize(group_by(d1, conm),
sale_mean =round( mean(sale),1),
oibdp_mean =round( mean(oibdp),1),
xrd_mean =round( mean(xrd),1))
## # A tibble: 10 × 4
## conm sale_mean oibdp_mean xrd_mean
## <chr> <dbl> <dbl> <dbl>
## 1 ADVANCED MICRO DEVICES 6857 761. 1451.
## 2 ALPHABET INC 104849. 33849. 15244
## 3 AMAZON.COM INC 174747. 15560. 19988.
## 4 APPLE INC 210633. 67246. 9910
## 5 META PLATFORMS INC 37313. 18267. 7623
## 6 MICROSOFT CORP 106529. 44964. 14041.
## 7 NETFLIX INC 11400. 1584 919
## 8 QUALCOMM INC 22139. 7043. 5017.
## 9 TESLA INC 13346. 1431. 929.
## 10 TWITTER INC 2271. 246. 628.
Many advertising values are missing. The missing code in R is
NA. We can get the total number of missing values for
advertising quite easily by running the following function:
sum(is.na(d2$xad))
## [1] 28
In the finance literature, a common (but incorrect) practice is to
assume that the missing advertising is 0. We will use this adjustment to
xad and create a new variable adv and save it
in a new object d3.
The first six values of d3 when xad is NA
are as follows:
d1<- mutate(d1,
adv = 0, xad = NA)
d3 = select(.data = d1, conm, datadate, xad, adv)
head(d3)
## conm datadate xad adv
## 1 ADVANCED MICRO DEVICES 20101231 NA 0
## 2 ADVANCED MICRO DEVICES 20111231 NA 0
## 3 ADVANCED MICRO DEVICES 20121231 NA 0
## 4 ADVANCED MICRO DEVICES 20131231 NA 0
## 5 ADVANCED MICRO DEVICES 20141231 NA 0
## 6 ADVANCED MICRO DEVICES 20151231 NA 0
Using d3, create the following variables and print first
8 rows for Netflix and the new columns along with conm and
datadate:
roa) = oibdp /
atfcf) = oancf /
chestrat_emph) = (adv -
xrd) / atd3<- mutate(d1,
roa = oibdp / at,
fcf = oancf / che,
strat_emph = adv - xrd / at)
d4 <- d3 %>%
select(conm, datadate, roa, fcf, strat_emph) %>%
filter(conm == "NETFLIX INC")
summarize(group_by(d4, conm))
## # A tibble: 1 × 1
## conm
## <chr>
## 1 NETFLIX INC
head(d4,8)
## conm datadate roa fcf strat_emph
## 1 NETFLIX INC 20101231 0.32140984 0.78884491 -0.16631146
## 2 NETFLIX INC 20111231 0.13971574 0.39822966 -0.08439767
## 3 NETFLIX INC 20121231 0.02405838 0.03043132 -0.08291762
## 4 NETFLIX INC 20131231 0.05112569 0.08149833 -0.06997960
## 5 NETFLIX INC 20141231 0.06471568 0.01024746 -0.06693274
## 6 NETFLIX INC 20151231 0.03607896 -0.32433208 -0.06378479
## 7 NETFLIX INC 20161231 0.03218765 -0.85015533 -0.06271601
## 8 NETFLIX INC 20171231 0.04789367 -0.63268781 -0.05537223
You want to know how many profitable years each of the sample company experienced. For this follow these steps:
profit_ind such that when oipbd > 0 this
variable is 1. Otherwise it is 0.profit_ind by taking its sum. Also, get the
total number of observations for each company. d9 <- d3 %>%
select(conm,oibdp) %>%
mutate(profit_ind=ifelse(oibdp>0,1,0)) %>%
group_by(conm) %>%
summarize(profit_years=sum(profit_ind), total_years=n())
Find the average annual stock returns of all the companies. Follow these steps:
conm and
datadate.conmstk_ret by taking the difference
between mkvalt (Market value of equity) and its lag and
then divide the difference by the lag of mkvaltstk_ret_mean.d5<-d1 %>%
arrange(d1,conm, datadate)
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
d10 <- d3 %>%
select(conm, datadate, mkvalt) %>%
group_by(conm) %>%
arrange(datadate, .by_group = T) %>%
mutate(stk_ret=mkvalt/ lag(mkvalt)-1) %>%
summarise(stk_ret_mean=round(mean(stk_ret, na.rm=T) *100,1)) %>%
mutate(stk_ret_mean=label_percent()(stk_ret_mean/100))
d10
## # A tibble: 10 × 2
## conm stk_ret_mean
## <chr> <chr>
## 1 ADVANCED MICRO DEVICES 69.1%
## 2 ALPHABET INC 25.4%
## 3 AMAZON.COM INC 37.2%
## 4 APPLE INC 27.0%
## 5 META PLATFORMS INC 40.0%
## 6 MICROSOFT CORP 21.8%
## 7 NETFLIX INC 58.2%
## 8 QUALCOMM INC 9.1%
## 9 TESLA INC 131.7%
## 10 TWITTER INC 6.0%
In many statistical and machine learning applications, we use scaled
variables instead of the original variables. A scaled variable is
typically created by subtracting the sample mean of the variable from
the variable and dividing it by its standard deviation. There is a
scale() function in base R which can directly do it.
You want to create a scaled variable for sale but
separately for each company. Therefore, you can’t use the mean and
standard deviation of sale for the entire sample. Instead,
you have to calculate these statistics for each company separately and
then create a scaled variable. Follow these steps:
conmsale to get the mean (sale_mean)
and the standard deviation (sale_sd)d3_sumd3 and d3_sum by
conmsale_scaled by subtracting
sale_mean from sale and dividing this
difference by sale_sdPrint the first 10 rows for Twitter with conm,
sale, sale_scaled, sale_mean, and
sale_sd using as.data.frame()
d3_sum = d1 %>%
group_by(conm) %>%
summarize(sale_mean = mean(sale), sale_sd = sd(sale))
d8 = inner_join(d1, d3_sum, by = "conm")
d3_sum<- d8 %>%
mutate(d8,
sale_scaled = sale - sale_mean / sale_sd) %>%
select(conm, sale, sale_scaled, sale_mean, sale_sd) %>%
filter(conm == "TWITTER INC")
head(d3_sum, 10)
## conm sale sale_scaled sale_mean sale_sd
## 1 TWITTER INC 106.313 104.8445 2270.692 1546.258
## 2 TWITTER INC 316.933 315.4645 2270.692 1546.258
## 3 TWITTER INC 664.890 663.4215 2270.692 1546.258
## 4 TWITTER INC 1403.002 1401.5335 2270.692 1546.258
## 5 TWITTER INC 2218.032 2216.5635 2270.692 1546.258
## 6 TWITTER INC 2529.619 2528.1505 2270.692 1546.258
## 7 TWITTER INC 2443.299 2441.8305 2270.692 1546.258
## 8 TWITTER INC 3042.359 3040.8905 2270.692 1546.258
## 9 TWITTER INC 3459.329 3457.8605 2270.692 1546.258
## 10 TWITTER INC 3716.349 3714.8805 2270.692 1546.258