Reading the csv file
library(readr)
d1 <- read_csv("Week03/Week03/tech_financials.csv")
## Rows: 120 Columns: 981
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (35): gvkey, indfmt, consol, popsrc, datafmt, tic, cusip, conm, acctchg...
## dbl (448): datadate, fyear, ajex, ajp, currtr, fyr, ismod, ltcm, pddur, scf,...
## lgl (498): adrr, bspr, curuscn, ogm, stalt, udpl, acco, accrt, acoxar, acqln...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Filtering rows with sale>0
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.5 v dplyr 1.0.7
## v tibble 3.1.6 v stringr 1.4.0
## v tidyr 1.2.0 v forcats 0.5.1
## v purrr 0.3.4
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
d2<- filter(d1,sale>0)
Print a data frame with the medians of cogs, emp, and xrd.
cogs is Cost of goods sold (USD million) emp is Number of employees (In thousands) xrd is R&D expenses (USD million)
summarize(d2,
cogs_median = median(cogs),
emp_median = median(emp, na.rm = TRUE),
xrd_median = median(xrd))
## # A tibble: 1 x 3
## cogs_median emp_median xrd_median
## <dbl> <dbl> <dbl>
## 1 9348 34.6 3838.
Print a data frame with the means of sale, oibdp, and xrd for Apple, Meta, and Tesla. For this, you will need to follow these steps:
conmsale, oibdp, and xrd to get their meansas.data.frame() function.In Step 3, you may optionally use across() inside summarize(). This function will save you writing more code. Read more about it here: https://dplyr.tidyverse.org/reference/across.html
Step1: Using filter function to filter only the observations pertaining to Apple, Meta, and Tesla
d3 <-filter(d2,
conm == "APPLE INC"|
conm == "META PLATFORMS INC"|
conm == "TESLA INC")
Step2: Using groupby conm and step3: summarizing sale, oibdp,xrd Step4: the output is converted to dataframe in d5
d4 <- summarize(group_by(d3, conm),
sale_mean = mean(sale),
xrd_mean = mean(xrd),
oibdp_mean = mean(oibdp))
d5 <- as.data.frame(d4)
d5
## conm sale_mean xrd_mean oibdp_mean
## 1 APPLE INC 210632.67 9910.0000 67245.75
## 2 META PLATFORMS INC 37313.33 7623.0000 18266.75
## 3 TESLA INC 13345.77 929.2205 1430.59
Round all the numeric variables in the above data frame to 1 decimal place. Output as a data frame using as.data.frame() function.
For rounding, you will have to use mutate, across, and where functions from dplyr package. Check https://www.tidyverse.org/blog/2020/04/dplyr-1-0-0-colwise/ for more information.
d7 <- d3 %>%
select(.data = .,
conm,sale,oibdp,xrd) %>%
group_by(conm) %>%
summarize(
round(across(where(is.numeric), mean, na.rm = TRUE),1),
across(where(is.character),nlevels),
) %>% as.data.frame()
d7
## conm sale oibdp xrd
## 1 APPLE INC 210632.7 67245.8 9910.0
## 2 META PLATFORMS INC 37313.3 18266.8 7623.0
## 3 TESLA INC 13345.8 1430.6 929.2
Many advertising values are missing. The missing code in R is NA. We can get the total number of missing values for advertising quite easily by running the following function:
sum(is.na(d2$xad))
## [1] 28
In the finance literature, a common (but incorrect) practice is to assume that the missing advertising is 0. We will use this adjustment to xad and create a new variable adv and save it in a new object d3.
The new variable ‘adv’ is saved in d31 instead of d3. But the results are correct.
d31 <- d2 %>%
select(.data = .,
conm,datadate,xad) %>%
mutate(.data = .,adv = ifelse(is.na(xad),0,xad)
)
head(d31)
## # A tibble: 6 x 4
## conm datadate xad adv
## <chr> <dbl> <dbl> <dbl>
## 1 ADVANCED MICRO DEVICES 20101231 NA 0
## 2 ADVANCED MICRO DEVICES 20111231 NA 0
## 3 ADVANCED MICRO DEVICES 20121231 NA 0
## 4 ADVANCED MICRO DEVICES 20131231 NA 0
## 5 ADVANCED MICRO DEVICES 20141231 NA 0
## 6 ADVANCED MICRO DEVICES 20151231 NA 0
Using d3, create the following variables and print first 8 rows for Netflix and the new columns along with conm and datadate:
roa) = oibdp / atfcf) = oancf / chestrat_emph) = (adv - xrd) / atd32 <- d2 %>%
select(.data = .,conm,datadate,xad,oibdp,oancf,at,che,xrd) %>%
filter(.data = ., conm == "NETFLIX INC") %>%
mutate(.data = .,adv = ifelse(is.na(xad),0,xad)) %>%
mutate(.data=.,roa = oibdp/at) %>%
mutate(.data =.,fcf=oancf/che) %>%
mutate(.data =.,strat_emph =(adv-xrd)/at) %>%
select(.data = .,conm,datadate,roa,fcf,strat_emph)
d32
## # A tibble: 12 x 5
## conm datadate roa fcf strat_emph
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 NETFLIX INC 20101231 0.321 0.789 0.0500
## 2 NETFLIX INC 20111231 0.140 0.398 0.0131
## 3 NETFLIX INC 20121231 0.0241 0.0304 0.0121
## 4 NETFLIX INC 20131231 0.0511 0.0815 0.0109
## 5 NETFLIX INC 20141231 0.0647 0.0102 0.00861
## 6 NETFLIX INC 20151231 0.0361 -0.324 0.00622
## 7 NETFLIX INC 20161231 0.0322 -0.850 -0.000714
## 8 NETFLIX INC 20171231 0.0479 -0.633 0.00202
## 9 NETFLIX INC 20181231 0.0650 -0.706 0.0226
## 10 NETFLIX INC 20191231 0.0797 -0.575 0.00983
## 11 NETFLIX INC 20201231 0.125 0.296 -0.00974
## 12 NETFLIX INC 20211231 0.144 0.0651 -0.0136
You want to know how many profitable years each of the sample company experienced. For this follow these steps:
profit_ind such that when oibdp > 0 this variable is 1. Otherwise it is 0.profit_ind by taking its sum. Also, get the total number of observations for each company.d9 <- d2 %>%
select(.data = .,conm,sale,oibdp,xrd,fyear) %>%
mutate(.data = .,profit_ind = ifelse(oibdp>0,1,0)) %>%
group_by(conm) %>%
summarize(.data=.,profit_years = sum(profit_ind))
d10 <- as.data.frame(d9)
d101 <- d2 %>%
select(.data = .,conm,sale,oibdp,xrd,fyear) %>%
mutate(.data = .,year_ind = ifelse(fyear>0,1,0)) %>%
group_by(conm) %>%
summarize(.data=.,total_years = sum(year_ind))
d102 <- merge(d10,d101)
d103 <- as.data.frame(d102)
d103
## conm profit_years total_years
## 1 ADVANCED MICRO DEVICES 10 12
## 2 ALPHABET INC 12 12
## 3 AMAZON.COM INC 12 12
## 4 APPLE INC 12 12
## 5 META PLATFORMS INC 12 12
## 6 MICROSOFT CORP 13 13
## 7 NETFLIX INC 12 12
## 8 QUALCOMM INC 12 12
## 9 TESLA INC 8 12
## 10 TWITTER INC 6 11
Find the average annual stock returns of all the companies. Follow these steps:
conm and datadate.conmstk_ret by taking the difference between mkvalt (Market value of equity) and its lag and then divide the difference by the lag of mkvaltstk_ret_mean.library(tidyverse)
d11<- d2 %>%
select(.data = .,conm,datadate,mkvalt) %>%
arrange(conm,datadate) %>%
group_by(conm) %>%
mutate(.data=.,pre_mkvalt = lag(mkvalt))%>%
mutate(.data=.,stk_ret = ((mkvalt-pre_mkvalt)/pre_mkvalt)) %>%
summarize(.data=.,stk_ret_avg = mean(stk_ret,na.rm = TRUE),stk_ret_avg=scales::percent(stk_ret_avg,accuracy=0.1))
d11
## # A tibble: 10 x 2
## conm stk_ret_avg
## <chr> <chr>
## 1 ADVANCED MICRO DEVICES 69.1%
## 2 ALPHABET INC 25.4%
## 3 AMAZON.COM INC 37.2%
## 4 APPLE INC 27.0%
## 5 META PLATFORMS INC 40.0%
## 6 MICROSOFT CORP 21.8%
## 7 NETFLIX INC 58.2%
## 8 QUALCOMM INC 9.1%
## 9 TESLA INC 131.7%
## 10 TWITTER INC 6.0%
In many statistical and machine learning applications, we use scaled variables instead of the original variables. A scaled variable is typically created by subtracting the sample mean of the variable from the variable and dividing it by its standard deviation. There is a scale() function in base R which can directly do it.
You want to create a scaled variable for sale but separately for each company. Therefore, you can’t use the mean and standard deviation of sale for the entire sample. Instead, you have to calculate these statistics for each company separately and then create a scaled variable. Follow these steps:
conmsale to get the mean (sale_mean) and the standard deviation (sale_sd)d3_sumd3 and d3_sum by conmsale_scaled by subtracting sale_mean from sale and dividing this difference by sale_sdPrint the first 10 rows for Twitter with conm, sale, sale_scaled, sale_mean, and sale_sd using `as.data.frame()
d3_sum <- d2 %>%
select(.data = .,conm,datadate,sale) %>%
filter(.data = ., conm == "TWITTER INC") %>%
summarize(.data = .,conm = "TWITTER INC",sale_mean=mean(sale),sale_sd=sd(sale))
d33 <- d2 %>%
filter(.data = ., conm == "TWITTER INC")%>%
select(.data = .,conm,sale)
d34 = inner_join(d33, d3_sum, by = "conm")
d34 %>% mutate(.data = .,sale_scaled=((sale-sale_mean)/sale_sd))
## # A tibble: 11 x 5
## conm sale sale_mean sale_sd sale_scaled
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 TWITTER INC 106. 2271. 1546. -1.40
## 2 TWITTER INC 317. 2271. 1546. -1.26
## 3 TWITTER INC 665. 2271. 1546. -1.04
## 4 TWITTER INC 1403. 2271. 1546. -0.561
## 5 TWITTER INC 2218. 2271. 1546. -0.0341
## 6 TWITTER INC 2530. 2271. 1546. 0.167
## 7 TWITTER INC 2443. 2271. 1546. 0.112
## 8 TWITTER INC 3042. 2271. 1546. 0.499
## 9 TWITTER INC 3459. 2271. 1546. 0.769
## 10 TWITTER INC 3716. 2271. 1546. 0.935
## 11 TWITTER INC 5077. 2271. 1546. 1.82