Data Visualization with R

Initalizing

# Read data
d1 <- read_dta("tech_co_cstat_dta.zip")

#psych::describe(d1)
#glimpse(d1)
#names(d1)
#head(d1)
#attributes(d1$tic)


# put data for Sale>0 into d2
d2 <- filter(d1,sale>0)

Q1

Print a data frame with the medians of cogs, emp, and xrd.

Answer:

d2 %>% 
  select(cogs,emp,xrd) %>% 
summarize(cogs=median(cogs,na.rm = TRUE),emp=median(emp,na.rm = TRUE),xrd=median(xrd,na.rm = TRUE))

## # A tibble: 1 x 3
##    cogs   emp   xrd
##   <dbl> <dbl> <dbl>
## 1  9162  36.3  4475

Q2

Print a data frame with the means of sale, oibdp, and xrd for Apple, Facebook, and Tesla. For this, you will need to follow these steps:
1. Filter only the observations pertaining to Apple, Facebook, and Tesla
2. Group by conm
3. Summarize across sale, oibdp, and xrd to get their means
4. Output it as data frame by using as.data.frame() function.

Answer:

d2 %>% 
  select(conm,sale,oibdp,xrd) %>% 
  group_by(conm) %>% 
  summarize(across(c(sale,oibdp,xrd),mean)) %>% 
  filter(conm=="APPLE INC" | conm=="FACEBOOK INC" | conm=="TESLA INC") %>% 
  as.data.frame()

##           conm       sale      oibdp       xrd
## 1    APPLE INC 196525.000 62428.7273 8818.7273
## 2 FACEBOOK INC  29984.636 14952.8182 6074.6364
## 3    TESLA INC   9666.025   705.4621  777.9678

Q3

Round all the numeric variables in the above data frame to 1 decimal place. Output as a data frame using as.data.frame() function.

Answer:

d2 %>% 
  select(conm,sale,oibdp,xrd) %>% 
  group_by(conm) %>% 
  summarize(across(c(sale,oibdp,xrd),mean)) %>% 
  filter(conm=="APPLE INC" | conm=="FACEBOOK INC" | conm=="TESLA INC") %>% 
  mutate(across(where(is.numeric),round,1)) %>% 
  as.data.frame()

##           conm     sale   oibdp    xrd
## 1    APPLE INC 196525.0 62428.7 8818.7
## 2 FACEBOOK INC  29984.6 14952.8 6074.6
## 3    TESLA INC   9666.0   705.5  778.0

Q4

In the finance literature, a common (but incorrect) practice is to assume that the missing advertising is 0. We will use this adjustment to xad and create a new variable adv and save it in a new object d3.My own research shows that this is highly misleading. However, my solution to this issue is complex and requires application of machine learning.

The first six values of d3 when xad is NA are as follows:

Answer:

# Make d3 
d3 <- d2 %>% 
  mutate(adv=ifelse(is.na(xad),0,xad)) %>% 
  as.data.frame()

# Make output report 
d3 %>% 
  select(conm,datadate,xad,adv) %>% 
  filter(is.na(xad)==TRUE) %>% 
  as.data.frame() %>% 
  head()

##          conm   datadate xad adv
## 1   APPLE INC 2016-09-30  NA   0
## 2   APPLE INC 2017-09-30  NA   0
## 3   APPLE INC 2018-09-30  NA   0
## 4   APPLE INC 2019-09-30  NA   0
## 5   APPLE INC 2020-09-30  NA   0
## 6 TWITTER INC 2011-12-31  NA   0

Q5

Using d3, create the following variables and print first 8 rows for Netflix and the new columns along with conm and datadate:
1. Return on assets (roa) = oibdp / at
2. Free cash flow (fcf) = oancf / che
3. Strategic emphasis (strat_emph) = (adv - xrd) / at

Answer:

# make new columns in d3
f=d3 %>% 
  mutate(roa=oibdp/at,fcf=oancf/che,strat_emph=(adv-xrd)/at) %>% 
  arrange(conm,datadate) %>% 
  filter(conm=='NETFLIX INC') %>% 
  select(conm,datadate,roa,fcf,strat_emph) %>% 
  mutate(across(roa,signif,3),across(roa,as.numeric)) %>%
  as.data.frame() %>% 
  head(n=8)
  

d3 %>% 
  mutate(roa=oibdp/at,fcf=oancf/che,strat_emph=(adv-xrd)/at) %>% 
  arrange(conm,datadate) %>% 
  filter(conm=='NETFLIX INC') %>% 
  select(conm,datadate,roa,fcf,strat_emph) %>% 
  mutate(across(roa,round,4),across(fcf,round,3),across(strat_emph,round,4)) %>%
  as.data.frame() %>% 
  head(n=8)

##          conm   datadate    roa    fcf strat_emph
## 1 NETFLIX INC 2010-12-31 0.3214  0.789     0.0500
## 2 NETFLIX INC 2011-12-31 0.1397  0.398     0.0131
## 3 NETFLIX INC 2012-12-31 0.0241  0.030     0.0121
## 4 NETFLIX INC 2013-12-31 0.0511  0.081     0.0109
## 5 NETFLIX INC 2014-12-31 0.0647  0.010     0.0086
## 6 NETFLIX INC 2015-12-31 0.0361 -0.324     0.0062
## 7 NETFLIX INC 2016-12-31 0.0322 -0.850    -0.0007
## 8 NETFLIX INC 2017-12-31 0.0479 -0.633     0.0020

Q6

You want to know how many profitable years each of the sample company experienced. For this follow these steps:
1. Create an indicator variable (dummy variable) called profit_ind such that when oibdp > 0 this variable is 1. Otherwise it is 0.
2. Group by company names
3. Summarize profit_ind by taking its sum. Also, get the total number of observations for each company.

Answer:

d3 %>% 
  mutate(profit_ind=ifelse(oibdp>0,1,0)) %>% 
  group_by(conm) %>% 
  summarize(profit_years=sum(profit_ind),total_years=NROW(conm)) %>% 
  as.data.frame()

##                           conm profit_years total_years
## 1               AMAZON.COM INC           11          11
## 2                    APPLE INC           11          11
## 3                 FACEBOOK INC           11          11
## 4  INTL BUSINESS MACHINES CORP           11          11
## 5               MICROSOFT CORP           12          12
## 6                  NETFLIX INC           11          11
## 7          PAYPAL HOLDINGS INC            8           8
## 8                 QUALCOMM INC           11          11
## 9                    TESLA INC            7          11
## 10                 TWITTER INC            5          10

Q7

Find the average annual stock returns of all the companies. Follow these steps:
1. Arrange the data set by conm and datadate.
2. Group by conm
3. Calculate stock return stk_ret by taking the difference between prcc_f and its lag and then divide the difference by the lag of prcc_f
4. Summarize to get the mean of the stock returns stk_ret_mean.
5. Display the average stock returns in percentage format.

Answer:

d3 %>% 
  arrange(conm,datadate) %>% 
  group_by(conm) %>% 
  mutate(stk_ret=(prcc_f-lag(prcc_f))/lag(prcc_f)) %>% 
  select(conm, stk_ret) %>%
  #mutate(stk_ret=ifelse(is.na(stk_ret)==TRUE,0,stk_ret)) %>%
  summarize(stk_ret_mean=mean(stk_ret,na.rm=TRUE)*100) %>% 
  mutate(across(stk_ret_mean,round,3)) %>% 
  mutate(across(stk_ret_mean,paste0,'%')) %>% 
  as.data.frame()

##                           conm stk_ret_mean
## 1               AMAZON.COM INC      39.028%
## 2                    APPLE INC       4.763%
## 3                 FACEBOOK INC      38.682%
## 4  INTL BUSINESS MACHINES CORP      -0.234%
## 5               MICROSOFT CORP      25.883%
## 6                  NETFLIX INC       38.77%
## 7          PAYPAL HOLDINGS INC      50.985%
## 8                 QUALCOMM INC      12.919%
## 9                    TESLA INC      56.175%
## 10                 TWITTER INC        5.54%

Q8

In many statistical and machine learning applications, we use scaled variables instead of the original variables. A scaled variable is typically created by subtracting the sample mean of the variable from the variable and dividing it by its standard deviation. There is a scale() function in base R which can directly do it.

You want to create a scaled variable for sale but separately for each company. Therefore, you can’t use the mean and standard deviation of sale for the entire sample. Instead, you have to calculate these statistics for each company separately and then create a scaled variable. Follow these steps:

Group by conm
Summarize sale to get the mean (sale_mean) and the standard deviation (sale_sd)
Assign this data frame to d3_sum
Join d3 and d3_sum by conm
Create sale_scaled by subtracting sale_mean from sale and dividing this difference by sale_sd
Print the first 10 rows for Twitter with conm, sale, sale_scaled, sale_mean, and sale_sd using as.data.frame()

Answer:

d3_sum <- d3 %>% 
  group_by(conm) %>%
  summarize(sale_mean=mean(sale),sale_sd=sd(sale)) %>% 
  as.data.frame()

merge(x=d3,y=d3_sum,by="conm") %>% 
  mutate(sale_scaled=(sale-sale_mean)/sale_sd) %>% 
  filter(conm=="TWITTER INC") %>% 
  select(conm,sale,sale_scaled,sale_mean,sale_sd) %>% 
  as.data.frame() %>% 
  print()

##           conm     sale sale_scaled sale_mean sale_sd
## 1  TWITTER INC  106.313  -1.4474185  1990.013 1301.42
## 2  TWITTER INC  316.933  -1.2855799  1990.013 1301.42
## 3  TWITTER INC  664.890  -1.0182127  1990.013 1301.42
## 4  TWITTER INC 1403.002  -0.4510538  1990.013 1301.42
## 5  TWITTER INC 2218.032   0.1752082  1990.013 1301.42
## 6  TWITTER INC 2529.619   0.4146290  1990.013 1301.42
## 7  TWITTER INC 2443.299   0.3483014  1990.013 1301.42
## 8  TWITTER INC 3042.359   0.8086140  1990.013 1301.42
## 9  TWITTER INC 3459.329   1.1290102  1990.013 1301.42
## 10 TWITTER INC 3716.349   1.3265021  1990.013 1301.42

End of report

Data Visualization with R

Arash Hatamirad, ahatamirad@gmail.com

9/9/2021

Initalizing

Q1

Answer:

Q2

Answer:

Q3

Answer:

Q4

Answer:

Q5

Answer:

Q6

Answer:

Q7

Answer:

Q8

Answer: