Descriptive Statistics

  1. How many CEOs are in the sample? Hint: each row corresponds to a CEO, so you can use the functions summarize() and n().

  2. How many CEOs have a graduate degree? Hint: you can use the function filter().

  3. What is the percentage of CEOs with a graduate degree? Hint: you can use the functions summarize(), sum() and n().

  4. What is the average CEO salary? Hint: you can use the functions summarize() and mean().

  5. What is the mean CEO salary for those with a graduate degree? Hint: you can use the functions filter(), summarize() and mean().

  6. What is the mean CEO salary for those without a graduate degree? Hint: you can use the same functions.

  7. How many CEOs have/don’t have a college degree? Hint: you can use the functions group_by(), summarize() and n().

  8. How many CEOs have/don’t have a college degree and a graduate degree? Hint: you can use the same functions.

  9. Compute the mean, standard deviation, minimum, maximum and median of salary. Hint: you can use the functions summarize(), mean(), sd(), min(), max() and median().

  10. Compute the mean, standard deviation, minimum, maximum and median of salary for CEOs with/without a college and graduate degree. Hint: you can use the same functions with group_by().


Get started by loading libraries and reading data.

library(tidyverse)

tb.ceosal2 <- read_delim("data/ceosal2.csv", delim = ",")

# or
tb.ceosal2 <- read_csv("data/ceosal2.csv") 


  1. How many CEOs are in the sample?
tb.ceosal2 %>% summarize(n_ceo = n())
## # A tibble: 1 × 1
##   n_ceo
##   <int>
## 1   177
# or
nrow(tb.ceosal2)
## [1] 177


  2. How many CEOs have a graduate degree?
# The variable grad equals 1 if the CEO has a graduate degree 
# and 0 otherwise. 

# We can count the number of rows where the variable grad takes the value of 1.
tb.ceosal2 %>% filter(grad == 1) %>% summarize(n_ceo = n())
## # A tibble: 1 × 1
##   n_ceo
##   <int>
## 1    94
# Alternatively, due to the binary nature of grad, we can count the number of 
# CEOs who have a graduate degree by summing the variable grad.
tb.ceosal2 %>% summarize(n_ceo = sum(grad))
## # A tibble: 1 × 1
##   n_ceo
##   <dbl>
## 1    94


  3. What is the percentage of CEOs with a graduate degree?
# sum(grad)/n() returns a proportion; multiply by 100 to read it as a percentage.
tb.ceosal2 %>% summarize(p_ceo = sum(grad)/n())
## # A tibble: 1 × 1
##   p_ceo
##   <dbl>
## 1 0.531
# Alternatively, the mean of a 0/1 variable is the proportion of ones.
tb.ceosal2 %>% summarize(p_ceo = mean(grad))
## # A tibble: 1 × 1
##   p_ceo
##   <dbl>
## 1 0.531


  4. What is the average CEO salary?
tb.ceosal2 %>% summarize(avg_salary = mean(salary))
## # A tibble: 1 × 1
##   avg_salary
##        <dbl>
## 1       866.


  5. What is the mean CEO salary for those with a graduate degree?
tb.ceosal2 %>% filter(grad == 1) %>% 
  summarize(avg_salary = mean(salary))
## # A tibble: 1 × 1
##   avg_salary
##        <dbl>
## 1       864.


  6. What is the mean CEO salary for those without a graduate degree?
tb.ceosal2 %>% 
  filter(grad == 0) %>% 
  summarize(avg_salary = mean(salary))
## # A tibble: 1 × 1
##   avg_salary
##        <dbl>
## 1       868.

How can you answer the two previous questions (5 and 6) in one line?

tb.ceosal2 %>% group_by(grad) %>% summarize(avg_salary = mean(salary))
## # A tibble: 2 × 2
##    grad avg_salary
##   <dbl>      <dbl>
## 1     0       868.
## 2     1       864.


  7. How many CEOs have/don’t have a college degree?
tb.ceosal2 %>% group_by(college) %>% summarize(n_ceo = n())
## # A tibble: 2 × 2
##   college n_ceo
##     <dbl> <int>
## 1       0     5
## 2       1   172
# Alternatively, tabulate the college column with table().
tb.ceosal2 %>% select(college) %>% table()
## college
##   0   1 
##   5 172


  8. How many CEOs have/don’t have a college degree AND a graduate degree?
tb.ceosal2 %>% group_by(college, grad) %>% summarize(n_ceo = n())
## `summarise()` has grouped output by 'college'. You can override using the
## `.groups` argument.
## # A tibble: 3 × 3
## # Groups:   college [2]
##   college  grad n_ceo
##     <dbl> <dbl> <int>
## 1       0     0     5
## 2       1     0    78
## 3       1     1    94
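
The message in the output above appears because summarize() removes only the last grouping variable, so the result is still grouped by college. The sketch below shows two ways to get an ungrouped result instead. It uses a small made-up tibble called toy (hypothetical data, not the ceosal2 sample), so the counts will not match the ones above.

```r
library(tidyverse)

# Hypothetical toy data, NOT the ceosal2 sample
toy <- tibble(
  college = c(0, 1, 1, 1, 1),
  grad    = c(0, 0, 0, 1, 1)
)

# Option 1: .groups = "drop" returns an ungrouped tibble and silences the message
counts_drop <- toy %>%
  group_by(college, grad) %>%
  summarize(n_ceo = n(), .groups = "drop")

# Option 2: count() groups, counts and ungroups in one step
counts_count <- toy %>% count(college, grad, name = "n_ceo")

counts_drop
counts_count
```

Both options should return the same counts; which one to use is mostly a matter of taste.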


  9. Compute the mean, standard deviation, minimum, maximum and median of salary.
tb.ceosal2 %>% summarize(mean_salary = mean(salary),
                      sd_salary = sd(salary),
                      min_salary = min(salary), 
                      max_salary = max(salary),
                      median_salary = median(salary))
## # A tibble: 1 × 5
##   mean_salary sd_salary min_salary max_salary median_salary
##         <dbl>     <dbl>      <dbl>      <dbl>         <dbl>
## 1        866.      588.        100       5299           707


  10. Compute the mean, standard deviation, minimum, maximum and median of salary for CEOs with/without a college and graduate degree.
tb.ceosal2 %>%
  group_by(grad, college) %>% 
  summarize(mean_salary = mean(salary),
            sd_salary = sd(salary), 
            min_salary = min(salary), 
            max_salary = max(salary),
            median_salary = median(salary))
## # A tibble: 3 × 7
## # Groups:   grad [2]
##    grad college mean_salary sd_salary min_salary max_salary median_salary
##   <dbl>   <dbl>       <dbl>     <dbl>      <dbl>      <dbl>         <dbl>
## 1     0       0       1096.      633.        300       1738         1143 
## 2     0       1        853.      679.        174       5299          708.
## 3     1       1        864.      501.        100       2265          706.
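
When the same set of statistics is needed repeatedly, typing each one by hand gets tedious. One possible shortcut, sketched here, is across() with a named list of functions. The tibble toy and its values are made up for illustration (not the ceosal2 sample), and the .names pattern is chosen only to reproduce the column names used above.

```r
library(tidyverse)

# Hypothetical toy data, NOT the ceosal2 sample
toy <- tibble(
  grad   = c(0, 0, 1, 1, 1),
  salary = c(300, 500, 400, 600, 800)
)

# Apply every function in the list to salary, within each grad group.
# "{.fn}_{.col}" names the results mean_salary, sd_salary, and so on.
salary_stats <- toy %>%
  group_by(grad) %>%
  summarize(
    across(salary,
           list(mean = mean, sd = sd, min = min, max = max, median = median),
           .names = "{.fn}_{.col}"),
    .groups = "drop"
  )

salary_stats
```

Adding another statistic then only requires one more entry in the list.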