Descriptive Statistics
1. How many CEOs are in the sample? Hint: each row corresponds to a CEO, so you can use the functions summarize() and n().
2. How many CEOs have a graduate degree? Hint: you can use the function filter().
3. What is the percentage of CEOs with a graduate degree? Hint: you can use the functions summarize(), sum() and n().
4. What is the average CEO salary? Hint: you can use the functions summarize() and mean().
5. What is the mean CEO salary for those with a graduate degree? Hint: you can use the functions filter(), summarize() and mean().
6. What is the mean CEO salary for those without a graduate degree? Hint: you can use the same functions.
7. How many CEOs have/don’t have a college degree? Hint: you can use the functions group_by(), summarize() and n().
8. How many CEOs have/don’t have a college degree and a graduate degree? Hint: you can use the same functions.
9. Compute the mean, standard deviation, minimum, maximum and median of salary. Hint: you can use the functions summarize(), mean(), sd(), min(), max() and median().
10. Compute the mean, standard deviation, minimum, maximum and median of salary for CEOs with/without a college and graduate degree. Hint: you can use the same functions with group_by().
Get started by loading libraries and reading data.
library(tidyverse)
tb.ceosal2 <- read_delim("data/ceosal2.csv", delim = ",")
# or
tb.ceosal2 <- read_csv("data/ceosal2.csv")
tb.ceosal2 %>% summarize(n_ceo = n())
## # A tibble: 1 × 1
## n_ceo
## <int>
## 1 177
# or
nrow(tb.ceosal2)
## [1] 177
# Whenever a CEO has a graduate degree the variable grad takes
# the value of 1, and 0 otherwise.
# We can count the number of rows where the variable grad takes the value of 1.
tb.ceosal2 %>% filter(grad == 1) %>% summarize(n_ceo = n())
## # A tibble: 1 × 1
## n_ceo
## <int>
## 1 94
# Alternatively, due to the binary nature of grad, we can count the number of
# CEOs who have a graduate degree by summing the variable grad.
tb.ceosal2 %>% summarize(n_ceo = sum(grad))
## # A tibble: 1 × 1
## n_ceo
## <dbl>
## 1 94
tb.ceosal2 %>% summarize(p_ceo = sum(grad)/n())
## # A tibble: 1 × 1
## p_ceo
## <dbl>
## 1 0.531
# another alternative
tb.ceosal2 %>% summarize(p_ceo = mean(grad))
## # A tibble: 1 × 1
## p_ceo
## <dbl>
## 1 0.531
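The value 0.531 above is a proportion; the question asks for a percentage, which is the same number multiplied by 100 (about 53.1%). A minimal sketch of that conversion, using a small made-up tibble (the name toy and its values are hypothetical, not taken from ceosal2.csv):

```r
library(dplyr)  # attached automatically by library(tidyverse)

# Hypothetical toy data: 5 CEOs, 3 of whom hold a graduate degree.
toy <- tibble(grad = c(1, 0, 1, 1, 0))

# The mean of a 0/1 indicator is the share of ones;
# multiplying by 100 expresses that share as a percentage.
pct <- toy %>% summarize(pct_grad = 100 * mean(grad))
pct
```

The same pattern should carry over to tb.ceosal2 by replacing toy with the real tibble.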
tb.ceosal2 %>% summarize(avg_salary = mean(salary))
## # A tibble: 1 × 1
## avg_salary
## <dbl>
## 1 866.
tb.ceosal2 %>%
  filter(grad == 1) %>%
  summarize(avg_salary = mean(salary))
## # A tibble: 1 × 1
## avg_salary
## <dbl>
## 1 864.
tb.ceosal2 %>%
  filter(grad == 0) %>%
  summarize(avg_salary = mean(salary))
## # A tibble: 1 × 1
## avg_salary
## <dbl>
## 1 868.
How can you answer the two previous questions (5 and 6) in one line?
tb.ceosal2 %>% group_by(grad) %>% summarize(avg_salary = mean(salary))
## # A tibble: 2 × 2
## grad avg_salary
## <dbl> <dbl>
## 1 0 868.
## 2 1 864.
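To see why the grouped pipeline answers both questions at once, it can help to compare it with the two filtered pipelines on data small enough to check by hand. The sketch below uses a made-up tibble (toy and its salary values are hypothetical) and is only meant to illustrate the equivalence:

```r
library(dplyr)  # attached automatically by library(tidyverse)

# Hypothetical toy data (salary in made-up units).
toy <- tibble(
  grad   = c(0, 0, 1, 1, 1),
  salary = c(100, 300, 200, 400, 600)
)

# One grouped pipeline: one output row per value of grad.
by_grad <- toy %>%
  group_by(grad) %>%
  summarize(avg_salary = mean(salary))

# The same two numbers, computed with one filtered pipeline per group.
no_grad  <- toy %>% filter(grad == 0) %>% summarize(avg_salary = mean(salary))
has_grad <- toy %>% filter(grad == 1) %>% summarize(avg_salary = mean(salary))

by_grad
```

Here by_grad holds the values of no_grad and has_grad as its two rows, ordered by grad.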
tb.ceosal2 %>% group_by(college) %>% summarize(n_ceo = n())
## # A tibble: 2 × 2
## college n_ceo
## <dbl> <int>
## 1 0 5
## 2 1 172
# another alternative
tb.ceosal2 %>% select(college) %>% table()
## college
## 0 1
## 5 172
tb.ceosal2 %>% group_by(college, grad) %>% summarize(n_ceo = n())
## `summarise()` has grouped output by 'college'. You can override using the
## `.groups` argument.
## # A tibble: 3 × 3
## # Groups: college [2]
## college grad n_ceo
## <dbl> <dbl> <int>
## 1 0 0 5
## 2 1 0 78
## 3 1 1 94
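The message printed above appears because summarize() removes only the last grouping variable, so the result is still grouped by college. Two ways to get an ungrouped result without the message are sketched below on a made-up tibble (toy is hypothetical): passing .groups = "drop" to summarize(), or using dplyr's count(), which is shorthand for group_by() followed by summarize(n = n()).

```r
library(dplyr)  # attached automatically by library(tidyverse)

# Hypothetical toy data: 5 CEOs with college and graduate degree indicators.
toy <- tibble(
  college = c(0, 1, 1, 1, 1),
  grad    = c(0, 0, 0, 1, 1)
)

# .groups = "drop" returns an ungrouped tibble and silences the message.
counts_a <- toy %>%
  group_by(college, grad) %>%
  summarize(n_ceo = n(), .groups = "drop")

# count() tallies rows per combination; `name` sets the count column's name.
counts_b <- toy %>% count(college, grad, name = "n_ceo")

counts_a
```

Both results contain one row per observed combination of college and grad; combinations that never occur (such as college = 0 with grad = 1) do not appear.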
tb.ceosal2 %>%
  summarize(mean_salary = mean(salary),
            sd_salary = sd(salary),
            min_salary = min(salary),
            max_salary = max(salary),
            median_salary = median(salary))
## # A tibble: 1 × 5
##   mean_salary sd_salary min_salary max_salary median_salary
##         <dbl>     <dbl>      <dbl>      <dbl>         <dbl>
## 1        866.      588.        100       5299           707
tb.ceosal2 %>%
  group_by(grad, college) %>%
  summarize(mean_salary = mean(salary),
            sd_salary = sd(salary),
            min_salary = min(salary),
            max_salary = max(salary),
            median_salary = median(salary))
## # A tibble: 3 × 7
## # Groups:   grad [2]
##    grad college mean_salary sd_salary min_salary max_salary median_salary
##   <dbl>   <dbl>       <dbl>     <dbl>      <dbl>      <dbl>         <dbl>
## 1     0       0       1096.      633.        300       1738         1143
## 2     0       1        853.      679.        174       5299          708.
## 3     1       1        864.      501.        100       2265          706.
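Writing out five nearly identical arguments to summarize() works, but the repetition can be avoided with across(), which applies a named list of functions to one or more columns. The sketch below assumes dplyr 1.0.0 or later and uses a made-up tibble (toy and its values are hypothetical); the .names pattern is chosen so that the column names match the ones used above.

```r
library(dplyr)  # attached automatically by library(tidyverse)

# Hypothetical toy data (salary in made-up units).
toy <- tibble(
  grad   = c(0, 0, 0, 1, 1, 1),
  salary = c(100, 200, 300, 400, 500, 900)
)

# Apply each function in the named list to salary, within each grad group.
# "{.fn}_{.col}" builds names such as mean_salary and sd_salary.
stats <- toy %>%
  group_by(grad) %>%
  summarize(
    across(salary,
           list(mean = mean, sd = sd, min = min, max = max, median = median),
           .names = "{.fn}_{.col}")
  )

stats
```

Adding a second variable is then a matter of listing it in the first argument of across(), for example across(c(salary, age), ...), if such a column exists in the data.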