Data Visualization

  1. Create the factor variables: grad_factor and college_factor that recode grad and college, respectively.

  2. What is the distribution (histogram) of CEOs’ salary and age?

  3. What can you say about the distribution of sales for CEOs with and without a graduate degree? Can you identify any potential outliers in sales by visual inspection? Hint: use a boxplot.

  4. Plot the histogram and boxplot of log(salary). Compare this distribution to the distribution of salary.

  5. Use the function facet_wrap to plot the distribution (histogram) of log(salary) for CEOs with and without a graduate degree. Hint: use ?facet_wrap() to see some examples.

  6. Create a boxplot for the variable log(sales). Make it vertical. Make sure to label the axes appropriately and add a title to your plot. Add a custom theme of your choice to your plot.

  7. Create a scatter plot depicting the relationship between log(salary) and log(sales). Use different point shapes for CEOs with and without a graduate degree. Make sure to label your axes and the point shape legend, and give your plot an appropriate title. Hint: use the option shape =.

  8. Create a scatter plot depicting the relationship between salary and age. Use point size to represent the firm’s profits. Make sure to label your axes and the size legend, and give your plot an appropriate title. Place the size legend at the bottom of your plot. Hint: use the option size =.


Get started by loading libraries and reading data.

library(tidyverse)
library(stargazer)

tb.ceosal2 <- read_delim("data/ceosal2.csv", delim= ",") 

# or, equivalently, with the comma-delimited shorthand
tb.ceosal2 <- read_csv("data/ceosal2.csv") 


  1. Create the factor variables: grad_factor and college_factor that recode grad and college, respectively.
tb.ceosal2 <- tb.ceosal2 %>% 
  mutate(college_factor = factor(x = college,
                               levels = c(0, 1),
                               labels = c("No College Degree","College Degree")),
         grad_factor = factor(x = grad,
                            levels = c(0, 1),
                            labels = c("No Grad. Degree","Grad. Degree")))
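
Before plotting, it can help to confirm that the recoding maps each 0/1 code to the intended label. The following is a minimal base-R sketch on a toy vector; grad_toy and its values are hypothetical and are not taken from ceosal2.csv.

```r
# Toy 0/1 indicator (hypothetical values, for illustration only)
grad_toy <- c(0, 1, 1, 0, 1)

# Same recoding pattern as above: 0 -> "No Grad. Degree", 1 -> "Grad. Degree"
grad_toy_factor <- factor(x = grad_toy,
                          levels = c(0, 1),
                          labels = c("No Grad. Degree", "Grad. Degree"))

# Cross-tabulate the original codes against the new labels:
# all counts should sit on the diagonal if the mapping is correct
check <- table(grad_toy, grad_toy_factor)
print(check)
```

The same check can presumably be run on the real data with table(tb.ceosal2$grad, tb.ceosal2$grad_factor).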


  2. What is the distribution (histogram) of CEOs’ salary and age?
# bins = 30 is ggplot2's default; stating it explicitly silences the stat_bin() message
ggplot(data = tb.ceosal2, aes(salary)) + 
  geom_histogram(bins = 30, color = 'white') + 
  theme(title = element_text(size = 16), text = element_text(size = 16))

ggplot(data = tb.ceosal2, aes(age)) +
  geom_histogram(bins = 30, color = 'white') + 
  theme(title = element_text(size = 16), text = element_text(size = 16))


  3. What can you say about the distribution of sales for CEOs with and without a graduate degree? Can you identify any potential outliers in sales by visual inspection?
ggplot(data = tb.ceosal2, aes(x = grad_factor, y = sales)) + 
  geom_boxplot() + 
  theme(title = element_text(size = 16), text = element_text(size = 16))
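
The points that geom_boxplot() draws individually are those lying more than 1.5 times the interquartile range beyond the quartiles. To back up the visual inspection with numbers, the same rule can be sketched in base R. The helper flag_outliers and the vector sales_toy are hypothetical names and values introduced here for illustration; they are not part of the original exercise or data.

```r
# Flag values outside [Q1 - coef * IQR, Q3 + coef * IQR],
# the rule geom_boxplot() uses by default (coef = 1.5)
flag_outliers <- function(x, coef = 1.5) {
  q <- unname(quantile(x, probs = c(0.25, 0.75), na.rm = TRUE))
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Toy sales figures (hypothetical), with one very large firm
sales_toy <- c(500, 800, 1200, 1500, 2000, 2500, 3000, 51000)

# Values that would be drawn as individual points in the boxplot
outliers_toy <- sales_toy[flag_outliers(sales_toy)]
print(outliers_toy)

# On the real data, one possible per-group version (assumes tb.ceosal2 is loaded):
# tb.ceosal2 %>% group_by(grad_factor) %>% filter(flag_outliers(sales))
```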


  4. Plot the histogram and boxplot of log(salary). Compare this distribution to the distribution of salary.
# Histogram of lsalary
ggplot(tb.ceosal2, aes(x = lsalary)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "white") +
  labs(title = "Histogram of Log(CEO Salary)", x = "Log(Salary)", y = "Count") +
  theme_minimal()

# Boxplot of lsalary
ggplot(tb.ceosal2, aes(y = lsalary)) +
  geom_boxplot(fill = "skyblue") +
  scale_x_discrete() +  # drop the numeric x-axis ticks, which carry no meaning for a single box
  labs(title = "Boxplot of Log(CEO Salary)", y = "Log(Salary)") +
  theme_minimal()
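
For the comparison with salary, a rough numerical check can complement the plots: in a right-skewed distribution the mean sits well above the median, and taking logs typically shrinks that gap. The sketch below uses a toy vector, salary_toy, whose values are hypothetical and not taken from ceosal2.csv; the helper mean_median_gap is likewise an illustrative name, and the statistic is only a crude skewness indicator.

```r
# Standardized gap between mean and median: (mean - median) / sd.
# Values near 0 suggest symmetry; larger positive values suggest right skew.
mean_median_gap <- function(x) {
  (mean(x) - median(x)) / sd(x)
}

# Toy salaries in thousands USD (hypothetical), with one very high earner
salary_toy <- c(300, 450, 500, 620, 700, 850, 1100, 5300)

gap_raw <- mean_median_gap(salary_toy)       # on the original scale
gap_log <- mean_median_gap(log(salary_toy))  # on the log scale

print(c(salary = gap_raw, log_salary = gap_log))
```

On the real data the same helper could be applied to tb.ceosal2$salary and tb.ceosal2$lsalary, assuming neither column contains missing values.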


  5. Use the function facet_wrap to plot the distribution (histogram) of log(salary) for CEOs with and without a graduate degree.
ggplot(tb.ceosal2, aes(x = lsalary)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "white") +
  facet_wrap(~ grad_factor) +
  labs(title = "Distribution of Log(CEO Salary) by Graduate Degree",
       x = "Log(Salary)", y = "Count") +
  theme_minimal()


  6. Create a boxplot for the variable log(sales). Make it vertical. Make sure to label the axes appropriately and add a title to your plot. Add a custom theme of your choice to your plot.
ggplot(tb.ceosal2, aes(y = lsales)) +
  geom_boxplot(fill = "lightblue") +
  scale_x_discrete() +  # drop the numeric x-axis ticks, which carry no meaning for a single box
  labs(title = "Boxplot of Log(Firm Sales)",
       y = "Log(Sales, in millions USD)", x = "") +
  theme_classic()


  7. Create a scatter plot depicting the relationship between log(salary) and log(sales). Use different point shapes for CEOs with and without a graduate degree. Make sure to label your axes and the point shape legend, and give your plot an appropriate title.
ggplot(tb.ceosal2, aes(x = lsales, y = lsalary, shape = grad_factor)) +
  geom_point(size = 2) +
  labs(title = "CEO Salary vs. Firm Sales (1990)",
       x = "Log(Sales, in millions USD)",
       y = "Log(Salary, in thousands USD)",
       shape = "Graduate Degree") +
  theme_minimal()


  8. Create a scatter plot depicting the relationship between salary and age. Use point size to represent the firm’s profits. Make sure to label your axes and the size legend, and give your plot an appropriate title. Place the size legend at the bottom of your plot.
ggplot(tb.ceosal2, aes(x = age, y = salary, size = profits)) +
  geom_point(alpha = 0.7) +
  labs(title = "CEO Salary vs. Age (1990)",
       x = "CEO Age",
       y = "Salary (in thousands USD)",
       size = "Firm Profits (in millions USD)") +
  theme_minimal() +
  theme(legend.position = "bottom")