Data Visualization
Create the factor variables grad_factor and college_factor that recode grad and college, respectively.
What is the distribution (histogram) of CEOs’ salary and age?
What can you say about the distribution of sales for CEOs with and
without a graduate degree? Can you identify any potential
outliers in sales by visual inspection? Hint: use a boxplot.
Plot the histogram and boxplot of log(salary). Compare this distribution to the distribution of salary.
Use the function facet_wrap to plot the distribution
(histogram) of log(salary) for CEOs with and without a graduate degree.
Hint: use ?facet_wrap() to see some examples.
Create a boxplot for the variable log(sales). Make it vertical. Make sure to label the axis appropriately and add a title to your plot. Add a custom theme of your choice to your plot.
Create a scatter plot depicting the relationship between
log(salary) and log(sales). Use different point shapes for CEOs with and
without a graduate degree. Make sure to label your axes and the point shape
legend, and give your plot an appropriate title. Hint: use the option shape =.
Create a scatter plot depicting the relationship between salary
and age. Use point size to represent the firm’s profits. Make sure
to label your axes and the size legend, and give your plot an appropriate
title. Place the size legend at the bottom of your plot. Hint: use the
option size =.
Get started by loading libraries and reading data.
library(tidyverse)
library(stargazer)
tb.ceosal2 <- read_delim("data/ceosal2.csv", delim = ",")
# or, equivalently, with the CSV shortcut
tb.ceosal2 <- read_csv("data/ceosal2.csv")
Create the factor variables grad_factor and college_factor that recode grad and college, respectively.
tb.ceosal2 <- tb.ceosal2 %>%
  mutate(college_factor = factor(x = college,
                                 levels = c(0, 1),
                                 labels = c("No College Degree", "College Degree")),
         grad_factor = factor(x = grad,
                              levels = c(0, 1),
                              labels = c("No Grad. Degree", "Grad. Degree")))
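A quick cross-tabulation can confirm that the recode behaves as intended. The sketch below uses a small made-up tibble (toy, a hypothetical stand-in for tb.ceosal2) so that it runs on its own; the same count() call should work on the real data.

```r
library(tidyverse)

# Toy stand-in for tb.ceosal2 (made-up values, not the real data)
toy <- tibble(grad = c(0, 1, 1, 0, 1)) %>%
  mutate(grad_factor = factor(x = grad,
                              levels = c(0, 1),
                              labels = c("No Grad. Degree", "Grad. Degree")))

# Each original code should pair with exactly one label
recode_check <- toy %>% count(grad, grad_factor)
recode_check
```

If a code showed up alongside an NA label, that would suggest a value missing from the levels argument.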
# Histogram of CEO salary
ggplot(data = tb.ceosal2, aes(salary)) +
  geom_histogram(color = 'white') +
  theme(title = element_text(size = 16), text = element_text(size = 16))
# Histogram of CEO age
ggplot(data = tb.ceosal2, aes(age)) +
  geom_histogram(color = 'white') +
  theme(title = element_text(size = 16), text = element_text(size = 16))
# Boxplot of sales by graduate degree status
ggplot(data = tb.ceosal2, aes(x = grad_factor, y = sales)) +
  geom_boxplot() +
  theme(title = element_text(size = 16), text = element_text(size = 16))
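To back up the visual inspection, the points a boxplot draws beyond its whiskers can also be listed directly. The sketch below applies the 1.5 × IQR rule, which is the default whisker rule in geom_boxplot(), within each group. The helper flag_outliers() and the tibble toy are hypothetical names with made-up values, used only so the example is self-contained.

```r
library(tidyverse)

# TRUE for values more than 1.5 * IQR below the first or above the third quartile
flag_outliers <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[[2]] - q[[1]]
  x < q[[1]] - 1.5 * iqr | x > q[[2]] + 1.5 * iqr
}

# Toy stand-in for tb.ceosal2 (made-up values, not the real data)
toy <- tibble(
  grad_factor = factor(rep(c("No Grad. Degree", "Grad. Degree"), each = 5)),
  sales = c(100, 200, 300, 400, 5000,
            150, 250, 350, 450, 550)
)

# Keep only the rows flagged as outliers within their own group
outliers <- toy %>%
  group_by(grad_factor) %>%
  filter(flag_outliers(sales)) %>%
  ungroup()
outliers
```

On the real data, replacing toy with tb.ceosal2 should list the same firms that appear as isolated points in the boxplot.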
# Histogram of lsalary
ggplot(tb.ceosal2, aes(x = lsalary)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "white") +
  labs(title = "Histogram of Log(CEO Salary)", x = "Log(Salary)", y = "Count") +
  theme_minimal()
# Boxplot of lsalary
ggplot(tb.ceosal2, aes(y = lsalary)) +
  geom_boxplot(fill = "skyblue") +
  scale_x_discrete() +
  labs(title = "Boxplot of Log(CEO Salary)", y = "Log(Salary)") +
  theme_minimal()
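One way to compare salary with log(salary) is to put both histograms side by side. The sketch below is one possible approach, not necessarily the intended solution: it stacks the two columns with pivot_longer() and facets with free scales. The tibble toy holds simulated right-skewed values as an assumed stand-in for the real salary column.

```r
library(tidyverse)

set.seed(1)
# Simulated right-skewed stand-in for salary (not the real data)
toy <- tibble(salary = rlnorm(200, meanlog = 6.5, sdlog = 0.6)) %>%
  mutate(lsalary = log(salary))

# Stack both variables into one column, then draw one panel per variable
p_compare <- toy %>%
  pivot_longer(cols = c(salary, lsalary),
               names_to = "variable", values_to = "value") %>%
  ggplot(aes(x = value)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "white") +
  facet_wrap(~ variable, scales = "free") +
  labs(x = "Value", y = "Count") +
  theme_minimal()
p_compare
```

With scales = "free", each panel gets its own axis ranges, so the long right tail of salary does not squash the log(salary) panel.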
Use the function facet_wrap to plot the distribution
(histogram) of log(salary) for CEOs with and without a graduate degree.
ggplot(tb.ceosal2, aes(x = lsalary)) +
  geom_histogram(bins = 30, fill = "steelblue", color = "white") +
  facet_wrap(~ grad_factor) +
  labs(title = "Distribution of Log(CEO Salary) by Graduate Degree",
       x = "Log(Salary)", y = "Count") +
  theme_minimal()
ggplot(tb.ceosal2, aes(y = lsales)) +
  geom_boxplot(fill = "lightblue") +
  scale_x_discrete() +
  labs(title = "Boxplot of Firm's log(Sales)",
       y = "Log(Sales, in millions USD)", x = "") +
  theme_classic()
ggplot(tb.ceosal2, aes(x = lsales, y = lsalary, shape = grad_factor)) +
  geom_point(size = 2) +
  labs(title = "CEO Salary vs. Firm Sales (1990)",
       x = "Log(Sales, in millions USD)",
       y = "Log(Salary, in thousands USD)",
       shape = "Graduate Degree") +
  theme_minimal()
ggplot(tb.ceosal2, aes(x = age, y = salary, size = profits)) +
  geom_point(alpha = 0.7) +
  labs(title = "CEO Salary vs. Age (1990)",
       x = "CEO Age",
       y = "Salary (in thousands USD)",
       size = "Firm Profits (in millions USD)") +
  theme_minimal() +
  theme(legend.position = "bottom")