knitr::opts_chunk$set(echo = TRUE,
fig.align = "center")
# Load the tidyverse and gt packages
pacman::p_load(tidyverse, gt)
The supers2.csv file has 12 variables on 1899 superheroes from Marvel and DC Comics. We will focus on 3 variables:
Alignment: Whether the character is considered a “Good Guy” (Good), Neutral (Neutral), or a “Bad Guy” (Bad)
hair: The hair color of the character
eye: The eye color of the character
Read in the “supers2.csv” data set and save it as a global
object named supers. After you read it in, change the order of the
alignment groups to be Good, Neutral, Bad. To confirm it is done
correctly, use levels(supers$Alignment)
# Reading in the csv file
supers <- read.csv("supers2.csv")
# Changing the order of the levels
supers$Alignment <-
factor(supers$Alignment,
levels = c("Good", "Neutral", "Bad"))
levels(supers$Alignment)
## [1] "Good" "Neutral" "Bad"
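One thing worth checking after reordering: factor() turns any value that is not listed in levels into NA without a warning, so a misspelled level would quietly drop rows from every later table. A minimal sketch on a made-up vector (toy_align is hypothetical, not the supers data):

```r
# Hypothetical toy vector standing in for supers$Alignment
toy_align <- c("Bad", "Good", "Neutral", "Good", "bad")

# Reorder the levels; the lowercase "bad" is not one of the listed levels
toy_factor <- factor(toy_align, levels = c("Good", "Neutral", "Bad"))

# The level order now follows the levels argument
levels(toy_factor)

# Number of values that did not match any level (and became NA)
sum(is.na(toy_factor))
```

If the NA count is above zero on the real data, the levels vector probably does not match the spelling used in the file.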
For each of the 3 variables in the data:
i) Create a table for the groups of the variable that includes both the counts and percentages (rounded to 1 decimal place)
ii) A bar chart - Choose a suitable theme for the bar charts (don’t just go with the default option, try to make the graphs look nice!)
For all bar charts, add
scale_y_continuous(expand = c(0, 0, 0.05, 0))
to the graph
to remove the extra space at the bottom of the graph
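For reference, the four numbers in expand are, in order, the lower multiplicative, lower additive, upper multiplicative, and upper additive padding. The ggplot2 helper expansion() builds the same vector from named arguments, which some readers may find easier to follow. A small sketch (the object name pad is just for illustration):

```r
library(ggplot2)

# No padding below the bars, 5% of the axis range above them
pad <- expansion(mult = c(0, 0.05), add = c(0, 0))

# Compare with the vector used in the assignment
all.equal(as.numeric(pad), c(0, 0, 0.05, 0))
```

Either form can be passed to the expand argument of scale_y_continuous().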
# Forming the table for the counts and proportions
align_tab <-
supers |>
# Counting how many rows are in good, neutral, and bad
count(Alignment) |>
# Calculating the proportions
mutate(
proportion = n / sum(n),
percentage = paste0(round(proportion*100, digits = 1), "%")
)
gt(align_tab)
Alignment | n | proportion | percentage |
---|---|---|---|
Good | 727 | 0.3828331 | 38.3% |
Neutral | 579 | 0.3048973 | 30.5% |
Bad | 593 | 0.3122696 | 31.2% |
Create the bar chart using the supers data frame (not the table created in part 1i). Display the counts on the y-axis. Add a title that says “Comics Superhero Alignments” and remove the label on the x-axis
ggplot(
data = supers,
mapping = aes(x = Alignment)
) +
# Creating a bar chart using geom_bar() and using fill to change the color
geom_bar(
fill = "forestgreen",
color = "black"
) +
# Adding the title and removing the label on the x-axis
labs(
title = "Comics Superhero Alignments",
x = NULL
) +
# Changing the theme
theme_classic() +
# Adding the code chunk specified at the beginning of the question:
scale_y_continuous(expand = c(0, 0, 0.05, 0))
eye_tab <-
supers |>
# Counting the number of supers for each eye color
count(eye) |>
# Calculating the proportion and percentage
mutate(
eye_prop = n / sum(n),
eye_perc = paste0(round(eye_prop * 100, 1), "%")
)
# Display, but don't save, the data.frame (table) using gt() from the gt package
gt(eye_tab)
eye | n | eye_prop | eye_perc |
---|---|---|---|
Blue | 377 | 0.19852554 | 19.9% |
Brown | 348 | 0.18325434 | 18.3% |
Green | 124 | 0.06529753 | 6.5% |
None | 686 | 0.36124276 | 36.1% |
Other | 227 | 0.11953660 | 12% |
Red | 137 | 0.07214323 | 7.2% |
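The “12%” entry for Other shows a side effect of paste0(round(...)): round() returns a number, so a trailing zero is dropped when it is converted to text. If a fixed one-decimal display is preferred, sprintf() is one option. A minimal sketch using two of the percentages from the table above:

```r
# Percentages for Other and Blue, taken from the eye color table
x <- c(11.953660, 19.852554)

# round() returns a number, so 12.0 is printed without the trailing zero
rounded_labels <- paste0(round(x, 1), "%")

# sprintf() formats to exactly one digit after the decimal point
fixed_labels <- sprintf("%.1f%%", x)

rounded_labels
fixed_labels
```

Both versions are correct to 1 decimal place; the difference is only in how the label is displayed.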
Using the data frame created in 1B i), create a bar chart that displays the percentages on the y-axis. Make sure to choose appropriate colors for each of the bars! Make sure to add a title and remove the labels on the x-axis
# Need to specify what y is if we want the proportions on the y-axis
# which requires us to use the summarized data
ggplot(
data = eye_tab,
mapping = aes(
x = eye,
y = eye_prop
)
) +
# Creating a bar chart using geom_col() since we need to specify the bar heights
geom_col(
fill = c("blue", "tan4", "forestgreen", "white", "black", "red"),
color = "black"
) +
# Adding the title and removing the label on the x-axis
labs(
x = NULL,
title = "Comic Superhero Eye Colors"
) +
# Changing the theme
theme_classic() +
# Adding the code chunk specified at the beginning of the question:
scale_y_continuous(
expand = c(0, 0, 0.05, 0),
labels = scales::label_percent()
)
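Passing a vector of colors directly to fill works here because the colors happen to be listed in the same order as the rows of the table, but it breaks if the groups are reordered. An alternative, sketched below on made-up counts (toy_tab and eye_cols are hypothetical names, not part of the assignment), maps fill to the variable and attaches colors by name:

```r
library(ggplot2)

# Toy summary table with made-up counts (not the supers data)
toy_tab <- data.frame(
  eye = c("Blue", "Brown", "Green"),
  n = c(5, 3, 2)
)
toy_tab$prop <- toy_tab$n / sum(toy_tab$n)

# Named vector: each color is tied to a group name, not to a row position
eye_cols <- c(Green = "forestgreen", Blue = "blue", Brown = "tan4")

p <- ggplot(toy_tab, aes(x = eye, y = prop, fill = eye)) +
  # The legend would repeat the x-axis labels, so it is hidden
  geom_col(color = "black", show.legend = FALSE) +
  scale_fill_manual(values = eye_cols) +
  scale_y_continuous(
    expand = c(0, 0, 0.05, 0),
    labels = scales::label_percent()
  ) +
  theme_classic()

# Colors actually assigned to the bars, in x-axis order
bar_fills <- unname(ggplot_build(p)$data[[1]]$fill)
bar_fills
```

Because the colors are looked up by name, the order of eye_cols does not need to match the order of the bars.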
# Creating the table of counts for hair color
hair_tab <-
supers |>
# Counting the number of supers for each hair color
count(hair) |>
# Calculating the proportion and percentage
mutate(
hair_prop = n / sum(n),
hair_perc = paste0(round(hair_prop * 100, 1), "%")
)
# Using gt() to display the results
gt(hair_tab)
hair | n | hair_prop | hair_perc |
---|---|---|---|
Black | 354 | 0.1864139 | 18.6% |
Blond | 209 | 0.1100579 | 11% |
Brown | 194 | 0.1021590 | 10.2% |
None | 890 | 0.4686677 | 46.9% |
Other | 252 | 0.1327014 | 13.3% |
Create a bar chart that displays the counts on the x-axis and the hair colors on the y-axis. Make sure to choose appropriate colors for each of the bars! Add a title and remove the labels on the y-axis
# geom_bar() counts the rows for each hair color, so we can use the original data
# Mapping hair to y makes the bars horizontal, with the counts on the x-axis
ggplot(
data = supers,
mapping = aes(y = hair)
) +
# Creating a bar chart using geom_bar() since the counts are computed from the original data
geom_bar(
fill = c("black", "yellow2", "tan4", "white", "violet"),
color = "black"
) +
# Adding the title and removing the label on the y-axis
labs(
y = NULL,
title = "Comic Superheroes Hair Colors"
) +
# Changing the theme
theme_classic() +
scale_x_continuous(
expand = c(0, 0, 0.05, 0)
)
Using the other 2 variables (hair and eye color) individually, calculate the percentages of supers that are Good, Neutral, and Bad.
i) Present the percentages in a table, rounding to 1 decimal
place. You can use either a contingency (two-way) table or a
data.frame() converted from the table
ii) Present the proportions or percentages in the specified bar chart - Have the colors for Good, Neutral, and Bad be “steelblue”, “grey70”, and “tomato”, respectively.
Keep adding
scale_y_continuous(expand = c(0, 0, 0.05, 0))
to the bar
charts to remove the extra space!
align_eye_df <-
supers |>
# Getting the combination of alignment and eye color
count(eye, Alignment) |>
# Calculating the alignment prop by eye color group
mutate(
.by = eye,
prop = n / sum(n),
percent = round(prop*100, 1)
)
gt(align_eye_df)
eye | Alignment | n | prop | percent |
---|---|---|---|---|
Blue | Good | 228 | 0.6047745 | 60.5 |
Blue | Neutral | 43 | 0.1140584 | 11.4 |
Blue | Bad | 106 | 0.2811671 | 28.1 |
Brown | Good | 176 | 0.5057471 | 50.6 |
Brown | Neutral | 66 | 0.1896552 | 19.0 |
Brown | Bad | 106 | 0.3045977 | 30.5 |
Green | Good | 59 | 0.4758065 | 47.6 |
Green | Neutral | 19 | 0.1532258 | 15.3 |
Green | Bad | 46 | 0.3709677 | 37.1 |
None | Good | 160 | 0.2332362 | 23.3 |
None | Neutral | 368 | 0.5364431 | 53.6 |
None | Bad | 158 | 0.2303207 | 23.0 |
Other | Good | 68 | 0.2995595 | 30.0 |
Other | Neutral | 60 | 0.2643172 | 26.4 |
Other | Bad | 99 | 0.4361233 | 43.6 |
Red | Good | 36 | 0.2627737 | 26.3 |
Red | Neutral | 23 | 0.1678832 | 16.8 |
Red | Bad | 78 | 0.5693431 | 56.9 |
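The contingency (two-way) table option mentioned in the instructions could be sketched with base R table() and prop.table(). The data below are made up for illustration (toy is hypothetical, not the supers data); margin = 1 asks for row proportions, so each eye color row sums to 100%:

```r
# Hypothetical toy data standing in for supers
toy <- data.frame(
  eye = c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Red"),
  Alignment = factor(
    c("Good", "Good", "Bad", "Bad", "Bad", "Neutral", "Good"),
    levels = c("Good", "Neutral", "Bad")
  )
)

# Two-way table of counts: eye colors in rows, alignments in columns
counts <- table(toy$eye, toy$Alignment)

# Row proportions (margin = 1), converted to percentages with 1 decimal
row_pct <- round(prop.table(counts, margin = 1) * 100, 1)
row_pct
```

With the real data, toy would be replaced by supers; the resulting table carries the same percentages as the data frame version, just laid out in wide form.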
Students can choose which of the two approaches to use; they don’t need to do both. The same is true for every part i) in question 2.
Create a stacked bar chart displaying the alignment percentage for each eye color group. Which eye color group is most likely to be a hero (alignment = Good)? What about a villain (alignment = Bad)?
ggplot(
data = supers,
mapping = aes(
x = eye,
fill = Alignment
)
) +
# Creating a bar chart using geom_bar() since it is from the original data
geom_bar(
position = "fill",
color = "black"
) +
# Adding the title and changing the axis labels
labs(
title = "Alignment by Eye Color",
x = NULL,
y = "Percentage"
) +
# Changing the theme
theme_classic() +
# Changing the colors to steelblue, grey70, and tomato
scale_fill_manual(values = c("steelblue", "grey70", "tomato")) +
# Changing the y-axis to be percentages and removing the extra space
scale_y_continuous(
labels = scales::label_percent(),
expand = c(0, 0, 0.05, 0)
)
Supers with blue eyes are the most likely to be heroes. Supers with red eyes are the most likely to be villains.
ONLY APPLY THE BONUS POINT ONCE, NOT ON ALL 3 GRAPHS
Repeat part 2A, but with hair color instead of eye color.
align_hair_df <-
supers |>
# Getting the combination of alignment and hair color
count(hair, Alignment) |>
# Calculating the alignment prop by hair color group
mutate(
.by = hair,
prop = n / sum(n),
percent = round(prop*100, 1)
)
gt(align_hair_df)
hair | Alignment | n | prop | percent |
---|---|---|---|---|
Black | Good | 184 | 0.5197740 | 52.0 |
Black | Neutral | 61 | 0.1723164 | 17.2 |
Black | Bad | 109 | 0.3079096 | 30.8 |
Blond | Good | 130 | 0.6220096 | 62.2 |
Blond | Neutral | 30 | 0.1435407 | 14.4 |
Blond | Bad | 49 | 0.2344498 | 23.4 |
Brown | Good | 107 | 0.5515464 | 55.2 |
Brown | Neutral | 35 | 0.1804124 | 18.0 |
Brown | Bad | 52 | 0.2680412 | 26.8 |
None | Good | 197 | 0.2213483 | 22.1 |
None | Neutral | 411 | 0.4617978 | 46.2 |
None | Bad | 282 | 0.3168539 | 31.7 |
Other | Good | 109 | 0.4325397 | 43.3 |
Other | Neutral | 42 | 0.1666667 | 16.7 |
Other | Bad | 101 | 0.4007937 | 40.1 |
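The .by argument of mutate() used above requires dplyr 1.1.0 or later. On an older installation, group_by() followed by ungroup() should give the same proportions. A minimal sketch on made-up data (toy is hypothetical, not the supers data):

```r
library(dplyr)

# Hypothetical toy data standing in for supers
toy <- tibble(
  hair = c("Black", "Black", "Blond", "Blond", "Blond"),
  Alignment = c("Good", "Bad", "Good", "Good", "Bad")
)

# Per-group proportions with .by (dplyr >= 1.1.0)
by_version <- toy |>
  count(hair, Alignment) |>
  mutate(.by = hair, prop = n / sum(n))

# Same calculation with explicit grouping
grouped_version <- toy |>
  count(hair, Alignment) |>
  group_by(hair) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

# The two approaches agree
all.equal(by_version$prop, grouped_version$prop)
```

In both versions the proportions sum to 1 within each hair color, which is what makes them comparable across groups of different sizes.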
Create a side-by-side bar chart. Supers with what hair color are the most likely to be heroes? What hair color is the most likely to be a villain?
ggplot(
data = align_hair_df,
mapping = aes(
x = hair,
fill = Alignment,
y = percent
)
) +
# Creating a bar chart using geom_col() since the heights come from the summarized data
geom_col(
position = "dodge",
color = "black"
) +
# Adding the title and changing the axis labels
labs(
title = "Alignment by Hair Color",
x = NULL,
y = "Percentage"
) +
# Changing the theme
theme_classic() +
# Changing the colors to steelblue, grey70, and tomato
scale_fill_manual(values = c("steelblue", "grey70", "tomato")) +
# Removing the extra space at the bottom (percent is already on a 0-100 scale)
scale_y_continuous(expand = c(0, 0, 0.05, 0))
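Because the percent column is already on a 0-100 scale, scales::label_percent() with its default settings would multiply the values by 100 again. If percent signs on the axis are wanted, one option is to set scale = 1. A small sketch (fmt and axis_labels are illustrative names):

```r
library(scales)

# Labeller that adds a percent sign without rescaling the values
fmt <- label_percent(scale = 1)

# Example axis breaks on a 0-100 scale
axis_labels <- fmt(c(0, 25, 50))
axis_labels
```

In the chart above this would go inside the existing call as scale_y_continuous(labels = scales::label_percent(scale = 1), expand = c(0, 0, 0.05, 0)).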
Supers with blond hair are the most likely to be heroes. Supers with an “Other” hair color (anything besides black, blond, or brown; just saying other is fine) are the most likely to be villains.
Which of the two style of graphs do you prefer: Segmented/Stacked or side-by-side? Briefly explain why!
Personally, I find the stacked bar chart to be better: every bar sums to 100%, so it is easier to compare the alignment split across the groups on the x-axis. As long as they justify their answer, you can give them full credit!