knitr::opts_chunk$set(echo = TRUE,
                      fig.align = "center")
# Load the tidyverse and gt packages
pacman::p_load(tidyverse, gt)
The supers2.csv file has 12 variables on 1899 superheroes from Marvel and DC Comics. We will focus on 3 variables:
Alignment: If the character is considered a “Good Guy” (Good), Neutral (Neutral), or “Bad Guy” (Bad)
hair: The hair color of the character
eye: The eye color of the character
Read in the “supers2.csv” data set and save it as a global
object named supers. After you read it in, change the order of the
alignment groups to be Good, Neutral, Bad. To confirm it is done
correctly, use levels(supers$Alignment)
# Reading in the csv file
supers <- read.csv("supers2.csv")
# Changing the order of the levels
supers$Alignment <- 
  factor(supers$Alignment,
         levels = c("Good", "Neutral", "Bad"))
levels(supers$Alignment)
## [1] "Good"    "Neutral" "Bad"
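As a small illustration of why the reordering step is needed, the sketch below uses a made-up character vector (`alignment_demo` is hypothetical and not part of supers2.csv). By default, factor() sorts the levels alphabetically, so the levels argument has to be supplied to get Good, Neutral, Bad.

```r
# Hypothetical toy vector standing in for supers$Alignment
alignment_demo <- c("Good", "Bad", "Neutral", "Good")

# Default behavior: levels are sorted alphabetically
default_levels <- levels(factor(alignment_demo))

# Supplying levels = ... fixes the order explicitly
ordered_levels <- levels(
  factor(alignment_demo, levels = c("Good", "Neutral", "Bad"))
)

print(default_levels)
print(ordered_levels)
```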
For each of the 3 variables in the data:
i) Create a table for the groups of the variable that includes both the counts and percentages (rounded to 1 decimal place)
ii) A bar chart - Choose a suitable theme for the bar charts (don’t just go with the default option, try to make the graphs look nice!)
For all bar charts, add
scale_y_continuous(expand = c(0, 0, 0.05, 0)) to the graph
to remove the extra space at the bottom of the graph
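For reference, the four numbers in expand are, in order, the multiplicative and additive padding at the lower end of the axis, followed by the multiplicative and additive padding at the upper end. The sketch below assumes ggplot2 is installed and rebuilds the same vector with ggplot2's expansion() helper, which some readers may find easier to read. The object names are only illustrative.

```r
library(ggplot2)

# The vector used throughout this assignment:
# no padding at the bottom, 5% multiplicative padding at the top
manual_expand <- c(0, 0, 0.05, 0)

# The same settings written with the expansion() helper
helper_expand <- expansion(mult = c(0, 0.05), add = 0)

print(manual_expand)
print(helper_expand)
```

Either form can be passed to the expand argument of scale_y_continuous() or scale_x_continuous().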
# Forming the table for the counts and proportions
align_tab <- 
  supers |> 
  # Counting how many rows are in good, neutral, and bad
  count(Alignment) |> 
  # Calculating the proportions
  mutate(
    proportion = n / sum(n),
    percentage = paste0(round(proportion*100, digits = 1), "%")
  )
gt(align_tab)
| Alignment | n | proportion | percentage | 
|---|---|---|---|
| Good | 727 | 0.3828331 | 38.3% | 
| Neutral | 579 | 0.3048973 | 30.5% | 
| Bad | 593 | 0.3122696 | 31.2% | 
Create the bar chart using the supers data frame (not the table created in part 1i). Display the counts on the y-axis. Add a title that says “Comics Superhero Alignments” and remove the label on the x-axis
ggplot(
  data = supers,
  mapping = aes(x = Alignment)
) + 
  
  # Creating a bar chart using geom_bar() and using fill to change the color
  geom_bar(
    fill = "forestgreen",
    color = "black"
  ) + 
  
  # Adding the title and removing the label on the x-axis
  labs(
    title = "Comics Superhero Alignments",
    x = NULL
  ) + 
  
  # Changing the theme
  theme_classic() +
  
  # Adding the code chunk specified at the beginning of the question:
  scale_y_continuous(expand = c(0, 0, 0.05, 0))
eye_tab <- 
  supers |> 
  # Counting the number of supers for each eye color
  count(eye) |> 
  # Calculating the proportion and percentage
  mutate(
    eye_prop = n / sum(n),
    eye_perc = paste0(round(eye_prop * 100, 1), "%")
  )
# Display, but don't save, the data.frame (table) using gt() from the gt package
gt(eye_tab)
| eye | n | eye_prop | eye_perc | 
|---|---|---|---|
| Blue | 377 | 0.19852554 | 19.9% | 
| Brown | 348 | 0.18325434 | 18.3% | 
| Green | 124 | 0.06529753 | 6.5% | 
| None | 686 | 0.36124276 | 36.1% | 
| Other | 227 | 0.11953660 | 12% | 
| Red | 137 | 0.07214323 | 7.2% | 
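Note that the "Other" row prints as 12% rather than 12.0%, because round() drops a trailing zero before paste0() adds the percent sign. If a fixed single decimal is wanted, one option is sprintf(). The sketch below is only an illustration, using two of the proportions from the table above; it is not the approach used in the solution.

```r
# Two proportions copied from the eye color table (Blue and Other)
props <- c(0.19852554, 0.11953660)

# Approach used in the solution: trailing zeros are dropped
rounded_labels <- paste0(round(props * 100, 1), "%")

# Alternative: sprintf() always prints exactly one decimal place
fixed_labels <- sprintf("%.1f%%", props * 100)

print(rounded_labels)  # "19.9%" "12%"
print(fixed_labels)    # "19.9%" "12.0%"
```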
Using the data frame created in 1B i), create a bar chart that displays the percentages on the y-axis. Make sure to choose appropriate colors for each of the bars! Make sure to add a title and remove the labels on the x-axis
# Need to specify what y is if we want the proportions on the y-axis
# which requires us to use the summarized data
ggplot(
  data = eye_tab,
  mapping = aes(
    x = eye,
    y = eye_prop
  )
) + 
  
  # Creating a bar chart using geom_col() since we need to specify the bar heights
  geom_col(
    fill = c("blue", "tan4", "forestgreen", "white", "black", "red"),
    color = "black"
  ) + 
  
  # Adding the title and removing the label on the x-axis
  labs(
    x = NULL,
    title = "Comic Superhero Eye Colors"
  ) + 
  
  # Changing the theme
  theme_classic() +
  
  # Adding the code chunk specified at the beginning of the question:
  scale_y_continuous(
    expand = c(0, 0, 0.05, 0),
    labels = scales::label_percent()
  )
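Passing a vector of colors to fill works here because the colors are listed in the same order as the rows of eye_tab. A possible alternative, sketched below under the assumption that ggplot2 is installed, maps fill to the variable and uses a named vector in scale_fill_manual(), so each color is tied to its group by name rather than by position. The data frame `eye_demo` is a hand-typed copy of the proportions in the table above, and `eye_colors` is an illustrative name.

```r
library(ggplot2)

# Hand-typed copy of the eye color proportions (illustration only)
eye_demo <- data.frame(
  eye = c("Blue", "Brown", "Green", "None", "Other", "Red"),
  eye_prop = c(0.19852554, 0.18325434, 0.06529753,
               0.36124276, 0.11953660, 0.07214323)
)

# Named vector: each color is matched to its group by name
eye_colors <- c(
  Blue = "blue", Brown = "tan4", Green = "forestgreen",
  None = "white", Other = "black", Red = "red"
)

eye_plot <- ggplot(eye_demo, aes(x = eye, y = eye_prop, fill = eye)) +
  # The legend is redundant with the x-axis, so it is hidden
  geom_col(color = "black", show.legend = FALSE) +
  scale_fill_manual(values = eye_colors) +
  theme_classic()

eye_plot
```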
# Creating the table of counts for hair color
hair_tab <- 
  supers |> 
  # Counting the number of supers for each hair color
  count(hair) |> 
  # Calculating the proportion and percentage
  mutate(
    hair_prop = n / sum(n),
    hair_perc = paste0(round(hair_prop * 100, 1), "%")
  )
# Using gt() to display the results
gt(hair_tab)
| hair | n | hair_prop | hair_perc | 
|---|---|---|---|
| Black | 354 | 0.1864139 | 18.6% | 
| Blond | 209 | 0.1100579 | 11% | 
| Brown | 194 | 0.1021590 | 10.2% | 
| None | 890 | 0.4686677 | 46.9% | 
| Other | 252 | 0.1327014 | 13.3% | 
Create a bar chart that displays the counts on the x-axis and the hair colors on the y-axis. Make sure to choose appropriate colors for each of the bars! Add a title and remove the labels on the y-axis
# geom_bar() computes the counts from the original data, so we only need
# to map hair to the y-axis to get horizontal bars
ggplot(
  data = supers,
  mapping = aes(y = hair)
) + 
  
  # Creating a bar chart using geom_bar(), which counts the rows in each hair color group
  geom_bar(
    fill = c("black", "yellow2", "tan4", "white", "violet"),
    color = "black"
  ) + 
  
  # Adding the title and removing the label on the y-axis
  labs(
    y = NULL,
    title = "Comic Superheroes Hair Colors"
  ) + 
  
  # Changing the theme
  theme_classic() +
  
  scale_x_continuous(
    expand = c(0, 0, 0.05, 0)
  )
Using the other 2 variables (hair and eye color) individually, calculate the percentages of supers that are Good, Neutral, and Bad.
i) Present the percentages in a table, rounding to 1 decimal
place. You can either use a contingency (two-way) table or convert
the table to a data.frame()
ii) Present the proportions or percentages in the specified bar chart - Have the colors for Good, Neutral, and Bad be “steelblue”, “grey70”, and “tomato”, respectively.
Keep adding
scale_y_continuous(expand = c(0, 0, 0.05, 0)) to the bar
charts to remove the extra space!
align_eye_df <- 
  supers |> 
  # Getting the combination of alignment and eye color
  count(eye, Alignment) |> 
  # Calculating the alignment prop by eye color group
  mutate(
    .by = eye,
    prop = n / sum(n),
    percent = round(prop*100, 1)
  )
gt(align_eye_df)
| eye | Alignment | n | prop | percent | 
|---|---|---|---|---|
| Blue | Good | 228 | 0.6047745 | 60.5 | 
| Blue | Neutral | 43 | 0.1140584 | 11.4 | 
| Blue | Bad | 106 | 0.2811671 | 28.1 | 
| Brown | Good | 176 | 0.5057471 | 50.6 | 
| Brown | Neutral | 66 | 0.1896552 | 19.0 | 
| Brown | Bad | 106 | 0.3045977 | 30.5 | 
| Green | Good | 59 | 0.4758065 | 47.6 | 
| Green | Neutral | 19 | 0.1532258 | 15.3 | 
| Green | Bad | 46 | 0.3709677 | 37.1 | 
| None | Good | 160 | 0.2332362 | 23.3 | 
| None | Neutral | 368 | 0.5364431 | 53.6 | 
| None | Bad | 158 | 0.2303207 | 23.0 | 
| Other | Good | 68 | 0.2995595 | 30.0 | 
| Other | Neutral | 60 | 0.2643172 | 26.4 | 
| Other | Bad | 99 | 0.4361233 | 43.6 | 
| Red | Good | 36 | 0.2627737 | 26.3 | 
| Red | Neutral | 23 | 0.1678832 | 16.8 | 
| Red | Bad | 78 | 0.5693431 | 56.9 | 
Students can choose either of the two table formats described above (contingency table or data frame); they don’t need to do both. The same is true for every part i) in question 2.
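For graders who want to see what the contingency-table version might look like, here is a minimal sketch in base R. To keep it self-contained, it uses a small matrix of counts typed in from the Blue and Red rows of the table above (`eye_counts` is an illustrative name). With the real data, the counts would presumably come from table(supers$eye, supers$Alignment) instead.

```r
# Counts copied from the Blue and Red rows of the table above
eye_counts <- matrix(
  c(228, 43, 106,
     36, 23,  78),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    eye = c("Blue", "Red"),
    Alignment = c("Good", "Neutral", "Bad")
  )
)

# margin = 1 divides each count by its row total,
# giving the alignment percentages within each eye color
eye_pct <- round(prop.table(eye_counts, margin = 1) * 100, 1)

print(eye_pct)
```

The Blue row should come out as 60.5, 11.4, 28.1 and the Red row as 26.3, 16.8, 56.9, matching the data frame version.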
Create a stacked bar chart displaying the alignment percentage for each eye color group. Which eye color group is most likely to be a hero (alignment = Good)? What about a villain (alignment = Bad)?
ggplot(
  data = supers,
  mapping = aes(
    x = eye,
    fill = Alignment
  )
) + 
  
  # Creating a bar chart using geom_bar() since it is from the original data
  geom_bar(
    position = "fill",
    color = "black"
  ) + 
  
  # Adding the title, removing the x-axis label, and labeling the y-axis
  labs(
    title = "Alignment by Eye Color",
    x = NULL,
    y = "Percentage"
  ) + 
  
  # Changing the theme
  theme_classic() + 
  
  # Changing the colors to steelblue, grey70, and tomato
  scale_fill_manual(values = c("steelblue", "grey70", "tomato")) +
  
  # Changing the y-axis to be percentages and removing the extra space
  scale_y_continuous(
    labels = scales::label_percent(),
    expand = c(0, 0, 0.05, 0)
  )
Supers with blue eyes are the most likely to be heroes (60.5% Good). Supers with red eyes are the most likely to be villains (56.9% Bad).
ONLY APPLY THE BONUS POINT ONCE, NOT ON ALL 3 GRAPHS
Repeat part 2A, but with hair color instead of eye color.
align_hair_df <- 
  supers |> 
  # Getting the combination of alignment and hair color
  count(hair, Alignment) |> 
  # Calculating the alignment prop by hair color group
  mutate(
    .by = hair,
    prop = n / sum(n),
    percent = round(prop*100, 1)
  )
gt(align_hair_df)
| hair | Alignment | n | prop | percent | 
|---|---|---|---|---|
| Black | Good | 184 | 0.5197740 | 52.0 | 
| Black | Neutral | 61 | 0.1723164 | 17.2 | 
| Black | Bad | 109 | 0.3079096 | 30.8 | 
| Blond | Good | 130 | 0.6220096 | 62.2 | 
| Blond | Neutral | 30 | 0.1435407 | 14.4 | 
| Blond | Bad | 49 | 0.2344498 | 23.4 | 
| Brown | Good | 107 | 0.5515464 | 55.2 | 
| Brown | Neutral | 35 | 0.1804124 | 18.0 | 
| Brown | Bad | 52 | 0.2680412 | 26.8 | 
| None | Good | 197 | 0.2213483 | 22.1 | 
| None | Neutral | 411 | 0.4617978 | 46.2 | 
| None | Bad | 282 | 0.3168539 | 31.7 | 
| Other | Good | 109 | 0.4325397 | 43.3 | 
| Other | Neutral | 42 | 0.1666667 | 16.7 | 
| Other | Bad | 101 | 0.4007937 | 40.1 | 
Create a side-by-side bar chart. Supers with what hair color are the most likely to be heroes? What hair color is the most likely to be a villain?
ggplot(
  data = align_hair_df,
  mapping = aes(
    x = hair,
    fill = Alignment,
    y = percent
  )
) + 
  
  # Creating a bar chart using geom_col() since the bar heights come from the summarized data
  geom_col(
    position = "dodge",
    color = "black"
  ) + 
  
  # Adding the title, removing the x-axis label, and labeling the y-axis
  labs(
    title = "Alignment by Hair Color",
    x = NULL,
    y = "Percentage"
  ) + 
  
  # Changing the theme
  theme_classic() + 
  
  # Changing the colors to steelblue, grey70, and tomato
  scale_fill_manual(values = c("steelblue", "grey70", "tomato")) +
  
  # Removing the extra space at the bottom (percent is already on a 0-100 scale)
  scale_y_continuous(expand = c(0, 0, 0.05, 0))
Supers with blond hair are the most likely to be heroes (62.2% Good). Supers with an "Other" hair color, meaning anything besides black, blond, brown, or none (just saying other is fine), are the most likely to be villains (40.1% Bad).
Which of the two style of graphs do you prefer: Segmented/Stacked or side-by-side? Briefly explain why!
Personally, I find the stacked bar chart to be better since it is easier to compare the alignment percentages across the groups on the x-axis, but as long as students justify their answer, you can give them full credit!