library(tidyverse)
library(palmerpenguins)
HDS 3.3-3.4
Begin by loading the tidyverse and palmerpenguins packages in the code chunk above and adding your name as the author.
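Once the packages are loaded, a quick look at the data can help before plotting. The following is a minimal sketch, not part of the original exercise: it assumes the penguins data frame from palmerpenguins, and the name na_counts is chosen here only for illustration. It prints the column types and counts the missing values in each column, which matters because ggplot2 draws NA in a discrete variable such as sex as its own category by default.

```r
library(tidyverse)
library(palmerpenguins)

# Show each column's type and its first few values.
glimpse(penguins)

# Count missing values in every column (na_counts is an illustrative name).
# Rows with NA in sex appear as a separate group in the plots below.
na_counts <- penguins %>%
  summarize(across(everything(), ~ sum(is.na(.x))))

na_counts
```

If the NA group is distracting, one option is to filter those rows out with drop_na(sex) before calling ggplot().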
Visualizing the penguins Data
Two Categorical Variables
Let’s start by making a stacked bar chart of the island variable by sex. Fill in the missing code:
ggplot(penguins, aes(x = island, fill = sex)) +
  geom_bar() +
  labs(x = "Island",
       y = "Number of Penguins",
       fill = "Sex",
       title = "Distribution of Penguins by Island and Sex"
  )
Modify the previous bar chart to make a side-by-side bar chart of the two variables:
ggplot(penguins, aes(x = island, fill = sex)) +
  geom_bar(position = "dodge") +
  labs(x = "Island",
       y = "Number of Penguins",
       fill = "Sex",
       title = "Distribution of Penguins by Island and Sex"
  )
One Quantitative/One Categorical Variables
Now let’s make side-by-side boxplots of body_mass_g by sex. Put the quantitative variable on the y-axis and the categorical variable on the x-axis.
ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot() +
  labs(x = "Sex",
       y = "Body Mass (g)",
       title = "Distribution of Penguin Body Mass by Sex"
  )
One Quantitative/Two Categorical Variables
Is there a difference in body_mass_g between sexes? Use facet_wrap() to modify the previous plot to compare the sexes separately for each species. Is the sex difference more or less clear now?
ggplot(penguins, aes(x = sex, y = body_mass_g)) +
  geom_boxplot() +
  facet_wrap(~ species) +
  labs(x = "Sex",
       y = "Body Mass (g)",
       title = "Distribution of Penguin Body Mass by Sex and Species"
  )
Two Quantitative Variables
Is there a relationship between flipper_length_mm and body_mass_g? Make a scatterplot of the two variables:
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  labs(x = "Flipper Length (mm)",
       y = "Body Mass (g)",
       title = "Flipper Length vs. Body Mass"
  )
Now add species to the plot by using different colored points:
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point() +
  labs(x = "Flipper Length (mm)",
       y = "Body Mass (g)",
       title = "Flipper Length vs. Body Mass by Species",
       color = "Species"
  )
Now make the size of the points proportional to bill_depth_mm.
ggplot(penguins,
       aes(x = flipper_length_mm,
           y = body_mass_g,
           color = species,
           size = bill_depth_mm)
) +
  geom_point() +
  labs(x = "Flipper Length (mm)",
       y = "Body Mass (g)",
       title = "Flipper Length vs. Body Mass by Species",
       color = "Species",
       size = "Bill Depth (mm)"
  )
What additional insight do you get from this?
I learned how to visualize more than two variables in one scatterplot by mapping species to color and bill depth to size.
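Point sizes can be hard to compare by eye, so a numeric summary may help confirm what the final plot suggests. The sketch below is an illustration under stated assumptions rather than part of the original exercise: it reuses the penguins data and the dplyr verbs loaded with tidyverse, and the names species_summary and the mean_* columns are hypothetical. It computes the species means of the three quantitative variables shown in the last scatterplot.

```r
library(tidyverse)
library(palmerpenguins)

# Mean of each plotted quantitative variable, by species.
# na.rm = TRUE skips the rows with missing measurements.
# species_summary and the mean_* column names are illustrative.
species_summary <- penguins %>%
  group_by(species) %>%
  summarize(
    mean_flipper_mm    = mean(flipper_length_mm, na.rm = TRUE),
    mean_mass_g        = mean(body_mass_g, na.rm = TRUE),
    mean_bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE)
  )

species_summary
```

Comparing the mean_bill_depth_mm column across species gives a direct check on whether the larger or smaller points in the plot cluster by species.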