library(tidyverse)
library(palmerpenguins)
HDS 3.3-3.4
Begin by loading the tidyverse and palmerpenguins packages in the code chunk above and adding your name as the author.
Visualizing the penguins Data
Two Categorical Variables
Let’s start by making a stacked bar chart of the island variable by sex. Fill in the missing code:
glimpse(penguins)
Rows: 344
Columns: 8
$ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex <fct> male, female, female, NA, female, male, female, male…
$ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
# Insert your code here to make the stacked bar chart
ggplot(data = penguins,
       aes(x = island, fill = sex)
) +
  geom_bar() +
  labs(
    x = "Island",
    y = "Number of Penguins",
    fill = "Sex",
    title = "Number of Penguins by Island and Sex"
  )
Modify the previous bar chart to make a side-by-side bar chart of the two variables:
# Insert your code here to make the side-by-side bar chart
ggplot(data = penguins,
       aes(x = island, fill = sex)
) +
  geom_bar(position = "dodge") +
  labs(
    x = "Island",
    y = "Number of Penguins",
    fill = "Sex",
    title = "Number of Penguins by Island and Sex"
  )
One Quantitative/One Categorical Variables
Now let’s make side-by-side boxplots of body_mass_g by sex. Put the quantitative variable on the y-axis and the categorical variable on the x-axis.
ggplot(penguins,
       aes(x = sex, y = body_mass_g)
) +
  geom_boxplot()
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
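The warning above comes from rows where body_mass_g is missing. As a quick check (a sketch that is not part of the original exercise; the name na_counts is only illustrative), the missing values in each column can be counted like this:

```r
library(tidyverse)
library(palmerpenguins)

# Count the missing values (NA) in each column of penguins
na_counts <- penguins |>
  summarise(across(everything(), ~ sum(is.na(.x))))

na_counts
```

The boxplot drops only the rows with a missing body_mass_g. Rows with a missing sex but a recorded body mass are still drawn, as a separate NA box.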
One Quantitative/Two Categorical Variables
Is there a difference in body_mass_g between sexes?
- Yes. Males are clearly heavier on the whole: the male box sits higher on the body mass axis than the female box.
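To back up the visual comparison with numbers, a grouped summary along these lines could be used. This is a minimal sketch; the names mass_by_sex, median_mass_g, and mean_mass_g are illustrative and do not come from the assignment.

```r
library(tidyverse)
library(palmerpenguins)

# Median and mean body mass (g) for each sex, ignoring missing body masses.
# Penguins with a missing sex appear as their own NA group.
mass_by_sex <- penguins |>
  group_by(sex) |>
  summarise(
    n = n(),
    median_mass_g = median(body_mass_g, na.rm = TRUE),
    mean_mass_g = mean(body_mass_g, na.rm = TRUE)
  )

mass_by_sex
```

The median corresponds to the thick line inside each box, so comparing median_mass_g across rows mirrors what the boxplot shows.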
Use facet_wrap() to modify the previous plot to compare the sexes separately for each species. Is the sex difference more or less clear now?
- The sex difference is much clearer now: within each species, males are heavier
than females. We can also see that Gentoo penguins are noticeably heavier than the other two species.
#| label: Side-By-Side Boxplots
ggplot(penguins,
       aes(x = sex, y = body_mass_g)
) +
  geom_boxplot() +
  facet_wrap(~species)
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
Two Quantitative Variables
Is there a relationship between flipper_length_mm and body_mass_g? Make a scatterplot of the two variables:
- The scatterplot shows a positive relationship between body mass and flipper
length: as body mass increases, flipper length tends to increase as well.
ggplot(penguins,
       aes(x = body_mass_g, y = flipper_length_mm)
) +
  geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
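One way to put a number on the strength of this relationship is the Pearson correlation coefficient. The sketch below is an addition to the exercise, and it assumes that a linear summary is appropriate for these two variables:

```r
library(palmerpenguins)

# Pearson correlation between body mass and flipper length,
# using only the rows where both values are present
r <- cor(penguins$body_mass_g, penguins$flipper_length_mm,
         use = "complete.obs")
r
```

A value close to 1 would agree with the upward trend in the scatterplot. The coefficient pools all three species, so it says nothing about the relationship within each species.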
Now add species to the plot by using different colored points:
ggplot(penguins,
       aes(x = body_mass_g, y = flipper_length_mm, color = species)
) +
  geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
Now make the size of the points proportional to bill_depth_mm.
ggplot(penguins,
       aes(x = body_mass_g, y = flipper_length_mm, color = species, size = bill_depth_mm)
) +
  geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
What additional insight do you get from this?
Adelie and Chinstrap penguins appear similar in body mass and bill depth, with
considerable overlap, whereas Gentoo penguins are distinctly heavier and have
longer flippers, making them clearly separated from the other two species. There also
seems to be a consistent, roughly linear relationship between body mass and flipper
length, which suggests that one could be predicted from the other, subject to further study.
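As a rough illustration of predicting one variable from the other, a simple linear regression could be fitted as below. This is only a sketch under the assumption that a single straight line is adequate. It ignores species, and the 4000 g penguin is a hypothetical input chosen for the example.

```r
library(palmerpenguins)

# Simple linear regression: flipper length as a function of body mass.
# lm() drops rows with missing values by default.
fit <- lm(flipper_length_mm ~ body_mass_g, data = penguins)

# Intercept and slope (mm of flipper length per gram of body mass)
coef(fit)

# Predicted flipper length for a hypothetical 4000 g penguin
pred <- predict(fit, newdata = data.frame(body_mass_g = 4000))
pred
```

Because the three species form distinct clusters, a model that also includes species would be a natural next step before using this for real prediction.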