HDS 3.3-3.4

Author

Michael Ernst

library(palmerpenguins) 
library(tidyverse)

Begin by loading the tidyverse and palmerpenguins packages in the code chunk above and adding your name as the author.

Visualizing the penguins Data

Two Categorical Variables

Let’s start by making a stacked bar chart of the island variable by sex. Fill in the missing code:

# Stacked bar chart: geom_bar() stacks the bars by default (position = "stack")

ggplot(data = penguins, mapping = aes(
    x = island,
    fill = sex
  )
) + 
  labs( 
    title = "Penguin Counts by Island and Sex" ) +
  geom_bar()

Modify the previous bar chart to make a side-by-side bar chart of the two variables:

ggplot(data = penguins, mapping = aes(
    x = island,
    fill = sex
  )
) + 
  labs( 
    title = "Penguin Counts by Island and Sex" ) +
  geom_bar(position = "dodge")
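
As a rough cross-check on the bar heights, the same counts can be tabulated directly. This is a minimal sketch using dplyr's count(); the object name island_sex_counts is introduced here only for illustration.

```r
library(palmerpenguins)
library(dplyr)

# Count penguins in each island/sex combination. These counts are the
# bar heights that geom_bar() draws; a missing sex shows up as NA.
island_sex_counts <- penguins |>
  count(island, sex)

island_sex_counts
```

Each row of the result corresponds to one bar (or one stacked segment) in the charts above.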

One Quantitative/One Categorical Variables

Now let’s make side-by-side boxplots of body_mass_g by sex. Put the quantitative variable on the y-axis and the categorical variable on the x-axis.

ggplot(data = penguins, mapping = aes(
    x = sex,
    y = body_mass_g
  )
) + 
  labs( 
    title = "Penguin Body Mass by Sex" ) +
  geom_boxplot()
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).
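
The warning refers to rows where body_mass_g is missing. One way to check this and to drop incomplete rows before plotting is sketched below. It assumes that removing those rows with tidyr's drop_na() is acceptable for this exercise; n_missing_mass and penguins_complete are hypothetical names, not part of the assignment.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)
library(ggplot2)

# Number of rows with a missing body mass (the rows ggplot2 drops).
n_missing_mass <- sum(is.na(penguins$body_mass_g))
n_missing_mass

# Keep only rows where both plotted variables are present.
# Dropping missing sex values also removes the NA box from the plot.
penguins_complete <- penguins |>
  drop_na(sex, body_mass_g)

ggplot(penguins_complete, aes(x = sex, y = body_mass_g)) +
  geom_boxplot()
```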

One Quantitative/Two Categorical Variables

Is there a difference in body_mass_g between sexes?

Yes, the males tend to be heavier than the females.
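
A numeric summary can back up what the boxplots suggest. The sketch below computes the median body mass for each sex; mass_by_sex and median_mass_g are names chosen here for illustration.

```r
library(palmerpenguins)
library(dplyr)

# Median body mass (g) for each sex; na.rm = TRUE skips missing masses.
mass_by_sex <- penguins |>
  group_by(sex) |>
  summarise(median_mass_g = median(body_mass_g, na.rm = TRUE))

mass_by_sex
```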

Use facet_wrap() to modify the previous plot to compare the sexes separately for each species. Is the sex difference more or less clear now?

ggplot(data = penguins, mapping = aes(
    x = sex,
    y = body_mass_g
  )
) + 
  labs( 
    title = "Penguin Body Mass by Sex and Species" ) +
  geom_boxplot() + 
  facet_wrap(~ species)
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_boxplot()`).

Two Quantitative Variables

Is there a relationship between flipper_length_mm and body_mass_g?

Yes. The scatterplot shows a strong, positive, roughly linear relationship between the two: heavier penguins tend to have longer flippers.
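
The strength of the relationship can also be quantified. As a minimal sketch, the Pearson correlation can be computed with base R's cor(), restricted to rows where both measurements are present; flipper_mass_cor is a name introduced here for illustration.

```r
library(palmerpenguins)

# Pearson correlation between flipper length and body mass,
# using only rows where both measurements are present.
flipper_mass_cor <- cor(
  penguins$flipper_length_mm,
  penguins$body_mass_g,
  use = "complete.obs"
)

flipper_mass_cor
```

A value close to 1 would be consistent with a strong positive linear relationship.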

Make a scatterplot of the two variables:

ggplot(data = penguins, mapping = aes(
    x = flipper_length_mm,
    y = body_mass_g
  )
) + 
  labs( 
    title = "Penguin Body Mass vs. Flipper Length" ) +
  geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

Now add species to the plot by using different colored points:

ggplot(data = penguins, mapping = aes(
    x = flipper_length_mm,
    y = body_mass_g,
    color = species
  )
) + 
  labs( 
    title = "Penguin Body Mass vs. Flipper Length by Species" ) +
  geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

Now make the size of the points proportional to bill_depth_mm.

ggplot(data = penguins, mapping = aes(
    x = flipper_length_mm,
    y = body_mass_g,
    color = species,
    size = bill_depth_mm
  )
) + 
  labs( 
    title = "Penguin Body Mass vs. Flipper Length by Species and Bill Depth" ) +
  geom_point()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).

What additional insight do you get from this?

Surprisingly, the lighter species with shorter flippers (Adelie and Chinstrap) have deeper bills, while the Gentoo have shallower bills even though they are heavier and have longer flippers.
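
Point sizes can be hard to compare by eye, so a per-species summary is a useful check on this reading. The sketch below averages bill depth and body mass within each species; species_summary and its column names are chosen here for illustration.

```r
library(palmerpenguins)
library(dplyr)

# Mean bill depth (mm) and mean body mass (g) for each species,
# ignoring missing measurements.
species_summary <- penguins |>
  group_by(species) |>
  summarise(
    mean_bill_depth_mm = mean(bill_depth_mm, na.rm = TRUE),
    mean_body_mass_g = mean(body_mass_g, na.rm = TRUE)
  )

species_summary
```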