library(tidyverse)
library(palmerpenguins)
library(ggiraph)
# interactive plot package; thanks to Ellie Ryan for showing me this!
library(PNWColors)
# great color palette package by Jake Lawlor
# create color palette for use in plots
pal1 <- pnw_palette("Lake", n = 2)
# look at structure of data
## first few rows
head(penguins)
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>
## variables and their classes
str(penguins)
## tibble [344 × 8] (S3: tbl_df/tbl/data.frame)
##  $ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ bill_length_mm   : num [1:344] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
##  $ bill_depth_mm    : num [1:344] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
##  $ flipper_length_mm: int [1:344] 181 186 195 NA 193 190 181 195 193 190 ...
##  $ body_mass_g      : int [1:344] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
##  $ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
##  $ year             : int [1:344] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...

After looking through the numerical variables in the Palmer penguins dataset (bill length, bill depth, flipper length, and body mass), I decided to visualize the distribution of bill length. Its distributions were roughly normal but had more distinctive shapes than those of the other variables, which made them more fun to plot. Bill length was also the only one of the four measurements that separated Adelies from Chinstraps, so it felt a little more interesting.
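One quick way to check the claim about Adelies and Chinstraps is to compare species means for all four measurements. This is only a minimal sketch: `species_means` is a name introduced here, and a table of means does not show how much the distributions overlap, so treat it as a first look rather than a formal comparison.

```r
library(dplyr)
library(palmerpenguins)

# Mean of each numeric measurement by species, ignoring missing values
species_means <- penguins %>%
  group_by(species) %>%
  summarise(
    across(c(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
           ~ mean(.x, na.rm = TRUE)),
    .groups = "drop"
  )

species_means
```

If Adelies and Chinstraps differ mainly in bill length, their rows should look similar in every column except `bill_length_mm`.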

penguins_bill <- penguins %>% 
  # wrangle data to columns of interest
  select(species, bill_length_mm, island, sex, year) %>%
  # rename bill_length_mm for convenience
  rename(bill = bill_length_mm) %>%
  # remove NAs in bill length and sex
  drop_na(bill, sex) 

Adelie penguins occur on all three islands, while Chinstrap and Gentoo penguins are each found on a single island in the dataset. I did not see any major differences in bill length distribution across years, or among Adelies on different islands. The biggest difference appears to be between the sexes, with males typically having slightly longer bills.
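These observations can be checked with a couple of grouped summaries. The sketch below rebuilds the wrangled data so that it runs on its own; `island_counts` and `sex_summary` are names introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Rebuild the wrangled data from above so this block is self-contained
penguins_bill <- penguins %>%
  select(species, bill = bill_length_mm, island, sex, year) %>%
  drop_na(bill, sex)

# Which species were sampled on which islands?
island_counts <- penguins_bill %>%
  count(species, island)
island_counts

# Sample size, mean, and standard deviation of bill length by species and sex
sex_summary <- penguins_bill %>%
  group_by(species, sex) %>%
  summarise(n = n(),
            mean_bill = mean(bill),
            sd_bill = sd(bill),
            .groups = "drop")
sex_summary
```

Adding `island` or `year` to the `group_by()` call gives the corresponding breakdowns for the island and year comparisons.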

1 - QQ Plot

# save plot
penguins_qq <- penguins_bill %>%
  ggplot(aes(sample = bill, color = sex, shape = sex)) +
  # adjust color palettes to manual input - use scale_xxx_manual for discrete variables
  scale_color_manual(values = pal1) +
  scale_shape_manual(values = c(15, 16)) +
  # add QQ plot geom (points) and theoretical expectation (line)
  geom_qq(distribution = qnorm) + geom_qq_line(distribution = qnorm) +
  # separate plots by species
  facet_wrap(~species) +
  labs(x = "Theoretical Quantiles", y = "Bill Length (mm)",
       title = "QQ Plot of Penguin Bill Length") +
  theme(plot.title = element_text(hjust = 0.5))
# print plot
penguins_qq

Figure 1: Quantile-quantile plots of bill length (mm) for male and female Adelie, Chinstrap, and Gentoo penguins.

The distributions of bill length for the three penguin species, split by sex, are all roughly normal: most points fall along the line expected under a normal distribution. Most groups have a few outliers, primarily at the upper end of bill length. One group, female Chinstrap penguins, departs noticeably from normal in both tails.
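As a numeric complement to the visual check, a Shapiro-Wilk test can be run for each species-by-sex group. This test is not part of the original analysis; it is a minimal sketch using `shapiro.test()` from base R, with `shapiro_by_group` as a name introduced here. With groups of this size, the p-values are best read alongside the QQ plots rather than in place of them.

```r
library(dplyr)
library(tidyr)
library(palmerpenguins)

# Rebuild the wrangled data from above so this block is self-contained
penguins_bill <- penguins %>%
  select(species, bill = bill_length_mm, island, sex, year) %>%
  drop_na(bill, sex)

# Shapiro-Wilk normality test for each species-by-sex group
shapiro_by_group <- penguins_bill %>%
  group_by(species, sex) %>%
  summarise(n = n(),
            W = unname(shapiro.test(bill)$statistic),
            p_value = shapiro.test(bill)$p.value,
            .groups = "drop")

shapiro_by_group
```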

I also tested some other common probability distributions (log-normal and gamma; I could not get the beta distribution to work, nor find an easy way to use the alpha distribution), and the normal distribution was the best fit for all groups.
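For reference, here is one way a QQ plot against a log-normal distribution could be drawn for a single group. This is a sketch and not necessarily how the comparison above was run: it assumes the log-normal parameters are estimated from the log-transformed data and passed through the `dparams` argument of `geom_qq()` and `geom_qq_line()`. The names `chinstrap_f`, `ln_params`, and `chinstrap_qq_lnorm` are introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(palmerpenguins)

# One group only, since dparams applies a single set of parameters to the layer
chinstrap_f <- penguins %>%
  select(species, bill = bill_length_mm, sex) %>%
  drop_na(bill, sex) %>%
  filter(species == "Chinstrap", sex == "female")

# Estimate log-normal parameters from the log-transformed bill lengths
ln_params <- list(meanlog = mean(log(chinstrap_f$bill)),
                  sdlog = sd(log(chinstrap_f$bill)))

chinstrap_qq_lnorm <- ggplot(chinstrap_f, aes(sample = bill)) +
  # sample quantiles against theoretical log-normal quantiles
  geom_qq(distribution = qlnorm, dparams = ln_params) +
  geom_qq_line(distribution = qlnorm, dparams = ln_params) +
  labs(x = "Theoretical Quantiles (log-normal)", y = "Bill Length (mm)",
       title = "Log-normal QQ Plot: Female Chinstrap Bill Length") +
  theme(plot.title = element_text(hjust = 0.5))

chinstrap_qq_lnorm
```

Swapping in `qgamma` with a `shape` and `rate` in `dparams` would follow the same pattern, provided those parameters are estimated first.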

2 - Distribution Plots

penguins_dens <- penguins_bill %>%
  ggplot() +
  # geom_density to create density functions for bill length by each group
  geom_density(aes(x = bill, color = sex, fill = sex), alpha = 0.5) +
  scale_color_manual(values = rev(pal1)) +
  scale_fill_manual(values = rev(pal1)) +
  facet_wrap(~species, ncol = 1) +
  labs(x = "Bill Length (mm)", y = "Density",
       title = "Density Plot of Penguin Bill Length") +
  theme(plot.title = element_text(hjust = 0.5))
penguins_dens

Figure 2: Density plots of bill length (mm) for male and female Adelie, Chinstrap, and Gentoo penguins.

penguins_viol <- penguins_bill %>%
  ggplot() +
  # violin plots of distributions
  geom_violin(aes(x = sex, y = bill, color = sex, fill = sex), alpha = 0.25) +
  # overlay points for each measurement; the interactive geom allows hover-over effects
  # tooltip adds a pop-up with the value; data_id links points that share a value,
  # so hovering over one restyles all points with the same bill length
  geom_point_interactive(aes(x = sex, y = bill, color = sex, shape = sex,
                             tooltip = bill, data_id = bill),
                         position = position_jitter(width = 0.2)) +
  scale_color_manual(values = pal1) +
  # add manual point shapes (pch in base plot)
  scale_shape_manual(values = c(15, 16)) +
  scale_fill_manual(values = pal1) +
  facet_wrap(~species, nrow = 1) +
  labs(x = "Sex", y = "Bill Length (mm)",
       title = "Violin Plot of Penguin Bill Length") +
  # remove excess axis labels and ticks
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5))

# create interactive ggplot object with girafe() function from ggiraph package
penguins_viol_int <- girafe(ggobj = penguins_viol, 
       options = list(
         # change fill and stroke of points when hovered over
         opts_hover(css = "fill:lightblue;stroke:black")
         ))
# print interactive plot
penguins_viol_int

Figure 3: Violin plots of bill length (mm) distributions for male and female penguins by species. The violins show the density of measurements along the range for each group, and the points show individual measurements. Hovering over a point displays its bill length in millimeters and highlights other points with the same value.