Research question

Which factors are most associated with penguin body mass?
We focus on three candidate factors: species, sex, and flipper length.

Data & cases

# install.packages(c("palmerpenguins", "dplyr", "ggplot2", "tidyr")) # if needed
library(palmerpenguins)
library(dplyr)
library(ggplot2)
library(tidyr)

# Keep the four analysis variables and drop rows with any missing value
peng <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g) |>
  drop_na()
nrow(peng)
## [1] 333
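The Limitations section notes that missing values are handled by dropping rows, so it can help to report how much data that step removes. Below is a minimal sketch using the same four columns. The names `vars`, `missing_counts` and `n_dropped` are only illustrative.

```r
library(palmerpenguins)
library(dplyr)

vars <- c("species", "sex", "flipper_length_mm", "body_mass_g")

# Number of missing values in each analysis variable, before any filtering
missing_counts <- penguins |>
  select(all_of(vars)) |>
  summarise(across(everything(), ~ sum(is.na(.x))))
print(missing_counts)

# Rows removed by complete-case filtering on these columns (what drop_na() does here)
n_dropped <- nrow(penguins) - sum(complete.cases(penguins[, vars]))
print(n_dropped)
```

Subtracting `n_dropped` from `nrow(penguins)` should reproduce the 333 rows reported above.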

Method of data collection

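Field researchers took standard body measurements (morphometrics) of wild penguins on three islands of the Palmer Archipelago, Antarctica, in 2007-2009. The data were collected by Dr. Kristen Gorman with the Palmer Station Long Term Ecological Research (LTER) Program and were later curated for teaching in the palmerpenguins package.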

Study type

Observational study: we describe associations only. There was no random assignment, so the results do not support causal claims.

Data source

Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. https://allisonhorst.github.io/palmerpenguins/

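Variables

species: categorical (Adelie, Chinstrap, Gentoo)
sex: categorical (female, male)
flipper_length_mm: numeric, flipper length (mm)
body_mass_g: numeric, body mass (g); the response variable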

Summary statistics

peng |>
  summarise(
    n = n(),
    mean_mass = mean(body_mass_g),
    sd_mass = sd(body_mass_g),
    mean_flipper = mean(flipper_length_mm)
  )
## # A tibble: 1 × 4
##       n mean_mass sd_mass mean_flipper
##   <int>     <dbl>   <dbl>        <dbl>
## 1   333     4207.    805.         201.
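Species and sex are two of the three focal factors, so the same summary can also be split by group. The sketch below repeats the data preparation so that it runs on its own. The object name `group_summary` is only illustrative.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

peng <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g) |>
  drop_na()

# One row per species x sex combination
group_summary <- peng |>
  group_by(species, sex) |>
  summarise(
    n = n(),
    mean_mass = mean(body_mass_g),
    sd_mass = sd(body_mass_g),
    mean_flipper = mean(flipper_length_mm),
    .groups = "drop"  # return an ungrouped tibble
  )
print(group_summary)
```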

Visuals (for the presentation)

1) Boxplot: body mass by species & sex

ggplot(peng, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body mass (g)", title = "Body mass differs by species and sex") +
  theme_minimal()
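To put a number on the sex difference that the boxplot shows, one option is to tabulate the group medians (the centre line of each box) and take the male minus female gap within each species. This is a sketch. The name `sex_gap` is hypothetical, and using medians rather than means is a choice made here to match the boxplot.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

peng <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g) |>
  drop_na()

# Median body mass per species and sex, reshaped so each species is one row
# with a female column and a male column
sex_gap <- peng |>
  group_by(species, sex) |>
  summarise(median_mass = median(body_mass_g), .groups = "drop") |>
  pivot_wider(names_from = sex, values_from = median_mass) |>
  mutate(male_minus_female = male - female)
print(sex_gap)
```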

2) Scatter + line: body mass vs flipper length

ggplot(peng, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)", title = "Heavier penguins tend to have longer flippers") +
  theme_minimal()
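The fitted line pools all three species, and the species also differ in average size. Part of the overall trend could therefore reflect species differences rather than a within-species relationship. A quick check, sketched here, is to compare the overall Pearson correlation with the correlation inside each species. The names `overall_r` and `within_r` are illustrative.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

peng <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g) |>
  drop_na()

# Pearson correlation across all penguins
overall_r <- cor(peng$flipper_length_mm, peng$body_mass_g)
print(overall_r)

# Pearson correlation computed separately within each species
within_r <- peng |>
  group_by(species) |>
  summarise(n = n(), r = cor(flipper_length_mm, body_mass_g))
print(within_r)
```

To show the same idea in the plot, you can map `colour = species` inside `aes()`. `geom_smooth()` then fits one line per species instead of a single pooled line.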

Takeaways
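The research question asks which factors are *most* associated with body mass. One way to compare the three on a common footing is a linear model. The sketch below compares the R² of each single-predictor model with that of a model containing all three factors. It is an illustrative choice, not necessarily the analysis behind this report's conclusions. The names `predictors`, `r2_single` and `full_fit` are hypothetical.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

peng <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g) |>
  drop_na()

predictors <- c("flipper_length_mm", "species", "sex")

# Share of variance in body mass explained by each factor on its own
r2_single <- sapply(predictors, function(v) {
  fit <- lm(reformulate(v, response = "body_mass_g"), data = peng)
  summary(fit)$r.squared
})
print(round(r2_single, 3))

# All three factors together; each coefficient is adjusted for the other terms
full_fit <- lm(body_mass_g ~ flipper_length_mm + species + sex, data = peng)
print(summary(full_fit)$r.squared)
print(coef(full_fit))
```

Because the data are observational, these R² values and coefficients describe association, not causal effects.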

Limitations

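The data are observational, so the associations are not causal. Rows with a missing value in any of the four variables were dropped (11 of 344 penguins, all with sex unrecorded). Results apply only to the islands and seasons sampled.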

Reproducibility

Knit as HTML on Posit Cloud. Packages: palmerpenguins, dplyr, ggplot2, tidyr.

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 20.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.9.0
## 
## locale:
##  [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
##  [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
##  [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
## [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
## 
## time zone: UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] tidyr_1.3.1          ggplot2_4.0.0        dplyr_1.1.4         
## [4] palmerpenguins_0.1.1
## 
## loaded via a namespace (and not attached):
##  [1] Matrix_1.7-3       gtable_0.3.6       jsonlite_2.0.0     compiler_4.5.1    
##  [5] tidyselect_1.2.1   jquerylib_0.1.4    splines_4.5.1      scales_1.4.0      
##  [9] yaml_2.3.10        fastmap_1.2.0      lattice_0.22-7     R6_2.6.1          
## [13] labeling_0.4.3     generics_0.1.4     knitr_1.50         tibble_3.3.0      
## [17] bslib_0.9.0        pillar_1.11.1      RColorBrewer_1.1-3 rlang_1.1.6       
## [21] cachem_1.1.0       xfun_0.53          sass_0.4.10        S7_0.2.0          
## [25] cli_3.6.5          withr_3.0.2        magrittr_2.0.4     mgcv_1.9-3        
## [29] digest_0.6.37      grid_4.5.1         nlme_3.1-168       lifecycle_1.0.4   
## [33] vctrs_0.6.5        evaluate_1.0.5     glue_1.8.0         farver_2.1.2      
## [37] rmarkdown_2.30     purrr_1.1.0        tools_4.5.1        pkgconfig_2.0.3   
## [41] htmltools_0.5.8.1