Which factors are most associated with penguin body mass?
Focusing on: species, sex, and flipper length.
# install.packages("palmerpenguins") # if needed
library(palmerpenguins); library(dplyr); library(ggplot2); library(tidyr)
peng <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g) |>
  drop_na()
nrow(peng)
## [1] 333
Field researchers measured wild penguins (standard morphometrics) over multiple seasons. Data curated for teaching via the palmerpenguins package.
Observational study: describing associations only (no random assignment).
Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer Archipelago (Antarctica) penguin data. https://allisonhorst.github.io/palmerpenguins/
body_mass_g (numeric)
species (categorical), sex (categorical), flipper_length_mm (numeric)
peng |>
  summarise(
    n = n(),
    mean_mass = mean(body_mass_g),
    sd_mass = sd(body_mass_g),
    mean_flipper = mean(flipper_length_mm)
  )
## # A tibble: 1 × 4
##       n mean_mass sd_mass mean_flipper
##   <int>     <dbl>   <dbl>        <dbl>
## 1   333     4207.    805.         201.
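The overall summary pools all penguins, so it cannot show how mass varies across the factors of interest. A minimal sketch of the same statistics computed per species and sex is given here; the name `by_group` is illustrative and not part of the original analysis.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

# Rebuild the analysis data so this block runs on its own
peng <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g) |>
  drop_na()

# Same summary statistics as above, computed within each species-by-sex group
by_group <- peng |>
  group_by(species, sex) |>
  summarise(
    n = n(),
    mean_mass = mean(body_mass_g),
    sd_mass = sd(body_mass_g),
    mean_flipper = mean(flipper_length_mm),
    .groups = "drop"
  )

by_group
```

Each row of `by_group` corresponds to one box in a species-by-sex boxplot, so the table and the plot can be read side by side.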
1) Boxplot: body mass by species & sex
ggplot(peng, aes(x = species, y = body_mass_g, fill = sex)) +
  geom_boxplot() +
  labs(x = "Species", y = "Body mass (g)", title = "Body mass differs by species and sex") +
  theme_minimal()
2) Scatter + line: body mass vs flipper length
ggplot(peng, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)", title = "Heavier penguins tend to have longer flippers") +
  theme_minimal()
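The plots show the associations visually but do not put numbers on them. One possible way to quantify them, sketched here under the assumption that a straight-line fit is adequate, is a Pearson correlation plus the linear model that `geom_smooth(method = "lm")` draws. The additive model `fit_all` is an illustrative addition, not something the original report fits.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

# Rebuild the analysis data so this block runs on its own
peng <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g) |>
  drop_na()

# Pearson correlation between flipper length and body mass
r <- cor(peng$flipper_length_mm, peng$body_mass_g)

# Simple linear regression: the same line geom_smooth(method = "lm") draws
fit_flipper <- lm(body_mass_g ~ flipper_length_mm, data = peng)

# Additive model with all three factors (illustrative, not from the report)
fit_all <- lm(body_mass_g ~ flipper_length_mm + species + sex, data = peng)

r
coef(fit_flipper)                # intercept and slope in grams per mm
summary(fit_flipper)$r.squared   # share of variance explained by flipper length
summary(fit_all)$r.squared       # share explained by all three factors together
```

Comparing the two R-squared values gives a rough sense of how much species and sex add beyond flipper length. Because the data are observational, these remain descriptions of association rather than effects.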
Observational data, so associations cannot be read as causal; rows with any missing value were dropped (complete-case analysis); results are specific to the seasons and locations sampled.
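To see what dropping rows actually removes, the missing values can be counted before `drop_na()` is applied. This is a minimal sketch; the names `vars`, `na_counts`, and `n_dropped` are illustrative and not part of the original analysis.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)

# The four analysis columns, before any rows are removed
vars <- penguins |>
  select(species, sex, flipper_length_mm, body_mass_g)

# Number of missing values in each column
na_counts <- vars |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Number of rows lost to the complete-case filter
n_dropped <- nrow(vars) - nrow(drop_na(vars))

na_counts
n_dropped
```

If most of the dropped rows are missing only `sex`, then the body mass and flipper measurements in those rows would still be usable for analyses that do not involve sex.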
Knit as HTML on Posit Cloud. Packages: palmerpenguins, dplyr, ggplot2, tidyr.
sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 20.04.6 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3; LAPACK version 3.9.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] tidyr_1.3.1 ggplot2_4.0.0 dplyr_1.1.4
## [4] palmerpenguins_0.1.1
##
## loaded via a namespace (and not attached):
## [1] Matrix_1.7-3 gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.1
## [5] tidyselect_1.2.1 jquerylib_0.1.4 splines_4.5.1 scales_1.4.0
## [9] yaml_2.3.10 fastmap_1.2.0 lattice_0.22-7 R6_2.6.1
## [13] labeling_0.4.3 generics_0.1.4 knitr_1.50 tibble_3.3.0
## [17] bslib_0.9.0 pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.1.6
## [21] cachem_1.1.0 xfun_0.53 sass_0.4.10 S7_0.2.0
## [25] cli_3.6.5 withr_3.0.2 magrittr_2.0.4 mgcv_1.9-3
## [29] digest_0.6.37 grid_4.5.1 nlme_3.1-168 lifecycle_1.0.4
## [33] vctrs_0.6.5 evaluate_1.0.5 glue_1.8.0 farver_2.1.2
## [37] rmarkdown_2.30 purrr_1.1.0 tools_4.5.1 pkgconfig_2.0.3
## [41] htmltools_0.5.8.1