Overview of Data
The Palmer Penguins data were collected by a group of scientists near
the Palmer Station in Antarctica. This biological research tracked
three species of penguins (Adélie, Chinstrap, and Gentoo) found in the
area from 2007 to 2009. The information tracked includes bill length,
bill depth, flipper length, body mass, and sex. The dataset also
matters for teaching: it has been promoted as a modern alternative to
the classic Iris dataset. Iris is overused, sterile, and about flowers
that never get cold. Palmer Penguins gives learners something richer:
multivariate data with missing values, real measurement noise,
categorical variables, and biological meaning. In this analysis, I
address how reliably body mass and flipper length can be used to
predict species.
Question: Can body mass and flipper length reliably predict a
penguin’s species?
The study measured several variables for the Adélie, Chinstrap, and
Gentoo penguin species on three islands in the Palmer Archipelago
(Biscoe, Dream, and Torgersen). These variables included bill length,
bill depth, flipper length, body mass, and sex. The study was conducted
between 2007 and 2009.
Next, I report the sample size for each species, along with the mean
and standard deviation of body mass and flipper length (the two
variables of interest).
Basic statistical descriptors
Sample size of penguins per species:
* Adélie = 152
* Chinstrap = 68
* Gentoo = 124
Mean body mass (g) by species:
* Adélie = 3700.66
* Chinstrap = 3733.09
* Gentoo = 5076.02
Mean flipper length (mm) by species:
* Adélie = 189.95
* Chinstrap = 195.82
* Gentoo = 217.19
Standard deviation of body mass (g) by species:
* Adélie = 458.57
* Chinstrap = 384.34
* Gentoo = 504.12
Standard deviation of flipper length (mm) by species:
* Adélie = 6.54
* Chinstrap = 7.13
* Gentoo = 6.48
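The descriptive statistics above can be reproduced with a few lines of base R. This is a minimal sketch, assuming the `penguins` data frame comes from the palmerpenguins package (the report does not say how it was loaded); the names `n_per_species` and `desc_stats` are illustrative.

```r
library(palmerpenguins)  # assumption: source of the `penguins` data frame

# Sample size per species; this counts every row, including rows with NAs
n_per_species <- table(penguins$species)
print(n_per_species)

# Mean and standard deviation of both variables by species.
# The formula interface drops rows with missing values before summarising.
desc_stats <- aggregate(
  cbind(body_mass_g, flipper_length_mm) ~ species,
  data = penguins,
  FUN = function(x) round(c(mean = mean(x), sd = sd(x)), 2)
)
print(desc_stats)
```

Note that the sample sizes count every bird, while the means and standard deviations use only birds with recorded measurements, so the two sets of figures rest on slightly different row counts.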
In the following two visualizations, body mass (g) and flipper
length (mm) are plotted by species. Visual 1 uses LOESS smoothing to
show the trend within each species. Visual 2 splits the same data into
separate scatterplots, one per species.
Visual 1

Visual 2
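The two visuals described above could be drawn along these lines. This is only a sketch: it assumes ggplot2 was the plotting library and palmerpenguins the data source, and the choice of flipper length on the x-axis is a guess, since the original plotting code is not shown.

```r
library(ggplot2)
library(palmerpenguins)  # assumption: source of `penguins`

# Keep only complete rows so ggplot2 does not warn about missing values
plot_data <- na.omit(penguins[, c("species", "flipper_length_mm", "body_mass_g")])

# Visual 1: one panel with a LOESS curve per species
visual_1 <- ggplot(plot_data,
                   aes(x = flipper_length_mm, y = body_mass_g, colour = species)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)", colour = "Species")

# Visual 2: the same points split into one scatterplot per species
visual_2 <- ggplot(plot_data, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ species) +
  labs(x = "Flipper length (mm)", y = "Body mass (g)")

print(visual_1)
print(visual_2)
```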

Statistical Analysis
For my statistical analyses, I use the correlation coefficient,
multinomial logistic regression, one-way ANOVA tests, and post-hoc
Tukey tests.
Correlation coefficient: a number that measures the strength and
direction of the linear relationship between two variables. It is
usually represented by r and ranges from -1 to +1.
cor(penguins$body_mass_g, penguins$flipper_length_mm, use = "complete.obs")
## [1] 0.8712018
This shows a strong positive linear relationship (r ≈ 0.87) between
body mass and flipper length across all three species combined.
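To make the definition of r concrete, here is a small sketch that computes Pearson's correlation by hand and compares it with `cor()`. The five measurements are made up for illustration and are not taken from the dataset.

```r
# Hypothetical measurements for five penguins (not from the dataset)
flipper_mm <- c(180, 190, 200, 210, 220)
mass_g     <- c(3500, 3700, 4200, 4600, 5300)

# Pearson's r: the sum of cross-products of deviations from the means,
# divided by the square root of the product of the sums of squares
dx <- flipper_mm - mean(flipper_mm)
dy <- mass_g - mean(mass_g)
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# The built-in function gives the same value
r_builtin <- cor(flipper_mm, mass_g)
print(c(manual = r_manual, builtin = r_builtin))
```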
Multinomial logistic regression: a type of regression used when the
dependent variable (outcome) is categorical with more than two
categories, and those categories have no inherent order.
library(nnet)  # provides multinom()
multi_model <- multinom(species ~ flipper_length_mm + body_mass_g, data = penguins)
## # weights: 12 (6 variable)
## initial value 375.725403
## iter 10 value 186.747902
## iter 20 value 139.450263
## iter 30 value 136.034343
## iter 40 value 134.983908
## iter 50 value 134.602842
## final value 134.365818
## converged
summary(multi_model)
## Call:
## multinom(formula = species ~ flipper_length_mm + body_mass_g,
## data = penguins)
##
## Coefficients:
##           (Intercept) flipper_length_mm  body_mass_g
## Chinstrap   -31.26682         0.1829957 -0.001303130
## Gentoo     -148.36586         0.6534391  0.003295482
##
## Std. Errors:
##            (Intercept) flipper_length_mm  body_mass_g
## Chinstrap 6.768461e-05       0.008237114 0.0004277966
## Gentoo    1.375134e-04       0.031403517 0.0015044245
##
## Residual Deviance: 268.7316
## AIC: 280.7316
The output estimates how flipper length and body mass jointly
predict penguin species.
Reference species: Adélie
Chinstrap vs Adélie
* Intercept −31.27: Not biologically meaningful
* Flipper length +0.183: Longer flippers = higher odds of Chinstrap
vs Adélie
* Body mass −0.0013: Heavier body mass = slightly lower odds of
Chinstrap vs Adélie
Gentoo vs Adélie
* Intercept −148.37: Large baseline shift
* Flipper length +0.65: Longer flippers = higher odds of Gentoo vs
Adélie
* Body mass +0.0033: Heavier body mass = higher odds of Gentoo vs
Adélie
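The coefficients are on the log-odds scale, so exponentiating them gives odds ratios, and combining them through the softmax formula gives predicted probabilities. The sketch below uses only the coefficients printed in the model summary; the 200 mm, 4000 g penguin is a hypothetical example.

```r
# Coefficients copied from the summary(multi_model) output;
# Adélie is the reference category, so its coefficients are all zero
coefs <- rbind(
  Adelie    = c(intercept = 0,          flipper = 0,         mass = 0),
  Chinstrap = c(intercept = -31.26682,  flipper = 0.1829957, mass = -0.001303130),
  Gentoo    = c(intercept = -148.36586, flipper = 0.6534391, mass = 0.003295482)
)

# Odds ratios: multiplicative change in the odds (vs Adélie) for
# +1 mm of flipper length and for +100 g of body mass
odds_ratio_flipper_1mm <- exp(coefs[, "flipper"])
odds_ratio_mass_100g   <- exp(100 * coefs[, "mass"])
print(round(cbind(odds_ratio_flipper_1mm, odds_ratio_mass_100g), 3))

# Predicted probabilities for one hypothetical penguin (200 mm, 4000 g):
# linear predictor per species, then the softmax transformation
x <- c(1, 200, 4000)
eta <- as.vector(coefs %*% x)
probs <- exp(eta) / sum(exp(eta))
names(probs) <- rownames(coefs)
print(round(probs, 3))
```

Each extra millimetre of flipper length multiplies the odds of Chinstrap over Adélie by roughly 1.20 and the odds of Gentoo over Adélie by roughly 1.92, holding body mass fixed.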
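Each slope estimate is at least twice its standard error, so all four
slopes differ clearly from zero. The intercept standard errors are
implausibly small, which may reflect numerical trouble with unscaled
predictors, so they should not be over-interpreted. Standard errors
alone also do not show how accurate the model's predictions are.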
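One way to judge the slopes against their standard errors is a Wald z statistic (estimate divided by standard error). The sketch below uses only the numbers printed in the model summary; the vector names are illustrative.

```r
# Slope estimates and standard errors copied from the model summary
est <- c(chinstrap_flipper = 0.1829957, chinstrap_mass = -0.001303130,
         gentoo_flipper    = 0.6534391, gentoo_mass    = 0.003295482)
se  <- c(chinstrap_flipper = 0.008237114, chinstrap_mass = 0.0004277966,
         gentoo_flipper    = 0.031403517, gentoo_mass    = 0.0015044245)

# Wald z statistic and two-sided p-value from the standard normal
z <- est / se
p <- 2 * pnorm(-abs(z))
print(round(cbind(estimate = est, std_error = se, z = z, p_value = p), 4))
```

All four slopes have an absolute z above 1.96, although the body mass slopes are far closer to that threshold than the flipper length slopes.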
Flipper length and body mass are strong indicators of species, with
Gentoos standing out and some overlap between Adélie and Chinstrap.
The directions of the coefficients are biologically sensible.
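How reliably the model predicts species can be checked directly with a confusion matrix. This is a sketch rather than part of the original analysis: it assumes the palmerpenguins data source, refits the same model on complete rows, and reports in-sample accuracy, which tends to be optimistic compared with accuracy on held-out birds.

```r
library(nnet)
library(palmerpenguins)  # assumption: source of `penguins`

# Keep complete rows so predictions line up with the observed species
dat <- na.omit(penguins[, c("species", "flipper_length_mm", "body_mass_g")])

# Same model as in the report; trace = FALSE hides the iteration log
fit <- multinom(species ~ flipper_length_mm + body_mass_g,
                data = dat, trace = FALSE)

# Most likely species for each penguin, compared with the true species
pred <- predict(fit, newdata = dat, type = "class")
conf_mat <- table(observed = dat$species, predicted = pred)
print(conf_mat)

# Share of penguins classified correctly
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

The off-diagonal cells show which species get confused with each other, which is more informative for the research question than the single accuracy figure.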
One-Way ANOVA and Tukey Tests
One-way ANOVA: a statistical test used to determine whether there are
significant differences between the means of three or more groups
defined by one independent variable. It is essentially an extension of
the t-test to more than two groups.
Tukey tests: post-hoc tests used after a one-way ANOVA to identify
which specific groups differ. ANOVA only tells you that at least one
group differs, not which ones.
anova_model <- aov(body_mass_g ~ species, data = penguins)
summary(anova_model)
##              Df    Sum Sq  Mean Sq F value Pr(>F)
## species       2 146864214 73432107   343.6 <2e-16 ***
## Residuals   339  72443483   213698
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
TukeyHSD(anova_model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = body_mass_g ~ species, data = penguins)
##
## $species
##                        diff       lwr       upr     p adj
## Chinstrap-Adelie   32.42598 -126.5002  191.3522 0.8806666
## Gentoo-Adelie    1375.35401 1243.1786 1507.5294 0.0000000
## Gentoo-Chinstrap 1342.92802 1178.4810 1507.3750 0.0000000

Two penguins had missing body mass values and were excluded
automatically.
Chinstrap vs Adélie: no significant difference in body mass (p adj = 0.88)
Gentoo vs Adélie: significant difference in body mass (about 1375 g)
Gentoo vs Chinstrap: significant difference in body mass (about 1343 g)
A one-way ANOVA showed a significant effect of species on body mass
(F(2, 339) = 343.6, p < 0.001). Tukey’s post-hoc tests revealed that
Gentoo penguins were significantly heavier than both Adélie and
Chinstrap penguins, while no significant difference was observed between
Adélie and Chinstrap.
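The F statistic in the ANOVA table can be rebuilt from its sums of squares, which also yields an effect size. The sketch below uses only the figures printed in the body mass ANOVA output; eta squared is an addition that the original report does not compute.

```r
# Sums of squares and degrees of freedom copied from the ANOVA table
ss_species  <- 146864214
ss_residual <- 72443483
df_species  <- 2
df_residual <- 339

# F is the ratio of the between-group to the within-group mean square
f_value <- (ss_species / df_species) / (ss_residual / df_residual)

# The upper tail of the F distribution gives the p-value
p_value <- pf(f_value, df_species, df_residual, lower.tail = FALSE)

# Eta squared: share of the total variation in body mass explained by species
eta_sq <- ss_species / (ss_species + ss_residual)

print(c(F = f_value, p = p_value, eta_squared = eta_sq))
```

By this measure, species accounts for roughly two thirds of the variation in body mass.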
anova_model <- aov(flipper_length_mm ~ species, data = penguins)
summary(anova_model)
##              Df Sum Sq Mean Sq F value Pr(>F)
## species       2  52473   26237   594.8 <2e-16 ***
## Residuals   339  14953      44
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
TukeyHSD(anova_model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = flipper_length_mm ~ species, data = penguins)
##
## $species
##                       diff       lwr       upr p adj
## Chinstrap-Adelie  5.869887  3.586583  8.153191     0
## Gentoo-Adelie    27.233349 25.334376 29.132323     0
## Gentoo-Chinstrap 21.363462 19.000841 23.726084     0

Two penguins had missing flipper length values and were excluded
automatically.
Chinstrap vs Adélie: significant difference in flipper length (about 5.9 mm)
Gentoo vs Adélie: significant difference in flipper length (about 27.2 mm)
Gentoo vs Chinstrap: significant difference in flipper length (about 21.4 mm)
A one-way ANOVA showed a significant effect of species on flipper
length (F(2, 339) = 594.8, p < 0.001). Tukey’s post-hoc tests revealed
that Gentoo penguins had significantly longer flippers than both Adélie
and Chinstrap penguins. Chinstrap penguins also had significantly
longer flippers than Adélie penguins, although that difference is much
smaller.
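A quick way to read both Tukey tables is to check whether each 95% interval excludes zero. The sketch below re-enters the intervals printed in the two TukeyHSD() outputs; the data frame `tukey` is illustrative and not part of the original analysis.

```r
# Tukey intervals copied from the two TukeyHSD() outputs
tukey <- data.frame(
  variable = rep(c("body_mass_g", "flipper_length_mm"), each = 3),
  pair     = rep(c("Chinstrap-Adelie", "Gentoo-Adelie", "Gentoo-Chinstrap"), 2),
  diff     = c(32.42598, 1375.35401, 1342.92802, 5.869887, 27.233349, 21.363462),
  lwr      = c(-126.5002, 1243.1786, 1178.4810, 3.586583, 25.334376, 19.000841),
  upr      = c(191.3522, 1507.5294, 1507.3750, 8.153191, 29.132323, 23.726084)
)

# A pair differs at the 5% family-wise level when its interval excludes zero
tukey$significant <- tukey$lwr > 0 | tukey$upr < 0
print(tukey)
```

Only the Chinstrap-Adélie comparison for body mass has an interval that spans zero; every flipper length comparison, including Chinstrap-Adélie, excludes it.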
Findings Summary
There is a strong positive relationship between penguin body mass
and flipper length: heavier penguins tend to have longer flippers.
When comparing species:
Chinstrap vs Adélie: Longer flippers increase the odds of being a
Chinstrap, and at a given flipper length heavier body mass slightly
lowers them. The two species have similar average body mass, and
Chinstrap flippers are only about 6 mm longer on average, so the two
species overlap considerably.
Gentoo vs Adélie: Both heavier body mass and longer flippers make it
much more likely to be a Gentoo. Gentoos are clearly bigger than Adélie
penguins.
Gentoo vs Chinstrap: Gentoos are heavier and have longer flippers
than Chinstraps.
The ANOVA tests support this:
Species has a significant effect on body mass and flipper
length.
Tukey post-hoc tests show:
Gentoos are significantly heavier and have longer flippers than both
Adélie and Chinstrap penguins.
Adélie and Chinstrap do not differ significantly in body mass, but
Chinstrap flippers are significantly longer (by about 6 mm).
Two penguins were excluded due to missing measurements. Overall, the
results are biologically sensible and show that body mass and flipper
length differ strongly by species, with Gentoos standing out. These
two variables therefore appear to separate Gentoos well, but they
distinguish Adélie from Chinstrap only weakly.
Discussion and Next Steps
This study shows that penguin species differ in body size and
flipper length, with Gentoos standing out as significantly heavier and
having longer flippers than both Adélie and Chinstrap penguins. Adélie
and Chinstrap are similar in body mass and differ only modestly in
flipper length. There is a strong positive relationship between body
mass and flipper length, meaning larger penguins tend to have longer
flippers. These differences are plausibly linked to adaptations for
swimming, feeding, and survival in different ecological niches,
although this analysis does not test that directly. The findings may be
useful for species identification in the field, monitoring population
health, and informing conservation strategies. By establishing clear
trait differences, this study also provides a starting point for future
research on penguin ecology, evolution, and environmental
adaptations.
Further analyses could bring in variables beyond body mass and
flipper length, such as bill length and bill depth, to see whether they
separate the species more clearly. Body mass and flipper length could
also be used in additional comparisons to further our understanding of
how they relate to the three species.
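As one possible starting point for such a follow-up, the sketch below adds bill length to the model and compares AIC values. It is an illustration rather than a result: it assumes the palmerpenguins data source, and the choice of `bill_length_mm` as the extra predictor is arbitrary.

```r
library(nnet)
library(palmerpenguins)  # assumption: source of `penguins`

# Use the same complete rows for both models so their AICs are comparable
vars <- c("species", "flipper_length_mm", "body_mass_g", "bill_length_mm")
dat <- na.omit(penguins[, vars])

fit_two   <- multinom(species ~ flipper_length_mm + body_mass_g,
                      data = dat, trace = FALSE)
fit_three <- multinom(species ~ flipper_length_mm + body_mass_g + bill_length_mm,
                      data = dat, trace = FALSE)

# Lower AIC indicates a better trade-off between fit and complexity
print(c(two_predictors = fit_two$AIC, three_predictors = fit_three$AIC))
```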
Further studies could explore the biology behind why Gentoos show
significantly higher body mass and flipper length. Such studies could
examine the evolutionary conditions that led the species to diverge in
these traits. Diet or swimming capabilities could also be explored to
better understand the differences among the three species.
References
Gorman, K. B., Williams, T. D., & Fraser, W. R. (2014). Ecological
sexual dimorphism and environmental variability within a community of
Antarctic penguins (genus Pygoscelis). PLoS ONE, 9(3), e90081.
https://doi.org/10.1371/journal.pone.0090081