Overview of Data

The Palmer Penguins data were collected by Dr. Kristen Gorman and the Palmer Station Long Term Ecological Research (LTER) Program near Palmer Station, Antarctica (Gorman, Williams, & Fraser, 2014). The research tracked measurements of three penguin species found in the area (Adélie, Chinstrap, and Gentoo) from 2007 to 2009. The variables recorded include bill length, bill depth, flipper length, body mass, and sex. Much of the dataset's importance comes from why it was published. Packaged as the palmerpenguins R package, it was created as a modern, ethical, and pedagogically superior alternative to the infamous Iris dataset. Iris is overused, sterile, and about flowers that never get cold. Palmer Penguins gives learners something richer: multivariate data with missing values, real measurement noise, categorical variables, and biological meaning. In this analysis, I address how reliably body mass and flipper length can be used to predict species.

Question: Can body mass and flipper length reliably predict a penguin’s species?

As described above, the study measured bill length, bill depth, flipper length, body mass, and sex for Adélie, Chinstrap, and Gentoo penguins. The penguins were sampled on three islands in the Palmer Archipelago (Biscoe, Dream, and Torgersen) between 2007 and 2009.

Next, I will compute the sample size for each species. I will also compute the mean and standard deviation of body mass and flipper length, our two variables of interest.

Basic statistical descriptors

Sample size of penguins per species (these counts include the two penguins whose body mass and flipper length are missing):

* Adélie = 152

* Chinstrap = 68

* Gentoo = 124

Mean body mass by species (g):

* Adélie = 3700.66

* Chinstrap = 3733.09

* Gentoo = 5076.02

Mean flipper length by species (mm):

* Adélie = 189.95

* Chinstrap = 195.82

* Gentoo = 217.19

Standard deviation of body mass by species (g):

* Adélie = 458.57

* Chinstrap = 384.34

* Gentoo = 504.12

Standard deviation of flipper length by species (mm):

* Adélie = 6.54

* Chinstrap = 7.13

* Gentoo = 6.48
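The descriptives above could be reproduced with a short helper like the one below. This is a minimal sketch, not necessarily how the author computed them. `describe_by_group` is a hypothetical name. The sketch assumes the `penguins` data frame comes from the palmerpenguins package; the original does not show where it is loaded. Sample sizes count every row in a group, while means and SDs drop missing values.

```r
# Hypothetical helper: per-group n, mean, and SD for one numeric column.
# n counts every row in the group (including rows with a missing value);
# the mean and SD are computed after dropping missing values.
describe_by_group <- function(df, value, group) {
  x <- df[[value]]
  g <- factor(df[[group]])  # one consistent set of levels for all summaries
  data.frame(
    group = levels(g),
    n     = as.vector(table(g)),
    mean  = as.vector(tapply(x, g, mean, na.rm = TRUE)),
    sd    = as.vector(tapply(x, g, sd, na.rm = TRUE)),
    row.names = NULL
  )
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  penguins <- palmerpenguins::penguins
  print(describe_by_group(penguins, "body_mass_g", "species"))
  print(describe_by_group(penguins, "flipper_length_mm", "species"))
}
```

Note that the species labels in the package are spelled without the accent (`Adelie`).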

Statistical Analysis

For my statistical analyses, I will use the Pearson correlation coefficient, multinomial logistic regression, one-way ANOVA, and post-hoc Tukey tests.

Correlation coefficient: a number that measures the strength and direction of the linear relationship between two numeric variables. It is usually written as r and ranges from -1 to +1. By default, R's cor() computes the Pearson coefficient.

library(palmerpenguins)  # provides the penguins data frame
cor(penguins$body_mass_g, penguins$flipper_length_mm, use = "complete.obs")
## [1] 0.8712018

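r = 0.87 indicates a strong positive linear relationship between body mass and flipper length. Because the two predictors are so strongly correlated, each coefficient in the multinomial model should be read as the effect of one variable while holding the other constant.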
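To make the definition of r concrete, here is a minimal sketch that computes Pearson's r directly from its formula. The formula is the sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations. `pearson_r` is a hypothetical helper name. Dropping incomplete pairs first mirrors `use = "complete.obs"`.

```r
# Pearson's r from its definition.
# Pairs with a missing value in either variable are dropped first,
# mirroring cor(x, y, use = "complete.obs").
pearson_r <- function(x, y) {
  keep <- complete.cases(x, y)
  x <- x[keep]
  y <- y[keep]
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  p <- palmerpenguins::penguins
  print(pearson_r(p$body_mass_g, p$flipper_length_mm))  # should agree with cor()
}
```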

Multinomial logistic regression: a type of regression used when the dependent variable (outcome) is categorical with more than two categories that have no inherent order. It models the log-odds of each category relative to a chosen reference category.

library(nnet)  # provides multinom()
multi_model <- multinom(species ~ flipper_length_mm + body_mass_g, data = penguins)
## # weights:  12 (6 variable)
## initial  value 375.725403 
## iter  10 value 186.747902
## iter  20 value 139.450263
## iter  30 value 136.034343
## iter  40 value 134.983908
## iter  50 value 134.602842
## final  value 134.365818 
## converged
summary(multi_model)
## Call:
## multinom(formula = species ~ flipper_length_mm + body_mass_g, 
##     data = penguins)
## 
## Coefficients:
##           (Intercept) flipper_length_mm  body_mass_g
## Chinstrap   -31.26682         0.1829957 -0.001303130
## Gentoo     -148.36586         0.6534391  0.003295482
## 
## Std. Errors:
##            (Intercept) flipper_length_mm  body_mass_g
## Chinstrap 6.768461e-05       0.008237114 0.0004277966
## Gentoo    1.375134e-04       0.031403517 0.0015044245
## 
## Residual Deviance: 268.7316 
## AIC: 280.7316

The output estimates how flipper length and body mass jointly predict penguin species.

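Reference species: Adélie. Each coefficient is the change in the log-odds of that species, relative to Adélie, for a one-unit increase in the predictor (1 mm of flipper length or 1 g of body mass), holding the other predictor constant.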

Chinstrap vs Adélie

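* Intercept −31.27: the log-odds at zero flipper length and zero body mass; not biologically meaningful on its own

* Flipper length +0.183: each additional millimeter multiplies the odds of Chinstrap vs Adélie by about exp(0.183) ≈ 1.20

* Body mass −0.0013: each additional gram slightly lowers the odds of Chinstrap vs Adélie (about exp(−0.13) ≈ 0.88 per 100 g)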

Gentoo vs Adélie

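* Intercept −148.37: the log-odds at zero flipper length and zero body mass; a large baseline shift with no direct biological interpretation

* Flipper length +0.65: each additional millimeter multiplies the odds of Gentoo vs Adélie by about exp(0.653) ≈ 1.92

* Body mass +0.0033: each additional gram raises the odds of Gentoo vs Adélie (about exp(0.33) ≈ 1.39 per 100 g)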
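The odds ratios quoted above are the exponentiated coefficients. Below is a minimal sketch of that conversion, including rescaling a predictor's unit (for example, 100 g instead of 1 g). `odds_ratios` is a hypothetical helper. The sketch assumes a fitted `nnet::multinom` model and the palmerpenguins data.

```r
library(nnet)  # multinom()

# Hypothetical helper: turn multinom log-odds coefficients into odds ratios.
# `per` optionally rescales predictors, e.g. c(body_mass_g = 100) gives the
# odds ratio for a 100 g increase instead of a 1 g increase.
odds_ratios <- function(model, per = NULL) {
  b <- coef(model)
  # With only two outcome classes coef() returns a vector; make it a 1-row matrix
  if (is.null(dim(b))) b <- t(as.matrix(b))
  for (nm in names(per)) b[, nm] <- b[, nm] * per[[nm]]
  exp(b)
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  fit <- multinom(species ~ flipper_length_mm + body_mass_g,
                  data = palmerpenguins::penguins, trace = FALSE)
  # Drop the intercept column, which has no direct interpretation
  print(odds_ratios(fit, per = c(body_mass_g = 100))[, -1])
}
```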

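The standard errors are small relative to the coefficients, which at first glance suggests precise estimates. However, the intercept standard errors (on the order of 1e-04) are implausibly small. This can signal a numerical problem in the Hessian that multinom() uses to compute them, for example predictors on very different scales (grams vs millimeters), rather than genuine precision. These standard errors should therefore be interpreted with caution.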
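One way to check the standard errors is sketched below. It computes Wald z-values (coefficient / SE) with two-sided p-values, then refits the same model on standardized predictors, which typically gives a better-conditioned Hessian. `wald_table` is a hypothetical helper. It relies on `summary()` of an `nnet::multinom` fit exposing `coefficients` and `standard.errors`. The palmerpenguins data are assumed.

```r
library(nnet)  # multinom()

# Hypothetical helper: Wald z-statistics and two-sided p-values for a multinom fit.
# These inherit any numerical problems in the standard errors from summary().
wald_table <- function(model) {
  s <- summary(model)
  z <- s$coefficients / s$standard.errors
  list(z = z, p = 2 * pnorm(-abs(z)))
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  p <- na.omit(palmerpenguins::penguins[, c("species", "flipper_length_mm", "body_mass_g")])
  # Standardize predictors: same fitted probabilities, coefficients per 1 SD
  p$flipper_z <- as.numeric(scale(p$flipper_length_mm))
  p$mass_z <- as.numeric(scale(p$body_mass_g))
  fit_z <- multinom(species ~ flipper_z + mass_z, data = p, trace = FALSE)
  print(wald_table(fit_z))
}
```

If the standardized fit reports noticeably larger standard errors, that would support the concern that the original ones are numerical artifacts.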

Taken together, flipper length and body mass appear to separate Gentoo penguins clearly from the other two species, while Adélie and Chinstrap overlap considerably. The coefficient signs are biologically sensible: longer flippers and heavier bodies point toward Gentoo.
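The research question asks how reliably these two measurements predict species, and coefficients alone do not answer that directly. A minimal sketch of one direct check follows: cross-tabulate observed against predicted species and report overall accuracy. `confusion_and_accuracy` is a hypothetical helper. The sketch assumes the palmerpenguins data and `predict(..., type = "class")` from nnet. Accuracy measured on the same data used to fit the model is optimistic; a held-out split or cross-validation would be more honest.

```r
library(nnet)  # multinom(), predict() method for multinom fits

# Hypothetical helper: confusion table and overall accuracy.
# Both vectors are coerced to factors with a shared set of levels so the
# table is square and its diagonal holds the correct predictions.
confusion_and_accuracy <- function(observed, predicted) {
  lv <- union(levels(factor(observed)), levels(factor(predicted)))
  tab <- table(observed = factor(observed, levels = lv),
               predicted = factor(predicted, levels = lv))
  list(table = tab, accuracy = sum(diag(tab)) / sum(tab))
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  p <- na.omit(palmerpenguins::penguins[, c("species", "flipper_length_mm", "body_mass_g")])
  fit <- multinom(species ~ flipper_length_mm + body_mass_g, data = p, trace = FALSE)
  res <- confusion_and_accuracy(p$species, predict(fit, newdata = p, type = "class"))
  print(res$table)     # off-diagonal counts show which species get confused
  print(res$accuracy)  # in-sample accuracy
}
```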

One-Way ANOVA and Tukey Tests

One-way ANOVA: a statistical test used to determine whether the means of three or more groups, defined by a single categorical independent variable, differ significantly. It is essentially an extension of the two-sample t-test to more than two groups.
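The F statistic in a one-way ANOVA is the between-group mean square divided by the within-group mean square: F = (SS_between / (k − 1)) / (SS_within / (N − k)). Using the body mass output below, (146864214 / 2) / (72443483 / 339) ≈ 343.6. A minimal sketch of the calculation from raw data follows. `one_way_F` is a hypothetical helper, and the palmerpenguins data are assumed.

```r
# Hypothetical helper: one-way ANOVA F statistic computed from its definition.
# Observations with a missing value or missing group are dropped, as aov() does.
one_way_F <- function(y, group) {
  keep <- !is.na(y) & !is.na(group)
  y <- y[keep]
  group <- factor(group[keep])
  k <- nlevels(group)
  n <- length(y)
  group_means <- as.vector(tapply(y, group, mean))
  group_sizes <- as.vector(tapply(y, group, length))
  # Between-group SS: group sizes times squared deviations of group means
  ss_between <- sum(group_sizes * (group_means - mean(y))^2)
  # Within-group SS: squared deviations of each value from its own group mean
  ss_within <- sum((y - group_means[as.integer(group)])^2)
  df1 <- k - 1
  df2 <- n - k
  f_stat <- (ss_between / df1) / (ss_within / df2)
  c(F = f_stat, df1 = df1, df2 = df2,
    p = pf(f_stat, df1, df2, lower.tail = FALSE))
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  p <- palmerpenguins::penguins
  print(one_way_F(p$body_mass_g, p$species))
}
```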

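Tukey's HSD test: a post-hoc test run after a significant one-way ANOVA to identify which specific pairs of groups differ. ANOVA only tells you that at least one group mean differs, not which ones. Tukey's p-values and confidence intervals are adjusted to control the family-wise error rate across all pairwise comparisons.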
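For unequal group sizes, as here, TukeyHSD() applies the Tukey-Kramer form. The difference in two group means is divided by sqrt(MSE / 2 × (1/n_i + 1/n_j)), and that ratio is referred to the studentized range distribution. Below is a minimal sketch for a single pair. `tukey_pair` is a hypothetical helper built on base R's `ptukey()` and `qtukey()`.

```r
# Hypothetical helper: Tukey-Kramer comparison of groups b and a (b minus a),
# using the pooled within-group mean square (MSE) from the one-way ANOVA.
tukey_pair <- function(y, group, a, b, conf = 0.95) {
  keep <- !is.na(y) & !is.na(group)
  y <- y[keep]
  group <- factor(group[keep])
  k <- nlevels(group)
  df_resid <- length(y) - k
  means <- setNames(as.vector(tapply(y, group, mean)), levels(group))
  sizes <- setNames(as.vector(tapply(y, group, length)), levels(group))
  mse <- sum((y - means[as.integer(group)])^2) / df_resid
  se <- sqrt(mse / 2 * (1 / sizes[[a]] + 1 / sizes[[b]]))
  d_means <- means[[b]] - means[[a]]
  half_width <- qtukey(conf, nmeans = k, df = df_resid) * se
  c(diff = d_means,
    lwr = d_means - half_width,
    upr = d_means + half_width,
    p_adj = ptukey(abs(d_means) / se, nmeans = k, df = df_resid, lower.tail = FALSE))
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  p <- palmerpenguins::penguins
  print(tukey_pair(p$body_mass_g, p$species, "Adelie", "Chinstrap"))
}
```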

anova_model <- aov(body_mass_g ~ species, data = penguins)
summary(anova_model)
##              Df    Sum Sq  Mean Sq F value Pr(>F)    
## species       2 146864214 73432107   343.6 <2e-16 ***
## Residuals   339  72443483   213698                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
TukeyHSD(anova_model)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = body_mass_g ~ species, data = penguins)
## 
## $species
##                        diff       lwr       upr     p adj
## Chinstrap-Adelie   32.42598 -126.5002  191.3522 0.8806666
## Gentoo-Adelie    1375.35401 1243.1786 1507.5294 0.0000000
## Gentoo-Chinstrap 1342.92802 1178.4810 1507.3750 0.0000000

Two penguins had missing body mass values and were excluded automatically.

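Chinstrap vs Adélie: no significant difference in body mass (difference = 32.4 g, p adj = 0.88)

Gentoo vs Adélie: significant difference in body mass (Gentoos about 1375 g heavier, p adj < 0.001)

Gentoo vs Chinstrap: significant difference in body mass (Gentoos about 1343 g heavier, p adj < 0.001)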

A one-way ANOVA showed a significant effect of species on body mass (F(2, 339) = 343.6, p < 0.001). Tukey’s post-hoc tests revealed that Gentoo penguins were significantly heavier than both Adélie and Chinstrap penguins, while no significant difference was observed between Adélie and Chinstrap.

anova_model <- aov(flipper_length_mm ~ species, data = penguins)
summary(anova_model)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## species       2  52473   26237   594.8 <2e-16 ***
## Residuals   339  14953      44                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 2 observations deleted due to missingness
TukeyHSD(anova_model)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = flipper_length_mm ~ species, data = penguins)
## 
## $species
##                       diff       lwr       upr p adj
## Chinstrap-Adelie  5.869887  3.586583  8.153191     0
## Gentoo-Adelie    27.233349 25.334376 29.132323     0
## Gentoo-Chinstrap 21.363462 19.000841 23.726084     0

Two penguins had missing flipper length values and were excluded automatically.

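Chinstrap vs Adélie: significant difference in flipper length (Chinstraps about 5.9 mm longer, 95% CI 3.6 to 8.2 mm, p adj < 0.001)

Gentoo vs Adélie: significant difference in flipper length (Gentoos about 27.2 mm longer, p adj < 0.001)

Gentoo vs Chinstrap: significant difference in flipper length (Gentoos about 21.4 mm longer, p adj < 0.001)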

A one-way ANOVA showed a significant effect of species on flipper length (F(2, 339) = 594.8, p < 0.001). Tukey's post-hoc tests revealed that all three pairs of species differ significantly. Gentoo penguins had substantially longer flippers than both Adélie and Chinstrap penguins, and Chinstraps had modestly longer flippers than Adélies.
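Reading significance straight from the adjusted p-values helps avoid misreading a Tukey table. Below is a minimal sketch that lists the significant comparisons for both response variables. `significant_pairs` is a hypothetical helper. It assumes the palmerpenguins data and the structure of the object returned by `TukeyHSD()`: one matrix per term, with a `"p adj"` column.

```r
# Hypothetical helper: names of Tukey comparisons with adjusted p below alpha.
significant_pairs <- function(tukey_result, term, alpha = 0.05) {
  tab <- tukey_result[[term]]
  rownames(tab)[tab[, "p adj"] < alpha]
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  p <- palmerpenguins::penguins
  for (v in c("body_mass_g", "flipper_length_mm")) {
    fit <- aov(reformulate("species", response = v), data = p)
    cat(v, ":", significant_pairs(TukeyHSD(fit), "species"), "\n")
  }
}
```

Given the adjusted p-values printed above, the flipper-length line should list all three pairs. The body-mass line should omit Chinstrap-Adelie.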

Findings Summary

There is a strong positive relationship (r = 0.87) between penguin body mass and flipper length: heavier penguins tend to have longer flippers.

When comparing species:

Chinstrap vs Adélie: Longer flippers increase the odds of being a Chinstrap, while greater body mass (holding flipper length constant) slightly decreases them. Chinstrap and Adélie penguins are similar in body mass, but Chinstraps have modestly longer flippers.

Gentoo vs Adélie: Both heavier body mass and longer flippers make it much more likely to be a Gentoo. Gentoos are clearly bigger than Adélie penguins.

Gentoo vs Chinstrap: Gentoos are heavier and have longer flippers than Chinstraps.

The ANOVA tests support this:

Species has a significant effect on body mass and flipper length.

Tukey post-hoc tests show:

Gentoos are significantly heavier and have longer flippers than both Adélie and Chinstrap penguins.

Adélie and Chinstrap do not differ significantly in body mass, but Chinstraps have significantly (if modestly) longer flippers.

Two penguins were excluded due to missing measurements. Overall, the results are biologically sensible and show that body mass and flipper length differ strongly by species, with Gentoos standing out.

Discussion and Next Steps

This analysis shows that penguin species differ in body size and flipper length. Gentoos are significantly heavier and have longer flippers than both Adélie and Chinstrap penguins. Adélie and Chinstrap are similar in body mass and differ only modestly in flipper length. There is also a strong positive relationship between body mass and flipper length, meaning larger penguins tend to have longer flippers. In answer to the research question, these two measurements appear to distinguish Gentoos reliably but are less useful for telling Adélie and Chinstrap apart; classification accuracy was not formally assessed here. The differences may reflect adaptations for swimming, feeding, and survival in different ecological niches, although this analysis does not test that directly. The findings could support species identification in the field, monitoring of population health, and conservation planning. By establishing clear trait differences, this analysis also provides a foundation for future research on penguin ecology, evolution, and environmental adaptation.

Further analyses could incorporate the other measured variables (bill length, bill depth, and sex) to see whether they improve species discrimination, particularly between Adélie and Chinstrap. Body mass and flipper length could also be examined in additional comparisons, for example by sex within each species.
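A minimal sketch of the first suggestion is shown below. It fits multinomial models with different predictor sets on the same complete-case rows and compares their AIC values (lower is better). `compare_aic` and the predictor-set names are hypothetical. The sketch assumes the palmerpenguins data and reads the `AIC` component that `nnet::multinom` stores on its fitted object. If the extra bill measurements separate the species almost perfectly, the fit may stop at the iteration limit with unstable coefficients. In that case, treat the AIC comparison as a rough guide only.

```r
library(nnet)  # multinom()

# Hypothetical helper: AIC for several predictor sets, all fitted on the same
# complete-case rows so the values are comparable.
compare_aic <- function(data, response, predictor_sets) {
  vars <- unique(c(response, unlist(predictor_sets)))
  d <- na.omit(data[, vars, drop = FALSE])
  sapply(predictor_sets, function(preds) {
    fit <- multinom(reformulate(preds, response = response), data = d, trace = FALSE)
    fit$AIC
  })
}

# Assumption: the palmerpenguins package supplies the penguins data.
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  sets <- list(
    size_only = c("flipper_length_mm", "body_mass_g"),
    plus_bill = c("flipper_length_mm", "body_mass_g", "bill_length_mm", "bill_depth_mm")
  )
  print(compare_aic(palmerpenguins::penguins, "species", sets))
}
```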

Further studies could explore the biology behind why Gentoos have significantly higher body mass and longer flippers, for example by examining the evolutionary conditions that produced these differences. Differences in diet or swimming capabilities could also be investigated to better understand how the three species differ.

References

Gorman, K. B., Williams, T. D., & Fraser, W. R. (2014). Ecological sexual dimorphism and environmental variability within a community of Antarctic penguins (genus Pygoscelis). PLoS ONE, 9(3), e90081. https://doi.org/10.1371/journal.pone.0090081