Part 1 - Introduction

At Palmer Station in Antarctica, three species of penguins–Adelie, Gentoo, and Chinstrap–are observed within the surrounding islands. The researchers at Palmer Station have sampled over 300 of these penguins, gathering data for specific physical measurements, species classification, and observed location. For this project, our goal is to analyze how the biological sex of the Palmer penguins within each species affects body mass. In doing so, we aim to better understand how the complex relationship between sex and species influence physical traits, contributing to the overall understanding of the ecological dynamics of the Palmer penguins. To achieve this goal, we hope to complete a Two-Way ANOVA to determine the impacts of the categorical, explanatory variables (species and biological sex) on the numerical, response variable (body mass).

The Diverse Penguin Species Around Palmer Station
The Diverse Penguin Species Around Palmer Station

Part 2 - Exploring the Data (Descriptive Statistics)

Loading the data:

#install.packages("tidytuesdayR")
tuesdata <- tidytuesdayR::tt_load('2020-07-28')
## ---- Compiling #TidyTuesday Information for 2020-07-28 ----
## --- There are 2 files available ---
## 
## 
## ── Downloading files ───────────────────────────────────────────────────────────
## 
##   1 of 2: "penguins.csv"
##   2 of 2: "penguins_raw.csv"
PenguinData <- tuesdata$penguins

To remove penguins whose sex could not be identified from our data set, we must omit any missing values within the Palmer penguins data frame

PenguinData <- na.omit(PenguinData)

Before further exploration, let’s observe the body mass from each penguin group within our sample of 333 penguins with a confirmed sex:

is.numeric(PenguinData$body_mass_g)
## [1] TRUE
PenguinData %>% group_by(species,sex) %>% summarize(mean = mean(body_mass_g),
                                                    median=median(body_mass_g),
                                                    sd=sd(body_mass_g),
                                                    se=(sd(body_mass_g)/333))
## `summarise()` has grouped output by 'species'. You can override using the
## `.groups` argument.
## # A tibble: 6 × 6
## # Groups:   species [3]
##   species   sex     mean median    sd    se
##   <chr>     <chr>  <dbl>  <dbl> <dbl> <dbl>
## 1 Adelie    female 3369.   3400  269. 0.809
## 2 Adelie    male   4043.   4000  347. 1.04 
## 3 Chinstrap female 3527.   3550  285. 0.857
## 4 Chinstrap male   3939.   3950  362. 1.09 
## 5 Gentoo    female 4680.   4700  282. 0.846
## 6 Gentoo    male   5485.   5500  313. 0.940

The Two-Way ANOVA test assumes that the data has equal variance and normal distribution, so before we can run our test, we must confirm that these assumptions match our data.

To test for equal variance, we may use Levene’s test:

leveneTest(body_mass_g ~ sex*species, data = PenguinData)
## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   5  1.3908 0.2272
##       327
# The variances are not statistically significant (P=0.2272)

To test for normal distribution, let’s explore the residuals from a linear model through a histogram, Q-Q plot, and a shapiro-wilk:

model.1 <- lm(body_mass_g ~ sex*species, data = PenguinData)
hist(resid(model.1))

# This histogram appears to have a normal distribution, appearing in a standard bell-curve
qqnorm(model.1$residuals)
qqline(model.1$residuals)

# Points in this Q-Q plot appear linear, signifying again that our data is normal
shapiro.test(model.1$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  model.1$residuals
## W = 0.99776, p-value = 0.9367
# The result from the shapiro-wilk test also confirms that the distribution between our groups is normal (P=0.9367)

Let’s visualize the data in a box plot to better show the differences between groups before running our statistical test.

Unfinished.boxplot <- ggplot(PenguinData, aes(x=species, y=body_mass_g, fill=sex, na.rm=TRUE)) + geom_boxplot() 
Penguin.boxplot <- Unfinished.boxplot + geom_jitter(shape=1, position=position_jitter(0.2)) + theme_classic() + labs(x="Species", y="Body Mass (g)", fill="Sex") 
Penguin.boxplot

# On initial observation, the Gentoo species appears to have a significantly higher body mass then the Adelie and Chinstrap species in both sexes. 

Part 3 - Statistical Test (Inferential Statistics)

Now that we have met the assumptions for a Two-Way ANOVA, we are able to run the test. But first, we must understand our hypotheses.

H0 states: within each species, sex has no effect on body mass. HA states: within each species, sex has an effect on body mass.

anova.2 <- aov(model.1)
summary(anova.2)
##              Df    Sum Sq  Mean Sq F value   Pr(>F)    
## sex           1  38878897 38878897 406.145  < 2e-16 ***
## species       2 143401584 71700792 749.016  < 2e-16 ***
## sex:species   2   1676557   838278   8.757 0.000197 ***
## Residuals   327  31302628    95727                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# According to the ANOVA, within each species, sex has an effect on body mass (F(2)=8.757, P=0.000197)

Post Hoc - pairwise comparisons to determine which specific groups are statistically significant from each other:

Custom comparisons need to be made in order to ensure that only valid comparisons (comparisons that differ by one variable). A Bonferroni correction also needs to be included to adjust the p-values when using a statistical test where many individual tests are being performed at once.

To compare the effects of sex within each species on body mass, we set up this comparison: Comparisons within each species:

SexContrasts <- PenguinData %>% 
  group_by(species) %>%
  emmeans_test(body_mass_g ~ sex, p.adjust.method = "bonferroni") 
SexContrasts
## # A tibble: 3 × 10
##   species   term  .y.         group1 group2    df statistic        p    p.adj
## * <fct>     <chr> <chr>       <chr>  <chr>  <dbl>     <dbl>    <dbl>    <dbl>
## 1 Adelie    sex   body_mass_g female male     327    -13.2  4.37e-32 4.37e-32
## 2 Chinstrap sex   body_mass_g female male     327     -5.49 8.19e- 8 8.19e- 8
## 3 Gentoo    sex   body_mass_g female male     327    -14.2  6.13e-36 6.13e-36
## # ℹ 1 more variable: p.adj.signif <chr>
# Within each species, male and female penguins statistically significantly differ in their body mass (P<0.0001)

For further analysis, we’ll compare each species to one another within each sex Comparisons of species across sexes:

SpeciesContrasts <- PenguinData %>% 
  group_by(sex) %>%
  emmeans_test(body_mass_g ~ species, p.adjust.method = "bonferroni") 
SpeciesContrasts
## # A tibble: 6 × 10
##   sex   term  .y.   group1 group2    df statistic        p    p.adj p.adj.signif
## * <fct> <chr> <chr> <chr>  <chr>  <dbl>     <dbl>    <dbl>    <dbl> <chr>       
## 1 fema… spec… body… Adelie Chins…   327     -2.47 1.42e- 2 4.26e- 2 *           
## 2 fema… spec… body… Adelie Gentoo   327    -24.1  1.92e-74 5.76e-74 ****        
## 3 fema… spec… body… Chins… Gentoo   327    -17.2  7.42e-48 2.22e-47 ****        
## 4 male  spec… body… Adelie Chins…   327      1.63 1.05e- 1 3.14e- 1 ns          
## 5 male  spec… body… Adelie Gentoo   327    -26.9  1.03e-84 3.08e-84 ****        
## 6 male  spec… body… Chins… Gentoo   327    -23.3  1.26e-71 3.78e-71 ****

Comparisons that were strongly statistically significant: 1. Female Adelie to Gentoo (P<0.0001) 2. Female Chinstrap to Gentoo (P<0.0001) 3. Male Adelie to Gentoo (P<0.0001) 4. Male Chinstrap to Gentoo (P<0.0001) Comparisons that were weakly statistically significant: 1. Female Adelie to Chinstrap (P=0.00426) Comparisons that were NOT statistically significant: 1. Male Adelie to Chinstrap (P=0.314)

From these comparisons, we can see that species has a strong influence on body mass between both sexes, especially when comparing to the larger Gentoo penguins. Additionally, the difference between Adelie and Chinstrap penguins is heavily impacted by the sex.

Part 4 - Results, Analysis, and Conclusion

The goal of our project was to analyze how the biological sex of the Palmer penguins within each species affects body mass. To do so, we decided to test two hypotheses: H0 states: within each species, sex has no effect on body mass. HA states: within each species, sex has an effect on body mass. To then test our hypotheses, we had to prepare our data to be analysed. We first eliminated any data set where the sex could not be identified. We then tested for equal variance and normal distribution, as we needed to run a Two-Way ANOVA, a statistical test that assumes equal variance. We did a Levene’s test and found that the variance was not statistically significant (P=0.2272). We also did a histogram, Q-Q plot, and finally a Shapiro test, which showed us our data had a normal distribution(P=0.9367). Our data was now ready for a Two-Way ANOVA. Based on our Two-Way ANOVA analysis, we can reject the null hypothesis that both species and biological sex don’t affect the body mass of Palmer penguins. We can also conclude that both species and biological sex significantly affect the body mass of Palmer penguins. The interaction between sex and species was statistically significant (F(2) = 8.757, P = 0.000197), meaning that the effect of sex on body mass varies by species. Post-hoc comparisons with Bonferroni corrections revealed that male and female body mass sizes vary widely by species (P < 0.0001). Moreover, there are statistically significant differences in the sizes of male and female penguins between the species. For example, we found that Gentoo penguins were significantly larger than Adelie and Chinstrap penguins across both sexes. Female Adelie to Chinstrap penguins were weakly significant (P=0.00426), and male Adelie to Chinstrap, were not statistically significant (P = 0.314).

This analysis is important because it provides insight into the physical differences among penguin species at Palmer Station. It helps us to better understand how the complex relationship between sex and species influences physical traits, contributing to the overall understanding of the ecological dynamics of the Palmer penguins. This better understanding of the relationship between body mass and sex goes a long way when looking at larger biological questions like why they differ in size evolutionarily. What pros and cons does their size produce? It also serves as a basis for how we monitor these creatures, as body mass is a key indicator of health.

However, the analysis has limitations. The dataset was filtered to exclude missing values which reduced the sample size and could reduce its accuracy. This data also did not consider other factors, such as age, diet, or season, which may influence body mass. Regardless of these limitations, the results are a good starting point for many studies to come after it.

References

Gorman, K. (2020, July 28). Palmer Penguins. GitHub. https://github.com/rfordatascience/tidytuesday/blob/main/data/2020/2020-07-28/readme.md Horst, A. (n.d.). Palmerpenguins R Data Package. GitHub. https://allisonhorst.github.io/palmerpenguins/