Background

An animal’s sex can influence its physical characteristics, and in many, but not all, species males have larger physical features than females. To determine whether this rule of thumb applies to Adelie penguins, bill length and bill depth were compared between the sexes.

Artwork by @allison_horst

The analysis was conducted using the palmerpenguins data set from Horst, Hill, and Gorman (2020). The data set can be accessed through GitHub.

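# Load the data set and the plotting packages used below
library(palmerpenguins)
library(ggplot2)
library(gridExtra)

# Sub-setting dataframe for Adelie penguins
adelie <- subset(penguins, species == "Adelie")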

# Sub-setting dataframe for male penguins
adelie_male <- subset(adelie, sex == "male")

# Sub-setting dataframe for female penguins
adelie_female <- subset(adelie, sex == "female")
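One detail of the sub-setting step is worth making explicit: subset() treats a missing value in its condition as FALSE, so any penguin whose sex was not recorded ends up in neither group. The sketch below illustrates this on a small made-up data frame (the name toy and its values are hypothetical, not taken from palmerpenguins) and contrasts it with logical bracket indexing, which keeps an all-NA row instead.

```r
# Toy data frame; the values are invented for illustration only
toy <- data.frame(
  sex = c("male", "female", NA, "male"),
  bill_length_mm = c(40.1, 37.2, 38.0, 41.3)
)

# subset() treats NA in the condition as FALSE, so the NA row is dropped
toy_male_subset <- subset(toy, sex == "male")

# Logical bracket indexing returns an all-NA row where the condition is NA
toy_male_bracket <- toy[toy$sex == "male", ]

nrow(toy_male_subset)
nrow(toy_male_bracket)
```

Because the report uses subset(), the male and female data frames should contain only rows with a recorded sex.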

Normality of Data

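To examine the normality of the data, Q-Q plots were generated using the ggplot2 package. If the majority of data points fell on or near the reference line drawn by geom_qq_line (which passes through the first and third quartiles of the sample and of the theoretical normal distribution), the data were judged to be approximately normal. Based on the Q-Q plots, the data in all four groups appear to be approximately normally distributed.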

# QQ plot for male bill length
m_bl_qq <- ggplot(adelie_male, aes(sample = bill_length_mm)) + 
  geom_qq(color = "red") + 
  geom_qq_line(color = "red") + 
  ggtitle("Male Bill Length") +
  theme_bw()

# QQ plot for female bill length
f_bl_qq <- ggplot(adelie_female, aes(sample = bill_length_mm)) + 
  geom_qq(color = "blue") + 
  geom_qq_line(color = "blue") + 
  ggtitle("Female Bill Length") +
  theme_bw()

# QQ plot for male bill depth
m_bd_qq <- ggplot(adelie_male, aes(sample = bill_depth_mm)) + 
  geom_qq(color = "red") + 
  geom_qq_line(color = "red") + 
  ggtitle("Male Bill Depth") + 
  theme_bw()

# QQ plot for female bill depth
f_bd_qq <- ggplot(adelie_female, aes(sample = bill_depth_mm)) + 
  geom_qq(color = "blue") + 
  geom_qq_line(color = "blue") + 
  ggtitle("Female Bill Depth") + 
  theme_bw()

# Side-by-side QQ plots
grid.arrange(m_bl_qq, f_bl_qq, m_bd_qq, f_bd_qq, ncol=2)
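The Q-Q plots are a visual check. A numeric complement, which is not part of the original analysis, is the Shapiro-Wilk test from base R's stats package. The sketch below wraps shapiro.test() in a hypothetical helper, shapiro_p, and runs it on simulated stand-in values rather than on the penguin measurements.

```r
# Hypothetical helper: Shapiro-Wilk p-value for one numeric vector,
# ignoring missing values
shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  shapiro.test(x)$p.value
}

# Simulated stand-in data (NOT penguin measurements)
set.seed(1)
simulated <- rnorm(73, mean = 40, sd = 2.5)

# A large p-value gives no evidence against normality
p_value <- shapiro_p(simulated)
p_value
```

With the objects defined earlier, the equivalent call would be shapiro_p(adelie_male$bill_length_mm), repeated for each sex and measurement. A p-value above the chosen significance level would be consistent with the visual impression, although it does not prove normality.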

Variance

To compare the variances of the two groups, an F ratio was computed using the var.test function with a significance level of 0.05.

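The null hypothesis of the variance test was

\[\frac{\sigma^2_{male}}{\sigma^2_{female}} = 1\]

and the alternative hypothesis of the variance test was

\[\frac{\sigma^2_{male}}{\sigma^2_{female}} \ne 1\]

where \(\sigma^2\) denotes the population variance of the measurement in each sex.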

If the p-value was less than the level of significance, the null hypothesis was rejected in favor of the alternative hypothesis.
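To make the decision rule concrete, the following sketch recomputes the F ratio and its two-sided p-value by hand and compares them with var.test(). The helper name f_test_by_hand is hypothetical, and the input vectors are simulated stand-ins rather than penguin measurements.

```r
# Hypothetical helper mirroring the two-sided F test for equal variances.
# Assumes x and y contain no missing values.
f_test_by_hand <- function(x, y) {
  f_stat <- var(x) / var(y)          # ratio of sample variances
  df1 <- length(x) - 1               # numerator degrees of freedom
  df2 <- length(y) - 1               # denominator degrees of freedom
  lower_tail <- pf(f_stat, df1, df2)
  p_value <- 2 * min(lower_tail, 1 - lower_tail)  # two-sided p-value
  list(statistic = f_stat, df = c(df1, df2), p.value = p_value)
}

# Simulated stand-in data (NOT penguin measurements)
set.seed(42)
x <- rnorm(30, sd = 1.2)
y <- rnorm(30, sd = 1.0)

by_hand <- f_test_by_hand(x, y)
built_in <- var.test(x, y)

# Both approaches should agree
all.equal(by_hand$p.value, built_in$p.value)
```

The null hypothesis is rejected only when the resulting p-value falls below 0.05.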

Bill Length

# Variance test for bill length
var.test(adelie_male$bill_length_mm, adelie_female$bill_length_mm)
## 
##  F test to compare two variances
## 
## data:  adelie_male$bill_length_mm and adelie_female$bill_length_mm
## F = 1.2597, num df = 72, denom df = 72, p-value = 0.3296
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.7907415 2.0067311
## sample estimates:
## ratio of variances 
##           1.259685

Bill Depth

# Variance test for bill depth
var.test(adelie_male$bill_depth_mm, adelie_female$bill_depth_mm)
## 
##  F test to compare two variances
## 
## data:  adelie_male$bill_depth_mm and adelie_female$bill_depth_mm
## F = 1.1674, num df = 72, denom df = 72, p-value = 0.513
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.7328362 1.8597799
## sample estimates:
## ratio of variances 
##           1.167439

The tests produced p-values of 0.3296 and 0.513 for bill length and bill depth, respectively. Both p-values are greater than the significance level; therefore, the null hypotheses were not rejected, meaning there is no evidence that the variances differ between males and females for either measurement. The variances were therefore treated as equal, and the var.equal argument was set to TRUE in the subsequent t-tests.

Comparison of Means

Bill Length

# Box plot for bill length in male Adelie penguins
bl_male <- ggplot(adelie_male, aes(y=bill_length_mm)) + 
  geom_boxplot(color = "red") +
  ylab("Bill Length in MM") +
  ylim(32, 46) +
  ggtitle("Male") + 
  theme_bw()

# Box plot for bill length in female Adelie penguins
bl_female <- ggplot(adelie_female, aes(y=bill_length_mm)) + 
  geom_boxplot(color = "blue") + 
  ylab("Bill Length in MM") +
  ylim(32, 46) +
  ggtitle("Female") + 
  theme_bw()

# Side-by-side box plots
grid.arrange(bl_male, bl_female, ncol=2)

A visual inspection of the box plots suggested that bill length tended to be smaller in female Adelie penguins than in male Adelie penguins (box plots display medians rather than means). This observation was used to generate the null and alternative hypotheses for a two-sample t-test at a significance level of 0.05.

The null hypothesis for the bill length t-test was \[mean_{male} = mean_{female}\]

The alternative hypothesis for the bill length t-test was \[mean_{male} > mean_{female}\]

# One-sided, pooled-variance two-sample t-test for bill length
t.test(adelie_male$bill_length_mm, adelie_female$bill_length_mm, paired = FALSE, var.equal = TRUE, alternative = "greater")
## 
##  Two Sample t-test
## 
## data:  adelie_male$bill_length_mm and adelie_female$bill_length_mm
## t = 8.7765, df = 144, p-value = 2.22e-15
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  2.541928      Inf
## sample estimates:
## mean of x mean of y 
##  40.39041  37.25753

The p-value of the t-test was 2.22e-15, which is less than the significance level of the test, indicating the null hypothesis should be rejected in favor of the alternative hypothesis. More specifically, the mean bill length for males is greater than the mean bill length for females in Adelie penguins.
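For readers who want to see what the pooled-variance, one-sided t-test computes, here is a minimal sketch that derives the statistic and p-value from their definitions and checks them against t.test(). The helper name pooled_t_greater is hypothetical, and the inputs are simulated stand-ins rather than penguin measurements.

```r
# Hypothetical helper: pooled-variance two-sample t-test, alternative "greater".
# Assumes x and y contain no missing values.
pooled_t_greater <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  df <- nx + ny - 2
  # Pooled variance: weighted average of the two sample variances
  pooled_var <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / df
  se <- sqrt(pooled_var * (1 / nx + 1 / ny))
  t_stat <- (mean(x) - mean(y)) / se
  # Upper-tail probability, because the alternative is mean(x) > mean(y)
  p_value <- pt(t_stat, df, lower.tail = FALSE)
  list(statistic = t_stat, df = df, p.value = p_value)
}

# Simulated stand-in data (NOT penguin measurements)
set.seed(7)
x <- rnorm(25, mean = 11, sd = 2)
y <- rnorm(25, mean = 10, sd = 2)

by_hand <- pooled_t_greater(x, y)
built_in <- t.test(x, y, var.equal = TRUE, alternative = "greater")

# Both approaches should agree
all.equal(by_hand$statistic, unname(built_in$statistic))
all.equal(by_hand$p.value, built_in$p.value)
```

With two groups of 73 penguins, the degrees of freedom are 73 + 73 - 2 = 144, matching the df value in the output above.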

Bill Depth

# Box plot for bill depth in male Adelie penguins
bd_male <- ggplot(adelie_male, aes(y=bill_depth_mm)) + 
  geom_boxplot(color = "red") +
  ylab("Bill Depth in MM") +
  ylim(15, 22) +
  ggtitle("Male") + 
  theme_bw()

# Box plot for bill depth in female Adelie penguins
bd_female <- ggplot(adelie_female, aes(y=bill_depth_mm)) + 
  geom_boxplot(color = "blue") + 
  ylab("Bill Depth in MM") +
  ylim(15, 22) +
  ggtitle("Female") + 
  theme_bw()

# Side-by-side box plots
grid.arrange(bd_male, bd_female, ncol=2)

A visual inspection of the box plots suggested that bill depth tended to be smaller in female Adelie penguins than in male Adelie penguins (box plots display medians rather than means). This observation was used to generate the null and alternative hypotheses for a two-sample t-test at a significance level of 0.05.

The null hypothesis for the bill depth t-test was \[mean_{male} = mean_{female}\]

The alternative hypothesis for the bill depth t-test was \[mean_{male} > mean_{female}\]

# One-sided, pooled-variance two-sample t-test for bill depth
t.test(adelie_male$bill_depth_mm, adelie_female$bill_depth_mm, paired = FALSE, var.equal = TRUE, alternative = "greater")
## 
##  Two Sample t-test
## 
## data:  adelie_male$bill_depth_mm and adelie_female$bill_depth_mm
## t = 8.928, df = 144, p-value = 9.221e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  1.181686      Inf
## sample estimates:
## mean of x mean of y 
##  19.07260  17.62192

The p-value of the t-test was 9.221e-16, which is less than the significance level of the test, indicating the null hypothesis should be rejected in favor of the alternative hypothesis. In other words, the mean bill depth for males is greater than the mean bill depth for females in Adelie penguins.
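The one-sided confidence intervals in the two t-test outputs can be reconstructed from the reported group means, t statistics, and degrees of freedom. The sketch below does this with a hypothetical helper, lower_bound. The standard error is recovered from the rounded t statistic, so the results are expected to match the reported lower bounds (2.541928 and 1.181686) only approximately.

```r
# Hypothetical helper: lower limit of a one-sided confidence interval for
# the difference in means, rebuilt from summary values.
lower_bound <- function(mean_x, mean_y, t_stat, df, conf_level = 0.95) {
  diff <- mean_x - mean_y
  se <- diff / t_stat              # because t = diff / se
  diff - qt(conf_level, df) * se   # one-sided lower limit
}

# Values copied from the t-test outputs reported in this analysis
bl_lower <- lower_bound(40.39041, 37.25753, t_stat = 8.7765, df = 144)
bd_lower <- lower_bound(19.07260, 17.62192, t_stat = 8.928, df = 144)

bl_lower
bd_lower
```

Read this way, the bounds give a sense of magnitude as well as significance: with 95 percent confidence, males exceed females by at least roughly 2.5 mm in mean bill length and roughly 1.2 mm in mean bill depth.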

Conclusion

The results of the analysis are consistent with the general rule of thumb that males tend to have larger physical characteristics than females. However, because only two characteristics were examined, the results should be interpreted cautiously.

Citations

citation("palmerpenguins")
## 
## To cite palmerpenguins in publications use:
## 
##   Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer
##   Archipelago (Antarctica) penguin data. R package version 0.1.0.
##   https://allisonhorst.github.io/palmerpenguins/. doi:
##   10.5281/zenodo.3960218.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {palmerpenguins: Palmer Archipelago (Antarctica) penguin data},
##     author = {Allison Marie Horst and Alison Presmanes Hill and Kristen B Gorman},
##     year = {2020},
##     note = {R package version 0.1.0},
##     doi = {10.5281/zenodo.3960218},
##     url = {https://allisonhorst.github.io/palmerpenguins/},
##   }

Session Info

sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] gridExtra_2.3        ggplot2_3.3.5        palmerpenguins_0.1.1
## 
## loaded via a namespace (and not attached):
##  [1] highr_0.9        bslib_0.3.1      compiler_4.0.4   pillar_1.7.0    
##  [5] jquerylib_0.1.4  tools_4.0.4      digest_0.6.29    jsonlite_1.8.0  
##  [9] evaluate_0.15    lifecycle_1.0.1  tibble_3.1.6     gtable_0.3.0    
## [13] pkgconfig_2.0.3  rlang_1.0.2      DBI_1.1.2        cli_3.2.0       
## [17] rstudioapi_0.13  yaml_2.3.5       xfun_0.30        fastmap_1.1.0   
## [21] withr_2.5.0      dplyr_1.0.8      stringr_1.4.0    knitr_1.38      
## [25] generics_0.1.2   sass_0.4.1       vctrs_0.3.8      tidyselect_1.1.2
## [29] grid_4.0.4       glue_1.6.2       R6_2.5.1         fansi_1.0.3     
## [33] rmarkdown_2.13   farver_2.1.0     purrr_0.3.4      magrittr_2.0.3  
## [37] scales_1.2.0     htmltools_0.5.2  ellipsis_0.3.2   assertthat_0.2.1
## [41] colorspace_2.0-3 labeling_0.4.2   utf8_1.2.2       stringi_1.7.6   
## [45] munsell_0.5.0    crayon_1.5.1