An animal’s sex can shape its physical characteristics, and in many, but not all, cases males have larger physical features than females. To determine whether this general rule of thumb applies to Adelie penguins, bill length and bill depth were compared between male and female Adelie penguins.
Artwork by @allison_horst
The analysis was conducted using the palmerpenguins data
set from Horst, Hill, and Gorman (2020). The data set is distributed as
the palmerpenguins R package and can be accessed through GitHub.
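# Load the data set and the plotting packages used throughout
library(palmerpenguins)
library(ggplot2)
library(gridExtra)
# Sub-setting dataframe for Adelie penguins
adelie <- subset(penguins, species == "Adelie")
# Sub-setting dataframe for male Adelie penguins (subset() drops rows with NA sex)
adelie_male <- subset(adelie, sex == "male")
# Sub-setting dataframe for female Adelie penguins
adelie_female <- subset(adelie, sex == "female")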
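Some birds in the palmerpenguins data have sex recorded as NA, and the F-test degrees of freedom reported later (72 and 72) imply 73 birds per sex. `subset()` keeps only rows where its condition is TRUE, so rows whose condition evaluates to NA are dropped, whereas plain bracket indexing keeps them as all-NA rows. The toy data frame below is hypothetical and only illustrates that difference.

```r
# Hypothetical toy data: one Adelie bird has no recorded sex
toy <- data.frame(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  sex     = c("male", NA, "female", "male")
)

toy_adelie <- subset(toy, species == "Adelie")

# subset() keeps only rows where the condition is TRUE; NA rows are dropped
toy_male_subset <- subset(toy_adelie, sex == "male")

# Bracket indexing returns an all-NA row wherever the condition is NA
toy_male_bracket <- toy_adelie[toy_adelie$sex == "male", ]

nrow(toy_male_subset)   # 1
nrow(toy_male_bracket)  # 2 (one real row plus one all-NA row)
```

This is why the `subset()` calls above yield clean male and female groups without an explicit `!is.na(sex)` filter.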
To examine the normality of the data, Q-Q plots were generated using
the ggplot2 package. If most data points fell on or near the reference
line drawn by geom_qq_line (which passes through the first and third
quartiles), the data were considered approximately normal. Based on the
Q-Q plots, the data appear to be approximately normally distributed.
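The reference line drawn by `geom_qq_line()` is generally not y = x. With its default `line.p = c(0.25, 0.75)`, it passes through the points where the sample's first and third quartiles meet the corresponding standard normal quantiles, the same construction base R's `qqline()` uses. A minimal sketch of that calculation follows; `qq_reference_line` is a hypothetical helper, and using R's default sample quantile method (type 7) is an assumption about how the plotted line is computed.

```r
# Hypothetical helper: intercept and slope of a normal Q-Q reference line
# through the sample quartiles, mirroring base R's qqline() construction
qq_reference_line <- function(x, probs = c(0.25, 0.75)) {
  sample_q <- quantile(x, probs, names = FALSE, na.rm = TRUE)  # empirical quartiles (type 7)
  theory_q <- qnorm(probs)                                     # standard normal quartiles
  slope <- diff(sample_q) / diff(theory_q)
  intercept <- sample_q[1] - slope * theory_q[1]
  c(intercept = intercept, slope = slope)
}

qq_reference_line(c(1, 2, 3, 4, 5))
```

For roughly normal data the intercept lands near the center of the sample and the slope near its standard deviation, which is why the points hug this line rather than y = x unless the data were standardized first.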
# QQ plot for male bill length
m_bl_qq <- ggplot(adelie_male, aes(sample = bill_length_mm)) +
geom_qq(color = "red") +
geom_qq_line(color = "red") +
ggtitle("Male Bill Length") +
theme_bw()
# QQ plot for female bill length
f_bl_qq <- ggplot(adelie_female, aes(sample = bill_length_mm)) +
geom_qq(color = "blue") +
geom_qq_line(color = "blue") +
ggtitle("Female Bill Length") +
theme_bw()
# QQ plot for male bill depth
m_bd_qq <- ggplot(adelie_male, aes(sample = bill_depth_mm)) +
geom_qq(color = "red") +
geom_qq_line(color = "red") +
ggtitle("Male Bill Depth") +
theme_bw()
# QQ plot for female bill depth
f_bd_qq <- ggplot(adelie_female, aes(sample = bill_depth_mm)) +
geom_qq(color = "blue") +
geom_qq_line(color = "blue") +
ggtitle("Female Bill Depth") +
theme_bw()
# Side-by-side QQ plots
grid.arrange(m_bl_qq, f_bl_qq, m_bd_qq, f_bd_qq, ncol=2)
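Reading Q-Q plots is partly subjective. A formal test was not part of this analysis, but as a hedged complement one could run base R's `shapiro.test()` (the Shapiro-Wilk test) on each of the four samples. The sketch below wraps it in a hypothetical helper, `normality_pvalues()`, that takes a named list of numeric vectors; the commented call shows how it could be applied to the Adelie subsets, and the runnable part uses simulated data only.

```r
# Hypothetical helper: Shapiro-Wilk p-value for each sample in a named list
normality_pvalues <- function(samples) {
  vapply(samples, function(x) shapiro.test(na.omit(x))$p.value, numeric(1))
}

# Intended use on the Adelie subsets (requires the objects created above):
# normality_pvalues(list(
#   male_length   = adelie_male$bill_length_mm,
#   female_length = adelie_female$bill_length_mm,
#   male_depth    = adelie_male$bill_depth_mm,
#   female_depth  = adelie_female$bill_depth_mm
# ))

# Self-contained illustration with simulated data
set.seed(1)
sim <- list(normal = rnorm(73, mean = 40, sd = 2.5), skewed = rexp(500))
p <- normality_pvalues(sim)
p
```

A p-value below 0.05 would flag a departure from normality. With roughly 73 birds per group, the t-test is fairly robust to mild departures, so such a result would call for closer inspection rather than automatically invalidating the analysis.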
To compare the variances of the two groups, an F test was conducted
using the var.test function with a significance level of
0.05.
The null hypothesis of the variance test was
\[\frac{\sigma^2_{male}}{\sigma^2_{female}} = 1\] and the alternative hypothesis of the variance test was
\[\frac{\sigma^2_{male}}{\sigma^2_{female}} \ne 1\]
If the p-value was less than the level of significance, the null hypothesis was rejected in favor of the alternative hypothesis.
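The F statistic reported by `var.test()` is the ratio of the two sample variances, compared against an F distribution with n1 - 1 and n2 - 1 degrees of freedom; for the two-sided alternative, the p-value is twice the smaller tail area. A minimal sketch, using the hypothetical helper name `f_ratio_test`, that should agree with `var.test()` under its default two-sided alternative:

```r
# Hypothetical helper: two-sided F test for equality of two variances
f_ratio_test <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  df_x <- length(x) - 1
  df_y <- length(y) - 1
  f_stat <- var(x) / var(y)                 # ratio of sample variances
  lower_tail <- pf(f_stat, df_x, df_y)      # P(F <= f_stat)
  p_value <- 2 * min(lower_tail, 1 - lower_tail)
  list(statistic = f_stat, df = c(df_x, df_y), p.value = p_value)
}

# Intended use: f_ratio_test(adelie_male$bill_length_mm, adelie_female$bill_length_mm)

# Arithmetic check against the bill-length result reported below:
# ratio of variances 1.259685 on 72 and 72 degrees of freedom
lower <- pf(1.259685, 72, 72)
p_reported <- 2 * min(lower, 1 - lower)
p_reported
```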
# Variance test for bill length
var.test(adelie_male$bill_length_mm, adelie_female$bill_length_mm)
##
## F test to compare two variances
##
## data: adelie_male$bill_length_mm and adelie_female$bill_length_mm
## F = 1.2597, num df = 72, denom df = 72, p-value = 0.3296
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7907415 2.0067311
## sample estimates:
## ratio of variances
## 1.259685
# Variance test for bill depth
var.test(adelie_male$bill_depth_mm, adelie_female$bill_depth_mm)
##
## F test to compare two variances
##
## data: adelie_male$bill_depth_mm and adelie_female$bill_depth_mm
## F = 1.1674, num df = 72, denom df = 72, p-value = 0.513
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7328362 1.8597799
## sample estimates:
## ratio of variances
## 1.167439
The tests produced p-values of 0.3296 and 0.513 for bill length and
bill depth, respectively. Both p-values are greater than the
significance level; therefore, neither null hypothesis was rejected,
and there is no evidence that the variances differ between males and
females for either measurement. Because the equal-variance assumption
was reasonable, the var.equal parameter was set to TRUE in
the subsequent t-tests.
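Setting `var.equal = TRUE` selects Student's pooled-variance t-test instead of R's default Welch test. With equal group sizes, as here (73 birds per sex, judging from the 144 degrees of freedom in the t-test output), the pooled and Welch standard errors are algebraically identical, so both tests give the same t statistic and differ only in their degrees of freedom. A quick check on simulated data (not the penguin measurements):

```r
set.seed(2020)
n <- 73
x <- rnorm(n, mean = 40, sd = 2.5)   # simulated "male" values
y <- rnorm(n, mean = 37, sd = 2.2)   # simulated "female" values

student <- t.test(x, y, var.equal = TRUE,  alternative = "greater")
welch   <- t.test(x, y, var.equal = FALSE, alternative = "greater")

# Identical t statistics when both groups have the same size ...
c(student = unname(student$statistic), welch = unname(welch$statistic))
# ... but different degrees of freedom (the Welch df never exceeds 2n - 2)
c(student = unname(student$parameter), welch = unname(welch$parameter))
```

For a balanced design like this one, the choice of `var.equal` therefore changes only the degrees of freedom, and with them the p-value, slightly.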
# Box plot for bill length in male Adelie penguins
bl_male <- ggplot(adelie_male, aes(y=bill_length_mm)) +
geom_boxplot(color = "red") +
ylab("Bill Length (mm)") +
ylim(32, 46) +
ggtitle("Male") +
theme_bw()
# Box plot for bill length in female Adelie penguins
bl_female <- ggplot(adelie_female, aes(y=bill_length_mm)) +
geom_boxplot(color = "blue") +
ylab("Bill Length (mm)") +
ylim(32, 46) +
ggtitle("Female") +
theme_bw()
# Side-by-side box plots
grid.arrange(bl_male, bl_female, ncol=2)
A visual inspection of the box plots indicated that the median bill length of female Adelie penguins was lower than that of male Adelie penguins. This observation was used to formulate the null and alternative hypotheses for a one-sided two-sample t-test at a significance level of 0.05.
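The line inside each box marks the median, not the mean. By default `geom_boxplot()` draws the box from the first to the third quartile and extends the whiskers to the most extreme points lying within 1.5 times the interquartile range, plotting anything beyond as an outlier. The hypothetical helper below sketches those statistics with `quantile()`'s default method; it approximates what ggplot2 displays and is not its source code.

```r
# Hypothetical helper: the statistics a standard box plot displays
box_stats <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower_fence <- q[1] - coef * iqr
  upper_fence <- q[3] + coef * iqr
  inside <- x[x >= lower_fence & x <= upper_fence]
  list(
    lower_whisker = min(inside),
    lower_hinge   = q[1],
    median        = q[2],
    upper_hinge   = q[3],
    upper_whisker = max(inside),
    outliers      = x[x < lower_fence | x > upper_fence],
    mean          = mean(x)   # shown for comparison; a box plot does not draw it
  )
}

# Intended use: box_stats(adelie_male$bill_length_mm)
box_stats(c(1:9, 30))
```

In the toy sample, a single outlier pulls the mean (7.5) well above the median (5.5), which is why a box plot supports statements about medians more directly than about means.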
The null hypothesis for the bill length t-test was \[\mu_{male} = \mu_{female}\]
The alternative hypothesis for the bill length t-test was \[\mu_{male} > \mu_{female}\]
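For reference, the statistic behind `t.test(..., var.equal = TRUE)` is the difference in sample means divided by a pooled standard error, with n1 + n2 - 2 degrees of freedom (144 here); for the one-sided alternative that the male mean is greater, the p-value is the upper-tail area of the t distribution. A minimal sketch, with the hypothetical helper name `pooled_t_greater`, checked against `t.test()` on simulated data:

```r
# Hypothetical helper: one-sided ("greater") pooled-variance two-sample t-test
pooled_t_greater <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n_x <- length(x)
  n_y <- length(y)
  df <- n_x + n_y - 2
  # Pooled variance: df-weighted average of the two sample variances
  sp2 <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / df
  se <- sqrt(sp2 * (1 / n_x + 1 / n_y))
  t_stat <- (mean(x) - mean(y)) / se
  p_value <- pt(t_stat, df, lower.tail = FALSE)  # upper-tail area
  list(statistic = t_stat, df = df, p.value = p_value)
}

# Compare with t.test() on simulated data (not the penguin measurements)
set.seed(7)
a <- rnorm(73, mean = 40, sd = 2.5)
b <- rnorm(73, mean = 37, sd = 2.5)
mine <- pooled_t_greater(a, b)
ref <- t.test(a, b, var.equal = TRUE, alternative = "greater")
c(mine = mine$statistic, t.test = unname(ref$statistic))
```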
t.test(adelie_male$bill_length_mm, adelie_female$bill_length_mm, paired = FALSE, var.equal = TRUE, alternative = "greater")
##
## Two Sample t-test
##
## data: adelie_male$bill_length_mm and adelie_female$bill_length_mm
## t = 8.7765, df = 144, p-value = 2.22e-15
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 2.541928 Inf
## sample estimates:
## mean of x mean of y
## 40.39041 37.25753
The p-value of the t-test was 2.22e-15, which is less than the significance level, so the null hypothesis was rejected in favor of the alternative hypothesis. In other words, the data indicate that mean bill length is greater in male than in female Adelie penguins.
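The one-sided 95% confidence interval in the output can be reproduced from the printed numbers alone: the standard error equals the mean difference divided by t, and the lower bound is the difference minus the 95th percentile of the t distribution with 144 degrees of freedom times that standard error. A short arithmetic check using only values shown in the bill-length output:

```r
# Values copied from the bill-length t-test output
mean_male   <- 40.39041
mean_female <- 37.25753
t_stat      <- 8.7765
df          <- 144

diff_means <- mean_male - mean_female     # about 3.13 mm
se         <- diff_means / t_stat         # implied standard error
lower      <- diff_means - qt(0.95, df) * se

round(c(difference = diff_means, se = se, lower_bound = lower), 3)
```

Because the interval excludes 0, it agrees with the p-value while also bounding the size of the effect: the male mean is estimated to exceed the female mean by at least about 2.5 mm.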
# Box plot for bill depth in male Adelie penguins
bd_male <- ggplot(adelie_male, aes(y=bill_depth_mm)) +
geom_boxplot(color = "red") +
ylab("Bill Depth (mm)") +
ylim(15, 22) +
ggtitle("Male") +
theme_bw()
# Box plot for bill depth in female Adelie penguins
bd_female <- ggplot(adelie_female, aes(y=bill_depth_mm)) +
geom_boxplot(color = "blue") +
ylab("Bill Depth (mm)") +
ylim(15, 22) +
ggtitle("Female") +
theme_bw()
# Side-by-side box plots
grid.arrange(bd_male, bd_female, ncol=2)
A visual inspection of the box plots indicated that the median bill depth of female Adelie penguins was lower than that of male Adelie penguins. This observation was used to formulate the null and alternative hypotheses for a one-sided two-sample t-test at a significance level of 0.05.
The null hypothesis for the bill depth t-test was \[\mu_{male} = \mu_{female}\]
The alternative hypothesis for the bill depth t-test was \[\mu_{male} > \mu_{female}\]
t.test(adelie_male$bill_depth_mm, adelie_female$bill_depth_mm, paired = FALSE, var.equal = TRUE, alternative = "greater")
##
## Two Sample t-test
##
## data: adelie_male$bill_depth_mm and adelie_female$bill_depth_mm
## t = 8.928, df = 144, p-value = 9.221e-16
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 1.181686 Inf
## sample estimates:
## mean of x mean of y
## 19.07260 17.62192
The p-value of the t-test was 9.221e-16, which is less than the significance level, so the null hypothesis was rejected in favor of the alternative hypothesis. In other words, the data indicate that mean bill depth is greater in male than in female Adelie penguins.
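Small p-values say that a difference exists but not how large it is. A standardized effect size is not part of the original analysis; as a hedged supplement, Cohen's d for a pooled two-sample test can be recovered from the reported t statistic, since d = (mean difference) / (pooled SD) = t * sqrt(1/n1 + 1/n2). The helper name `cohens_d_from_t` is hypothetical, and 73 birds per sex is inferred from the 144 degrees of freedom.

```r
# Cohen's d recovered from a pooled two-sample t statistic:
# d = (mean_x - mean_y) / s_pooled = t * sqrt(1 / n_x + 1 / n_y)
cohens_d_from_t <- function(t_stat, n_x, n_y) {
  t_stat * sqrt(1 / n_x + 1 / n_y)
}

# t statistics copied from the bill-length and bill-depth outputs
d <- c(
  bill_length = cohens_d_from_t(8.7765, 73, 73),
  bill_depth  = cohens_d_from_t(8.928, 73, 73)
)
round(d, 2)
```

By the common rough benchmark that d of about 0.8 counts as large, both differences are large; the benchmark is a convention, not a threshold used in this analysis.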
The results of the analysis are consistent with the general rule of thumb that males tend to have larger physical characteristics than females. However, because only two characteristics were examined, the results should be interpreted cautiously.
citation("palmerpenguins")
##
## To cite palmerpenguins in publications use:
##
## Horst AM, Hill AP, Gorman KB (2020). palmerpenguins: Palmer
## Archipelago (Antarctica) penguin data. R package version 0.1.0.
## https://allisonhorst.github.io/palmerpenguins/. doi:
## 10.5281/zenodo.3960218.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {palmerpenguins: Palmer Archipelago (Antarctica) penguin data},
## author = {Allison Marie Horst and Alison Presmanes Hill and Kristen B Gorman},
## year = {2020},
## note = {R package version 0.1.0},
## doi = {10.5281/zenodo.3960218},
## url = {https://allisonhorst.github.io/palmerpenguins/},
## }
sessionInfo()
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] gridExtra_2.3 ggplot2_3.3.5 palmerpenguins_0.1.1
##
## loaded via a namespace (and not attached):
## [1] highr_0.9 bslib_0.3.1 compiler_4.0.4 pillar_1.7.0
## [5] jquerylib_0.1.4 tools_4.0.4 digest_0.6.29 jsonlite_1.8.0
## [9] evaluate_0.15 lifecycle_1.0.1 tibble_3.1.6 gtable_0.3.0
## [13] pkgconfig_2.0.3 rlang_1.0.2 DBI_1.1.2 cli_3.2.0
## [17] rstudioapi_0.13 yaml_2.3.5 xfun_0.30 fastmap_1.1.0
## [21] withr_2.5.0 dplyr_1.0.8 stringr_1.4.0 knitr_1.38
## [25] generics_0.1.2 sass_0.4.1 vctrs_0.3.8 tidyselect_1.1.2
## [29] grid_4.0.4 glue_1.6.2 R6_2.5.1 fansi_1.0.3
## [33] rmarkdown_2.13 farver_2.1.0 purrr_0.3.4 magrittr_2.0.3
## [37] scales_1.2.0 htmltools_0.5.2 ellipsis_0.3.2 assertthat_0.2.1
## [41] colorspace_2.0-3 labeling_0.4.2 utf8_1.2.2 stringi_1.7.6
## [45] munsell_0.5.0 crayon_1.5.1