Lab_6_Assignment

Author

Maya Frey

Essentials

1.

Load PalmerPenguins and other necessary packages

library(palmerpenguins)
library(ggplot2)
library(tidyverse) 
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ lubridate 1.9.2     ✔ tibble    3.1.8
✔ purrr     1.0.1     ✔ tidyr     1.3.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggsci)
library(patchwork)
library(performance)
library(see)

Attaching package: 'see'

The following objects are masked from 'package:ggsci':

    scale_color_material, scale_colour_material, scale_fill_material
library(car)
Loading required package: carData

Attaching package: 'car'

The following object is masked from 'package:dplyr':

    recode

The following object is masked from 'package:purrr':

    some
library(rstatix)

Attaching package: 'rstatix'

The following object is masked from 'package:stats':

    filter

2.

Using the penguins data, perform a 1-way ANOVA involving the effect of a categorical variable (x) on a numerical variable (y). Group and filter the data (remove NA, for example), calculate means and error, then make a graph. Pair that graph with an ANOVA test. Use the graph + statistical test to assess the null hypothesis of ANOVA.

penguin <- penguins %>%
  drop_na()
# ANOVA test
anov_p <- aov(flipper_length_mm ~ species, data = penguin)
summary(anov_p)
             Df Sum Sq Mean Sq F value Pr(>F)    
species       2  50526   25263   567.4 <2e-16 ***
Residuals   330  14693      45                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Graph data
penguins_fl <- penguin %>%
  group_by(species) %>%
  drop_na() %>%
  summarize(mean = mean(flipper_length_mm), sd = sd(flipper_length_mm), n = n(), se = sd/sqrt(n))
ggplot(data = penguins_fl, aes(x = species, y = mean, color = species)) + 
  geom_point() +
  geom_errorbar(data = penguins_fl, aes(x = species, ymin = mean - se, ymax = mean + se)) +
  theme_bw()

\(H_0\): There is no difference in mean flipper length between penguin species.

\(H_A\): At least one species has a different mean flipper length from the other species.

Interpretation: The p-value from the ANOVA test (< 2e-16) is less than 0.05, and the group means in the graph are well separated relative to their standard errors. Therefore, we reject the null hypothesis: at least one species has a significantly different mean flipper length from the others.
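To make the link between the ANOVA table and the conclusion explicit, the F value can be rebuilt from the sums of squares and degrees of freedom. This is only a sketch: it refits the same model as above, and `tab`, `ms`, `f_by_hand`, and `p_by_hand` are illustrative names that are not part of the original lab.

```r
library(palmerpenguins)
library(tidyr)

penguin <- drop_na(penguins)
anov_p <- aov(flipper_length_mm ~ species, data = penguin)

# Pull the ANOVA table out of the summary as a data frame
tab <- summary(anov_p)[[1]]

# Mean squares are sums of squares divided by their degrees of freedom
ms <- tab[["Sum Sq"]] / tab[["Df"]]

# F is the between-group mean square over the residual mean square
f_by_hand <- ms[1] / ms[2]

# Upper-tail probability of that F value under the null hypothesis
p_by_hand <- pf(f_by_hand, tab[["Df"]][1], tab[["Df"]][2], lower.tail = FALSE)

f_by_hand
p_by_hand
```

A large F value means the variation between species means is large compared with the variation within species, which is what the graph shows visually.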

3.

Test your assumptions (individually; do not use check_model) and interpret your assumption checks.

# Outliers
ggplot(data = penguin, aes(x = species, y = flipper_length_mm)) + 
  geom_boxplot() + 
  theme_bw()

# Normality
penguin %>%
  group_by(species) %>%
  shapiro_test(flipper_length_mm)
# A tibble: 3 × 4
  species   variable          statistic       p
  <fct>     <chr>                 <dbl>   <dbl>
1 Adelie    flipper_length_mm     0.993 0.743  
2 Chinstrap flipper_length_mm     0.989 0.811  
3 Gentoo    flipper_length_mm     0.961 0.00176
# Homoscedasticity
leveneTest(flipper_length_mm ~ species, data = penguin)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value Pr(>F)
group   2  0.4428 0.6426
      330               

Independence: We do not know enough about the experimental design to evaluate the independence of the observations in this data set. However, since this is a lab we will continue with the test anyway.

Outliers: The boxplot shows two outliers in the Adelie species. I am keeping these outliers in the data set, as they most likely represent real penguins with unusually small or large flippers.
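The outliers read off the boxplot can also be listed as rows. The sketch below assumes `identify_outliers()` from rstatix (already loaded above), which flags values beyond 1.5 times the IQR from the quartiles, so it should largely agree with the points drawn by the boxplot. `outliers_fl` is an illustrative name.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)
library(rstatix)

penguin <- drop_na(penguins)

# Flag outlying flipper lengths within each species
outliers_fl <- penguin %>%
  group_by(species) %>%
  identify_outliers(flipper_length_mm)

# is.extreme marks points beyond 3 times the IQR
outliers_fl
```

Seeing the actual rows makes it easier to judge whether a flagged value is a plausible measurement or a likely data-entry error.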

Normality: The Shapiro-Wilk test does not reject normality for Adelie (p = 0.743) or Chinstrap (p = 0.811), but it does for Gentoo (p = 0.00176), so we cannot assume normality of flipper length for all of the species. However, the sample sizes are likely large enough that the ANOVA is robust to this violation.
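Two quick checks can support the large-sample argument: the group sizes themselves, and the distribution of the model residuals (which is what the ANOVA normality assumption refers to). This is a sketch using base R only; `res` is an illustrative name.

```r
library(palmerpenguins)
library(tidyr)

penguin <- drop_na(penguins)
anov_p <- aov(flipper_length_mm ~ species, data = penguin)

# Group sizes: the large-sample argument depends on these
table(penguin$species)

# Test the model residuals directly
res <- residuals(anov_p)
shapiro.test(res)

# Visual check: points should track the reference line
qqnorm(res)
qqline(res)
```

With large samples the Shapiro-Wilk test can flag small departures from normality, so the Q-Q plot is often the more useful guide to whether the departure matters.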

Homoscedasticity: Levene's test has a p-value greater than 0.05 (p = 0.6426). There is no significant difference in variances between the groups, so we can continue with the ANOVA without violating this assumption.

4.

Run a TukeyHSD test on your ANOVA and interpret the results. Make sure your comparisons are easily visible and comparable in your graph (above).

TukeyHSD(anov_p)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = flipper_length_mm ~ species, data = penguin)

$species
                     diff       lwr       upr p adj
Chinstrap-Adelie  5.72079  3.414364  8.027215     0
Gentoo-Adelie    27.13255 25.192399 29.072709     0
Gentoo-Chinstrap 21.41176 19.023644 23.799885     0

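The adjusted p-values for all three pairwise comparisons are less than 0.05, and none of the confidence intervals include zero. Therefore, all of the species are significantly different from each other in mean flipper length. Gentoo flippers are about 27 mm longer than Adelie and about 21 mm longer than Chinstrap, and Chinstrap flippers are about 6 mm longer than Adelie.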
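The Tukey output can be turned into a table and a plot so the comparisons are easier to read. The sketch below uses base R only; `tukey_p`, `tukey_tab`, and `p_adj_fmt` are illustrative names. `format.pval()` reports very small values as being below a threshold instead of printing them as 0.

```r
library(palmerpenguins)
library(tidyr)

penguin <- drop_na(penguins)
anov_p <- aov(flipper_length_mm ~ species, data = penguin)
tukey_p <- TukeyHSD(anov_p)

# Convert the comparison matrix to a data frame
tukey_tab <- as.data.frame(tukey_p$species)

# Show tiny adjusted p-values as "<threshold" rather than 0
tukey_tab$p_adj_fmt <- format.pval(tukey_tab[["p adj"]], digits = 3)
tukey_tab

# Intervals that do not cross zero correspond to significant differences
plot(tukey_p)
```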

Depth

Repeat 1-4 above using 2 or more explanatory variables from Palmer Penguins. Assess the effects of multiple variables on a single numerical variable in the data frame. Make a graph or graphs (if needed). You will need to group, filter, and summarize your data to do this. Perform the necessary ANOVA and TukeyHSD test. Interpret your results (using the graph(s) and stats outputs). Check your assumptions and interpret your assumption check (you can do this individually or with check_model() if you can get the latter to work with your ANOVA; it is not optimized for ANOVA and often does not work).

2

Perform ANOVA and make a graph

# ANOVA
anov_mp <- aov(body_mass_g ~ species * sex, data = penguin)
summary(anov_mp)
             Df    Sum Sq  Mean Sq F value   Pr(>F)    
species       2 145190219 72595110 758.358  < 2e-16 ***
sex           1  37090262 37090262 387.460  < 2e-16 ***
species:sex   2   1676557   838278   8.757 0.000197 ***
Residuals   327  31302628    95727                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Graph
penguins_bm <- penguin %>%
  group_by(species, sex) %>%
  drop_na() %>%
  summarize(mean = mean(body_mass_g), sd = sd(body_mass_g), n = n(), se = sd/sqrt(n))
`summarise()` has grouped output by 'species'. You can override using the
`.groups` argument.
ggplot(data = penguins_bm, aes(x = species, y = mean, color = sex)) +
  geom_point() +
  geom_errorbar(data = penguins_bm, aes(x = species, ymin = mean - se, ymax = mean + se)) +
  theme_bw()

\(H_0\): Species has no effect on mean body mass, sex has no effect on mean body mass, and there is no interactive effect between species and sex on mean body mass.

\(H_A\): Species, sex, or the interaction between species and sex has an effect on mean body mass.

Interpretation: The p-values for the main effects of species and sex and for the species-by-sex interaction are all less than 0.05. Therefore, we reject the null hypotheses: species and sex each affect mean body mass, and the significant interaction means that the size of the difference between males and females depends on the species.
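An interaction is easier to see when the group means are connected by lines and the two sexes are offset so their error bars do not overlap. The following is a sketch of one way to draw that, not the graph used above; `dodge` and `p_int` are illustrative names, and the dodge width is an arbitrary choice.

```r
library(palmerpenguins)
library(dplyr)
library(tidyr)
library(ggplot2)

penguin <- drop_na(penguins)

# Mean and standard error of body mass for each species-by-sex cell
penguins_bm <- penguin %>%
  group_by(species, sex) %>%
  summarize(mean = mean(body_mass_g),
            se = sd(body_mass_g) / sqrt(n()),
            .groups = "drop")

# Offset the sexes horizontally so points and bars do not overlap
dodge <- position_dodge(width = 0.3)

p_int <- ggplot(penguins_bm,
                aes(x = species, y = mean, color = sex, group = sex)) +
  geom_line(position = dodge) +
  geom_point(position = dodge) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se),
                width = 0.2, position = dodge) +
  labs(y = "Mean body mass (g)") +
  theme_bw()

p_int
```

If the lines for males and females are not parallel, the effect of sex differs between species, which is what a significant interaction term indicates.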

3. Test your assumptions and interpret your assumption checks.

check_model(anov_mp)
Variable `Component` is not in your data frame :/

# Outliers
ggplot(data = penguin, aes(x = species, y = body_mass_g, color = sex)) +
  geom_boxplot() +
  theme_bw()

Independence: We do not know enough about the experimental design to evaluate the independence of the observations in this data set. However, since this is a lab we will continue with the test anyway.

Outliers: The boxplot shows two outliers in the Chinstrap species. I am keeping these outliers in the data set as they most likely represent abnormally large and small penguins.

Normality: The normality of residuals plot in the check_model output shows that we can assume normality for the residuals of the model, as the dots fall along the line.

Homoscedasticity: The homogeneity of variance plot in the check_model output shows a relatively flat and horizontal reference line. This suggests that the variances do not differ substantially between the groups, so we can continue with the ANOVA without violating this assumption.
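Because check_model() is not always reliable with ANOVA objects, the same two assumptions can be checked individually. The sketch below uses base R diagnostic plots plus `leveneTest()` from car (already loaded above); `lev_mp` is an illustrative name.

```r
library(palmerpenguins)
library(tidyr)
library(car)

penguin <- drop_na(penguins)
anov_mp <- aov(body_mass_g ~ species * sex, data = penguin)

# Residuals vs fitted: look for a roughly constant vertical spread
plot(anov_mp, which = 1)

# Normal Q-Q plot of the residuals
plot(anov_mp, which = 2)

# Levene's test across the six species-by-sex cells
lev_mp <- leveneTest(body_mass_g ~ species * sex, data = penguin)
lev_mp
```

A Levene's p-value above 0.05 would agree with the visual reading of the homogeneity plot, while a value below 0.05 would be a reason to look more closely at the spread within each cell.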

4.

Run a TukeyHSD test on your ANOVA and interpret the results. Make sure your comparisons are easily visible and comparable in your graph (above).

TukeyHSD(anov_mp)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = body_mass_g ~ species * sex, data = penguin)

$species
                       diff       lwr       upr     p adj
Chinstrap-Adelie   26.92385  -80.0258  133.8735 0.8241288
Gentoo-Adelie    1386.27259 1296.3070 1476.2382 0.0000000
Gentoo-Chinstrap 1359.34874 1248.6108 1470.0866 0.0000000

$sex
                diff      lwr      upr p adj
male-female 667.4577 600.7462 734.1692     0

$`species:sex`
                                     diff       lwr       upr     p adj
Chinstrap:female-Adelie:female   158.3703  -25.7874  342.5279 0.1376213
Gentoo:female-Adelie:female     1310.9058 1154.8934 1466.9181 0.0000000
Adelie:male-Adelie:female        674.6575  527.8486  821.4664 0.0000000
Chinstrap:male-Adelie:female     570.1350  385.9773  754.2926 0.0000000
Gentoo:male-Adelie:female       2116.0004 1962.1408 2269.8601 0.0000000
Gentoo:female-Chinstrap:female  1152.5355  960.9603 1344.1107 0.0000000
Adelie:male-Chinstrap:female     516.2873  332.1296  700.4449 0.0000000
Chinstrap:male-Chinstrap:female  411.7647  196.6479  626.8815 0.0000012
Gentoo:male-Chinstrap:female    1957.6302 1767.8040 2147.4564 0.0000000
Adelie:male-Gentoo:female       -636.2482 -792.2606 -480.2359 0.0000000
Chinstrap:male-Gentoo:female    -740.7708 -932.3460 -549.1956 0.0000000
Gentoo:male-Gentoo:female        805.0947  642.4300  967.7594 0.0000000
Chinstrap:male-Adelie:male      -104.5226 -288.6802   79.6351 0.5812048
Gentoo:male-Adelie:male         1441.3429 1287.4832 1595.2026 0.0000000
Gentoo:male-Chinstrap:male      1545.8655 1356.0392 1735.6917 0.0000000

Interpretation:

  • For species, there is a significant difference in mean body mass between Gentoo and Adelie, as well as between Gentoo and Chinstrap, as the adjusted p-values are less than 0.05. However, there is not a significant difference in mean body mass between Chinstrap and Adelie, as the p-value is greater than 0.05 (p = 0.82).

  • For sex, there is a significant difference in mean body mass between males and females, as the p-value is less than 0.05. Males are about 667 g heavier on average.

  • For the species-by-sex groups, every pairwise comparison is significant (p < 0.05) except two: Chinstrap females vs. Adelie females (p = 0.14) and Chinstrap males vs. Adelie males (p = 0.58).

  • This makes sense, as Chinstrap and Adelie penguins are very similar in size: the two species do not differ when species alone is considered, and they do not differ within either sex. It also makes sense that males and females differ significantly within both Chinstrap and Adelie, since I would expect the sexes to differ in size even when the species are very similar.
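With 15 pairwise comparisons in the interaction table, it can help to filter for the ones that are not significant instead of scanning the printout. This is a sketch in base R; `tukey_cells` and `not_sig` are illustrative names.

```r
library(palmerpenguins)
library(tidyr)

penguin <- drop_na(penguins)
anov_mp <- aov(body_mass_g ~ species * sex, data = penguin)

# Restrict Tukey to the pairwise comparisons for the interaction term
tukey_int <- TukeyHSD(anov_mp, which = "species:sex")
tukey_cells <- as.data.frame(tukey_int$`species:sex`)

# Keep only the pairs whose adjusted p-value is above 0.05
not_sig <- tukey_cells[tukey_cells[["p adj"]] > 0.05, ]
not_sig
```

The rows that remain are the comparisons where the confidence interval for the difference includes zero.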