Attaching package: 'see'
The following objects are masked from 'package:ggsci':
scale_color_material, scale_colour_material, scale_fill_material
library(rstatix)
Attaching package: 'rstatix'
The following object is masked from 'package:stats':
filter
library(car)
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
Using the penguins data, perform a 1-way ANOVA involving the effect of a categorical variable (x) on a numerical variable (y). Group and filter the data (remove NA, for example), calculate means and error, then make a graph. Pair that graph with an ANOVA test. Use the graph + statistical test to assess the null hypothesis of ANOVA.
H0: there is no difference in average body mass between the islands
HA: there is some difference in average body mass between the islands
alpha = 0.05
Our H0 is that there is no difference in the average body mass of penguins between islands. The graph suggests that we might be able to reject the null, because the average body mass on Biscoe is visibly higher than on the other two islands. The ANOVA supports this: the computed p-value is less than 0.05, and the F value is 110. This tells us that there is a statistically significant difference in average body mass among the islands, though we can’t say specifically between which ones without a post-hoc test.
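The grouping, summary, graph, and test described above could be sketched roughly as follows. This is a minimal sketch rather than the original code: `penguins_clean`, `island_summary`, and `mass_plot` are hypothetical names, the bar chart with standard-error bars is an assumed plot type, and dropping every row with a missing value is an assumption (it leaves 333 penguins, which matches the sample size in the test outputs later in this report). Only `pengaov` is a name that appears in the report itself.

```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)

# Assumption: drop every row with a missing value
penguins_clean <- penguins[complete.cases(penguins), ]

# Mean body mass and standard error for each island
island_summary <- penguins_clean %>%
  group_by(island) %>%
  summarise(
    n_penguins = n(),
    mean_mass = mean(body_mass_g),
    se_mass = sd(body_mass_g) / sqrt(n())
  )

# Assumed plot type: group means with +/- 1 standard error
mass_plot <- ggplot(island_summary, aes(x = island, y = mean_mass)) +
  geom_col() +
  geom_errorbar(
    aes(ymin = mean_mass - se_mass, ymax = mean_mass + se_mass),
    width = 0.2
  ) +
  labs(x = "Island", y = "Mean body mass (g)")

# One-way ANOVA of body mass by island
pengaov <- aov(body_mass_g ~ island, data = penguins_clean)
summary(pengaov)
```

Printing `mass_plot` draws the graph; `summary(pengaov)` gives the F value and p-value used in the interpretation.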
3.
Test your assumptions (individually; do not use check_model) and interpret your assumption checks
1. Independence
We cannot really tell if there is independence without knowing more about the experimental design for this data set. That said, I will continue anyway as this is a lab assignment. Also I trust palmerpenguins with my life.
The outlier test and the graph indicate that there is an outlier. However, I will keep this in the data because the mass of the penguin is not anything particularly crazy for a penguin. It is around the same as the median on Biscoe island, which tells me it is not an outlandish mass for the penguin. You go chunky penguin you’re my hero.
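One way to flag outliers within each island is the boxplot rule (values more than 1.5 × IQR beyond the quartiles). The sketch below is an assumption about how such a check could be done, not the test actually run for this report; `penguins_clean`, `outliers`, `lower`, and `upper` are hypothetical names.

```r
library(palmerpenguins)
library(dplyr)

# Assumption: drop every row with a missing value
penguins_clean <- penguins[complete.cases(penguins), ]

# Flag body masses outside the 1.5 * IQR fences, computed separately per island
outliers <- penguins_clean %>%
  group_by(island) %>%
  mutate(
    lower = quantile(body_mass_g, 0.25) - 1.5 * IQR(body_mass_g),
    upper = quantile(body_mass_g, 0.75) + 1.5 * IQR(body_mass_g)
  ) %>%
  filter(body_mass_g < lower | body_mass_g > upper) %>%
  ungroup()

outliers
```

Each row of `outliers` is a penguin whose mass falls outside the fences for its own island, so it can be compared against the group medians before deciding whether to keep it.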
# A tibble: 3 × 4
  island    variable    statistic       p
  <fct>     <chr>           <dbl>   <dbl>
1 Biscoe    body_mass_g     0.971 0.00159
2 Dream     body_mass_g     0.988 0.367
3 Torgersen body_mass_g     0.973 0.341
According to the test for normality on the entire data set, we cannot assume normality overall, as p < 0.05. However, the Shapiro-Wilk test by group tells us that we can assume normality for the Torgersen (p = 0.341) and Dream (p = 0.367) data, but not for the Biscoe data (p = 0.00159). Despite this violation, the dataset has a large sample size, and ANOVA is fairly robust to moderate departures from normality when samples are large, so I think that we can proceed regardless.
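A by-group Shapiro-Wilk check like the one interpreted above can be sketched in base R. This is an illustration under assumptions, not the code that produced the table: `penguins_clean` and `shapiro_by_island` are hypothetical names, and rows with missing values are assumed to have been dropped.

```r
library(palmerpenguins)

# Assumption: drop every row with a missing value
penguins_clean <- penguins[complete.cases(penguins), ]

# Shapiro-Wilk p-value for body mass within each island
shapiro_by_island <- sapply(
  split(penguins_clean$body_mass_g, penguins_clean$island),
  function(mass) shapiro.test(mass)$p.value
)

shapiro_by_island
```

A p-value below 0.05 for an island means normality is rejected for that group.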
Homogeneity of Variance
leveneTest(body_mass_g~island, data=penguins)
Levene's Test for Homogeneity of Variance (center = median)
       Df F value    Pr(>F)
group   2  25.969 3.358e-11 ***
      330
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
According to the Levene test, we do not have homogeneity of variance with this data, as indicated by p < 0.05. Therefore, we should run a Welch one-way ANOVA, which does not assume equal variances. See below for this.
# A tibble: 1 × 7
  .y.             n statistic   DFn   DFd        p method
* <chr>       <int>     <dbl> <dbl> <dbl>    <dbl> <chr>
1 body_mass_g   333      102.     2  137. 6.06e-28 Welch ANOVA
According to the Welch test, there is a statistically significant difference in body mass by island, as indicated by p < 0.05.
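For reference, a Welch one-way ANOVA can also be run with base R's `oneway.test()` by setting `var.equal = FALSE`. This is a sketch under assumptions and may not be the call used for the table above; `penguins_clean` and `welch_result` are hypothetical names.

```r
library(palmerpenguins)

# Assumption: drop every row with a missing value
penguins_clean <- penguins[complete.cases(penguins), ]

# Welch one-way ANOVA: does not assume equal variances across islands
welch_result <- oneway.test(
  body_mass_g ~ island,
  data = penguins_clean,
  var.equal = FALSE
)

welch_result
```

The printed result reports the F statistic, the numerator and denominator degrees of freedom, and the p-value.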
4.
Run a TukeyHSD test on your ANOVA and interpret the results. Make sure your comparisons are easily visible and comparable in your graph (above).
TukeyHSD(pengaov)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = body_mass_g ~ island, data = penguins)

$island
                       diff        lwr       upr    p adj
Dream-Biscoe     -1000.2693 -1177.7875 -822.7512 0.000000
Torgersen-Biscoe -1010.6611 -1256.7392 -764.5830 0.000000
Torgersen-Dream    -10.3918  -265.2678  244.4842 0.994933
According to the Tukey post-hoc test, there is a significant difference in average body mass between the penguins on the Dream and Biscoe islands (p < 0.001), as well as between the Torgersen and Biscoe islands (p < 0.001). In both cases, Biscoe penguins are about 1,000 g heavier on average. There is no statistically significant difference between the average body masses for penguins on Torgersen and Dream islands (p = 0.995). This is consistent with the graph that we made earlier!
Depth
Repeat 1-4 above using 2 or more explanatory variables from Palmer Penguins. Assess the effects of multiple variables on a single numerical variable in the data frame. Make a graph or graphs (if needed). You will need to group, filter, and summarize your data to do this. Perform the necessary ANOVA and TukeyHSD test. Interpret your results (using the graph(s) and stats outputs). Check your assumptions and interpret your assumption check (you can do this individually or with check_model() if you can get the latter to work with your ANOVA; it is not optimized for ANOVA and often does not work).
H0: there is no effect of species on the average body mass, no effect of sex on average body mass, and no interactive effect of sex and species on average body mass
HA: there is an effect of species, an effect of sex, or an interactive effect of species and sex on average body mass
For independence, we cannot definitively say that these data are independent without knowing the experimental design of the dataset. We shall persist nonetheless. According to check_model, homogeneity of variance (homoscedasticity) and normality both look ok. For the former, there is a bit of a wave in the line, but it is mostly ok. For normality, the dots fall along the line, indicating that this assumption is met. Finally, for outliers, the boxplot above indicates that there are two of them. However, I feel it is ok to move forward with this dataset because they represent one large and one small penguin that are still overall ok sizes for penguins to be. In other words, they do not appear to be the results of experimental error.
I did this wildly out of order so my interpretation is all going to be here.
According to the ANOVA, there is a significant effect of species and of sex separately on body mass, as well as an interactive effect of species and sex on body mass (all p-values are <0.05). This tells us that there is at least one statistically significant difference in body mass between species. We cannot say specifically WHERE this difference lies until we run a post-hoc test. Additionally, there is a statistically significant difference in the body mass of male and female penguins. Finally, the significant interaction means that the size of the difference between males and females depends on the species.
The Tukey HSD test indicates that body mass differs significantly between Gentoo and Adelie penguins (p < 0.001) and between Gentoo and Chinstrap penguins (p < 0.001). There is no significant difference between Chinstrap and Adelie penguins (p = 0.824). In terms of the interactive effect, every pairwise comparison of species-and-sex groups is significant except for Chinstrap males vs Adelie males and Chinstrap females vs Adelie females. Given that the species difference was not significant for this pair of species, this result makes sense. These differences are visually represented in the graph above! It shows differences between species, as well as within species as delineated by sex. While it is maybe a bit harder to see, it also visualizes the interactive effect.
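The two-way analysis interpreted above could be sketched as follows. This is a minimal sketch under assumptions, not the original code: `penguins_clean`, `two_way_aov`, `anova_table`, and `tukey_two_way` are hypothetical names, and rows with missing values (including missing sex) are assumed to have been dropped.

```r
library(palmerpenguins)

# Assumption: drop every row with a missing value, including missing sex
penguins_clean <- penguins[complete.cases(penguins), ]

# Two-way ANOVA: main effects of species and sex plus their interaction
two_way_aov <- aov(body_mass_g ~ species * sex, data = penguins_clean)
anova_table <- summary(two_way_aov)[[1]]
anova_table

# Tukey HSD for each main effect and for the species-by-sex groups
tukey_two_way <- TukeyHSD(two_way_aov)
tukey_two_way$species          # pairwise species comparisons
tukey_two_way$`species:sex`    # pairwise comparisons of species-and-sex groups
```

The `p adj` column of each Tukey table holds the adjusted p-values quoted in the interpretation. Note that `aov()` uses sequential (Type I) sums of squares, so with unbalanced groups the main-effect rows can change with the order of terms in the formula.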