Attaching package: 'see'
The following objects are masked from 'package:ggsci':
scale_color_material, scale_colour_material, scale_fill_material
library(car)
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
library(rstatix)
Attaching package: 'rstatix'
The following object is masked from 'package:stats':
filter
2.
Using the penguins data, perform a 1-way ANOVA involving the effect of a categorical variable (x) on a numerical variable (y). Group and filter the data (remove NA, for example), calculate means and error, then make a graph. Pair that graph with an ANOVA test. Use the graph + statistical test to assess the null hypothesis of ANOVA.
Df Sum Sq Mean Sq F value Pr(>F)
species 2 50526 25263 567.4 <2e-16 ***
Residuals 330 14693 45
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Graph data
penguins_fl <- penguin %>%
  group_by(species) %>%
  drop_na() %>%
  summarize(mean = mean(flipper_length_mm), sd = sd(flipper_length_mm), n = n(), se = sd / sqrt(n))
ggplot(data = penguins_fl, aes(x = species, y = mean, color = species)) +
  geom_point() +
  geom_errorbar(data = penguins_fl, aes(x = species, ymin = mean - se, ymax = mean + se)) +
  theme_bw()
\(H_0\): There is no difference in mean flipper length between penguin species.
\(H_A\): At least one species has a mean flipper length that differs from the other species.
Interpretation: The p-value from the ANOVA test is less than 0.05 (F = 567.4 on 2 and 330 degrees of freedom, p < 2e-16), indicating that at least one species has a significantly different mean flipper length from the other species. Therefore, we reject our null hypothesis.
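The ANOVA table above is printed without the call that produced it. A minimal sketch of that step follows. It assumes `penguin` is the `penguins` data from the palmerpenguins package with incomplete rows dropped; how `penguin` was actually built is not shown in this report, so treat that line as an assumption. The formula and the object name `anov_p` are taken from the `Fit:` line and the `TukeyHSD(anov_p)` call later in the report.

```r
library(palmerpenguins)  # assumed source of the data
library(tidyr)

# Assumption: `penguin` is the Palmer Penguins data with NA rows removed
penguin <- drop_na(penguins)

# One-way ANOVA: effect of species (categorical) on flipper length (numerical)
anov_p <- aov(flipper_length_mm ~ species, data = penguin)
summary(anov_p)
```

If the assumption about `penguin` holds, the residual degrees of freedom should match the 330 reported in the table.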
3.
Test your assumptions (individually; do not use check_model) and interpret your assumption checks.
# A tibble: 3 × 4
species variable statistic p
<fct> <chr> <dbl> <dbl>
1 Adelie flipper_length_mm 0.993 0.743
2 Chinstrap flipper_length_mm 0.989 0.811
3 Gentoo flipper_length_mm 0.961 0.00176
# Homoscedasticity
leveneTest(flipper_length_mm ~ species, data = penguin)
Levene's Test for Homogeneity of Variance (center = median)
Df F value Pr(>F)
group 2 0.4428 0.6426
330
Independence: We do not know enough about the experimental design to evaluate the independence of the observations in this data set. However, since this is a lab we will continue with the test anyway.
Outliers: There are two outliers in the Adelie species. I am keeping these outliers in the data set as they most likely represent penguins that had abnormally small or large flippers.
Normality: The normality test p-values for Adelie (0.743) and Chinstrap (0.811) are greater than 0.05, but the p-value for Gentoo (0.00176) is less than 0.05, so we cannot assume normality of flipper length for all of the species. However, the sample sizes are likely large enough that the ANOVA is robust to this violation.
Homoscedasticity: Levene's test has a p-value greater than 0.05 (F = 0.4428, p = 0.6426). There is no evidence of a difference in variances between the groups, so we can continue with the ANOVA without violating this assumption.
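The normality table and the outlier statement above appear without their code. The sketch below shows one way such checks could be run with rstatix, which is loaded earlier in this report. It assumes `penguin` is `palmerpenguins::penguins` with NA rows dropped, and it assumes the normality table came from a grouped `shapiro_test()` call and the outliers from the 1.5 * IQR rule; the report does not confirm either.

```r
library(palmerpenguins)  # assumed source of the data
library(dplyr)
library(tidyr)
library(rstatix)

# Assumption: `penguin` is the Palmer Penguins data with NA rows removed
penguin <- drop_na(penguins)

# Normality: Shapiro-Wilk test of flipper length within each species
normality <- penguin %>%
  group_by(species) %>%
  shapiro_test(flipper_length_mm)
normality

# Group sizes, relevant to the large-sample argument made for normality
group_n <- count(penguin, species)
group_n

# Outliers: values beyond 1.5 * IQR from the quartiles, within each species
outliers <- penguin %>%
  group_by(species) %>%
  identify_outliers(flipper_length_mm)
outliers
```

Printing `group_n` next to the normality results makes it easier to judge whether each group is large enough for the robustness argument to apply.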
4.
Run a TukeyHSD test on your ANOVA and interpret the results. Make sure your comparisons are easily visible and comparable in your graph (above).
TukeyHSD(anov_p)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = flipper_length_mm ~ species, data = penguin)
$species
diff lwr upr p adj
Chinstrap-Adelie 5.72079 3.414364 8.027215 0
Gentoo-Adelie 27.13255 25.192399 29.072709 0
Gentoo-Chinstrap 21.41176 19.023644 23.799885 0
The adjusted p-values for all three pairwise comparisons are less than 0.05, and none of the 95% confidence intervals includes zero. Therefore, all of the species are significantly different from each other in mean flipper length.
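Because the adjusted p-values all print as 0, the confidence intervals are the more informative part of the Tukey output. The sketch below pulls them into a data frame, flags the intervals that exclude zero, and draws the base R interval plot. It assumes `penguin` is `palmerpenguins::penguins` with NA rows dropped; the names `tukey_p`, `tukey_df`, and `excludes_zero` are made up for this illustration.

```r
library(palmerpenguins)  # assumed source of the data
library(tidyr)

# Assumption: `penguin` is the Palmer Penguins data with NA rows removed
penguin <- drop_na(penguins)
anov_p <- aov(flipper_length_mm ~ species, data = penguin)

# Tukey comparisons as a data frame, one row per pair of species
tukey_p <- TukeyHSD(anov_p)
tukey_df <- as.data.frame(tukey_p$species)
tukey_df$comparison <- rownames(tukey_df)

# A pair differs at the 5% family-wise level when its interval excludes zero
tukey_df$excludes_zero <- tukey_df$lwr > 0 | tukey_df$upr < 0
tukey_df[, c("comparison", "diff", "lwr", "upr", "excludes_zero")]

# Base R plot of the three intervals
plot(tukey_p)
```

The interval plot is one option for making the comparisons visible alongside the means-and-error-bars graph.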
Depth
Repeat 1-4 above using 2 or more explanatory variables from Palmer Penguins. Assess the effects of multiple variables on a single numerical variable in the data frame. Make a graph or graphs (if needed). You will need to group, filter, and summarize your data to do this. Perform the necessary ANOVA and TukeyHSD test. Interpret your results (using the graph(s) and stats outputs). Check your assumptions and interpret your assumption check (you can do this individually or with check_model() if you can get the latter to work with your ANOVA; it is not optimized for ANOVA and often does not work).
2
Perform ANOVA and make a graph
# ANOVA
anov_mp <- aov(body_mass_g ~ species * sex, data = penguin)
summary(anov_mp)
Df Sum Sq Mean Sq F value Pr(>F)
species 2 145190219 72595110 758.358 < 2e-16 ***
sex 1 37090262 37090262 387.460 < 2e-16 ***
species:sex 2 1676557 838278 8.757 0.000197 ***
Residuals 327 31302628 95727
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Graph
penguins_bm <- penguin %>%
  group_by(species, sex) %>%
  drop_na() %>%
  summarize(mean = mean(body_mass_g), sd = sd(body_mass_g), n = n(), se = sd / sqrt(n))
`summarise()` has grouped output by 'species'. You can override using the
`.groups` argument.
ggplot(data = penguins_bm, aes(x = species, y = mean, color = sex)) +
  geom_point() +
  geom_errorbar(data = penguins_bm, aes(x = species, ymin = mean - se, ymax = mean + se)) +
  theme_bw()
\(H_0\): Species has no effect on mean body mass, sex has no effect on mean body mass, and there is no interactive effect between species and sex on mean body mass.
\(H_A\): There is an interactive effect between species and sex on mean body mass.
Interpretation: The p-values for the effects of species (p < 2e-16) and sex (p < 2e-16) on mean body mass, and for their interaction (p = 0.000197), are all less than 0.05. Therefore, we reject the null hypotheses: there is a significant interactive effect between species and sex on mean body mass.
3. Test your assumptions and interpret your assumption checks.
check_model(anov_mp)
Variable `Component` is not in your data frame :/
# Outliers
ggplot(data = penguin, aes(x = species, y = body_mass_g, color = sex)) +
  geom_boxplot() +
  theme_bw()
Independence: We do not know enough about the experimental design to evaluate the independence of the observations in this data set. However, since this is a lab we will continue with the test anyway.
Outliers: The boxplot shows two outliers in the Chinstrap species. I am keeping these outliers in the data set as they most likely represent abnormally large and small penguins.
Normality: The normality of residuals plot in the check_model output shows that we can assume normality for the data in the model, as the dots fall along the line.
Homoscedasticity: The homogeneity of variance plot in the check_model output shows a relatively flat and horizontal reference line. This suggests that the variances are similar across the groups, so we can continue with the ANOVA without violating this assumption.
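Since the `check_model()` call above printed a message about a missing `Component` variable, the same two assumptions can also be checked individually. The sketch below tests the model residuals for normality and the six species-by-sex cells for equal variances, mirroring the one-way checks. It assumes `penguin` is `palmerpenguins::penguins` with NA rows dropped; the names `resid_normality` and `levene_mp` are made up for this illustration.

```r
library(palmerpenguins)  # assumed source of the data
library(tidyr)
library(car)

# Assumption: `penguin` is the Palmer Penguins data with NA rows removed
penguin <- drop_na(penguins)
anov_mp <- aov(body_mass_g ~ species * sex, data = penguin)

# Normality: Shapiro-Wilk test and Q-Q plot of the model residuals
resid_normality <- shapiro.test(residuals(anov_mp))
resid_normality
qqnorm(residuals(anov_mp))
qqline(residuals(anov_mp))

# Homoscedasticity: Levene's test across the six species-by-sex cells
levene_mp <- leveneTest(body_mass_g ~ species * sex, data = penguin)
levene_mp

# Residuals vs fitted values, a visual check of equal spread
plot(anov_mp, which = 1)
```

Testing the residuals rather than each cell separately keeps the normality check to a single test for the whole model.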
4.
Run a TukeyHSD test on your ANOVA and interpret the results. Make sure your comparisons are easily visible and comparable in your graph (above).
For species, there is a significant difference between the mean body mass of Gentoo and Adelie, as well as Gentoo and Chinstrap as the p-value is less than 0.05. However, there is not a significant difference between the mean body mass of Chinstrap and Adelie as the p-value is greater than 0.05.
For sex, there is a significant difference in mean body mass between male and female as the p-value is less than 0.05. For the species-by-sex comparisons, the differences in mean body mass are significant (p-values less than 0.05) for all pairs of groups except two: Chinstrap vs. Adelie males and Chinstrap vs. Adelie females.
This makes sense as Chinstrap and Adelie penguins are very similar in size, and the comparison between these two species alone is also not significant. It also makes sense that the male/female comparisons between Chinstrap and Adelie are significant, since I would expect males and females to be the most different even if the species are very similar.
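The Tukey call and output for the two-way model are not shown in this section. A minimal sketch of how the comparisons discussed above could be produced follows. It assumes `penguin` is `palmerpenguins::penguins` with NA rows dropped; the names `tukey_mp` and `cells` are made up for this illustration.

```r
library(palmerpenguins)  # assumed source of the data
library(tidyr)

# Assumption: `penguin` is the Palmer Penguins data with NA rows removed
penguin <- drop_na(penguins)
anov_mp <- aov(body_mass_g ~ species * sex, data = penguin)

# Tukey comparisons for each term in the model
tukey_mp <- TukeyHSD(anov_mp)
tukey_mp$species  # three species pairs
tukey_mp$sex      # male vs. female

# Species-by-sex cells: every pair among the six groups
cells <- as.data.frame(tukey_mp$`species:sex`)
cells$comparison <- rownames(cells)

# Sort by adjusted p-value so non-significant pairs appear at the bottom
cells[order(cells$`p adj`), c("comparison", "diff", "p adj")]
```

Sorting the cell comparisons by adjusted p-value puts the pairs that fail to reach significance together at the end of the table.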