Attaching package: 'see'
The following objects are masked from 'package:ggsci':
scale_color_material, scale_colour_material, scale_fill_material
library(rstatix) #test for outliers, welch_anova_test
Attaching package: 'rstatix'
The following object is masked from 'package:stats':
filter
library(car) #levenes test
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
library(palmerpenguins)
2.
Using the penguins data, perform a 1-way ANOVA involving the effect of a categorical variable (x) on a numerical variable (y). Group and filter the data (remove NA, for example), calculate means and error, then make a graph. Pair that graph with an ANOVA test. Use the graph + statistical test to assess the null hypothesis of ANOVA.
#Null hypothesis: There is no difference in flipper length among species#Alternate hypothesis: There are differences in flipper length among specieshead(penguins)
# A tibble: 6 × 8
species island bill_length_mm bill_depth_mm flipper_l…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
4 Adelie Torgersen NA NA NA NA <NA> 2007
5 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
6 Adelie Torgersen 39.3 20.6 190 3650 male 2007
# … with abbreviated variable names ¹flipper_length_mm, ²body_mass_g
#species vs flipper_length_mmpen1<-penguins#DROP NApen2<-pen1%>%drop_na(flipper_length_mm,species)pen2
#Group and filter the datapen_flipp<-pen2%>%group_by(species)%>%drop_na(flipper_length_mm)%>% dplyr::summarize(meanflipp=mean(flipper_length_mm),sd=sd(flipper_length_mm),n=n(),se=sd/sqrt(n))pen_flipp
# A tibble: 3 × 5
species meanflipp sd n se
<fct> <dbl> <dbl> <int> <dbl>
1 Adelie 190. 6.54 151 0.532
2 Chinstrap 196. 7.13 68 0.865
3 Gentoo 217. 6.48 123 0.585
#make a graphggplot(data=pen_flipp,aes(x=species,y=meanflipp,color=species))+geom_point()+geom_errorbar(aes(ymin = meanflipp - se, ymax = meanflipp + se), width =0.2)+theme_bw()+geom_point(data=pen2,aes(x=species,y=flipper_length_mm,group=species, fill=species))+scale_color_npg()+scale_fill_npg()
#Gentoo has the longest flipper, and Chinstrap has the second longest flipper. Adelie has the shortest flipper.#Regular ANOVAanovflipp1<-aov(flipper_length_mm~species,data=pen2)anovflipp1
Call:
aov(formula = flipper_length_mm ~ species, data = pen2)
Terms:
species Residuals
Sum of Squares 52473.28 14953.26
Deg. of Freedom 2 339
Residual standard error: 6.641529
Estimated effects may be unbalanced
summary(anovflipp1) #The p value is less than 0.01, there is a significant difference in flipper_length_mm among species. We reject our null hypothesis that there is no difference in flipper length among species. The box plot also supports this.
Df Sum Sq Mean Sq F value Pr(>F)
species 2 52473 26237 594.8 <2e-16 ***
Residuals 339 14953 44
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.
Test your assumptions (individually– do not use check_model) and interpret your assumption checks
#Check outlierspen_flipp1<-penguins%>%group_by(species)%>%drop_na(flipper_length_mm)%>%identify_outliers(flipper_length_mm)pen_flipp1 ##There are only 2 outliers, (not extreme), so we can still use the data.
# A tibble: 2 × 10
species island bill_l…¹ bill_…² flipp…³ body_…⁴ sex year is.ou…⁵ is.ex…⁶
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int> <lgl> <lgl>
1 Adelie Biscoe 37.9 18.6 172 3150 fema… 2007 TRUE FALSE
2 Adelie Torgersen 44.1 18 210 4000 male 2009 TRUE FALSE
# … with abbreviated variable names ¹bill_length_mm, ²bill_depth_mm,
# ³flipper_length_mm, ⁴body_mass_g, ⁵is.outlier, ⁶is.extreme
Call:
aov(formula = flipper_length_mm ~ species, data = pen2)
Terms:
species Residuals
Sum of Squares 52473.28 14953.26
Deg. of Freedom 2 339
Residual standard error: 6.641529
Estimated effects may be unbalanced
#entire model normalityshapiro_test(residuals(modelflipp)) #p.value greater than 0.05 indicates the nornality in the residuals.
#normality of each grouppen1%>%group_by(species)%>%shapiro_test(flipper_length_mm) #Adelie and Chinstrap are normal (p>0.05) but Gentoo is not normal (p<0.01)
# A tibble: 3 × 4
species variable statistic p
<fct> <chr> <dbl> <dbl>
1 Adelie flipper_length_mm 0.993 0.720
2 Chinstrap flipper_length_mm 0.989 0.811
3 Gentoo flipper_length_mm 0.962 0.00162
#Homogeneity of variance/Homoscedasticitylevene_test(flipper_length_mm~species,data=pen2) #This is not significant (p>0.05) so heterogeneity of variance is fine and the assumption is met. We don't need to do welch test.
# A tibble: 1 × 4
df1 df2 statistic p
<int> <int> <dbl> <dbl>
1 2 339 0.331 0.719
#Visual Check Using check_model()check_model(modelflipp)
4.
Run a TukeyHSD test on your ANOVA and interpret the results. Make sure your comparisons are easily visible and comparable in your graph (above).
aovtest<-aov(flipper_length_mm ~ species, data=pen2)TukeyHSD(aovtest) #p<0.01 for all 3 interactions. There are significant differencce between all 3 species.
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = flipper_length_mm ~ species, data = pen2)
$species
diff lwr upr p adj
Chinstrap-Adelie 5.869887 3.586583 8.153191 0
Gentoo-Adelie 27.233349 25.334376 29.132323 0
Gentoo-Chinstrap 21.363462 19.000841 23.726084 0
Depth
Repeat 1-4 above using 2 or more explanatory variables from Palmer Penguins. Assess the effecs of multiple variable on a single numerical variable in the data frame. Make a graph or graphs (if needed). You will need to group, filter, and summarize your data to do this. Perform the necessary ANOVA and TukeyHSD test. Interpret your results (using the graph(s) and stats outputs). Check your assumptions and interpret your assumption check (you can do this individually or with check_model() if you can get the later to work with your ANOVA– it is not optimized for ANOVA and often does not work)
1.
Load PalmerPenguins and other necessary packages
#insert packages here :)library(tidyverse) #For data manipulation and graphinglibrary(ggsci) #for color scaleslibrary(patchwork) #to panel our graphslibrary(performance) #check stat/model assumptionslibrary(see)library(rstatix) #test for outliers, welch_anova_testlibrary(car) #levenes testlibrary(palmerpenguins)
2.
Using the penguins data, perform a 1-way ANOVA involving the effect of a categorical variable (x) on a numerical variable (y). Group and filter the data (remove NA, for example), calculate means and error, then make a graph. Pair that graph with an ANOVA test. Use the graph + statistical test to assess the null hypothesis of ANOVA.
#Hypotheses#Null Hypothesis: There is no difference in flipper length among species nor sex#Alternate Hypothesis: There is differences in flipper length among species and sex#Group and Filter the datapen_flipp_sex<-pen1%>%group_by(species,sex)%>%drop_na(flipper_length_mm)%>% dplyr::summarize(meanflipp=mean(flipper_length_mm),sd=sd(flipper_length_mm),n=n(),se=sd/sqrt(n))
`summarise()` has grouped output by 'species'. You can override using the
`.groups` argument.
pen_flipp_sex
# A tibble: 8 × 6
# Groups: species [3]
species sex meanflipp sd n se
<fct> <fct> <dbl> <dbl> <int> <dbl>
1 Adelie female 188. 5.60 73 0.655
2 Adelie male 192. 6.60 73 0.772
3 Adelie <NA> 186. 6.11 5 2.73
4 Chinstrap female 192. 5.75 34 0.987
5 Chinstrap male 200. 5.98 34 1.02
6 Gentoo female 213. 3.90 58 0.512
7 Gentoo male 222. 5.67 61 0.726
8 Gentoo <NA> 216. 1.26 4 0.629
# A tibble: 333 × 8
species island bill_length_mm bill_depth_mm flipper_…¹ body_…² sex year
<fct> <fct> <dbl> <dbl> <int> <int> <fct> <int>
1 Adelie Torgersen 39.1 18.7 181 3750 male 2007
2 Adelie Torgersen 39.5 17.4 186 3800 fema… 2007
3 Adelie Torgersen 40.3 18 195 3250 fema… 2007
4 Adelie Torgersen 36.7 19.3 193 3450 fema… 2007
5 Adelie Torgersen 39.3 20.6 190 3650 male 2007
6 Adelie Torgersen 38.9 17.8 181 3625 fema… 2007
7 Adelie Torgersen 39.2 19.6 195 4675 male 2007
8 Adelie Torgersen 41.1 17.6 182 3200 fema… 2007
9 Adelie Torgersen 38.6 21.2 191 3800 male 2007
10 Adelie Torgersen 34.6 21.1 198 4400 male 2007
# … with 323 more rows, and abbreviated variable names ¹flipper_length_mm,
# ²body_mass_g
#Make a graphggplot(data=pen_flipp_sex, aes(x=species, y=meanflipp, color = sex))+geom_point()+geom_errorbar(aes(ymin = meanflipp - se, ymax = meanflipp + se), width =0.2)+geom_point(data = pen3, aes(x=species, y=flipper_length_mm,color=sex), position =position_jitterdodge(dodge.width =0.75), alpha =0.3)
#For all 3 species, male tend to have longer flipper than female. Gentoo has the longest flipper, and Chinstrap has the second longest flipper. Adelie has the shortest flipper.aov_pen2<-aov(flipper_length_mm ~ species*sex, data=pen3)summary(aov_pen2) #The low p-values (<0.01) suggest the sigificant difference in flipper length among species and sex. We can reject our null hypothesis that there is no difference in flipper length among species nor sex. The graph also supports this.
Df Sum Sq Mean Sq F value Pr(>F)
species 2 50526 25263 789.912 < 2e-16 ***
sex 1 3906 3906 122.119 < 2e-16 ***
species:sex 2 329 165 5.144 0.00631 **
Residuals 327 10458 32
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
3.
Test your assumptions (individually– do not use check_model) and interpret your assumption checks
#Visual Check Using check_model()check_model(aov_pen2) #Linearity, Homogeneity, Normality look fine to some extent for ANOVA.
Variable `Component` is not in your data frame :/
#Outlier testpen_flipp2<-pen3%>%group_by(species,sex)%>%drop_na(flipper_length_mm)%>%identify_outliers(flipper_length_mm)pen_flipp2 ##There are 4 outliers, (not extreme), so we can still use the data.
Run a TukeyHSD test on your ANOVA and interpret the results. Make sure your comparisons are easily visible and comparable in your graph (above).
TukeyHSD(aov_pen2) #Adelie:male and Adelie:female, Chinstrap:male-Chinstrap:female and Gentoo:male-Gentoo:female have p<0.01, which suggest that there is a significant difference in lipper length between sex in species. male-female has p=~0, which suggests the significant difference in flipper length between female and male.