Reset R’s memory using the code below.
rm(list=ls())
The codes below helps in telling the R where to look for the data that will be used in this activity.
setwd("C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_8 ANOVA")
getwd()
## [1] "C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_8 ANOVA"
Box 8.1 Worked example: diatom communities in metal affected streams. Quinn and Keough: Experimental Design and Data Analysis for Biologists
Step 1
We were instructed to choose the data for processing and create data object medley using the codes below.
sample.data <-read.csv("diatom.csv", header = TRUE, sep= ",")
str(sample.data)
## 'data.frame': 34 obs. of 3 variables:
## $ STREAM : Factor w/ 6 levels "Arkan","Blue",..: 4 4 4 4 2 2 2 2 2 2 ...
## $ ZINC : Factor w/ 4 levels "BACK","HIGH",..: 1 2 2 4 1 2 1 1 2 4 ...
## $ DIVERSITY: num 2.27 1.25 1.15 1.62 1.7 0.63 2.05 1.98 1.04 2.19 ...
head(sample.data)
## STREAM ZINC DIVERSITY
## 1 Eagle BACK 2.27
## 2 Eagle HIGH 1.25
## 3 Eagle HIGH 1.15
## 4 Eagle MED 1.62
## 5 Blue BACK 1.70
## 6 Blue HIGH 0.63
tail(sample.data)
## STREAM ZINC DIVERSITY
## 29 Splat BACK 1.53
## 30 Splat BACK 0.76
## 31 Splat MED 0.80
## 32 Splat LOW 1.66
## 33 Splat MED 0.98
## 34 Splat BACK 1.89
Step 2
We reorganize the levels of the categorical factor into a more logical order using the codes below.
sample.data$ZINC <-factor(sample.data$ZINC, levels=c('HIGH', 'MED', 'LOW', 'BACK'), ordered = F)
sample.data
## STREAM ZINC DIVERSITY
## 1 Eagle BACK 2.27
## 2 Eagle HIGH 1.25
## 3 Eagle HIGH 1.15
## 4 Eagle MED 1.62
## 5 Blue BACK 1.70
## 6 Blue HIGH 0.63
## 7 Blue BACK 2.05
## 8 Blue BACK 1.98
## 9 Blue HIGH 1.04
## 10 Blue MED 2.19
## 11 Blue MED 2.10
## 12 Snake BACK 2.20
## 13 Snake MED 2.06
## 14 Snake HIGH 1.90
## 15 Snake HIGH 1.88
## 16 Snake HIGH 0.85
## 17 Arkan LOW 1.40
## 18 Arkan LOW 2.18
## 19 Arkan LOW 1.83
## 20 Arkan LOW 1.88
## 21 Arkan MED 2.02
## 22 Arkan MED 1.94
## 23 Arkan LOW 2.10
## 24 Chalk LOW 2.38
## 25 Chalk HIGH 1.43
## 26 Chalk HIGH 1.37
## 27 Chalk MED 1.75
## 28 Chalk LOW 2.83
## 29 Splat BACK 1.53
## 30 Splat BACK 0.76
## 31 Splat MED 0.80
## 32 Splat LOW 1.66
## 33 Splat MED 0.98
## 34 Splat BACK 1.89
Step 3
We assessed the normality/homogeneity of variance using boxplot of species diversity against zinc group using the code below.
boxplot(DIVERSITY~ZINC, data = sample.data)
ggplot
library(ggplot2)
zinc.bp <-ggplot(sample.data,
aes(y=DIVERSITY, x=ZINC, fill=ZINC))+
geom_boxplot()
zinc.bp
To add title to the zinc boxplot, use the code below.
zinc.bp + ggtitle("Diatom Communities in Metal Affected Streams")
Description
Diversity is considered as one of the factor. Therefore, it is concluded that there was not obvious violations of normality or homogeneity of variance since boxplots were not asymmetrical and do not vary greatly in size.
Step 4
We assessed the homogeneity of variance assumption with a table and/or plot of mean vs. variance using the codes below.
plot(tapply(sample.data$DIVERSITY, sample.data$ZINC, mean),
+ tapply(sample.data$DIVERSITY, sample.data$ZINC, var))
Conclusions
There is no obvious relationship between group mean and variance.
Step 5
We were instructed to test the HO that population group means are all equal by performing analysis of variance (fit the linear model) of species diversity versus the zinc-level group and examine the diagnostics (residual plot).
sample.data.aov <-aov(DIVERSITY~ZINC, data=sample.data)
plot(sample.data.aov)
Conclusions
There is no obvious violations of normality or homogeneity of variance since there is no obvious wedge shape in residuals, and normal Q-Q plot approximately linear. Cook’s D value is meaningless in ANOVA.
Step 6
Examining the ANOVA table
anova(sample.data.aov)
## Analysis of Variance Table
##
## Response: DIVERSITY
## Df Sum Sq Mean Sq F value Pr(>F)
## ZINC 3 2.5666 0.85554 3.9387 0.01756 *
## Residuals 30 6.5164 0.21721
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusions
Reject the HO that population group means are equal. It was found out that ZINC have significant impact on the DIVERSITY of diatoms (F3, 30 = 3.939, P= 0.018).
Step 7
We do the post-hoc Tukey’s test to investigate pairwise mean differences between all groups using multcomp package glht-general linear hypotheses test.
a. Tukey’s multiple comparisons test using multcomp package
pacman::p_load(multcomp)
fit1 <-summary(glht(sample.data.aov, linfct = mcp(ZINC="Tukey")))
fit1
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Fit: aov(formula = DIVERSITY ~ ZINC, data = sample.data)
##
## Linear Hypotheses:
## Estimate Std. Error t value Pr(>|t|)
## MED - HIGH == 0 0.44000 0.21970 2.003 0.2094
## LOW - HIGH == 0 0.75472 0.22647 3.333 0.0115 *
## BACK - HIGH == 0 0.51972 0.22647 2.295 0.1217
## LOW - MED == 0 0.31472 0.22647 1.390 0.5152
## BACK - MED == 0 0.07972 0.22647 0.352 0.9847
## BACK - LOW == 0 -0.23500 0.23303 -1.008 0.7457
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Conclusions
Diatom species diversity is significantly higher in low zinc sites than high zinc sites (t15 = 3.333, P = 0.011). No other H0 rejected. The Tukey’s adjusted P-values are based on robust procedures that were not available to Quinn and Keough (2002). The more recen Tukey’s test makes use of randomization procedures and thus, the exact P-values differ from run to run.
b. Another method to do Tukey’s multiple comparisons
Multiple comparisons by TukeyHSD (must be done on a list made by aov()).
fit2 <-aov(lm(DIVERSITY~ZINC, data = sample.data))
tukeyfit3 <-TukeyHSD(fit2)
plot(tukeyfit3, las=1, color="red")
## Warning in plot.window(...): "color" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "color" is not a graphical parameter
## Warning in title(...): "color" is not a graphical parameter
## Warning in axis(1, ...): "color" is not a graphical parameter
## Warning in axis(2, at = nrow(xi):1, labels = dimnames(xi)[[1L]], srt = 0, :
## "color" is not a graphical parameter
## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...):
## "color" is not a graphical parameter
## Warning in segments(xi[, "lwr"], yvals, xi[, "upr"], yvals, ...): "color"
## is not a graphical parameter
## Warning in segments(as.vector(xi), rep.int(yvals - 0.1, 3L),
## as.vector(xi), : "color" is not a graphical parameter
c. Another way to do Tukey’s HSD, using agricolae package
pacman::p_load(agricolae)
fit4 <-aov(DIVERSITY~ZINC, data=sample.data)
HSD.test(fit4, "ZINC")
outfit4 <-HSD.test(fit4, "ZINC", group= TRUE)
outfit4
## $statistics
## MSerror Df Mean CV
## 0.2172137 30 1.694118 27.5106
##
## $parameters
## test name.t ntr StudentizedRange alpha
## Tukey ZINC 4 3.845401 0.05
##
## $means
## DIVERSITY std r Min Max Q25 Q50 Q75
## BACK 1.797500 0.4852613 8 0.76 2.27 1.6575 1.935 2.0875
## HIGH 1.277778 0.4268717 9 0.63 1.90 1.0400 1.250 1.4300
## LOW 2.032500 0.4449960 8 1.40 2.83 1.7875 1.990 2.2300
## MED 1.717778 0.5030104 9 0.80 2.19 1.6200 1.940 2.0600
##
## $comparison
## NULL
##
## $groups
## DIVERSITY groups
## LOW 2.032500 a
## BACK 1.797500 ab
## MED 1.717778 ab
## HIGH 1.277778 b
##
## attr(,"class")
## [1] "group"