Single factor ANOVA with Tukey’s test

Reset R’s memory using the code below.

rm(list=ls())

The codes below helps in telling the R where to look for the data that will be used in this activity.

setwd("C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_8 ANOVA")
getwd()
## [1] "C:/Users/April Mae Tabonda/Documents/MS Marine Science/Biostat/PLP/RMDs/PLP_8 ANOVA"

EXAMPLE: Single factor ANOVA with Tukey’s test

Box 8.1 Worked example: diatom communities in metal affected streams. Quinn and Keough: Experimental Design and Data Analysis for Biologists

Step 1

We were instructed to choose the data for processing and create data object medley using the codes below.

sample.data <-read.csv("diatom.csv", header = TRUE, sep= ",")
str(sample.data)
## 'data.frame':    34 obs. of  3 variables:
##  $ STREAM   : Factor w/ 6 levels "Arkan","Blue",..: 4 4 4 4 2 2 2 2 2 2 ...
##  $ ZINC     : Factor w/ 4 levels "BACK","HIGH",..: 1 2 2 4 1 2 1 1 2 4 ...
##  $ DIVERSITY: num  2.27 1.25 1.15 1.62 1.7 0.63 2.05 1.98 1.04 2.19 ...
head(sample.data)
##   STREAM ZINC DIVERSITY
## 1  Eagle BACK      2.27
## 2  Eagle HIGH      1.25
## 3  Eagle HIGH      1.15
## 4  Eagle  MED      1.62
## 5   Blue BACK      1.70
## 6   Blue HIGH      0.63
tail(sample.data)
##    STREAM ZINC DIVERSITY
## 29  Splat BACK      1.53
## 30  Splat BACK      0.76
## 31  Splat  MED      0.80
## 32  Splat  LOW      1.66
## 33  Splat  MED      0.98
## 34  Splat BACK      1.89

Step 2

We reorganize the levels of the categorical factor into a more logical order using the codes below.

sample.data$ZINC <-factor(sample.data$ZINC, levels=c('HIGH', 'MED', 'LOW', 'BACK'), ordered = F)
sample.data
##    STREAM ZINC DIVERSITY
## 1   Eagle BACK      2.27
## 2   Eagle HIGH      1.25
## 3   Eagle HIGH      1.15
## 4   Eagle  MED      1.62
## 5    Blue BACK      1.70
## 6    Blue HIGH      0.63
## 7    Blue BACK      2.05
## 8    Blue BACK      1.98
## 9    Blue HIGH      1.04
## 10   Blue  MED      2.19
## 11   Blue  MED      2.10
## 12  Snake BACK      2.20
## 13  Snake  MED      2.06
## 14  Snake HIGH      1.90
## 15  Snake HIGH      1.88
## 16  Snake HIGH      0.85
## 17  Arkan  LOW      1.40
## 18  Arkan  LOW      2.18
## 19  Arkan  LOW      1.83
## 20  Arkan  LOW      1.88
## 21  Arkan  MED      2.02
## 22  Arkan  MED      1.94
## 23  Arkan  LOW      2.10
## 24  Chalk  LOW      2.38
## 25  Chalk HIGH      1.43
## 26  Chalk HIGH      1.37
## 27  Chalk  MED      1.75
## 28  Chalk  LOW      2.83
## 29  Splat BACK      1.53
## 30  Splat BACK      0.76
## 31  Splat  MED      0.80
## 32  Splat  LOW      1.66
## 33  Splat  MED      0.98
## 34  Splat BACK      1.89

Step 3

We assessed the normality/homogeneity of variance using boxplot of species diversity against zinc group using the code below.

boxplot(DIVERSITY~ZINC, data = sample.data)

ggplot

library(ggplot2)

zinc.bp <-ggplot(sample.data,
            aes(y=DIVERSITY, x=ZINC, fill=ZINC))+
            geom_boxplot()
zinc.bp

To add title to the zinc boxplot, use the code below.

zinc.bp + ggtitle("Diatom Communities in Metal Affected Streams")

Description

Diversity is considered as one of the factor. Therefore, it is concluded that there was not obvious violations of normality or homogeneity of variance since boxplots were not asymmetrical and do not vary greatly in size.

Step 4

We assessed the homogeneity of variance assumption with a table and/or plot of mean vs. variance using the codes below.

plot(tapply(sample.data$DIVERSITY, sample.data$ZINC, mean),
     + tapply(sample.data$DIVERSITY, sample.data$ZINC, var))

Conclusions

There is no obvious relationship between group mean and variance.

Step 5

We were instructed to test the HO that population group means are all equal by performing analysis of variance (fit the linear model) of species diversity versus the zinc-level group and examine the diagnostics (residual plot).

sample.data.aov <-aov(DIVERSITY~ZINC, data=sample.data)
plot(sample.data.aov)

Conclusions

There is no obvious violations of normality or homogeneity of variance since there is no obvious wedge shape in residuals, and normal Q-Q plot approximately linear. Cook’s D value is meaningless in ANOVA.

Step 6

Examining the ANOVA table

anova(sample.data.aov)
## Analysis of Variance Table
## 
## Response: DIVERSITY
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## ZINC       3 2.5666 0.85554  3.9387 0.01756 *
## Residuals 30 6.5164 0.21721                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusions

Reject the HO that population group means are equal. It was found out that ZINC have significant impact on the DIVERSITY of diatoms (F3, 30 = 3.939, P= 0.018).

Step 7

We do the post-hoc Tukey’s test to investigate pairwise mean differences between all groups using multcomp package glht-general linear hypotheses test.

a. Tukey’s multiple comparisons test using multcomp package

pacman::p_load(multcomp)
fit1 <-summary(glht(sample.data.aov, linfct = mcp(ZINC="Tukey")))
fit1
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: aov(formula = DIVERSITY ~ ZINC, data = sample.data)
## 
## Linear Hypotheses:
##                  Estimate Std. Error t value Pr(>|t|)  
## MED - HIGH == 0   0.44000    0.21970   2.003   0.2094  
## LOW - HIGH == 0   0.75472    0.22647   3.333   0.0115 *
## BACK - HIGH == 0  0.51972    0.22647   2.295   0.1217  
## LOW - MED == 0    0.31472    0.22647   1.390   0.5152  
## BACK - MED == 0   0.07972    0.22647   0.352   0.9847  
## BACK - LOW == 0  -0.23500    0.23303  -1.008   0.7457  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Conclusions

Diatom species diversity is significantly higher in low zinc sites than high zinc sites (t15 = 3.333, P = 0.011). No other H0 rejected. The Tukey’s adjusted P-values are based on robust procedures that were not available to Quinn and Keough (2002). The more recen Tukey’s test makes use of randomization procedures and thus, the exact P-values differ from run to run.

b. Another method to do Tukey’s multiple comparisons

Multiple comparisons by TukeyHSD (must be done on a list made by aov()).

fit2 <-aov(lm(DIVERSITY~ZINC, data = sample.data))
tukeyfit3 <-TukeyHSD(fit2)
plot(tukeyfit3, las=1, color="red")
## Warning in plot.window(...): "color" is not a graphical parameter
## Warning in plot.xy(xy, type, ...): "color" is not a graphical parameter
## Warning in title(...): "color" is not a graphical parameter
## Warning in axis(1, ...): "color" is not a graphical parameter
## Warning in axis(2, at = nrow(xi):1, labels = dimnames(xi)[[1L]], srt = 0, :
## "color" is not a graphical parameter
## Warning in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...):
## "color" is not a graphical parameter
## Warning in segments(xi[, "lwr"], yvals, xi[, "upr"], yvals, ...): "color"
## is not a graphical parameter
## Warning in segments(as.vector(xi), rep.int(yvals - 0.1, 3L),
## as.vector(xi), : "color" is not a graphical parameter

c. Another way to do Tukey’s HSD, using agricolae package

pacman::p_load(agricolae)
fit4 <-aov(DIVERSITY~ZINC, data=sample.data)
HSD.test(fit4, "ZINC")
outfit4 <-HSD.test(fit4, "ZINC", group= TRUE)
outfit4
## $statistics
##     MSerror Df     Mean      CV
##   0.2172137 30 1.694118 27.5106
## 
## $parameters
##    test name.t ntr StudentizedRange alpha
##   Tukey   ZINC   4         3.845401  0.05
## 
## $means
##      DIVERSITY       std r  Min  Max    Q25   Q50    Q75
## BACK  1.797500 0.4852613 8 0.76 2.27 1.6575 1.935 2.0875
## HIGH  1.277778 0.4268717 9 0.63 1.90 1.0400 1.250 1.4300
## LOW   2.032500 0.4449960 8 1.40 2.83 1.7875 1.990 2.2300
## MED   1.717778 0.5030104 9 0.80 2.19 1.6200 1.940 2.0600
## 
## $comparison
## NULL
## 
## $groups
##      DIVERSITY groups
## LOW   2.032500      a
## BACK  1.797500     ab
## MED   1.717778     ab
## HIGH  1.277778      b
## 
## attr(,"class")
## [1] "group"