ONE-WAY ANOVA
oneway=read.csv("c:\\temp\\oneway.csv",header=T)
names(oneway)
## [1] "Growth" "Photoperiod"
calculating the mean growth at each photoperiod, using tapply
tapply(oneway$Growth, oneway$Photoperiod, mean)
## Long Short Very.long Very.short
## 2.500000 2.166667 3.166667 1.666667
Means appear to be increasing as photo period increases. Rearranging the factors in Photoperiod to convert it into an ordered ist in R:
oneway$Photoperiod=ordered(oneway$Photoperiod,levels=c("Very.short","Short","Long","Very.long"))
tapply(oneway$Growth, oneway$Photoperiod, mean)
## Very.short Short Long Very.long
## 1.666667 2.166667 2.500000 3.166667
To conduct 1-way ANOVA, we use aov() with arguments in the form (response.variable~explanatory.variable). Here, explanatory variable should be factor. For our example, checking if Photoperiod is a “factor” type in R:
is.factor(oneway$Photoperiod)
## [1] TRUE
It is! Conducting 1-way ANOVA:
one.way=aov(oneway$Growth~oneway$Photoperiod)
summary(one.way)
## Df Sum Sq Mean Sq F value Pr(>F)
## oneway$Photoperiod 3 7.12 2.375 1.462 0.255
## Residuals 20 32.50 1.625
plot(one.way)
Since the p-value is >0.05, we can say that the difference in means is not significant.
TWO-WAY ANOVA
twoway=read.table("c:\\temp\\twoway.txt",header=T)
names(twoway)
## [1] "Growth" "Photoperiod" "Genotype"
Comparing mean growth rate of the 6 genotypes:
tapply(twoway$Growth, twoway$Genotype, mean)
## A B C D E F
## 3.00 4.50 1.50 1.50 2.00 1.75
Conducting two-way ANOVA:
two.way=aov(twoway$Growth~twoway$Genotype+twoway$Photoperiod)
summary(two.way)
## Df Sum Sq Mean Sq F value Pr(>F)
## twoway$Genotype 5 27.875 5.575 18.081 7.09e-06 ***
## twoway$Photoperiod 3 7.125 2.375 7.703 0.0024 **
## Residuals 15 4.625 0.308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(two.way)
## hat values (leverages) are all = 0.375
## and there are no factor predictors; no plot no. 5
The p-values for both Genotype and Photoperiod are <-.05, showing that there is a significant difference in their means.
TWO-WAY FACTORIAL ANOVA
factorial=read.table("c:\\temp\\factorial.txt",header=T)
factorial
## growth diet coat
## 1 6.6 A light
## 2 7.2 A light
## 3 6.9 B light
## 4 8.3 B light
## 5 7.9 C light
## 6 9.2 C light
## 7 8.3 A dark
## 8 8.7 A dark
## 9 8.1 B dark
## 10 8.5 B dark
## 11 9.1 C dark
## 12 9.0 C dark
model=aov(factorial$growth~factorial$diet*factorial$coat)
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## factorial$diet 2 2.6600 1.3300 3.677 0.0907 .
## factorial$coat 1 2.6133 2.6133 7.226 0.0361 *
## factorial$diet:factorial$coat 2 0.6867 0.3433 0.949 0.4383
## Residuals 6 2.1700 0.3617
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Interaction sum of squares is 0.68667. Next, we look at how it was obtained. Obtaining sums of the 2 replicates in each of the 6 combinations of factor levels:
tapply(factorial$growth,list(factorial$coat,factorial$diet),sum)
## A B C
## dark 17.0 16.6 18.1
## light 13.8 15.2 17.1
SSAB=sum(as.vector(tapply(factorial$growth,list(factorial$coat,factorial$diet),sum))^2)/2
Computing the correlation factor:
CF=sum(factorial$growth)^2/length(factorial$growth)
Calculating the interaction sum of squares(SSAB–SSA-SSB-CF):
SSAB-CF-2.66-2.61333
## [1] 0.68667
Above value is the same as calculated by the R aov(). Interpreting the results of 2-way factorial anova - Interaction terms have p-value >0.05, therefore interaction between terms is not significant. Out of the main effects, coat has a significant effect but diet does not. Plotting the model to carry out model criticism:
plot(model)
## hat values (leverages) are all = 0.5
## and there are no factor predictors; no plot no. 5
Variance is constant and errors are normal. Next, we update our model to remove the interaction term, and then compare model with model2:
model2=update(model , ~ . - factorial$diet:factorial$coat)
anova(model, model2)
## Analysis of Variance Table
##
## Model 1: factorial$growth ~ factorial$diet * factorial$coat
## Model 2: factorial$growth ~ factorial$diet + factorial$coat
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 6 2.1700
## 2 8 2.8567 -2 -0.68667 0.9493 0.4383
This shows that the explanatory power of model 2 is not significantly different from that of model. Therefore, model simplification turned out to be a good thing here!
summary(model2)
## Df Sum Sq Mean Sq F value Pr(>F)
## factorial$diet 2 2.660 1.3300 3.725 0.0719 .
## factorial$coat 1 2.613 2.6133 7.319 0.0269 *
## Residuals 8 2.857 0.3571
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We notice that diet is possibly not significant, so we remove it to create model3:
model3=update(model2, ~. -factorial$diet)
anova(model2,model3)
## Analysis of Variance Table
##
## Model 1: factorial$growth ~ factorial$diet + factorial$coat
## Model 2: factorial$growth ~ factorial$coat
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 8 2.8567
## 2 10 5.5167 -2 -2.66 3.7246 0.0719 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Again, p-value is >0.05, showing that the difference in explanatory powers of models 2 and 3 is insignificant. Summarizing model3, the minimum adequate model:
summary(model3)
## Df Sum Sq Mean Sq F value Pr(>F)
## factorial$coat 1 2.613 2.6133 4.737 0.0546 .
## Residuals 10 5.517 0.5517
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We notice that after removing “diet”, coat loses significance as an explanatory variable in this model. So either we take both or neither - no one variable is unequivocally important. Exploring further, we look at the means for the 3 diets:
tapply(factorial$growth,factorial$diet,mean)
## A B C
## 7.70 7.95 8.80
It looks as if diets A and B produce a rather similar response, but perhaps diet C produces faster growth than the other 2 ? We calculate a new factor, diet2, which is 1 for diets A and B and 2 for diet C:
diet2=factor(1+(factorial$diet=="C"))
diet2
## [1] 1 1 1 1 2 2 1 1 1 1 2 2
## Levels: 1 2
Now we try adding this revised dietary factor to the minimal model and see whther it makes a significant difference using anova:
model4=update(model3, ~. +diet2)
anova(model3,model4)
## Analysis of Variance Table
##
## Model 1: factorial$growth ~ factorial$coat
## Model 2: factorial$growth ~ factorial$coat + diet2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 10 5.5167
## 2 9 2.9817 1 2.535 7.6518 0.02189 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Yes, it does. Saving that extra degree of freedom, by reducing the levels of diet from 3 to 2 has turned a non-significant effect into a significant one. Let’s see if there is an interaction with this new, simpler factor:
model5=update(model4, ~. +diet2:factorial$coat)
anova(model4,model5)
## Analysis of Variance Table
##
## Model 1: factorial$growth ~ factorial$coat + diet2
## Model 2: factorial$growth ~ factorial$coat + diet2 + factorial$coat:diet2
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 9 2.9817
## 2 8 2.7000 1 0.28167 0.8346 0.3877
No, there isn’t. This means that model4 is the minimal adequate model.
summary(model4)
## Df Sum Sq Mean Sq F value Pr(>F)
## factorial$coat 1 2.613 2.6133 7.888 0.0204 *
## diet2 1 2.535 2.5350 7.652 0.0219 *
## Residuals 9 2.982 0.3313
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
THREE-WAY FACTORIAL ANOVA
Daphnia.data=read.csv("c:\\temp\\Course Data\\Daphnia.csv",header=T)
names(Daphnia.data)
## [1] "Growth.rate" "Water" "Detergent" "Daphnia"
factorial=aov(Daphnia.data$Growth.rate~Daphnia.data$Water*Daphnia.data$Detergent*Daphnia.data$Daphnia)
summary(factorial)
## Df Sum Sq
## Daphnia.data$Water 1 1.99
## Daphnia.data$Detergent 3 2.21
## Daphnia.data$Daphnia 2 39.18
## Daphnia.data$Water:Daphnia.data$Detergent 3 0.17
## Daphnia.data$Water:Daphnia.data$Daphnia 2 13.73
## Daphnia.data$Detergent:Daphnia.data$Daphnia 6 20.60
## Daphnia.data$Water:Daphnia.data$Detergent:Daphnia.data$Daphnia 6 5.85
## Residuals 48 33.43
## Mean Sq F value
## Daphnia.data$Water 1.985 2.850
## Daphnia.data$Detergent 0.737 1.059
## Daphnia.data$Daphnia 19.589 28.128
## Daphnia.data$Water:Daphnia.data$Detergent 0.058 0.084
## Daphnia.data$Water:Daphnia.data$Daphnia 6.866 9.859
## Daphnia.data$Detergent:Daphnia.data$Daphnia 3.433 4.930
## Daphnia.data$Water:Daphnia.data$Detergent:Daphnia.data$Daphnia 0.975 1.399
## Residuals 0.696
## Pr(>F)
## Daphnia.data$Water 0.097838 .
## Daphnia.data$Detergent 0.375478
## Daphnia.data$Daphnia 8.23e-09 ***
## Daphnia.data$Water:Daphnia.data$Detergent 0.968608
## Daphnia.data$Water:Daphnia.data$Daphnia 0.000259 ***
## Daphnia.data$Detergent:Daphnia.data$Daphnia 0.000532 ***
## Daphnia.data$Water:Daphnia.data$Detergent:Daphnia.data$Daphnia 0.234324
## Residuals
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Significant interaction between “water and daphnia” and “daphnia and detergent”.
plot(factorial)
## hat values (leverages) are all = 0.3333333
## and there are no factor predictors; no plot no. 5
tapply(Daphnia.data$Growth.rate,list(Daphnia.data$Detergent,Daphnia.data$Daphnia),mean)
## Clone1 Clone2 Clone3
## BrandA 2.732227 3.919002 5.003268
## BrandB 2.929140 4.402931 4.698062
## BrandC 3.071335 4.772805 4.019397
## BrandD 2.626797 5.213745 2.834151
Interaction plot:
interaction.plot(Daphnia.data$Detergent,Daphnia.data$Daphnia,Daphnia.data$Growth.rate)