MSBA320_Project7R_Kanupriya.utf8

ONE-WAY ANOVA

oneway=read.csv("c:\\temp\\oneway.csv",header=T)
names(oneway)

## [1] "Growth"      "Photoperiod"

calculating the mean growth at each photoperiod, using tapply

tapply(oneway$Growth, oneway$Photoperiod, mean)

##       Long      Short  Very.long Very.short 
##   2.500000   2.166667   3.166667   1.666667

Means appear to be increasing as photo period increases. Rearranging the factors in Photoperiod to convert it into an ordered ist in R:

oneway$Photoperiod=ordered(oneway$Photoperiod,levels=c("Very.short","Short","Long","Very.long"))
tapply(oneway$Growth, oneway$Photoperiod, mean)

## Very.short      Short       Long  Very.long 
##   1.666667   2.166667   2.500000   3.166667

To conduct 1-way ANOVA, we use aov() with arguments in the form (response.variable~explanatory.variable). Here, explanatory variable should be factor. For our example, checking if Photoperiod is a “factor” type in R:

is.factor(oneway$Photoperiod)

## [1] TRUE

It is! Conducting 1-way ANOVA:

one.way=aov(oneway$Growth~oneway$Photoperiod)
summary(one.way)

##                    Df Sum Sq Mean Sq F value Pr(>F)
## oneway$Photoperiod  3   7.12   2.375   1.462  0.255
## Residuals          20  32.50   1.625

plot(one.way)

Since the p-value is >0.05, we can say that the difference in means is not significant.

TWO-WAY ANOVA

twoway=read.table("c:\\temp\\twoway.txt",header=T)
names(twoway)

## [1] "Growth"      "Photoperiod" "Genotype"

Comparing mean growth rate of the 6 genotypes:

tapply(twoway$Growth, twoway$Genotype, mean)

##    A    B    C    D    E    F 
## 3.00 4.50 1.50 1.50 2.00 1.75

Conducting two-way ANOVA:

two.way=aov(twoway$Growth~twoway$Genotype+twoway$Photoperiod)
summary(two.way)

##                    Df Sum Sq Mean Sq F value   Pr(>F)    
## twoway$Genotype     5 27.875   5.575  18.081 7.09e-06 ***
## twoway$Photoperiod  3  7.125   2.375   7.703   0.0024 ** 
## Residuals          15  4.625   0.308                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(two.way)

## hat values (leverages) are all = 0.375
##  and there are no factor predictors; no plot no. 5

The p-values for both Genotype and Photoperiod are <-.05, showing that there is a significant difference in their means.

TWO-WAY FACTORIAL ANOVA

factorial=read.table("c:\\temp\\factorial.txt",header=T)
factorial

##    growth diet  coat
## 1     6.6    A light
## 2     7.2    A light
## 3     6.9    B light
## 4     8.3    B light
## 5     7.9    C light
## 6     9.2    C light
## 7     8.3    A  dark
## 8     8.7    A  dark
## 9     8.1    B  dark
## 10    8.5    B  dark
## 11    9.1    C  dark
## 12    9.0    C  dark

model=aov(factorial$growth~factorial$diet*factorial$coat)
summary(model)

##                               Df Sum Sq Mean Sq F value Pr(>F)  
## factorial$diet                 2 2.6600  1.3300   3.677 0.0907 .
## factorial$coat                 1 2.6133  2.6133   7.226 0.0361 *
## factorial$diet:factorial$coat  2 0.6867  0.3433   0.949 0.4383  
## Residuals                      6 2.1700  0.3617                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interaction sum of squares is 0.68667. Next, we look at how it was obtained. Obtaining sums of the 2 replicates in each of the 6 combinations of factor levels:

tapply(factorial$growth,list(factorial$coat,factorial$diet),sum)

##          A    B    C
## dark  17.0 16.6 18.1
## light 13.8 15.2 17.1

SSAB=sum(as.vector(tapply(factorial$growth,list(factorial$coat,factorial$diet),sum))^2)/2

Computing the correlation factor:

CF=sum(factorial$growth)^2/length(factorial$growth)

Calculating the interaction sum of squares(SSAB–SSA-SSB-CF):

SSAB-CF-2.66-2.61333

## [1] 0.68667

Above value is the same as calculated by the R aov(). Interpreting the results of 2-way factorial anova - Interaction terms have p-value >0.05, therefore interaction between terms is not significant. Out of the main effects, coat has a significant effect but diet does not. Plotting the model to carry out model criticism:

plot(model)

## hat values (leverages) are all = 0.5
##  and there are no factor predictors; no plot no. 5

Variance is constant and errors are normal. Next, we update our model to remove the interaction term, and then compare model with model2:

model2=update(model , ~ . - factorial$diet:factorial$coat)
anova(model, model2)

## Analysis of Variance Table
## 
## Model 1: factorial$growth ~ factorial$diet * factorial$coat
## Model 2: factorial$growth ~ factorial$diet + factorial$coat
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1      6 2.1700                           
## 2      8 2.8567 -2  -0.68667 0.9493 0.4383

This shows that the explanatory power of model 2 is not significantly different from that of model. Therefore, model simplification turned out to be a good thing here!

summary(model2)

##                Df Sum Sq Mean Sq F value Pr(>F)  
## factorial$diet  2  2.660  1.3300   3.725 0.0719 .
## factorial$coat  1  2.613  2.6133   7.319 0.0269 *
## Residuals       8  2.857  0.3571                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We notice that diet is possibly not significant, so we remove it to create model3:

model3=update(model2, ~. -factorial$diet)
anova(model2,model3)

## Analysis of Variance Table
## 
## Model 1: factorial$growth ~ factorial$diet + factorial$coat
## Model 2: factorial$growth ~ factorial$coat
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)  
## 1      8 2.8567                             
## 2     10 5.5167 -2     -2.66 3.7246 0.0719 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Again, p-value is >0.05, showing that the difference in explanatory powers of models 2 and 3 is insignificant. Summarizing model3, the minimum adequate model:

summary(model3)

##                Df Sum Sq Mean Sq F value Pr(>F)  
## factorial$coat  1  2.613  2.6133   4.737 0.0546 .
## Residuals      10  5.517  0.5517                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We notice that after removing “diet”, coat loses significance as an explanatory variable in this model. So either we take both or neither - no one variable is unequivocally important. Exploring further, we look at the means for the 3 diets:

tapply(factorial$growth,factorial$diet,mean)

##    A    B    C 
## 7.70 7.95 8.80

It looks as if diets A and B produce a rather similar response, but perhaps diet C produces faster growth than the other 2 ? We calculate a new factor, diet2, which is 1 for diets A and B and 2 for diet C:

diet2=factor(1+(factorial$diet=="C"))
diet2

##  [1] 1 1 1 1 2 2 1 1 1 1 2 2
## Levels: 1 2

Now we try adding this revised dietary factor to the minimal model and see whther it makes a significant difference using anova:

model4=update(model3, ~. +diet2)
anova(model3,model4)

## Analysis of Variance Table
## 
## Model 1: factorial$growth ~ factorial$coat
## Model 2: factorial$growth ~ factorial$coat + diet2
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     10 5.5167                              
## 2      9 2.9817  1     2.535 7.6518 0.02189 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes, it does. Saving that extra degree of freedom, by reducing the levels of diet from 3 to 2 has turned a non-significant effect into a significant one. Let’s see if there is an interaction with this new, simpler factor:

model5=update(model4, ~. +diet2:factorial$coat)
anova(model4,model5)

## Analysis of Variance Table
## 
## Model 1: factorial$growth ~ factorial$coat + diet2
## Model 2: factorial$growth ~ factorial$coat + diet2 + factorial$coat:diet2
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1      9 2.9817                           
## 2      8 2.7000  1   0.28167 0.8346 0.3877

No, there isn’t. This means that model4 is the minimal adequate model.

summary(model4)

##                Df Sum Sq Mean Sq F value Pr(>F)  
## factorial$coat  1  2.613  2.6133   7.888 0.0204 *
## diet2           1  2.535  2.5350   7.652 0.0219 *
## Residuals       9  2.982  0.3313                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

THREE-WAY FACTORIAL ANOVA

Daphnia.data=read.csv("c:\\temp\\Course Data\\Daphnia.csv",header=T)
names(Daphnia.data)

## [1] "Growth.rate" "Water"       "Detergent"   "Daphnia"

factorial=aov(Daphnia.data$Growth.rate~Daphnia.data$Water*Daphnia.data$Detergent*Daphnia.data$Daphnia)
summary(factorial)

##                                                                Df Sum Sq
## Daphnia.data$Water                                              1   1.99
## Daphnia.data$Detergent                                          3   2.21
## Daphnia.data$Daphnia                                            2  39.18
## Daphnia.data$Water:Daphnia.data$Detergent                       3   0.17
## Daphnia.data$Water:Daphnia.data$Daphnia                         2  13.73
## Daphnia.data$Detergent:Daphnia.data$Daphnia                     6  20.60
## Daphnia.data$Water:Daphnia.data$Detergent:Daphnia.data$Daphnia  6   5.85
## Residuals                                                      48  33.43
##                                                                Mean Sq F value
## Daphnia.data$Water                                               1.985   2.850
## Daphnia.data$Detergent                                           0.737   1.059
## Daphnia.data$Daphnia                                            19.589  28.128
## Daphnia.data$Water:Daphnia.data$Detergent                        0.058   0.084
## Daphnia.data$Water:Daphnia.data$Daphnia                          6.866   9.859
## Daphnia.data$Detergent:Daphnia.data$Daphnia                      3.433   4.930
## Daphnia.data$Water:Daphnia.data$Detergent:Daphnia.data$Daphnia   0.975   1.399
## Residuals                                                        0.696        
##                                                                  Pr(>F)    
## Daphnia.data$Water                                             0.097838 .  
## Daphnia.data$Detergent                                         0.375478    
## Daphnia.data$Daphnia                                           8.23e-09 ***
## Daphnia.data$Water:Daphnia.data$Detergent                      0.968608    
## Daphnia.data$Water:Daphnia.data$Daphnia                        0.000259 ***
## Daphnia.data$Detergent:Daphnia.data$Daphnia                    0.000532 ***
## Daphnia.data$Water:Daphnia.data$Detergent:Daphnia.data$Daphnia 0.234324    
## Residuals                                                                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Significant interaction between “water and daphnia” and “daphnia and detergent”.

plot(factorial)

## hat values (leverages) are all = 0.3333333
##  and there are no factor predictors; no plot no. 5

tapply(Daphnia.data$Growth.rate,list(Daphnia.data$Detergent,Daphnia.data$Daphnia),mean)

##          Clone1   Clone2   Clone3
## BrandA 2.732227 3.919002 5.003268
## BrandB 2.929140 4.402931 4.698062
## BrandC 3.071335 4.772805 4.019397
## BrandD 2.626797 5.213745 2.834151

Interaction plot:

interaction.plot(Daphnia.data$Detergent,Daphnia.data$Daphnia,Daphnia.data$Growth.rate)