setwd("F:/Documents/Work/Courses/Stats_LUND_2012_October/Exercises/Ex_07")

To learn:
. How to do a one-way ANOVA in R
. How to test its assumptions
. How to make planned contrasts
. How to make post-hoc comparisons
. That a one-way ANOVA with two categories is identical to an independent samples t-test

Part A (warming up):
1. Read in the good old file “tits.xls” again. The data are: spe (or SPE), the species of the studied brood (TX for great tit [talgoxe] and BM for blue tit [blåmes]), wei, fledging weight - the average body weight (in grams) of the young when they leave the nest, and egg, clutch size - the number of eggs that gave rise to that brood. We will test (once again) if blue tits are as heavy as great tits, on average. Make a couple of appropriate plots (boxplot? errorbar?) to get started.

tits <- read.csv("tits.csv", header = TRUE, sep = ",")
str(tits)

## 'data.frame':    90 obs. of  3 variables:
##  $ SPE: Factor w/ 2 levels "BM","TX": 2 2 2 2 2 2 1 2 2 2 ...
##  $ wei: num  17.3 17.2 14.5 17.8 18 18.3 9.3 15.1 NA 15.8 ...
##  $ egg: int  10 6 9 4 7 8 10 9 10 10 ...

First a boxplot. These are good for non-parametric tests, but not so suitable for parametric tests where the variance is tested.

boxplot(tits$wei ~ tits$SPE, rm.na = TRUE)

plot of chunk unnamed-chunk-3

Now an error bar plot with 95% CI of mean

source("F:/Documents/Work/Courses/Stats_LUND_2012_October/Exercises/Ex_04/summarySEfunction.R")
bw.se <- summarySE(data = tits, measurevar = "wei", groupvars = "SPE", na.rm = TRUE)

## Loading required package: plyr


# extract the confidence intervals
bw.se$ci

## [1] 0.5151 0.4385


library(ggplot2)
# plot data using ggplot
ggplot(bw.se, aes(x = SPE, y = wei)) + geom_errorbar(aes(ymin = wei - ci, ymax = wei + 
    ci), width = 0.1) + geom_line() + geom_point()

## geom_path: Each group consist of only one observation. Do you need to
## adjust the group aesthetic?

plot of chunk unnamed-chunk-4

For the actual test, use both a t-test (as before) and a one-way ANOVA. The t.test command has the form t.test(y~x). The same is true for a univariate ANOVA, except you use the command aov in place of t.test. Save the output of the ANOVA in a new object (e.g. aov.wei) and then extract the information from it using summary(aov.wei). Compare the significance levels with the t-tests. They should be almost the same (and extremely low). Why?

t-test

t.test(tits$wei ~ tits$SPE, na.rm = TRUE)

## 
##  Welch Two Sample t-test
## 
## data:  tits$wei by tits$SPE 
## t = -19.1, df = 44.22, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0 
## 95 percent confidence interval:
##  -6.907 -5.588 
## sample estimates:
## mean in group BM mean in group TX 
##            11.16            17.41

ANOVA

aov.wei <- aov(tits$wei ~ tits$SPE)
summary(aov.wei)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## tits$SPE     1    505     505     269 <2e-16 ***
## Residuals   62    116       2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 26 observations deleted due to missingness

The significane level (p value) is the same for both tests. It is the same because we are comparing just two groups - the two tit species. Thus the one-way ANOVA is equivelant of running a t-test.

Part B:
1. Use the file “peas.csv”. It contains data on the effects of sugars on length of pea sections (something on a plant). This example is Box 9.4 in Sokal & Rohlf. The variables are the treatments factor, containing five categories, and the length of the segments.

peas <- read.csv("peas.csv", header = TRUE, sep = ";")
str(peas)

## 'data.frame':    50 obs. of  2 variables:
##  $ treat: Factor w/ 5 levels "1g1f","2f","2g",..: 5 5 5 5 5 5 5 5 5 5 ...
##  $ len  : int  75 67 70 75 65 71 67 67 76 68 ...

Make an error-bar graph and a boxplot, to look at your data. Do you remember the difference between the plots? Make treat (Experimental treatment) your x-variable, and len (Length of pea section) the y. Boxplot

boxplot(peas$len ~ peas$treat, rm.na = TRUE)

plot of chunk unnamed-chunk-8

Now an error bar plot with 95% CI of mean

source("F:/Documents/Work/Courses/Stats_LUND_2012_October/Exercises/Ex_04/summarySEfunction.R")
bw.se <- summarySE(data = peas, measurevar = "len", groupvars = "treat", na.rm = TRUE)

# extract the confidence intervals
bw.se$ci

## [1] 1.012 1.340 1.171 1.282 2.850


library(ggplot2)
# plot data using ggplot
ggplot(bw.se, aes(x = treat, y = len)) + geom_errorbar(aes(ymin = len - ci, 
    ymax = len + ci), width = 0.1) + geom_line() + geom_point()

## geom_path: Each group consist of only one observation. Do you need to
## adjust the group aesthetic?

plot of chunk unnamed-chunk-9

The boxplot show the median value (middle), the 50% quantiles (upper and lower) and the maximum and minimum values (whiskers).

Make a one-way analysis of variance. Dissect your ANOVA output - what does it tell you? What null-hypothesis does it test?

aov.peas <- aov(peas$len ~ peas$treat)
summary(aov.peas)

##             Df Sum Sq Mean Sq F value  Pr(>F)    
## peas$treat   4   1077   269.3    49.4 6.7e-16 ***
## Residuals   45    246     5.5                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It tests the null hypothesis that none of the groups differ from the grand mean. The significant results suggests that at least one of the groups differs from the others. It is highly significant, explaining a high proportion of variation (see sum sq).

There's an alternative way to do exactly the same test. Try using the lm command instead of the aov command. How does the output differ? Save the residuals using the command residuals.

lm.peas <- lm(peas$len ~ peas$treat)
summary(lm.peas)

## 
## Call:
## lm(formula = peas$len ~ peas$treat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.100 -1.825 -0.150  0.975  5.900 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    58.000      0.739   78.53  < 2e-16 ***
## peas$treat2f    0.200      1.045    0.19     0.85    
## peas$treat2g    1.300      1.045    1.24     0.22    
## peas$treat2s    6.100      1.045    5.84  5.4e-07 ***
## peas$treatc    12.100      1.045   11.58  4.3e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 2.34 on 45 degrees of freedom
## Multiple R-squared: 0.814,   Adjusted R-squared: 0.798 
## F-statistic: 49.4 on 4 and 45 DF,  p-value: 6.74e-16

# str(lm.peas)
peas$resids <- lm.peas$residuals

The output for lm differs from the aov output. Giving signicant values for each treatment, which is whether an individual treatment is different from the first treamtment.

Does your data set fulfil the assumptions of normality? What should be normal? Test if the residuals you extracted above have a normal distribution. You can also check for normality within each treatment, which may or may not be advisable. What do you think is the case here?

# Make histogram of residuals
hist(peas$resid)

plot of chunk unnamed-chunk-12

This looks pretty normal.

# Test for normality
shapiro.test(peas$resid)

## 
##  Shapiro-Wilk normality test
## 
## data:  peas$resid 
## W = 0.9693, p-value = 0.2164

# Make Q-Q plot
qqnorm(peas$resid)
qqline(peas$resid)

plot of chunk unnamed-chunk-13

the qq plot looks a bit strange, with a deviation from the line at low and high values, suggesting perhaps a quadratic function.
Why is it inappropriate to test the normality of the raw len variable?
Because there are several groups, we wish to know whether each group is normal, not the overall dataset. Testing if the error left over from the model is normal takes this into account (i.e. we have already taken out the group effect).

Investigate if your data set fulfils the assumptions of homogeneity of variances. Use the leveneTest command in the car library. What is the result? Is that very bad? If you look at the boxplot above, which group seems to be responsible for the significance?

library(car)

## Loading required package: MASS

## Loading required package: nnet

leveneTest(lm.peas)

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)   
## group  4    4.58 0.0035 **
##       45                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This suggests that there is not homogeneity of variance, that is, the variance differs across groups. This is most likely largely caused by group 'c', which shows greater variance than the other groups, as visible from the boxplot above.

Test the null-hypothesis that the control group is identical to all other groups taken together. Such hypotheses are tested as contrasts. You can see what the default contrasts are by using the command contrasts(peas$treat). The matrix shows that each treatment is compared with the 1g1f treatment as the default. You can see the results of the default contrasts by using the command summary.lm(aov.peas). To compare all other treatments against the control, you need to specify a new contrast matrix, with columns equal to the number of treatments minus 1. Enter 4 for the treatment that is the reference treatment, and -1 for the other treatments in the first contrast, then all zeros for the other contrasts:

# view default contrasts
contrasts(peas$treat)

##      2f 2g 2s c
## 1g1f  0  0  0 0
## 2f    1  0  0 0
## 2g    0  1  0 0
## 2s    0  0  1 0
## c     0  0  0 1

contrast.1 <- cbind(c(-1, -1, -1, -1, 4), c(0, 0, 0, 0, 0), c(0, 0, 0, 0, 0), 
    c(0, 0, 0, 0, 0))
contrasts(peas$treat) <- contrast.1
summary.lm(aov(peas$len ~ peas$treat))

## 
## Call:
## aov(formula = peas$len ~ peas$treat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##   -5.1   -2.9   -0.9    2.1    7.1 
## 
## Coefficients: (3 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   61.940      0.452  137.01  < 2e-16 ***
## peas$treat1    2.040      0.226    9.02  6.5e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 3.2 on 48 degrees of freedom
## Multiple R-squared: 0.629,   Adjusted R-squared: 0.621 
## F-statistic: 81.5 on 1 and 48 DF,  p-value: 6.52e-12

The contrast tests the groups with positive value against those with a negative, and the sum of the values you enter should (normally, unless you have really good reasons) equal zero. Groups represented by the value 0 are left out of the test. More specifically, the test is whether the function: 4*mean(control) - mean('2g') - mean('2f') - mean('1g1f') - mean('2s') is different from zero. Run the ANOVA with the contrast. Should you accept or reject the null-hypothesis (that the function above is zero)?
We should reject the null hypothesis, for we gain a significant result.

For more details about contrasts, you can find some helpful information here: http://wwwuser.gwdg.de/~cscherb1/content/Statistics%20Course%20files/Working%20with%20orthogonal%20contrasts%20in%20R.pdf.

Test a null-hypothesis that the mixed sugar treatment is identical to the single sugar treatments, using a new contrast.

contrast.2 <- cbind(c(3, -1, -1, -1, 0), c(0, 0, 0, 0, 0), c(0, 0, 0, 0, 0), 
    c(0, 0, 0, 0, 0))
contrasts(peas$treat) <- contrast.2
summary.lm(aov(peas$len ~ peas$treat))

## 
## Call:
## aov(formula = peas$len ~ peas$treat)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6.57  -3.44  -1.57   2.43  14.06 
## 
## Coefficients: (3 not defined because of singularities)
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   61.940      0.729   84.99   <2e-16 ***
## peas$treat1   -0.633      0.470   -1.35     0.18    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Residual standard error: 5.15 on 48 degrees of freedom
## Multiple R-squared: 0.0364,  Adjusted R-squared: 0.0163 
## F-statistic: 1.81 on 1 and 48 DF,  p-value: 0.185

Make a post-hoc test of the difference between all groups. There are many different methods that can be used. Tukey is a good method to use as ones personal standard, but should perhaps not be trusted in this example as variances are not equal between groups. You can also try Bonferroni-corrected t-tests.

TukeyHSD(aov.peas)

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = peas$len ~ peas$treat)
## 
## $`peas$treat`
##         diff    lwr    upr  p adj
## 2f-1g1f  0.2 -2.768  3.168 0.9997
## 2g-1g1f  1.3 -1.668  4.268 0.7256
## 2s-1g1f  6.1  3.132  9.068 0.0000
## c-1g1f  12.1  9.132 15.068 0.0000
## 2g-2f    1.1 -1.868  4.068 0.8291
## 2s-2f    5.9  2.932  8.868 0.0000
## c-2f    11.9  8.932 14.868 0.0000
## 2s-2g    4.8  1.832  7.768 0.0003
## c-2g    10.8  7.832 13.768 0.0000
## c-2s     6.0  3.032  8.968 0.0000

pairwise.t.test(peas$len, peas$treat, p.adj = "bonferroni", paired = F)

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  peas$len and peas$treat 
## 
##    1g1f    2f      2g      2s     
## 2f 1.00000 -       -       -      
## 2g 1.00000 1.00000 -       -      
## 2s 5.4e-06 1.0e-05 0.00035 -      
## c  4.3e-14 7.5e-14 1.8e-12 7.5e-06
## 
## P value adjustment method: bonferroni

Methods to avoid are: Scheffé, S-N-K and Duncan. Look at the differences in P-values produced by the different methods. What does the output tell us? What is the difference in philosophy behind making contrasts on the one hand, and post-hoc tests on the other?
Contrasts should only be used when we have prior expectations of what differences might be. Whereas post-hoc tests can be used where we don't know which groups differ, and have no prior expectation - the post-hoc test helps us determin which groups differ.

Make a nonparametric version of the one-way ANOVA, for example a Kruskal-Wallis test. The Kruskal-Wallis is a generalisation for many groups of the Mann-Whitney U test, just like the one-way ANOVA is a generalisation of the t-test. Use the command kruskal.test.

kruskal.test(peas$len ~ peas$treat)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  peas$len by peas$treat 
## Kruskal-Wallis chi-squared = 38.44, df = 4, p-value = 9.105e-08