Temperature increases due to global change are expected to affect many ecosystem processes. One process hidden from the human eye concerns the carbon stored in soils worldwide. Soils store roughly 1500 billion tons of carbon, which is more than 3 times the carbon in the atmosphere (for comparison, annual CO2 emissions from fossil fuels are roughly 32 billion tons). Most of this carbon (about 60%) is organic in origin. Hence, the ecosystem processes that influence the accumulation of soil carbon, such as the decomposition of organic matter, are important to study and understand: any change in these processes could potentially lead to climate feedbacks that affect global climate change. Will soils keep on storing large amounts of carbon?
How does the decomposition of litter change with increasing temperature? Should we expect shifts? And will soils store more, or less, carbon as global temperatures increase? One way to start answering these questions is to conduct research on litter decomposition along climatic gradients (for instance, temperature gradients along a mountain slope).
A t-test is not appropriate here. Why? Think of the reasons why a t-test should not be used.
# setting the new working directory
setwd("~/OneDrive - Universiteit Leiden/Quantitative Research Skills/R Tutorials Weeks 5-8")
# loading the data (tab delimited text file)
decomp <- read.delim("./OrganicLitter.txt") # alternatively: read.table("./OrganicLitter.txt", header=TRUE)
# checking the 1,2,3 (mean, spread, distribution) to ensure data loaded correctly
summary(decomp) # BUT: location understood as integer by R (not factor - category) -> must be fixed
## species location loss
## Length:60 Min. :1 Min. :12.58
## Class :character 1st Qu.:1 1st Qu.:30.47
## Mode :character Median :2 Median :37.77
## Mean :2 Mean :44.91
## 3rd Qu.:3 3rd Qu.:52.64
## Max. :3 Max. :99.02
decomp$location <- as.factor(decomp$location) # forcing it to be a factor
str(decomp) # checking if it worked
## 'data.frame': 60 obs. of 3 variables:
## $ species : chr "AA" "BN" "SL" "SM" ...
## $ location: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ loss : num 45.3 25.3 28.2 43 32.4 ...
# getting an impression of the overall distribution
hist(decomp$loss, xlab="loss", main="Decomposition dataset")
Answer: We have more than two groups, and that means there are many pairwise tests. If we used t-tests, we would have an increased chance of at least 1 Type I error. We could lower the \(\alpha\) rate, but that would increase the probability of Type II errors (the \(\beta\) rate). Instead, we use a test which keeps the \(\alpha\) and \(\beta\) rates at their standards: the one-way ANOVA.
In other words, when you set \(\alpha\)=0.05, you accept a 5% chance of a false positive (Type I error) per test. But when you run multiple t-tests, these probabilities accumulate. So, instead of a 5% error rate, the chance of incorrectly finding a “significant” difference increases with the number of groups:

| Groups | Pairwise comparisons | P(no false positive) | False positive rate |
|--------|----------------------|----------------------|---------------------|
| 4      | 6                    | (0.95)^6             | 27%                 |
| 5      | 10                   | (0.95)^10            | 40%                 |
| 6      | 15                   | (0.95)^15            | 54%                 |
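These numbers are easy to reproduce in R; a minimal sketch (the object names here are illustrative, not part of the tutorial's script):
# familywise Type I error rate when testing all pairs at alpha=0.05
groups <- c(4,5,6)
comparisons <- choose(groups,2) # 6, 10 and 15 pairwise comparisons
1-0.95^comparisons # roughly 0.27, 0.40 and 0.54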
ANOVA performs one omnibus test that controls the Type I error rate at your chosen \(\alpha\) rate, typically 5%, across all groups simultaneously. It asks: is there any difference among these groups?
Hypothesis: The rate of litter decomposition, as measured by the loss of mass, is significantly altered by the location in which the litter was placed.

- H0: There is no difference between the average rates of litter decomposition across locations. All locations have equal mean decomposition rates.
- HA: There is a difference between the average rates of litter decomposition across locations. At least one location differs in decomposition rate from the others.
Evaluate the second assumption of the ANOVA (homogeneity of variances) for the litter dataset. What do you conclude about the homogeneity of variances, using the information from the formal test and the graphical checks?
require(car)
## Loading required package: car
## Loading required package: carData
# getting an estimate of the distribution of the data for each location
boxplot(loss~location,data=decomp) # decomposition rate as a function of location
# though the boxplot already suggests that variances are unequal across locations, to be sure...
# formal test: Levene's test (F-test only works for 2 groups)
leveneTest(loss~location,data=decomp) # decomposition rate as a function of location
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 8.9127 0.0004286 ***
## 57
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# leveneTest(decompAnov) gives the same result --> it extracts the raw data from the aov object (created below)
Answer: It appears that the assumption of equal variances is violated: we can reject the hypothesis that the variances are equal among groups. Hence, we should distrust the results of our ANOVA test. If variances differ too much, the F-test becomes unreliable. This is why transformations are often the next step. Alternatives include the Kruskal-Wallis test, used when your data are non-normal or ordinal, or when you cannot assume equal variances (see the QRS cheatsheet).
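For reference, the Kruskal-Wallis test is a single call in base R; a minimal sketch (this alternative route is not pursued further in this tutorial):
# rank-based alternative to the one-way ANOVA: no normality assumption needed
kruskal.test(loss~location,data=decomp)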
Reproduce the histogram and qqplot. To do so you need to extract the residuals from the ANOVA test (all functions are given in the cheatsheet). Evaluate the third assumption (normality) of the ANOVA for the litter dataset. What do you conclude about the normality of the residuals, using the information from the tests and graphs?
Residuals = "leftovers" after we subtract the model's predictions (here, the mean of each location). By checking them, we test whether ANOVA's assumptions (equal spread, normal distribution) are plausible.
# evaluating normality using the residuals, since these measure within-group variability
decompAnov <- aov(loss~location,data=decomp)
# seeing the attributes of earlier ANOVA test --> can be treated as column names of data.frame
attributes(decompAnov) # fitted.values are the predictions for each observation, in this case: the means
## $names
## [1] "coefficients" "residuals" "effects" "rank"
## [5] "fitted.values" "assign" "qr" "df.residual"
## [9] "contrasts" "xlevels" "call" "terms"
## [13] "model"
##
## $class
## [1] "aov" "lm"
# residuals in ANOVA = observations - fitted.values (mean of each location)
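# a quick sanity check of this identity (a minimal sketch; should return TRUE):
all.equal(as.numeric(decompAnov$residuals),
          decomp$loss - as.numeric(decompAnov$fitted.values))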
# plotting the residuals
hist(decompAnov$residuals, freq=FALSE, main="ANOVA Residuals", xlab="residuals",
col="white") # alternatively: hist(residuals(anovaRes))
# if freq=FALSE, probability/component densities are plotted (so that histogram has total area of one)
# for plotting normal curve, saving residuals in object is easier: decompRes <- residuals(decompAnov)
curve(dnorm(x,mean(decompAnov$residuals),sd(decompAnov$residuals)),add=TRUE,lwd=2)
car::qqPlot(decompAnov$residuals) # testing whether residuals fall within the confidence envelope
## [1] 19 41
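# the printed numbers are the row indices of the two most extreme residuals, identified by qqPlot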
# though the qqPlot already suggests that the residuals are normally distributed, to be sure...
# formal test: Shapiro-Wilk test
shapiro.test(decompAnov$residuals) # remember to run the test on the residuals!
##
## Shapiro-Wilk normality test
##
## data: decompAnov$residuals
## W = 0.98419, p-value = 0.6278
# plotting all 4 diagnostic plots in one 2x2 layout
par(mfrow=c(2,2))
plot(decompAnov)
par(mfrow=c(1,1)) # resetting the layout
# the first diagnostic plot made by hand (note: x = fitted values, y = residuals)
plot(decompAnov$fitted.values,decompAnov$residuals,xlab="fitted values",ylab="residuals")
Answer: The formal test simply confirms what we already suspected from the graphs: with a Shapiro-Wilk p-value of 0.63 (well above 0.05), we cannot reject the null hypothesis of normality, so the residuals are likely normal.
Notes:

- Formal tests like Shapiro–Wilk are very sensitive to sample size: at small sample sizes they lack the power to detect real deviations, while at very large sample sizes they flag even trivial ones. Relying solely on them is therefore problematic, which is why we always pair them with visual/graphic checks (histogram, QQ-plot), which let you judge the size of a deviation rather than just its statistical detectability.
- When running plot(decompAnov), you get four diagnostic plots, each of which helps you check whether ANOVA's assumptions are reasonable:

1. Residuals vs Fitted
   - Shows whether residuals (errors) are centered around 0 and have equal spread across fitted values.
   - A funnel shape (residuals spreading out more for bigger fitted values) suggests unequal variances → ANOVA may not be valid.
2. Normal Q-Q plot
   - Compares residuals to what we'd expect under a normal distribution.
   - If points fall roughly on the diagonal line, the normality assumption is fine.
   - Big S-curves or outliers far from the line = potential violation.
3. Scale-Location (Spread vs Fitted)
   - Similar to Residuals vs Fitted, but plots the square root of the standardized residuals, making it easier to see whether variance is constant across groups.
   - Ideally a flat horizontal band; a sloping band = variance problems.
4. Residuals vs Leverage (Cook's Distance)
   - Detects influential points that may drive the results.
   - Look for points with a high Cook's distance (far from the bulk of the data).
   - If you see one or two unusual points, check whether they are data errors or true outliers.
   - The plot in this exercise shows NO SIGNS of influential points: the Cook's distance contours that would flag them are not even visible.
Interpret the ANOVA table. What does it suggest, and do you think it is trustworthy?
If you are happy that the assumptions are not violated, you can move forward and interpret the model using the summary function (summary(anova)). The summary of the ANOVA test shows several things, including the between- and within-group variances, the F-statistic (see our video wall) and the p-value. It is only safe to interpret these if the assumptions are not violated.
summary(decompAnov) # don't interpret this yet, as we can't trust it!
## Df Sum Sq Mean Sq F value Pr(>F)
## location 2 13925 6963 22.62 5.85e-08 ***
## Residuals 57 17542 308
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Answer: If variances differ too much, as in this case, the F-test becomes unreliable. Therefore, these ANOVA results cannot be trusted.
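With unequal variances, another option (not covered in this tutorial) is Welch's one-way ANOVA, which does not assume homogeneity of variances; a minimal sketch using base R:
# Welch's ANOVA: var.equal=FALSE (the default) relaxes the equal-variance assumption
oneway.test(loss~location,data=decomp)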
What are your conclusions on the test assumptions for the transformed data?
If you feel that we did not meet the assumptions of your test, and that it is not trustworthy, you can decide to 1) do a non-parametric test or 2) transform the dependent variable (the "loss" observations).
Answer: …
decompRes <- residuals(decompAnov)
decomp$logloss <- log10(decomp$loss) # adding a column with transformed decomposition rates
decompAnovLOG <- aov(logloss~location,data=decomp)
decompResLOG <- residuals(decompAnovLOG) # alternatively: decompAnovLOG$residuals
# testing assumptions again for transformed data
# 1. equal variances
par(mfrow=c(1,2))
boxplot(decompRes~location,data=decomp,main="Original Data")
boxplot(decompResLOG~location,data=decomp,main="Log10 Transformed Data")
par(mfrow=c(1,1)) # resetting the plotting layout
leveneTest(logloss~location,data=decomp)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 0.3851 0.6822
## 57
# 2. normality
car::qqPlot(decompResLOG) # qqplot of just the transformed residuals
## [1] 10 19
Answer: The log10 transformation worked: all ANOVA assumptions are now met and we can proceed. Log transformations compress large values more than small ones, and this can equalize variance across groups. Note that the 68.22% comes from Levene's test, not from the ANOVA itself. What the Levene's test p-value says is: given that the variances are actually equal (H0 is true), there's a 68.22% probability of observing variance differences as extreme as (or more extreme than) what we saw in our sample.
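You can see this equalizing effect directly by comparing the per-group variances on both scales; a minimal sketch:
# per-group variances before and after the log10 transformation
tapply(decomp$loss,decomp$location,var) # raw scale: strongly unequal (cf. Levene p = 0.0004)
tapply(decomp$logloss,decomp$location,var) # log scale: far more homogeneous (cf. Levene p = 0.68)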
Finally, what do you conclude on our hypothesis that location affects the amount of mass loss from litter?
Remember that the F-distribution's shape depends entirely on the two df values (numerator and denominator): different df = different distribution = different critical values. The sum of squares between groups and the sum of squares within groups are each divided by their own degrees of freedom.

- Between-group df (numerator): number of groups - 1. More groups → more numerator df → the distribution changes shape; at \(\alpha\)=0.05 the critical F value actually decreases as the numerator df grows (see the check below).
- Within-group df (denominator): total observations - number of groups. More observations → more denominator df → a narrower, more concentrated distribution → a lower critical F value is needed.
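You can verify both statements with qf(); a minimal sketch (the df values other than 2 and 57 are illustrative):
# critical F at alpha=0.05 drops as the denominator df (observations) grows...
qf(0.95,2,c(10,57,1000))
# ...and it also drops as the numerator df (groups) grows
qf(0.95,c(1,2,5),57)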
# looking for the degrees of freedom (df)
summary(decompAnovLOG) # df= 2 (number of groups - 1) & 57 (total observations - groups)
## Df Sum Sq Mean Sq F value Pr(>F)
## location 2 1.119 0.5593 18.57 6.17e-07 ***
## Residuals 57 1.717 0.0301
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# dividing between-variation (location sum sq) by its df (2) gives its mean sq!
# dividing within-variation (residuals sum sq) by its df (57) gives its mean sq!
# dividing between-variation mean sq by within-variation mean sq gives the F-statistic!
curve(df(x,2,57),0,20,lwd=2,xlab="F-value",ylab="density") # df() gives the density of the F-distribution
Fcrit <- qf(0.95,2,57) # the threshold for rejecting H0 at α = 0.05
# if observed F>Fcrit, reject H0 (locations differ significantly)
# if observed F<Fcrit, fail to reject H0 (no significant difference among locations)
abline(v=Fcrit,lty=2,col="red") # adding threshold to the F-curve
aovtable <- summary(decompAnovLOG)
Fval <- aovtable[[1]]$`F value`[1] # [1]: only the location row has an F value (NA for residuals)
arrows(Fval,0.2,Fval,0) # marking the observed F-statistic on the curve
# concluding by checking the ANOVA table
aovtable
## Df Sum Sq Mean Sq F value Pr(>F)
## location 2 1.119 0.5593 18.57 6.17e-07 ***
## Residuals 57 1.717 0.0301
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# showing fitted values in ANOVA = group means
# ANOVA predictions simply predict each observation will equal its group's mean
means1 <- tapply(predict(decompAnovLOG),decomp$location,mean)
means2 <- tapply(decomp$logloss,decomp$location,mean)
all.equal(means1,means2) # checking for near equality (despite minuscule rounding differences)
## [1] TRUE
Answer: The ANOVA hypotheses are:

- H0: there is no difference among the group means.
- HA: at least one group mean differs from the others.

Since the p-value (6.17e-07) is far below \(\alpha\)=0.05, we can reject H0: there appears to be a difference in means, and location does have an effect on litter decomposition rate. Note, however, that because we analysed log10-transformed data, we are now working with geometric means (see below).
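To report results on the original scale, back-transform the group means of the log data; this yields geometric means (a minimal sketch):
# back-transformed group means = geometric means on the original scale
10^tapply(decomp$logloss,decomp$location,mean)
# compare with the arithmetic means of the raw data
tapply(decomp$loss,decomp$location,mean)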