C. Donovan
Extending the \( t \)-tests to more groups
Titanium levels in Cannabis leaves
We want to investigate if the average Titanium level differs in cannabis leaves across the three soil types: Blockhouse Bay (bhb),Northland (nth) and Potting mix (pm).
p <- ggplot(TiData) + geom_boxplot(aes(Group, Ti), fill = 'purple', alpha = 0.8)
p
For an ANOVA:
[waves hands, uses board - attending lectures is useful]
so the testing process is similar to previous:
Calculate the \( F \)-test statistic for our data, \( f_0 \), using the ratio of the variability between groups (\( s^2_B \)) and the variability within groups (\( s^2_W \)):
\[ f_0=\frac{s^2_B}{s^2_W} \]
\[ s^2_B=\frac{\sum_{k=1}^K n_k(\bar{x}_{k}-\bar{x}_{..})^2}{k-1} \]
\[ s^2_W=\frac{\sum_{k=1}^K (n_k -1)s^2_k}{n_{tot}-k} \]
Evidence against \( H_0 \) is provided by values of \( f_0 \) which would be unusually large if \( H_0 \) were true
\( F \)-distributions take on a range of shapes..
x <- seq(0, 5, length = 200)
plot(x, df(x, df1 = 1, df2 = 1), lwd = 2, type ='l')
lines(x, df(x, df1 = 10, df2 = 1), lwd = 2, col = 'blue')
lines(x, df(x, df1 = 100, df2 = 100), lwd = 2, col = 'purple')
\[ s^2_B=\frac{\sum_{k=1}^K n_k(\bar{x}_{k}-\bar{x}_{..})^2}{k-1} \]
gpSummary <- TiData %>% group_by(Group) %>% summarise(mean = mean(Ti), var = var(Ti), N = n())
groupSS <- sum(gpSummary$N*(gpSummary$mean-mean(TiData$Ti))^2)
mseBetween <- groupSS/(3-1)
mseBetween
[1] 59.29798
\[ s^2_W=\frac{\sum_{k=1}^K (n_k -1)s^2_k}{n_{tot}-k} \]
withinSS <- sum((gpSummary$N-1)*gpSummary$var)
mseWithin <- withinSS/(nrow(TiData)-3)
mseWithin
[1] 2.094315
f_0 <- mseBetween/mseWithin
pf(f_0, df1 = 2, df2 = nrow(TiData)-3, lower.tail = F)
[1] 1.426962e-08
Of course, you'd not do it like that - simply:
weedANOVA <- aov(Ti ~ Group, data = TiData)
summary(weedANOVA)
Df Sum Sq Mean Sq F value Pr(>F)
Group 2 118.60 59.30 28.31 1.43e-08 ***
Residuals 43 90.06 2.09
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In order for \( F \)-test results to be valid, we need the following assumptions to be met:
Again - all about the noise: A simple model applies - the noise looks like independent draws from one Normal distribution
Let's check the others:
qqnorm(weedANOVA$residuals)
qqline(weedANOVA$residuals)
shapiro.test(weedANOVA$residuals)
Shapiro-Wilk normality test
data: weedANOVA$residuals
W = 0.98038, p-value = 0.6216
TiData %>% group_by(Group) %>% summarise(SD = sd(Ti))
# A tibble: 3 x 2
Group SD
<fct> <dbl>
1 bhb 1.32
2 nth 1.62
3 pm 1.45
So, you've found a significant ANOVA result
The immediate question is: what is different from what?
So, we're back to the multiple comparisons problem - this can be addressed by modified testing (adjustments for multiple comparisons)
Recall
The obvious(?) fix, is to alter our significance threshold for each test (make it “harder”), so the collective type-1 is reasonable
There are many such approaches and they are not without controversy (refer end of lecture)
We will consider 3 types of adjustment here (many many more):
This one is really easy. We set a threshold \( p \)-value by dividiing by the number of comparisons we plan:
\[ \alpha_{adj} = \alpha/k \]
where \( \alpha_{adj} \) is our new threshold, \( \alpha \) is the desired type 1 error collectively (family error rate) and \( k \) is the number of comparisons
For example, if we have a control, we plan to compare against each of 3 treatments, we adjust:
\[ \alpha_{adj} = \alpha/k = 0.05/3 = 0.01666667 \]
So we need \( p \)-value < 0.0166 to conclude a significant result.
Observe the probability of 1 more more false positives is:
1-pbinom(0, 3, 0.05/3) # recall from binomial lectures
[1] 0.0491713
It gives a family error rate \( \le \alpha \)
We set a threshold \( p \)-value based on the number of comparisons we plan:
\[ \alpha_{adj} = 1 - (1 - \alpha)^{1/k} \]
where \( \alpha_{adj} \) is our new threshold, \( \alpha \) is the desired type 1 error collectively (family error rate) and \( k \) is the number of comparisons
For example, if we have a control, we plan to compare against each of 3 treatments, to give overall type 1 error of 5%
\[ \alpha_{adj} = 1 - (1 - \alpha)^{1/k} = 1 - (1 - 0.05)^{1/3} = 0.01695243 \]
So we need \( p \)-value < 0.01695 to conclude a significant result.
Observe the probability of 1 more more false positives is:
1-pbinom(0, 3, 0.01695243) # recall from binomial lectures
[1] 0.05000001
Which is the rationale
This is generally well-regarded (the previous ones often considered too conservative).
For example - using our ANOVA previously
TukeyHSD(weedANOVA)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = Ti ~ Group, data = TiData)
$Group
diff lwr upr p adj
nth-bhb 2.477778 0.9544691 4.001086 0.0008236
pm-bhb 3.750000 2.5402571 4.959743 0.0000000
pm-nth 1.272222 -0.1008697 2.645314 0.0743237
So, conclude the ANOVA result was due to:
nth
being significantly higher than bhb
- on average 0.95 to 4.00 units higher (95% confidence)pm
being significantly higher than bhb
- on average 2.54 to 4.96 units higher (95% confidence)Whereas pm
and nth
are not significantly different - on average pm
was -0.1 units lower, to 1.27 units higher than nth
(95% confidence), so 0 is plausible
For the interested - papers on moodel:
In short, we are trading one error for another
Type 1 error: the incorrect rejection of \( H_0 \)
Type 2 error: the incorrect failure to reject \( H_0 \)
Clearly we can control type 1 at the cost of type 2:
This relates to a concept of power, which we cover next.
We've covered:
Next: