M. Drew LaMar
October 30, 2020
Practice Problem #1
Many humans like the effect of caffeine, but it occurs in plants as a deterrent to herbivory by animals. Caffeine is also found in flower nectar, and nectar is meant as a reward for pollinators, not a deterrent. How does caffeine in nectar affect visitation by pollinators?
Practice Problem #1
Singaravelan et al. (2005) set up feeding stations where bees were offered a choice between a control solution with 20% sucrose or a caffeinated solution with 20% sucrose plus some quantity of caffeine. Over the course of the experiment, four different concentrations of caffeine were provided: 50, 100, 150, and 200 ppm. The response variable was the difference between the amount of nectar consumed from the caffeine feeders and that removed from the control feeders at the same station (grams).
Discuss: Describe the experimental design.
Recall the two-sample \( t \)-test statistic for comparing two group means:
\[ t = \frac{\bar{Y}_{1}-\bar{Y}_{2}}{\mathrm{SE}_{\bar{Y}_{1}-\bar{Y}_{2}}} \]
Data: With \( Y_{ij} \) denoting observation \( j \) in group \( i \), \( \bar{Y}_{i} \) the mean of group \( i \), and \( \bar{Y} \) the grand mean, we have
\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]
Squaring both sides and summing over all groups and observations partitions the total sum of squares into group and error components:
\[ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 & = & \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \end{eqnarray*} \]
To see why, expand the squared total deviations:
\[ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} & = & \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i})\right]^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y})^2 + (Y_{ij} - \bar{Y}_{i})^2 + 2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i})\right] \\ & = & \sum_{i}\sum_{j}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \sum_{i}n_{i}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \end{eqnarray*} \]
Can show:
\[ \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) = 0, \]
and thus
\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}}. \]
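As a quick numerical check, here is a minimal sketch verifying the partition on a small made-up data set (hypothetical toy values; the bee data are introduced below):
# Hypothetical toy data (not the bee data) to check the partition numerically
y   <- c(2, 4, 6, 1, 3, 5)
grp <- factor(c("a", "a", "a", "b", "b", "b"))
gm  <- mean(y)                      # grand mean
mi  <- tapply(y, grp, mean)         # group means
ni  <- tapply(y, grp, length)       # group sizes
SStot <- sum((y - gm)^2)
SSgrp <- sum(ni*(mi - gm)^2)
SSerr <- sum((y - ave(y, grp))^2)   # ave() repeats each group's mean
all.equal(SStot, SSgrp + SSerr)     # TRUE: the cross term vanishes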
Definition: The group mean square is given by
\[ \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{df_{\mathrm{groups}}}, \] with \( df_{\mathrm{groups}} = k-1 \), where \( k \) is the number of groups.
Definition: The error mean square is given by
\[ \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{df_{\mathrm{error}}}, \] with \( df_{\mathrm{error}} = \sum_{i} (n_{i}-1) = N-k \), where \( N \) is the total number of observations.
str(strungOutBees)
'data.frame': 20 obs. of 2 variables:
$ ppmCaffeine : Factor w/ 4 levels "ppm50","ppm100",..: 1 2 3 4 1 2 3 4 1 2 ...
$ consumptionDifferenceFromControl: num -0.4 0.01 0.65 0.24 0.34 -0.39 0.53 0.44 0.19 -0.08 ...
Discuss: Is this data tidy or messy?
Answer: Tidy!
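For contrast, a minimal sketch of what a messy (wide) version of these data could look like and how to reshape it into the tidy layout above; the messyBees object and the tidyr call are illustrative, not part of the original analysis:
library(tidyr)
# Messy: one column per caffeine level (first three stations, hypothetical layout)
messyBees <- data.frame(station = 1:3,
                        ppm50   = c(-0.40, 0.34, 0.19),
                        ppm100  = c(0.01, -0.39, -0.08))
# Tidy: one column identifying the caffeine level, one column for the response
tidyBees <- pivot_longer(messyBees, cols = c(ppm50, ppm100),
                         names_to  = "ppmCaffeine",
                         values_to = "consumptionDifferenceFromControl")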
# Strip chart of the consumption difference for each caffeine concentration
stripchart(consumptionDifferenceFromControl ~ ppmCaffeine,
           data = strungOutBees,
           vertical = TRUE,
           method = "jitter",
           xlab = "Caffeine (ppm)",
           ylab = "Difference in consumption from control (g)",
           col = "red")
Discuss: State the null and alternative hypotheses appropriate for this question.
\[ \begin{eqnarray*} H_{0} & : & \mu_{50} = \mu_{100} = \mu_{150} = \mu_{200} \\ H_{A} & : & \mathrm{At \ least \ one \ of \ the \ means \ is \ different} \end{eqnarray*} \]
Short cut using R
caffResults <- lm(consumptionDifferenceFromControl ~ ppmCaffeine, data=strungOutBees)
anova(caffResults)
Analysis of Variance Table
Response: consumptionDifferenceFromControl
Df Sum Sq Mean Sq F value Pr(>F)
ppmCaffeine 3 1.1344 0.37814 4.1779 0.02308 *
Residuals 16 1.4482 0.09051
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Definition: The \( R^{2} \) value in ANOVA is the “fraction of the variation explained by groups” and is given by
\[ R^{2} = \frac{\mathrm{SS}_{\mathrm{groups}}}{\mathrm{SS}_{\mathrm{total}}}. \] Note: \( 0 \leq R^2 \leq 1 \).
beeAnovaSummary <- summary(caffResults)
beeAnovaSummary$r.squared
[1] 0.4392573
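Equivalently, a short sketch recovering \( R^{2} \) directly from the sums of squares in the ANOVA table computed above:
aovTable <- anova(caffResults)
SSgroups <- aovTable["ppmCaffeine", "Sum Sq"]
SStotal  <- sum(aovTable[["Sum Sq"]])
SSgroups/SStotal  # matches beeAnovaSummary$r.squared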
Long way
Question: Calculate the following summary statistics for each group: \( n_{i} \), \( \bar{Y}_{i} \), and \( s_{i} \).
library(dplyr)
beeStats <- strungOutBees %>%
  group_by(ppmCaffeine) %>%
  summarise(n = n(),
            mean = mean(consumptionDifferenceFromControl),
            sd = sd(consumptionDifferenceFromControl))
knitr::kable(beeStats)
| ppmCaffeine | n | mean | sd |
|---|---|---|---|
| ppm50 | 5 | 0.008 | 0.2887386 |
| ppm100 | 5 | -0.172 | 0.1694698 |
| ppm150 | 5 | 0.376 | 0.3093218 |
| ppm200 | 5 | 0.378 | 0.3927722 |
Compute sum-of-squares
\[ \mathrm{SS}_{\mathrm{groups}} = \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 \]
grandMean <- mean(strungOutBees$consumptionDifferenceFromControl)
(SS_groups <- sum(beeStats$n*(beeStats$mean - grandMean)^2))
[1] 1.134415
Compute sum-of-squares
\[ \mathrm{SS}_{\mathrm{error}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 = \sum_{i}(n_{i}-1)s_{i}^2 \]
(SS_error <- sum((beeStats$n-1)*beeStats$sd^2))
[1] 1.44816
Compute degrees of freedom
\[ \mathrm{df}_{\mathrm{groups}} = k-1 \]
where \( k \) is the number of groups.
(df_groups <- 4-1)  # k = 4 caffeine concentrations
[1] 3
Compute degrees of freedom
\[ \mathrm{df}_{\mathrm{error}} = N-k \]
where \( N \) is the total number of observations in the data.
(df_error <- nrow(strungOutBees)-4)  # N = 20 observations, k = 4 groups
[1] 16
Compute mean squares
\[ \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{\mathrm{df}_{\mathrm{groups}}} \]
(MS_groups <- SS_groups/df_groups)
[1] 0.3781383
Compute mean squares
\[ \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{\mathrm{df}_{\mathrm{error}}} \]
(MS_error <- SS_error/df_error)
[1] 0.09051
Compute \( F \)-statistic and \( P \)-value
\[ F = \frac{\mathrm{MS}_{\mathrm{groups}}}{\mathrm{MS}_{\mathrm{error}}} \]
(F_ratio <- MS_groups/MS_error)
[1] 4.177862
(pval <- pf(F_ratio, df_groups, df_error, lower.tail=FALSE))
[1] 0.02307757
Using Statistical F Table
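Rather than reading the critical value off a printed \( F \) table, a short sketch of obtaining it directly in R with qf(), using the degrees of freedom computed above:
# Critical value for alpha = 0.05 with df1 = 3 and df2 = 16 (about 3.24)
(F_crit <- qf(0.05, df_groups, df_error, lower.tail = FALSE))
F_ratio > F_crit  # TRUE, so reject H0 at the 0.05 level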
Create manual table and compare
mytable <- data.frame(Df = c(df_groups, df_error),
                      SumSq = c(SS_groups, SS_error),
                      MeanSq = c(MS_groups, MS_error),
                      Fval = c(F_ratio, NA),
                      Pval = c(pval, NA))
rownames(mytable) <- c("ppmCaffeine", "Residuals")
knitr::kable(mytable)
|  | Df | SumSq | MeanSq | Fval | Pval |
|---|---|---|---|---|---|
| ppmCaffeine | 3 | 1.134415 | 0.3781383 | 4.177862 | 0.0230776 |
| Residuals | 16 | 1.448160 | 0.0905100 | NA | NA |
For comparison, here again is the output from anova(caffResults):
Analysis of Variance Table
Response: consumptionDifferenceFromControl
Df Sum Sq Mean Sq F value Pr(>F)
ppmCaffeine 3 1.1344 0.37814 4.1779 0.02308 *
Residuals 16 1.4482 0.09051
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Assumptions (same as 2-sample \( t \)-test): the measurements are random samples, the variable is normally distributed in each group, and the standard deviation is the same in all groups.
Robustness (same as 2-sample \( t \)-test): ANOVA is robust to departures from normality when sample sizes are large, and to moderately unequal standard deviations when the group sample sizes are roughly equal.
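One informal way to check these assumptions (a sketch, not part of the original slides) is to inspect the residuals from the fitted model:
# Residuals should look roughly normal, with similar spread in every group
par(mfrow = c(1, 2))
hist(residuals(caffResults), main = "Residuals", xlab = "Residual")
plot(fitted(caffResults), residuals(caffResults),
     xlab = "Fitted group mean", ylab = "Residual")
abline(h = 0, lty = 2)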
Definition: The Kruskal-Wallis test is a nonparametric method for multiple groups based on ranks.
The Kruskal-Wallis test is similar to the Mann-Whitney \( U \)-test and has the same assumptions: the data are random samples, and the distributions have the same shape in every group.
The Kruskal-Wallis test is nearly as powerful as ANOVA when sample sizes are large, but has less power than ANOVA when sample sizes are small.
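A sketch of applying it to the bee data with base R's kruskal.test() (this call was not part of the original slides; output not shown):
kruskal.test(consumptionDifferenceFromControl ~ ppmCaffeine,
             data = strungOutBees)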