M. Drew LaMar
November 18/20, 2019
\[ t = \frac{\bar{Y}_{1}-\bar{Y}_{2}}{\mathrm{SE}_{\bar{Y}_{1}-\bar{Y}_{2}}} \]
Data: With \( i \) indexing the group and \( j \) the observation within group \( i \), we have
\[ Y_{ij} - \bar{Y} = (\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i}) \]
\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \]
\[ \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 \]
\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} = \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 & = & \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} \end{eqnarray*} } \]
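The identity above can be checked numerically. A minimal sketch, assuming a small made-up data set: the data frame `toy`, its values, and the `*_toy` variable names are hypothetical and are not part of the bee study.

```r
# Hypothetical data: three groups with unequal sizes (not from the bee study)
toy <- data.frame(
  group = factor(c("a", "a", "a", "b", "b", "c", "c", "c", "c")),
  y     = c(1.2, 0.8, 1.0, 2.1, 1.9, 0.2, 0.5, 0.3, 0.4)
)

grand_mean_toy  <- mean(toy$y)
group_means_toy <- tapply(toy$y, toy$group, mean)    # one mean per group
group_sizes_toy <- tapply(toy$y, toy$group, length)  # n_i per group

SS_total_toy  <- sum((toy$y - grand_mean_toy)^2)
SS_groups_toy <- sum(group_sizes_toy * (group_means_toy - grand_mean_toy)^2)
# ave() returns each observation's own group mean, aligned with toy$y
SS_error_toy  <- sum((toy$y - ave(toy$y, toy$group))^2)

# Compare the two sides of the identity up to floating-point tolerance
all.equal(SS_total_toy, SS_groups_toy + SS_error_toy)
```

The unequal group sizes are deliberate: the decomposition does not require a balanced design.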
\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{total}} & = & \sum_{i}\sum_{j}(Y_{ij}-\bar{Y})^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y}) + (Y_{ij} - \bar{Y}_{i})\right]^2 \\ & = & \sum_{i}\sum_{j}\left[(\bar{Y}_{i} - \bar{Y})^2 + (Y_{ij} - \bar{Y}_{i})^2 + 2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i})\right] \\ & = & \sum_{i}\sum_{j}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \sum_{i}n_{i}(\bar{Y}_{i} - \bar{Y})^2 + \sum_{i}\sum_{j}(Y_{ij} - \bar{Y}_{i})^2 + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \\ & = & \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}} + \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) \end{eqnarray*} } \]
It can be shown that the cross term vanishes:
\[ \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) = 0, \]
and thus
\[ \mathrm{SS}_{\mathrm{total}} = \mathrm{SS}_{\mathrm{groups}} + \mathrm{SS}_{\mathrm{error}}. \]
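One way to see why the cross term is zero, written out as a short worked step: the factor \( (\bar{Y}_{i} - \bar{Y}) \) does not depend on \( j \), so it can be pulled out of the inner sum, and within each group \( \sum_{j} Y_{ij} = n_{i}\bar{Y}_{i} \).

\[ \scriptsize{ \begin{eqnarray*} \sum_{i}\sum_{j}2(\bar{Y}_{i} - \bar{Y})(Y_{ij} - \bar{Y}_{i}) & = & 2\sum_{i}(\bar{Y}_{i} - \bar{Y})\sum_{j}(Y_{ij} - \bar{Y}_{i}) \\ & = & 2\sum_{i}(\bar{Y}_{i} - \bar{Y})\left(n_{i}\bar{Y}_{i} - n_{i}\bar{Y}_{i}\right) \\ & = & 0 \end{eqnarray*} } \]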
Definition: The group mean square is given by
\[ \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{df_{\mathrm{groups}}}, \] with \( df_{\mathrm{groups}} = k-1 \).
Definition: The error mean square is given by
\[ \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{df_{\mathrm{error}}}, \] with \( df_{\mathrm{error}} = \sum (n_{i}-1) = N-k \).
str(strungOutBees)
'data.frame': 20 obs. of 2 variables:
$ ppmCaffeine : Factor w/ 4 levels "ppm50","ppm100",..: 1 2 3 4 1 2 3 4 1 2 ...
$ consumptionDifferenceFromControl: num -0.4 0.01 0.65 0.24 0.34 -0.39 0.53 0.44 0.19 -0.08 ...
Discuss: Is this data tidy or messy?
Answer: Tidy! Each row is one observation and each column is one variable.
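For contrast, a messy version of the same kind of data would spread the caffeine levels across columns. A minimal sketch, assuming a made-up wide table: the data frame `wide` and its values are hypothetical, and base R's `stack()` is just one of several ways to reshape it.

```r
# Hypothetical wide ("messy") layout: one column per caffeine level.
# The values are made up for illustration and are not the bee data.
wide <- data.frame(
  ppm50  = c(0.1, -0.2, 0.3),
  ppm100 = c(0.0, 0.4, -0.1),
  ppm150 = c(0.5, 0.6, 0.2)
)

# stack() puts all values in one column and the former column names in a factor
long <- stack(wide)
names(long) <- c("consumptionDifferenceFromControl", "ppmCaffeine")
str(long)
```

After reshaping, the layout matches what `str(strungOutBees)` shows: one numeric response column and one factor identifying the group.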
stripchart(consumptionDifferenceFromControl ~ ppmCaffeine,
           data = strungOutBees,
           vertical = TRUE,
           method = "jitter",
           xlab = "Caffeine (ppm)",
           col = "red")
Discuss: State the null and alternative hypotheses appropriate for this question.
\[ \begin{eqnarray*} H_{0} & : & \mu_{50} = \mu_{100} = \mu_{150} = \mu_{200} \\ H_{A} & : & \mathrm{At \ least \ one \ of \ the \ means \ is \ different} \end{eqnarray*} \]
Shortcut using R
caffResults <- lm(consumptionDifferenceFromControl ~ ppmCaffeine, data=strungOutBees)
anova(caffResults)
Analysis of Variance Table
Response: consumptionDifferenceFromControl
Df Sum Sq Mean Sq F value Pr(>F)
ppmCaffeine 3 1.1344 0.37814 4.1779 0.02308 *
Residuals 16 1.4482 0.09051
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
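The same F-test can also be reached through `oneway.test()` with `var.equal = TRUE`. A minimal sketch, assuming simulated stand-in data: the data frame `sim`, its column names, and the chosen means are hypothetical and only serve to show that the two routes agree.

```r
# Hypothetical data standing in for strungOutBees (values are simulated)
set.seed(1)
sim <- data.frame(
  dose = factor(rep(c("d1", "d2", "d3", "d4"), each = 5)),
  y    = rnorm(20, mean = rep(c(0, 0, 0.5, 0.5), each = 5), sd = 0.3)
)

# Route 1: fit a linear model, then ask for the ANOVA table
tab <- anova(lm(y ~ dose, data = sim))

# Route 2: classical one-way ANOVA assuming equal variances
ow <- oneway.test(y ~ dose, data = sim, var.equal = TRUE)

# Both routes should report the same F statistic and P-value
c(lm = tab[1, "F value"], oneway = unname(ow$statistic))
c(lm = tab[1, "Pr(>F)"],  oneway = ow$p.value)
```

Leaving `var.equal` at its default of `FALSE` gives Welch's test instead, which generally will not match the table from `anova()`.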
Definition: The \( R^{2} \) value in ANOVA is the “fraction of the variation explained by groups” and is given by
\[ R^{2} = \frac{\mathrm{SS}_{\mathrm{groups}}}{\mathrm{SS}_{\mathrm{total}}}. \] Note: \( 0 \leq R^2 \leq 1 \).
beeAnovaSummary <- summary(caffResults)
beeAnovaSummary$r.squared
[1] 0.4392573
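As a cross-check, \( R^{2} \) can be recovered by hand from the ANOVA table. A minimal sketch, assuming the rounded sums of squares printed by `anova(caffResults)`; the `*_tab` variable names are made up here.

```r
# Sums of squares as printed in the anova() table (rounded to 4 decimals)
SS_groups_tab <- 1.1344
SS_error_tab  <- 1.4482

# SS_total is the sum of the two rows of the table
SS_total_tab <- SS_groups_tab + SS_error_tab

# Fraction of the total variation explained by groups
(R2_tab <- SS_groups_tab / SS_total_tab)
```

Because the inputs are rounded, the result should agree with `beeAnovaSummary$r.squared` only to about four decimal places.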
Long way
Question: Calculate the following summary statistics for each group: \( n_{i} \), \( \bar{Y}_{i} \), and \( s_{i} \).
library(dplyr)
beeStats <- strungOutBees %>%
  group_by(ppmCaffeine) %>%
  summarise(n = n(),
            mean = mean(consumptionDifferenceFromControl),
            sd = sd(consumptionDifferenceFromControl))
knitr::kable(beeStats)
ppmCaffeine | n | mean | sd |
---|---|---|---|
ppm50 | 5 | 0.008 | 0.2887386 |
ppm100 | 5 | -0.172 | 0.1694698 |
ppm150 | 5 | 0.376 | 0.3093218 |
ppm200 | 5 | 0.378 | 0.3927722 |
Compute sum-of-squares
\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{groups}} & = & \sum_{i}n_{i}(\bar{Y}_{i}-\bar{Y})^2 \end{eqnarray*} } \]
grandMean <- mean(strungOutBees$consumptionDifferenceFromControl)
(SS_groups <- sum(beeStats$n*(beeStats$mean - grandMean)^2))
[1] 1.134415
Compute sum-of-squares
\[ \scriptsize{ \begin{eqnarray*} \mathrm{SS}_{\mathrm{error}} & = & \sum_{i}\sum_{j}(Y_{ij}-\bar{Y}_{i})^2 = \sum_{i}(n_{i}-1)s_{i}^2 \end{eqnarray*} } \]
(SS_error <- sum((beeStats$n-1)*beeStats$sd^2))
[1] 1.44816
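Both sums of squares can be rebuilt from the printed summary table alone, without the raw data. A minimal sketch, assuming the values of \( n_{i} \), \( \bar{Y}_{i} \), and \( s_{i} \) are copied from the `beeStats` table; the variable names `n_i`, `ybar_i`, `s_i`, and the `*_chk` results are made up here.

```r
# Group summaries copied from the beeStats table
n_i    <- c(5, 5, 5, 5)
ybar_i <- c(0.008, -0.172, 0.376, 0.378)
s_i    <- c(0.2887386, 0.1694698, 0.3093218, 0.3927722)

# The grand mean is the n-weighted mean of the group means
grand_chk <- sum(n_i * ybar_i) / sum(n_i)

# Between-group and within-group sums of squares
SS_groups_chk <- sum(n_i * (ybar_i - grand_chk)^2)
SS_error_chk  <- sum((n_i - 1) * s_i^2)
c(SS_groups = SS_groups_chk, SS_error = SS_error_chk)
```

The weighted mean matters when group sizes differ; with equal \( n_{i} \), as here, it reduces to the plain average of the group means.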
Compute degrees of freedom
\[ \begin{equation} \mathrm{df}_{\mathrm{groups}} = k-1 \end{equation} \]
where \( k \) is the number of groups.
(df_groups <- 4-1)
[1] 3
Compute degrees of freedom
\[ \begin{equation} \mathrm{df}_{\mathrm{error}} = N-k \end{equation} \]
where \( N \) is the total number of observations in the data.
(df_error <- nrow(strungOutBees)-4)
[1] 16
Compute mean squares
\[ \begin{equation} \mathrm{MS}_{\mathrm{groups}} = \frac{\mathrm{SS}_{\mathrm{groups}}}{\mathrm{df}_{\mathrm{groups}}} \end{equation} \]
(MS_groups <- SS_groups/df_groups)
[1] 0.3781383
Compute mean squares
\[ \begin{equation} \mathrm{MS}_{\mathrm{error}} = \frac{\mathrm{SS}_{\mathrm{error}}}{\mathrm{df}_{\mathrm{error}}} \end{equation} \]
(MS_error <- SS_error/df_error)
[1] 0.09051
Compute \( F \)-statistic and \( P \)-value
\[ \begin{equation} F = \frac{\mathrm{MS}_{\mathrm{groups}}}{\mathrm{MS}_{\mathrm{error}}} \end{equation} \]
(F_ratio <- MS_groups/MS_error)
[1] 4.177862
(pval <- pf(F_ratio, df_groups, df_error, lower.tail=FALSE))
[1] 0.02307757
Using Statistical F Table
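The table lookup can be mimicked in R with `qf()`, which returns the critical value for a given tail area. A minimal sketch, assuming a significance level of 0.05 (the slides do not state one) and the degrees of freedom and \( F \)-ratio computed earlier; the variable names are made up here.

```r
alpha  <- 0.05      # significance level (an assumption, not stated in the slides)
df_num <- 3         # df_groups
df_den <- 16        # df_error
F_obs  <- 4.177862  # observed F-ratio from the calculation above

# Critical value: the point with area alpha to its right under F(3, 16)
(F_crit <- qf(1 - alpha, df_num, df_den))

# TRUE means the observed F exceeds the critical value, so H0 is rejected
F_obs > F_crit
```

This is the same decision as comparing the \( P \)-value from `pf()` with `alpha`; the two approaches always agree.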
Create manual table and compare
mytable <- data.frame(Df = c(df_groups, df_error),
                      SumSq = c(SS_groups, SS_error),
                      MeanSq = c(MS_groups, MS_error),
                      Fval = c(F_ratio, NA),
                      Pval = c(pval, NA))
rownames(mytable) <- c("ppmCaffeine", "Residuals")
knitr::kable(mytable)
| | Df | SumSq | MeanSq | Fval | Pval |
|---|---|---|---|---|---|
| ppmCaffeine | 3 | 1.134415 | 0.3781383 | 4.177862 | 0.0230776 |
| Residuals | 16 | 1.448160 | 0.0905100 | NA | NA |
Analysis of Variance Table
Response: consumptionDifferenceFromControl
Df Sum Sq Mean Sq F value Pr(>F)
ppmCaffeine 3 1.1344 0.37814 4.1779 0.02308 *
Residuals 16 1.4482 0.09051
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Assumptions (same as 2-sample \( t \)-test)
Robustness (same as 2-sample \( t \)-test)
Definition: The Kruskal-Wallis test is a nonparametric method for multiple groups based on ranks.
The Kruskal-Wallis test is similar to the Mann-Whitney \( U \)-test and has the same assumptions:
The Kruskal-Wallis test is nearly as powerful as ANOVA when sample sizes are large, but has less power than ANOVA when sample sizes are small.
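In R, the test is available as `kruskal.test()`. A minimal sketch, assuming simulated data: the data frame `kw_data`, its column names, and the chosen means are hypothetical and are not the bee data.

```r
# Hypothetical data: four groups of five observations (simulated)
set.seed(42)
kw_data <- data.frame(
  group = factor(rep(c("g1", "g2", "g3", "g4"), each = 5)),
  y     = rnorm(20, mean = rep(c(0, 0, 0.5, 0.5), each = 5), sd = 0.3)
)

# Same formula interface as lm(): response ~ grouping factor
kw <- kruskal.test(y ~ group, data = kw_data)

kw$statistic  # Kruskal-Wallis statistic, computed from the ranks of y
kw$parameter  # degrees of freedom, k - 1
kw$p.value
```

For the bee data the analogous call would presumably be `kruskal.test(consumptionDifferenceFromControl ~ ppmCaffeine, data = strungOutBees)`.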