In this question, I calculated the sample size per group needed to reach power = 0.80 with 4 groups, within-group variance = 3.5, and significance level = 0.05. I looked at three sets of group means that share the same range (18 to 20) but differ in how the middle two means are spread, and therefore in the variance between the means.
power.anova.test(groups = 4, n = NULL, between.var = var(c(18,19,19,20)), within.var = 3.5, sig.level = 0.05, power = 0.80)
##
## Balanced one-way analysis of variance power calculation
##
## groups = 4
## n = 20.08368
## between.var = 0.6666667
## within.var = 3.5
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
power.anova.test(groups = 4, n = NULL, between.var = var(c(18,18.6667,19.3333,20)), within.var = 3.5, sig.level = 0.05, power = 0.80)
##
## Balanced one-way analysis of variance power calculation
##
## groups = 4
## n = 18.17901
## between.var = 0.7407259
## within.var = 3.5
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
power.anova.test(groups = 4, n = NULL, between.var = var(c(18,18,20,20)), within.var = 3.5, sig.level = 0.05, power = 0.80)
##
## Balanced one-way analysis of variance power calculation
##
## groups = 4
## n = 10.56952
## between.var = 1.333333
## within.var = 3.5
## sig.level = 0.05
## power = 0.8
##
## NOTE: n is number in each group
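Conclusion for Question 1: The required sample size falls as the variance between the group means rises. Rounding n up to whole units, about 21 per group are needed when between.var = 0.667, 19 when between.var = 0.741, and 11 when between.var = 1.333. When the group means are close together, more observations per group are needed to detect differences; when they are far apart, fewer are needed because the groups are easier to distinguish.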
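The same trend can be checked directly from the noncentral F distribution. The sketch below assumes the standard balanced one-way setup, in which the noncentrality parameter is (k - 1) * n * between.var / within.var. The helper name `anova_power` is hypothetical and was not part of the original analysis; it only evaluates the power at the rounded-up sample sizes.

```r
# Power of the balanced one-way ANOVA F test for a given per-group n.
# anova_power is a hypothetical helper written for illustration only.
anova_power <- function(n, means, within.var, sig.level = 0.05) {
  k      <- length(means)
  df1    <- k - 1
  df2    <- k * (n - 1)
  lambda <- df1 * n * var(means) / within.var   # noncentrality parameter
  f.crit <- qf(sig.level, df1, df2, lower.tail = FALSE)
  pf(f.crit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

# The three sets of group means used in the power.anova.test() calls
cases <- list(
  case1 = c(18, 19, 19, 20),
  case2 = c(18, 18.6667, 19.3333, 20),
  case3 = c(18, 18, 20, 20)
)

# Per-group sample sizes, rounded up from the power.anova.test() results
n.needed <- c(case1 = 21, case2 = 19, case3 = 11)

power.at.n <- mapply(anova_power, n = n.needed, means = cases,
                     MoreArgs = list(within.var = 3.5))
print(round(power.at.n, 3))
```

Each rounded-up n should give power of at least 0.80, while one observation fewer per group should fall just short.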
The experiment collected six lifetimes for each of four fluids. We want to test whether the mean life is the same for all fluids at α = 0.10. If not, we will use Tukey’s test to see which fluids differ.
Fluid1 <- c(17.6,18.9,16.3,17.4,20.1,21.6)
Fluid2 <- c(16.9,15.3,18.6,17.1,19.5,20.3)
Fluid3 <- c(21.4,23.6,19.4,18.5,20.5,22.3)
Fluid4 <- c(19.3,21.1,16.9,17.5,18.3,19.8)
dat <- data.frame(Fluid1,Fluid2,Fluid3,Fluid4)
dat # data is not tidy
##   Fluid1 Fluid2 Fluid3 Fluid4
## 1   17.6   16.9   21.4   19.3
## 2   18.9   15.3   23.6   21.1
## 3   16.3   18.6   19.4   16.9
## 4   17.4   17.1   18.5   17.5
## 5   20.1   19.5   20.5   18.3
## 6   21.6   20.3   22.3   19.8
library(tidyr)
dat <- pivot_longer(dat, c(Fluid1,Fluid2,Fluid3,Fluid4))
dat # now data is tidy
## # A tibble: 24 × 2
##    name   value
##    <chr>  <dbl>
##  1 Fluid1  17.6
##  2 Fluid2  16.9
##  3 Fluid3  21.4
##  4 Fluid4  19.3
##  5 Fluid1  18.9
##  6 Fluid2  15.3
##  7 Fluid3  23.6
##  8 Fluid4  21.1
##  9 Fluid1  16.3
## 10 Fluid2  18.6
## # ℹ 14 more rows
aov.model <- aov(value ~ name, data=dat)
summary(aov.model)
##             Df Sum Sq Mean Sq F value Pr(>F)
## name         3  30.17   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
H0: The mean life is the same for all four fluids
Ha: At least one fluid has a different mean life
The ANOVA p-value (0.0525) is less than α = 0.10, so we reject H0.
Conclusion: At the 10% significance level, there are significant differences among the fluid means. (The same result would not be significant at α = 0.05.)
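To make the decision rule concrete, the F statistic and p-value in the ANOVA table can be recomputed from the between-group and within-group sums of squares. This is a minimal base-R sketch that re-enters the data as a list; the variable names (`fluids`, `ss.between`, and so on) are illustrative and not taken from the original code.

```r
# Recompute the one-way ANOVA F test from sums of squares (base R only).
fluids <- list(
  Fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  Fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  Fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  Fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

k <- length(fluids)            # number of groups
n <- lengths(fluids)           # observations per group
N <- sum(n)                    # total observations

group.means <- sapply(fluids, mean)
grand.mean  <- mean(unlist(fluids))

# Between-group and within-group sums of squares
ss.between <- sum(n * (group.means - grand.mean)^2)
ss.within  <- sum(sapply(fluids, function(x) sum((x - mean(x))^2)))

# Mean squares, F statistic, and upper-tail p-value
ms.between <- ss.between / (k - 1)
ms.within  <- ss.within / (N - k)
f.stat     <- ms.between / ms.within
p.value    <- pf(f.stat, k - 1, N - k, lower.tail = FALSE)

alpha <- 0.10
cat("F =", round(f.stat, 3), " p =", round(p.value, 4),
    " reject H0:", p.value < alpha, "\n")
```

The values should agree with the `summary(aov.model)` table above up to rounding.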
plot(aov.model)
The diagnostic plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, and Leverage) show that the residuals are roughly normal and that the variances are fairly constant across the four fluids.
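The visual check can be supplemented with formal tests. The sketch below is an addition, not part of the original analysis: it uses `shapiro.test()` on the residuals and `bartlett.test()` for equal variances, both from base R's stats package. The data are rebuilt with `stack()`, so the columns are named `values` and `ind` rather than `value` and `name`. With only six observations per group these tests have little power, so they complement the plots rather than replace them.

```r
# Supplementary assumption checks (an addition to the original analysis).
fluids <- list(
  Fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  Fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  Fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  Fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

long <- stack(fluids)                 # columns: values (numeric), ind (factor)
fit  <- aov(values ~ ind, data = long)

# H0: the residuals come from a normal distribution
normality <- shapiro.test(residuals(fit))

# H0: the four groups have equal variances
equal.var <- bartlett.test(values ~ ind, data = long)

print(normality)
print(equal.var)
```

Large p-values from both tests would be consistent with what the plots suggest; small ones would call for a closer look at the residuals.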
TukeyHSD(aov.model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = value ~ name, data = dat)
##
## $name
##                     diff         lwr       upr     p adj
## Fluid2-Fluid1 -0.7000000 -3.63540073 2.2354007 0.9080815
## Fluid3-Fluid1  2.3000000 -0.63540073 5.2354007 0.1593262
## Fluid4-Fluid1  0.1666667 -2.76873407 3.1020674 0.9985213
## Fluid3-Fluid2  3.0000000  0.06459927 5.9354007 0.0440578
## Fluid4-Fluid2  0.8666667 -2.06873407 3.8020674 0.8413288
## Fluid4-Fluid3 -2.1333333 -5.06873407 0.8020674 0.2090635
plot(TukeyHSD(aov.model))
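The plot of confidence intervals agrees with the table: only the Fluid3-Fluid2 interval (0.06 to 5.94) excludes zero, so Fluid 3 has a significantly higher mean life than Fluid 2. Note that TukeyHSD() uses a 95% family-wise confidence level by default, which is stricter than α = 0.10. The adjusted p-values do not depend on that level, and Fluid3-Fluid2 (p adj = 0.044) is the only pair below 0.10; the next smallest is Fluid3-Fluid1 (p adj = 0.159).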
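To report intervals at the 90% family-wise level that matches α = 0.10, `TukeyHSD()` accepts a `conf.level` argument. The following is a minimal sketch, not the original code: it rebuilds the data in base R with `stack()`, so the columns are named `values` and `ind` instead of `value` and `name`.

```r
# Tukey HSD at the 90% family-wise level, matching alpha = 0.10.
fluids <- list(
  Fluid1 = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6),
  Fluid2 = c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3),
  Fluid3 = c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3),
  Fluid4 = c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

long <- stack(fluids)                 # columns: values (numeric), ind (factor)
fit  <- aov(values ~ ind, data = long)

tukey90 <- TukeyHSD(fit, "ind", conf.level = 0.90)
res     <- tukey90$ind                # matrix with diff, lwr, upr, p adj

# Pairs whose adjusted p-value is below alpha = 0.10
signif.pairs <- rownames(res)[res[, "p adj"] < 0.10]
print(round(res, 4))
print(signif.pairs)
```

Only the interval bounds change with `conf.level`; the `diff` and `p adj` columns are the same as in the 95% output.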
At the 10% significance level, we reject H0 and conclude that not all fluids have the same mean life. The diagnostic plots give no reason to doubt the ANOVA model assumptions. Tukey’s test indicates that only one pair differs significantly: Fluid 3 has a higher mean life than Fluid 2 (difference = 3.0, p adj = 0.044). No other pair of fluids differs significantly.