## Using the built-in data set PlantGrowth data(PlantGrowth) str(PlantGrowth)
## 'data.frame': 30 obs. of 2 variables: ## $ weight: num 4.17 5.58 5.18 6.11 4.5 4.61 5.17 4.53 5.33 5.14 ... ## $ group : Factor w/ 3 levels "ctrl","trt1",..: 1 1 1 1 1 1 1 1 1 1 ...
summary(PlantGrowth)
## weight group ## Min. :3.590 ctrl:10 ## 1st Qu.:4.550 trt1:10 ## Median :5.155 trt2:10 ## Mean :5.073 ## 3rd Qu.:5.530 ## Max. :6.310
Analysis of variance (ANOVA)
Always carry out sanity checks (str
, summary
) and plot your data before you get carried away with statistical modelling. What do you conclude looking at the boxplot below?
Analysis of variance (ANOVA)
If we carry out an ANOVA with the aov
command, we obtain a summary stating an overall P-value for the predictor variable. So, it only indicates whether the predictor variable had a statistically significant effect but it does not tell us where those differences lie, i.e. which of the levels of the predictor variable differ significantly from each other. They could all differ signficantly from each other but a single significant difference between any two of the factor levels is enough to give an overall significance for the effect of the predictor variable.
## Df Sum Sq Mean Sq F value Pr(>F) ## group 2 3.766 1.8832 4.846 0.0159 * ## Residuals 27 10.492 0.3886 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Analysis of variance (ANOVA)
|
Analysis of variance (ANOVA)
An analysis of variance with an explanatory variable that has four levels (e.g. habitat type: estuarine, dryland, wetland, swamp), is called a:
- one-way ANOVA
- four-way ANOVA
- three-way ANOVA using estuarine habitat as reference level
- None of the above
Analysis of variance (ANOVA)
An analysis of variance with an explanatory variable that has four levels (e.g. habitat type: estuarine, dryland, wetland, swamp), is called a:
- one-way ANOVA
- four-way ANOVA
- three-way ANOVA using estuarine habitat as reference level
- None of the above
Analysis of variance (ANOVA)
An analysis of variance with two explanatory variables, each with three levels (e.g. fertiliser: low, med, high; plant species: A, B, C), is called a:
- three-way ANOVA
- six-way ANOVA
- two-way ANOVA
- None of the above
Analysis of variance (ANOVA)
An analysis of variance with two explanatory variables, each with three levels (e.g. fertiliser: low, med, high; plant species: A, B, C), is called a:
- three-way ANOVA
- six-way ANOVA
- two-way ANOVA
- None of the above
Analysis of variance (ANOVA)
Interpretation of a classical one-way ANOVA table associated with the aov
.
Analysis of variance (ANOVA)
We can also run an analysis of variance using the lm
function, which results in the output below. Â
Â
As you can see this output style returns partial pairwise-comparisons. The intercept term gives the mean of the reference group (which is determined alphabetically by default but can be set using factor(x, levels = ...)
!) and the associated P-value tests if its value is significantly different from zero. The remaining parameter estimates give the differences between the means of the respective factor levels and the reference mean. The associated P-values test whether these differences betweenmeans are significantly different from zero. If this is not the case, then obsviously there is no statistically significant difference.
Analysis of variance (ANOVA)
We can quickly check this by calculating the means and their differences. Â
library(dplyr) treat.means <- group_by(PlantGrowth, group) %>% summarise(aver = mean(weight)) %>% mutate(diff = aver - aver[1]) treat.means
## # A tibble: 3 x 3 ## group aver diff ## <fct> <dbl> <dbl> ## 1 ctrl 5.03 0 ## 2 trt1 4.66 -0.371 ## 3 trt2 5.53 0.494
## We could also have computed the desired differences like this: # treat.means$aver[2:3] - treat.means$aver[1]
## [1] -0.371 0.494
Post-hoc analysis (multiple comparison test)
Post-hoc analysis (multiple comparison test)
## Dummy P-values from a post-hoc analysis p <- c(0.04, 0.001, 0.01, 0.021) p * 4
## [1] 0.160 0.004 0.040 0.084
p.adjust(p = p, method = "bonferroni")
## [1] 0.160 0.004 0.040 0.084
Post-hoc analysis (multiple comparison test)
An more modern, alternative approach to controlling the family-wise error rate is to control the false discovery rate, which is based on the expected proportion of Type I errors among the rejected null hypotheses (false positives; Benjamini & Hochberg 1995).
## Dummy P-values from a post-hoc analysis p <- c(0.04, 0.001, 0.01, 0.021) p.adjust(p = p, method = "BH") # Benjamini & Hochberg method
## [1] 0.040 0.004 0.020 0.028
Post-hoc analysis (multiple comparison test)
The contemporary approach is to combine the functionality of the emmeans
and the multcomp
package.
library(emmeans) library(multcomp) m1 <- lm(weight ~ group, data = PlantGrowth) ems <- emmeans(object = m1, specs = "group") pairs(ems)
## contrast estimate SE df t.ratio p.value ## ctrl - trt1 0.371 0.279 27 1.331 0.3909 ## ctrl - trt2 -0.494 0.279 27 -1.772 0.1980 ## trt1 - trt2 -0.865 0.279 27 -3.103 0.0120 ## ## P value adjustment: tukey method for comparing a family of 3 estimates
summary(object = as.glht(pairs(ems)), test = adjusted(type = "BH"))
## ## Simultaneous Tests for General Linear Hypotheses ## ## Linear Hypotheses: ## Estimate Std. Error t value Pr(>|t|) ## ctrl - trt1 == 0 0.3710 0.2788 1.331 0.1944 ## ctrl - trt2 == 0 -0.4940 0.2788 -1.772 0.1315 ## trt1 - trt2 == 0 -0.8650 0.2788 -3.103 0.0134 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## (Adjusted p values reported -- BH method)
Post-hoc analysis (multiple comparison test)
…continued.
ems <- emmeans(object = m1, specs = "group") ## pairs(ems) summary(object = as.glht(pairs(ems)), test = adjusted(type = "BH"))
## ## Simultaneous Tests for General Linear Hypotheses ## ## Linear Hypotheses: ## Estimate Std. Error t value Pr(>|t|) ## ctrl - trt1 == 0 0.3710 0.2788 1.331 0.1944 ## ctrl - trt2 == 0 -0.4940 0.2788 -1.772 0.1315 ## trt1 - trt2 == 0 -0.8650 0.2788 -3.103 0.0134 * ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## (Adjusted p values reported -- BH method)
Post-hoc analysis for two-way ANOVA with interaction
m1 <- lm(p ~ loc * horizon, data = soil) anova(m1)
## Analysis of Variance Table ## ## Response: p ## Df Sum Sq Mean Sq F value Pr(>F) ## loc 1 41993 41993 22.0146 2.282e-05 *** ## horizon 2 62683 31342 16.4305 3.665e-06 *** ## loc:horizon 2 17697 8848 4.6386 0.0144 * ## Residuals 48 91561 1908 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Post-hoc analysis for two-way ANOVA with interaction
ems <- emmeans(object = m1, specs = "loc", by = "horizon") summary(object = as.glht(pairs(ems)), test = adjusted(type = "BH"))
Post-hoc analysis for two-way ANOVA with interaction
ems <- emmeans(object = m1, specs = "horizon", by = "loc") summary(object = as.glht(pairs(ems)), test = adjusted(type = "BH"))