- ANOVA: ANalysis of VAriance
- we are testing whether the means of 3+ groups differ, by comparing the variance between the groups to the variance within them
- uses the test statistic F
2026-04-14
The 4 assumptions that need to be met:
\[F = \frac{SS_{Factor}/(K-1)}{SS_{Error}/(N-K)}\]
- K is the number of groups and N is the total number of observations
- the significance of F is given by the p-value
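The F ratio above can be computed by hand to see where each piece comes from. This is a minimal sketch in base R on made-up toy data; the `scores` list and all variable names are hypothetical and are not part of the class data set used later.

```r
# Hypothetical toy data: three groups of three scores each (not from the source)
scores <- list(
  A = c(80, 85, 90),
  B = c(70, 75, 80),
  C = c(60, 65, 70)
)

K <- length(scores)                  # number of groups
N <- sum(lengths(scores))            # total number of observations
grand_mean <- mean(unlist(scores))   # mean of all scores pooled together

# SS_Factor: group size times the squared distance of each group mean from the grand mean
ss_factor <- sum(sapply(scores, function(x) length(x) * (mean(x) - grand_mean)^2))

# SS_Error: squared distance of each score from its own group mean
ss_error <- sum(sapply(scores, function(x) sum((x - mean(x))^2)))

# F = (SS_Factor / (K - 1)) / (SS_Error / (N - K))
f_stat <- (ss_factor / (K - 1)) / (ss_error / (N - K))

# p-value: upper-tail probability of the F distribution with (K - 1, N - K) degrees of freedom
p_value <- pf(f_stat, K - 1, N - K, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

For this toy data, SS_Factor is 600 and SS_Error is 150, so F = (600/2)/(150/6) = 12.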
Suppose we are interested in determining whether the average scores on the same test differ between classes. We will look at five class sections: Class 1, Class 2, Class 3, Class 4, and Class 5.
The null hypothesis is:
\[H_0: \mu_{Class 1} = \mu_{Class 2} = \mu_{Class 3} = \mu_{Class 4} = \mu_{Class 5}\]
The alternative hypothesis is:
\[H_a: \text{at least one } \mu \text{ is different}\]
- we will be using a significance level of 0.05
# Select the five class-score columns from the `class` data frame
columns = class[, c("Class.1", "Class.2", "Class.3", "Class.4", "Class.5")]
# Per-class summary statistics, ignoring missing scores
means = sapply(columns, function(x) mean(x, na.rm = TRUE))
standarddeviation = sapply(columns, function(x) sd(x, na.rm = TRUE))
samplesize = sapply(columns, function(x) sum(!is.na(x)))
# Display the summary statistics as a plotly table
library(plotly)
plot_ly(
  type = 'table',
  header = list(values = c("Class", "Means", "Standard Deviation", "Sample Size")),
  cells = list(values = list(
    c("Class.1", "Class.2", "Class.3", "Class.4", "Class.5"),
    round(means, 2), round(standarddeviation, 2), samplesize
  ))
)
##              Df Sum Sq Mean Sq F value Pr(>F)
## Class         4   2587  646.68   0.759 0.5536
## Residuals   149 126955  852.05
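The code that produced this table is not shown. One way a table of this shape might be produced is sketched below, assuming the scores are first reshaped from one column per class to one row per score. The `class_demo` data frame is simulated stand-in data (three classes only), and the names `long`, `Score`, `fit`, and `anova_table` are hypothetical, so the numbers will not match the table above.

```r
# Hypothetical stand-in for the class data (simulated, not the source data)
set.seed(1)
class_demo <- data.frame(
  Class.1 = rnorm(30, mean = 70, sd = 10),
  Class.2 = rnorm(30, mean = 70, sd = 10),
  Class.3 = rnorm(30, mean = 70, sd = 10)
)

# Reshape from wide (one column per class) to long (one row per score);
# stack() returns the columns `values` and `ind`, renamed here for readability
long <- stack(class_demo)
names(long) <- c("Score", "Class")

# One-way ANOVA of score by class; rows with missing scores are dropped by default
fit <- aov(Score ~ Class, data = long)

# summary() returns a list whose first element is the ANOVA table
anova_table <- summary(fit)[[1]]
print(anova_table)
```

With 3 groups and 90 observations, the degrees of freedom are K - 1 = 2 for Class and N - K = 87 for Residuals, mirroring the 4 and 149 in the table above.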
Because the p-value (0.5536) is larger than the significance level of 0.05, we fail to reject the null hypothesis: there is not enough evidence to conclude that the mean test scores differ between the classes.
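The link between the F value and this decision can be sketched in base R by plugging the numbers from the ANOVA table into the F distribution. The variable names are hypothetical, and because the table reports F rounded to three decimals, the recomputed p-value should only approximately match the reported 0.5536.

```r
f_value   <- 0.759   # F value from the ANOVA table
df_factor <- 4       # Df for Class (K - 1)
df_error  <- 149     # Df for Residuals (N - K)
alpha     <- 0.05    # significance level chosen above

# Upper-tail probability of observing an F at least this large under the null hypothesis
p_value <- pf(f_value, df_factor, df_error, lower.tail = FALSE)

# Reject the null hypothesis only when the p-value falls below alpha
reject_null <- p_value < alpha

print(p_value)
print(reject_null)
```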