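This document provides examples of how to apply ANOVA in two statistical settings: comparing group means with a one-way ANOVA, and testing and comparing linear regression models with F-tests.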
We analyze energy consumption data from four U.S. regions: Northeast, Midwest, South, and West. The goal is to determine whether there are significant differences in mean energy consumption across these regions.
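# Energy consumption observations grouped by region (5, 6, 4, and 5 per region)
data <- data.frame(
  region = rep(c("Northeast", "Midwest", "South", "West"), times = c(5, 6, 4, 5)),
  consumption = c(13, 8, 11, 12, 11,
                  15, 10, 16, 11, 13, 10,
                  5, 11, 9, 5,
                  8, 10, 6, 5, 7)
)
# One-way ANOVA: does mean consumption differ by region?
anova_result <- aov(consumption ~ region, data = data)
summary(anova_result)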
##             Df Sum Sq Mean Sq F value  Pr(>F)
## region       3  105.9   35.30   6.325 0.00493 **
## Residuals   16   89.3    5.58
## ---
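## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value of 0.00493 is below the 0.05 significance level. We therefore reject the null hypothesis that all four regions have the same mean energy consumption: at least one regional mean differs from the others. The test does not say which ones.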
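To make the table less of a black box, the minimal sketch below recomputes it by hand. The F value is the between-group mean square divided by the within-group (residual) mean square. The p-value is the upper tail of an F distribution with k - 1 and n - k degrees of freedom. It reuses the same observations as above. The data frame is named energy here (a name chosen only for illustration) so that it does not mask base R's data() function.

```r
# Same observations as the example above, stored under an illustrative name
energy <- data.frame(
  region = rep(c("Northeast", "Midwest", "South", "West"), times = c(5, 6, 4, 5)),
  consumption = c(13, 8, 11, 12, 11,
                  15, 10, 16, 11, 13, 10,
                  5, 11, 9, 5,
                  8, 10, 6, 5, 7)
)

grand_mean  <- mean(energy$consumption)
group_means <- tapply(energy$consumption, energy$region, mean)    # one mean per region
group_sizes <- tapply(energy$consumption, energy$region, length)  # observations per region

# Between-group sum of squares: spread of the group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
# Within-group sum of squares: spread of each observation around its own group mean
ss_within <- sum((energy$consumption - group_means[as.character(energy$region)])^2)

k <- length(group_means)  # number of groups
n <- nrow(energy)         # total number of observations
df_between <- k - 1
df_within  <- n - k

# F = (between-group mean square) / (within-group mean square)
f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(SSB = ss_between, SSW = ss_within, F = f_value, p = p_value), 5)
```

The sums of squares reproduce the 105.9 and 89.3 in the aov() table. Dividing them by 3 and 16 degrees of freedom gives the mean squares whose ratio is the reported F value.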
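In this example, we use the Boston housing data set from the MASS package. We test whether at least one predictor significantly explains variability in the response variable, medv (median value of owner-occupied homes, in $1000s). The model regresses medv on lstat (percentage of lower-status population) and its square. The last line of summary() reports the overall F-test of the hypothesis that all slope coefficients are zero.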
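library(MASS)   # provides the Boston housing data
library(ISLR2)  # ships its own copy of Boston; loaded after MASS, its copy is the one found by name
attach(Boston)  # makes medv, lstat, etc. available by name
# Fit a multiple linear regression model: medv on lstat and its square
lm.fit2 <- lm(medv ~ lstat + I(lstat^2))
summary(lm.fit2)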
##
## Call:
## lm(formula = medv ~ lstat + I(lstat^2))
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -15.2834  -3.8313  -0.5295   2.3095  25.4148
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 42.862007   0.872084   49.15   <2e-16 ***
## lstat       -2.332821   0.123803  -18.84   <2e-16 ***
## I(lstat^2)   0.043547   0.003745   11.63   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.524 on 503 degrees of freedom
## Multiple R-squared: 0.6407, Adjusted R-squared: 0.6393
## F-statistic: 448.5 on 2 and 503 DF, p-value: < 2.2e-16
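The F-statistic is 448.5 on 2 and 503 degrees of freedom, with a p-value below 2.2e-16. This is strong evidence that at least one of the predictors, lstat or I(lstat^2), is associated with the response (medv).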
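The overall F-test in the last line of summary() is equivalent to comparing the fitted model against an intercept-only model. The following is a minimal sketch, assuming the MASS package is installed. It passes data = Boston explicitly instead of relying on attach(). The names null_model, full_model, and overall are illustrative.

```r
library(MASS)  # Boston housing data

# Full model from the text, fit with an explicit data argument
full_model <- lm(medv ~ lstat + I(lstat^2), data = Boston)
# Intercept-only model: predicts every medv by the overall mean
null_model <- lm(medv ~ 1, data = Boston)

# Comparing the two nested models tests H0: both slope coefficients are zero
overall <- anova(null_model, full_model)
print(overall)

# The statistic summary() stores: F value, numerator df, denominator df
summary(full_model)$fstatistic
```

The F value in the second row of the anova() table should match the 448.5 reported by summary(lm.fit2), since both test the same hypothesis.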
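This example uses an ANOVA test to compare a full model (lm.fit2) with a simpler, nested model (lm.fit) that drops the quadratic term. The null hypothesis is that both models fit the data equally well. The alternative is that the full model fits significantly better.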
# Fit a simpler linear regression model
lm.fit <- lm(medv ~ lstat)
# Compare the two models using ANOVA
anova(lm.fit, lm.fit2)
## Analysis of Variance Table
##
## Model 1: medv ~ lstat
## Model 2: medv ~ lstat + I(lstat^2)
##   Res.Df   RSS Df Sum of Sq     F    Pr(>F)
## 1    504 19472
## 2    503 15347  1    4125.1 135.2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
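The F-statistic of 135.2, with a p-value below 2.2e-16, means we reject the null hypothesis. Adding the quadratic term (I(lstat^2)) significantly improves the fit, reducing the residual sum of squares from 19472 to 15347.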
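The F statistic in the comparison table can be reproduced from the two residual sums of squares with the partial F formula: F = ((RSS_reduced - RSS_full) / (df_reduced - df_full)) / (RSS_full / df_full). The following is a minimal sketch, assuming MASS is installed. It refits both models with data = Boston, and the variable names are illustrative.

```r
library(MASS)  # Boston housing data (506 rows)

reduced <- lm(medv ~ lstat, data = Boston)
full    <- lm(medv ~ lstat + I(lstat^2), data = Boston)

rss_reduced <- deviance(reduced)     # residual sum of squares of the smaller model
rss_full    <- deviance(full)        # residual sum of squares of the larger model
df_reduced  <- df.residual(reduced)  # 506 - 2 = 504
df_full     <- df.residual(full)     # 506 - 3 = 503

# Partial F: extra sum of squares per extra parameter,
# divided by the full model's residual mean square
f_partial <- ((rss_reduced - rss_full) / (df_reduced - df_full)) / (rss_full / df_full)
p_partial <- pf(f_partial, df_reduced - df_full, df_full, lower.tail = FALSE)

c(F = f_partial, p = p_partial)
```

The two models differ by a single term, so this F equals the square of the t value for I(lstat^2) in summary(lm.fit2), up to the rounding in the printed output.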