Introduction

This document provides examples of how to apply ANOVA in different statistical settings:

  1. One-way ANOVA test: Comparing the means of multiple groups.
  2. Testing the overall significance of a regression model.
  3. Comparing nested regression models using ANOVA.

Example 1: One-Way ANOVA (Testing Mean Differences Between Groups)

Data Overview

We analyze energy consumption data from four U.S. regions: Northeast, Midwest, South, and West. The goal is to determine whether there are significant differences in mean energy consumption across these regions.

Dataset

data <- data.frame(
  region = rep(c("Northeast", "Midwest", "South", "West"), times = c(5, 6, 4, 5)),
  consumption = c(13, 8, 11, 12, 11,
                  15, 10, 16, 11, 13, 10,
                  5, 11, 9, 5,
                  8, 10, 6, 5, 7)
)
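ANOVA compares group means, so it helps to look at the size and mean of each group first. Below is a minimal sketch using base R's tapply(). The names group_n and group_mean are illustrative only.

```r
# Rebuild the energy-consumption data from above
data <- data.frame(
  region = rep(c("Northeast", "Midwest", "South", "West"), times = c(5, 6, 4, 5)),
  consumption = c(13, 8, 11, 12, 11,
                  15, 10, 16, 11, 13, 10,
                  5, 11, 9, 5,
                  8, 10, 6, 5, 7)
)

# Number of observations and mean consumption per region
# (tapply() orders the groups alphabetically by region name)
group_n    <- tapply(data$consumption, data$region, length)
group_mean <- tapply(data$consumption, data$region, mean)
group_n
group_mean
```

The group means are 11 (Northeast), 12.5 (Midwest), 7.5 (South) and 7.2 (West). The test asks whether differences this large could plausibly arise from within-region variation alone.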

Performing One-Way ANOVA in R

anova_result <- aov(consumption ~ region, data = data)
summary(anova_result)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## region       3  105.9   35.30   6.325 0.00493 **
## Residuals   16   89.3    5.58                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
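Reproducing the table by hand shows where each number comes from:

  • The between-group sum of squares measures how far the group means sit from the grand mean.
  • The within-group sum of squares measures scatter inside each group.
  • F is the ratio of their mean squares.

Below is a sketch in base R. The variable names are illustrative.

```r
data <- data.frame(
  region = rep(c("Northeast", "Midwest", "South", "West"), times = c(5, 6, 4, 5)),
  consumption = c(13, 8, 11, 12, 11,
                  15, 10, 16, 11, 13, 10,
                  5, 11, 9, 5,
                  8, 10, 6, 5, 7)
)

y     <- data$consumption
g     <- factor(data$region)
grand <- mean(y)

# Between-group SS: sum over groups of n_j * (group mean - grand mean)^2
n_j    <- tapply(y, g, length)
mean_j <- tapply(y, g, mean)
ss_between <- sum(n_j * (mean_j - grand)^2)

# Within-group (residual) SS: deviations from each observation's own group mean
ss_within <- sum((y - ave(y, g))^2)

# Degrees of freedom: k - 1 between groups, N - k within groups
k <- nlevels(g)
N <- length(y)
df_between <- k - 1
df_within  <- N - k

# F = ratio of mean squares; p-value = upper tail of F(k - 1, N - k)
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)
c(SSB = ss_between, SSW = ss_within, F = f_stat, p = p_value)
```

This gives SSB = 105.9, SSW = 89.3 and F of about 6.325 on 3 and 16 degrees of freedom, matching the aov() output.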

Interpretation

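  • The null hypothesis states that all four regional means are equal; the alternative is that at least one differs.
  • If the p-value is small (typically < 0.05), we reject the null hypothesis. Here F = 6.325 on 3 and 16 degrees of freedom with p = 0.0049, so we conclude that at least one region has a significantly different mean energy consumption. The test does not say which regions differ.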
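A significant F only shows that the means are not all equal. A common follow-up is Tukey's honest significant difference (HSD) procedure. This is not part of the original analysis. Base R provides it as TukeyHSD(), which compares every pair of regions while controlling the family-wise error rate. Below is a minimal sketch; fit and tukey are illustrative names.

```r
data <- data.frame(
  region = rep(c("Northeast", "Midwest", "South", "West"), times = c(5, 6, 4, 5)),
  consumption = c(13, 8, 11, 12, 11,
                  15, 10, 16, 11, 13, 10,
                  5, 11, 9, 5,
                  8, 10, 6, 5, 7)
)
fit <- aov(consumption ~ region, data = data)

# All pairwise differences in regional means, with 95% confidence intervals
# and p-values adjusted for multiple comparisons
tukey <- TukeyHSD(fit, "region", conf.level = 0.95)
tukey
```

Each row of tukey$region gives:

  • a difference in means (for example West-Midwest, 7.2 - 12.5 = -5.3);
  • its confidence interval;
  • an adjusted p-value.

Pairs whose interval excludes zero differ at the 5% level.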

Example 2: Testing Overall Model Significance in Regression

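In this example, we use the Boston housing dataset from the MASS package. The goal is to test whether at least one predictor significantly explains variability in the response variable, medv (median value of owner-occupied homes, in $1000s). The predictors are lstat (percentage of the population with lower status) and its square.

ISLR2, also loaded below, ships its own copy of Boston, which masks the MASS one. The medv and lstat columns are the same in both.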

library(MASS)
library(ISLR2)
attach(Boston)

# Fit a linear regression with linear and quadratic terms in lstat
lm.fit2 <- lm(medv ~ lstat + I(lstat^2))
summary(lm.fit2)
## 
## Call:
## lm(formula = medv ~ lstat + I(lstat^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.2834  -3.8313  -0.5295   2.3095  25.4148 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 42.862007   0.872084   49.15   <2e-16 ***
## lstat       -2.332821   0.123803  -18.84   <2e-16 ***
## I(lstat^2)   0.043547   0.003745   11.63   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.524 on 503 degrees of freedom
## Multiple R-squared:  0.6407, Adjusted R-squared:  0.6393 
## F-statistic: 448.5 on 2 and 503 DF,  p-value: < 2.2e-16
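The overall F-statistic and its p-value can also be extracted from the fitted model. The F value can even be rebuilt from R-squared alone. Below is a minimal sketch. It assumes the MASS version of Boston and passes data = Boston instead of relying on attach(), so it runs on its own. The names fit, fs and f_from_r2 are illustrative.

```r
library(MASS)

fit <- lm(medv ~ lstat + I(lstat^2), data = Boston)
s   <- summary(fit)

# summary.lm stores the overall F with its df as a named vector: value, numdf, dendf
fs <- s$fstatistic
p_overall <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)

# The same F from R^2: F = (R^2 / k) / ((1 - R^2) / (n - k - 1))
r2 <- s$r.squared
k  <- 2              # non-intercept terms: lstat and I(lstat^2)
n  <- nrow(Boston)   # 506 observations
f_from_r2 <- (r2 / k) / ((1 - r2) / (n - k - 1))

c(F_summary = fs[["value"]], F_from_R2 = f_from_r2, p_value = p_overall)
```

With R-squared = 0.6407, n = 506 and 2 predictors, (0.6407 / 2) / (0.3593 / 503) is about 448.5, the value reported above.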

Interpretation

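  • The null hypothesis states that all regression coefficients except the intercept are zero; here, that the coefficients on lstat and I(lstat^2) are both zero.
  • The F-statistic tests whether the model explains a significant proportion of the variance in the response variable (medv), compared with a model containing only an intercept.
  • Here F = 448.5 on 2 and 503 degrees of freedom with p < 2.2e-16, so we reject the null hypothesis: at least one predictor is significantly related to medv.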
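The null hypothesis of the overall F-test corresponds to an intercept-only model, which predicts every medv by its sample mean. Comparing that model with the quadratic model through anova() reproduces the same F-test. This connects the overall test to the nested-model comparison in Example 3. Below is a sketch. The names fit0, fit2 and cmp are illustrative, and data = Boston replaces attach() so the block runs on its own.

```r
library(MASS)

# Intercept-only ("null") model: every fitted value is mean(medv)
fit0 <- lm(medv ~ 1, data = Boston)
fit2 <- lm(medv ~ lstat + I(lstat^2), data = Boston)

# Nested-model F-test of fit2 against fit0; its F equals summary(fit2)'s F-statistic
cmp <- anova(fit0, fit2)
cmp
```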

Example 3: Comparing Nested Regression Models

This example uses an ANOVA F-test to compare the full model from Example 2 (lm.fit2) with a simpler model nested within it (lm.fit). The simpler model drops the quadratic term.

# Fit a simpler linear regression model
lm.fit <- lm(medv ~ lstat)

# Compare the two models using ANOVA
anova(lm.fit, lm.fit2)
## Analysis of Variance Table
## 
## Model 1: medv ~ lstat
## Model 2: medv ~ lstat + I(lstat^2)
##   Res.Df   RSS Df Sum of Sq     F    Pr(>F)    
## 1    504 19472                                 
## 2    503 15347  1    4125.1 135.2 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
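The F value in this table follows from the two residual sums of squares. It is the drop in RSS per added parameter, divided by the residual mean square of the full model. Because only one term is added, F also equals the square of the t statistic for I(lstat^2) in Example 2 (11.63 squared is about 135.2). Below is a minimal sketch that checks both. The names fit1, fit2 and f_nested are illustrative, and data = Boston replaces attach().

```r
library(MASS)

fit1 <- lm(medv ~ lstat, data = Boston)
fit2 <- lm(medv ~ lstat + I(lstat^2), data = Boston)

# Residual sum of squares (deviance() for lm) and residual df of each model
rss1 <- deviance(fit1); df1 <- df.residual(fit1)   # 504 residual df
rss2 <- deviance(fit2); df2 <- df.residual(fit2)   # 503 residual df

# Partial F: (RSS drop per extra parameter) / (full model's residual mean square)
f_nested <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_nested <- pf(f_nested, df1 - df2, df2, lower.tail = FALSE)

# With one extra term, F equals the squared t statistic of that term
t_quad <- coef(summary(fit2))["I(lstat^2)", "t value"]
c(F = f_nested, t_squared = t_quad^2, p = p_nested)
```

As arithmetic: (19472 - 15347) / (15347 / 503) is about 135.2.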

Interpretation

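  • The null hypothesis states that the additional term in the full model, I(lstat^2), does not improve the fit; that is, its coefficient is zero.
  • If the p-value is small, we reject the null hypothesis. Here F = 135.2 on 1 and 503 degrees of freedom with p < 2.2e-16. Adding the quadratic term lowers the residual sum of squares from 19472 to 15347, so the full model provides a significantly better fit.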

Conclusion