Introduction

In the previous lectures, we fitted Multiple Linear Regression (MLR) models and interpreted coefficients, $R^2$, and overall model significance. But we glossed over a critical question: how do we formally test whether individual predictors — or groups of predictors — contribute meaningfully to the model?

This lecture focuses on hypothesis testing within the MLR framework. We will cover:

Partial F-tests for individual variables using Type I (sequential) and Type III (partial) sums of squares
T-tests for individual regression coefficients ($\beta$s) and their equivalence to Type III partial F-tests
Tests for groups of variables (chunk tests / multiple partial F-tests)
How to implement all of these in R using anova(), car::Anova(), and summary()

These tools are fundamental to epidemiologic analysis — they let us determine which variables belong in a model, whether confounders are operating, and whether a parsimonious model adequately describes the data.

Setup and Data

library(tidyverse)
library(haven)
library(janitor)
library(knitr)
library(kableExtra)
library(plotly)
library(broom)
library(ggeffects)
library(ggstats)
library(gtsummary)
library(GGally)
library(car)
library(lmtest)
library(corrplot)

The BRFSS 2020 Dataset

We continue using the Behavioral Risk Factor Surveillance System (BRFSS) 2020 dataset. Our research question remains:

What individual, behavioral, and socioeconomic factors are associated with the number of mentally unhealthy days in the past 30 days among U.S. adults?

brfss_mlr <- readRDS("brfss_mlr.rds")

glimpse(brfss_mlr)

## Rows: 5,000
## Columns: 9
## $ menthlth_days <dbl> 15, 0, 0, 20, 0, 7, 0, 0, 0, 0, 15, 0, 0, 0, 10, 0, 0, 0…
## $ physhlth_days <dbl> 0, 30, 0, 0, 0, 0, 15, 0, 0, 0, 30, 0, 28, 0, 1, 0, 0, 0…
## $ sleep_hrs     <dbl> 6, 8, 8, 8, 5, 6, 7, 7, 7, 7, 6, 7, 6, 6, 6, 6, 6, 8, 4,…
## $ age           <dbl> 27, 70, 74, 71, 29, 28, 76, 43, 67, 47, 60, 31, 60, 74, …
## $ income_cat    <dbl> 8, 5, 1, 6, 8, 4, 8, 8, 1, 1, 8, 7, 8, 6, 1, 3, 5, 6, 4,…
## $ income_f      <fct> >$75k, $25-35k, <$10k, $35-50k, >$75k, $20-25k, >$75k, >…
## $ sex           <fct> Male, Female, Male, Female, Male, Female, Male, Male, Ma…
## $ exercise      <fct> Yes, No, Yes, Yes, Yes, Yes, No, Yes, No, Yes, No, Yes, …
## $ bmi           <dbl> 27.89, 23.81, 25.54, 33.61, 29.03, 28.19, 35.26, 29.38, …

brfss_mlr %>%
  select(menthlth_days, physhlth_days, sleep_hrs, age, income_cat, sex, exercise) %>%
  tbl_summary(
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    label = list(
      menthlth_days ~ "Mentally Unhealthy Days (past 30)",
      physhlth_days ~ "Physically Unhealthy Days (past 30)",
      sleep_hrs     ~ "Sleep Hours per Night",
      age           ~ "Age (years)",
      income_cat    ~ "Income Category (1–8)",
      sex           ~ "Sex",
      exercise      ~ "Any Exercise (past 30 days)"
    )
  ) %>%
  bold_labels()

Characteristic	N = 5,000¹
Mentally Unhealthy Days (past 30)	4 (8)
Physically Unhealthy Days (past 30)	3 (8)
Sleep Hours per Night	7.06 (1.35)
Age (years)	54 (17)
Income Category (1–8)
1	190 (3.8%)
2	169 (3.4%)
3	312 (6.2%)
4	434 (8.7%)
5	489 (9.8%)
6	683 (14%)
7	841 (17%)
8	1,882 (38%)
Sex
Male	2,331 (47%)
Female	2,669 (53%)
Any Exercise (past 30 days)	3,874 (77%)
¹ Mean (SD); n (%)

Fitting the Full Model

We will work with the full multiple regression model from the previous lecture:

\[\text{MentalHealthDays}_i = \beta_0 + \beta_1 \cdot \text{PhysHealth}_i + \beta_2 \cdot \text{Sleep}_i + \beta_3 \cdot \text{Age}_i + \beta_4 \cdot \text{Income}_i + \beta_5 \cdot \text{Sex}_i + \beta_6 \cdot \text{Exercise}_i + \varepsilon_i\]

m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
             data = brfss_mlr)

tidy(m_full, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1. Full Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1. Full Model Coefficients
Term	Estimate	Std. Error	t-statistic	p-value	95% CI Lower	95% CI Upper
(Intercept)	12.4755	0.7170	17.4006	0.0000	11.0699	13.8810
physhlth_days	0.2917	0.0136	21.4779	0.0000	0.2650	0.3183
sleep_hrs	-0.5092	0.0753	-6.7574	0.0000	-0.6569	-0.3614
age	-0.0823	0.0059	-13.8724	0.0000	-0.0939	-0.0707
income_cat	-0.3213	0.0520	-6.1778	0.0000	-0.4233	-0.2194
sexFemale	1.2451	0.2023	6.1535	0.0000	0.8484	1.6417
exerciseYes	-0.3427	0.2531	-1.3537	0.1759	-0.8389	0.1536

glance(m_full) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) %>%
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
  mutate(
    Value = round(Value, 4),
    Metric = dplyr::recode(Metric,
      "r.squared"     = "R²",
      "adj.r.squared" = "Adjusted R²",
      "sigma"         = "Root MSE (σ̂)",
      "statistic"     = "F-statistic (overall)",
      "p.value"       = "p-value (overall F-test)",
      "df"            = "Model df (p)",
      "df.residual"   = "Residual df (n − p − 1)",
      "nobs"          = "n (observations)"
    )
  ) %>%
  kable(caption = "Table 2. Overall Model Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2. Overall Model Summary
Metric	Value
R²	0.1569
Adjusted R²	0.1559
Root MSE (σ̂)	7.090

Part 1: Guided Practice

1. Review: The Overall F-Test

Before we test individual predictors, let’s recall the overall F-test, which tests whether any of the predictors in the model are useful:

\[H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{(no predictor is useful)}\] \[H_1: \text{At least one } \beta_j \neq 0\]

The test statistic is:

\[F = \frac{SSR/p}{SSE/(n - p - 1)} = \frac{MSR}{MSE}\]

where $SSR$ is the regression (model) sum of squares, $SSE$ is the error (residual) sum of squares, $p$ is the number of predictors, and $n$ is the sample size.

# Extract the overall F-test from summary
f_stat <- summary(m_full)$fstatistic
cat("F-statistic:", round(f_stat[1], 2), "\n")

## F-statistic: 154.9

cat("Numerator df (p):", f_stat[2], "\n")

## Numerator df (p): 6

cat("Denominator df (n-p-1):", f_stat[3], "\n")

## Denominator df (n-p-1): 4993

cat("p-value:", format.pval(pf(f_stat[1], f_stat[2], f_stat[3], lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

Conclusion: We reject $H_0$ and conclude that at least one predictor is significantly associated with mental health days. But which ones? That’s what the rest of this lecture addresses.

2. Type I Sums of Squares (Sequential)

2.1 Concept

Type I sums of squares are sequential — each variable is tested for its contribution to the model given the variables that were entered before it, but not the variables entered after it.

For a model with three predictors entered in order $X_1, X_2, X_3$:

Variable	Type I SS	What It Tests
$X_1$	$SS(X_1)$	Effect of $X_1$ alone
$X_2$	$SS(X_2 \mid X_1)$	Additional effect of $X_2$ given $X_1$ is in the model
$X_3$	$SS(X_3 \mid X_1, X_2)$	Additional effect of $X_3$ given $X_1$ and $X_2$ are in the model

Key property: Type I SS depend on the order in which variables are entered. Changing the order changes the Type I SS for all but the last variable.

Key property: The Type I SS always add up to the total regression SS:

\[SSR(\text{full}) = SS(X_1) + SS(X_2 \mid X_1) + SS(X_3 \mid X_1, X_2)\]

2.2 Computing Type I SS in R

In R, the anova() function on a single model produces Type I (sequential) sums of squares:

anova_type1 <- anova(m_full)

anova_type1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 3. Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables are tested in the order they appear in the model formula.")

Table 3. Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
physhlth_days	1	29474.75	29474.75	586.34	0.00
sleep_hrs	1	3323.04	3323.04	66.11	0.00
age	1	9261.47	9261.47	184.24	0.00
income_cat	1	2648.76	2648.76	52.69	0.00
sex	1	1918.39	1918.39	38.16	0.00
exercise	1	92.12	92.12	1.83	0.18
Residuals	4993	250992.80	50.27	NA	NA
Note:
Variables are tested in the order they appear in the model formula.

2.3 Interpreting the Type I ANOVA Table

Let’s verify that the sequential sums of squares add up to the total model SS:

type1_ss <- anova_type1$`Sum Sq`
ssr_full <- sum(type1_ss[1:6])  # Model SS (sum of all predictor Type I SS)
sse_full <- type1_ss[7]          # Residual SS

cat("SSR (Model):", round(ssr_full, 2), "\n")

## SSR (Model): 46718.54

cat("SSE (Residual):", round(sse_full, 2), "\n")

## SSE (Residual): 250992.8

cat("SSY (Total):", round(ssr_full + sse_full, 2), "\n")

## SSY (Total): 297711.3

cat("R² = SSR/SSY =", round(ssr_full / (ssr_full + sse_full), 4), "\n")

## R² = SSR/SSY = 0.1569

Important note: The F-statistics and p-values reported by anova() divide each Type I SS by the MSE of the full model. For the partial F-test of the last variable entered, this is correct. But for partial F-tests of variables entered earlier using Type I SS, the proper MSE should come from the reduced model (see Section 4).

3. Type III Sums of Squares (Partial)

3.1 Concept

Type III sums of squares are partial — each variable is tested for its contribution to the model given all other variables are already in the model.

For a model with three predictors:

Variable	Type III SS	What It Tests
$X_1$	$SS(X_1 \mid X_2, X_3)$	Effect of $X_1$ given $X_2$ and $X_3$ are in the model
$X_2$	$SS(X_2 \mid X_1, X_3)$	Effect of $X_2$ given $X_1$ and $X_3$ are in the model
$X_3$	$SS(X_3 \mid X_1, X_2)$	Effect of $X_3$ given $X_1$ and $X_2$ are in the model

Key property: Type III SS do not depend on the order of variable entry. They are the same regardless of how you specify the model formula.

Key property: Type III SS generally do not add up to the total SSR (unless predictors are uncorrelated).

Key property: The Type I SS and Type III SS are equal for the last variable entered in the model — because by the time we reach the last variable, all other variables are already in the model.

3.2 Comparing Type I and Type III

# Type III using car::Anova()
anova_type3 <- Anova(m_full, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age", "income_cat", "sex", "exercise"),
  `Type I SS` = round(anova_type1$`Sum Sq`[1:6], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[2:7], 1)
)

comparison %>%
  kable(caption = "Table 4. Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.")

Table 4. Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS
physhlth_days	29474.7	23189.1
sleep_hrs	3323.0	2295.4
age	9261.5	9673.9
income_cat	2648.8	1918.5
sex	1918.4	1903.5
exercise	92.1	92.1
Note:
Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.

Notice: The Type I and Type III SS for exercise (the last variable) are identical. The values for other variables differ because Type I SS does not adjust for variables entered later in the model.

3.3 Type III SS in R

anova_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 5. Type III (Partial) Sums of Squares — car::Anova()",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.")

Table 5. Type III (Partial) Sums of Squares — car::Anova()
Source	Sum of Sq	df	F value	p-value
(Intercept)	15220.3947	1	302.7793	0.0000
physhlth_days	23189.1197	1	461.3012	0.0000
sleep_hrs	2295.4257	1	45.6629	0.0000
age	9673.9387	1	192.4437	0.0000
income_cat	1918.5269	1	38.1653	0.0000
sex	1903.4532	1	37.8654	0.0000
exercise	92.1241	1	1.8326	0.1759
Residuals	250992.8036	4993	NA	NA
Note:
Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.

4. Partial F-Tests for Individual Variables

4.1 The General Framework

Suppose we want to test whether a single variable $X^*$ contributes to the prediction of $Y$ given that $X_1, \ldots, X_p$ are already in the model.

\[H_0: \beta^* = 0 \quad \text{($X^*$ does not add to the model given the other variables)}\] \[H_1: \beta^* \neq 0\]

The partial F-test compares a full model (containing $X^*$) to a reduced model (excluding $X^*$):

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})} = \frac{SS(X^* \mid X_1, \ldots, X_p) / 1}{MSE(\text{full})}\]

Under $H_0$, $F \sim F_{1, \, n-p-2}$.

4.2 Example: Testing Whether `sleep_hrs` Adds to the Model

Let’s test: does sleep_hrs contribute to predicting mental health days, given that all other predictors are in the model?

# Reduced model (excludes sleep_hrs)
m_no_sleep <- lm(menthlth_days ~ physhlth_days + age + income_cat + sex + exercise,
                 data = brfss_mlr)

# Full model (includes sleep_hrs)
m_full <- lm(menthlth_days ~ physhlth_days + age + income_cat + sex + exercise + sleep_hrs,
             data = brfss_mlr)

# Compare using anova() — this performs the partial F-test
anova(m_no_sleep, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no sleep_hrs)", "Full (with sleep_hrs)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 6. Partial F-Test: Does sleep_hrs add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 6. Partial F-Test: Does sleep_hrs add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (no sleep_hrs)	4994	253288.2	NA	NA	NA	NA
Full (with sleep_hrs)	4993	250992.8	1	2295.426	45.6629	0

Manual computation: We can also compute this by hand:

ssr_full    <- sum(anova(m_full)$`Sum Sq`[1:6])
ssr_reduced <- sum(anova(m_no_sleep)$`Sum Sq`[1:5])
mse_full    <- anova(m_full)$`Mean Sq`[7]  # MSE of full model

F_stat <- (ssr_full - ssr_reduced) / 1 / mse_full

cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 46718.54

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 44423.11

cat("SS(sleep | others):", round(ssr_full - ssr_reduced, 2), "\n")

## SS(sleep | others): 2295.43

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 50.27

cat("F-statistic:", round(F_stat, 4), "\n")

## F-statistic: 45.6629

cat("Critical value F(1, 4993, 0.95):", round(qf(0.95, 1, 4993), 4), "\n")

## Critical value F(1, 4993, 0.95): 3.8433

cat("p-value:", format.pval(pf(F_stat, 1, 4993, lower.tail = FALSE)), "\n")

## p-value: 1.5652e-11

Conclusion: Since the p-value < 0.05, we reject $H_0$ and conclude that sleep hours adds statistically significant information to the prediction of mental health days, given that physical health, age, income, sex, and exercise are already in the model.

4.3 Example: Testing Whether `exercise` Adds to the Model

m_no_exercise <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex,
                    data = brfss_mlr)

anova(m_no_exercise, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no exercise)", "Full (with exercise)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 7. Partial F-Test: Does exercise add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 7. Partial F-Test: Does exercise add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (no exercise)	4994	251084.9	NA	NA	NA	NA
Full (with exercise)	4993	250992.8	1	92.1241	1.8326	0.1759

Conclusion: With p > 0.05, we fail to reject $H_0$. Exercise does not add statistically significant predictive information beyond what is already captured by the other predictors. This could be because the effect of exercise on mental health is partially mediated through physical health or sleep, which are already in the model.

5. Using Type I SS for Partial F-Tests

5.1 When Type I SS Are Correct

The Type I SS can be used directly for partial F-tests if the variable of interest is the last one entered (or if you only want to test variables conditional on those entered before them in the sequence).

From our Type I ANOVA table:

cat("Type I SS for exercise (entered last):", round(anova_type1$`Sum Sq`[6], 2), "\n")

## Type I SS for exercise (entered last): 92.12

cat("Type III SS for exercise:", round(anova_type3$`Sum Sq`[7], 2), "\n")

## Type III SS for exercise: 92.12

cat("These should be equal:",
    all.equal(anova_type1$`Sum Sq`[6], anova_type3$`Sum Sq`[7]), "\n")

## These should be equal: TRUE

5.2 Important Caveat

The F-statistics in the anova() output divide each Type I SS by the MSE of the full model. For a rigorous Type I SS partial F-test for variables entered earlier, the denominator should technically be the MSE from a model containing only the variables up to and including the one being tested. In practice, with large samples, this distinction matters little, and most analysts use the Type III approach instead.

6. Using Type III SS for Partial F-Tests

6.1 The Preferred Approach

Type III SS are generally preferred in epidemiologic analysis because they answer the question we most often care about: does this variable contribute to the model given all other variables?

The car::Anova() function with type = "III" provides the correct F-tests and p-values:

Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  filter(Source != "(Intercept)") %>%
  mutate(
    Significant = ifelse(`Pr(>F)` < 0.05, "Yes", "No"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 8. Type III Partial F-Tests for Each Predictor",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value", "Significant (α = 0.05)")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  column_spec(6, bold = TRUE)

Table 8. Type III Partial F-Tests for Each Predictor
Source	Sum of Sq	df	F value	p-value	Significant (α = 0.05)
physhlth_days	23189.1197	1	461.3012	0.0000	Yes
age	9673.9387	1	192.4437	0.0000	Yes
income_cat	1918.5269	1	38.1653	0.0000	Yes
sex	1903.4532	1	37.8654	0.0000	Yes
exercise	92.1241	1	1.8326	0.1759	No
sleep_hrs	2295.4257	1	45.6629	0.0000	Yes
Residuals	250992.8036	4993	NA	NA	NA

6.2 Interpreting Each Test

Each row in the Type III table tests:

\[H_0: \beta_j = 0 \quad \text{given all other predictors are in the model}\]

For our model:

physhlth_days: Significant — physical health adds to the prediction of mental health days, controlling for sleep, age, income, sex, and exercise.
sleep_hrs: Significant — sleep adds unique information beyond physical health, age, income, sex, and exercise.
age: Significant — age contributes independently to mental health days prediction.
income_cat: Significant — income is independently associated with mental health.
sex: Significant — sex differences in mental health persist after adjusting for all other covariates.
exercise: Not significant — exercise does not add information beyond what the other predictors already explain.

7. T-Tests for Individual $\beta$s

7.1 Concept

We can also conduct t-tests for individual regression coefficients. The test for $\beta_j$ is:

\[H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0\]

\[t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \sim t_{n-p-2} \quad \text{under } H_0\]

We reject $H_0$ if $|t| > t_{n-p-2, 1-\alpha/2}$.

7.2 Equivalence to Type III Partial F-Tests

A fundamental result in regression theory: the t-test for $\beta_j$ is exactly equivalent to the Type III partial F-test for $X_j$.

If $t \sim t_k$, then $t^2 \sim F_{1,k}$.

Therefore: \[t^2_j = F_j \quad \text{(Type III partial F-test)}\]

Both tests produce the same p-value.

7.3 Demonstrating the Equivalence

# Get t-statistics from summary()
t_stats <- tidy(m_full) %>%
  filter(term != "(Intercept)") %>%
  select(term, t_stat = statistic, t_pvalue = p.value)

# Get F-statistics from Type III
f_stats <- Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, f_stat = `F value`, f_pvalue = `Pr(>F)`)

# Compare
left_join(t_stats, f_stats, by = "term") %>%
  mutate(
    `t²` = round(t_stat^2, 4),
    f_stat = round(f_stat, 4),
    `t² = F?` = ifelse(abs(t_stat^2 - f_stat) < 0.001, "✓", "✗"),
    t_pvalue = round(t_pvalue, 6),
    f_pvalue = round(f_pvalue, 6),
    `p-values equal?` = ifelse(abs(t_pvalue - f_pvalue) < 0.0001, "✓", "✗")
  ) %>%
  select(term, t_stat = t_stat, `t²`, `F (Type III)` = f_stat, `t² = F?`,
         `p (t-test)` = t_pvalue, `p (F-test)` = f_pvalue, `p-values equal?`) %>%
  mutate(t_stat = round(t_stat, 4)) %>%
  kable(caption = "Table 9. Equivalence of T-Tests and Type III Partial F-Tests") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 9. Equivalence of T-Tests and Type III Partial F-Tests
term	t_stat	t²	F (Type III)	t² = F?	p (t-test)	p (F-test)	p-values equal?
physhlth_days	21.4779	461.3012	461.3012	✓	0.000000	0	✓
age	-13.8724	192.4437	192.4437	✓	0.000000	0	✓
income_cat	-6.1778	38.1653	38.1653	✓	0.000000	0	✓
sexFemale	6.1535	37.8654	NA	NA	0.000000	NA	NA
exerciseYes	-1.3537	1.8326	NA	NA	0.175879	NA	NA
sleep_hrs	-6.7574	45.6629	45.6629	✓	0.000000	0	✓

Takeaway: summary() gives you t-tests; car::Anova(type = "III") gives you the equivalent F-tests. Both answer the same question: “Does this variable add to the model given all other variables?” You can use either one in practice.

8. Tests for Groups of Variables (Chunk Tests)

8.1 Concept

Sometimes we want to test whether a group of variables collectively adds to the model. This is called a chunk test or multiple partial F-test.

\[H_0: \beta_{k+1} = \beta_{k+2} = \cdots = \beta_p = 0 \quad \text{(the group adds nothing)}\]

The test statistic compares a full model to a reduced model:

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

\[F \sim F_{(p_{\text{full}} - p_{\text{reduced}}), \, (n - p_{\text{full}} - 1)} \quad \text{under } H_0\]

8.2 Example: Do Demographic Variables Add to the Model?

Suppose we want to test whether adding age, income_cat, sex, and exercise as a group improves prediction beyond physhlth_days and sleep_hrs alone.

\[H_0: \beta_{\text{age}} = \beta_{\text{income}} = \beta_{\text{sex}} = \beta_{\text{exercise}} = 0\]

# Reduced model: only health behaviors
m_health_only <- lm(menthlth_days ~ physhlth_days + sleep_hrs, data = brfss_mlr)

# Full model: health behaviors + demographics
m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
             data = brfss_mlr)

# Chunk test
anova(m_health_only, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep only)", "Full (+ demographics)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 10. Chunk Test: Do demographic variables collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 10. Chunk Test: Do demographic variables collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth + sleep only)	4997	264913.6	NA	NA	NA	NA
Full (+ demographics)	4993	250992.8	4	13920.75	69.2314	0

Manual computation:

ssr_full    <- sum(anova(m_full)$`Sum Sq`[1:6])
ssr_reduced <- sum(anova(m_health_only)$`Sum Sq`[1:2])
mse_full    <- anova(m_full)$`Mean Sq`[7]
df_diff     <- 4  # 4 additional variables

F_chunk <- ((ssr_full - ssr_reduced) / df_diff) / mse_full

cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 46718.54

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 32797.79

cat("Difference:", round(ssr_full - ssr_reduced, 2), "\n")

## Difference: 13920.75

cat("df (number of added variables):", df_diff, "\n")

## df (number of added variables): 4

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 50.27

cat("F-statistic:", round(F_chunk, 4), "\n")

## F-statistic: 69.2314

cat("Critical value F(4, 4993, 0.95):", round(qf(0.95, df_diff, 4993), 4), "\n")

## Critical value F(4, 4993, 0.95): 2.3737

cat("p-value:", format.pval(pf(F_chunk, df_diff, 4993, lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

Conclusion: The group of demographic variables (age, income, sex, exercise) collectively adds statistically significant information to the model, even though exercise alone was not significant. This is important — individual non-significance does not mean the variable isn’t useful as part of a group.

8.3 Example: Do Sleep and Age Together Add Beyond Physical Health?

m_phys_only <- lm(menthlth_days ~ physhlth_days, data = brfss_mlr)
m_phys_sleep_age <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, data = brfss_mlr)

anova(m_phys_only, m_phys_sleep_age) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth only)", "Full (+ sleep + age)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 11. Chunk Test: Do sleep_hrs and age together add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 11. Chunk Test: Do sleep_hrs and age together add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth only)	4998	268236.6	NA	NA	NA	NA
Full (+ sleep + age)	4996	255652.1	2	12584.51	122.9645	0

8.4 Important Note on Type I SS and Chunk Tests

For a chunk test using Type I SS, the variables being tested must be entered sequentially at the end of the model formula. This is because Type I SS for a group of consecutive variables at the end sum to the correct chunk test numerator.

Type III SS cannot be used for chunk tests directly, because each Type III SS adjusts for all variables (including others in the group being tested).

9. The Order of Variable Entry Matters (for Type I)

To illustrate how Type I SS change with variable ordering, let’s fit the same model with a different variable order:

# Original order
m_order1 <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
               data = brfss_mlr)

# Reversed order
m_order2 <- lm(menthlth_days ~ exercise + sex + income_cat + age + sleep_hrs + physhlth_days,
               data = brfss_mlr)

# Compare Type I SS
type1_order1 <- anova(m_order1) %>%
  as.data.frame() %>%
  rownames_to_column("Variable") %>%
  filter(Variable != "Residuals") %>%
  select(Variable, `Type I SS (Order 1)` = `Sum Sq`)

type1_order2 <- anova(m_order2) %>%
  as.data.frame() %>%
  rownames_to_column("Variable") %>%
  filter(Variable != "Residuals") %>%
  select(Variable, `Type I SS (Order 2)` = `Sum Sq`)

# Also get Type III for comparison (order-invariant)
type3_all <- Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("Variable") %>%
  filter(!Variable %in% c("(Intercept)", "Residuals")) %>%
  select(Variable, `Type III SS` = `Sum Sq`)

# Combine
left_join(type1_order1, type1_order2, by = "Variable") %>%
  left_join(type3_all, by = "Variable") %>%
  mutate(across(where(is.numeric), ~ round(., 1))) %>%
  kable(
    caption = "Table 12. How Variable Entry Order Affects Type I SS (Type III is invariant)"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 12. How Variable Entry Order Affects Type I SS (Type III is invariant)
Variable	Type I SS (Order 1)	Type I SS (Order 2)	Type III SS
physhlth_days	29474.7	23189.1	23189.1
sleep_hrs	3323.0	4290.4	2295.4
age	9261.5	8701.0	9673.9
income_cat	2648.8	5843.8	1918.5
sex	1918.4	2372.2	1903.5
exercise	92.1	2322.0	92.1

Takeaway: Type I SS change depending on entry order; Type III SS do not. In epidemiologic analysis, Type III is generally preferred because we want to know the effect of each variable after adjusting for all others.

10. Summary of Hypothesis Testing Approaches

tribble(
  ~Approach, ~`R Function`, ~`What It Tests`, ~`Order-Dependent?`, ~`Use For`,
  "Overall F-test", "summary(model)", "All βs = 0 simultaneously", "No", "Does the model explain anything?",
  "Type I SS (sequential)", "anova(model)", "Each variable given predecessors", "Yes", "Sequential variable importance",
  "Type III SS (partial)", "car::Anova(model, type='III')", "Each variable given ALL others", "No", "Adjusted variable significance",
  "t-test for βj", "summary(model)", "Single βj = 0 given all others", "No", "Individual coefficient testing",
  "Chunk test", "anova(reduced, full)", "Group of βs = 0 simultaneously", "Depends on approach", "Testing groups of variables"
) %>%
  kable(caption = "Table 13. Summary of Hypothesis Testing Methods in MLR") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 13. Summary of Hypothesis Testing Methods in MLR
Approach	R Function	What It Tests	Order-Dependent?	Use For
Overall F-test	summary(model)	All βs = 0 simultaneously	No	Does the model explain anything?
Type I SS (sequential)	anova(model)	Each variable given predecessors	Yes	Sequential variable importance
Type III SS (partial)	car::Anova(model, type=‘III’)	Each variable given ALL others	No	Adjusted variable significance
t-test for βj	summary(model)	Single βj = 0 given all others	No	Individual coefficient testing
Chunk test	anova(reduced, full)	Group of βs = 0 simultaneously	Depends on approach	Testing groups of variables

Part 2: In-Class Lab Activity (Pair-Based)

EPI 553 — Tests of Hypotheses Lab Due: End of class

Instructions

This lab uses a Driver-Navigator pair programming approach. You will work with a partner, taking turns in two roles:

Driver: Types the code and writes answers in the .Rmd file
Navigator: Reviews the code, checks logic, suggests improvements, and helps interpret results

Round 1 (Tasks 1–3): Student A is the Driver, Student B is the Navigator. Round 2 (Tasks 4–6): Student B is the Driver, Student A is the Navigator.

Switch roles after completing Task 3. Both partners should understand all tasks — the Navigator should actively participate by checking the work, asking questions, and discussing interpretations.

Submission: Each pair submits one knitted HTML file with both partners’ names. Upload to Brightspace by end of class.

Data for the Lab

Use the same BRFSS 2020 dataset from the guided practice.

# Load packages and data
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(car)
library(gtsummary)

brfss_mlr <- readRDS("brfss_mlr.rds")

Grading Rubric

Task	Description	Points
Task 1	Fitting Models and ANOVA Tables	15
Task 2	Type I vs. Type III Sums of Squares	15
Task 3	Partial F-Tests for Individual Variables	20
Switch roles
Task 4	T-Tests and the F-Test Equivalence	15
Task 5	Chunk Test (Testing Groups of Variables)	20
Task 6	Synthesis and Interpretation	15
Total		100

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

1a. (5 pts) Fit the following model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \varepsilon\]

Use tidy() with conf.int = TRUE to display the coefficients. Report the fitted equation with rounded coefficients.

m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
             data = brfss_mlr)

tidy(m_full, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1. Full Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1. Full Model Coefficients
Term	Estimate	Std. Error	t-statistic	95% CI Lower	95% CI Upper
(Intercept)	10.6684	0.6102	17.4844	9.4722	11.8646
physhlth_days	0.3182	0.0131	24.2179	0.2924	0.3439
sleep_hrs	-0.5063	0.0760	-6.6630	-0.6553	-0.3573
age	-0.0800	0.0059	-13.4532	-0.0916	-0.0683

glance(m_full) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) %>%
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
  mutate(
    Value = round(Value, 4),
    Metric = dplyr::recode(Metric,
      "r.squared"     = "R²",
      "adj.r.squared" = "Adjusted R²",
      "sigma"         = "Root MSE (σ̂)",
      "statistic"     = "F-statistic (overall)",
      "p.value"       = "p-value (overall F-test)",
      "df"            = "Model df (p)",
      "df.residual"   = "Residual df (n − p − 1)",
      "nobs"          = "n (observations)"
    )
  ) %>%
  kable(caption = "Table 2. Overall Model Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2. Overall Model Summary
Metric	Value
R²	0.1413
Adjusted R²	0.1408
Root MSE (σ̂)	7.153

1b. (5 pts) Use anova() on this model to obtain the Type I (sequential) sums of squares. Present the ANOVA table and verify that the sum of all predictor Type I SS equals the model SSR. Show this calculation explicitly.

anova_type1 <- anova(m_full)

anova_type1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 3. Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables are tested in the order they appear in the model formula.")

Table 3. Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
physhlth_days	1	29474.75	29474.75	576.00	0
sleep_hrs	1	3323.04	3323.04	64.94	0
age	1	9261.47	9261.47	180.99	0
Residuals	4996	255652.08	51.17	NA	NA
Note:
Variables are tested in the order they appear in the model formula.

type1_ss <- anova_type1$`Sum Sq`
ssr_full <- sum(type1_ss[1:3])  # Model SS (sum of all predictor Type I SS)
sse_full <- type1_ss[4]          # Residual SS

cat("SSR (Model):", round(ssr_full, 2), "\n")

## SSR (Model): 42059.26

cat("SSE (Residual):", round(sse_full, 2), "\n")

## SSE (Residual): 255652.1

cat("SSY (Total):", round(ssr_full + sse_full, 2), "\n")

## SSY (Total): 297711.3

cat("R² = SSR/SSY =", round(ssr_full / (ssr_full + sse_full), 4), "\n")

## R² = SSR/SSY = 0.1413

1c. (5 pts) Use car::Anova() with type = "III" on the same model to obtain the Type III sums of squares.

# Type III using car::Anova()
anova_type3 <- Anova(m_full, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_type1$`Sum Sq`[1:3], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[2:4], 1)
)

comparison %>%
  kable(caption = "Table 4. Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.")

Table 4. Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS
physhlth_days	29474.7	30012.3
sleep_hrs	3323.0	2271.8
age	9261.5	9261.5
Note:
Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.

anova_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 5. Type III (Partial) Sums of Squares — car::Anova()",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.")

Table 5. Type III (Partial) Sums of Squares — car::Anova()
Source	Sum of Sq	df	F value	p-value
(Intercept)	15643.372	1	305.7057	0
physhlth_days	30012.259	1	586.5051	0
sleep_hrs	2271.809	1	44.3961	0
age	9261.475	1	180.9894	0
Residuals	255652.080	4996	NA	NA
Note:
Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.

Compare the Type I and Type III SS for each variable. Which variable’s SS is the same in both tables? Why? From the table, age in both types are the same but not in physical days and sleep hours. This shows that the data given for the variables are equal and balanced. —

Task 2: Type I vs. Type III Sums of Squares (15 points)

2a. (5 pts) Fit the same model from Task 1 but reverse the variable order:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{physhlth\_days} + \varepsilon\]

Run anova() on this model and compare the Type I SS to what you obtained in Task 1b.

m_full <- lm(menthlth_days ~ age + sleep_hrs + physhlth_days,
             data = brfss_mlr)

tidy(m_full, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 6. Full Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 6. Full Model Coefficients
Term	Estimate	Std. Error	t-statistic	95% CI Lower	95% CI Upper
(Intercept)	10.6684	0.6102	17.4844	9.4722	11.8646
age	-0.0800	0.0059	-13.4532	-0.0916	-0.0683
sleep_hrs	-0.5063	0.0760	-6.6630	-0.6553	-0.3573
physhlth_days	0.3182	0.0131	24.2179	0.2924	0.3439

glance(m_full) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) %>%
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
  mutate(
    Value = round(Value, 4),
    Metric = dplyr::recode(Metric,
      "r.squared"     = "R²",
      "adj.r.squared" = "Adjusted R²",
      "sigma"         = "Root MSE (σ̂)",
      "statistic"     = "F-statistic (overall)",
      "p.value"       = "p-value (overall F-test)",
      "df"            = "Model df (p)",
      "df.residual"   = "Residual df (n − p − 1)",
      "nobs"          = "n (observations)"
    )
  ) %>%
  kable(caption = "Table 7. Overall Model Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 7. Overall Model Summary
Metric	Value
R²	0.1413
Adjusted R²	0.1408
Root MSE (σ̂)	7.153

anova_type1 <- anova(m_full)

anova_type1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 8. Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables are tested in the order they appear in the model formula.")

Table 8. Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
age	1	7249.10	7249.10	141.66	0
sleep_hrs	1	4797.91	4797.91	93.76	0
physhlth_days	1	30012.26	30012.26	586.51	0
Residuals	4996	255652.08	51.17	NA	NA
Note:
Variables are tested in the order they appear in the model formula.

type1_ss <- anova_type1$`Sum Sq`
ssr_full <- sum(type1_ss[1:3])  # Model SS (sum of all predictor Type I SS)
sse_full <- type1_ss[4]          # Residual SS

cat("SSR (Model):", round(ssr_full, 2), "\n")

## SSR (Model): 42059.26

cat("SSE (Residual):", round(sse_full, 2), "\n")

## SSE (Residual): 255652.1

cat("SSY (Total):", round(ssr_full + sse_full, 2), "\n")

## SSY (Total): 297711.3

cat("R² = SSR/SSY =", round(ssr_full / (ssr_full + sse_full), 4), "\n")

## R² = SSR/SSY = 0.1413

Which values changed and which stayed the same? All of the predictors changed but the R^2 stayed the same.

2b. (5 pts) Run car::Anova(type = "III") on this reordered model.

# Type III using car::Anova()
anova_type3 <- Anova(m_full, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_type1$`Sum Sq`[1:3], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[2:4], 1)
)

comparison %>%
  kable(caption = "Table 9. Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.")

Table 9. Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS
physhlth_days	7249.1	9261.5
sleep_hrs	4797.9	2271.8
age	30012.3	30012.3
Note:
Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.

anova_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 10. Type III (Partial) Sums of Squares — car::Anova()",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.")

Table 10. Type III (Partial) Sums of Squares — car::Anova()
Source	Sum of Sq	df	F value	p-value
(Intercept)	15643.372	1	305.7057	0
age	9261.475	1	180.9894	0
sleep_hrs	2271.809	1	44.3961	0
physhlth_days	30012.259	1	586.5051	0
Residuals	255652.080	4996	NA	NA
Note:
Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.

Did the Type III SS change compared to Task 1c? Explain why or why not. Yes, Type III SS is different compared to Type I SS, there is significant data for psychical days for type I compared to III. Sleep is different for both type I and type III. Age is the same in type I and III.

2c. (5 pts) In 2–3 sentences, explain when an epidemiologist should prefer Type III SS over Type I SS. Give a concrete example from public health research where the choice matters. Epidemiologist should prefer Type III SS over Type I SS because we are using complex models to determine the poor mental health days. Health professionals will use this for certain types of data as well.

Task 3: Partial F-Tests for Individual Variables (20 points)

3a. (10 pts) Conduct a partial F-test to determine whether age adds significantly to the prediction of mental health days, given that physhlth_days and sleep_hrs are already in the model. Do this by:

Fitting a reduced model (without age)
Fitting the full model (with age)
Using anova(reduced, full) to compare them

m_reduced <- lm(menthlth_days ~ physhlth_days + sleep_hrs,
                 data = brfss_mlr)

# Full model (includes physhlth_days + sleep_hrs)
m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
             data = brfss_mlr)

# Compare using anova() — this performs the partial F-test
anova(m_reduced, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no age)", "Full (with age)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 11. Partial F-Test: Does age add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 11. Partial F-Test: Does age add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (no age)	4997	264913.6	NA	NA	NA	NA
Full (with age)	4996	255652.1	1	9261.475	180.9894	0

State the null hypothesis, report the F-statistic and p-value, and state your conclusion at $\alpha = 0.05$.

H0: Mental health days not associated with age H1: Mental health days associated with age

F-statistic: 180.9894
P-value: 0

Conclusion: Reject the null hypothesis. The p-value =0, therefore it is statistically signficiant. This shows mental health days is associated with age.

3b. (10 pts) Now verify your result from 3a manually. Using the anova() output from the full model (Task 1b), identify $SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})$ from the Type I table. Compute the F-statistic as:

\[F = \frac{SS(\text{age} \mid \text{physhlth_days}, \text{sleep_hrs}) / 1}{MSE(\text{full model})}\]

# Perform ANOVA on both models
anova_full <- anova(m_full)
anova_reduced <- anova(m_reduced)

# Sum of squares for the full and reduced models
ssr_full <- sum(anova_full$`Sum Sq`[1:3])   # Full model: sum of first three terms
ssr_reduced <- sum(anova_reduced$`Sum Sq`[1:2])  # Reduced model: sum of first two terms

# MSE (Mean Squared Error) for the full model
mse_full <- anova_full$`Mean Sq`[length(anova_full$`Mean Sq`)]  # Last Mean Sq value in full model

# F-statistic for comparing the models
F_stat <- (ssr_full - ssr_reduced) / 1 / mse_full  # 1 is the difference in df between models (testing age)

# Print the results
cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 42059.26

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 32797.79

cat("SS(age | others):", round(ssr_full - ssr_reduced, 2), "\n")

## SS(age | others): 9261.47

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 51.17

cat("F-statistic:", round(F_stat, 4), "\n")

## F-statistic: 180.9894

# Critical value F(1, 4996, 0.95)
critical_value <- qf(0.95, 1, 4996)
cat("Critical value F(1, 4996, 0.95):", round(critical_value, 4), "\n")

## Critical value F(1, 4996, 0.95): 3.8433

# p-value for the F-statistic
p_value <- pf(F_stat, 1, 4996, lower.tail = FALSE)
cat("p-value:", format.pval(p_value), "\n")

## p-value: < 2.22e-16

Compare to the critical value $F_{1, n-p-1, 0.95}$. Does your manual calculation agree with the anova() comparison from 3a? Yes, the manual calculation agrees with the anova comparison of 3a. —

Round 2: Student B Drives, Student A Navigates

⟳ Switch roles now!

Task 4: T-Tests and the F-Test Equivalence (15 points)

4a. (5 pts) Using the full model from Task 1 (menthlth_days ~ physhlth_days + sleep_hrs + age), run summary() and extract the t-statistics and p-values for each coefficient.

# Get t-statistics from summary()
t_stats <- tidy(m_full) %>%
  filter(term != "(Intercept)") %>%
  select(term, t_stat = statistic, t_pvalue = p.value)
t_stats %>%
  kable(
    caption = "Table 12. t-Statistics and p-values for Each Coefficient",
    align = "c"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 12. t-Statistics and p-values for Each Coefficient
term	t_stat	t_pvalue
physhlth_days	24.21787	0
sleep_hrs	-6.66304	0
age	-13.45323	0

4b. (5 pts) For each predictor, compute $t^2$ and compare it to the Type III F-statistic from Task 1c. Create a table showing the t-statistic, $t^2$, the Type III F-statistic, and both p-values.

# Get t-statistics from the model (exclude intercept)
t_stats <- tidy(m_full) %>%
  filter(term != "(Intercept)") %>%
  select(term, t_stat = statistic, t_pvalue = p.value)

# Get Type III F-statistics
f_stats <- Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, f_stat = `F value`, f_pvalue = `Pr(>F)`)

# Combine t² and F-statistics into one table
comparison <- left_join(t_stats, f_stats, by = "term") %>%
  mutate(
    t_sq = round(t_stat^2, 4),
    f_stat = round(f_stat, 4),
    `t² = F?` = ifelse(abs(t_stat^2 - f_stat) < 0.001, "✓", "✗"),
    t_pvalue = round(t_pvalue, 6),
    f_pvalue = round(f_pvalue, 6),
    `p-values equal?` = ifelse(abs(t_pvalue - f_pvalue) < 0.0001, "✓", "✗"),
    t_stat = round(t_stat, 4)
  ) %>%
  select(
    Predictor = term,
    `t-statistic` = t_stat,
    `t²` = t_sq,
    `F (Type III)` = f_stat,
    `t² = F?`,
    `p (t-test)` = t_pvalue,
    `p (F-test)` = f_pvalue,
    `p-values equal?`
  )

# Display the table
comparison %>%
  kable(
    caption = "Table 13. Equivalence of T-Tests and Type III Partial F-Tests",
    align = "c"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 13. Equivalence of T-Tests and Type III Partial F-Tests
Predictor	t-statistic	t²	F (Type III)	t² = F?	p-values equal?
physhlth_days	24.2179	586.5051	586.5051	✓	✓
sleep_hrs	-6.6630	44.3961	44.3961	✓	✓
age	-13.4532	180.9894	180.9894	✓	✓

4c. (5 pts) In your own words, explain why the t-test and the Type III partial F-test give the same result. What is the fundamental relationship between the t-distribution and the F-distribution that makes this true?

The t-test and the Type III partial F test have the same results because it measures the same coefficients. The T-distribution and F-distribution include continuous probability for the given exposures and outcome.

Task 5: Chunk Test — Testing Groups of Variables (20 points)

5a. (10 pts) Now consider the full 6-predictor model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \beta_4 \cdot \text{income\_cat} + \beta_5 \cdot \text{sex} + \beta_6 \cdot \text{exercise} + \varepsilon\] Test whether income_cat, sex, and exercise — as a group — significantly add to the prediction of mental health days, given that physhlth_days, sleep_hrs, and age are already in the model.

# Reduced model: only health behaviors
m_reduced <- lm(menthlth_days ~ physhlth_days + sleep_hrs, data = brfss_mlr)

# Full model: health behaviors + demographics
m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
             data = brfss_mlr)

# Chunk test
anova(m_health_only, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep, age)", "Full (+ income_cat, sex, and exercise)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 14. Chunk Test: Do demographic variables collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 14. Chunk Test: Do demographic variables collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth + sleep, age)	4997	264913.6	NA	NA	NA	NA
Full (+ income_cat, sex, and exercise)	4993	250992.8	4	13920.75	69.2314	0

State the null hypothesis (in both words and mathematical notation), conduct the test, and state your conclusion.

H0: Mental health days not associated with reduced variables H1: Mental health days associated with reduced variables

Conclusion: Reject null hypothesis. With the full demographics, the p-value is 0 but the reduced variables of income, sex and exercise is NA. This means it is not statistically significant. That mental health days are not associated with reduced variables.

5b. (5 pts) Compute the chunk test F-statistic manually using:

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

# Perform ANOVA on both models
anova_full <- anova(m_full)
anova_reduced <- anova(m_reduced)

# Calculate Sum of Squares for the full and reduced models
# You sum the appropriate rows based on the number of terms in each model
ssr_full <- sum(anova_full$`Sum Sq`[1:(nrow(anova_full)-1)])  # Sum of squares for the full model excluding residuals
ssr_reduced <- sum(anova_reduced$`Sum Sq`[1:(nrow(anova_reduced)-1)])  # Sum of squares for the reduced model excluding residuals

# MSE for the full model
mse_full <- anova_full$`Mean Sq`[nrow(anova_full)]  # The last row corresponds to the residual mean square (MSE)

# Degrees of freedom for the difference
df_diff <- nrow(anova_full) - nrow(anova_reduced)  # Difference in the number of terms between full and reduced models

# F-statistic for the chunk (the additional variables in the full model)
F_chunk <- ((ssr_full - ssr_reduced) / df_diff) / mse_full

# Print the results
cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 46718.54

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 32797.79

cat("Difference:", round(ssr_full - ssr_reduced, 2), "\n")

## Difference: 13920.75

cat("df (number of added variables):", df_diff, "\n")

## df (number of added variables): 4

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 50.27

cat("F-statistic:", round(F_chunk, 4), "\n")

## F-statistic: 69.2314

# Critical value F(df_diff, residual_df, 0.95)
residual_df <- nrow(anova_full) - 1  # The residual degrees of freedom for the full model (last row)
critical_value <- qf(0.95, df_diff, residual_df)
cat("Critical value F(", df_diff, ",", residual_df, ", 0.95):", round(critical_value, 4), "\n")

## Critical value F( 4 , 6 , 0.95): 4.5337

# p-value for the F-statistic
p_value <- pf(F_chunk, df_diff, residual_df, lower.tail = FALSE)
cat("p-value:", format.pval(p_value), "\n")

## p-value: 3.7543e-05

Show all intermediate values. Does your manual computation match the anova() result?

Based on the manual computation, it does match the ANOVA because it shows from the p-values that the variables are significant.

5c. (5 pts) Note that exercise was not individually significant in the Type III table, yet it is part of a group that is collectively significant. In 2–3 sentences, explain how this is possible and what it means for model building in epidemiology.

Even though exercise was not individual significant in the Type III table but it was as a group collection, this shows that when looking at certain epidemiological outcomes, if the individual variable stands alone, they may not have huge contributions but when adding other variables it can. This shows that its important to keep the individual variable in the model even if a p-value is not significant.It can still infer a confounder in the model.

Task 6: Synthesis and Interpretation (15 points)

6a. (5 pts) Based on the full model, which predictors are statistically significant at $\alpha = 0.05$? List them and briefly state the direction of each association (positive or negative).

# Get the coefficients, t-statistics, and p-values
tidy(m_full) %>%
  filter(term != "(Intercept)") %>% 
  mutate(
    Signif = ifelse(p.value < 0.05, "Yes", "No"),
    Direction = ifelse(estimate > 0, "Positive", "Negative")
  ) %>%
  select(Predictor = term, Estimate = estimate, Signif, Direction)

## # A tibble: 6 × 4
##   Predictor     Estimate Signif Direction
##   <chr>            <dbl> <chr>  <chr>    
## 1 physhlth_days   0.292  Yes    Positive 
## 2 sleep_hrs      -0.509  Yes    Negative 
## 3 age            -0.0823 Yes    Negative 
## 4 income_cat     -0.321  Yes    Negative 
## 5 sexFemale       1.25   Yes    Positive 
## 6 exerciseYes    -0.343  No     Negative

Based on the predictors, physhlth_days is significant with a positive direction. Sleep_hrs is significant with a negative direction. Age is significant with a negative direction. Income is significant with a negative direction. SexFemale is significant with a positive direction. Exercise is not significant with a negative direction.

6b. (5 pts) A colleague argues: “We should drop exercise from the model because it’s not significant.” Do you agree? Write a 2–3 sentence response explaining your reasoning. Consider the chunk test results and epidemiologic rationale. Even though a colleague argues to drop the variable “exercise” from the model because it is not significant, I do not agree with this statement. With using the chunk test and epidemiological results shows exercise might be still an important factor. I believe that the variable adds significant data to the overall outcome of determining the poor mental health days for individuals.

6c. (5 pts) Write a 3–4 sentence summary of the hypothesis testing results for a non-statistical audience (e.g., a public health program manager). Your summary should convey which factors were identified as independently associated with mental health days and which were not, without using jargon like “p-value,” “F-test,” or “sums of squares.” Given the statistical methods used, we determined if the outcome of poor mental health days had specific variables, the data shows that the physical days, sleep hours and age are crucial factors for individuals that have poor mental health days. It is also important to identify that income, sex and exercise also make contributions.

Task 7: Reflection (Not Graded, strongly recommended)

In 2–3 sentences, reflect on the process of working with a partner in this lab. What did you find helpful about the Driver-Navigator approach? How did it affect your understanding of the material?

I really enjoyed working with a partner for this lab. I liked the driver and navigator approach since it allows for independency while also having support just in case. I liked that I was able to perform my steps but if I had certain struggles or obstacles, it was nice to have my partner to reguide me if I start to feel confused and lost. I also felt it improved my understanding of the material because it was beneficial to have someone explain certain concepts multiple times. —

End of Lab Activity

Variable	Type I SS	What It Tests
\(X_1\)	\(SS(X_1)\)	Effect of \(X_1\) alone
\(X_2\)	\(SS(X_2 \mid X_1)\)	Additional effect of \(X_2\) given \(X_1\) is in the model
\(X_3\)	\(SS(X_3 \mid X_1, X_2)\)	Additional effect of \(X_3\) given \(X_1\) and \(X_2\) are in the model

Variable	Type III SS	What It Tests
\(X_1\)	\(SS(X_1 \mid X_2, X_3)\)	Effect of \(X_1\) given \(X_2\) and \(X_3\) are in the model
\(X_2\)	\(SS(X_2 \mid X_1, X_3)\)	Effect of \(X_2\) given \(X_1\) and \(X_3\) are in the model
\(X_3\)	\(SS(X_3 \mid X_1, X_2)\)	Effect of \(X_3\) given \(X_1\) and \(X_2\) are in the model

Tests of Hypotheses in Multiple Linear Regression

EPI 553 — Principles of Statistical Inference II (Spring 2026)

June McCarthy

March 14, 2026

Introduction

Setup and Data

The BRFSS 2020 Dataset

Fitting the Full Model

Part 1: Guided Practice

1. Review: The Overall F-Test

2. Type I Sums of Squares (Sequential)

2.1 Concept

2.2 Computing Type I SS in R

2.3 Interpreting the Type I ANOVA Table

3. Type III Sums of Squares (Partial)

3.1 Concept

3.2 Comparing Type I and Type III

3.3 Type III SS in R

4. Partial F-Tests for Individual Variables

4.1 The General Framework

4.2 Example: Testing Whether sleep_hrs Adds to the Model

4.3 Example: Testing Whether exercise Adds to the Model

5. Using Type I SS for Partial F-Tests

5.1 When Type I SS Are Correct

5.2 Important Caveat

6. Using Type III SS for Partial F-Tests

6.1 The Preferred Approach

6.2 Interpreting Each Test

7. T-Tests for Individual \(\beta\)s

7.1 Concept

7.2 Equivalence to Type III Partial F-Tests

7.3 Demonstrating the Equivalence

8. Tests for Groups of Variables (Chunk Tests)

8.1 Concept

8.2 Example: Do Demographic Variables Add to the Model?

8.3 Example: Do Sleep and Age Together Add Beyond Physical Health?

8.4 Important Note on Type I SS and Chunk Tests

9. The Order of Variable Entry Matters (for Type I)

10. Summary of Hypothesis Testing Approaches

Part 2: In-Class Lab Activity (Pair-Based)

Instructions

Data for the Lab

Grading Rubric

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

Task 2: Type I vs. Type III Sums of Squares (15 points)

Task 3: Partial F-Tests for Individual Variables (20 points)

Round 2: Student B Drives, Student A Navigates

Task 4: T-Tests and the F-Test Equivalence (15 points)

Task 5: Chunk Test — Testing Groups of Variables (20 points)

Task 6: Synthesis and Interpretation (15 points)

Task 7: Reflection (Not Graded, strongly recommended)

4.2 Example: Testing Whether `sleep_hrs` Adds to the Model

4.3 Example: Testing Whether `exercise` Adds to the Model