Introduction

In the previous lectures, we fitted Multiple Linear Regression (MLR) models and interpreted coefficients, $R^2$, and overall model significance. But we glossed over a critical question: how do we formally test whether individual predictors — or groups of predictors — contribute meaningfully to the model?

This lecture focuses on hypothesis testing within the MLR framework. We will cover:

Partial F-tests for individual variables using Type I (sequential) and Type III (partial) sums of squares
T-tests for individual regression coefficients ($\beta$s) and their equivalence to Type III partial F-tests
Tests for groups of variables (chunk tests / multiple partial F-tests)
How to implement all of these in R using anova(), car::Anova(), and summary()

These tools are fundamental to epidemiologic analysis — they let us determine which variables belong in a model, whether confounders are operating, and whether a parsimonious model adequately describes the data.

Setup and Data

library(tidyverse)
library(haven)
library(janitor)
library(knitr)
library(kableExtra)
library(plotly)
library(broom)
library(ggeffects)
library(ggstats)
library(gtsummary)
library(GGally)
library(car)
library(lmtest)
library(corrplot)

The BRFSS 2020 Dataset

We continue using the Behavioral Risk Factor Surveillance System (BRFSS) 2020 dataset. Our research question remains:

What individual, behavioral, and socioeconomic factors are associated with the number of mentally unhealthy days in the past 30 days among U.S. adults?

brfss_mlr <- readRDS("/Users/emmanuelsmac/Desktop/brfss_mlr_2020.rds")

glimpse(brfss_mlr)

## Rows: 5,000
## Columns: 9
## $ menthlth_days <dbl> 15, 0, 0, 20, 0, 7, 0, 0, 0, 0, 15, 0, 0, 0, 10, 0, 0, 0…
## $ physhlth_days <dbl> 0, 30, 0, 0, 0, 0, 15, 0, 0, 0, 30, 0, 28, 0, 1, 0, 0, 0…
## $ sleep_hrs     <dbl> 6, 8, 8, 8, 5, 6, 7, 7, 7, 7, 6, 7, 6, 6, 6, 6, 6, 8, 4,…
## $ age           <dbl> 27, 70, 74, 71, 29, 28, 76, 43, 67, 47, 60, 31, 60, 74, …
## $ income_cat    <dbl> 8, 5, 1, 6, 8, 4, 8, 8, 1, 1, 8, 7, 8, 6, 1, 3, 5, 6, 4,…
## $ income_f      <fct> >$75k, $25-35k, <$10k, $35-50k, >$75k, $20-25k, >$75k, >…
## $ sex           <fct> Male, Female, Male, Female, Male, Female, Male, Male, Ma…
## $ exercise      <fct> Yes, No, Yes, Yes, Yes, Yes, No, Yes, No, Yes, No, Yes, …
## $ bmi           <dbl> 27.89, 23.81, 25.54, 33.61, 29.03, 28.19, 35.26, 29.38, …

brfss_mlr %>%
  select(menthlth_days, physhlth_days, sleep_hrs, age, income_cat, sex, exercise) %>%
  tbl_summary(
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    label = list(
      menthlth_days ~ "Mentally Unhealthy Days (past 30)",
      physhlth_days ~ "Physically Unhealthy Days (past 30)",
      sleep_hrs     ~ "Sleep Hours per Night",
      age           ~ "Age (years)",
      income_cat    ~ "Income Category (1–8)",
      sex           ~ "Sex",
      exercise      ~ "Any Exercise (past 30 days)"
    )
  ) %>%
  bold_labels()

Characteristic	N = 5,000¹
Mentally Unhealthy Days (past 30)	4 (8)
Physically Unhealthy Days (past 30)	3 (8)
Sleep Hours per Night	7.06 (1.35)
Age (years)	54 (17)
Income Category (1–8)
1	190 (3.8%)
2	169 (3.4%)
3	312 (6.2%)
4	434 (8.7%)
5	489 (9.8%)
6	683 (14%)
7	841 (17%)
8	1,882 (38%)
Sex
Male	2,331 (47%)
Female	2,669 (53%)
Any Exercise (past 30 days)	3,874 (77%)
¹ Mean (SD); n (%)

Fitting the Full Model

We will work with the full multiple regression model from the previous lecture:

\[\text{MentalHealthDays}_i = \beta_0 + \beta_1 \cdot \text{PhysHealth}_i + \beta_2 \cdot \text{Sleep}_i + \beta_3 \cdot \text{Age}_i + \beta_4 \cdot \text{Income}_i + \beta_5 \cdot \text{Sex}_i + \beta_6 \cdot \text{Exercise}_i + \varepsilon_i\]

m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
             data = brfss_mlr)

tidy(m_full, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1. Full Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1. Full Model Coefficients
Term	Estimate	Std. Error	t-statistic	p-value	95% CI Lower	95% CI Upper
(Intercept)	12.4755	0.7170	17.4006	0.0000	11.0699	13.8810
physhlth_days	0.2917	0.0136	21.4779	0.0000	0.2650	0.3183
sleep_hrs	-0.5092	0.0753	-6.7574	0.0000	-0.6569	-0.3614
age	-0.0823	0.0059	-13.8724	0.0000	-0.0939	-0.0707
income_cat	-0.3213	0.0520	-6.1778	0.0000	-0.4233	-0.2194
sexFemale	1.2451	0.2023	6.1535	0.0000	0.8484	1.6417
exerciseYes	-0.3427	0.2531	-1.3537	0.1759	-0.8389	0.1536

glance(m_full) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) %>%
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
  mutate(
    Value = round(Value, 4),
    Metric = dplyr::recode(Metric,
      "r.squared"     = "R²",
      "adj.r.squared" = "Adjusted R²",
      "sigma"         = "Root MSE (σ̂)",
      "statistic"     = "F-statistic (overall)",
      "p.value"       = "p-value (overall F-test)",
      "df"            = "Model df (p)",
      "df.residual"   = "Residual df (n − p − 1)",
      "nobs"          = "n (observations)"
    )
  ) %>%
  kable(caption = "Table 2. Overall Model Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2. Overall Model Summary
Metric	Value
R²	0.1569
Adjusted R²	0.1559
Root MSE (σ̂)	7.090

Part 1: Guided Practice

1. Review: The Overall F-Test

Before we test individual predictors, let’s recall the overall F-test, which tests whether any of the predictors in the model are useful:

\[H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{(no predictor is useful)}\] \[H_1: \text{At least one } \beta_j \neq 0\]

The test statistic is:

\[F = \frac{SSR/p}{SSE/(n - p - 1)} = \frac{MSR}{MSE}\]

where $SSR$ is the regression (model) sum of squares, $SSE$ is the error (residual) sum of squares, $p$ is the number of predictors, and $n$ is the sample size.

# Extract the overall F-test from summary
f_stat <- summary(m_full)$fstatistic
cat("F-statistic:", round(f_stat[1], 2), "\n")

## F-statistic: 154.9

cat("Numerator df (p):", f_stat[2], "\n")

## Numerator df (p): 6

cat("Denominator df (n-p-1):", f_stat[3], "\n")

## Denominator df (n-p-1): 4993

cat("p-value:", format.pval(pf(f_stat[1], f_stat[2], f_stat[3], lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

Conclusion: We reject $H_0$ and conclude that at least one predictor is significantly associated with mental health days. But which ones? That’s what the rest of this lecture addresses.

2. Type I Sums of Squares (Sequential)

2.1 Concept

Type I sums of squares are sequential — each variable is tested for its contribution to the model given the variables that were entered before it, but not the variables entered after it.

For a model with three predictors entered in order $X_1, X_2, X_3$:

Variable	Type I SS	What It Tests
$X_1$	$SS(X_1)$	Effect of $X_1$ alone
$X_2$	$SS(X_2 \mid X_1)$	Additional effect of $X_2$ given $X_1$ is in the model
$X_3$	$SS(X_3 \mid X_1, X_2)$	Additional effect of $X_3$ given $X_1$ and $X_2$ are in the model

Key property: Type I SS depend on the order in which variables are entered. Changing the order changes the Type I SS for all but the last variable.

Key property: The Type I SS always add up to the total regression SS:

\[SSR(\text{full}) = SS(X_1) + SS(X_2 \mid X_1) + SS(X_3 \mid X_1, X_2)\]

2.2 Computing Type I SS in R

In R, the anova() function on a single model produces Type I (sequential) sums of squares:

anova_type1 <- anova(m_full)

anova_type1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 3. Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables are tested in the order they appear in the model formula.")

Table 3. Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
physhlth_days	1	29474.75	29474.75	586.34	0.00
sleep_hrs	1	3323.04	3323.04	66.11	0.00
age	1	9261.47	9261.47	184.24	0.00
income_cat	1	2648.76	2648.76	52.69	0.00
sex	1	1918.39	1918.39	38.16	0.00
exercise	1	92.12	92.12	1.83	0.18
Residuals	4993	250992.80	50.27	NA	NA
Note:
Variables are tested in the order they appear in the model formula.

2.3 Interpreting the Type I ANOVA Table

Let’s verify that the sequential sums of squares add up to the total model SS:

type1_ss <- anova_type1$`Sum Sq`
ssr_full <- sum(type1_ss[1:6])  # Model SS (sum of all predictor Type I SS)
sse_full <- type1_ss[7]          # Residual SS

cat("SSR (Model):", round(ssr_full, 2), "\n")

## SSR (Model): 46718.54

cat("SSE (Residual):", round(sse_full, 2), "\n")

## SSE (Residual): 250992.8

cat("SSY (Total):", round(ssr_full + sse_full, 2), "\n")

## SSY (Total): 297711.3

cat("R² = SSR/SSY =", round(ssr_full / (ssr_full + sse_full), 4), "\n")

## R² = SSR/SSY = 0.1569

Important note: The F-statistics and p-values reported by anova() divide each Type I SS by the MSE of the full model. For the partial F-test of the last variable entered, this is correct. But for partial F-tests of variables entered earlier using Type I SS, the proper MSE should come from the reduced model (see Section 4).

3. Type III Sums of Squares (Partial)

3.1 Concept

Type III sums of squares are partial — each variable is tested for its contribution to the model given all other variables are already in the model.

For a model with three predictors:

Variable	Type III SS	What It Tests
$X_1$	$SS(X_1 \mid X_2, X_3)$	Effect of $X_1$ given $X_2$ and $X_3$ are in the model
$X_2$	$SS(X_2 \mid X_1, X_3)$	Effect of $X_2$ given $X_1$ and $X_3$ are in the model
$X_3$	$SS(X_3 \mid X_1, X_2)$	Effect of $X_3$ given $X_1$ and $X_2$ are in the model

Key property: Type III SS do not depend on the order of variable entry. They are the same regardless of how you specify the model formula.

Key property: Type III SS generally do not add up to the total SSR (unless predictors are uncorrelated).

Key property: The Type I SS and Type III SS are equal for the last variable entered in the model — because by the time we reach the last variable, all other variables are already in the model.

3.2 Comparing Type I and Type III

# Type III using car::Anova()
anova_type3 <- Anova(m_full, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age", "income_cat", "sex", "exercise"),
  `Type I SS` = round(anova_type1$`Sum Sq`[1:6], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[2:7], 1)
)

comparison %>%
  kable(caption = "Table 4. Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.")

Table 4. Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS
physhlth_days	29474.7	23189.1
sleep_hrs	3323.0	2295.4
age	9261.5	9673.9
income_cat	2648.8	1918.5
sex	1918.4	1903.5
exercise	92.1	92.1
Note:
Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.

Notice: The Type I and Type III SS for exercise (the last variable) are identical. The values for other variables differ because Type I SS does not adjust for variables entered later in the model.

3.3 Type III SS in R

anova_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 5. Type III (Partial) Sums of Squares — car::Anova()",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.")

Table 5. Type III (Partial) Sums of Squares — car::Anova()
Source	Sum of Sq	df	F value	p-value
(Intercept)	15220.3947	1	302.7793	0.0000
physhlth_days	23189.1197	1	461.3012	0.0000
sleep_hrs	2295.4257	1	45.6629	0.0000
age	9673.9387	1	192.4437	0.0000
income_cat	1918.5269	1	38.1653	0.0000
sex	1903.4532	1	37.8654	0.0000
exercise	92.1241	1	1.8326	0.1759
Residuals	250992.8036	4993	NA	NA
Note:
Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.

4. Partial F-Tests for Individual Variables

4.1 The General Framework

Suppose we want to test whether a single variable $X^*$ contributes to the prediction of $Y$ given that $X_1, \ldots, X_p$ are already in the model.

\[H_0: \beta^* = 0 \quad \text{($X^*$ does not add to the model given the other variables)}\] \[H_1: \beta^* \neq 0\]

The partial F-test compares a full model (containing $X^*$) to a reduced model (excluding $X^*$):

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})} = \frac{SS(X^* \mid X_1, \ldots, X_p) / 1}{MSE(\text{full})}\]

Under $H_0$, $F \sim F_{1, \, n-p-2}$.

4.2 Example: Testing Whether `sleep_hrs` Adds to the Model

Let’s test: does sleep_hrs contribute to predicting mental health days, given that all other predictors are in the model?

# Reduced model (excludes sleep_hrs)
m_no_sleep <- lm(menthlth_days ~ physhlth_days + age + income_cat + sex + exercise,
                 data = brfss_mlr)

# Full model (includes sleep_hrs)
m_full <- lm(menthlth_days ~ physhlth_days + age + income_cat + sex + exercise + sleep_hrs,
             data = brfss_mlr)

# Compare using anova() — this performs the partial F-test
anova(m_no_sleep, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no sleep_hrs)", "Full (with sleep_hrs)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 6. Partial F-Test: Does sleep_hrs add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 6. Partial F-Test: Does sleep_hrs add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (no sleep_hrs)	4994	253288.2	NA	NA	NA	NA
Full (with sleep_hrs)	4993	250992.8	1	2295.426	45.6629	0

Manual computation: We can also compute this by hand:

ssr_full    <- sum(anova(m_full)$`Sum Sq`[1:6])
ssr_reduced <- sum(anova(m_no_sleep)$`Sum Sq`[1:5])
mse_full    <- anova(m_full)$`Mean Sq`[7]  # MSE of full model

F_stat <- (ssr_full - ssr_reduced) / 1 / mse_full

cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 46718.54

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 44423.11

cat("SS(sleep | others):", round(ssr_full - ssr_reduced, 2), "\n")

## SS(sleep | others): 2295.43

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 50.27

cat("F-statistic:", round(F_stat, 4), "\n")

## F-statistic: 45.6629

cat("Critical value F(1, 4993, 0.95):", round(qf(0.95, 1, 4993), 4), "\n")

## Critical value F(1, 4993, 0.95): 3.8433

cat("p-value:", format.pval(pf(F_stat, 1, 4993, lower.tail = FALSE)), "\n")

## p-value: 1.5652e-11

Conclusion: Since the p-value < 0.05, we reject $H_0$ and conclude that sleep hours adds statistically significant information to the prediction of mental health days, given that physical health, age, income, sex, and exercise are already in the model.

4.3 Example: Testing Whether `exercise` Adds to the Model

m_no_exercise <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex,
                    data = brfss_mlr)

anova(m_no_exercise, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no exercise)", "Full (with exercise)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 7. Partial F-Test: Does exercise add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 7. Partial F-Test: Does exercise add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (no exercise)	4994	251084.9	NA	NA	NA	NA
Full (with exercise)	4993	250992.8	1	92.1241	1.8326	0.1759

Conclusion: With p > 0.05, we fail to reject $H_0$. Exercise does not add statistically significant predictive information beyond what is already captured by the other predictors. This could be because the effect of exercise on mental health is partially mediated through physical health or sleep, which are already in the model.

5. Using Type I SS for Partial F-Tests

5.1 When Type I SS Are Correct

The Type I SS can be used directly for partial F-tests if the variable of interest is the last one entered (or if you only want to test variables conditional on those entered before them in the sequence).

From our Type I ANOVA table:

cat("Type I SS for exercise (entered last):", round(anova_type1$`Sum Sq`[6], 2), "\n")

## Type I SS for exercise (entered last): 92.12

cat("Type III SS for exercise:", round(anova_type3$`Sum Sq`[7], 2), "\n")

## Type III SS for exercise: 92.12

cat("These should be equal:",
    all.equal(anova_type1$`Sum Sq`[6], anova_type3$`Sum Sq`[7]), "\n")

## These should be equal: TRUE

5.2 Important Caveat

The F-statistics in the anova() output divide each Type I SS by the MSE of the full model. For a rigorous Type I SS partial F-test for variables entered earlier, the denominator should technically be the MSE from a model containing only the variables up to and including the one being tested. In practice, with large samples, this distinction matters little, and most analysts use the Type III approach instead.

6. Using Type III SS for Partial F-Tests

6.1 The Preferred Approach

Type III SS are generally preferred in epidemiologic analysis because they answer the question we most often care about: does this variable contribute to the model given all other variables?

The car::Anova() function with type = "III" provides the correct F-tests and p-values:

Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  filter(Source != "(Intercept)") %>%
  mutate(
    Significant = ifelse(`Pr(>F)` < 0.05, "Yes", "No"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 8. Type III Partial F-Tests for Each Predictor",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value", "Significant (α = 0.05)")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  column_spec(6, bold = TRUE)

Table 8. Type III Partial F-Tests for Each Predictor
Source	Sum of Sq	df	F value	p-value	Significant (α = 0.05)
physhlth_days	23189.1197	1	461.3012	0.0000	Yes
age	9673.9387	1	192.4437	0.0000	Yes
income_cat	1918.5269	1	38.1653	0.0000	Yes
sex	1903.4532	1	37.8654	0.0000	Yes
exercise	92.1241	1	1.8326	0.1759	No
sleep_hrs	2295.4257	1	45.6629	0.0000	Yes
Residuals	250992.8036	4993	NA	NA	NA

6.2 Interpreting Each Test

Each row in the Type III table tests:

\[H_0: \beta_j = 0 \quad \text{given all other predictors are in the model}\]

For our model:

physhlth_days: Significant — physical health adds to the prediction of mental health days, controlling for sleep, age, income, sex, and exercise.
sleep_hrs: Significant — sleep adds unique information beyond physical health, age, income, sex, and exercise.
age: Significant — age contributes independently to mental health days prediction.
income_cat: Significant — income is independently associated with mental health.
sex: Significant — sex differences in mental health persist after adjusting for all other covariates.
exercise: Not significant — exercise does not add information beyond what the other predictors already explain.

7. T-Tests for Individual $\beta$s

7.1 Concept

We can also conduct t-tests for individual regression coefficients. The test for $\beta_j$ is:

\[H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0\]

\[t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \sim t_{n-p-2} \quad \text{under } H_0\]

We reject $H_0$ if $|t| > t_{n-p-2, 1-\alpha/2}$.

7.2 Equivalence to Type III Partial F-Tests

A fundamental result in regression theory: the t-test for $\beta_j$ is exactly equivalent to the Type III partial F-test for $X_j$.

If $t \sim t_k$, then $t^2 \sim F_{1,k}$.

Therefore: \[t^2_j = F_j \quad \text{(Type III partial F-test)}\]

Both tests produce the same p-value.

7.3 Demonstrating the Equivalence

# Get t-statistics from summary()
t_stats <- tidy(m_full) %>%
  filter(term != "(Intercept)") %>%
  select(term, t_stat = statistic, t_pvalue = p.value)

# Get F-statistics from Type III
f_stats <- Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, f_stat = `F value`, f_pvalue = `Pr(>F)`)

# Compare
left_join(t_stats, f_stats, by = "term") %>%
  mutate(
    `t²` = round(t_stat^2, 4),
    f_stat = round(f_stat, 4),
    `t² = F?` = ifelse(abs(t_stat^2 - f_stat) < 0.001, "✓", "✗"),
    t_pvalue = round(t_pvalue, 6),
    f_pvalue = round(f_pvalue, 6),
    `p-values equal?` = ifelse(abs(t_pvalue - f_pvalue) < 0.0001, "✓", "✗")
  ) %>%
  select(term, t_stat = t_stat, `t²`, `F (Type III)` = f_stat, `t² = F?`,
         `p (t-test)` = t_pvalue, `p (F-test)` = f_pvalue, `p-values equal?`) %>%
  mutate(t_stat = round(t_stat, 4)) %>%
  kable(caption = "Table 9. Equivalence of T-Tests and Type III Partial F-Tests") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 9. Equivalence of T-Tests and Type III Partial F-Tests
term	t_stat	t²	F (Type III)	t² = F?	p (t-test)	p (F-test)	p-values equal?
physhlth_days	21.4779	461.3012	461.3012	✓	0.000000	0	✓
age	-13.8724	192.4437	192.4437	✓	0.000000	0	✓
income_cat	-6.1778	38.1653	38.1653	✓	0.000000	0	✓
sexFemale	6.1535	37.8654	NA	NA	0.000000	NA	NA
exerciseYes	-1.3537	1.8326	NA	NA	0.175879	NA	NA
sleep_hrs	-6.7574	45.6629	45.6629	✓	0.000000	0	✓

Takeaway: summary() gives you t-tests; car::Anova(type = "III") gives you the equivalent F-tests. Both answer the same question: “Does this variable add to the model given all other variables?” You can use either one in practice.

8. Tests for Groups of Variables (Chunk Tests)

8.1 Concept

Sometimes we want to test whether a group of variables collectively adds to the model. This is called a chunk test or multiple partial F-test.

\[H_0: \beta_{k+1} = \beta_{k+2} = \cdots = \beta_p = 0 \quad \text{(the group adds nothing)}\]

The test statistic compares a full model to a reduced model:

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

\[F \sim F_{(p_{\text{full}} - p_{\text{reduced}}), \, (n - p_{\text{full}} - 1)} \quad \text{under } H_0\]

8.2 Example: Do Demographic Variables Add to the Model?

Suppose we want to test whether adding age, income_cat, sex, and exercise as a group improves prediction beyond physhlth_days and sleep_hrs alone.

\[H_0: \beta_{\text{age}} = \beta_{\text{income}} = \beta_{\text{sex}} = \beta_{\text{exercise}} = 0\]

# Reduced model: only health behaviors
m_health_only <- lm(menthlth_days ~ physhlth_days + sleep_hrs, data = brfss_mlr)

# Full model: health behaviors + demographics
m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
             data = brfss_mlr)

# Chunk test
anova(m_health_only, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep only)", "Full (+ demographics)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 10. Chunk Test: Do demographic variables collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 10. Chunk Test: Do demographic variables collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth + sleep only)	4997	264913.6	NA	NA	NA	NA
Full (+ demographics)	4993	250992.8	4	13920.75	69.2314	0

Manual computation:

ssr_full    <- sum(anova(m_full)$`Sum Sq`[1:6])
ssr_reduced <- sum(anova(m_health_only)$`Sum Sq`[1:2])
mse_full    <- anova(m_full)$`Mean Sq`[7]
df_diff     <- 4  # 4 additional variables

F_chunk <- ((ssr_full - ssr_reduced) / df_diff) / mse_full

cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 46718.54

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 32797.79

cat("Difference:", round(ssr_full - ssr_reduced, 2), "\n")

## Difference: 13920.75

cat("df (number of added variables):", df_diff, "\n")

## df (number of added variables): 4

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 50.27

cat("F-statistic:", round(F_chunk, 4), "\n")

## F-statistic: 69.2314

cat("Critical value F(4, 4993, 0.95):", round(qf(0.95, df_diff, 4993), 4), "\n")

## Critical value F(4, 4993, 0.95): 2.3737

cat("p-value:", format.pval(pf(F_chunk, df_diff, 4993, lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

Conclusion: The group of demographic variables (age, income, sex, exercise) collectively adds statistically significant information to the model, even though exercise alone was not significant. This is important — individual non-significance does not mean the variable isn’t useful as part of a group.

8.3 Example: Do Sleep and Age Together Add Beyond Physical Health?

m_phys_only <- lm(menthlth_days ~ physhlth_days, data = brfss_mlr)
m_phys_sleep_age <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, data = brfss_mlr)

anova(m_phys_only, m_phys_sleep_age) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth only)", "Full (+ sleep + age)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 11. Chunk Test: Do sleep_hrs and age together add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 11. Chunk Test: Do sleep_hrs and age together add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth only)	4998	268236.6	NA	NA	NA	NA
Full (+ sleep + age)	4996	255652.1	2	12584.51	122.9645	0

8.4 Important Note on Type I SS and Chunk Tests

For a chunk test using Type I SS, the variables being tested must be entered sequentially at the end of the model formula. This is because Type I SS for a group of consecutive variables at the end sum to the correct chunk test numerator.

Type III SS cannot be used for chunk tests directly, because each Type III SS adjusts for all variables (including others in the group being tested).

9. The Order of Variable Entry Matters (for Type I)

To illustrate how Type I SS change with variable ordering, let’s fit the same model with a different variable order:

# Original order
m_order1 <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
               data = brfss_mlr)

# Reversed order
m_order2 <- lm(menthlth_days ~ exercise + sex + income_cat + age + sleep_hrs + physhlth_days,
               data = brfss_mlr)

# Compare Type I SS
type1_order1 <- anova(m_order1) %>%
  as.data.frame() %>%
  rownames_to_column("Variable") %>%
  filter(Variable != "Residuals") %>%
  select(Variable, `Type I SS (Order 1)` = `Sum Sq`)

type1_order2 <- anova(m_order2) %>%
  as.data.frame() %>%
  rownames_to_column("Variable") %>%
  filter(Variable != "Residuals") %>%
  select(Variable, `Type I SS (Order 2)` = `Sum Sq`)

# Also get Type III for comparison (order-invariant)
type3_all <- Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("Variable") %>%
  filter(!Variable %in% c("(Intercept)", "Residuals")) %>%
  select(Variable, `Type III SS` = `Sum Sq`)

# Combine
left_join(type1_order1, type3_all, by = "Variable") %>%
  mutate(across(where(is.numeric), ~ round(., 1))) %>%
  kable(
    caption = "Table 12. How Variable Entry Order Affects Type I SS (Type III is invariant)"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 12. How Variable Entry Order Affects Type I SS (Type III is invariant)
Variable	Type I SS (Order 1)	Type III SS
physhlth_days	29474.7	23189.1
sleep_hrs	3323.0	2295.4
age	9261.5	9673.9
income_cat	2648.8	1918.5
sex	1918.4	1903.5
exercise	92.1	92.1

Takeaway: Type I SS change depending on entry order; Type III SS do not. In epidemiologic analysis, Type III is generally preferred because we want to know the effect of each variable after adjusting for all others.

10. Summary of Hypothesis Testing Approaches

tribble(
  ~Approach, ~`R Function`, ~`What It Tests`, ~`Order-Dependent?`, ~`Use For`,
  "Overall F-test", "summary(model)", "All βs = 0 simultaneously", "No", "Does the model explain anything?",
  "Type I SS (sequential)", "anova(model)", "Each variable given predecessors", "Yes", "Sequential variable importance",
  "Type III SS (partial)", "car::Anova(model, type='III')", "Each variable given ALL others", "No", "Adjusted variable significance",
  "t-test for βj", "summary(model)", "Single βj = 0 given all others", "No", "Individual coefficient testing",
  "Chunk test", "anova(reduced, full)", "Group of βs = 0 simultaneously", "Depends on approach", "Testing groups of variables"
) %>%
  kable(caption = "Table 13. Summary of Hypothesis Testing Methods in MLR") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 13. Summary of Hypothesis Testing Methods in MLR
Approach	R Function	What It Tests	Order-Dependent?	Use For
Overall F-test	summary(model)	All βs = 0 simultaneously	No	Does the model explain anything?
Type I SS (sequential)	anova(model)	Each variable given predecessors	Yes	Sequential variable importance
Type III SS (partial)	car::Anova(model, type=‘III’)	Each variable given ALL others	No	Adjusted variable significance
t-test for βj	summary(model)	Single βj = 0 given all others	No	Individual coefficient testing
Chunk test	anova(reduced, full)	Group of βs = 0 simultaneously	Depends on approach	Testing groups of variables

Part 2: In-Class Lab Activity (Pair-Based)

EPI 553 — Tests of Hypotheses Lab Due: End of class

Instructions

This lab uses a Driver-Navigator pair programming approach. You will work with a partner, taking turns in two roles:

Driver: Types the code and writes answers in the .Rmd file
Navigator: Reviews the code, checks logic, suggests improvements, and helps interpret results

Round 1 (Tasks 1–3): Student A is the Driver, Student B is the Navigator. Round 2 (Tasks 4–6): Student B is the Driver, Student A is the Navigator.

Switch roles after completing Task 3. Both partners should understand all tasks — the Navigator should actively participate by checking the work, asking questions, and discussing interpretations.

Submission: Each pair submits one knitted HTML file with both partners’ names. Upload to Brightspace by end of class.

Data for the Lab

Use the same BRFSS 2020 dataset from the guided practice.

# Load packages and data
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(car)
library(gtsummary)

brfss_mlr <- readRDS(
  ""/Users/emmanuelsmac/Desktop/brfss_mlr_2020.rds"
)

Grading Rubric

Task	Description	Points
Task 1	Fitting Models and ANOVA Tables	15
Task 2	Type I vs. Type III Sums of Squares	15
Task 3	Partial F-Tests for Individual Variables	20
Switch roles
Task 4	T-Tests and the F-Test Equivalence	15
Task 5	Chunk Test (Testing Groups of Variables)	20
Task 6	Synthesis and Interpretation	15
Total		100

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

1a. (5 pts) Fit the following model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \varepsilon\]

Use tidy() with conf.int = TRUE to display the coefficients. Report the fitted equation with rounded coefficients.

# Fit the three-predictor model
m_task1 <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
              data = brfss_mlr)

# Display coefficients with confidence intervals
tidy(m_task1, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1a. Three-Predictor Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic",
                  "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1a. Three-Predictor Model Coefficients
Term	Estimate	Std. Error	t-statistic	95% CI Lower	95% CI Upper
(Intercept)	10.6684	0.6102	17.4844	9.4722	11.8646
physhlth_days	0.3182	0.0131	24.2179	0.2924	0.3439
sleep_hrs	-0.5063	0.0760	-6.6630	-0.6553	-0.3573
age	-0.0800	0.0059	-13.4532	-0.0916	-0.0683

# Extract coefficients for the fitted equation
coefs <- round(coef(m_task1), 4)
cat("Fitted Equation:\n")

## Fitted Equation:

cat(sprintf(
  "menthlth_days = %.4f + %.4f * physhlth_days + %.4f * sleep_hrs + %.4f * age\n",
  coefs["(Intercept)"],
  coefs["physhlth_days"],
  coefs["sleep_hrs"],
  coefs["age"]
))

## menthlth_days = 10.6684 + 0.3182 * physhlth_days + -0.5063 * sleep_hrs + -0.0800 * age

Fitted Equation: The estimated model is:

\[\widehat{\text{MentalHealthDays}} = \hat{\beta}_0 + \hat{\beta}_1 \cdot \text{physhlth\_days} + \hat{\beta}_2 \cdot \text{sleep\_hrs} + \hat{\beta}_3 \cdot \text{age}\]

Interpretation of key coefficients:

physhlth_days: For each additional day of physical illness in the past 30 days, mentally unhealthy days increase by approximately $\hat{\beta}_1$ days, holding sleep and age constant. This positive association is expected — physical and mental health are closely linked.

sleep_hrs: Each additional hour of sleep per night is associated with a decrease in mentally unhealthy days, holding the other predictors constant, suggesting that more sleep is associated with better mental health.

age: Each additional year of age is associated with a small change in mentally unhealthy days, adjusting for physical health and sleep.

1b. (5 pts) Use anova() on this model to obtain the Type I (sequential) sums of squares. Present the ANOVA table and verify that the sum of all predictor Type I SS equals the model SSR. Show this calculation explicitly.

# Type I (sequential) sums of squares using anova()
anova_task1 <- anova(m_task1)

anova_task1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1b. Type I (Sequential) ANOVA Table — anova(m_task1)",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables tested sequentially in the order entered: physhlth_days first, then sleep_hrs, then age.")

Table 1b. Type I (Sequential) ANOVA Table — anova(m_task1)
Source	df	Sum of Sq	Mean Sq	F value	p-value
physhlth_days	1	29474.746	29474.7462	576.0009	0
sleep_hrs	1	3323.040	3323.0401	64.9395	0
age	1	9261.475	9261.4748	180.9894	0
Residuals	4996	255652.080	51.1714	NA	NA
Note:
Variables tested sequentially in the order entered: physhlth_days first, then sleep_hrs, then age.

# Verify: predictor Type I SS sum = model SSR
ss_phys  <- anova_task1$`Sum Sq`[1]   # SS(physhlth_days)
ss_sleep <- anova_task1$`Sum Sq`[2]   # SS(sleep_hrs | physhlth_days)
ss_age   <- anova_task1$`Sum Sq`[3]   # SS(age | physhlth_days, sleep_hrs)
sse      <- anova_task1$`Sum Sq`[4]   # Residual SS
ssr      <- ss_phys + ss_sleep + ss_age
ssy      <- ssr + sse

cat("--- Verification: Type I SS Sum = Model SSR ---\n\n")

## --- Verification: Type I SS Sum = Model SSR ---

cat("SS(physhlth_days):                          ", round(ss_phys, 2), "\n")

## SS(physhlth_days):                           29474.75

cat("SS(sleep_hrs | physhlth_days):              ", round(ss_sleep, 2), "\n")

## SS(sleep_hrs | physhlth_days):               3323.04

cat("SS(age | physhlth_days, sleep_hrs):         ", round(ss_age, 2), "\n")

## SS(age | physhlth_days, sleep_hrs):          9261.47

cat("---------------------------------------------------\n")

## ---------------------------------------------------

cat("SSR (sum of predictor Type I SS):           ", round(ssr, 2), "\n")

## SSR (sum of predictor Type I SS):            42059.26

cat("SSE (residual):                             ", round(sse, 2), "\n")

## SSE (residual):                              255652.1

cat("SSY (total = SSR + SSE):                    ", round(ssy, 2), "\n")

## SSY (total = SSR + SSE):                     297711.3

cat("R² = SSR / SSY:                             ", round(ssr / ssy, 4), "\n")

## R² = SSR / SSY:                              0.1413

cat("R² from model summary:                      ", round(summary(m_task1)$r.squared, 4), "\n")

## R² from model summary:                       0.1413

cat("Match:", isTRUE(all.equal(round(ssr/ssy, 4), round(summary(m_task1)$r.squared, 4))), "\n")

## Match: TRUE

Verification: The sum of all three predictor Type I sums of squares equals the total model SSR. This is a defining property of Type I SS — they partition the explained variance sequentially, and therefore must add up to the total explained variation. The resulting $R^2$ matches the value from summary().

1c. (5 pts) Use car::Anova() with type = "III" on the same model to obtain the Type III sums of squares. Compare the Type I and Type III SS for each variable. Which variable’s SS is the same in both tables? Why?

# Type III sums of squares using car::Anova()
anova3_task1 <- Anova(m_task1, type = "III")

anova3_task1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1c. Type III (Partial) ANOVA Table — car::Anova(type = 'III')",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Each variable is tested given ALL other variables in the model.")

Table 1c. Type III (Partial) ANOVA Table — car::Anova(type = ‘III’)
Source	Sum of Sq	df	F value	p-value
(Intercept)	15643.372	1	305.7057	0
physhlth_days	30012.259	1	586.5051	0
sleep_hrs	2271.809	1	44.3961	0
age	9261.475	1	180.9894	0
Residuals	255652.080	4996	NA	NA
Note:
Each variable is tested given ALL other variables in the model.

# Side-by-side comparison of Type I and Type III SS
comparison_1c <- tibble(
  Variable    = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_task1$`Sum Sq`[1:3], 2),
  `Type III SS` = round(anova3_task1$`Sum Sq`[2:4], 2),
  `Difference` = round(anova_task1$`Sum Sq`[1:3] - anova3_task1$`Sum Sq`[2:4], 2),
  `Same?` = ifelse(
    abs(anova_task1$`Sum Sq`[1:3] - anova3_task1$`Sum Sq`[2:4]) < 0.01,
    "Yes ✓", "No ✗"
  )
)

comparison_1c %>%
  kable(caption = "Table 1c-2. Comparison of Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1c-2. Comparison of Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS	Difference	Same?
physhlth_days	29474.75	30012.26	-537.51	No ✗
sleep_hrs	3323.04	2271.81	1051.23	No ✗
age	9261.47	9261.47	0.00	Yes ✓

cat("\nType I SS for 'age' (last variable entered): ",
    round(anova_task1$`Sum Sq`[3], 4), "\n")

## 
## Type I SS for 'age' (last variable entered):  9261.475

cat("Type III SS for 'age':                        ",
    round(anova3_task1$`Sum Sq`[4], 4), "\n")

## Type III SS for 'age':                         9261.475

cat("Are they equal?",
    isTRUE(all.equal(anova_task1$`Sum Sq`[3], anova3_task1$`Sum Sq`[4],
                     tolerance = 0.01)), "\n")

## Are they equal? TRUE

Ans: The last variable entered — age — has identical Type I and Type III sums of squares. This is always true: for the last variable in the model, Type I SS conditions on all previously entered variables, which is the same set of variables that Type III conditions on. In other words, by the time we reach age in the sequential ordering, every other variable (physhlth_days and sleep_hrs) is already accounted for — exactly what Type III does. For variables entered earlier (physhlth_days and sleep_hrs), the Type I SS does not condition on variables entered later, so their Type I and Type III values differ. —

Task 2: Type I vs. Type III Sums of Squares (15 points)

2a. (5 pts) Fit the same model from Task 1 but reverse the variable order:

# Fit same model with REVERSED variable order
m_task1_rev <- lm(menthlth_days ~ age + sleep_hrs + physhlth_days,
                  data = brfss_mlr)

# Type I SS for reversed model
anova_rev <- anova(m_task1_rev)

anova_rev %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 2a. Type I ANOVA — Reversed Order (age → sleep_hrs → physhlth_days)",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2a. Type I ANOVA — Reversed Order (age → sleep_hrs → physhlth_days)
Source	df	Sum of Sq	Mean Sq	F value	p-value
age	1	7249.097	7249.0969	141.6632	0
sleep_hrs	1	4797.905	4797.9051	93.7615	0
physhlth_days	1	30012.259	30012.2591	586.5051	0
Residuals	4996	255652.080	51.1714	NA	NA

# Compare Type I SS between original and reversed ordering
comparison_2a <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS (Original Order)`  = round(anova_task1$`Sum Sq`[1:3], 2),
  `Type I SS (Reversed Order)`  = c(
    round(anova_rev$`Sum Sq`[3], 2),   # physhlth_days is now last
    round(anova_rev$`Sum Sq`[2], 2),   # sleep_hrs is now middle
    round(anova_rev$`Sum Sq`[1], 2)    # age is now first
  ),
  Changed = c("Yes ✗", "Yes ✗", "Yes ✗")
)

comparison_2a %>%
  kable(caption = "Table 2a-2. Type I SS: Original vs. Reversed Variable Order") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "ALL Type I SS values change when variable order changes — confirming order-dependence.")

Table 2a-2. Type I SS: Original vs. Reversed Variable Order
Variable	Type I SS (Original Order)	Type I SS (Reversed Order)	Changed
physhlth_days	29474.75	30012.26	Yes ✗
sleep_hrs	3323.04	4797.91	Yes ✗
age	9261.47	7249.10	Yes ✗
Note:
ALL Type I SS values change when variable order changes — confirming order-dependence.

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{physhlth\_days} + \varepsilon\]

Run anova() on this model and compare the Type I SS to what you obtained in Task 1b. Which values changed and which stayed the same?

Ans: All Type I SS values changed when the variable order was reversed. This demonstrates the fundamental property of Type I SS: they are entirely order-dependent. In the original order, physhlth_days receives a large Type I SS because it is tested alone (unadjusted). In the reversed order, physhlth_days is entered last, so its Type I SS represents its unique contribution after adjusting for age and sleep_hrs — a much smaller value. The only value that would remain the same across any ordering is the last variable’s Type I SS, because in both orderings the last variable is always conditioned on all others. However, since we completely reversed the order, each variable now occupies a different position, so all values differ.

2b. (5 pts) Run car::Anova(type = "III") on this reordered model. Did the Type III SS change compared to Task 1c? Explain why or why not.

Ans: No — the Type III SS are identical regardless of the order in which variables are entered into the model formula. This is because Type III SS for any variable $X_j$ is always defined as the additional sum of squares explained by $X_j$ after adjusting for all other variables in the model, regardless of their order. Changing the formula order does not change which variables are “in the model together” — the fit of the model is identical. Therefore, Type III SS are order-invariant by design.

# Type III SS on reversed model
anova3_rev <- Anova(m_task1_rev, type = "III")

# Compare Type III from original vs. reversed model
comparison_2b <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type III SS (Original Order)` = round(anova3_task1$`Sum Sq`[2:4], 4),
  `Type III SS (Reversed Order)` = c(
    round(anova3_rev$`Sum Sq`[4], 4),
    round(anova3_rev$`Sum Sq`[3], 4),
    round(anova3_rev$`Sum Sq`[2], 4)
  ),
  `Same?` = "Yes ✓"
)

comparison_2b %>%
  kable(caption = "Table 2b. Type III SS: Original vs. Reversed Order — Order Invariant") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2b. Type III SS: Original vs. Reversed Order — Order Invariant
Variable	Type III SS (Original Order)	Type III SS (Reversed Order)	Same?
physhlth_days	30012.259	30012.259	Yes ✓
sleep_hrs	2271.809	2271.809	Yes ✓
age	9261.475	9261.475	Yes ✓

cat("Verification — physhlth_days Type III SS:\n")

## Verification — physhlth_days Type III SS:

cat("  Original order:", round(anova3_task1$`Sum Sq`[2], 4), "\n")

##   Original order: 30012.26

cat("  Reversed order:", round(anova3_rev$`Sum Sq`[4], 4), "\n")

##   Reversed order: 30012.26

cat("  Equal?",
    isTRUE(all.equal(anova3_task1$`Sum Sq`[2], anova3_rev$`Sum Sq`[4],
                     tolerance = 0.01)), "\n")

##   Equal? TRUE

2c. (5 pts) In 2–3 sentences, explain when an epidemiologist should prefer Type III SS over Type I SS. Give a concrete example from public health research where the choice matters.

Ans: An epidemiologist should prefer Type III SS whenever the research question is about the independent or adjusted contribution of each predictor — that is, the contribution of each variable after controlling for all other variables in the model. This is the standard question in most epidemiologic analyses, where confounding is a central concern. Type I SS are only appropriate when there is a meaningful, theory-driven sequential ordering to variable entry — for example, in a hierarchical model where biological factors are entered before behavioral factors and the incremental contribution of each block is of interest.

Concrete public health example: Suppose a researcher is studying the association between income, physical activity, BMI, and hypertension prevalence. If they use Type I SS with income entered first, income receives credit for all variance it shares with physical activity and BMI. But if physical activity and BMI are the true mediators of income’s effect, then Type I SS inflates income’s apparent contribution. Type III SS would correctly attribute to income only the variation it explains beyond physical activity and BMI — the appropriately adjusted, confounding-controlled estimate that epidemiologists actually want.

Task 3: Partial F-Tests for Individual Variables (20 points)

3a. (10 pts) Conduct a partial F-test to determine whether age adds significantly to the prediction of mental health days, given that physhlth_days and sleep_hrs are already in the model. Do this by:

Fitting a reduced model (without age)
Fitting the full model (with age)
Using anova(reduced, full) to compare them

State the null hypothesis, report the F-statistic and p-value, and state your conclusion at $\alpha = 0.05$.

# Reduced model: without age
m_reduced_3a <- lm(menthlth_days ~ physhlth_days + sleep_hrs,
                   data = brfss_mlr)

# Full model: with age
m_full_3a <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
                data = brfss_mlr)

# Partial F-test
pf_test_3a <- anova(m_reduced_3a, m_full_3a)

pf_test_3a %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep only)",
              "Full (physhlth + sleep + age)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 3a. Partial F-Test: Does age add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3a. Partial F-Test: Does age add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth + sleep only)	4997	264913.6	NA	NA	NA	NA
Full (physhlth + sleep + age)	4996	255652.1	1	9261.475	180.9894	0

f_val  <- pf_test_3a$F[2]
p_val  <- pf_test_3a$`Pr(>F)`[2]
df1    <- 1
df2    <- pf_test_3a$Res.Df[2]
f_crit <- qf(0.95, df1, df2)

cat("F-statistic:", round(f_val, 4), "\n")

## F-statistic: 180.9894

cat("df (numerator):", df1, "\n")

## df (numerator): 1

cat("df (denominator):", df2, "\n")

## df (denominator): 4996

cat("Critical value F(1,", df2, ", 0.95):", round(f_crit, 4), "\n")

## Critical value F(1, 4996 , 0.95): 3.8433

cat("p-value:", format.pval(p_val, digits = 4), "\n")

## p-value: < 2.2e-16

cat("Reject H0?", ifelse(p_val < 0.05, "Yes — age is significant", "No — age is not significant"), "\n")

## Reject H0? Yes — age is significant

Ans: Conclusion: The F-statistic exceeds the critical value and the p-value is less than 0.05. We reject $H_0$ and conclude that age adds statistically significant predictive information for mentally unhealthy days, beyond what is already explained by physical health and sleep hours. Age should be retained in the model.

3b. (10 pts) Now verify your result from 3a manually. Using the anova() output from the full model (Task 1b), identify $SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})$ from the Type I table. Compute the F-statistic as:

\[F = \frac{SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs}) / 1}{MSE(\text{full model})}\]

Compare to the critical value $F_{1, n-p-1, 0.95}$. Does your manual calculation agree with the anova() comparison from 3a?

# Get Type I ANOVA output for the full 3-predictor model
# (entered as physhlth_days, sleep_hrs, age — so age is last)
anova_full_3b <- anova(m_full_3a)

# Extract needed components
ss_age_typeI <- anova_full_3b$`Sum Sq`[3]   # SS(age | physhlth, sleep) — Type I, entered last
mse_full     <- anova_full_3b$`Mean Sq`[4]  # MSE of full model
df2_manual   <- anova_full_3b$Df[4]         # Residual df

# Compute F-statistic manually
F_manual <- (ss_age_typeI / 1) / mse_full

# Critical value
f_crit_manual <- qf(0.95, 1, df2_manual)

cat("=== Manual Partial F-Test for age ===\n\n")

## === Manual Partial F-Test for age ===

cat("SS(age | physhlth_days, sleep_hrs) from Type I table:", round(ss_age_typeI, 2), "\n")

## SS(age | physhlth_days, sleep_hrs) from Type I table: 9261.47

cat("MSE (full model):                                    ", round(mse_full, 4), "\n")

## MSE (full model):                                     51.1714

cat("Numerator df:                                        ", 1, "\n")

## Numerator df:                                         1

cat("Denominator df (residual):                           ", df2_manual, "\n")

## Denominator df (residual):                            4996

cat("---------------------------------------------------\n")

## ---------------------------------------------------

cat("F = SS(age) / 1 / MSE(full) =",
    round(ss_age_typeI, 2), "/", round(mse_full, 4), "=",
    round(F_manual, 4), "\n")

## F = SS(age) / 1 / MSE(full) = 9261.47 / 51.1714 = 180.9894

cat("Critical value F(1,", df2_manual, ", 0.95):         ", round(f_crit_manual, 4), "\n")

## Critical value F(1, 4996 , 0.95):          3.8433

cat("F > critical value?", ifelse(F_manual > f_crit_manual, "Yes — Reject H0", "No — Fail to reject H0"), "\n")

## F > critical value? Yes — Reject H0

cat("\nComparison with anova() result:\n")

## 
## Comparison with anova() result:

cat("  F from anova(reduced, full):", round(pf_test_3a$F[2], 4), "\n")

##   F from anova(reduced, full): 180.9894

cat("  F from manual calculation:  ", round(F_manual, 4), "\n")

##   F from manual calculation:   180.9894

cat("  Match?",
    isTRUE(all.equal(round(pf_test_3a$F[2], 2), round(F_manual, 2))), "\n")

##   Match? TRUE

Ans: Key insight: Because age is the last variable entered in the model formula, its Type I SS equals its Type III SS (and equals $SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})$). The manual F-statistic computed from the Type I table exactly matches the F-statistic from anova(reduced, full), confirming the equivalence. Both exceed the critical value of $F_{1, n-3-1, 0.95}$, and both have $p < 0.05$ — leading to the same conclusion: reject $H_0$.

Round 2: Student B Drives, Student A Navigates

⟳ Switch roles now!

Task 4: T-Tests and the F-Test Equivalence (15 points)

4a. (5 pts) Using the full model from Task 1 (menthlth_days ~ physhlth_days + sleep_hrs + age), run summary() and extract the t-statistics and p-values for each coefficient.

# Full model from Task 1
m_task4 <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
              data = brfss_mlr)

# Extract t-statistics and p-values
t_results <- tidy(m_task4) %>%
  filter(term != "(Intercept)") %>%
  select(term, estimate, std.error, statistic, p.value) %>%
  mutate(across(where(is.numeric), ~ round(., 4)))

t_results %>%
  kable(
    caption = "Table 4a. T-Statistics and P-Values from summary()",
    col.names = c("Predictor", "Estimate (β̂)", "Std. Error", "t-statistic", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 4a. T-Statistics and P-Values from summary()
Predictor	Estimate (β̂	Std. Erro
physhlth_days	0.3182	0.0131	24.2179
sleep_hrs	-0.5063	0.0760	-6.6630
age	-0.0800	0.0059	-13.4532

Ans: Reading the table: The t-statistic for each predictor tests $H_0: \beta_j = 0$ given all other predictors are in the model. All three predictors (physhlth_days, sleep_hrs, age) have p-values less than 0.05, indicating each independently contributes to predicting mentally unhealthy days.

4b. (5 pts) For each predictor, compute $t^2$ and compare it to the Type III F-statistic from Task 1c. Create a table showing the t-statistic, $t^2$, the Type III F-statistic, and both p-values. Are they equivalent?

Ans: For every predictor, $t^2$ equals the Type III F-statistic (within rounding), and both p-values are identical. The equivalence is exact and not approximate — this is a mathematical identity, not a coincidence.

# Get Type III F-statistics
f_results <- Anova(m_task4, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, f_stat = `F value`, f_pvalue = `Pr(>F)`)

# Combine and compute t²
equivalence_table <- t_results %>%
  select(term, t_stat = statistic, t_pvalue = p.value) %>%
  left_join(f_results, by = "term") %>%
  mutate(
    `t²`         = round(t_stat^2, 4),
    `F (Type III)` = round(f_stat, 4),
    `t² = F?`    = ifelse(abs(t_stat^2 - f_stat) < 0.01, "Yes ✓", "No ✗"),
    `p (t-test)` = round(t_pvalue, 6),
    `p (F-test)` = round(f_pvalue, 6),
    `p-values equal?` = ifelse(abs(t_pvalue - f_pvalue) < 0.0001, "Yes ✓", "No ✗")
  ) %>%
  select(term, t_stat, `t²`, `F (Type III)`, `t² = F?`,
         `p (t-test)`, `p (F-test)`, `p-values equal?`) %>%
  mutate(t_stat = round(t_stat, 4))

equivalence_table %>%
  kable(caption = "Table 4b. Equivalence of t-Tests and Type III Partial F-Tests") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  column_spec(5, bold = TRUE, color = "darkgreen") %>%
  column_spec(8, bold = TRUE, color = "darkgreen")

Table 4b. Equivalence of t-Tests and Type III Partial F-Tests
term	t_stat	t²	F (Type III)	t² = F?	p-values equal?
physhlth_days	24.2179	586.5067	586.5051	Yes ✓	Yes ✓
sleep_hrs	-6.6630	44.3956	44.3961	Yes ✓	Yes ✓
age	-13.4532	180.9886	180.9894	Yes ✓	Yes ✓

4c. (5 pts) In your own words, explain why the t-test and the Type III partial F-test give the same result. What is the fundamental relationship between the t-distribution and the F-distribution that makes this true?

Ans: The t-test and the Type III partial F-test are equivalent because they are two different parameterizations of the same underlying hypothesis: $H_0: \beta_j = 0$ given all other predictors are in the model.

The fundamental mathematical relationship is that if $T \sim t_k$, then $T^2 \sim F_{1,\, k}$. That is, the square of a t-distributed random variable follows an F-distribution with 1 degree of freedom in the numerator and $k$ degrees of freedom in the denominator. This relationship holds because the F-distribution with 1 numerator degree of freedom is exactly the square of the corresponding t-distribution.

In the regression context: the t-statistic for $\hat{\beta}_j$ is $t = \hat{\beta}_j / se(\hat{\beta}_j)$, which follows a $t_{n-p-1}$ distribution under $H_0$. The Type III partial F-statistic for $X_j$ is $F = SS(X_j \mid \text{all others}) / MSE(\text{full})$, which follows an $F_{1,\, n-p-1}$ distribution under $H_0$. Since both test the same null hypothesis and $t^2 = F$, they yield identical p-values. In practice, `summary()` gives t-tests and `car::Anova(type="III")` gives the equivalent F-tests — analysts can use whichever is more convenient, as the inference is identical.

Task 5: Chunk Test — Testing Groups of Variables (20 points)

5a. (10 pts) Now consider the full 6-predictor model:

# Reduced model: physhlth_days + sleep_hrs + age only
m_reduced_5a <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
                   data = brfss_mlr)

# Full model: all six predictors
m_full_5a <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age +
                  income_cat + sex + exercise,
                data = brfss_mlr)

# Chunk test
chunk_test_5a <- anova(m_reduced_5a, m_full_5a)

chunk_test_5a %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep + age)",
              "Full (+ income_cat + sex + exercise)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 5a. Chunk Test: Do income_cat, sex, and exercise collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 5a. Chunk Test: Do income_cat, sex, and exercise collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth + sleep + age)	4996	255652.1	NA	NA	NA	NA
Full (+ income_cat + sex + exercise)	4993	250992.8	3	4659.276	30.8957	0

f_chunk <- chunk_test_5a$F[2]
p_chunk <- chunk_test_5a$`Pr(>F)`[2]
df1_chunk <- chunk_test_5a$Df[2]
df2_chunk <- chunk_test_5a$Res.Df[2]
f_crit_chunk <- qf(0.95, df1_chunk, df2_chunk)

cat("F-statistic:          ", round(f_chunk, 4), "\n")

## F-statistic:           30.8957

cat("df (numerator):       ", df1_chunk, "(3 additional variables)\n")

## df (numerator):        3 (3 additional variables)

cat("df (denominator):     ", df2_chunk, "\n")

## df (denominator):      4993

cat("Critical value F(3,", df2_chunk, ", 0.95):", round(f_crit_chunk, 4), "\n")

## Critical value F(3, 4993 , 0.95): 2.6067

cat("p-value:              ", format.pval(p_chunk, digits = 4), "\n")

## p-value:               < 2.2e-16

cat("Reject H0?",
    ifelse(p_chunk < 0.05,
           "Yes — the group of variables is collectively significant",
           "No — the group does not add significantly"), "\n")

## Reject H0? Yes — the group of variables is collectively significant

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \beta_4 \cdot \text{income\_cat} + \beta_5 \cdot \text{sex} + \beta_6 \cdot \text{exercise} + \varepsilon\]

Test whether income_cat, sex, and exercise — as a group — significantly add to the prediction of mental health days, given that physhlth_days, sleep_hrs, and age are already in the model.

State the null hypothesis (in both words and mathematical notation), conduct the test, and state your conclusion.

Ans:

Null hypothesis (in words): Income category, sex, and exercise — as a group — do not add statistically significant predictive information for mentally unhealthy days, beyond what is already explained by physical health days, sleep hours, and age.

Conclusion: The F-statistic exceeds the critical value and p < 0.05. We reject $H_0$ and conclude that income_cat, sex, and exercise — as a group — add statistically significant predictive information for mentally unhealthy days, beyond what is already captured by physical health, sleep, and age. The three demographic/behavioral variables collectively belong in the model.

5b. (5 pts) Compute the chunk test F-statistic manually using:

# Compute SSR for full and reduced models
anova_full_5b    <- anova(m_full_5a)
anova_reduced_5b <- anova(m_reduced_5a)

ssr_full_5b    <- sum(anova_full_5b$`Sum Sq`[1:6])   # 6 predictors
ssr_reduced_5b <- sum(anova_reduced_5b$`Sum Sq`[1:3]) # 3 predictors
mse_full_5b    <- anova_full_5b$`Mean Sq`[7]          # MSE of full model
df_diff_5b     <- 3                                    # 3 additional variables added

# Manual F-statistic
F_manual_chunk <- ((ssr_full_5b - ssr_reduced_5b) / df_diff_5b) / mse_full_5b

cat("=== Manual Chunk F-Test ===\n\n")

## === Manual Chunk F-Test ===

cat("SSR (full model, 6 predictors):    ", round(ssr_full_5b, 2), "\n")

## SSR (full model, 6 predictors):     46718.54

cat("SSR (reduced model, 3 predictors): ", round(ssr_reduced_5b, 2), "\n")

## SSR (reduced model, 3 predictors):  42059.26

cat("Difference (SSR_full - SSR_red):   ", round(ssr_full_5b - ssr_reduced_5b, 2), "\n")

## Difference (SSR_full - SSR_red):    4659.28

cat("Number of additional variables:    ", df_diff_5b, "\n")

## Number of additional variables:     3

cat("MSE (full model):                  ", round(mse_full_5b, 4), "\n")

## MSE (full model):                   50.2689

cat("--------------------------------------------------\n")

## --------------------------------------------------

cat("F = [(SSR_full - SSR_red) / df_diff] / MSE(full)\n")

## F = [(SSR_full - SSR_red) / df_diff] / MSE(full)

cat("F = [", round(ssr_full_5b - ssr_reduced_5b, 2),
    "/", df_diff_5b, "] /", round(mse_full_5b, 4),
    "=", round(F_manual_chunk, 4), "\n\n")

## F = [ 4659.28 / 3 ] / 50.2689 = 30.8957

cat("F from anova(reduced, full):", round(chunk_test_5a$F[2], 4), "\n")

## F from anova(reduced, full): 30.8957

cat("F from manual calculation:  ", round(F_manual_chunk, 4), "\n")

## F from manual calculation:   30.8957

cat("Match?",
    isTRUE(all.equal(round(chunk_test_5a$F[2], 2),
                     round(F_manual_chunk, 2))), "\n")

## Match? TRUE

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

Show all intermediate values. Does your manual computation match the anova() result?

Ans: Verification: The manually computed F-statistic matches the result from anova(reduced, full). The formula partitions the additional variation explained by the three new variables (numerator) relative to the unexplained variation in the full model (denominator). The numerator df equals the number of constraints imposed under $H_0$ — here, 3 (one for each variable in the group).

5c. (5 pts) Note that exercise was not individually significant in the Type III table, yet it is part of a group that is collectively significant. In 2–3 sentences, explain how this is possible and what it means for model building in epidemiology.

Ans:

This apparent paradox arises because individual Type III tests and chunk tests answer fundamentally different questions. The Type III test for exercise asks: “Does exercise add unique predictive information beyond all five other predictors?” — a very stringent standard when correlated variables like physical health and sleep already capture much of exercise’s effect. The chunk test asks: “Do income, sex, and exercise together explain variation that the three health variables cannot?” — a much less stringent standard, because the group can be collectively informative even if each individual variable’s marginal contribution is small.

For model-building in epidemiology, this distinction has an important implication: individual non-significance does not justify variable exclusion if there is a priori theoretical or epidemiologic justification for including the variable. Exercise is a well-established correlate of mental health in the literature. Dropping it solely on the basis of a p-value > 0.05 — especially when it contributes as part of a meaningful block of variables — would risk introducing omitted-variable bias and would be at odds with sound epidemiologic practice. Decisions about variable retention should be guided by subject-matter knowledge and the study’s confounding structure, not p-values alone.

Task 6: Synthesis and Interpretation (15 points)

6a. (5 pts) Based on the full model, which predictors are statistically significant at $\alpha = 0.05$? List them and briefly state the direction of each association (positive or negative).

# Full six-predictor model Type III results
type3_full_6 <- Anova(m_full_5a, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("Predictor") %>%
  filter(!Predictor %in% c("(Intercept)", "Residuals")) %>%
  mutate(
    `Significant (α=0.05)` = ifelse(`Pr(>F)` < 0.05, "Yes ✓", "No ✗"),
    across(where(is.numeric), ~ round(., 4))
  )

# Get coefficient signs from the model
coef_signs <- tidy(m_full_5a) %>%
  filter(term != "(Intercept)") %>%
  mutate(
    Direction = case_when(
      estimate > 0 ~ "Positive (+)",
      estimate < 0 ~ "Negative (−)",
      TRUE ~ "Zero"
    )
  ) %>%
  select(term, Direction)

# Build final table
final_table_6a <- type3_full_6 %>%
  left_join(coef_signs, by = c("Predictor" = "term")) %>%
  select(Predictor, `Sum Sq`, `Df`, `F value`, `Pr(>F)`,
         `Significant (α=0.05)`, Direction)

final_table_6a %>%
  kable(
    caption = "Table 6a. Type III Tests — Full Six-Predictor Model",
    col.names = c("Predictor", "Sum of Sq", "df", "F value",
                  "p-value", "Significant?", "Direction of Association")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  column_spec(6, bold = TRUE)

Table 6a. Type III Tests — Full Six-Predictor Model
Predictor	Sum of Sq	df	F value	p-value	Significant?	Direction of Association
physhlth_days	23189.1197	1	461.3012	0.0000	Yes ✓	Positive (+)
sleep_hrs	2295.4257	1	45.6629	0.0000	Yes ✓	Negative (−)
age	9673.9387	1	192.4437	0.0000	Yes ✓	Negative (−)
income_cat	1918.5269	1	38.1653	0.0000	Yes ✓	Negative (−)
sex	1903.4532	1	37.8654	0.0000	Yes ✓	NA
exercise	92.1241	1	1.8326	0.1759	No ✗	NA

Ans: Significant predictors at α = 0.05:

physhlth_days — Positive association (+): More physically unhealthy days is associated with more mentally unhealthy days. Physical and mental health are closely interrelated, and chronic physical illness commonly increases psychological distress.

sleep_hrs — Negative association (−): More sleep per night is associated with fewer mentally unhealthy days. Adequate sleep is a well-established protective factor for mental health.

age — Association direction varies by dataset, but age is independently associated with mental health days after adjusting for all other variables. Older adults may report fewer mentally unhealthy days due to adaptive coping, or more due to health burden — direction is determined by the coefficient sign in the output.

income_cat — Negative association (−): Higher income category is associated with fewer mentally unhealthy days, consistent with the well-documented socioeconomic gradient in mental health.

sex— Significant sex differences in mentally unhealthy days persist after adjusting for all covariates, consistent with the literature showing women generally report higher rates of psychological distress.

Not statistically significant at α = 0.05: exercise — Does not add statistically significant unique information beyond the other five predictors, although it is part of a collectively significant group (Task 5).

6b. (5 pts) A colleague argues: “We should drop exercise from the model because it’s not significant.” Do you agree? Write a 2–3 sentence response explaining your reasoning. Consider the chunk test results and epidemiologic rationale.

Ans: Response to the colleague’s argument: We respectfully disagree with dropping exercise solely on the basis of its individual p-value. As demonstrated in Task 5, exercise is part of a group of variables (income_cat, sex, exercise) that is collectively significant — together they explain statistically meaningful variation in mental health days beyond what health behaviors and age explain. Removing exercise on the basis of individual non-significance ignores this group-level evidence and risks distorting the estimates of the remaining variables in the model.

More fundamentally, the decision to retain a variable should be guided primarily by epidemiologic theory and the study’s confounding structure, not by p-values alone. Exercise is a well-established correlate of mental health outcomes in the public health literature. Even if its direct contribution in this particular model is attenuated — likely because it is correlated with physical health and sleep, which are already in the model — removing it could introduce omitted variable bias. A better approach would be to retain exercise for theoretical completeness and transparency, especially if prior literature or a directed acyclic graph (DAG) identifies it as a relevant covariate.

6c. (5 pts) Write a 3–4 sentence summary of the hypothesis testing results for a non-statistical audience (e.g., a public health program manager). Your summary should convey which factors were identified as independently associated with mental health days and which were not, without using jargon like “p-value,” “F-test,” or “sums of squares.”

Ans: Our analysis examined which individual and lifestyle factors were independently linked to the number of mentally unhealthy days reported by U.S. adults. After accounting for all factors simultaneously, we found that physical health, sleep, age, income, and sex all had meaningful independent associations with mental health days. Specifically, adults who reported more days of physical illness, less sleep, lower income, or who were of a particular sex tended to report more mentally unhealthy days — even after considering all the other factors at the same time. Having a regular exercise routine was not found to be independently associated with mental health days once physical health, sleep, and the other factors were taken into account, though exercise remained part of a broader set of factors that together were meaningfully related to mental health. These findings suggest that interventions targeting sleep quality, physical health management, and income support may be particularly promising strategies for improving mental health outcomes at the population level.

End of Lab Activity

Variable	Type I SS	What It Tests
\(X_1\)	\(SS(X_1)\)	Effect of \(X_1\) alone
\(X_2\)	\(SS(X_2 \mid X_1)\)	Additional effect of \(X_2\) given \(X_1\) is in the model
\(X_3\)	\(SS(X_3 \mid X_1, X_2)\)	Additional effect of \(X_3\) given \(X_1\) and \(X_2\) are in the model

Variable	Type III SS	What It Tests
\(X_1\)	\(SS(X_1 \mid X_2, X_3)\)	Effect of \(X_1\) given \(X_2\) and \(X_3\) are in the model
\(X_2\)	\(SS(X_2 \mid X_1, X_3)\)	Effect of \(X_2\) given \(X_1\) and \(X_3\) are in the model
\(X_3\)	\(SS(X_3 \mid X_1, X_2)\)	Effect of \(X_3\) given \(X_1\) and \(X_2\) are in the model

Tests of Hypotheses in Multiple Linear Regression

EPI 553 — Principles of Statistical Inference II (Spring 2026)

Emmanuel Nana Arko

March 12 and 24, 2026

Introduction

Setup and Data

The BRFSS 2020 Dataset

Fitting the Full Model

Part 1: Guided Practice

1. Review: The Overall F-Test

2. Type I Sums of Squares (Sequential)

2.1 Concept

2.2 Computing Type I SS in R

2.3 Interpreting the Type I ANOVA Table

3. Type III Sums of Squares (Partial)

3.1 Concept

3.2 Comparing Type I and Type III

3.3 Type III SS in R

4. Partial F-Tests for Individual Variables

4.1 The General Framework

4.2 Example: Testing Whether sleep_hrs Adds to the Model

4.3 Example: Testing Whether exercise Adds to the Model

5. Using Type I SS for Partial F-Tests

5.1 When Type I SS Are Correct

5.2 Important Caveat

6. Using Type III SS for Partial F-Tests

6.1 The Preferred Approach

6.2 Interpreting Each Test

7. T-Tests for Individual \(\beta\)s

7.1 Concept

7.2 Equivalence to Type III Partial F-Tests

7.3 Demonstrating the Equivalence

8. Tests for Groups of Variables (Chunk Tests)

8.1 Concept

8.2 Example: Do Demographic Variables Add to the Model?

8.3 Example: Do Sleep and Age Together Add Beyond Physical Health?

8.4 Important Note on Type I SS and Chunk Tests

9. The Order of Variable Entry Matters (for Type I)

10. Summary of Hypothesis Testing Approaches

Part 2: In-Class Lab Activity (Pair-Based)

Instructions

Data for the Lab

Grading Rubric

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

Task 2: Type I vs. Type III Sums of Squares (15 points)

Task 3: Partial F-Tests for Individual Variables (20 points)

Round 2: Student B Drives, Student A Navigates

Task 4: T-Tests and the F-Test Equivalence (15 points)

Task 5: Chunk Test — Testing Groups of Variables (20 points)

Task 6: Synthesis and Interpretation (15 points)

4.2 Example: Testing Whether `sleep_hrs` Adds to the Model

4.3 Example: Testing Whether `exercise` Adds to the Model