Tests of Hypotheses in Multiple Linear Regression

Instructions

This lab uses a Driver-Navigator pair programming approach. You will work with a partner, taking turns in two roles:

Driver: Types the code and writes answers in the .Rmd file
Navigator: Reviews the code, checks logic, suggests improvements, and helps interpret results

Round 1 (Tasks 1–3): Student A is the Driver, Student B is the Navigator. Round 2 (Tasks 4–6): Student B is the Driver, Student A is the Navigator.

Switch roles after completing Task 3. Both partners should understand all tasks — the Navigator should actively participate by checking the work, asking questions, and discussing interpretations.

Submission: Each pair submits one knitted HTML file with both partners’ names. Upload to Brightspace by end of class.

Data for the Lab

Use the same BRFSS 2020 dataset from the guided practice.

# Load packages and data
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(car)
library(gtsummary)

brfss_mlr <- readRDS("data/brfss_mlr_2020.rds")

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

1a. (5 pts) Fit the following model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \varepsilon\]

Use tidy() with conf.int = TRUE to display the coefficients. Report the fitted equation with rounded coefficients.

m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
             data = brfss_mlr)

tidy(m_full, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1. Full Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1. Full Model Coefficients
Term	Estimate	Std. Error	t-statistic	95% CI Lower	95% CI Upper
(Intercept)	10.6684	0.6102	17.4844	9.4722	11.8646
physhlth_days	0.3182	0.0131	24.2179	0.2924	0.3439
sleep_hrs	-0.5063	0.0760	-6.6630	-0.6553	-0.3573
age	-0.0800	0.0059	-13.4532	-0.0916	-0.0683

\[\text{menthlth_days} = \beta_0 + 0.32 \cdot \text{physhlth_days} + -0.51 \cdot \text{sleep_hrs} + -0.08 \cdot \text{age} + \varepsilon\] 1b. (5 pts) Use anova() on this model to obtain the Type I (sequential) sums of squares. Present the ANOVA table and verify that the sum of all predictor Type I SS equals the model SSR. Show this calculation explicitly.

anova_type1 <- anova(m_full)

anova_type1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 3. Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables are tested in the order they appear in the model formula.")

Table 3. Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
physhlth_days	1	29474.75	29474.75	576.00	0
sleep_hrs	1	3323.04	3323.04	64.94	0
age	1	9261.47	9261.47	180.99	0
Residuals	4996	255652.08	51.17	NA	NA
Note:
Variables are tested in the order they appear in the model formula.

type1_ss <- anova_type1$`Sum Sq`
ssr_full <- sum(type1_ss[1:3])  # Model SS (sum of all predictor Type I SS)
sse_full <- type1_ss[4]          # Residual SS

cat("SSR (Model):", round(ssr_full, 2), "\n")

## SSR (Model): 42059.26

cat("SSE (Residual):", round(sse_full, 2), "\n")

## SSE (Residual): 255652.1

cat("SSY (Total):", round(ssr_full + sse_full, 2), "\n")

## SSY (Total): 297711.3

cat("R² = SSR/SSY =", round(ssr_full / (ssr_full + sse_full), 4), "\n")

## R² = SSR/SSY = 0.1413

1c. (5 pts) Use car::Anova() with type = "III" on the same model to obtain the Type III sums of squares. Compare the Type I and Type III SS for each variable. Which variable’s SS is the same in both tables? Why?

anova_type3 <- Anova(m_full, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_type1$`Sum Sq`[1:3], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[2:4], 1)
)

comparison %>%
  kable(caption = "Table 4. Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.")

Table 4. Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS
physhlth_days	29474.7	30012.3
sleep_hrs	3323.0	2271.8
age	9261.5	9261.5
Note:
Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.

The age variable has the same SS for both type I and type III ANOVA tables because age is the final covariate in the model, meaning that order does not have an impact.

Task 2: Type I vs. Type III Sums of Squares (15 points)

2a. (5 pts) Fit the same model from Task 1 but reverse the variable order:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{physhlth\_days} + \varepsilon\]

Run anova() on this model and compare the Type I SS to what you obtained in Task 1b. Which values changed and which stayed the same?

m_full_reverse <- lm(menthlth_days ~ age + sleep_hrs + physhlth_days,
             data = brfss_mlr)

anova_type1_reverse <- anova(m_full_reverse)

anova_type1_reverse %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 5. Reverse Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables are tested in the order they appear in the model formula.")

Table 5. Reverse Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
age	1	7249.10	7249.10	141.66	0
sleep_hrs	1	4797.91	4797.91	93.76	0
physhlth_days	1	30012.26	30012.26	586.51	0
Residuals	4996	255652.08	51.17	NA	NA
Note:
Variables are tested in the order they appear in the model formula.

type1_ss <- anova_type1_reverse$`Sum Sq`
ssr_full <- sum(type1_ss[1:3])  # Model SS (sum of all predictor Type I SS)
sse_full <- type1_ss[4]          # Residual SS

cat("SSR (Model):", round(ssr_full, 2), "\n")

## SSR (Model): 42059.26

cat("SSE (Residual):", round(sse_full, 2), "\n")

## SSE (Residual): 255652.1

cat("SSY (Total):", round(ssr_full + sse_full, 2), "\n")

## SSY (Total): 297711.3

cat("R² = SSR/SSY =", round(ssr_full / (ssr_full + sse_full), 4), "\n")

## R² = SSR/SSY = 0.1413

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_type1$`Sum Sq`[1:3], 1),
  `Type I SS Reverse` = round(anova_type1_reverse$`Sum Sq`[3:1], 1)
)

comparison %>%
  kable(caption = "Table 6. Type I vs. Type I Reverse Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 6. Type I vs. Type I Reverse Sums of Squares
Variable	Type I SS	Type I SS Reverse
physhlth_days	29474.7	30012.3
sleep_hrs	3323.0	4797.9
age	9261.5	7249.1

Since the order of covariates changed, all three variables have different SS between the two models. The residual SS did not change between models nor did the model SS, therefore, the R2 remains the same.

2b. (5 pts) Run car::Anova(type = "III") on this reordered model. Did the Type III SS change compared to Task 1c? Explain why or why not.

anova_type3_reverse <- Anova(m_full_reverse, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("age", "sleep_hrs", "physhlth_days"),
  `Type I SS` = round(anova_type3_reverse$`Sum Sq`[2:4], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[4:2], 1)
)

comparison %>%
  kable(caption = "Table 7. Type III vs. Type III Reverse Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 7. Type III vs. Type III Reverse Sums of Squares
Variable	Type I SS	Type III SS
age	9261.5	9261.5
sleep_hrs	2271.8	2271.8
physhlth_days	30012.3	30012.3

The type III did not change for any variables because order does not matter.

2c. (5 pts) In 2–3 sentences, explain when an epidemiologist should prefer Type III SS over Type I SS. Give a concrete example from public health research where the choice matters.

Type III SS is preferable for epidemiologists because in public health we often do not know the order of health factors. While controlled studies or longitudinal studies can determine the timeline of events, epidemiologists conducting cross-sectional studies do not know the direction of association between covariates.

Task 3: Partial F-Tests for Individual Variables (20 points)

3a. (10 pts) Conduct a partial F-test to determine whether age adds significantly to the prediction of mental health days, given that physhlth_days and sleep_hrs are already in the model. Do this by:

Fitting a reduced model (without age)
Fitting the full model (with age)
Using anova(reduced, full) to compare them

State the null hypothesis, report the F-statistic and p-value, and state your conclusion at \(\alpha = 0.05\).

m_no_age <- lm(menthlth_days ~ physhlth_days + sleep_hrs,
             data = brfss_mlr)

anova(m_no_age, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no age)", "Full (with age)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 6. Partial F-Test: Does sleep_hrs add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 6. Partial F-Test: Does sleep_hrs add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (no age)	4997	264913.6	NA	NA	NA	NA
Full (with age)	4996	255652.1	1	9261.475	180.9894	0

The null hypothesis is that age does not add statistically significant information to the prediction of mental health days, given that physical health and sleep are already in the model. The F statistic is 180.99 and the p-value < 0.05 so we reject 𝐻0 and conclude that age adds statistically significant information to the prediction of mental health days given that physical health and sleep in the mode.

3b. (10 pts) Now verify your result from 3a manually. Using the anova() output from the full model (Task 1b), identify \(SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})\) from the Type I table. Compute the F-statistic as:

\[F = \frac{SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs}) / 1}{MSE(\text{full model})}\]

Compare to the critical value \(F_{1, n-p-1, 0.95}\). Does your manual calculation agree with the anova() comparison from 3a?

ssr_full    <- sum(anova(m_full)$`Sum Sq`[1:3])
ssr_reduced <- sum(anova(m_no_age)$`Sum Sq`[1:2])
mse_full    <- anova(m_full)$`Mean Sq`[4]  # MSE of full model

F_stat <- (ssr_full - ssr_reduced) / 1 / mse_full

cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 42059.26

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 32797.79

cat("SS(sleep | others):", round(ssr_full - ssr_reduced, 2), "\n")

## SS(sleep | others): 9261.47

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 51.17

cat("F-statistic:", round(F_stat, 4), "\n")

## F-statistic: 180.9894

cat("Critical value F(1, 4996, 0.95):", round(qf(0.95, 1, 4996), 4), "\n")

## Critical value F(1, 4996, 0.95): 3.8433

cat("p-value:", format.pval(pf(F_stat, 1, 4996, lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

Yes, my manual calculation shows the same F-statistic as the anova comparison and both have significant p-values.

Round 2: Student B Drives, Student A Navigates

⟳ Switch roles now!

Task 4: T-Tests and the F-Test Equivalence (15 points)

4a. (5 pts) Using the full model from Task 1 (menthlth_days ~ physhlth_days + sleep_hrs + age), run summary() and extract the t-statistics and p-values for each coefficient.

t_stats <- tidy(m_full) %>%
  filter(term != "(Intercept)") %>%
  select(term, t_stat = statistic, t_pvalue = p.value)

4b. (5 pts) For each predictor, compute \(t^2\) and compare it to the Type III F-statistic from Task 1c. Create a table showing the t-statistic, \(t^2\), the Type III F-statistic, and both p-values. Are they equivalent?

# Get F-statistics from Type III
f_stats <- Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, f_stat = `F value`, f_pvalue = `Pr(>F)`)

# Compare
left_join(t_stats, f_stats, by = "term") %>%
  mutate(
    `t²` = round(t_stat^2, 4),
    f_stat = round(f_stat, 4),
    `t² = F?` = ifelse(abs(t_stat^2 - f_stat) < 0.001, "✓", "✗"),
    t_pvalue = round(t_pvalue, 6),
    f_pvalue = round(f_pvalue, 6),
    `p-values equal?` = ifelse(abs(t_pvalue - f_pvalue) < 0.0001, "✓", "✗")
  ) %>%
  select(term, t_stat = t_stat, `t²`, `F (Type III)` = f_stat, `t² = F?`,
         `p (t-test)` = t_pvalue, `p (F-test)` = f_pvalue, `p-values equal?`) %>%
  mutate(t_stat = round(t_stat, 4)) %>%
  kable(caption = "Table 9. Equivalence of T-Tests and Type III Partial F-Tests") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 9. Equivalence of T-Tests and Type III Partial F-Tests
term	t_stat	t²	F (Type III)	t² = F?	p-values equal?
physhlth_days	24.2179	586.5051	586.5051	✓	✓
sleep_hrs	-6.6630	44.3961	44.3961	✓	✓
age	-13.4532	180.9894	180.9894	✓	✓

4c. (5 pts) In your own words, explain why the t-test and the Type III partial F-test give the same result. What is the fundamental relationship between the t-distribution and the F-distribution that makes this true? The t-test equals the type III F-test because both are testing the same hypothesis. —

Task 5: Chunk Test — Testing Groups of Variables (20 points)

5a. (10 pts) Now consider the full 6-predictor model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \beta_4 \cdot \text{income\_cat} + \beta_5 \cdot \text{sex} + \beta_6 \cdot \text{exercise} + \varepsilon\]

Test whether income_cat, sex, and exercise — as a group — significantly add to the prediction of mental health days, given that physhlth_days, sleep_hrs, and age are already in the model.

State the null hypothesis (in both words and mathematical notation), conduct the test, and state your conclusion.

m_health_only <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, data = brfss_mlr)

m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
             data = brfss_mlr)

anova(m_health_only, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep and age)", "Full (+ demographics)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 10. Chunk Test: Do demographic variables collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 10. Chunk Test: Do demographic variables collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth + sleep and age)	4996	255652.1	NA	NA	NA	NA
Full (+ demographics)	4993	250992.8	3	4659.276	30.8957	0

The null hypothesis is that income, sex, and exercise do not collectively add statistically significant information to the model, given that physical health, sleep, and age are already in the model (𝐻0:𝛽income=𝛽sex=𝛽exercise=0). The group of demographic variables (income, sex, exercise) collectively adds statistically significant information to the model.

5b. (5 pts) Compute the chunk test F-statistic manually using:

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

Show all intermediate values. Does your manual computation match the anova() result?

ssr_full    <- sum(anova(m_full)$`Sum Sq`[1:6])
ssr_reduced <- sum(anova(m_health_only)$`Sum Sq`[1:3])
mse_full    <- anova(m_full)$`Mean Sq`[7]
df_diff     <- 3

F_chunk <- ((ssr_full - ssr_reduced) / df_diff) / mse_full

cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 46718.54

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 42059.26

cat("Difference:", round(ssr_full - ssr_reduced, 2), "\n")

## Difference: 4659.28

cat("df (number of added variables):", df_diff, "\n")

## df (number of added variables): 3

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 50.27

cat("F-statistic:", round(F_chunk, 4), "\n")

## F-statistic: 30.8957

cat("Critical value F(4, 4993, 0.95):", round(qf(0.95, df_diff, 4993), 4), "\n")

## Critical value F(4, 4993, 0.95): 2.6067

cat("p-value:", format.pval(pf(F_chunk, df_diff, 4993, lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

Yes, the results are the same for the manual computation, as evidence by identical f-statistics and SS.

5c. (5 pts) Note that exercise was not individually significant in the Type III table, yet it is part of a group that is collectively significant. In 2–3 sentences, explain how this is possible and what it means for model building in epidemiology. Even though exercise wasn’t individually significant, this does not mean that it does not have an impact when combined with a group of variables. The model shows that this chunk is still important in describing mental health. —

Task 6: Synthesis and Interpretation (15 points)

6a. (5 pts) Based on the full model, which predictors are statistically significant at \(\alpha = 0.05\)? List them and briefly state the direction of each association (positive or negative). Physical health: p<0.05 positive Sleep: p<0.05 negative Age: p<0.05 negative Income: p<0.05 negative Sex: p<0.05 positive

6b. (5 pts) A colleague argues: “We should drop exercise from the model because it’s not significant.” Do you agree? Write a 2–3 sentence response explaining your reasoning. Consider the chunk test results and epidemiologic rationale. My decision to drop exercise from my model would depend on the literature and study aims. If exercise is generally used in models studying mental health and my plan was to include exercise, I would argue that exercise should be kept so as to avoid potential confounding. It is unwise to change research plans solely based on p-value changes.

6c. (5 pts) Write a 3–4 sentence summary of the hypothesis testing results for a non-statistical audience (e.g., a public health program manager). Your summary should convey which factors were identified as independently associated with mental health days and which were not, without using jargon like “p-value,” “F-test,” or “sums of squares.” The test showed that many variables are related to mental health wellness. The model showed that physical health is most associated with poor mental health days, but also sex, sleep, age, and income were also associated. This is important to understand which groups may be at higher risk for developing mental health disorders, and to show which health behaviors have protective functions for mental health.

Task 7: Reflection (Not Graded, strongly recommended)

In 2–3 sentences, reflect on the process of working with a partner in this lab. What did you find helpful about the Driver-Navigator approach? How did it affect your understanding of the material? I enjoyed working with a partner, although we didn’t have enough time to complete the lab together. It was nice being able to speak to other classmates. —

End of Lab Activity