Part 2: In-Class Lab Activity (Pair-Based)

EPI 553 — Tests of Hypotheses Lab Due: End of class

Instructions

This lab uses a Driver-Navigator pair programming approach. You will work with a partner, taking turns in two roles:

Driver: Types the code and writes answers in the .Rmd file
Navigator: Reviews the code, checks logic, suggests improvements, and helps interpret results

Round 1 (Tasks 1–3): Student A is the Driver, Student B is the Navigator. Round 2 (Tasks 4–6): Student B is the Driver, Student A is the Navigator.

Switch roles after completing Task 3. Both partners should understand all tasks — the Navigator should actively participate by checking the work, asking questions, and discussing interpretations.

Submission: Each pair submits one knitted HTML file with both partners’ names. Upload to Brightspace by end of class.

Data for the Lab

Use the same BRFSS 2020 dataset from the guided practice.

# Load packages and data
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(car)
library(gtsummary)

brfss <- readRDS("C:/Users/suruc/OneDrive/Desktop/R/EPI553_Rclass/brfss_mlr_2020.rds")

Grading Rubric

Task	Description	Points
Task 1	Fitting Models and ANOVA Tables	15
Task 2	Type I vs. Type III Sums of Squares	15
Task 3	Partial F-Tests for Individual Variables	20
Switch roles
Task 4	T-Tests and the F-Test Equivalence	15
Task 5	Chunk Test (Testing Groups of Variables)	20
Task 6	Synthesis and Interpretation	15
Total		100

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

1a. (5 pts) Fit the following model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \varepsilon\]

Use tidy() with conf.int = TRUE to display the coefficients. Report the fitted equation with rounded coefficients.

mlr_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
             data = brfss)

tidy(mlr_full, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1. Full Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1. Full Model Coefficients
Term	Estimate	Std. Error	t-statistic	95% CI Lower	95% CI Upper
(Intercept)	10.6684	0.6102	17.4844	9.4722	11.8646
physhlth_days	0.3182	0.0131	24.2179	0.2924	0.3439
sleep_hrs	-0.5063	0.0760	-6.6630	-0.6553	-0.3573
age	-0.0800	0.0059	-13.4532	-0.0916	-0.0683

Fitted Regression Equation (rounded)

\[\text{menthlth\_days} = \10.67 + \0.32 \cdot \text{physhlth\_days} - \0.51 \cdot \text{sleep\_hrs} - \0.08 \cdot \text{age} + \varepsilon\]

glance(mlr_full) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) %>%
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
  mutate(
    Value = round(Value, 4),
    Metric = dplyr::recode(Metric,
      "r.squared"     = "R²",
      "adj.r.squared" = "Adjusted R²",
      "sigma"         = "Root MSE (σ̂)",
      "statistic"     = "F-statistic (overall)",
      "p.value"       = "p-value (overall F-test)",
      "df"            = "Model df (p)",
      "df.residual"   = "Residual df (n − p − 1)",
      "nobs"          = "n (observations)"
    )
  ) %>%
  kable(caption = "Table 2. Overall Model Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2. Overall Model Summary
Metric	Value
R²	0.1413
Adjusted R²	0.1408
Root MSE (σ̂)	7.153

Interpretations: R² = 0.141 The predictors collectively explain about 14.1% of the variance in the outcome.

Adjusted R² = 0.141 Adjusting for the number of predictors barely changes the explained variance, suggesting the model is not overfitted.

Root MSE = 7.15 On average, model predictions deviate from observed values by about 7.15 units.

Overall F‑statistic = 273.98, p < 0.001 The model as a whole is statistically significant, meaning the predictors jointly improve prediction compared to an intercept‑only model.

Model degrees of freedom = 3 predictors in the model. Residual df = 4996 (Based on the sample size minus predictors minus one.) Sample size (n) = 5000

1b. (5 pts) Use anova() on this model to obtain the Type I (sequential) sums of squares. Present the ANOVA table and verify that the sum of all predictor Type I SS equals the model SSR. Show this calculation explicitly.

My Type I formula is:

\[SSR(\text{full}) = SS(physhlth_days) + SS(sleep_hrs \mid physhlth_days) + SS(age \mid physhlth_days, sleep_hrs)\]

2.2 Computing Type I SS in R

In R, the anova() function on a single model produces Type I (sequential) sums of squares:

anova_type1 <- anova(mlr_full)

anova_type1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 2. Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables are tested in the order they appear in the model formula.")

Table 2. Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
physhlth_days	1	29474.75	29474.75	576.00	0
sleep_hrs	1	3323.04	3323.04	64.94	0
age	1	9261.47	9261.47	180.99	0
Residuals	4996	255652.08	51.17	NA	NA
Note:
Variables are tested in the order they appear in the model formula.

1c. (5 pts) Use car::Anova() with type = "III" on the same model to obtain the Type III sums of squares. Compare the Type I and Type III SS for each variable. Which variable’s SS is the same in both tables? Why?

# Type III using car::Anova()
anova_type3 <- Anova(mlr_full, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_type1$`Sum Sq`[1:3], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[1:3], 2)
)

comparison %>%
  kable(caption = "Table 4. Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.")

Table 4. Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS
physhlth_days	29474.7	15643.37
sleep_hrs	3323.0	30012.26
age	9261.5	2271.81
Note:
Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.

Note: The Type I and Type III SS sums of Squares differ for each predictor. Each predictors SS in type I SS is computed in the order endtered into the model but in Type III SS, each of them is computed adjusting for all other predictors in the model, regardless of order.

Table: Type III SS in R

anova_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 5. Type III (Partial) Sums of Squares — car::Anova()",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.")

Table 5. Type III (Partial) Sums of Squares — car::Anova()
Source	Sum of Sq	df	F value	p-value
(Intercept)	15643.372	1	305.7057	0
physhlth_days	30012.259	1	586.5051	0
sleep_hrs	2271.809	1	44.3961	0
age	9261.475	1	180.9894	0
Residuals	255652.080	4996	NA	NA
Note:
Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.

Task 2: Type I vs. Type III Sums of Squares (15 points)

2a. (5 pts) Fit the same model from Task 1 but reverse the variable order:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{physhlth\_days} + \varepsilon\]

Use tidy() with conf.int = TRUE to display the coefficients. Report the fitted equation with rounded coefficients.

mlr_full_reverse <- lm(menthlth_days ~ age + sleep_hrs + physhlth_days,
             data = brfss)

tidy(mlr_full, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1. Full Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1. Full Model Coefficients
Term	Estimate	Std. Error	t-statistic	95% CI Lower	95% CI Upper
(Intercept)	10.6684	0.6102	17.4844	9.4722	11.8646
physhlth_days	0.3182	0.0131	24.2179	0.2924	0.3439
sleep_hrs	-0.5063	0.0760	-6.6630	-0.6553	-0.3573
age	-0.0800	0.0059	-13.4532	-0.0916	-0.0683

Fitted Regression Equation (rounded)

\[\text{menthlth\_days} = \10.67 - \0.08 \cdot \text{age} - \0.51 \cdot \text{sleep\_hrs} - \0.32 \cdot \text{physhlth\_days} + \varepsilon\]

glance(mlr_full_reverse) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) %>%
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") %>%
  mutate(
    Value = round(Value, 4),
    Metric = dplyr::recode(Metric,
      "r.squared"     = "R²",
      "adj.r.squared" = "Adjusted R²",
      "sigma"         = "Root MSE (σ̂)",
      "statistic"     = "F-statistic (overall)",
      "p.value"       = "p-value (overall F-test)",
      "df"            = "Model df (p)",
      "df.residual"   = "Residual df (n − p − 1)",
      "nobs"          = "n (observations)"
    )
  ) %>%
  kable(caption = "Table 2. Overall Model Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2. Overall Model Summary
Metric	Value
R²	0.1413
Adjusted R²	0.1408
Root MSE (σ̂)	7.153

Interpretations: R² = 0.141 The predictors collectively explain about 14.1% of the variance in the outcome.

Adjusted R² = 0.141 Adjusting for the number of predictors barely changes the explained variance, suggesting the model is not overfitted.

Root MSE = 7.15 On average, model predictions deviate from observed values by about 7.15 units.

Overall F‑statistic = 273.98, p < 0.001 The model as a whole is statistically significant, meaning the predictors jointly improve prediction compared to an intercept‑only model.

Model degrees of freedom = 3 predictors in the model. Residual df = 4996 (Based on the sample size minus predictors minus one.) Sample size (n) = 5000

Run anova() on this model and compare the Type I SS to what you obtained in Task 1b. Which values changed and which stayed the same?*

My Type I formula is:

\[SSR(\text{full}) = SS(physhlth_days) + SS(sleep_hrs \mid physhlth_days) + SS(age \mid physhlth_days, sleep_hrs)\]

2.2 Computing Type I SS in R

In R, the anova() function on a single model produces Type I (sequential) sums of squares:

anova_type1 <- anova(mlr_full_reverse)

anova_type1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 2. Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Variables are tested in the order they appear in the model formula.")

Table 2. Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
age	1	7249.10	7249.10	141.66	0
sleep_hrs	1	4797.91	4797.91	93.76	0
physhlth_days	1	30012.26	30012.26	586.51	0
Residuals	4996	255652.08	51.17	NA	NA
Note:
Variables are tested in the order they appear in the model formula.

2b. (5 pts) Run car::Anova(type = "III") on this reordered model. Did the Type III SS change compared to Task 1c? Explain why or why not.

# Type III using car::Anova()
anova_type3 <- Anova(mlr_full_reverse, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_type1$`Sum Sq`[1:3], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[1:3], 2)
)

comparison %>%
  kable(caption = "Table 4. Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.")

Table 4. Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS
physhlth_days	7249.1	15643.37
sleep_hrs	4797.9	9261.47
age	30012.3	2271.81
Note:
Type I SS depends on variable entry order; Type III SS does not. The last variable (exercise) has identical values.

Note: The Type I SS did change when order was reversed but Type III SS sums of Squares did not. Because each predictors SS in type I SS is computed in the order entered into the model but in Type III SS, each of them is computed adjusting for all other predictors in the model, regardless of order.

Table: Type III SS in R

anova_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 5. Type III (Partial) Sums of Squares — car::Anova()",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.")

Table 5. Type III (Partial) Sums of Squares — car::Anova()
Source	Sum of Sq	df	F value	p-value
(Intercept)	15643.372	1	305.7057	0
age	9261.475	1	180.9894	0
sleep_hrs	2271.809	1	44.3961	0
physhlth_days	30012.259	1	586.5051	0
Residuals	255652.080	4996	NA	NA
Note:
Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.

2c. (5 pts) In 2–3 sentences, explain when an epidemiologist should prefer Type III SS over Type I SS. Give a concrete example from public health research where the choice matters.

An epidemiologist should prefer Type III SS when predictors are correlated and the research question is about each variable’s contribution after adjusting for all others. This is especially important in observational public‑health research where covariates are correlated, rather than independent. For example, in a study examining whether sleep duration predicts depressive symptoms while also adjusting for age and chronic disease burden, Type III SS ensures the effect of sleep is evaluated independent of those correlated health factors, giving a cleaner estimate of its association.

Task 3: Partial F-Tests for Individual Variables (20 points)

3a. (10 pts) Conduct a partial F-test to determine whether age adds significantly to the prediction of mental health days, given that physhlth_days and sleep_hrs are already in the model. Do this by:

We want to test whether a single variable $X^*$ age contributes to the prediction of $Y$ mental health days given that $X_1, \ldots, X_p$ are already in the model.

\[H_0: \beta^* = 0\quad\text{($X^*$ does not add to the model given the other variables)}\]

\[H_1: \beta^* \neq 0\]

The partial F-test compares a full model (containing $X^*$) to a reduced model (excluding $X^*$):

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})} = \frac{SS(X^* \mid X_1, \ldots, X_p) / 1}{MSE(\text{full})}\]

Under $H_0$, $F \sim F_{1, \, n-p-2}$.

Fitting a reduced model (without age)
Fitting the full model (with age)

Let’s test: does age contribute to predicting mental health days, given that all other predictors are in the model?

# Reduced model (excludes age)
m_no_age <- lm(menthlth_days ~ physhlth_days + income_cat + sex + exercise + sleep_hrs,
                 data = brfss)

# Full model (includes sleep_hrs)
m_full <- lm(menthlth_days ~ physhlth_days + age + income_cat + sex + exercise + sleep_hrs,
             data = brfss)

# Compare using anova() — this performs the partial F-test
anova(m_no_age, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no age)", "Full (with age)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 6. Partial F-Test: Does age add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 6. Partial F-Test: Does age add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (no age)	4994	260666.7	NA	NA	NA	NA
Full (with age)	4993	250992.8	1	9673.939	192.4437	0

Using anova(reduced, full) to compare them

State the null hypothesis, report the F-statistic and p-value, and state your conclusion at $\alpha = 0.05$.

Null hypothesis: At the population level, age does not improve the model fit beyond physical health days and sleep hours (i.e., the coefficient for age is 0 given the other predictors).

Test statistic and p-value: F=192.44, p<0.001.

Because p<0.05, we reject the null hypothesis and conclude that age significantly improves the model and should be retained as a predictor.

3b. (10 pts) Now verify your result from 3a manually. Using the anova() output from the full model (Task 1b), identify $SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})$ from the Type I table. Compute the F-statistic as:

\[F = \frac{SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs}) / 1}{MSE(\text{full model})}\]

Compare to the critical value $F_{1, n-p-1, 0.95}$. Does your manual calculation agree with the anova() comparison from 3a?

From table I table.. of task 1b, we have $SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})$ =9261.47 and MSE from the full model (same ANOVA table): 51.17

Manual F‑statistic:

\[F = \frac{SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs}) / 1}{MSE(\text{full model})}\]

F= 180.99

Since F ~ 180.99, we reject H_0; age adds significantly to the model, which agrees with the anova() comparison (both tests conclude that age should be retained).

Round 2: Student B Drives, Student A Navigates

⟳ Switch roles now!

Task 4: T-Tests and the F-Test Equivalence (15 points)

We can also conduct t-tests for individual regression coefficients. The test for $\beta_j$ is:

\[H_0: \beta_j = 0 \qquad H_1: \beta_j \neq 0\]

\[t = \frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \sim t_{n-p-2} \quad \text{under } H_0\]

We reject $H_0$ if $|t| > t_{n-p-2, 1-\alpha/2}$.

4a. (5 pts) Using the full model from Task 1 (menthlth_days ~ physhlth_days + sleep_hrs + age), run summary() and extract the t-statistics and p-values for each coefficient.

# Get t-statistics from summary()
t_stats <- tidy(m_full) %>%
  filter(term != "(Intercept)") %>%
  select(term, t_stat = statistic, t_pvalue = p.value)

# Get F-statistics from Type III
f_stats <- Anova(m_full, type = "III") %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, f_stat = `F value`, f_pvalue = `Pr(>F)`)

# Compare
left_join(t_stats, f_stats, by = "term") %>%
  mutate(
    `t²` = round(t_stat^2, 4),
    f_stat = round(f_stat, 4),
    `t² = F?` = ifelse(abs(t_stat^2 - f_stat) < 0.001, "✓", "✗"),
    t_pvalue = round(t_pvalue, 6),
    f_pvalue = round(f_pvalue, 6),
    `p-values equal?` = ifelse(abs(t_pvalue - f_pvalue) < 0.0001, "✓", "✗")
  ) %>%
  select(term, t_stat = t_stat, `t²`, `F (Type III)` = f_stat, `t² = F?`,
         `p (t-test)` = t_pvalue, `p (F-test)` = f_pvalue, `p-values equal?`) %>%
  mutate(t_stat = round(t_stat, 4)) %>%
  kable(caption = "Table 9. Equivalence of T-Tests and Type III Partial F-Tests") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 9. Equivalence of T-Tests and Type III Partial F-Tests
term	t_stat	t²	F (Type III)	t² = F?	p (t-test)	p (F-test)	p-values equal?
physhlth_days	21.4779	461.3012	461.3012	✓	0.000000	0	✓
age	-13.8724	192.4437	192.4437	✓	0.000000	0	✓
income_cat	-6.1778	38.1653	38.1653	✓	0.000000	0	✓
sexFemale	6.1535	37.8654	NA	NA	0.000000	NA	NA
exerciseYes	-1.3537	1.8326	NA	NA	0.175879	NA	NA
sleep_hrs	-6.7574	45.6629	45.6629	✓	0.000000	0	✓

4b. (5 pts) For each predictor, compute $t^2$ and compare it to the Type III F-statistic from Task 1c. Create a table showing the t-statistic, $t^2$, the Type III F-statistic, and both p-values. Are they equivalent?

The table shows that t^2 and t2 exactly equals the Type III F-statistic for each predictor except for sex and exercise since they were not included in task 1c. This confirms the mathematical relationship between the t-test and F-test for testing a single coefficient.

4c. (5 pts) In your own words, explain why the t-test and the Type III partial F-test give the same result. What is the fundamental relationship between the t-distribution and the F-distribution that makes this true?

The t-test and the Type III partial F-test give the same result because for a single coefficient, they are really doing the same comparison: they both test whether that coefficient is zero using the same numerator (the estimated effect) and the same denominator (the model’s MSE), just shown differently. Mathematically, when you test one parameter, the F-distribution with 1 numerator degree of freedom is exactly the square of a t-distribution with the same residual degrees of freedom. Because of this relationship, t test and the Type III partial f-test, and their p-values are identical.

Task 5: Chunk Test — Testing Groups of Variables (20 points)

Sometimes we want to test whether a group of variables collectively adds to the model. This is called a chunk test or multiple partial F-test.

\[H_0: \beta_{k+1} = \beta_{k+2} = \cdots = \beta_p = 0 \quad \text{(the group adds nothing)}\]

The test statistic compares a full model to a reduced model:

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

\[F \sim F_{(p_{\text{full}} - p_{\text{reduced}}), \, (n - p_{\text{full}} - 1)} \quad \text{under } H_0\]

5a. (10 pts) Now consider the full 6-predictor model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \beta_4 \cdot \text{income\_cat} + \beta_5 \cdot \text{sex} + \beta_6 \cdot \text{exercise} + \varepsilon\]

Test whether income_cat, sex, and exercise — as a group — significantly add to the prediction of mental health days, given that physhlth_days, sleep_hrs, and age are already in the model.

State the null hypothesis (in both words and mathematical notation), conduct the test, and state your conclusion.

So, we want to test whether adding income_cat, sex, and exercise as a group improves prediction of mental health days, beyond age, physhlth_days and sleep_hrs alone.

\[H_0: \beta_{\text{income}} = \beta_{\text{sex}} = \beta_{\text{exercise}} = 0\] \[H_1 : {\text{At least one of} βincome_cat, βsex, or βexercise = 0\]

# Reduced model: only health behaviors
m_health_only <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, data = brfss)

# Full model: health behaviors + demographics
m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
             data = brfss)
             
# Chunk test
anova(m_health_only, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep+ age only)", "Full (+ demographics)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 10. Chunk Test: Do demographic variables collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 10. Chunk Test: Do demographic variables collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth + sleep+ age only)	4996	255652.1	NA	NA	NA	NA
Full (+ demographics)	4993	250992.8	3	4659.276	30.8957	0

Test statistic From the chunk test table: - Reduced model RSS: 255,652.1 - Full model RSS: 250,992.8 - Difference (Sum of Sq): 4,659.276 - df: 3 - F‑statistic: 30.8957 - p‑value: < 0.001

Because the F‑statistic is large (F = 30.90) and the p‑value is effectively zero (p < 0.001), we reject the null hypothesis. Income category, sex, and exercise collectively provide significant additional explanatory power for mentally unhealthy days beyond what is explained by physical health days, sleep hours, and age.

5b. (5 pts) Compute the chunk test F-statistic manually using:

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

Show all intermediate values. Does your manual computation match the anova() result?

Manual computation:

ssr_full    <- sum(anova(m_full)$`Sum Sq`[1:6])
ssr_reduced <- sum(anova(m_health_only)$`Sum Sq`[1:2])
mse_full    <- anova(m_full)$`Mean Sq`[7]
df_diff     <- 4  # 4 additional variables

F_chunk <- ((ssr_full - ssr_reduced) / df_diff) / mse_full

cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 46718.54

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 32797.79

cat("Difference:", round(ssr_full - ssr_reduced, 2), "\n")

## Difference: 13920.75

cat("df (number of added variables):", df_diff, "\n")

## df (number of added variables): 4

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 50.27

cat("F-statistic:", round(F_chunk, 4), "\n")

## F-statistic: 69.2314

cat("Critical value F(4, 4993, 0.95):", round(qf(0.95, df_diff, 4993), 4), "\n")

## Critical value F(4, 4993, 0.95): 2.3737

cat("p-value:", format.pval(pf(F_chunk, df_diff, 4993, lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

From manual calculation: - SSR(full) = 46718.54 - SSR(reduced) = 32797.79 - Difference = 46718.54-32797.79=13920.75 - df (added variables) = 4 - MSE(full) = 50.27 - F-statistic:69.23 - p-value <2.22e-16

The mannual results also rejects HO. But the numerical values don’t match because we are not exactly using the same pair of models. In non-manual, we are using reduced: physhlth_days + sleep_hrs + age and in full: + demographics (3 df), its a different chunk test.

5c. (5 pts) Note that exercise was not individually significant in the Type III table, yet it is part of a group that is collectively significant. In 2–3 sentences, explain how this is possible and what it means for model building in epidemiology.

Even if exercise is not individually significant in a Type III test, the group (income, sex, exercise) can still be significant because each variable may contribute a small, non‑significant effect on its own, but together they explain a meaningful amount of additional variation.For model building in epidemiology, this means you shouldn’t drop a variable solely because its individual p‑value is >0.05 if it belongs to a conceptually important predictor.

Task 6: Synthesis and Interpretation (15 points)

6a. (5 pts) Based on the full model, which predictors are statistically significant at $\alpha = 0.05$? List them and briefly state the direction of each association (positive or negative).

Based on the full model at $\alpha = 0.05$, the statistically significant predictors are:

physhlth_days: It shows significant positive association (p <0.001) and it means more physically unhealthy days are linked to more mentally unhealthy days.
sleep_hrs: It shows significant negative association (p<0.001) and it means more hours of sleep are linked to fewer mentally unhealthy days.
age: It shows significant negative association (p <0.001) and it means older age is associated with fewer mentally unhealthy days (compared to younger adults).
income_cat: It shows significant negative association (p<0.001) and it means higher income categories are associated with fewer mentally unhealthy days (vs. lower income).
sex (Female vs. Male): It shows significant positive association (p <0.001) and it means females report more mentally unhealthy days than males. However, exercise shows insignificant negative association and it means those who exercise tend to have fewer mentally unhealthy days.

6b. (5 pts) A colleague argues: “We should drop exercise from the model because it’s not significant.” Do you agree? Write a 2–3 sentence response explaining your reasoning. Consider the chunk test results and epidemiologic rationale.

We should not automatically drop exercise just because it is not individually significant. The chunk test showed that the demographic block (including exercise) collectively improves model fit, and exercise is important as a health behavior that may confound or modify other associations. In epidemiologic modeling, we should keep conceptually important variables, especially behaviors like exercise, even when their individual p-value is >0.05, to preserve a more realistic representation of the underlying causal structure.

6c. (5 pts) Write a 3–4 sentence summary of the hypothesis testing results for a non-statistical audience (e.g., a public health program manager). Your summary should convey which factors were identified as independently associated with mental health days and which were not, without using jargon like “p-value,” “F-test,” or “sums of squares.”

In this analysis, people who reported more days of poor physical health also tended to report more days of poor mental health. Fewer hours of sleep, being younger, having lower income, and being female were each independently associated to having more mentally unhealthy days. Regular exercise showed a small tendency toward fewer mentally unhealthy days, but this pattern was not strong enough to be clearly distinguished from random variation in this sample. Overall, the results highlight physical health, sleep, age, income, and sex as key factors related to mental health burden in this population.

Task 7: Reflection (Not Graded, strongly recommended)

In 2–3 sentences, reflect on the process of working with a partner in this lab. What did you find helpful about the Driver-Navigator approach? How did it affect your understanding of the material?

Working with a partner in the Driver–Navigator setup could have been helpful for talking through code and interpretations out loud, but in practice the limited time and technical issues made it hard to get into a real flow. I was paired with someone I didn’t know, and we spent a lot of time just getting oriented and troubleshooting R in his laptop, so we didn’t even finish the early tasks in class. Even so, the structure made me notice where I felt confident versus where I hesitated and if I had more time, I think it would have deepened my understanding a lot more in class.

End of Lab Activity

Tests of Hypotheses in Multiple Linear Regression

EPI 553 — Principles of Statistical Inference II (Spring 2026)

Suruchi Shahi

March 14, 2026

Part 2: In-Class Lab Activity (Pair-Based)

Instructions

Data for the Lab

Grading Rubric

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

2.2 Computing Type I SS in R

Table: Type III SS in R

Task 2: Type I vs. Type III Sums of Squares (15 points)

2.2 Computing Type I SS in R

Table: Type III SS in R

Task 3: Partial F-Tests for Individual Variables (20 points)

Round 2: Student B Drives, Student A Navigates

Task 4: T-Tests and the F-Test Equivalence (15 points)

Task 5: Chunk Test — Testing Groups of Variables (20 points)

Task 6: Synthesis and Interpretation (15 points)

Task 7: Reflection (Not Graded, strongly recommended)