Part 2: In-Class Lab Activity (Pair-Based)

EPI 553: Tests of Hypotheses Lab

Due: End of class

Instructions

This lab uses a Driver-Navigator pair programming approach. You will work with a partner, taking turns in two roles:

Driver: Types the code and writes answers in the .Rmd file
Navigator: Reviews the code, checks logic, suggests improvements, and helps interpret results

Round 1 (Tasks 1–3): Student A is the Driver, Student B is the Navigator. Round 2 (Tasks 4–6): Student B is the Driver, Student A is the Navigator.

Switch roles after completing Task 3. Both partners should understand all tasks — the Navigator should actively participate by checking the work, asking questions, and discussing interpretations.

Submission: Each pair submits one knitted HTML file with both partners’ names. Upload to Brightspace by end of class.

Data for the Lab

Use the same BRFSS 2020 dataset from the guided practice.

Grading Rubric

Task	Description	Points
Task 1	Fitting Models and ANOVA Tables	15
Task 2	Type I vs. Type III Sums of Squares	15
Task 3	Partial F-Tests for Individual Variables	20
Switch roles
Task 4	T-Tests and the F-Test Equivalence	15
Task 5	Chunk Test (Testing Groups of Variables)	20
Task 6	Synthesis and Interpretation	15
Total		100

Round 1: Student A Drives, Student B Navigates

Task 1: Fitting Models and ANOVA Tables (15 points)

1a. (5 pts) Fit the following model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \varepsilon\]

Use tidy() with conf.int = TRUE to display the coefficients. Report the fitted equation with rounded coefficients.

model1 <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, data = brfss_mlr) 

tidy(model1, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Table 1. Model Coefficients",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 1. Model Coefficients
Term	Estimate	Std. Error	t-statistic	95% CI Lower	95% CI Upper
(Intercept)	10.6684	0.6102	17.4844	9.4722	11.8646
physhlth_days	0.3182	0.0131	24.2179	0.2924	0.3439
sleep_hrs	-0.5063	0.0760	-6.6630	-0.6553	-0.3573
age	-0.0800	0.0059	-13.4532	-0.0916	-0.0683
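Rather than typing the rounded coefficients by hand, the fitted equation can be assembled directly from `coef()`. The sketch below is a minimal, assumed approach: `fitted_equation()` is a hypothetical helper (not part of any package), and the built-in `mtcars` data stand in for `brfss_mlr`.

```r
# Hypothetical helper: format a fitted lm() equation with rounded coefficients.
fitted_equation <- function(fit, digits = 4) {
  b        <- coef(fit)
  response <- all.vars(formula(fit))[1]   # left-hand-side variable name
  slopes   <- b[-1]                       # drop the intercept
  # Each slope becomes "+ 1.2345 * x" or "- 1.2345 * x"
  slope_terms <- sprintf("%s %s * %s",
                         ifelse(slopes < 0, "-", "+"),
                         formatC(abs(slopes), format = "f", digits = digits),
                         names(slopes))
  paste(paste0(response, "_hat ="),
        formatC(b[1], format = "f", digits = digits),
        paste(slope_terms, collapse = " "))
}

# Usage on a stand-in model; with the lab data you would pass model1 instead.
fit <- lm(mpg ~ wt + hp, data = mtcars)
eq  <- fitted_equation(fit)
cat(eq, "\n")
```

Writing signs explicitly avoids the `+ -0.5063` pattern that appears when coefficients are pasted in with their own minus signs.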

\[\widehat{\text{menthlth\_days}} = 10.6684 + 0.3182 \cdot \text{physhlth\_days} - 0.5063 \cdot \text{sleep\_hrs} - 0.0800 \cdot \text{age}\]

1b. (5 pts) Use anova() on this model to obtain the Type I (sequential) sums of squares. Present the ANOVA table and verify that the sum of all predictor Type I SS equals the model SSR. Show this calculation explicitly.

anova_model1 <- anova(model1)

anova_model1 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 2. Type I (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 2. Type I (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
physhlth_days	1	29474.75	29474.75	576.00	0
sleep_hrs	1	3323.04	3323.04	64.94	0
age	1	9261.47	9261.47	180.99	0
Residuals	4996	255652.08	51.17	NA	NA

type1_ss_model1 <- anova_model1$`Sum Sq`
ssr_model1 <- sum(type1_ss_model1[1:3])  
sse_model1 <- type1_ss_model1[4]     
cat("SSR (Model):", round(ssr_model1, 2), "\n")

## SSR (Model): 42059.26

cat("SSE (Residual):", round(sse_model1, 2), "\n")

## SSE (Residual): 255652.1

cat("SSY (Total):", round(ssr_model1 + sse_model1, 2), "\n")

## SSY (Total): 297711.3
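The code above obtains SSR by summing the Type I rows, so it restates the identity rather than checking it. Written out from Table 2, the sum is 29474.75 + 3323.04 + 9261.47 = 42059.26. An independent check computes SSR, SSE, and SSY directly from the fitted values and residuals. This is a sketch that uses `mtcars` as an assumed stand-in for `brfss_mlr`; with the lab data you would replace `fit` with `model1`.

```r
# Sketch: check that the Type I SS of the predictors add up to SSR, with SSR,
# SSE and SSY computed directly from the fit instead of from anova().
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # stand-in for model1

type1_ss <- anova(fit)[["Sum Sq"]]   # predictor rows first, residual row last
k        <- length(type1_ss) - 1     # number of predictor rows

y          <- model.response(model.frame(fit))
ssr_direct <- sum((fitted(fit) - mean(y))^2)   # SSR = sum of (yhat - ybar)^2
sse_direct <- sum(residuals(fit)^2)            # SSE = sum of (y - yhat)^2
ssy_direct <- sum((y - mean(y))^2)             # SSY = sum of (y - ybar)^2

c(sum_type1_ss = sum(type1_ss[1:k]), ssr = ssr_direct,
  ssr_plus_sse = ssr_direct + sse_direct, ssy = ssy_direct)
```

The first two values should agree, as should the last two; both identities hold for any least-squares fit that includes an intercept.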

1c. (5 pts) Use car::Anova() with type = "III" on the same model to obtain the Type III sums of squares. Compare the Type I and Type III SS for each variable. Which variable’s SS is the same in both tables? Why?

# Type III using car::Anova()
anova_type3 <- Anova(model1, type = "III")

# Side-by-side comparison
comparison <- tibble(
  Variable = c("physhlth_days", "sleep_hrs", "age"),
  `Type I SS` = round(anova_model1$`Sum Sq`[1:3], 1),
  `Type III SS` = round(anova_type3$`Sum Sq`[2:4], 1)
)

comparison %>%
  kable(caption = "Table 3. Type I vs. Type III Sums of Squares") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3. Type I vs. Type III Sums of Squares
Variable	Type I SS	Type III SS
physhlth_days	29474.7	30012.3
sleep_hrs	3323.0	2271.8
age	9261.5	9261.5

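Age has the same SS in both tables (9261.5). Type I SS is sequential, so each predictor is adjusted only for the predictors entered before it, whereas Type III SS adjusts every predictor for all other predictors in the model. Because age is entered last in model1, its Type I SS is already \(SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})\), which is exactly its Type III SS. The other two predictors are not adjusted for everything else in the Type I table, so their SS differ.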
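The "last term matches" rule can be checked with base R's `drop1()`, which reports how much the residual SS grows when each term is removed from the full model. For single-degree-of-freedom numeric terms like these, that quantity is the term's SS given all other terms, which is what `car::Anova(type = "III")` reports. A sketch on `mtcars` (an assumed stand-in for `brfss_mlr`):

```r
# Sketch: compare sequential (Type I) SS with SS given all other terms.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

vars    <- c("wt", "hp", "qsec")   # qsec is entered last
type1   <- anova(fit)              # sequential SS
dropped <- drop1(fit)              # SS lost when each term is dropped from the full model

ss_compare <- data.frame(
  term      = vars,
  type1_ss  = type1[vars, "Sum Sq"],
  given_all = dropped[vars, "Sum of Sq"]
)
ss_compare
```

Only the last-entered term (`qsec`) should show identical values in the two columns.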

Task 2: Type I vs. Type III Sums of Squares (15 points)

2a. (5 pts) Fit the same model from Task 1 but reverse the variable order:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{physhlth\_days} + \varepsilon\]

Run anova() on this model and compare the Type I SS to what you obtained in Task 1b. Which values changed and which stayed the same?

model2 <- lm(menthlth_days ~ age + sleep_hrs + physhlth_days, data = brfss_mlr)

anova_model2 <- anova(model2)

anova_model2 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 2))) %>%
  kable(
    caption = "Table 4. Type I Model 2 (Sequential) Sums of Squares — anova()",
    col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 4. Type I Model 2 (Sequential) Sums of Squares — anova()
Source	df	Sum of Sq	Mean Sq	F value	p-value
age	1	7249.10	7249.10	141.66	0
sleep_hrs	1	4797.91	4797.91	93.76	0
physhlth_days	1	30012.26	30012.26	586.51	0
Residuals	4996	255652.08	51.17	NA	NA

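All three predictor Type I SS values changed when the order was reversed, because Type I SS is sequential: each predictor is now entered at a different stage than in Task 1b and is adjusted only for the predictors entered before it. For example, physhlth_days, now entered last, has a Type I SS of 30012.26, which equals its Type III SS from Task 1c. The residual SS (255652.08) and the total of the three predictor SS (42059.26 in Task 1b and 42059.27 here, a rounding difference) did not change.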
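The same pattern can be reproduced by fitting one model in two orders: individual Type I SS move around, but their total (the model SSR) does not. A sketch with `mtcars` as an assumed stand-in for `brfss_mlr`:

```r
# Sketch: Type I SS depend on entry order; their sum does not.
vars  <- c("wt", "hp", "qsec")
fit_a <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_b <- lm(mpg ~ qsec + hp + wt, data = mtcars)   # reversed order

ss_a <- anova(fit_a)[vars, "Sum Sq"]
ss_b <- anova(fit_b)[vars, "Sum Sq"]

data.frame(term = vars, order_a = ss_a, order_b = ss_b)
c(total_a = sum(ss_a), total_b = sum(ss_b))        # both equal the model SSR
```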

2b. (5 pts) Run car::Anova(type = "III") on this reordered model. Did the Type III SS change compared to Task 1c? Explain why or why not.

anova_type3_model2 <- Anova(model2, type = "III")
anova_type3_model2

## Anova Table (Type III tests)
## 
## Response: menthlth_days
##               Sum Sq   Df F value    Pr(>F)    
## (Intercept)    15643    1 305.706 < 2.2e-16 ***
## age             9261    1 180.989 < 2.2e-16 ***
## sleep_hrs       2272    1  44.396 2.972e-11 ***
## physhlth_days  30012    1 586.505 < 2.2e-16 ***
## Residuals     255652 4996                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

anova_type3

## Anova Table (Type III tests)
## 
## Response: menthlth_days
##               Sum Sq   Df F value    Pr(>F)    
## (Intercept)    15643    1 305.706 < 2.2e-16 ***
## physhlth_days  30012    1 586.505 < 2.2e-16 ***
## sleep_hrs       2272    1  44.396 2.972e-11 ***
## age             9261    1 180.989 < 2.2e-16 ***
## Residuals     255652 4996                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The Type III SS values are identical to those in Task 1c; only the row order differs. Type III SS evaluates each predictor after adjusting for all other predictors in the model, so the order in which predictors are entered does not matter, unlike Type I SS.
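Order invariance of the adjusted SS can be checked the same way, again using `drop1()` as a base-R stand-in for `car::Anova(type = "III")` with single-df numeric terms, and `mtcars` as an assumed stand-in for the lab data:

```r
# Sketch: SS given all other terms do not depend on entry order.
vars  <- c("wt", "hp", "qsec")
fit_a <- lm(mpg ~ wt + hp + qsec, data = mtcars)
fit_b <- lm(mpg ~ qsec + hp + wt, data = mtcars)

ss3_a <- drop1(fit_a)[vars, "Sum of Sq"]
ss3_b <- drop1(fit_b)[vars, "Sum of Sq"]

data.frame(term = vars, order_a = ss3_a, order_b = ss3_b)
```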

2c. (5 pts) In 2–3 sentences, explain when an epidemiologist should prefer Type III SS over Type I SS. Give a concrete example from public health research where the choice matters.

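An epidemiologist should prefer Type III SS when the question is whether a predictor contributes after adjustment for all other predictors, which is the usual question for confounder-adjusted associations; Type I SS is appropriate only when there is a meaningful order of entry. For example, in assessing whether race/ethnicity remains associated with hypertension after adjusting for age, income, and access to care, entering race/ethnicity first would give a Type I SS that includes variation it shares with income and access to care, whereas its Type III SS reflects only its contribution beyond those covariates.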

Task 3: Partial F-Tests for Individual Variables (20 points)

3a. (10 pts) Conduct a partial F-test to determine whether age adds significantly to the prediction of mental health days, given that physhlth_days and sleep_hrs are already in the model. Do this by:

Fitting a reduced model (without age)
Fitting the full model (with age)
Using anova(reduced, full) to compare them

State the null hypothesis, report the F-statistic and p-value, and state your conclusion at \(\alpha = 0.05\).

# Fit the reduced model (without age)
m_no_age <- lm(menthlth_days ~ physhlth_days + sleep_hrs,
               data = brfss_mlr)
# Fit the full model (with age)
m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
             data = brfss_mlr)
# Compare the nested models with a partial F-test
anova(m_no_age, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no age)", "Full (with age)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 6. Partial F-Test: Does age add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 6. Partial F-Test: Does age add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (no age)	4997	264913.6	NA	NA	NA	NA
Full (with age)	4996	255652.1	1	9261.475	180.9894	0

Null hypothesis: \(H_0: \beta_3 = 0\) in the full model; that is, age does not improve prediction of mental health days once physhlth_days and sleep_hrs are in the model. The partial F-statistic is 180.99 on 1 and 4996 degrees of freedom, with p < 2.2e-16 (displayed as 0 after rounding in Table 6). Since p < 0.05, we reject the null hypothesis and conclude that age adds statistically significant information to the prediction of mental health days, given that physical health days and hours of sleep are already in the model.
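Because SSY is the same for both models, the drop in the residual sum of squares equals the gain in SSR, so the Table 6 statistic can be reproduced from the RSS column alone:

\[
F = \frac{(RSS_{\text{reduced}} - RSS_{\text{full}}) / (df_{\text{reduced}} - df_{\text{full}})}{RSS_{\text{full}} / df_{\text{full}}}
  = \frac{(264913.6 - 255652.1) / (4997 - 4996)}{255652.1 / 4996}
  = \frac{9261.5}{51.17} \approx 180.99
\]

This agrees with the 180.9894 in Table 6 up to rounding of the inputs.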

3b. (10 pts) Now verify your result from 3a manually. Using the anova() output from the full model (Task 1b), identify \(SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs})\) from the Type I table. Compute the F-statistic as:

\[F = \frac{SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs}) / 1}{MSE(\text{full model})}\]

Compare to the critical value \(F_{1, n-p-1, 0.95}\). Does your manual calculation agree with the anova() comparison from 3a?

ssr_full    <- sum(anova(m_full)$`Sum Sq`[1:3])
ssr_reduced <- sum(anova(m_no_age)$`Sum Sq`[1:2])
mse_full    <- anova(m_full)$`Mean Sq`[4]  

F_stat <- (ssr_full - ssr_reduced) / 1 / mse_full

cat("SSR(full):", round(ssr_full, 2), "\n")

## SSR(full): 42059.26

cat("SSR(reduced):", round(ssr_reduced, 2), "\n")

## SSR(reduced): 32797.79

cat("SS(age | others):", round(ssr_full - ssr_reduced, 2), "\n")

## SS(age | others): 9261.47

cat("MSE(full):", round(mse_full, 2), "\n")

## MSE(full): 51.17

cat("F-statistic:", round(F_stat, 4), "\n")

## F-statistic: 180.9894

cat("Critical value F(1, 4996, 0.95):", round(qf(0.95, 1, 4996), 4), "\n")

## Critical value F(1, 4996, 0.95): 3.8433

cat("p-value:", format.pval(pf(F_stat, 1, 4996, lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

The manual calculations match the partial F-test from 3a. The observed F-value (180.99) far exceeds the critical value (3.84), so we reach the same conclusion as in part 3a and reject the null hypothesis that age does not improve prediction of mentally unhealthy days.
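The manual calculation above hard-codes the residual degrees of freedom (4996). A reusable sketch reads them from the fitted models instead; `partial_f_test()` is a hypothetical helper, and `mtcars` stands in for `brfss_mlr`. The same function also covers chunk tests, where more than one coefficient is tested.

```r
# Hypothetical helper: partial F-test for two nested lm() fits, computed from
# residual sums of squares and residual degrees of freedom.
partial_f_test <- function(reduced, full) {
  rss_r <- deviance(reduced)          # for lm objects, deviance() is the RSS
  rss_f <- deviance(full)
  df_r  <- df.residual(reduced)
  df_f  <- df.residual(full)
  q     <- df_r - df_f                # number of coefficients being tested
  f     <- ((rss_r - rss_f) / q) / (rss_f / df_f)
  c(F = f, df1 = q, df2 = df_f,
    p_value = pf(f, q, df_f, lower.tail = FALSE),
    F_crit  = qf(0.95, q, df_f))
}

# Usage on stand-in models; with the lab data: partial_f_test(m_no_age, m_full)
reduced <- lm(mpg ~ wt + hp, data = mtcars)
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)
res <- partial_f_test(reduced, full)
res
```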

Round 2: Student B Drives, Student A Navigates

⟳ Switch roles now!

Task 4: T-Tests and the F-Test Equivalence (15 points)

4a. (5 pts) Using the full model from Task 1 (menthlth_days ~ physhlth_days + sleep_hrs + age), run summary() and extract the t-statistics and p-values for each coefficient.

t_statistic <- tidy(model1) %>%
  filter(term != "(Intercept)") %>%
  select(term, t_stat = statistic, t_pvalue = p.value)

t_statistic

## # A tibble: 3 × 3
##   term          t_stat  t_pvalue
##   <chr>          <dbl>     <dbl>
## 1 physhlth_days  24.2  1.32e-122
## 2 sleep_hrs      -6.66 2.97e- 11
## 3 age           -13.5  1.49e- 40
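Task 4a asks for `summary()`, while the code above uses broom's `tidy()`; both report the same t-statistics and p-values. A base-R sketch of pulling them from `summary()`, with `mtcars` as an assumed stand-in for `brfss_mlr` (with the lab data, replace `fit` with `model1`):

```r
# Sketch: extract t-statistics and p-values from summary() in base R.
fit        <- lm(mpg ~ wt + hp + qsec, data = mtcars)
coef_table <- coef(summary(fit))   # columns: Estimate, Std. Error, t value, Pr(>|t|)

# Drop the intercept row and keep the test columns
t_from_summary <- coef_table[-1, c("t value", "Pr(>|t|)")]
t_from_summary
```

Each t value is simply the estimate divided by its standard error.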

4b. (5 pts) For each predictor, compute \(t^2\) and compare it to the Type III F-statistic from Task 1c. Create a table showing the t-statistic, \(t^2\), the Type III F-statistic, and both p-values. Are they equivalent?

f_statistic <- anova_type3 %>%
  as.data.frame() %>%
  rownames_to_column("term") %>%
  filter(!term %in% c("(Intercept)", "Residuals")) %>%
  select(term, f_stat = `F value`, f_pvalue = `Pr(>F)`)

left_join(t_statistic, f_statistic, by = "term") %>%
  mutate(
    `t²` = round(t_stat^2, 4),
    f_stat = round(f_stat, 4),
    `t² = F?` = ifelse(abs(t_stat^2 - f_stat) < 0.001, "✓", "✗"),
    t_pvalue = round(t_pvalue, 6),
    f_pvalue = round(f_pvalue, 6),
    `p-values equal?` = ifelse(abs(t_pvalue - f_pvalue) < 0.0001, "✓", "✗")
  ) %>%
  select(term, t_stat = t_stat, `t²`, `F (Type III)` = f_stat, `t² = F?`,
         `p (t-test)` = t_pvalue, `p (F-test)` = f_pvalue, `p-values equal?`) %>%
  mutate(t_stat = round(t_stat, 4)) %>%
  kable(caption = "Table 7. Equivalence of T-Tests and Type III Partial F-Tests") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 7. Equivalence of T-Tests and Type III Partial F-Tests
term	t_stat	t²	F (Type III)	t² = F?	p-values equal?
physhlth_days	24.2179	586.5051	586.5051	✓	✓
sleep_hrs	-6.6630	44.3961	44.3961	✓	✓
age	-13.4532	180.9894	180.9894	✓	✓

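Yes. For every predictor, \(t^2\) equals the Type III F-statistic and the two p-values are identical, so the t-tests and Type III partial F-tests are equivalent.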

4c. (5 pts) In your own words, explain why the t-test and the Type III partial F-test give the same result. What is the fundamental relationship between the t-distribution and the F-distribution that makes this true?

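The two tests give the same result because they test the same hypothesis (\(H_0: \beta_j = 0\) given all other predictors in the model) using the same residual variance estimate. The link is distributional: if \(T\) follows a t-distribution with \(\nu\) degrees of freedom, then \(T^2\) follows an F-distribution with 1 and \(\nu\) degrees of freedom. The partial F-statistic for a single coefficient is exactly \(t^2\), and a two-sided t-test rejects exactly when \(t^2\) exceeds the F critical value, so the p-values are identical.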
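A short numerical check of the \(T^2 \sim F(1, \nu)\) relationship, using the residual degrees of freedom (4996) and the sleep_hrs t-statistic (-6.6630) reported earlier in this lab; nothing here depends on the data set itself.

```r
# Sketch: the squared two-sided t cutoff equals the F(1, df) cutoff,
# and the two-sided t p-value equals the upper-tail F p-value for t^2.
df_resid <- 4996        # residual df of model1
t_obs    <- -6.6630     # t-statistic for sleep_hrs

crit <- c(t_crit_sq = qt(0.975, df_resid)^2,
          f_crit    = qf(0.95, 1, df_resid))

pvals <- c(p_t = 2 * pt(abs(t_obs), df_resid, lower.tail = FALSE),
           p_f = pf(t_obs^2, 1, df_resid, lower.tail = FALSE))

crit
pvals
```

Both critical values should match the 3.8433 reported in 3b, and both p-values should be close to the 2.97e-11 shown for sleep_hrs in 4a.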

Task 5: Chunk Test — Testing Groups of Variables (20 points)

5a. (10 pts) Now consider the full 6-predictor model:

\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \beta_4 \cdot \text{income\_cat} + \beta_5 \cdot \text{sex} + \beta_6 \cdot \text{exercise} + \varepsilon\]

Test whether income_cat, sex, and exercise — as a group — significantly add to the prediction of mental health days, given that physhlth_days, sleep_hrs, and age are already in the model.

State the null hypothesis (in both words and mathematical notation), conduct the test, and state your conclusion.

m_reduced <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, data = brfss_mlr)

full_m <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
             data = brfss_mlr)

anova(m_reduced, full_m) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (physhlth + sleep + age)", "Full (+ demographics)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Table 8. Chunk Test: Do demographic variables collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 8. Chunk Test: Do demographic variables collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (physhlth + sleep + age)	4996	255652.1	NA	NA	NA	NA
Full (+ demographics)	4993	250992.8	3	4659.276	30.8957	0

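Null hypothesis (notation): \(H_0: \beta_4 = \beta_5 = \beta_6 = 0\), where \(\beta_4\), \(\beta_5\), and \(\beta_6\) are the coefficients for income_cat, sex, and exercise. The alternative is that at least one of them differs from zero.

Null hypothesis (words): income category, sex, and exercise do not improve prediction of mentally unhealthy days once physical health days, sleep hours, and age are already in the model.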

Conclusion: The F-statistic is 30.90 on 3 and 4993 degrees of freedom (p < 2.2e-16). Since p < 0.05, we reject the null hypothesis and conclude that, as a group, income category, sex, and exercise improve the prediction of mentally unhealthy days once physically unhealthy days, hours of sleep, and age are already in the model.
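Because both models share the same SSY, the chunk test can equivalently be written in terms of \(R^2\): \(F = \{(R^2_{\text{full}} - R^2_{\text{reduced}})/q\} / \{(1 - R^2_{\text{full}})/df_{\text{full}}\}\). A sketch that checks this against `anova()`, with `mtcars` as an assumed stand-in and `{hp, qsec}` as the chunk:

```r
# Sketch: chunk-test F computed from R^2 matches anova(reduced, full).
reduced <- lm(mpg ~ wt, data = mtcars)
full    <- lm(mpg ~ wt + hp + qsec, data = mtcars)   # chunk = {hp, qsec}

r2_red  <- summary(reduced)$r.squared
r2_full <- summary(full)$r.squared
q       <- df.residual(reduced) - df.residual(full)  # size of the chunk
df_f    <- df.residual(full)

f_r2    <- ((r2_full - r2_red) / q) / ((1 - r2_full) / df_f)
f_anova <- anova(reduced, full)$F[2]
c(F_from_R2 = f_r2, F_from_anova = f_anova)
```

With the lab values, \(R^2\) for the reduced model is 42059.26 / 297711.3 ≈ 0.141 and \(R^2\) for the full model is 0.1569 (from the summary() output in 6a), which gives roughly 31, in line with Table 8.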

5b. (5 pts) Compute the chunk test F-statistic manually using:

\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]

Show all intermediate values. Does your manual computation match the anova() result?

ssr_full5b    <- sum(anova(full_m)$`Sum Sq`[1:6])
ssr_reduced5b <- sum(anova(m_reduced)$`Sum Sq`[1:3])
mse_full5b   <- anova(full_m)$`Mean Sq`[7]
df_diff5b     <- 3

F_chunk5b <- ((ssr_full5b - ssr_reduced5b) / df_diff5b) / mse_full5b

cat("SSR(full):", round(ssr_full5b, 2), "\n")

## SSR(full): 46718.54

cat("SSR(reduced):", round(ssr_reduced5b, 2), "\n")

## SSR(reduced): 42059.26

cat("Difference:", round(ssr_full5b - ssr_reduced5b, 2), "\n")

## Difference: 4659.28

cat("df (number of added variables):", df_diff5b, "\n")

## df (number of added variables): 3

cat("MSE(full):", round(mse_full5b, 2), "\n")

## MSE(full): 50.27

cat("F-statistic:", round(F_chunk5b, 4), "\n")

## F-statistic: 30.8957

cat("Critical value F(3, 4993, 0.95):", round(qf(0.95, df_diff5b, 4993), 4), "\n")

## Critical value F(3, 4993, 0.95): 2.6067

cat("p-value:", format.pval(pf(F_chunk5b, df_diff5b, 4993, lower.tail = FALSE)), "\n")

## p-value: < 2.22e-16

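Yes. The manual F-statistic (30.8957) matches the anova() result in Table 8, and it far exceeds the critical value of 2.6067, so the conclusion is unchanged.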
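Written out with the intermediate values printed above, the manual computation is:

\[
F = \frac{(46718.54 - 42059.26)/3}{50.27} = \frac{4659.28/3}{50.27} = \frac{1553.09}{50.27} \approx 30.9
\]

The numerator 4659.28 also equals the drop in RSS in Table 8 (255652.1 - 250992.8 = 4659.3), as it should, since both models share the same SSY.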

5c. (5 pts) Note that exercise was not individually significant in the Type III table, yet it is part of a group that is collectively significant. In 2–3 sentences, explain how this is possible and what it means for model building in epidemiology.

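This is possible because a chunk test asks whether at least one coefficient in the group differs from zero. The individual tests for income_cat and sex are strong (t = -6.18 and 6.15 in the full model), so they can make the group significant even though exercise (p = 0.176) adds little on its own; the significant chunk test does not show that exercise itself matters. For model building in epidemiology, this means a significant group test does not validate every member, and a nonsignificant individual test is not on its own a reason to drop a variable; inclusion decisions should also rest on subject-matter reasoning, such as whether the variable is a plausible confounder.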
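A small simulation, not based on the lab data, illustrates how a group test can be significant even when one member has no effect. Here `x2` has a real effect and `x3` has none; the seed is arbitrary, so the exact p-value for `x3` will vary from run to run, while the chunk test is reliably significant.

```r
# Sketch: a chunk {x2, x3} is jointly significant because of x2 alone.
set.seed(553)                 # arbitrary seed
n  <- 500
x1 <- rnorm(n)                # covariate already in the model
x2 <- rnorm(n)                # chunk member with a real effect
x3 <- rnorm(n)                # chunk member with no effect
y  <- 1 + 0.5 * x1 + 0.5 * x2 + rnorm(n)   # x3 does not enter y

reduced <- lm(y ~ x1)
full    <- lm(y ~ x1 + x2 + x3)

chunk_p <- anova(reduced, full)$`Pr(>F)`[2]                 # joint test of x2, x3
indiv_p <- coef(summary(full))[c("x2", "x3"), "Pr(>|t|)"]   # individual t-tests
list(chunk_p = chunk_p, individual_p = indiv_p)
```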

Task 6: Synthesis and Interpretation (15 points)

6a. (5 pts) Based on the full model, which predictors are statistically significant at \(\alpha = 0.05\)? List them and briefly state the direction of each association (positive or negative).

summary(full_m)

## 
## Call:
## lm(formula = menthlth_days ~ physhlth_days + sleep_hrs + age + 
##     income_cat + sex + exercise, data = brfss_mlr)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.9192  -3.4262  -1.7803   0.2948  30.0568 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   12.475489   0.716959  17.401  < 2e-16 ***
## physhlth_days  0.291657   0.013579  21.478  < 2e-16 ***
## sleep_hrs     -0.509160   0.075348  -6.757 1.57e-11 ***
## age           -0.082307   0.005933 -13.872  < 2e-16 ***
## income_cat    -0.321323   0.052012  -6.178 7.02e-10 ***
## sexFemale      1.245053   0.202333   6.153 8.17e-10 ***
## exerciseYes   -0.342685   0.253138  -1.354    0.176    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.09 on 4993 degrees of freedom
## Multiple R-squared:  0.1569, Adjusted R-squared:  0.1559 
## F-statistic: 154.9 on 6 and 4993 DF,  p-value: < 2.2e-16

The predictors that are statistically significant at \(\alpha = 0.05\) are:

physhlth_days (estimate 0.29): positive association; more physically unhealthy days are associated with more mentally unhealthy days.

sleep_hrs (estimate -0.51): negative association; more hours of sleep are associated with fewer mentally unhealthy days.

age (estimate -0.08): negative association; older adults report fewer mentally unhealthy days.

income_cat (estimate -0.32): negative association; higher income categories are associated with fewer mentally unhealthy days.

sex (sexFemale estimate 1.25): positive association for women relative to the reference level; women report more mentally unhealthy days.

exercise (p = 0.176) is not statistically significant.
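A list like the one above can be generated from the coefficient table. This is a sketch on `mtcars` (an assumed stand-in for `brfss_mlr`; with the lab data, use `full_m`), and it only flags significance at 0.05 and the sign of each estimate, not the size or importance of the association.

```r
# Sketch: list predictors significant at alpha = 0.05 with their direction.
fit <- lm(mpg ~ wt + hp + qsec + am, data = mtcars)
ct  <- coef(summary(fit))[-1, , drop = FALSE]              # drop the intercept row
sig <- ct[ct[, "Pr(>|t|)"] < 0.05, , drop = FALSE]

sig_table <- data.frame(
  term      = rownames(sig),
  estimate  = sig[, "Estimate"],
  direction = ifelse(sig[, "Estimate"] > 0, "positive", "negative"),
  row.names = NULL
)
sig_table
```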

6b. (5 pts) A colleague argues: “We should drop exercise from the model because it’s not significant.” Do you agree? Write a 2–3 sentence response explaining your reasoning. Consider the chunk test results and epidemiologic rationale.

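I would not drop exercise solely because its individual test is not significant (p = 0.176). However, the significant chunk test does not show that exercise itself contributes; it shows that at least one of income_cat, sex, and exercise does, and income_cat and sex are individually strong. The better argument for keeping exercise is epidemiologic: if it is a plausible confounder or a variable of a priori interest, retaining it costs only one degree of freedom in a sample of 5,000 and keeps the other estimates adjusted for it.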

6c. (5 pts) Write a 3–4 sentence summary of the hypothesis testing results for a non-statistical audience (e.g., a public health program manager). Your summary should convey which factors were identified as independently associated with mental health days and which were not, without using jargon like “p-value,” “F-test,” or “sums of squares.”

Adults reported more mentally unhealthy days when they also had more physically unhealthy days, slept fewer hours, were younger, had lower incomes, or were women. Each of these patterns held even after taking all of the other factors into account. Regular exercise did not show a clear relationship with mental health days once the other factors were considered.

End of Lab Activity

Test of Hypotheses lab

Sarah Andres

2026-03-11