EPI 553 — Tests of Hypotheses Lab
Due: End of class
This lab uses a Driver-Navigator pair programming approach. You will work with a partner, taking turns in two roles:
- Round 1 (Tasks 1–3): Student A is the Driver, Student B is the Navigator.
- Round 2 (Tasks 4–6): Student B is the Driver, Student A is the Navigator.
Switch roles after completing Task 3. Both partners should understand all tasks — the Navigator should actively participate by checking the work, asking questions, and discussing interpretations.
Submission: Each pair submits one knitted HTML file with both partners’ names. Upload to Brightspace by end of class.
Use the same BRFSS 2020 dataset from the guided practice.
| Task | Description | Points |
|---|---|---|
| Task 1 | Fitting Models and ANOVA Tables | 15 |
| Task 2 | Type I vs. Type III Sums of Squares | 15 |
| Task 3 | Partial F-Tests for Individual Variables | 20 |
| Switch roles | | |
| Task 4 | T-Tests and the F-Test Equivalence | 15 |
| Task 5 | Chunk Test (Testing Groups of Variables) | 20 |
| Task 6 | Synthesis and Interpretation | 15 |
| Total | | 100 |
2a. (5 pts) Fit the same model from Task 1 but reverse the variable order:
\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{age} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{physhlth\_days} + \varepsilon\]
Run anova() on this model and compare the Type I SS to
what you obtained in Task 1b. Which values changed and which stayed the
same?
model2 <- lm(menthlth_days ~ age + sleep_hrs + physhlth_days, data = brfss_mlr)
anova_model2 <- anova(model2)
anova_model2 %>%
as.data.frame() %>%
rownames_to_column("Source") %>%
mutate(across(where(is.numeric), ~ round(., 2))) %>%
kable(
caption = "Table 4. Type I Model 2 (Sequential) Sums of Squares — anova()",
col.names = c("Source", "df", "Sum of Sq", "Mean Sq", "F value", "p-value")
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Source | df | Sum of Sq | Mean Sq | F value | p-value |
|---|---|---|---|---|---|
| age | 1 | 7249.10 | 7249.10 | 141.66 | 0 |
| sleep_hrs | 1 | 4797.91 | 4797.91 | 93.76 | 0 |
| physhlth_days | 1 | 30012.26 | 30012.26 | 586.51 | 0 |
| Residuals | 4996 | 255652.08 | 51.17 | NA | NA |
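The Type I SS for every predictor changed when we reversed the order. Type I SS are sequential: each predictor is adjusted only for the predictors entered before it, and each predictor now enters at a different stage than in Task 1b (for example, age now enters first with SS = 7249.10, whereas in Task 1b it entered last with SS = 9261.47). What stayed the same is everything that depends only on the fitted model rather than on the order: the residual SS (255652.08), the residual df (4996), the MSE (51.17), and therefore the total regression SS (7249.10 + 4797.91 + 30012.26 ≈ 42059.27, matching SSR(full) = 42059.26 in Task 3b).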
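A minimal sketch on simulated data illustrates which parts of a Type I table depend on entry order. The variables x1 to x3 and y, and their coefficients, are hypothetical stand-ins rather than BRFSS variables; only base R's lm() and anova() are used.

```r
# Simulated data: three correlated predictors (hypothetical, not BRFSS)
set.seed(553)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- 0.3 * x1 - 0.4 * x2 + rnorm(n)
y  <- 2 + 1.0 * x1 - 0.5 * x2 + 0.8 * x3 + rnorm(n, sd = 2)

# Same model, two entry orders
fit_a <- lm(y ~ x1 + x2 + x3)
fit_b <- lm(y ~ x3 + x2 + x1)

ss_a <- anova(fit_a)[["Sum Sq"]]   # rows: x1, x2, x3, Residuals
ss_b <- anova(fit_b)[["Sum Sq"]]   # rows: x3, x2, x1, Residuals

# Per-term Type I SS depend on order: x1 entered first vs. entered last
print(c(x1_first = ss_a[1], x1_last = ss_b[3]))

# Order-invariant: total regression SS (sum of the term rows) ...
ssr_a <- sum(ss_a[1:3])
ssr_b <- sum(ss_b[1:3])
# ... and residual SS, because both orders give the same fitted values
rss_a <- ss_a[4]
rss_b <- ss_b[4]
print(c(ssr_a = ssr_a, ssr_b = ssr_b, rss_a = rss_a, rss_b = rss_b))
```

The same pattern appears in Table 4 versus Task 1b: the predictor rows move, while the Residuals row and the row total do not.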
2b. (5 pts) Run
car::Anova(type = "III") on this reordered model. Did the
Type III SS change compared to Task 1c? Explain why or why not.
anova_type3_model2 <- Anova(model2, type = "III")
anova_type3_model2
## Anova Table (Type III tests)
##
## Response: menthlth_days
## Sum Sq Df F value Pr(>F)
## (Intercept) 15643 1 305.706 < 2.2e-16 ***
## age 9261 1 180.989 < 2.2e-16 ***
## sleep_hrs 2272 1 44.396 2.972e-11 ***
## physhlth_days 30012 1 586.505 < 2.2e-16 ***
## Residuals 255652 4996
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova_type3  # Type III table from Task 1c (original order), shown for comparison
## Anova Table (Type III tests)
##
## Response: menthlth_days
## Sum Sq Df F value Pr(>F)
## (Intercept) 15643 1 305.706 < 2.2e-16 ***
## physhlth_days 30012 1 586.505 < 2.2e-16 ***
## sleep_hrs 2272 1 44.396 2.972e-11 ***
## age 9261 1 180.989 < 2.2e-16 ***
## Residuals 255652 4996
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
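The Type III SS did not change from Task 1c: only the row order differs, and every value is identical (age = 9261, sleep_hrs = 2272, physhlth_days = 30012). Type III SS evaluate each predictor after adjusting for all other predictors in the model, so the order of entry is irrelevant, unlike Type I, where order matters. This is also why the Type III SS for age (9261) equals its Type I SS from Task 1b, where age was entered last.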
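For an additive model with no interactions, each Type III SS equals the increase in residual SS when that one term is dropped from the full model, which is why order cannot matter. Base R's drop1() performs exactly this "drop one term, refit" comparison, so a minimal sketch on simulated data (hypothetical x1 to x3 and y, not BRFSS) can check both points without car:

```r
# Simulated data (hypothetical, not BRFSS)
set.seed(553)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- 0.3 * x1 - 0.4 * x2 + rnorm(n)
y  <- 2 + 1.0 * x1 - 0.5 * x2 + 0.8 * x3 + rnorm(n, sd = 2)

fit_a <- lm(y ~ x1 + x2 + x3)
fit_b <- lm(y ~ x3 + x2 + x1)

# drop1() gives, for each term, the extra SS from removing it from the full model
d_a <- drop1(fit_a, test = "F")
d_b <- drop1(fit_b, test = "F")

# Match rows by term name: the values agree regardless of entry order
ss_drop_a <- d_a[c("x1", "x2", "x3"), "Sum of Sq"]
ss_drop_b <- d_b[c("x1", "x2", "x3"), "Sum of Sq"]
print(rbind(order_a = ss_drop_a, order_b = ss_drop_b))

# The term entered last has Type I SS equal to its drop-one SS
ss_type1_x3_last <- anova(fit_a)["x3", "Sum Sq"]
print(c(type1_last = ss_type1_x3_last, drop1 = ss_drop_a[3]))
```

With interactions or factors under non-sum-to-zero contrasts, drop1() and car::Anova(type = "III") can differ, so this shortcut is only a check for additive models like the one above.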
2c. (5 pts) In 2–3 sentences, explain when an epidemiologist should prefer Type III SS over Type I SS. Give a concrete example from public health research where the choice matters.
3a. (10 pts) Conduct a partial F-test to determine
whether age adds significantly to the prediction of mental
health days, given that physhlth_days and
sleep_hrs are already in the model. Do this by:
1. Fitting the reduced model (without age)
2. Fitting the full model (with age)
3. Using anova(reduced, full) to compare them

State the null hypothesis, report the F-statistic and p-value, and state your conclusion at \(\alpha = 0.05\).
# Fit the reduced model (without age)
m_no_age <- lm(menthlth_days ~ physhlth_days + sleep_hrs,
  data = brfss_mlr)
# Fit the full model (with age)
m_full <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age,
  data = brfss_mlr)
# Compare the nested models with a partial F-test
anova(m_no_age, m_full) %>%
as.data.frame() %>%
rownames_to_column("Model") %>%
mutate(
Model = c("Reduced (no age)", "Full (with age)"),
across(where(is.numeric), ~ round(., 4))
) %>%
kable(
caption = "Table 6. Partial F-Test: Does age add to the model?",
col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Model | Res. df | RSS | df | Sum of Sq | F | p-value |
|---|---|---|---|---|---|---|
| Reduced (no age) | 4997 | 264913.6 | NA | NA | NA | NA |
| Full (with age) | 4996 | 255652.1 | 1 | 9261.475 | 180.9894 | 0 |
Null hypothesis: \(H_0: \beta_{\text{age}} = 0\); that is, age does not add to the prediction of mental health days once physhlth_days and sleep_hrs are already in the model. The partial F-statistic is 180.99 on 1 and 4996 df, and the p-value is < 2.2e-16 (shown as 0 after rounding in Table 6). Since p < 0.05, we reject \(H_0\) and conclude that age adds statistically significant information to the prediction of mental health days, given that physical health days and hours of sleep are already in the model.
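The F in Table 6 can be reproduced from the table's own RSS and residual df columns. The sketch below simply redoes that arithmetic with the rounded values printed in Table 6 (the t6_ names are hypothetical and chosen to avoid clashing with objects used later), so it agrees with the table only up to that rounding.

```r
# Values copied from Table 6 (rounded as printed there)
t6_rss_reduced <- 264913.6   # physhlth_days + sleep_hrs
t6_rss_full    <- 255652.1   # physhlth_days + sleep_hrs + age
t6_df_reduced  <- 4997
t6_df_full     <- 4996

# Partial F = (extra SS / extra df) / MSE(full)
t6_extra_ss <- t6_rss_reduced - t6_rss_full
t6_extra_df <- t6_df_reduced - t6_df_full
t6_mse_full <- t6_rss_full / t6_df_full
t6_F <- (t6_extra_ss / t6_extra_df) / t6_mse_full
t6_p <- pf(t6_F, t6_extra_df, t6_df_full, lower.tail = FALSE)

cat("Extra SS:", round(t6_extra_ss, 1), "\n")
cat("MSE(full):", round(t6_mse_full, 2), "\n")
cat("F:", round(t6_F, 2), "\n")
cat("p-value:", format.pval(t6_p), "\n")
```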
3b. (10 pts) Now verify your result from 3a
manually. Using the anova() output from the full model
(Task 1b), identify \(SS(\text{age} \mid
\text{physhlth\_days}, \text{sleep\_hrs})\) from the Type I
table. Compute the F-statistic as:
\[F = \frac{SS(\text{age} \mid \text{physhlth\_days}, \text{sleep\_hrs}) / 1}{MSE(\text{full model})}\]
Compare to the critical value \(F_{1,
n-p-1, 0.95}\). Does your manual calculation agree with the
anova() comparison from 3a?
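# Regression SS = sum of the Type I SS of the model terms (Residuals row excluded)
ssr_full <- sum(anova(m_full)$`Sum Sq`[1:3])
ssr_reduced <- sum(anova(m_no_age)$`Sum Sq`[1:2])
# MSE(full) is the Mean Sq of the Residuals row (row 4 of the full-model table)
mse_full <- anova(m_full)$`Mean Sq`[4]
# Partial F: extra SS for age (1 df) divided by MSE(full)
F_stat <- (ssr_full - ssr_reduced) / 1 / mse_full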
cat("SSR(full):", round(ssr_full, 2), "\n")
## SSR(full): 42059.26
cat("SSR(reduced):", round(ssr_reduced, 2), "\n")
## SSR(reduced): 32797.79
cat("SS(age | others):", round(ssr_full - ssr_reduced, 2), "\n")
## SS(age | others): 9261.47
cat("MSE(full):", round(mse_full, 2), "\n")
## MSE(full): 51.17
cat("F-statistic:", round(F_stat, 4), "\n")
## F-statistic: 180.9894
cat("Critical value F(1, 4996, 0.95):", round(qf(0.95, 1, 4996), 4), "\n")
## Critical value F(1, 4996, 0.95): 3.8433
cat("p-value:", format.pval(pf(F_stat, 1, 4996, lower.tail = FALSE)), "\n")
## p-value: < 2.22e-16
The manual calculations match the partial F-test from 3a. The observed F-value (180.99) far exceeds the critical value (3.84), so we reach the same conclusion as in part 3a and reject the null hypothesis that age does not improve prediction of mentally unhealthy days.
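When a single variable is added (one numerator df), the partial F-test in 3a and 3b is equivalent to the two-sided t-test for that coefficient in the full model, with \(F = t^2\). A minimal base R check on simulated data (x1 to x3 and y are hypothetical stand-ins, not BRFSS variables):

```r
# Simulated data (hypothetical, not BRFSS)
set.seed(553)
n  <- 500
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- 0.3 * x1 - 0.4 * x2 + rnorm(n)
y  <- 2 + 1.0 * x1 - 0.5 * x2 + 0.8 * x3 + rnorm(n, sd = 2)

reduced <- lm(y ~ x1 + x2)
full    <- lm(y ~ x1 + x2 + x3)

# Partial F-test for adding x3 (1 numerator df)
F_x3 <- anova(reduced, full)$F[2]

# t statistic for x3 in the full model
t_x3 <- summary(full)$coefficients["x3", "t value"]

print(c(F = F_x3, t_squared = t_x3^2))
```

For this lab, the analogous comparison would square summary(m_full)$coefficients["age", "t value"] and compare it with the F of 180.99 from 3a.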
5a. (10 pts) Now consider the full 6-predictor model:
\[\text{menthlth\_days} = \beta_0 + \beta_1 \cdot \text{physhlth\_days} + \beta_2 \cdot \text{sleep\_hrs} + \beta_3 \cdot \text{age} + \beta_4 \cdot \text{income\_cat} + \beta_5 \cdot \text{sex} + \beta_6 \cdot \text{exercise} + \varepsilon\]
Test whether income_cat, sex, and
exercise — as a group — significantly add to the prediction
of mental health days, given that physhlth_days,
sleep_hrs, and age are already in the
model.
State the null hypothesis (in both words and mathematical notation), conduct the test, and state your conclusion.
m_reduced <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age, data = brfss_mlr)
full_m <- lm(menthlth_days ~ physhlth_days + sleep_hrs + age + income_cat + sex + exercise,
data = brfss_mlr)
anova(m_reduced, full_m) %>%
as.data.frame() %>%
rownames_to_column("Model") %>%
mutate(
Model = c("Reduced (physhlth + sleep + age)", "Full (+ demographics)"),
across(where(is.numeric), ~ round(., 4))
) %>%
kable(
caption = "Table 8. Chunk Test: Do demographic variables collectively add to the model?",
col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
| Model | Res. df | RSS | df | Sum of Sq | F | p-value |
|---|---|---|---|---|---|---|
| Reduced (physhlth + sleep + age) | 4996 | 255652.1 | NA | NA | NA | NA |
| Full (+ demographics) | 4993 | 250992.8 | 3 | 4659.276 | 30.8957 | 0 |
Null hypothesis: \(H_0: \beta_4 = \beta_5 = \beta_6 = 0\) (the coefficients for income_cat, sex, and exercise). In words: income category, sex, and exercise, as a group, do not improve the prediction of mentally unhealthy days once physical health days, sleep hours, and age are already in the model.
Conclusion: The F-statistic is 30.90 on 3 and 4993 df, with p < 2.2e-16 (shown as 0 after rounding in Table 8). Since p < 0.05, we reject \(H_0\) and conclude that at least one of income category, sex, and exercise improves the prediction of mentally unhealthy days once physically unhealthy days, hours of sleep, and age are already in the model.
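The same chunk test can be computed from the full model alone as a joint Wald test of the tested coefficients, \(F = \hat{\boldsymbol\beta}_S^\top [\widehat{\mathrm{Var}}(\hat{\boldsymbol\beta}_S)]^{-1} \hat{\boldsymbol\beta}_S / q\) with \(q = 3\); for ordinary least squares with the usual variance estimate this equals the nested-model F exactly. A minimal base R sketch on simulated data, where z1 to z3 are hypothetical stand-ins for the three added variables:

```r
# Simulated data (hypothetical, not BRFSS)
set.seed(553)
n  <- 500
x1 <- rnorm(n)
z1 <- rnorm(n)
z2 <- 0.4 * x1 + rnorm(n)
z3 <- rbinom(n, 1, 0.5)
y  <- 1 + 0.8 * x1 + 0.5 * z1 - 0.3 * z2 + 0.1 * z3 + rnorm(n, sd = 2)

reduced <- lm(y ~ x1)
full    <- lm(y ~ x1 + z1 + z2 + z3)

# Nested-model chunk test, as in Table 8
F_nested <- anova(reduced, full)$F[2]

# Joint Wald test of H0: beta_z1 = beta_z2 = beta_z3 = 0, using only the full model
S      <- c("z1", "z2", "z3")
b_S    <- coef(full)[S]
V_S    <- vcov(full)[S, S]
q      <- length(S)
F_wald <- as.numeric(t(b_S) %*% solve(V_S) %*% b_S) / q
p_wald <- pf(F_wald, q, df.residual(full), lower.tail = FALSE)

print(c(F_nested = F_nested, F_wald = F_wald, p = p_wald))
```

With factors such as sex and exercise, the relevant coefficient names would be the dummy columns shown by coef() (for example sexFemale and exerciseYes in the 6a output).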
5b. (5 pts) Compute the chunk test F-statistic manually using:
\[F = \frac{\{SSR(\text{full}) - SSR(\text{reduced})\} / (df_{\text{full}} - df_{\text{reduced}})}{MSE(\text{full})}\]
Show all intermediate values. Does your manual computation match the
anova() result?
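# Regression SS of each model: sum of the term rows of its Type I table
ssr_full5b <- sum(anova(full_m)$`Sum Sq`[1:6])
ssr_reduced5b <- sum(anova(m_reduced)$`Sum Sq`[1:3])
# MSE(full) is the Residuals Mean Sq (row 7 of the 6-predictor table)
mse_full5b <- anova(full_m)$`Mean Sq`[7]
# Numerator df: income_cat, sex, and exercise each use 1 df
df_diff5b <- 3
F_chunk5b <- ((ssr_full5b - ssr_reduced5b) / df_diff5b) / mse_full5b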
cat("SSR(full):", round(ssr_full5b, 2), "\n")
## SSR(full): 46718.54
cat("SSR(reduced):", round(ssr_reduced5b, 2), "\n")
## SSR(reduced): 42059.26
cat("Difference:", round(ssr_full5b - ssr_reduced5b, 2), "\n")
## Difference: 4659.28
cat("df (number of added variables):", df_diff5b, "\n")
## df (number of added variables): 3
cat("MSE(full):", round(mse_full5b, 2), "\n")
## MSE(full): 50.27
cat("F-statistic:", round(F_chunk5b, 4), "\n")
## F-statistic: 30.8957
cat("Critical value F(3, 4993, 0.95):", round(qf(0.95, df_diff5b, 4993), 4), "\n")
## Critical value F(3, 4993, 0.95): 2.6067
cat("p-value:", format.pval(pf(F_chunk5b, df_diff5b, 4993, lower.tail = FALSE)), "\n")
## p-value: < 2.22e-16
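Yes. The manual F-statistic (30.8957) matches the anova() result in Table 8, and it far exceeds the critical value F(3, 4993, 0.95) = 2.6067, so the conclusion from 5a is unchanged.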
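The SSR-based numerator and the RSS column of Table 8 agree because both models are fit to the same outcome, so the total SS is fixed and any gain in regression SS is an equal drop in residual SS. Using the values printed above (rounded):

\[SST = SSR(\text{reduced}) + RSS(\text{reduced}) = 42059.26 + 255652.08 \approx 297711.3 \approx 46718.54 + 250992.8 = SSR(\text{full}) + RSS(\text{full})\]

\[SSR(\text{full}) - SSR(\text{reduced}) = 46718.54 - 42059.26 = 4659.28 \approx 255652.1 - 250992.8 = RSS(\text{reduced}) - RSS(\text{full})\]

\[F = \frac{4659.28 / 3}{50.27} \approx 30.90\]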
5c. (5 pts) Note that exercise was
not individually significant in the Type III table, yet
it is part of a group that is collectively significant. In 2–3
sentences, explain how this is possible and what it means for model
building in epidemiology.
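The chunk test asks whether at least one of the three coefficients differs from zero, so the group can be significant because income_cat and sex are strongly associated with mental health days (both p < 0.001 in the full model) even if exercise adds little on its own; a variable can also look weak individually yet matter jointly when it is correlated with other predictors in the group, because its individual test credits it only with the variation it explains uniquely. For model building in epidemiology, this means a significant group test does not justify every variable in the group, and an individual p-value alone should not decide whether a variable stays in the model; those decisions should also rest on subject-matter reasoning, such as confounding.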
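One way to see how a group's contribution is shared: when the added terms are entered after the reduced-model terms, their sequential (Type I) SS add up exactly to the chunk test's extra SS. A minimal base R sketch on simulated data, with hypothetical names x1 and z1 to z3, where z3 is generated with no true association with y:

```r
# Simulated data (hypothetical, not BRFSS)
set.seed(553)
n  <- 500
x1 <- rnorm(n)
z1 <- rnorm(n)
z2 <- rbinom(n, 1, 0.5)
z3 <- rbinom(n, 1, 0.5)   # generated with no true association with y
y  <- 1 + 0.8 * x1 + 0.6 * z1 + 0.7 * z2 + rnorm(n, sd = 2)

reduced <- lm(y ~ x1)
full    <- lm(y ~ x1 + z1 + z2 + z3)

# Extra SS for the whole chunk (z1, z2, z3)
chunk_ss <- anova(reduced, full)[["Sum of Sq"]][2]

# Sequential (Type I) SS of each added term, entered after x1
seq_ss <- anova(full)[c("z1", "z2", "z3"), "Sum Sq"]
names(seq_ss) <- c("z1", "z2", "z3")

print(round(seq_ss, 2))
print(round(c(chunk = chunk_ss, sum_of_terms = sum(seq_ss)), 2))

# Chunk F-test alongside the individual t-tests for comparison
print(anova(reduced, full))
print(summary(full)$coefficients[c("z1", "z2", "z3"), ])
```

Inspecting seq_ss shows which terms carry the chunk; note that sequential SS for the earlier chunk terms are not adjusted for the later ones.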
6a. (5 pts) Based on the full model, which predictors are statistically significant at \(\alpha = 0.05\)? List them and briefly state the direction of each association (positive or negative).
summary(full_m)
##
## Call:
## lm(formula = menthlth_days ~ physhlth_days + sleep_hrs + age +
## income_cat + sex + exercise, data = brfss_mlr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.9192 -3.4262 -1.7803 0.2948 30.0568
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.475489 0.716959 17.401 < 2e-16 ***
## physhlth_days 0.291657 0.013579 21.478 < 2e-16 ***
## sleep_hrs -0.509160 0.075348 -6.757 1.57e-11 ***
## age -0.082307 0.005933 -13.872 < 2e-16 ***
## income_cat -0.321323 0.052012 -6.178 7.02e-10 ***
## sexFemale 1.245053 0.202333 6.153 8.17e-10 ***
## exerciseYes -0.342685 0.253138 -1.354 0.176
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.09 on 4993 degrees of freedom
## Multiple R-squared: 0.1569, Adjusted R-squared: 0.1559
## F-statistic: 154.9 on 6 and 4993 DF, p-value: < 2.2e-16
The predictors that are statistically significant at \(\alpha = 0.05\), each adjusted for the other predictors, are:
- physhlth_days (positive, b = 0.29): more physically unhealthy days are associated with more mentally unhealthy days.
- sleep_hrs (negative, b = -0.51): more hours of sleep are associated with fewer mentally unhealthy days.
- age (negative, b = -0.08): older adults report fewer mentally unhealthy days.
- income_cat (negative, b = -0.32): higher-income individuals report fewer mentally unhealthy days.
- sex (positive for sexFemale, b = 1.25): women report more mentally unhealthy days than men.

exercise (p = 0.176) is not statistically significant.
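A small helper can produce a list like the one above directly from summary(). This is a sketch assuming \(\alpha = 0.05\) and excluding the intercept; the function name sig_directions() is hypothetical, and the demo uses deterministic toy data rather than BRFSS.

```r
# Hypothetical helper: significant terms and the sign of each estimate
sig_directions <- function(fit, alpha = 0.05) {
  co <- summary(fit)$coefficients
  co <- co[rownames(co) != "(Intercept)", , drop = FALSE]
  keep <- co[, "Pr(>|t|)"] < alpha
  data.frame(
    term      = rownames(co)[keep],
    estimate  = co[keep, "Estimate"],
    direction = ifelse(co[keep, "Estimate"] > 0, "positive", "negative"),
    row.names = NULL
  )
}

# Deterministic toy data: strong positive x1 and negative x2 effects
i   <- 1:100
x1  <- i / 10
x2  <- sin(i)
y   <- 1 + 0.5 * x1 - 2 * x2 + 0.1 * cos(7 * i)
toy <- lm(y ~ x1 + x2)

res <- sig_directions(toy)
print(res)
```

Applied to this lab, sig_directions(full_m) would list the terms above; for factors it reports the dummy names (e.g., sexFemale), whose sign is relative to the reference level.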
6b. (5 pts) A colleague argues: “We should drop
exercise from the model because it’s not significant.” Do
you agree? Write a 2–3 sentence response explaining your reasoning.
Consider the chunk test results and epidemiologic rationale.
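I would not drop exercise just because its individual p-value is above 0.05. The significant chunk test means at least one of income_cat, sex, and exercise adds to the model, and that result is likely driven by income_cat and sex, so it does not by itself show that exercise contributes. The stronger argument for keeping exercise is epidemiologic: if it was included on a priori grounds (for example, as a potential confounder of the other associations), removing it based on a p-value alone could bias the remaining estimates.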
6c. (5 pts) Write a 3–4 sentence summary of the hypothesis testing results for a non-statistical audience (e.g., a public health program manager). Your summary should convey which factors were identified as independently associated with mental health days and which were not, without using jargon like “p-value,” “F-test,” or “sums of squares.”
Adults reported more mentally unhealthy days when they also had more physically unhealthy days, slept fewer hours, were younger, had lower incomes, or were women. Each of these patterns held even after taking all of the other factors into account. Regular exercise did not show a clear independent relationship with mental health days once the other factors were considered. This does not prove that exercise is unrelated to mental health; it means this analysis could not detect a clear effect of exercise beyond the other factors.
End of Lab Activity