Part 2: In-Class Lab Activity
EPI 553 — Multiple Linear Regression Lab
Due: End of class, March 10, 2026
Instructions
In this lab, you will build and interpret multiple linear regression
models using the BRFSS 2020 analytic dataset. Work through each task
systematically. You may discuss concepts with classmates, but your
written answers and R code must be your own.
Submission: Knit your .Rmd to HTML and upload to
Brightspace by end of class.
Data for the Lab
Use the saved analytic dataset from today’s lecture. It contains
5,000 randomly sampled BRFSS 2020 respondents with the following
variables:
| Variable | Description | Type |
|---|---|---|
| menthlth_days | Mentally unhealthy days in past 30 | Continuous (0–30) |
| physhlth_days | Physically unhealthy days in past 30 | Continuous (0–30) |
| sleep_hrs | Sleep hours per night | Continuous (1–14) |
| age | Age in years (capped at 80) | Continuous |
| income_cat | Household income (1 = <$10k to 8 = >$75k) | Ordinal |
| sex | Sex (Male/Female) | Factor |
| exercise | Any physical activity past 30 days (Yes/No) | Factor |
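# Load packages and the dataset
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(gtsummary)
library(ggeffects)
library(GGally)   # ggpairs() and wrap(), used in Task 1c
library(plotly)   # ggplotly(), used in Task 1b
# Update this path to the location of your copy of the file
brfss_mlr <- readRDS(
"/Users/samriddhi/Downloads/brfss_mlr_2020.rds"
)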
Task 1: Exploratory Data Analysis (15 points)
1a. (5 pts) Create a descriptive statistics table
using tbl_summary() that includes all variables in the
dataset. Include means (SD) for continuous variables and n (%) for
categorical variables.
brfss_mlr %>%
select(menthlth_days, physhlth_days, sleep_hrs, age, income_cat,
income_f, sex, exercise, bmi) %>%
tbl_summary(
label = list(
menthlth_days ~ "Mentally unhealthy days (past 30)",
physhlth_days ~ "Physically unhealthy days (past 30)",
sleep_hrs ~ "Sleep (hours/night)",
age ~ "Age (years)",
income_cat ~ "Income category",
income_f ~ "Household income",
sex ~ "Sex",
exercise ~ "Any physical activity (past 30 days)",
bmi ~ "Body Mass Index (kg/m2)"
),
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
digits = all_continuous() ~ 1,
missing = "no"
) %>%
add_n() %>%
bold_labels() %>%
modify_caption("**Table 1. Descriptive Statistics — BRFSS 2020 Analytic Sample (n = 5,000)**")
Table 1. Descriptive Statistics — BRFSS 2020 Analytic Sample (n = 5,000)
| Characteristic | N | N = 5,000 |
|---|---|---|
| Mentally unhealthy days (past 30) | 5,000 | 3.8 (7.7) |
| Physically unhealthy days (past 30) | 5,000 | 3.3 (7.8) |
| Sleep (hours/night) | 5,000 | 7.1 (1.3) |
| Age (years) | 5,000 | 54.3 (17.2) |
| Income category | 5,000 | |
| 1 | | 190 (3.8%) |
| 2 | | 169 (3.4%) |
| 3 | | 312 (6.2%) |
| 4 | | 434 (8.7%) |
| 5 | | 489 (9.8%) |
| 6 | | 683 (14%) |
| 7 | | 841 (17%) |
| 8 | | 1,882 (38%) |
| Household income | 5,000 | |
| <$10k | | 190 (3.8%) |
| $10-15k | | 169 (3.4%) |
| $15-20k | | 312 (6.2%) |
| $20-25k | | 434 (8.7%) |
| $25-35k | | 489 (9.8%) |
| $35-50k | | 683 (14%) |
| $50-75k | | 841 (17%) |
| >$75k | | 1,882 (38%) |
| Sex | 5,000 | |
| Male | | 2,331 (47%) |
| Female | | 2,669 (53%) |
| Any physical activity (past 30 days) | 5,000 | 3,874 (77%) |
| Body Mass Index (kg/m2) | 4,706 | 28.4 (6.4) |
1b. (5 pts) Create a histogram of
menthlth_days. Describe the shape of the distribution. Is
it symmetric, right-skewed, or left-skewed? What are the implications of
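this shape for regression modeling? The distribution is not symmetric;
it is strongly right-skewed. Most respondents report few or no mentally
unhealthy days, and a smaller number report values stretching toward the
maximum of 30, forming a long right tail (for a count that cannot go
below 0, a standard deviation of 7.7 days around a mean of only 3.8 days
also signals a long right tail). For regression, this shape means the
residuals are unlikely to be normally distributed and their variance may
not be constant, so these assumptions need to be checked when evaluating
the model later.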
p_hist <- ggplot(brfss_mlr, aes(x = menthlth_days)) +
geom_histogram(binwidth = 1, fill = "blue", color = "white", alpha = 0.85) +
labs(
title = "Distribution of Mentally Unhealthy Days in the Past 30 Days",
subtitle = "BRFSS 2020 Analytic Sample (n = 5,000)",
x = "Number of Mentally Unhealthy Days",
y = "Count"
) +
theme_minimal(base_size = 13)
ggplotly(p_hist)
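To put a number on "right-skewed", a minimal base-R sketch of the moment-based sample skewness coefficient, \(g_1 = m_3 / m_2^{3/2}\), is shown below. The function name `sample_skewness` and the toy vector are hypothetical, chosen only so the block runs on its own; positive values indicate a longer right tail.

```r
# Moment-based sample skewness: g1 = m3 / m2^(3/2), where mk is the k-th
# central moment computed with divisor n. Positive = longer right tail.
sample_skewness <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  d <- x - mean(x)             # deviations from the mean
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Hypothetical toy data shaped like a days-in-past-30 count:
# mostly zeros and small values, plus a few large values up to 30
toy_days <- c(0, 0, 0, 0, 0, 0, 1, 2, 3, 5, 14, 30)
skew_toy <- sample_skewness(toy_days)
skew_sym <- sample_skewness(1:9)   # a symmetric sequence has skewness 0
print(c(toy = skew_toy, symmetric = skew_sym))
```

Applied to the lab data, the call would be `sample_skewness(brfss_mlr$menthlth_days)`.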
1c. (5 pts) Create a scatterplot matrix (using
GGally::ggpairs() or similar) for the continuous variables:
menthlth_days, physhlth_days,
sleep_hrs, and age. Comment on the direction
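and strength of each pairwise correlation with the outcome. Age
(r = −0.156) and sleep (r = −0.140) are weakly and negatively correlated
with mentally unhealthy days: older respondents and those who sleep more
tend to report fewer mentally unhealthy days. Physically unhealthy days
are positively correlated with mentally unhealthy days (r = 0.135):
respondents with more days of poor physical health tend to report more
days of poor mental health. A correlation coefficient is a unitless
measure of the strength of a linear association, not a slope, so
r = −0.156 does not mean that mentally unhealthy days fall by 0.156 for
each additional year of age.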
# Pairs plot of continuous predictors vs outcome
brfss_mlr %>%
select(menthlth_days, physhlth_days, sleep_hrs, age) %>%
rename(
`Mental Health\nDays` = menthlth_days,
`Physical Health\nDays` = physhlth_days,
`Sleep\n(hrs)` = sleep_hrs,
Age = age
) %>%
ggpairs(
lower = list(continuous = wrap("points", alpha = 0.05, size = 0.5)),
diag = list(continuous = wrap("densityDiag", fill = "blue", alpha = 0.5)),
upper = list(continuous = wrap("cor", size = 4)),
title = "Pairwise Relationships Among Key Variables (BRFSS 2020)"
) +
theme_minimal(base_size = 11)
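The sleep correlation can be cross-checked against Task 2: in a simple linear regression, \(R^2 = r^2\), and the slope equals \(r \times s_y / s_x\). The sketch below plugs in numbers reported elsewhere in this lab (R² = 0.0197 and slope −0.80424 from the sleep-only model; SDs of 7.7 and 1.3 from Table 1). Because the Table 1 SDs are rounded to one decimal, the reconstructed slope only approximately matches.

```r
# Values reported in this lab
r2_sleep    <- 0.0197     # Multiple R-squared, menthlth_days ~ sleep_hrs
b1_sleep    <- -0.80424   # fitted slope for sleep_hrs
sd_menthlth <- 7.7        # SD of menthlth_days (Table 1, rounded)
sd_sleep    <- 1.3        # SD of sleep_hrs (Table 1, rounded)

# In simple regression, r has the sign of the slope and |r| = sqrt(R^2)
r_sleep <- sign(b1_sleep) * sqrt(r2_sleep)

# Slope implied by the correlation: b1 = r * (s_y / s_x)
b1_from_r <- r_sleep * sd_menthlth / sd_sleep

print(round(c(r = r_sleep, slope_from_r = b1_from_r), 3))
```

The recovered correlation of about −0.140 agrees with the pairs plot, and the implied slope (about −0.83) is close to the fitted −0.804, the gap coming from the rounded SDs. The same identity shows why r is not itself a slope: the slope is r rescaled by the ratio of the two SDs.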

Task 2: Unadjusted (Simple) Linear Regression (15 points)
2a. (5 pts) Fit a simple linear regression model
regressing menthlth_days on sleep_hrs
alone.
# Fit the simple linear regression (sleep only)
model_mlr <- lm(menthlth_days ~ sleep_hrs, data = brfss_mlr)
# Summary output
summary(model_mlr)
##
## Call:
## lm(formula = menthlth_days ~ sleep_hrs, data = brfss_mlr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.670 -3.845 -3.040 -0.040 31.785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.47429 0.57712 16.42 <2e-16 ***
## sleep_hrs -0.80424 0.08025 -10.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.642 on 4998 degrees of freedom
## Multiple R-squared: 0.0197, Adjusted R-squared: 0.0195
## F-statistic: 100.4 on 1 and 4998 DF, p-value: < 2.2e-16
#coefficient table
tidy(model_mlr, conf.int = TRUE) %>%
mutate(across(where(is.numeric), ~ round(., 4))) %>%
kable(
caption = "Simple Linear Regression: menthlth_days ~ sleep_hrs (BRFSS 2020)",
col.names = c("Term", "Estimate", "Std. Error", "t-statistic",
"p-value", "95% CI Lower", "95% CI Upper"),
align = "lrrrrrr"
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(0, bold = TRUE)
Simple Linear Regression: menthlth_days ~ sleep_hrs (BRFSS 2020)
| Term | Estimate | Std. Error | t-statistic | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
| (Intercept) | 9.4743 | 0.5771 | 16.4165 | 0 | 8.3429 | 10.6057 |
| sleep_hrs | -0.8042 | 0.0802 | -10.0218 | 0 | -0.9616 | -0.6469 |
Write out the fitted regression equation.
\[
\widehat{\text{Mentally unhealthy days}} = 9.4743 - 0.8042 \times \text{Sleep hours}
\]
2b. (5 pts) Interpret the slope for sleep in a
single sentence appropriate for a public health audience (no statistical
jargon). Slope (\(\hat{\beta}_1 = -0.8042\)): On average, each additional
hour of sleep per night was associated with about 0.8 fewer mentally
unhealthy days in the past month.
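A quick way to see what the slope means is to plug two sleep durations into the fitted equation from 2a. The 6- and 8-hour inputs below are hypothetical values chosen for illustration; the difference between the two predictions is always 2 × the slope, whatever the starting value.

```r
# Coefficients from the simple model (menthlth_days ~ sleep_hrs)
b0 <- 9.4743
b1 <- -0.8042

# Predicted mentally unhealthy days for a given number of sleep hours
predict_days <- function(sleep_hrs) b0 + b1 * sleep_hrs

pred_6 <- predict_days(6)   # hypothetical: 6 hours per night
pred_8 <- predict_days(8)   # hypothetical: 8 hours per night
print(round(c(six_hours = pred_6, eight_hours = pred_8,
              difference = pred_8 - pred_6), 2))
```

Two extra hours of sleep correspond to about 1.6 fewer predicted mentally unhealthy days (about 4.65 vs. 3.04).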
2c. (5 pts) State the null and alternative
hypotheses for the slope, report the t-statistic and p-value, and state
your conclusion. What are the degrees of freedom for this test?
- \(H_0: \beta_1 = 0\): there is no linear relationship between sleep hours and mentally unhealthy days.
- \(H_A: \beta_1 \neq 0\): there is a linear relationship between sleep hours and mentally unhealthy days.
- t-statistic: −10.02; p-value: \(p < 2 \times 10^{-16}\).
- Degrees of freedom: n − 2 = 5000 − 2 = 4998.

Because the p-value is less than 0.05, we reject the null hypothesis.
Sleep hours are associated with the number of mentally unhealthy days:
people who sleep more tend to report fewer mentally unhealthy days,
about 0.8 fewer days for each additional hour of sleep on average.
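The reported t-statistic, degrees of freedom, and p-value can be reproduced from the coefficient table with base R's `pt()`. The sketch uses only the estimate and standard error printed by `summary(model_mlr)`.

```r
# Reported estimate and standard error for sleep_hrs (sleep-only model)
estimate <- -0.80424
std_err  <- 0.08025
n        <- 5000
k        <- 1                    # number of predictors in the simple model

t_stat <- estimate / std_err     # t = estimate / SE
df     <- n - k - 1              # residual degrees of freedom = n - 2
p_val  <- 2 * pt(-abs(t_stat), df = df)   # two-sided p-value

print(c(t = round(t_stat, 2), df = df))
print(p_val < 2.2e-16)           # summary() displays p-values this small as <2e-16
```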
Task 3: Building the Multivariable Model (25 points)
3a. (5 pts) Fit three models:
- Model A: menthlth_days ~ sleep_hrs
- Model B: menthlth_days ~ sleep_hrs + age + sex
- Model C: menthlth_days ~ sleep_hrs + age + sex + physhlth_days + income_cat + exercise
# Model A (m1): unadjusted
m1 <- lm(menthlth_days ~ sleep_hrs, data = brfss_mlr)
# Model B (m2): add age and sex
m2 <- lm(menthlth_days ~ sleep_hrs + age + sex, data = brfss_mlr)
# Model C (m3): full multivariable model
m3 <- lm(menthlth_days ~ sleep_hrs + age + sex + physhlth_days + income_cat + exercise,
data = brfss_mlr)
summary(m3)
##
## Call:
## lm(formula = menthlth_days ~ sleep_hrs + age + sex + physhlth_days +
## income_cat + exercise, data = brfss_mlr)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.9192 -3.4262 -1.7803 0.2948 30.0568
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.475489 0.716959 17.401 < 2e-16 ***
## sleep_hrs -0.509160 0.075348 -6.757 1.57e-11 ***
## age -0.082307 0.005933 -13.872 < 2e-16 ***
## sexFemale 1.245053 0.202333 6.153 8.17e-10 ***
## physhlth_days 0.291657 0.013579 21.478 < 2e-16 ***
## income_cat -0.321323 0.052012 -6.178 7.02e-10 ***
## exerciseYes -0.342685 0.253138 -1.354 0.176
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.09 on 4993 degrees of freedom
## Multiple R-squared: 0.1569, Adjusted R-squared: 0.1559
## F-statistic: 154.9 on 6 and 4993 DF, p-value: < 2.2e-16
3b. (10 pts) Create a table comparing the sleep
coefficient (\(\hat{\beta}\), SE, 95%
CI, p-value) across Models A, B, and C. Does the sleep coefficient
change substantially when you add more covariates? What does this
suggest about confounding? Yes. The sleep coefficient shrinks in
magnitude from −0.804 in Model A to −0.734 in Model B and −0.509 in
Model C (Table 4), a reduction of about 37% overall, with most of the
change occurring when physically unhealthy days, income, and exercise are
added. A change this large, well above the commonly used 10%
change-in-estimate rule of thumb, suggests meaningful confounding:
factors such as age, income, and physical health partly account for the
crude association between sleep and mentally unhealthy days. However, the
adjusted association remains statistically significant (95% CI: −0.657
to −0.361), indicating that sleep is still associated with mentally
unhealthy days after accounting for these covariates.
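The size of the attenuation can be expressed with the change-in-estimate approach, which compares each adjusted coefficient with the crude one; the 10% cutoff often paired with it is a rule of thumb, not a formal test. The sketch uses the rounded sleep coefficients from Table 4.

```r
# Sleep coefficients reported in Table 4 (Models A, B, C)
b_sleep <- c(A = -0.804, B = -0.734, C = -0.509)

# Percent change in the sleep estimate relative to the unadjusted Model A;
# here, negative values mean the estimate moved toward zero
pct_change <- 100 * (b_sleep - b_sleep["A"]) / b_sleep["A"]
print(round(pct_change, 1))
```

Adding age and sex moves the estimate by under 10%, while the full adjustment in Model C moves it by roughly 37%.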
tidy(m3, conf.int = TRUE) %>%
mutate(
term = dplyr::recode(term,
"(Intercept)" = "Intercept",
"sleep_hrs" = "Sleep (hours/night)",
"age" = "Age (years)",
"physhlth_days" = "Physically unhealthy days",
"income_cat" = "Income (ordinal 1-8)",
"sexFemale" = "Sex: Female (ref = Male)",
"exerciseYes" = "Exercise: Yes (ref = No)"
),
across(where(is.numeric), ~ round(., 4))
) %>%
kable(
caption = "Table 3. Multiple Linear Regression: Mentally Unhealthy Days ~ Multiple Predictors (BRFSS 2020, n = 5,000)",
col.names = c("Term", "Estimate (β̂)", "Std. Error", "t-statistic",
"p-value", "95% CI Lower", "95% CI Upper")
) %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(0, bold = TRUE) %>%
row_spec(c(2, 3), background = "#EBF5FB") # highlight key predictors
Table 3. Multiple Linear Regression: Mentally Unhealthy Days ~ Multiple Predictors (BRFSS 2020, n = 5,000)

| Term | Estimate (β̂) | Std. Error | t-statistic | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
| Intercept | 12.4755 | 0.7170 | 17.4006 | 0.0000 | 11.0699 | 13.8810 |
| Sleep (hours/night) | -0.5092 | 0.0753 | -6.7574 | 0.0000 | -0.6569 | -0.3614 |
| Age (years) | -0.0823 | 0.0059 | -13.8724 | 0.0000 | -0.0939 | -0.0707 |
| Sex: Female (ref = Male) | 1.2451 | 0.2023 | 6.1535 | 0.0000 | 0.8484 | 1.6417 |
| Physically unhealthy days | 0.2917 | 0.0136 | 21.4779 | 0.0000 | 0.2650 | 0.3183 |
| Income (ordinal 1-8) | -0.3213 | 0.0520 | -6.1778 | 0.0000 | -0.4233 | -0.2194 |
| Exercise: Yes (ref = No) | -0.3427 | 0.2531 | -1.3537 | 0.1759 | -0.8389 | 0.1536 |
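Table 3 reports Model C only, while 3b asks for the sleep coefficient, SE, 95% CI, and p-value across all three models. A minimal base-R sketch of that comparison follows. To keep it self-contained it fits the same three formulas to hypothetical simulated data that reuse the lab's variable names; the simulated coefficients are not BRFSS results. In the lab session, the `models` list would instead hold `m1`, `m2`, and `m3`.

```r
# Hypothetical simulated data with the lab's variable names (NOT BRFSS data),
# used only so the sketch runs on its own
set.seed(553)
n <- 500
sim <- data.frame(
  sleep_hrs     = round(rnorm(n, mean = 7, sd = 1.3), 1),
  age           = sample(18:80, n, replace = TRUE),
  sex           = factor(sample(c("Male", "Female"), n, replace = TRUE),
                         levels = c("Male", "Female")),
  physhlth_days = sample(0:30, n, replace = TRUE),
  income_cat    = sample(1:8, n, replace = TRUE),
  exercise      = factor(sample(c("No", "Yes"), n, replace = TRUE),
                         levels = c("No", "Yes"))
)
# Outcome kept inside the 0-30 range, like menthlth_days
sim$menthlth_days <- pmin(30, pmax(0, round(
  10 - 0.5 * sim$sleep_hrs - 0.08 * sim$age + 0.3 * sim$physhlth_days +
    rnorm(n, sd = 6))))

# In the lab: models <- list("Model A" = m1, "Model B" = m2, "Model C" = m3)
models <- list(
  "Model A" = lm(menthlth_days ~ sleep_hrs, data = sim),
  "Model B" = lm(menthlth_days ~ sleep_hrs + age + sex, data = sim),
  "Model C" = lm(menthlth_days ~ sleep_hrs + age + sex + physhlth_days +
                   income_cat + exercise, data = sim)
)

# One row per model: the sleep_hrs estimate, SE, 95% CI, and p-value
sleep_tab <- do.call(rbind, lapply(names(models), function(nm) {
  fit <- models[[nm]]
  est <- coef(summary(fit))["sleep_hrs", ]   # Estimate, Std. Error, t, p
  ci  <- confint(fit)["sleep_hrs", ]         # 2.5 % and 97.5 % limits
  data.frame(Model    = nm,
             Estimate = unname(est["Estimate"]),
             SE       = unname(est["Std. Error"]),
             CI_lower = unname(ci[1]),
             CI_upper = unname(ci[2]),
             p_value  = unname(est["Pr(>|t|)"]))
}))
print(sleep_tab, digits = 3)
```

With the real models, `sleep_tab` holds exactly the columns 3b asks for and can be passed to `kable()` like the other tables.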
3c. (10 pts) For Model C, write out
the full fitted regression equation and interpret every
coefficient in plain language appropriate for a public health report.
\[
\widehat{\text{Mentally unhealthy days}} = 12.48 - 0.51 \times \text{Sleep hours} - 0.08 \times \text{Age} + 1.25 \times \text{Female} + 0.29 \times \text{Physically unhealthy days} - 0.32 \times \text{Income} - 0.34 \times \text{Exercise}
\]
where Female = 1 if female and 0 if male; Exercise = 1 if any physical
activity in the past 30 days and 0 otherwise; and Income is the ordinal
category from 1 (lowest) to 8 (highest).
- Intercept (\(\hat{\beta}_0\) = 12.48): the predicted number of mentally unhealthy days for a man aged 0 with 0 hours of sleep, no physically unhealthy days, income category 0, and no exercise. These values fall outside the observed data (sleep ranges from 1 to 14 hours and income from 1 to 8), so the intercept anchors the equation but has no meaningful real-world interpretation.
- Sleep hours (\(\hat{\beta}\) = -0.509): Each additional hour of sleep per night is associated with an estimated 0.509 fewer mentally unhealthy days on average (95% CI: -0.657 to -0.361), holding age, sex, physically unhealthy days, income category, and exercise constant. The negative sign indicates an inverse association: more sleep goes with fewer mentally unhealthy days.
- Age (\(\hat{\beta}\) = -0.082): Each additional year of age is associated with 0.082 fewer mentally unhealthy days on average (95% CI: -0.094 to -0.071), holding all other variables constant. This pattern is well documented; older adults often report fewer mental health difficulties, possibly due to better emotion regulation, survivor bias, or cohort effects.
- Sex: Female (\(\hat{\beta}\) = 1.245): Compared with males (the reference group), females report an estimated 1.25 more mentally unhealthy days on average (95% CI: 0.85 to 1.64), holding all other variables constant.
- Physically unhealthy days (\(\hat{\beta}\) = 0.292): Each additional day of poor physical health is associated with an estimated 0.292 more mentally unhealthy days on average (95% CI: 0.27 to 0.32), holding all other variables constant.
- Income category (\(\hat{\beta}\) = -0.321): Each one-step increase on the 1–8 income scale is associated with 0.321 fewer mentally unhealthy days on average (95% CI: -0.42 to -0.22), holding all other variables constant, consistent with the well-established socioeconomic gradient in mental health.
- Exercise (\(\hat{\beta}\) = -0.343): People who engaged in any physical activity in the past 30 days report an estimated 0.343 fewer mentally unhealthy days than those who did not, holding all other variables constant. However, the 95% CI (-0.84 to 0.15) includes zero (p = 0.176), so the data are consistent with no difference once the other factors are accounted for.
The model suggests that sleep duration, age, sex, physical health,
and income are statistically significant predictors of mentally
unhealthy days, while exercise is not after adjustment. Individuals who
sleep more, are older, and have higher income tend to report fewer
mentally unhealthy days, while women and individuals experiencing more
physical health problems report more mentally unhealthy days.
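Plugging a covariate profile into the Model C equation shows what "holding all other variables constant" means in practice. The profile below (a 45-year-old who sleeps 7 hours, has 2 physically unhealthy days, is in income category 6, and exercises) is hypothetical; the coefficients are copied from `summary(m3)`, and `predict_m3` is an illustrative helper, not part of the lab code.

```r
# Model C coefficients from summary(m3)
b <- c(intercept = 12.475489, sleep_hrs = -0.509160, age = -0.082307,
       female = 1.245053, physhlth_days = 0.291657, income_cat = -0.321323,
       exercise_yes = -0.342685)

# Predicted mentally unhealthy days for one covariate profile
predict_m3 <- function(sleep_hrs, age, female, physhlth_days, income_cat, exercise_yes) {
  unname(b["intercept"] + b["sleep_hrs"] * sleep_hrs + b["age"] * age +
         b["female"] * female + b["physhlth_days"] * physhlth_days +
         b["income_cat"] * income_cat + b["exercise_yes"] * exercise_yes)
}

# Hypothetical profile, varying only sex
pred_female <- predict_m3(sleep_hrs = 7, age = 45, female = 1, physhlth_days = 2,
                          income_cat = 6, exercise_yes = 1)
pred_male   <- predict_m3(sleep_hrs = 7, age = 45, female = 0, physhlth_days = 2,
                          income_cat = 6, exercise_yes = 1)
print(round(c(female = pred_female, male = pred_male,
              difference = pred_female - pred_male), 2))
```

Whatever values the other covariates take, the female-male difference stays equal to the sex coefficient (about 1.25 days), because Model C is additive with no interaction terms.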
Task 4: Model Fit and ANOVA (20 points)
4a. (5 pts) Report \(R^2\) and Adjusted \(R^2\) for each of the three models (A, B,
C). Create a table. Which model explains the most variance in mental
health days? Model C has the highest \(R^2\) (0.157) and adjusted
\(R^2\) (0.156), meaning it explains about 15.6% of the variability in
mentally unhealthy days, compared with adjusted \(R^2\) values of 0.020
for Model A and 0.050 for Model B (Table 4). Model C therefore explains
the most variance and provides the best overall fit of the three models,
although most of the variability (about 84%) remains unexplained.
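Adjusted R² penalizes R² for the number of predictors, \(R^2_{adj} = 1 - (1 - R^2)\frac{n-1}{n-p-1}\). The sketch checks this formula against the values R reported for Models A and C; Model B's unadjusted R² is not printed in this lab, so it is left out.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 5000
adj_A <- adj_r2(0.0197, n, p = 1)   # Model A: reported adjusted R^2 = 0.0195
adj_C <- adj_r2(0.1569, n, p = 6)   # Model C: reported adjusted R^2 = 0.1559
print(round(c(A = adj_A, C = adj_C), 4))
```

With n = 5,000 the penalty is tiny, so R² and adjusted R² nearly coincide; the adjustment matters more in small samples or with many predictors.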
# Compare the sleep_hrs coefficient across Models A, B, and C
tribble(
~Model, ~`Sleep β̂`, ~`95% CI`, ~`Adj. R²`,
"Model A (unadjusted)", round(coef(m1)[["sleep_hrs"]], 3),
paste0("(", round(confint(m1)["sleep_hrs", 1], 3), ", ", round(confint(m1)["sleep_hrs", 2], 3), ")"),
round(summary(m1)$adj.r.squared, 3),
"Model B (+ age, sex)", round(coef(m2)[["sleep_hrs"]], 3),
paste0("(", round(confint(m2)["sleep_hrs", 1], 3), ", ", round(confint(m2)["sleep_hrs", 2], 3), ")"),
round(summary(m2)$adj.r.squared, 3),
"Model C (full)", round(coef(m3)[["sleep_hrs"]], 3),
paste0("(", round(confint(m3)["sleep_hrs", 1], 3), ", ", round(confint(m3)["sleep_hrs", 2], 3), ")"),
round(summary(m3)$adj.r.squared, 3)
) %>%
kable(caption = "Table 4. Sleep Coefficient Across Sequential Models") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(0, bold = TRUE)
Table 4. Sleep Coefficient Across Sequential Models

| Model | Sleep β̂ | 95% CI | Adj. R² |
|---|---|---|---|
| Model A (unadjusted) | -0.804 | (-0.962, -0.647) | 0.020 |
| Model B (+ age, sex) | -0.734 | (-0.889, -0.578) | 0.050 |
| Model C (full) | -0.509 | (-0.657, -0.361) | 0.156 |
4b. (5 pts) What is the Root MSE for Model C?
Interpret it in practical terms — what does it tell you about prediction
accuracy?
anova_m3 <- anova(m3)
print(anova_m3)
## Analysis of Variance Table
##
## Response: menthlth_days
## Df Sum Sq Mean Sq F value Pr(>F)
## sleep_hrs 1 5865 5864.8 116.6678 < 2.2e-16 ***
## age 1 6182 6182.2 122.9832 < 2.2e-16 ***
## sex 1 2947 2947.1 58.6266 2.274e-14 ***
## physhlth_days 1 29456 29455.5 585.9585 < 2.2e-16 ***
## income_cat 1 2177 2176.8 43.3031 5.169e-11 ***
## exercise 1 92 92.1 1.8326 0.1759
## Residuals 4993 250993 50.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
glance(m3) %>%
select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) %>%
mutate(across(where(is.numeric), ~ round(., 4))) %>%
pivot_longer(everything(), names_to = "Statistic", values_to = "Value") %>%
mutate(Statistic = dplyr::recode(Statistic,
"r.squared" = "R²",
"adj.r.squared" = "Adjusted R²",
"sigma" = "Residual Std. Error (Root MSE)",
"statistic" = "F-statistic",
"p.value" = "p-value (overall F-test)",
"df" = "Model df (p)",
"df.residual" = "Residual df (n − p − 1)",
"nobs" = "n (observations)"
)) %>%
kable(caption = "Table 6. Overall Model Summary — Model C") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 6. Overall Model Summary — Model C
| Statistic | Value |
|---|---|
| R² | 0.1569 |
| Adjusted R² | 0.1559 |
| Residual Std. Error (Root MSE) | 7.0901 |
| F-statistic | 154.8953 |
| p-value (overall F-test) | 0.0000 |
| Model df (p) | 6 |
| Residual df (n − p − 1) | 4993 |
| n (observations) | 5000 |
4c. (10 pts) Using the ANOVA output for Model C,
fill in the following table manually (i.e., compute the values using the
output from anova() or glance()):
| Source | df | Sum of Squares | Mean Square | F |
|---|---|---|---|---|
| Model | 6 | 46719 | 7786.50 | 154.90 |
| Residual | 4993 | 250993 | 50.27 | |
| Total | 4999 | 297712 | | |
State the null hypothesis for the overall F-test and your conclusion.
The null hypothesis is that all six slope coefficients are zero,
\(H_0: \beta_1 = \beta_2 = \cdots = \beta_6 = 0\), meaning none of the
predictors is linearly associated with mentally unhealthy days; the
alternative is that at least one \(\beta_j \neq 0\). The overall F-test is
statistically significant (\(F = 154.9\) on 6 and 4993 df,
\(p < 2.2 \times 10^{-16}\)), so we reject the null hypothesis and
conclude that at least one predictor in the model is associated with
mentally unhealthy days.
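The entries in the table follow from the sequential sums of squares printed by `anova(m3)`: the model SS is their sum, each mean square is SS/df, F is the ratio of the model and residual mean squares, and the p-value comes from the F distribution with 6 and 4993 df. A base-R sketch:

```r
# Sequential sums of squares from anova(m3), in model order
ss_terms <- c(sleep_hrs = 5865, age = 6182, sex = 2947,
              physhlth_days = 29456, income_cat = 2177, exercise = 92)
ss_resid <- 250993
df_model <- length(ss_terms)      # 6 predictors
df_resid <- 4993

ss_model <- sum(ss_terms)         # 46719
ss_total <- ss_model + ss_resid   # 297712
ms_model <- ss_model / df_model   # 7786.5
ms_resid <- ss_resid / df_resid   # about 50.27
f_stat   <- ms_model / ms_resid   # about 154.9
p_val    <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

print(round(c(SS_model = ss_model, SS_total = ss_total, MS_model = ms_model,
              MS_resid = ms_resid, F = f_stat), 2))
```

Because the printed sums of squares are rounded to whole numbers, this F differs from `glance(m3)`'s 154.8953 only in the later decimals.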
Task 5: Residual Diagnostics (15 points)
5a. (5 pts) For Model C, produce the four standard
diagnostic plots (Residuals vs. Fitted, Normal Q-Q, Scale-Location,
Cook’s Distance). Comment on what each plot tells you about the LINE
assumptions.
- Residuals vs. Fitted (linearity): If linearity holds, the red smoothing line should be roughly flat and stay near 0. Here the residuals show a downward trend, a slight violation of linearity, so the linearity and equal variance assumptions may not be fully satisfied.
- Normal Q-Q (normality of residuals): The points should follow the dashed diagonal line. They track the line through the middle of the distribution but deviate from it in the upper tail, indicating right-skewed residuals, so the residuals are not normally distributed.
- Scale-Location (equal variance): If the spread of the residuals is constant across fitted values, the red line should be flat. Here it rises with the fitted values, indicating heteroscedasticity, so the equal variance assumption is violated.
- Cook’s Distance (influence): Most observations should have Cook’s distance values close to zero, and that is the case here; only a few labeled points stand out from the rest. These are potentially influential observations, but they do not appear to severely distort the model.
- Independence cannot be assessed from these plots; it rests on the study design, here a random sample of respondents.
par(mfrow = c(2, 2))
plot(m3, which = 1:4, col = adjustcolor("blue", alpha.f = 0.3), pch = 16)

5b. (5 pts) Given the nature of the outcome (mental
health days, bounded at 0 and 30, heavily right-skewed), which
assumptions are most likely to be violated? Does this invalidate the
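analysis? Explain. The assumptions most likely to be violated are
normality of the residuals and constant variance (homoscedasticity),
because the outcome is a bounded, right-skewed count: most respondents
report few or no mentally unhealthy days, while some report values up to
30. This does not invalidate the analysis. Least-squares coefficient
estimates do not rely on normally distributed errors to describe the
average linear association, and with n = 5,000 the central limit theorem
makes the sampling distribution of each coefficient approximately normal,
so the t-tests and confidence intervals remain approximately valid.
Non-constant variance mainly makes the standard errors less reliable, so
p-values and confidence intervals should be interpreted with some
caution.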
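The large-sample argument can be illustrated with a small Monte Carlo simulation: draw many samples whose errors are strongly right-skewed, compute the least-squares slope in each, and look at the distribution of the estimates. Everything below (true slope −0.5, shifted exponential errors, n = 500 per sample) is a hypothetical setup, not the BRFSS data; for a single predictor the OLS slope equals `cov(x, y) / var(x)`, which keeps the loop fast.

```r
set.seed(2020)
true_slope <- -0.5   # hypothetical true slope
n_obs  <- 500        # observations per simulated sample
n_reps <- 2000       # number of simulated samples

slopes <- replicate(n_reps, {
  x   <- rnorm(n_obs, mean = 7, sd = 1.3)   # sleep-like predictor
  err <- 5 * (rexp(n_obs, rate = 1) - 1)    # right-skewed errors with mean 0
  y   <- 9 + true_slope * x + err
  cov(x, y) / var(x)                        # OLS slope for one predictor
})

# Moment-based skewness of the simulated slope estimates
dev <- slopes - mean(slopes)
skew_slopes <- mean(dev^3) / mean(dev^2)^1.5

print(round(c(mean_slope = mean(slopes), sd_slope = sd(slopes),
              skewness = skew_slopes), 3))
```

The average estimate sits close to the true −0.5 and the skewness of the estimates is near 0, even though the errors themselves have skewness 2. Skewed errors alone neither bias the slope nor destroy its approximate normality at this sample size; this simple setup does not include non-constant variance, which is the part that still calls for caution.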
5c. (5 pts) Identify any observations with Cook’s
Distance > 1. How many are there? What would you do with them in a
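real analysis? None: no observation has a Cook’s distance greater than 1
(count = 0). The plot labels the three most extreme points (observations
830, 1439, and 1632), but their Cook’s distances are far below the
threshold of 1, so none of them is strongly influential on the regression
model. In a real analysis, I would keep these observations, check them
for data-entry errors, and, if concerned, run a sensitivity analysis
refitting the model without them; they should not be removed simply
because they stand out.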
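Counting observations above a Cook's distance cutoff takes one line with base R's `cooks.distance()`. Because the BRFSS file is not reproduced here, the sketch uses R's built-in `mtcars` data with a hypothetical stand-in model; in the lab the same lines would be run on `m3`. The 4/n cutoff shown alongside the cutoff of 1 is a common, more sensitive screening rule.

```r
# Stand-in model on a built-in dataset (in the lab, use fit <- m3)
fit <- lm(mpg ~ wt + hp, data = mtcars)

cooks_d   <- cooks.distance(fit)          # one value per observation
n_gt_1    <- sum(cooks_d > 1)             # count above the cutoff of 1
cutoff_4n <- 4 / nobs(fit)                # 4/n screening cutoff
flagged   <- sort(cooks_d[cooks_d > cutoff_4n], decreasing = TRUE)

print(n_gt_1)
print(round(flagged, 3))                  # names identify the flagged rows
```

With n = 5,000 the 4/n cutoff is 0.0008, so it flags far more observations than the cutoff of 1; flagged points are candidates for checking, not automatic deletion.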
Task 6: Interpretation for a Public Health Audience (10 points)
Suppose you are writing the results section of a public health paper.
Write a 3–4 sentence paragraph summarizing the findings
from Model C for a non-statistical audience. Your paragraph should:
- Identify which predictors were significantly associated with mental
health days
- State the direction and approximate magnitude of the most important
associations
- Appropriately caveat the cross-sectional nature of the data (no
causal language)
- Not use any statistical jargon (no “significant,” “coefficient,”
“p-value”)
In this survey of 5,000 adults (Table 3), people who slept more, were
older, or had higher household incomes tended to report fewer mentally
unhealthy days in the past month, while women and people with more days
of poor physical health tended to report more. The strongest
relationship was with physical health: each additional day of poor
physical health went along with about 0.3 more mentally unhealthy days,
and women reported about 1.2 more such days per month than men. Each
extra hour of nightly sleep was linked with about half a day fewer of
poor mental health per month, and each step up in income with about 0.3
fewer days, while any recent physical activity showed no clear link once
the other factors were taken into account. Because all of the
information was collected at a single point in time, these results show
which factors tend to occur together with more or fewer mentally
unhealthy days, not what causes them.
End of Lab Activity