knitr::opts_chunk$set(
echo = TRUE,
warning = FALSE,
message = FALSE,
fig.width = 10,
fig.height = 6,
cache = FALSE
)In the previous lectures, we used Simple Linear Regression (SLR) to model a continuous outcome as a function of a single predictor. But real-world health phenomena are never driven by one variable alone. A person’s mental health is shaped by how much they sleep, how physically ill they are, how old they are, their income, their sex, and much more — simultaneously.
Multiple Linear Regression (MLR) extends SLR to accommodate this complexity. It allows us to:
In epidemiology, MLR is our primary tool for multivariable analysis of continuous outcomes.
library(tidyverse)
library(haven)
library(janitor)
library(knitr)
library(kableExtra)
library(plotly)
library(broom)
library(ggeffects)
library(gtsummary)
library(GGally)
library(car)
library(lmtest)
library(corrplot)EPI 553 — Multiple Linear Regression Lab Due: End of class, March 10, 2026
In this lab, you will build and interpret multiple linear regression models using the BRFSS 2020 analytic dataset. Work through each task systematically. You may discuss concepts with classmates, but your written answers and R code must be your own.
Submission: Knit your .Rmd to HTML and upload to Brightspace by end of class.
Use the saved analytic dataset from today’s lecture. It contains 5,000 randomly sampled BRFSS 2020 respondents with the following variables:
| Variable | Description | Type |
|---|---|---|
menthlth_days |
Mentally unhealthy days in past 30 | Continuous (0–30) |
physhlth_days |
Physically unhealthy days in past 30 | Continuous (0–30) |
sleep_hrs |
Sleep hours per night | Continuous (1–14) |
age |
Age in years (capped at 80) | Continuous |
income_cat |
Household income (1 = <$10k to 8 = >$75k) | Ordinal |
sex |
Sex (Male/Female) | Factor |
exercise |
Any physical activity past 30 days (Yes/No) | Factor |
# Load the dataset
library(tidyverse)
library(broom)
library(knitr)
library(kableExtra)
library(gtsummary)
library(ggeffects)
brfss_mlr_new <- readRDS(
"/Users/morganwheat/Epi_553/data/LLCP2020.RDS copy"
)# a. (5 pts)** Create a descriptive statistics table using `tbl_summary()` that includes all variables in the dataset. Include means (SD) for continuous variables and n (%) for categorical variables.
brfss_mlr_new %>%
select(menthlth_days, physhlth_days, sleep_hrs, age,
income_f, sex, exercise) %>%
tbl_summary(
label = list(
menthlth_days ~ "Mentally unhealthy days (past 30)",
physhlth_days ~ "Physically unhealthy days (past 30)",
sleep_hrs ~ "Sleep (hours/night)",
age ~ "Age (years)",
income_f ~ "Household income",
sex ~ "Sex",
exercise ~ "Any physical activity (past 30 days)"
),
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
digits = all_continuous() ~ 1,
missing = "no"
) %>%
add_n() %>%
# add_overall() %>%
bold_labels() %>%
#italicize_levels() %>%
modify_caption("**Table 1. Descriptive Statistics — BRFSS 2020 Analytic Sample (n = 5,000)**")| Characteristic | N | N = 5,0001 |
|---|---|---|
| Mentally unhealthy days (past 30) | 5,000 | 3.8 (7.7) |
| Physically unhealthy days (past 30) | 5,000 | 3.3 (7.8) |
| Sleep (hours/night) | 5,000 | 7.1 (1.3) |
| Age (years) | 5,000 | 54.3 (17.2) |
| Household income | 5,000 | |
| <$10k | 190 (3.8%) | |
| $10-15k | 169 (3.4%) | |
| $15-20k | 312 (6.2%) | |
| $20-25k | 434 (8.7%) | |
| $25-35k | 489 (9.8%) | |
| $35-50k | 683 (14%) | |
| $50-75k | 841 (17%) | |
| >$75k | 1,882 (38%) | |
| Sex | 5,000 | |
| Male | 2,331 (47%) | |
| Female | 2,669 (53%) | |
| Any physical activity (past 30 days) | 5,000 | 3,874 (77%) |
| 1 Mean (SD); n (%) | ||
# b. (5 pts)** Create a histogram of `menthlth_days`. Describe the shape of the distribution. Is it symmetric, right-skewed, or left-skewed? What are the implications of this shape for regression modeling?
p_hist <- ggplot(brfss_mlr_new, aes(x = menthlth_days)) +
geom_histogram(binwidth = 1, fill = "steelblue", color = "white", alpha = 0.85) +
labs(
title = "Distribution of Mentally Unhealthy Days in the Past 30 Days",
subtitle = "BRFSS 2020 Analytic Sample (n = 5,000)",
x = "Number of Mentally Unhealthy Days",
y = "Count"
) +
theme_minimal(base_size = 13)
ggplotly(p_hist)# c. (5 pts)** Create a scatterplot matrix (using `GGally::ggpairs()` or similar) for the continuous variables: `menthlth_days`, `physhlth_days`, `sleep_hrs`, and `age`. Comment on the direction and strength of each pairwise correlation with the outcome.
# Pairs plot of continuous predictors vs outcome
brfss_mlr_new %>%
select(menthlth_days, physhlth_days, sleep_hrs, age) %>%
rename(
`Mental Health\nDays` = menthlth_days,
`Physical Health\nDays` = physhlth_days,
`Sleep\n(hrs)` = sleep_hrs,
Age = age
) %>%
ggpairs(
lower = list(continuous = wrap("points", alpha = 0.05, size = 0.5)),
diag = list(continuous = wrap("densityDiag", fill = "steelblue", alpha = 0.5)),
upper = list(continuous = wrap("cor", size = 4)),
title = "Pairwise Relationships Among Key Variables (BRFSS 2020)"
) +
theme_minimal(base_size = 11)ANSWERED QUESTIONS
The created histogram displays that the outcome (Number of Mentally Unhealthy Days in the past 30 days) is highly right-skewed. This may affect regression modeling because it can lead to non-normally distributed residuals, potentially violating the normality assumption required for valid inference in multiple linear regression.
Continuing to use the number of mentally unhealthy days in the past 30 days (menthlth_days) as the outcome, there is a weak positive correlation (r = 0.315) with the number of physically unhealthy days in the past 30 days, a very weak negative correlation (r = -0.140) with sleep hours per night, and a very weak negative correlation (r = -0.156) with age.
# a. (5 pts)** Fit a simple linear regression model regressing `menthlth_days` on `sleep_hrs` alone. Write out the fitted regression equation.
# Model 1: Unadjusted
mo1 <- lm(menthlth_days ~ sleep_hrs, data = brfss_mlr_new)
summary(mo1)##
## Call:
## lm(formula = menthlth_days ~ sleep_hrs, data = brfss_mlr_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.670 -3.845 -3.040 -0.040 31.785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.47429 0.57712 16.42 <2e-16 ***
## sleep_hrs -0.80424 0.08025 -10.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.642 on 4998 degrees of freedom
## Multiple R-squared: 0.0197, Adjusted R-squared: 0.0195
## F-statistic: 100.4 on 1 and 4998 DF, p-value: < 2.2e-16
#b. (5 pts)** Interpret the slope for sleep in a single sentence appropriate for a public health audience (no statistical jargon).]
#c. (5 pts)** State the null and alternative hypotheses for the slope, report the t-statistic and p-value, and state your conclusion. What is the degree of freedom for this test?
slope_test <- tidy(mo1, conf.int = TRUE) %>%
filter(term == "sleep_hrs")
tibble(
Quantity = c("Slope (b1)", "t-statistic", "p-value"),
Value = c(
round(slope_test$estimate, 4),
round(slope_test$statistic, 3),
format.pval(slope_test$p.value, digits = 3)
)
) %>%
kable(caption = "t-Test for the Slope (H₀: β₁ = 0)") %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Quantity | Value |
|---|---|
| Slope (b1) | -0.8042 |
| t-statistic | -10.022 |
| p-value | <2e-16 |
ANSWERED QUESTIONS
\(\hat{Mentally Unhealthy Days} = 9.47429 + -0.80424 (Sleep(hrs/night)\)
For every one-hour increase in sleep per night, the number of mentally unhealthy days in the past 30 days is predicted to decrease by about 1 day.
The null hypothesis ((H₀): β₁ = 0) is that there is no linear relationship between sleep hours per night and mentally unhealthy days in the past 30 day.The alternative hypothesis ((𝐻𝐴):𝛽1≠0) is that there is a linear relationship between sleep hours per night and mentally unhealthy days in the past 30 days. The the test statistic is (t = -10.02) and the p-value is (p = 2e-16). The numerator degree of freedom for this test is (df = 1), and the denominator degree of freedom is (df = 4998).
#a. (5 pts)** Fit three models:
# **Model A:** `menthlth_days ~ sleep_hrs`
# **Model B:** `menthlth_days ~ sleep_hrs + age + sex`
# **Model C:** `menthlth_days ~ sleep_hrs + age + sex + physhlth_days + income_cat + exercise`
# Model 1: `menthlth_days ~ sleep_hrs`
mo1 <- lm(menthlth_days ~ sleep_hrs, data = brfss_mlr_new)
# mo1 <- lm(menthlth_days ~ sleep_hrs -1, data = brfss_mlr_new)
# Model 2: Add age and sex
mo2 <- lm(menthlth_days ~ sleep_hrs + age + sex, data = brfss_mlr_new)
# Model 3: `menthlth_days ~ sleep_hrs + age + sex + physhlth_days + income_cat + exercise`
mo3 <- lm(menthlth_days ~ sleep_hrs + age + sex + physhlth_days + income_cat + exercise,
data = brfss_mlr_new)
summary(mo1)##
## Call:
## lm(formula = menthlth_days ~ sleep_hrs, data = brfss_mlr_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.670 -3.845 -3.040 -0.040 31.785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.47429 0.57712 16.42 <2e-16 ***
## sleep_hrs -0.80424 0.08025 -10.02 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.642 on 4998 degrees of freedom
## Multiple R-squared: 0.0197, Adjusted R-squared: 0.0195
## F-statistic: 100.4 on 1 and 4998 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = menthlth_days ~ sleep_hrs + age + sex, data = brfss_mlr_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.306 -3.980 -2.515 -0.307 32.360
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.76563 0.64447 18.256 < 2e-16 ***
## sleep_hrs -0.73391 0.07932 -9.253 < 2e-16 ***
## age -0.06648 0.00622 -10.688 < 2e-16 ***
## sexFemale 1.53992 0.21339 7.217 6.13e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.523 on 4996 degrees of freedom
## Multiple R-squared: 0.05036, Adjusted R-squared: 0.04979
## F-statistic: 88.32 on 3 and 4996 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = menthlth_days ~ sleep_hrs + age + sex + physhlth_days +
## income_cat + exercise, data = brfss_mlr_new)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.9192 -3.4262 -1.7803 0.2948 30.0568
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.475489 0.716959 17.401 < 2e-16 ***
## sleep_hrs -0.509160 0.075348 -6.757 1.57e-11 ***
## age -0.082307 0.005933 -13.872 < 2e-16 ***
## sexFemale 1.245053 0.202333 6.153 8.17e-10 ***
## physhlth_days 0.291657 0.013579 21.478 < 2e-16 ***
## income_cat -0.321323 0.052012 -6.178 7.02e-10 ***
## exerciseYes -0.342685 0.253138 -1.354 0.176
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.09 on 4993 degrees of freedom
## Multiple R-squared: 0.1569, Adjusted R-squared: 0.1559
## F-statistic: 154.9 on 6 and 4993 DF, p-value: < 2.2e-16
#b. (10 pts)** Create a table comparing the sleep coefficient ($\hat{\beta}$, SE, 95% CI, p-value) across Models A, B, and C. Does the sleep coefficient change substantially when you add more covariates? What does this suggest about confounding?
b <- round(coef(mo3), 3)
ci <- round(confint(mo3), 3)
# Compare the sleep_hrs coefficient across models
tribble(
~Model, ~`sleep_hrs β̂`, ~`95% CI`, ~`p-value`, ~`SE`,
"Mo1 (unadjusted)", round(coef(mo1)[2], 3),
paste0("(", round(confint(mo1)[2,1],3), ", ", round(confint(mo1)[2,2],3), ")"),
round(summary(mo1)$coefficients[2,4], 3),
round(summary(mo1)$coefficients[2,2], 3),
"Mo2 (+sleep)", round(coef(mo2)[2], 3),
paste0("(", round(confint(mo2)[2,1],3), ", ", round(confint(mo2)[2,2],3), ")"),
round(summary(mo2)$coefficients[2,4], 3),
round(summary(mo2)$coefficients[2,2], 3),
"Mo3 (full)", round(coef(mo3)[2], 3),
paste0("(", round(confint(mo3)[2,1],3), ", ", round(confint(mo3)[2,2],3), ")"),
round(summary(mo3)$coefficients[2,4], 3),
round(summary(mo3)$coefficients[2,2], 3)
) %>%
kable(caption = "Table 4. Sleep Hours Per Night Coefficient Across Sequential Models") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(0, bold = TRUE)| Model | sleep_hrs β | |95% CI |
p-valu
| |
|---|---|---|---|---|
| Mo1 (unadjusted) | -0.804 | (-0.962, -0.647) | 0 | 0.080 |
| Mo2 (+sleep) | -0.734 | (-0.889, -0.578) | 0 | 0.079 |
| Mo3 (full) | -0.509 | (-0.657, -0.361) | 0 | 0.075 |
#c. (10 pts)** For **Model C**, write out the full fitted regression equation and interpret **every** coefficient in plain language appropriate for a public health report.ANSWERED QUESTIONS
The sleep coefficient does change substantially when additional covariates are added to the models. This suggests that these variables may introduce some confounding in the relationship between sleep hours per night and the number of mentally unhealthy days in the past 30 days.
\(\hat{Mentally Unhealthy Days} = 12.475 + -0.509 (Sleep(hrs/night)\) + -0.082(Age) + 1.245(Female) + 0.292(Physical Health Days) + -0.321(Income) + -0.343(Excercise:Yes). To interpret all listed coefficients in plain language, each additional hour of sleep per night is associated with an estimated 1/2 day decrease in mentally unhealthy days on average. For each additional year of age, there is an associated 0.082 fewer mentally unhealthy days on average. As for sex, compared to males, females report an estimated 1.245 more mentally unhealthy days on average. Looking at physical health days, for each additional day of poor physical health, there is an associated 0.292 increase in the number of mentally unhealthy days on average. For the income category (on the 1-8 ordinal scale), each one-unit increase is associated with 0.321 fewer mentally unhealthy days on average. People who engaged in any physical activity in the past 30 days report an estimated 0.343 fewer mentally unhealthy days compared to individuals who did not exercise. Finally, looking at the intercept, the estimated mean number of unhealthy mental health days for a person with zero hours of sleep per night, zero physically unhealthy days, zero years of age, who is male and does not exercise, and has an income level of zero, is about 12 1/2 days within the last 30 days.
# a. (5 pts)** Report $R^2$ and Adjusted $R^2$ for each of the three models (A, B, C). Create a table. Which model explains the most variance in mental health days?
tribble(
~Model, ~Predictors, ~R2, ~`Adj. R²`,
"Mo1: Physical health only", 1, round(summary(mo1)$r.squared, 4), round(summary(mo1)$adj.r.squared, 4),
"Mo2: + Sleep", 2, round(summary(mo2)$r.squared, 4), round(summary(mo2)$adj.r.squared, 4),
"Mo3: Full (+ age, income, sex, exercise)", 6,
round(summary(mo3)$r.squared, 4), round(summary(mo3)$adj.r.squared, 4)
) %>%
kable(caption = "Table 7. R² and Adjusted R² Across Sequential Models") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
row_spec(3, bold = TRUE, background = "#EBF5FB")| Model | Predictors | R2 | Adj. R² |
|---|---|---|---|
| Mo1: Physical health only | 1 | 0.0197 | 0.0195 |
| Mo2: + Sleep | 2 | 0.0504 | 0.0498 |
| Mo3: Full (+ age, income, sex, exercise) | 6 | 0.1569 | 0.1559 |
# b. (5 pts)** What is the Root MSE for Model C? Interpret it in practical terms — what does it tell you about prediction accuracy?
rmse_mo3 <- round(summary(mo3)$sigma, 2)
cat("Root MSE (Model 3):", rmse_mo3, "mentally unhealthy days\n")## Root MSE (Model 3): 7.09 mentally unhealthy days
# c. (10 pts)** Using the ANOVA output for Model C, fill in the following table manually (i.e., compute the values using the output from `anova()` or `glance()`):State the null hypothesis for the overall F-test and your conclusion.
#anova()
anova_mo3 <- anova(mo3)
print(anova_mo3)## Analysis of Variance Table
##
## Response: menthlth_days
## Df Sum Sq Mean Sq F value Pr(>F)
## sleep_hrs 1 5865 5864.8 116.6678 < 2.2e-16 ***
## age 1 6182 6182.2 122.9832 < 2.2e-16 ***
## sex 1 2947 2947.1 58.6266 2.274e-14 ***
## physhlth_days 1 29456 29455.5 585.9585 < 2.2e-16 ***
## income_cat 1 2177 2176.8 43.3031 5.169e-11 ***
## exercise 1 92 92.1 1.8326 0.1759
## Residuals 4993 250993 50.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# glance()
glance(mo3) %>%
select(statistic, p.value, df, df.residual, nobs) %>%
mutate(across(where(is.numeric), ~ round(., 4))) %>%
pivot_longer(everything(), names_to = "Statistic", values_to = "Value") %>%
mutate(Statistic = dplyr::recode(Statistic,
"statistic" = "F-statistic",
"p.value" = "p-value (overall F-test)",
"df" = "Model df (p)",
"df.residual" = "Residual df (n − p − 1)",
"nobs" = "n (observations)"
)) %>%
kable(caption = "Table 6. Overall Model Summary — Model 3") %>%
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Statistic | Value |
|---|---|
| F-statistic | 154.8953 |
| p-value (overall F-test) | 0.0000 |
| Model df (p) | 6.0000 |
| Residual df (n − p − 1) | 4993.0000 |
| n (observations) | 5000.0000 |
ANSWERED QUESTIONS
| Source | df | SS | MS | F |
|---|---|---|---|---|
| Model | 6 | 46,719 | 7,791.2336 | 154.8953 |
| Residual | 4,993 | 250,993 | 50.3 | |
| Total | 4,499 | 297,712 |
The r-squared for the first model (menthlth_days ~ sleep_hrs) is 0.0197 and the adjusted r-squared is 0.0195. The r-squared for the second model (menthlth_days ~ sleep_hrs + sleep_hrs) is 0.0504 and the adjusted r-sqaured is 0.0498. Finally, the r-squared for the third model (menthlth_days ~ sleep_hrs + age + sex + physhlth_days + income_cat + exercise) is 0.1569 and the adjusted r-squared is 0.1559. The third model explains most of the variance in mental health days as it has the largest r-squared and adjusted r-squared values in comparison to the first two models. However, when comparing the adjusted r-squared values, all three models explain relatively little variance overall (model 1 = 2%, model 2 = 5%, model 3 = 16%).
The Root MSE for Model C is 7.09 mentally unhealthy days. In practical terms, this indicates that the model’s predicted number of mentally unhealthy days differs from the observed value by about 7 days on average.
# a. (5 pts)** For Model C, produce the four standard diagnostic plots (Residuals vs. Fitted, Normal Q-Q, Scale-Location, Cook's Distance). Comment on what each plot tells you about the LINE assumptions.
par(mfrow = c(2, 2))
plot(mo3, which = 1:4, col = adjustcolor("steelblue", alpha.f = 0.3), pch = 16)# b. (5 pts)** Given the nature of the outcome (mental health days, bounded at 0 and 30, heavily right-skewed), which assumptions are most likely to be violated? Does this invalidate the analysis? Explain.
# c. (5 pts)** Identify any observations with Cook's Distance > 1. How many are there? What would you do with them in a real analysis?
# Cook's distance
n <- nobs(mo3)
augmented <- augment(mo3) %>%
mutate(
obs_num = row_number(),
cooks_d = cooks.distance(mo3),
influential = ifelse(cooks_d > 1, "Potentially influential", "Not influential")
)
n_influential <- sum(augmented$cooks_d > 1)
p_cooks <- ggplot(augmented, aes(x = obs_num, y = cooks_d, color = influential)) +
geom_point(alpha = 0.6, size = 1.2) +
geom_hline(yintercept = 1, linetype = "dashed",
color = "red", linewidth = 1) +
scale_color_manual(values = c("Potentially influential" = "tomato",
"Not influential" = "steelblue")) +
labs(
title = "Cook's Distance",
subtitle = paste0("Dashed line = D > 1 threshold | ",
n_influential, " potentially influential observations"),
x = "Observation Number",
y = "Cook's Distance",
color = ""
) +
theme_minimal(base_size = 13) +
theme(legend.position = "top")
p_cooksANSWERED QUESTIONS
The Residuals vs Fitted plot does not show complete random scatter around zero, and the red line is not perfectly horizontal, suggesting potential non-linearity in the relationship. However, there appears to be no clear “fan shape” which indicates that heteroscedasticity may not be severe. The Q-Q residuals plot indicates that the residuals deviate from the reference line, suggesting that they are not normally distributed and may be right-skewed.The scale-location plot which is another equal variance check (homoscedasticity), is not flat, indicating some evidence of non-constant variance. Finally, the cook’s distance plot suggests the presence of potentially influential observations.
Given the nature of the outcome (mental health days, bounded at 0 and 30, heavily right-skewed), the normality assumption is most likely to be violated as the outcome is heavily right-skewed indicating that the residuals are not approximately normally distributed. The equal variance assumption (homoscedasticity) is also most likely to be violated since the outcome cannot go below 0-30 since this variable measures the number of unhealthy mental health days WITHIN THE LAST 30 DAYS ONLY. However, the violation of these assumptions do not necessarily mean the analysis is invalid. This is because with large sample sizes such as this, the Central Limit Theorem means mild violations cause minimal bias and homoscedasticity can withstand mild departures.
When Cook’s Distance > 1, there are 0 potentially influential observations. In a real analysis, if I had one, I would first investigate whether these observations are legitimate data points or a result from data entry or measurement error problems. If these points are determined to be legitimate but extreme, I would use robust regression techniques or assess how removing these points would affect the model results.
Suppose you are writing the results section of a public health paper. Write a 3–4 sentence paragraph summarizing the findings from Model C for a non-statistical audience. Your paragraph should:
ANSWERED QUESTION
The number of hours of sleep per night, age, sex, number of poor physical health days, and income category were associated with the number of mentally unhealthy days reported in the past 30 days. Two notable associations were observed. First, for each additional hour of sleep per night, there was an estimated decrease of about half a mentally unhealthy day on average. Second, compared to males, females reported an estimated 1.2 more mentally unhealthy days on average in the past 30 days. However, because this data is from a cross-sectional study, or a study that collects data at a single point in time, we cannot say that these factors directly cause the number of mentally unhealthy days.
End of Lab Activity