This assignment revisits a previous analysis of mentally unhealthy days using the Diabetes Health Indicators dataset. In Assignment 6, we used count data models to assess how health and lifestyle factors such as high blood pressure, age, and physical activity relate to the number of mentally unhealthy days. Here, we improve the presentation by integrating formal model summaries, well-structured tables, and proper citations using the modelsummary and tinytable packages.
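# Load packages: tidyverse (read_csv, dplyr, tidyr, ggplot2), janitor (clean_names),
# modelsummary (datasummary_skim, modelsummary)
library(tidyverse)
library(janitor)
library(modelsummary)

# Load and clean data
diabetes <- read_csv("Diabetes Health Indicators.csv") |> clean_names()

# Subset and format relevant variables
diabetes_model <- diabetes |>
  dplyr::select(ment_hlth, high_bp, age, phys_activity) |>
  drop_na() |>
  mutate(
    high_bp = factor(high_bp, labels = c("No", "Yes")),
    phys_activity = factor(phys_activity, labels = c("No", "Yes")),
    age = as.numeric(age)
  )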
Below is a summary of the key variables used in the analysis: the outcome (mentally unhealthy days) and the primary predictors.
datasummary_skim(diabetes_model)
| | Unique | Missing Pct. | Mean | SD | Min | Median | Max |
|---|---|---|---|---|---|---|---|
| ment_hlth | 31 | 0 | 3.2 | 7.4 | 0.0 | 0.0 | 30.0 |
| age | 13 | 0 | 8.0 | 3.1 | 1.0 | 8.0 | 13.0 |

| | | N | % |
|---|---|---|---|
| high_bp | No | 144851 | 57.1 |
| | Yes | 108829 | 42.9 |
| phys_activity | No | 61760 | 24.3 |
| | Yes | 191920 | 75.7 |
ggplot(diabetes_model, aes(x = ment_hlth)) +
  geom_histogram(binwidth = 1, fill = "cadetblue") +
  labs(
    title = "Distribution of Mentally Unhealthy Days",
    x = "Days", y = "Frequency"
  ) +
  theme_minimal()
We estimate a Poisson model as a baseline and a Negative Binomial model, which accommodates overdispersion in the count outcome.
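The fitting step itself is not shown in this report. A minimal sketch of how the two models might be fitted is given here, assuming the formula ment_hlth ~ high_bp + age + phys_activity (inferred from the coefficient names in the results table) and the default log link. The data frame sim is simulated purely for illustration, and the names poisson_sketch and nb_sketch are hypothetical; none of this is the original Assignment 6 code.

```r
# Simulated stand-in for diabetes_model (hypothetical data, same column names)
set.seed(42)
n <- 2000
sim <- data.frame(
  high_bp       = factor(sample(c("No", "Yes"), n, replace = TRUE)),
  age           = sample(1:13, n, replace = TRUE),
  phys_activity = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Expected count on the log scale; coefficient values are made up for the simulation
mu <- exp(2.2 + 0.4 * (sim$high_bp == "Yes") -
            0.1 * sim$age -
            0.6 * (sim$phys_activity == "Yes"))

# Draw overdispersed counts (variance exceeds the mean)
sim$ment_hlth <- rnbinom(n, size = 0.5, mu = mu)

# Poisson GLM: assumes the conditional variance equals the conditional mean
poisson_sketch <- glm(ment_hlth ~ high_bp + age + phys_activity,
                      family = poisson(link = "log"), data = sim)

# Negative Binomial GLM: estimates an extra dispersion parameter (theta).
# MASS::glm.nb is called with its namespace so that MASS::select does not mask dplyr::select.
nb_sketch <- MASS::glm.nb(ment_hlth ~ high_bp + age + phys_activity, data = sim)

summary(nb_sketch)$theta  # estimated dispersion parameter
```

In the actual analysis, diabetes_model would take the place of sim, and the fitted objects would presumably be stored as poisson_model and nb_model.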
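# poisson_model and nb_model are assumed to be the fitted model objects
# carried over from the earlier analysis (fitting code not shown here)
models <- list(
  "Poisson" = poisson_model,
  "Negative Binomial" = nb_model
)

# Render both models side by side; drop information criteria, log-likelihood, and RMSE
modelsummary(models,
  output = "markdown",
  stars = TRUE,
  statistic = "std.error",
  gof_omit = "IC|Log|RMSE"
)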
| | Poisson | Negative Binomial |
|---|---|---|
| (Intercept) | 2.171*** | 2.261*** |
| | (0.003) | (0.020) |
| high_bpYes | 0.415*** | 0.392*** |
| | (0.002) | (0.013) |
| age | -0.101*** | -0.110*** |
| | (0.000) | (0.002) |
| phys_activityYes | -0.606*** | -0.628*** |
| | (0.002) | (0.014) |
| Num.Obs. | 253680 | 253680 |
| F | 47553.921 | 1656.365 |
Standard errors are shown in parentheses. Significance levels are marked with stars; under modelsummary's default thresholds, *** denotes p < 0.001.
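Assuming both models use the default log link, the coefficients in the table are on the log scale, so exponentiating them gives incidence rate ratios that are easier to read. The sketch below does this by hand for the Negative Binomial column; the vector nb_coefs is typed in from the table rather than extracted from the fitted object, and the variable names are illustrative.

```r
# Log-scale coefficients copied from the Negative Binomial column of the table
nb_coefs <- c(high_bpYes = 0.392, age = -0.110, phys_activityYes = -0.628)

# Incidence rate ratios: multiplicative change in expected mentally unhealthy days
irr <- exp(nb_coefs)

# Same information expressed as a percent change per one-unit increase
pct_change <- 100 * (irr - 1)

round(irr, 3)
round(pct_change, 1)
```

Read this way, high blood pressure corresponds to roughly 48% more expected mentally unhealthy days, physical activity to roughly 47% fewer, and each one-unit increase in age to about 10% fewer, holding the other predictors fixed. modelsummary() also accepts exponentiate = TRUE, which reports the table on this scale directly.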
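Physical activity is associated with significantly fewer mentally unhealthy days in both models.
High blood pressure is associated with significantly more mentally unhealthy days in both models.
Age has a small but statistically significant negative relationship with the outcome, suggesting fewer mentally unhealthy days as age increases, although this may reflect differences in perception or reporting. Note that age takes 13 integer values (1 to 13) in this dataset, so the coefficient is per one-unit step on that scale.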
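The Negative Binomial model is better suited here due to evidence of overdispersion, confirmed in Assignment 6. The summary statistics point the same way: ment_hlth has a variance of about 54.8 (SD 7.4) against a mean of 3.2, and the Negative Binomial standard errors are several times larger than the Poisson ones.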
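The overdispersion argument can be illustrated with a rough check. The sketch below first computes the raw variance-to-mean ratio from the reported summary statistics, then defines a hypothetical helper, pearson_dispersion(), and applies it to a Poisson fit on simulated counts. The simulated data are an assumption made so the example runs on its own; this is not the check performed in Assignment 6.

```r
# Reported summary statistics for ment_hlth (from the summary table)
mean_days <- 3.2
sd_days   <- 7.4

# Raw variance-to-mean ratio; a Poisson outcome would sit near 1.
# This is a marginal check only: it ignores the predictors.
var_to_mean <- sd_days^2 / mean_days
var_to_mean  # about 17.1

# Hypothetical helper: Pearson dispersion statistic for a fitted GLM.
# Values well above 1 in a Poisson fit suggest overdispersion.
pearson_dispersion <- function(model) {
  sum(residuals(model, type = "pearson")^2) / df.residual(model)
}

# Simulated overdispersed counts (not the real data)
set.seed(1)
y <- rnbinom(500, size = 0.5, mu = 3)
fit <- glm(y ~ 1, family = poisson)

pearson_dispersion(fit)  # well above 1 for these simulated counts
```

With the real data, the same helper could be applied to the fitted Poisson model to check the conditional dispersion after adjusting for the predictors.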
This assignment improved upon earlier work by using reproducible workflows to render clean summary statistics and model output tables. These tools, especially modelsummary and tinytable, enhance the clarity and interpretability of results in applied health research. Because the tables are generated directly from the data and fitted model objects rather than typed by hand, the workflow promotes transparency and consistency.