Assignment 8

Jesse Y
2025-04-05

Introduction

This assignment revisits a previous analysis of mentally unhealthy days using the Diabetes Health Indicators dataset. In Assignment 6, we used count data models to assess how health and lifestyle factors such as high blood pressure, age, and physical activity relate to the number of mentally unhealthy days. Here, we improve the presentation by adding formal model summaries and well-structured tables built with the modelsummary and tinytable packages, along with a proper citation for the data source.

Load Packages and Prepare Data

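# Load packages
library(tidyverse)     # read_csv(), dplyr verbs, drop_na(), ggplot2
library(janitor)       # clean_names()
library(modelsummary)  # datasummary_skim(), modelsummary()

# Load data and standardize column names to snake_case
diabetes <- read_csv("Diabetes Health Indicators.csv") |> clean_names()

# Subset and format relevant variables
diabetes_model <- diabetes |>
  dplyr::select(ment_hlth, high_bp, age, phys_activity) |>
  drop_na() |>
  mutate(
    high_bp = factor(high_bp, labels = c("No", "Yes")),
    phys_activity = factor(phys_activity, labels = c("No", "Yes")),
    age = as.numeric(age)
  )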

Summary Table of Key Variables

Below is a summary of the key variables used in the analysis: the outcome (mentally unhealthy days) and the primary predictors.

datasummary_skim(diabetes_model)
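|           | Unique | Missing Pct. | Mean | SD  | Min | Median | Max  |
|-----------|--------|--------------|------|-----|-----|--------|------|
| ment_hlth | 31     | 0            | 3.2  | 7.4 | 0.0 | 0.0    | 30.0 |
| age       | 13     | 0            | 8.0  | 3.1 | 1.0 | 8.0    | 13.0 |

|               |     | N      | %    |
|---------------|-----|--------|------|
| high_bp       | No  | 144851 | 57.1 |
|               | Yes | 108829 | 42.9 |
| phys_activity | No  | 61760  | 24.3 |
|               | Yes | 191920 | 75.7 |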

Distribution of Mentally Unhealthy Days

ggplot(diabetes_model, aes(x = ment_hlth)) +
  geom_histogram(binwidth = 1, fill = "cadetblue") +
  labs(
    title = "Distribution of Mentally Unhealthy Days",
    x = "Days", y = "Frequency"
  ) +
  theme_minimal()

Count Models

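We estimate a Poisson model as a baseline and a Negative Binomial model to account for overdispersion in the count outcome. The summary table already points to overdispersion: the standard deviation of 7.4 implies a variance of roughly 55, far above the mean of 3.2, whereas a Poisson model assumes the variance equals the mean.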
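For reference, the two specifications can be written out as follows. This is a sketch in notation introduced here for illustration (the symbols \beta, \mu_i, and \theta do not appear in the original assignment); \theta denotes the dispersion parameter that MASS::glm.nb estimates, and both models use a log link.

```latex
\begin{aligned}
\log \mu_i &= \beta_0 + \beta_1\,\mathrm{high\_bp}_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{phys\_activity}_i,
  \qquad \mu_i = \mathrm{E}[Y_i] \\
\text{Poisson:}\quad \mathrm{Var}(Y_i) &= \mu_i \\
\text{Negative Binomial:}\quad \mathrm{Var}(Y_i) &= \mu_i + \frac{\mu_i^2}{\theta}
\end{aligned}
```

The mean structure is the same in both models; only the variance assumption differs, which is why the coefficient estimates tend to be similar while the standard errors are not.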

# Poisson model
poisson_model <- glm(ment_hlth ~ high_bp + age + phys_activity,
                     family = poisson(link = "log"),
                     data = diabetes_model)

# Negative Binomial model
nb_model <- MASS::glm.nb(ment_hlth ~ high_bp + age + phys_activity,
                         data = diabetes_model)
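One way to check whether the Poisson variance assumption holds is the Pearson dispersion statistic (sum of squared Pearson residuals divided by the residual degrees of freedom), which should be near 1 for a well-specified Poisson model. The sketch below is illustrative only: dispersion_ratio() is a hypothetical helper, and the toy data are simulated rather than drawn from the diabetes dataset.

```r
# Hypothetical helper (not part of the original assignment):
# Pearson dispersion statistic = sum of squared Pearson residuals / residual df.
dispersion_ratio <- function(model) {
  pearson <- residuals(model, type = "pearson")
  sum(pearson^2) / df.residual(model)
}

# Simulated, deliberately overdispersed counts (assumption: toy data only).
set.seed(1)
n <- 2000
x <- rbinom(n, 1, 0.4)
y <- rnbinom(n, size = 0.3, mu = exp(1 + 0.4 * x))

# Fit a Poisson model to the toy data and compute its dispersion statistic.
toy_fit <- glm(y ~ x, family = poisson(link = "log"))
toy_ratio <- dispersion_ratio(toy_fit)
print(toy_ratio)  # values well above 1 indicate overdispersion
```

Applied to this analysis, the same helper could be called as dispersion_ratio(poisson_model); a value far above 1 would support preferring the Negative Binomial model.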

Model Results Table

models <- list(
  "Poisson" = poisson_model,
  "Negative Binomial" = nb_model
)

modelsummary(models,
             output = "markdown",
             stars = TRUE,
             statistic = "std.error",
             gof_omit = "IC|Log|RMSE")
|                  | Poisson   | Negative Binomial |
|------------------|-----------|-------------------|
| (Intercept)      | 2.171***  | 2.261***          |
|                  | (0.003)   | (0.020)           |
| high_bpYes       | 0.415***  | 0.392***          |
|                  | (0.002)   | (0.013)           |
| age              | -0.101*** | -0.110***         |
|                  | (0.000)   | (0.002)           |
| phys_activityYes | -0.606*** | -0.628***         |
|                  | (0.002)   | (0.014)           |
| Num.Obs.         | 253680    | 253680            |
| F                | 47553.921 | 1656.365          |

+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Standard errors are shown in parentheses. Significance levels are marked with stars.

Key Findings and Interpretation
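Because both models use a log link, the coefficients in the results table are on the log scale, and exponentiating them gives incidence rate ratios (IRRs). The sketch below does this for the Negative Binomial column, using the rounded coefficients and standard errors copied from the table. The Wald-style intervals are therefore approximate, and the object names are introduced here for illustration.

```r
# Negative Binomial estimates and standard errors, copied from the results table.
nb_coef <- c(high_bpYes = 0.392, age = -0.110, phys_activityYes = -0.628)
nb_se   <- c(high_bpYes = 0.013, age = 0.002,  phys_activityYes = 0.014)

# Exponentiate to obtain incidence rate ratios with approximate 95% Wald intervals.
irr <- data.frame(
  term  = names(nb_coef),
  irr   = exp(nb_coef),
  lower = exp(nb_coef - 1.96 * nb_se),
  upper = exp(nb_coef + 1.96 * nb_se)
)

# Percent change in the expected count per one-unit increase in each predictor.
irr$pct_change <- 100 * (irr$irr - 1)

print(irr, digits = 3, row.names = FALSE)
```

Read this way, and holding the other predictors fixed, high blood pressure is associated with roughly 48% more expected mentally unhealthy days, each one-step increase in the age code (1 to 13 in the summary table) with about 10% fewer, and physical activity with about 47% fewer. These are associations from the fitted model, not causal effects.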

Conclusion

This assignment improved upon earlier work by using a reproducible workflow to render clean summary statistics and model output tables. The modelsummary and tinytable packages make the results easier to read and interpret, and generating tables directly from the fitted models promotes transparency and consistency in applied health analytics.

References

(Teboul 2023)

Teboul, Alex. 2023. “Diabetes Health Indicators.” https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset.
