Predicting Diabetes Risk: The Role of Socioeconomic and Behavioral Factors in U.S. Adults
Caitlin Kennedy, PharmD, MHA
University of Rhode Island
DSP 552: Computer-Based Data Exploration
Instructor: Nathan Graff
This study explored the relationship between socioeconomic and behavioral risk factors and the prevalence of diabetes among adults in the United States (U.S.), using data from the 2023 Behavioral Risk Factor Surveillance System (BRFSS). A survey-weighted logistic regression model was used to analyze the relationship between demographic, socioeconomic, and behavioral variables and a diabetes diagnosis. Additionally, a secondary survey-weighted linear regression model examined the predictors of body mass index (BMI).
The results showed that higher BMI, older age, lower income, and physical inactivity were each significantly associated with higher odds of diabetes. The female-coded group had lower odds of diabetes than the male-coded reference group. Physical inactivity was also significantly associated with higher BMI, suggesting that behavioral and socioeconomic factors may relate to diabetes risk both directly and indirectly through BMI.
These findings highlight the importance of addressing modifiable risk factors and disparities in social determinants of health (SDOH), both of which were associated with higher odds of diabetes in this sample.
Diabetes is one of the most prevalent and costly chronic conditions in the United States and continues to affect vulnerable populations disproportionately. Persistent disparities in diabetes prevalence and outcomes suggest that the condition is shaped not only by biological factors but also by broader social and behavioral determinants. This project examined the extent to which socioeconomic and behavioral factors are associated with the prevalence of diabetes among U.S. adults.
Social determinants of health are an important consideration in population health because identifying high-risk populations and modifiable risk factors can guide targeted prevention strategies and reduce long-term healthcare burden. The project was selected for its direct relevance to population health management and the value of data-driven approaches in better understanding and addressing chronic disease risk.
Prior literature shows that lower socioeconomic status, higher BMI, and lower physical activity are associated with higher diabetes prevalence. SDOH, including income and educational attainment, influence chronic disease risk through pathways such as access to care, access to healthy foods, preventive services, and opportunities for physical activity. These findings support the use of multivariable models that evaluate multiple predictors simultaneously.
This study examines how socioeconomic and behavioral factors are associated with the likelihood of a diabetes diagnosis among U.S. adults, using a multivariable analytic approach.
The primary research question was: How do socioeconomic and behavioral factors influence the likelihood of diabetes among U.S. adults?
The data for this analysis came from the 2023 Behavioral Risk Factor Surveillance System (BRFSS), a CDC-supported telephone survey of noninstitutionalized U.S. adults. BRFSS collects demographic, socioeconomic, behavioral, and chronic disease information from a large national sample.
All code necessary to reproduce the analysis is included in this report. Due to memory constraints in the cloud environment, a subset of 15,000 observations from the 2023 BRFSS dataset was used for model fitting.
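library(ggplot2)  # ggplot() figures
library(knitr)    # kable() tables
library(survey)   # assumed: methods for the saved survey design and model objects

# Load the saved analytic objects
analytic_df <- readRDS("analytic_df.rds")
analytic_design <- readRDS("analytic_design.rds")
odds_ratio_table <- readRDS("odds_ratio_table.rds")
logit_model <- readRDS("logit_model.rds")
bmi_model <- readRDS("bmi_model.rds")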
The primary outcome was diabetes status and the main predictors were age, BMI, income, education, physical activity, and sex.
ggplot(analytic_df, aes(x = bmi, fill = factor(diabetes))) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 30) +
  labs(
    title = "BMI Distribution by Diabetes Status",
    x = "BMI (kg/m²)",
    y = "Count",
    fill = "Diabetes"
  )
ggplot(analytic_df, aes(x = physical_activity, fill = factor(diabetes))) +
  geom_bar(position = "fill") +
  labs(
    title = "Diabetes Prevalence by Physical Activity",
    x = "Physical Activity",
    y = "Proportion",
    fill = "Diabetes"
  )
The final analytic dataset included adults with non-missing values for diabetes status, age, BMI, sex, income, education, and physical activity. The BMI distribution showed that respondents with diabetes were more concentrated at higher BMI values than those without diabetes. Diabetes prevalence also appeared higher among respondents classified as inactive than among those classified as active, suggesting that physical activity may be an important behavioral predictor of diabetes risk.
This study used survey-weighted regression methods to account for the complex sampling design of the BRFSS. The BRFSS incorporates weighting, stratification, and clustering, so the final weight (_LLCPWT), stratum (_STSTR), and primary sampling unit (_PSU) variables were used to define the survey design for all regression analyses.
The primary outcome variable was diabetes status, coded as a binary variable (1 = diabetes, 0 = no diabetes). The primary predictors were BMI, age, sex, income, education, and physical activity. A survey-weighted logistic regression model was used as the primary analytic approach because the outcome was binary and the goal was to estimate adjusted odds ratios.
Accounting for the weights, strata, and clusters in this way supports population-level estimates and standard errors that reflect the complex survey design of the BRFSS.
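The model-fitting code itself is not shown in this report, so the following is only a minimal sketch of how a survey-weighted logistic regression of this kind could be set up with the survey package. The data are simulated, and every object and column name (sim_df, psu, strata, wt, inactive, sim_design, sim_logit) is a hypothetical stand-in rather than the variables used in the actual analysis.

```r
library(survey)

set.seed(552)
n <- 2000

# Simulated stand-in for the analytic data; all column names are hypothetical.
sim_df <- data.frame(
  psu      = rep(1:200, each = 10),          # 200 clusters of 10 respondents
  strata   = rep(1:20, each = 100),          # 20 strata, 10 clusters in each
  wt       = runif(n, min = 50, max = 500),  # sampling weight
  bmi      = rnorm(n, mean = 28, sd = 6),
  age      = sample(18:80, n, replace = TRUE),
  inactive = rbinom(n, 1, 0.25)
)

# Generate a binary outcome from a known logistic model.
lin_pred <- -6 + 0.07 * sim_df$bmi + 0.045 * sim_df$age + 0.5 * sim_df$inactive
sim_df$diabetes <- rbinom(n, 1, plogis(lin_pred))

# Declare the complex design: clusters, strata, and weights.
sim_design <- svydesign(
  ids = ~psu, strata = ~strata, weights = ~wt,
  data = sim_df, nest = TRUE
)

# quasibinomial() avoids the non-integer-count warning that binomial()
# produces when sampling weights are applied.
sim_logit <- svyglm(
  diabetes ~ bmi + age + inactive,
  design = sim_design, family = quasibinomial()
)

exp(coef(sim_logit))  # adjusted odds ratios
```

Exponentiating the coefficients converts them from the log-odds scale to odds ratios, which is the scale reported in the results.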
A secondary survey-weighted linear regression model was used to examine predictors of BMI as an upstream risk factor for diabetes. This second model helped identify socioeconomic and behavioral factors that may influence diabetes indirectly through obesity.
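As a rough illustration of the secondary model, the sketch below fits a survey-weighted linear regression of BMI on simulated data. It assumes the survey package; the column names and the simulated effect size are hypothetical and are not taken from the analysis.

```r
library(survey)

set.seed(552)
n <- 2000

# Simulated stand-in data; all column names are hypothetical.
sim_df <- data.frame(
  psu      = rep(1:200, each = 10),
  strata   = rep(1:20, each = 100),
  wt       = runif(n, min = 50, max = 500),
  age      = sample(18:80, n, replace = TRUE),
  inactive = rbinom(n, 1, 0.25)
)

# BMI is simulated to be higher, on average, among inactive respondents.
sim_df$bmi <- 27 + 1.4 * sim_df$inactive + rnorm(n, sd = 6)

sim_design <- svydesign(
  ids = ~psu, strata = ~strata, weights = ~wt,
  data = sim_df, nest = TRUE
)

# svyglm() uses the gaussian family by default, which gives a linear model.
sim_bmi_model <- svyglm(bmi ~ age + inactive, design = sim_design)

coef(sim_bmi_model)["inactive"]  # adjusted mean BMI difference, inactive vs. active
```

The coefficient on the inactivity indicator is read as the adjusted difference in mean BMI between inactive and active respondents, in BMI units.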
Several limitations should be considered when interpreting these results. First, because the BRFSS is a cross-sectional survey, causal relationships cannot be established. Second, all variables are self-reported and may be subject to recall or reporting bias. Third, due to computational constraints, this analysis used a subset of the full BRFSS dataset, which may limit generalizability. Despite these limitations, the findings remain consistent with the existing literature and provide useful insights into diabetes risk factors.
kable(odds_ratio_table, caption = "Odds Ratios from Survey-Weighted Logistic Regression")
| Term | Odds_Ratio | CI_Lower | CI_Upper |
|---|---|---|---|
| (Intercept) | 0.001 | 0.000 | 0.002 |
| bmi | 1.074 | 1.058 | 1.090 |
| age | 1.047 | 1.041 | 1.054 |
| sexFemale | 0.777 | 0.641 | 0.942 |
| income<$10,000 | 4.042 | 1.876 | 8.706 |
| income$10,000-$14,999 | 3.331 | 1.544 | 7.186 |
| income$15,000-$19,999 | 2.645 | 1.354 | 5.164 |
| income$20,000-$24,999 | 2.719 | 1.406 | 5.256 |
| income$25,000-$34,999 | 2.522 | 1.383 | 4.601 |
| income$35,000-$49,999 | 2.550 | 1.453 | 4.475 |
| income$50,000-$74,999 | 1.745 | 1.002 | 3.038 |
| income$75,000-$99,999 | 1.849 | 1.039 | 3.290 |
| income$100,000-$149,999 | 1.591 | 0.900 | 2.812 |
| income$150,000-$199,999 | 1.263 | 0.596 | 2.674 |
| educationNever attended/Kindergarten only | 0.000 | 0.000 | 0.000 |
| educationGrades 1-8 | 1.568 | 0.811 | 3.035 |
| educationGrades 9-11 | 0.823 | 0.517 | 1.311 |
| educationHigh school graduate/GED | 0.990 | 0.766 | 1.280 |
| educationSome college/technical school | 1.159 | 0.927 | 1.449 |
| physical_activityInactive | 1.731 | 1.406 | 2.131 |
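The report loads odds_ratio_table from a saved file, so how it was built is not shown. One plausible construction, sketched here under that caveat, exponentiates the model coefficients and their Wald confidence limits. The helper odds_ratio_summary() is hypothetical, and the normal-approximation interval is an assumption that may differ slightly from the intervals reported in the table.

```r
# Hypothetical helper: works on any fitted model that supports coef() and
# vcov(), which includes glm objects and survey::svyglm objects.
odds_ratio_summary <- function(model, level = 0.95) {
  est <- coef(model)                   # estimates on the log-odds scale
  se  <- sqrt(diag(vcov(model)))       # standard errors
  z   <- qnorm(1 - (1 - level) / 2)    # normal critical value (assumption)
  data.frame(
    Term       = names(est),
    Odds_Ratio = round(exp(est), 3),
    CI_Lower   = round(exp(est - z * se), 3),
    CI_Upper   = round(exp(est + z * se), 3),
    row.names  = NULL                  # keep term labels out of the row names
  )
}

# Toy demonstration with an unweighted glm on simulated data.
set.seed(1)
toy <- data.frame(x = rnorm(500))
toy$y <- rbinom(500, 1, plogis(-1 + 0.5 * toy$x))
toy_fit <- glm(y ~ x, data = toy, family = binomial())

odds_ratio_summary(toy_fit)
```

Setting row.names = NULL means the term labels appear only once, in the Term column, when the result is passed to kable().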
A survey-weighted logistic regression model was used to evaluate the association between BMI, age, sex, income, education, physical activity, and diabetes status. Higher BMI and older age were both significantly associated with increased odds of diabetes. Each 1-unit increase in BMI was associated with a 7.4% increase in the odds of diabetes (OR = 1.074, 95% CI: 1.058–1.090), and each additional year of age was associated with a 4.7% increase in the odds of diabetes (OR = 1.047, 95% CI: 1.041–1.054). Compared with the male-coded reference group, the female-coded group had significantly lower odds of diabetes (OR = 0.777, 95% CI: 0.641–0.942). Physical inactivity was also significantly associated with diabetes, with inactive respondents showing 73.1% higher odds of diabetes than active respondents (OR = 1.731, 95% CI: 1.406–2.131).
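The percentage interpretations above follow from simple arithmetic on the odds ratios. Because the model is linear on the log-odds scale, a difference of k units in a continuous predictor corresponds to the per-unit odds ratio raised to the power k. The 5-unit and 10-year figures below are illustrative calculations from the reported point estimates, not additional model results.

```latex
\[
\%\ \text{change in odds} = (\mathrm{OR} - 1) \times 100,
\qquad
\mathrm{OR}_{k\ \text{units}} = \mathrm{OR}^{\,k}
\]
\[
\text{BMI: } (1.074 - 1) \times 100 = 7.4\%,
\qquad
1.074^{5} \approx 1.43
\]
\[
\text{Age: } (1.047 - 1) \times 100 = 4.7\%,
\qquad
1.047^{10} \approx 1.58
\]
```

Under the model's assumptions, a 5-unit difference in BMI therefore corresponds to roughly 43% higher odds of diabetes, and a 10-year difference in age to roughly 58% higher odds, holding the other predictors constant.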
Compared with the highest income category ($200,000+), respondents in several lower-income categories had significantly higher odds of diabetes, with the strongest association observed in the lowest income group (OR = 4.042, 95% CI: 1.876–8.706). Educational attainment showed mixed associations, and some sparse education categories produced unstable estimates, suggesting caution in interpretation.
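Odds ratios for a categorical predictor are always relative to its reference level, which in R is the first level of the factor. The sketch below shows one way to make the highest income category the reference using relevel(). The short income vector is a hypothetical example, not the BRFSS data.

```r
# Hypothetical income factor using three BRFSS-style labels.
income <- factor(
  c("<$10,000", "$50,000-$74,999", "$200,000+", "$200,000+", "<$10,000"),
  levels = c("<$10,000", "$50,000-$74,999", "$200,000+")
)
levels(income)[1]  # "<$10,000" is the reference by default

# Make the highest income category the reference level.
income <- relevel(income, ref = "$200,000+")
levels(income)[1]  # "$200,000+"

# Model terms are now named for the non-reference levels only.
colnames(model.matrix(~ income))
```

This is why the odds ratio table has no row for the $200,000+ category: every income row is a comparison against it.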
A secondary survey-weighted linear regression model was used to examine predictors of BMI. Physical inactivity was significantly associated with higher BMI, with inactive respondents averaging approximately 1.42 BMI units higher than active respondents after adjusting for age, sex, income, and education. The female-coded group also had slightly higher BMI on average compared with the male-coded reference group. Several lower- and middle-income categories were associated with higher BMI relative to the highest-income category. These findings suggest that socioeconomic and behavioral factors may influence diabetes risk both directly and indirectly through BMI as an upstream risk factor.
This project examined the relationship between socioeconomic and behavioral factors and diabetes prevalence among U.S. adults using BRFSS 2023 data. The results indicated that higher BMI, older age, physical inactivity, and lower household income were associated with increased odds of diabetes. Supplementary analysis further showed that physical inactivity and several lower socioeconomic categories were associated with higher BMI, supporting the role of obesity as an upstream diabetes risk factor.
These findings are consistent with prior literature demonstrating that social determinants of health and behavioral risk factors play an important role in shaping diabetes outcomes and disparities. The results suggest that interventions focused on physical activity, obesity prevention, and support for lower-income populations may help reduce diabetes disparities. Future research could extend this work by using the full BRFSS dataset, examining additional social determinants such as geographic access to care and food environments, testing interaction effects, and comparing traditional regression approaches with machine-learning classification models.
head(analytic_df, 10)
Only the outputs referenced in the report are included. All code shown in this file reproduces the analyses presented in the main text.
AI tools were used to support code refinement, editing, and organization of the written report. All data preparation, statistical analysis, interpretation of results, and final review were completed by the author.
Bewick, V., Cheek, L., & Ball, J. (2005). Statistics review 14: Logistic regression. Critical Care, 9(1), 112–118.
Centers for Disease Control and Prevention. (2023). National Diabetes Statistics Report 2023. U.S. Department of Health and Human Services.
Hill-Briggs, F., Adler, N. E., Berkowitz, S. A., Chin, M. H., Gary-Webb, T. L., Navas-Acien, A., Thornton, P. L., & Haire-Joshu, D. (2021). Social determinants of health and diabetes: A scientific review. Diabetes Care, 44(1), 258–279.
Walker, R. J., Smalls, B. L., & Egede, L. E. (2014). Social determinants of health in adults with type 2 diabetes—Contribution of mutable and immutable factors. Diabetes Research and Clinical Practice, 106(2), 193–201.