Multiple regression power calculation
u = 8
v = 99.13893
f2 = 0.15
sig.level = 0.05
power = 0.8
Physiological vs. Behavioral Predictors of Diabetes
1 Dataset Selection and Rationale
For this assignment, I selected the Diabetes Health Indicators (BRFSS 2015) dataset (diabetes_012_health_indicators_BRFSS2015.csv). This dataset suits the assignment because of its large sample size, its coverage of both behavioral and physiological health indicators, and its readiness for statistical analysis. It includes individual-level observations, where each record represents a unique participant, and contains a clearly defined outcome variable (Diabetes_012) indicating diabetes status. Furthermore, the dataset is already cleaned and formatted, so no preprocessing or merging with other sources is needed. Its scale and structure make it well suited to power analysis and inferential modeling.
2 Research Question
This study aims to investigate which category of health factors serves as a stronger predictor of diabetes among adults in the United States, using the Diabetes Health Indicators (BRFSS 2015) dataset. Specifically, the research question is:
Which group of factors—physiological (e.g., body mass index, blood pressure, cholesterol) or behavioral (e.g., smoking, physical activity, alcohol consumption)—better predicts the likelihood of diabetes among U.S. adults?
This question seeks to evaluate the relative predictive strength of measurable physiological health indicators compared to modifiable behavioral risk factors, thereby identifying which dimensions of health may be most influential for diabetes prevention and intervention strategies. Emerging evidence suggests that physiological predictors such as BMI and blood pressure often exhibit stronger associations with diabetes onset than behavioral factors; however, lifestyle behaviors such as physical activity, alcohol consumption, and smoking continue to play a critical role in mitigating risk and improving disease management (Landi et al., 2018).
3 Variables of Interest
The primary outcome variable in this study is diabetes status, represented in the dataset by the variable Diabetes_012. This variable categorizes respondents as follows:
0=No diabetes
1=Prediabetes
2=Diagnosed diabetes
The predictor variables were divided into two conceptual groups:
1. Physiological predictors (indicators reflecting biological and metabolic health status):
HighBP (High Blood Pressure)
HighChol (High Cholesterol)
BMI (Body Mass Index)
Age (Age category)
Sex (Biological sex: male/female)
2. Behavioral predictors (lifestyle-related and modifiable risk factors):
Smoker (Current smoking status)
PhysActivity (Physical activity engagement)
HvyAlcoholConsump (Heavy alcohol consumption status)
These variables were selected because prior epidemiological research has demonstrated their strong associations with diabetes risk (Ismail, Materwala, & Al Kaabi, 2021). By comparing these two groups of predictors, this analysis aims to determine whether behavioral factors or physiological indicators provide a stronger predictive value for diabetes among U.S. adults.
4 Power Analysis
A power analysis was conducted to determine whether the sample available in the Behavioral Risk Factor Surveillance System (BRFSS) 2015 dataset was sufficient to detect a meaningful effect in the model examining physiological and behavioral predictors of diabetes. The calculation used Cohen’s f² framework for multiple regression, which serves here as an approximation for the logistic regression model. Using Cohen’s convention for a medium effect size (f² = 0.15), a significance level of α = .05, and eight predictor variables (numerator degrees of freedom u = 8), the analysis returned denominator degrees of freedom of v ≈ 99.14 for 80% statistical power. Because v = n - u - 1, the minimum sample size is n = v + u + 1 ≈ 108.14, which rounds up to 109 participants. Given that the BRFSS dataset includes more than 250,000 individual observations, this study is highly powered (power ≈ 1.00), and the sample size is more than adequate for detecting even small-to-moderate effects.
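The step from the power calculation output (u = 8, v = 99.13893) to the reported minimum of 109 participants can be reproduced with a few lines of arithmetic. The sketch below is illustrative only; the function name `required_n` is hypothetical and is not part of the original analysis. It assumes the usual relationship v = n - u - 1 for a multiple regression F test.

```python
import math


def required_n(u: int, v: float) -> int:
    """Smallest whole sample size for an F test with u numerator and
    v denominator degrees of freedom, assuming v = n - u - 1."""
    return math.ceil(v + u + 1)


# Values taken from the power calculation output reported in this document.
n_min = required_n(u=8, v=99.13893)
print(n_min)  # 109
```

Rounding up rather than to the nearest integer is deliberate: a fractional participant cannot be recruited, and rounding down would leave power slightly below the 80% target.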
5 Answer
Are physiological or behavioral predictors stronger indicators of diabetes among U.S. adults based on the BRFSS 2015 dataset?
5.0.1 Results and Interpretation
The logistic regression analysis identified several significant predictors of diabetes among U.S. adults in the BRFSS 2015 dataset (diabetes_012_health_indicators_BRFSS2015.csv).
Among physiological factors, individuals with high blood pressure had about 2.6 times the odds of diabetes compared with those without high blood pressure (OR = 2.59, 95% CI [2.52, 2.66], p < .001). Similarly, those with high cholesterol had about twice the odds of diabetes (OR = 2.02, 95% CI [1.97, 2.07], p < .001). A higher BMI was also a significant predictor, with each one-unit increase associated with about 8% higher odds of having diabetes (OR = 1.08, 95% CI [1.07, 1.08], p < .001).
For behavioral factors, smoking was modestly associated with higher odds of diabetes (OR = 1.16, 95% CI [1.14, 1.19], p < .001). Physical activity was associated with lower odds: individuals who were physically active had 29% lower odds of having diabetes (OR = 0.71, 95% CI [0.69, 0.72], p < .001). Heavy alcohol consumption also showed a strong negative association (OR = 0.44, 95% CI [0.41, 0.47], p < .001), meaning that heavy drinking was less common among individuals with diabetes; this may reflect lifestyle changes after diagnosis rather than a protective effect.
Age, recorded as an ordered age category and entered as a single numeric predictor, was highly significant (OR = 1.15, 95% CI [1.15, 1.16], p < .001), indicating that the odds of diabetes rose by about 15% with each successive age category. Sex also showed a small but significant difference (OR = 1.16, 95% CI [1.14, 1.19], p < .001), with males having slightly higher odds of diabetes than females.
Overall, physiological predictors (HighBP, HighChol, BMI, and Age) demonstrated the strongest associations with diabetes, while behavioral predictors had smaller or mixed effects. The odds ratios for BMI and Age are per one-unit and per one-category increases, so their size is not directly comparable to the odds ratios for the binary predictors. These findings suggest that diabetes status is most strongly associated with clinical and physiological health indicators, though lifestyle behaviors such as physical activity still show a meaningful inverse association.
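The percentage statements above follow directly from the odds ratios. As a rough illustration, the sketch below converts an odds ratio into a percent change in odds and compounds a per-unit odds ratio over several units. The helper names are hypothetical, and the inputs are the unrounded estimates from the coefficient table in this report.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (negative = lower odds)."""
    return (odds_ratio - 1.0) * 100.0


def compounded_odds_ratio(or_per_unit: float, units: int) -> float:
    """Odds ratio for a change of `units` steps, assuming a constant per-step OR."""
    return or_per_unit ** units


# Unrounded estimates from the coefficient table in this report.
print(round(percent_change_in_odds(0.7061298), 1))    # PhysActivity: about -29.4
print(round(percent_change_in_odds(1.0750314), 1))    # BMI, per unit: about 7.5
print(round(compounded_odds_ratio(1.1507126, 5), 2))  # Age, five categories: about 2.02
```

The last line shows why a per-category odds ratio of 1.15 is larger than it looks: under the model's assumption of a constant effect per category, a gap of five age categories corresponds to roughly double the odds.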
| term | estimate | conf.low | conf.high | p.value |
|---|---|---|---|---|
| HighBP | 2.59 | 2.52 | 2.66 | 0 |
| HighChol | 2.02 | 1.97 | 2.07 | 0 |
| Sex | 1.16 | 1.14 | 1.19 | 4.17e-37 |
| Smoker | 1.16 | 1.14 | 1.19 | 2.09e-36 |
| Age | 1.15 | 1.15 | 1.16 | 0 |
| BMI | 1.08 | 1.07 | 1.08 | 0 |
| PhysActivity | 0.706 | 0.689 | 0.724 | 1.38e-162 |
| HvyAlcoholConsump | 0.441 | 0.412 | 0.471 | 9.62e-126 |
| (Intercept) | 0.00302 | 0.00279 | 0.00327 | 0 |

| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 0.0030202 | 0.0410931 | -141.20179 | 0 | 0.0027861 | 0.0032730 |
| Smoker | 1.1634549 | 0.0120147 | 12.60075 | 0 | 1.1363764 | 1.1911776 |
| PhysActivity | 0.7061298 | 0.0128055 | -27.17238 | 0 | 0.6886385 | 0.7240889 |
| HvyAlcoholConsump | 0.4407288 | 0.0343502 | -23.85215 | 0 | 0.4118187 | 0.4711874 |
| HighBP | 2.5872013 | 0.0134397 | 70.72899 | 0 | 2.5200021 | 2.6563266 |
| HighChol | 2.0222978 | 0.0125005 | 56.33645 | 0 | 1.9733794 | 2.0724884 |
| BMI | 1.0750314 | 0.0008680 | 83.35667 | 0 | 1.0732066 | 1.0768644 |
| Age | 1.1507126 | 0.0024475 | 57.35779 | 0 | 1.1452114 | 1.1562518 |
| Sex | 1.1648870 | 0.0119919 | 12.72729 | 0 | 1.1378255 | 1.1925901 |
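The columns of this table are linked: the test statistic is the log of the odds ratio divided by its standard error, and an approximate 95% interval can be rebuilt from the same two numbers. The sketch below checks this for the HighBP row. It assumes a Wald (normal-approximation) interval; the reported intervals may have been computed by a different method such as profile likelihood, so small differences in the later decimals are expected. The function name `wald_summary` is hypothetical.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def wald_summary(odds_ratio: float, std_error: float) -> tuple[float, float, float]:
    """Return (z statistic, CI lower bound, CI upper bound) for an odds ratio
    whose standard error is expressed on the log-odds scale."""
    log_or = math.log(odds_ratio)
    z = log_or / std_error
    low = math.exp(log_or - Z_95 * std_error)
    high = math.exp(log_or + Z_95 * std_error)
    return z, low, high


# HighBP row from the table above: estimate 2.5872013, std.error 0.0134397.
z, low, high = wald_summary(2.5872013, 0.0134397)
print(round(z, 2), round(low, 2), round(high, 2))  # approximately 70.73 2.52 2.66
```

Working on the log scale is what makes the interval asymmetric around the odds ratio after exponentiation.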
5.0.2 Model Performance
The ROC curves show that the combined model (AUC = 0.786) discriminated best between individuals with and without diabetes, followed closely by the physiological model (AUC = 0.781). The behavioral model performed considerably worse (AUC = 0.607). These results indicate that physiological factors, such as blood pressure, cholesterol, BMI, and age, are stronger predictors of diabetes than behavioral factors like smoking, alcohol use, or physical activity.
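For readers unfamiliar with AUC, it can be read as the probability that a randomly chosen person with diabetes receives a higher predicted risk than a randomly chosen person without it. The sketch below computes AUC by that pairwise definition. It is not the code used to produce the values above, and the labels and scores are made-up toy data, not BRFSS records.

```python
def auc_from_scores(labels: list[int], scores: list[float]) -> float:
    """AUC as the share of (positive, negative) pairs in which the positive
    case has the higher score; ties count as half a win."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical data): 1 = diabetes, 0 = no diabetes.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
print(auc_from_scores(toy_labels, toy_scores))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, which puts the behavioral model's 0.607 only modestly above chance and the combined model's 0.786 well above it. The double loop is fine for illustration, but with hundreds of thousands of records a rank-based implementation from a statistics library would be the practical choice.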
6 Limitations
Although the diabetes_012_health_indicators_BRFSS2015.csv dataset provided a large and diverse sample, it is cross-sectional and self-reported, which limits causal interpretation and may introduce recall bias. The very large sample size ensured adequate statistical power but may also have made small effects statistically significant without being clinically meaningful. Additionally, the dataset lacks clinical biomarkers such as glucose or HbA1c, which would provide more direct measures of diabetes status (Malkani & Mordes, 2011).
7 Future Data Collection
Future research should use a longitudinal cohort study to track participants over time and observe how behaviors and physiological factors predict diabetes onset. Data would be collected through both clinical assessments (BMI, blood pressure, glucose, cholesterol) and validated surveys on lifestyle behaviors. Using stratified random sampling across age, sex, and socioeconomic groups would improve representativeness. Incorporating wearable devices or digital surveys could enhance accuracy and reduce recall bias, making results more dependable.
8 References
Ismail, L., Materwala, H., & Al Kaabi, J. (2021). Association of risk factors with type 2 diabetes: A systematic review. Computational and Structural Biotechnology Journal, 19, 1759–1785. https://doi.org/10.1016/j.csbj.2021.03.003
Landi, F., Calvani, R., Picca, A., Tosato, M., Martone, A. M., Ortolani, E., Sisto, A., D’Angelo, E., Serafini, E., Desideri, G., Fuga, M. T., & Marzetti, E. (2018). Body Mass Index is Strongly Associated with Hypertension: Results from the Longevity Check-up 7+ Study. Nutrients, 10(12), 1976. https://doi.org/10.3390/nu10121976
Malkani, S., & Mordes, J. P. (2011). Implications of using hemoglobin A1C for diagnosing diabetes mellitus. The American Journal of Medicine, 124(5), 395–401. https://doi.org/10.1016/j.amjmed.2010.11.025