Loading BRFSS 2023 Data

The BRFSS is a large-scale telephone survey that collects data on health-related risk behaviors, chronic health conditions, and use of preventive services from U.S. residents.

##  [1] "diabetes"       "age_group"      "age_cont"       "sex"           
##  [5] "race"           "education"      "income"         "bmi_cat"       
##  [9] "phys_active"    "current_smoker" "gen_health"     "hypertension"  
## [13] "high_chol"
## Rows: 1,281
## Columns: 13
## $ diabetes       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ age_group      <fct> 65+, 35-44, 65+, 65+, 65+, 65+, 65+, 65+, 65+, 65+, 45-…
## $ age_cont       <dbl> 70.0, 39.5, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 7…
## $ sex            <fct> Female, Male, Male, Female, Female, Male, Male, Male, F…
## $ race           <fct> White, Black, White, White, White, White, White, Black,…
## $ education      <fct> Some college, Some college, College graduate, High scho…
## $ income         <fct> "$75,000+", "Unknown", "Unknown", "$50,000-$74,999", "$…
## $ bmi_cat        <fct> Obese, Obese, Normal, Normal, Overweight, Normal, Norma…
## $ phys_active    <dbl> 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0…
## $ current_smoker <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1…
## $ gen_health     <fct> Good, Fair/Poor, Excellent/Very good, Good, Excellent/V…
## $ hypertension   <dbl> 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1…
## $ high_chol      <dbl> 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0…

1. Introduction

This lab investigates the association between demographic and behavioral factors and hypertension using data from the Behavioral Risk Factor Surveillance System (BRFSS). The primary research question is: What factors are associated with hypertension, and how do age, sex, BMI, physical activity, and smoking status predict hypertension risk?

Understanding these relationships is important for public health because hypertension is a major risk factor for cardiovascular disease, and identifying key predictors can inform targeted prevention strategies.


2. Methods

Dataset: I used the BRFSS 2023 subset data, which contains health information on adults. The analytic sample included 1281 adults with complete data on all variables of interest.

Variables:

  • Outcome: Hypertension (binary: 0 = No, 1 = Yes)
  • Predictors: Age (continuous), Sex (Male/Female), BMI category (Underweight/Normal/Overweight/Obese), Physical activity (Yes/No), Current smoking (Yes/No)

Statistical Analysis: I conducted logistic regression analysis in R, progressing from simple to multiple models. I tested for interaction (Age × BMI), performed model diagnostics, and compared models using AIC and likelihood ratio tests to select the most parsimonious yet well-fitting model.


3. Results

Descriptive Statistics

Table 1: Hypertension Prevalence by Age Group
Age Group N Prevalence (%)
18-24 12 8.3
25-34 77 19.5
35-44 138 30.4
45-54 161 37.9
55-64 266 51.5
65+ 627 66.8

Overall hypertension prevalence was 52.7% in the sample.

Hypertension prevalence increases steadily with age, from 8.3% among adults aged 18-24 to 66.8% among adults aged 65+, an eight-fold difference. The youngest group contains only 12 respondents, so its estimate is imprecise.


Multiple Logistic Regression Results

Table 2: Adjusted Odds Ratios for Hypertension
Term                             OR     95% CI           p-value
Age (per year)                   1.06   [1.05, 1.07]     < 2e-16
Sex (Male vs Female)             1.27   [1.00, 1.62]     0.051141
BMI: Normal vs Underweight       2.10   [0.76, 6.76]     0.175212
BMI: Overweight vs Underweight   3.24   [1.18, 10.38]    0.030291
BMI: Obese vs Underweight        6.59   [2.39, 21.18]    0.000542
Physically Active                0.90   [0.70, 1.16]     0.419260
Current Smoker                   1.07   [0.82, 1.41]     0.620763

Key Findings:

  • Age: Each additional year of age increases the odds of hypertension by 6.1% (p < 0.001)
  • BMI: Graded relationship, with the odds of hypertension rising across BMI categories relative to underweight
    • Overweight: 3.24 times the odds (p = 0.030)
    • Obese: 6.59 times the odds (p < 0.001)
  • Sex: Males had 27% higher odds (borderline, p = 0.051)
  • Physical activity and smoking: Not significant in the adjusted model


BMI Dummy Variables

Table 3: Dummy Variable Coding for BMI Categories
BMI Category Dummy (Normal) Dummy (Overweight) Dummy (Obese)
Underweight 0 0 0
Normal 1 0 0
Overweight 0 1 0
Obese 0 0 1

Table 4: BMI Category Odds Ratios (Reference: Underweight)
Comparison                  OR     95% CI           p-value    Significant
Normal vs Underweight       2.10   [0.76, 6.76]     0.175212   No
Overweight vs Underweight   3.24   [1.18, 10.38]    0.030291   Yes
Obese vs Underweight        6.59   [2.39, 21.18]    0.000542   Yes

Interaction Test (Age × BMI)

Table 5: Likelihood Ratio Test for Interaction
Test Chi_square df p_value
Age × BMI Interaction 2.24 3 0.525

The interaction is not statistically significant (p = 0.525), so there is no evidence that the effect of age on hypertension differs by BMI category. The data are consistent with a similar age-hypertension relationship across BMI groups, although a non-significant test does not prove that the effects are identical.


Model Diagnostics

##                    GVIF Df GVIF^(1/(2*Df))
## age_cont       1.126628  1        1.061428
## sex            1.016509  1        1.008221
## bmi_cat        1.103045  3        1.016480
## phys_active    1.024820  1        1.012334
## current_smoker 1.073574  1        1.036134

All VIF values were below 5, indicating no serious multicollinearity concerns.

Maximum Cook’s Distance was 0.033, with no observations exceeding the threshold of 1. No influential observations were detected.

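The diagnostic plots showed no obvious patterns in the residuals. Because the outcome is binary, normality of residuals (the Q-Q plot) and constant variance are not assumptions of logistic regression, so those panels carry limited weight; the multicollinearity and influence checks above are the more relevant diagnostics, and neither raised concerns.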


Table 6: Model Comparison by AIC
Model AIC
Model A: Age only 1636.61
Model B: Age + Sex + BMI 1576.49
Model C: Full model 1579.50
## 
## Model A vs Model B (Adding Sex + BMI):
## Analysis of Deviance Table
## 
## Model 1: hypertension ~ age_cont
## Model 2: hypertension ~ age_cont + sex + bmi_cat
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      1279     1632.6                          
## 2      1275     1564.5  4   68.126 5.643e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Model B vs Model C (Adding Physical Activity + Smoking):
## Analysis of Deviance Table
## 
## Model 1: hypertension ~ age_cont + sex + bmi_cat
## Model 2: hypertension ~ age_cont + sex + bmi_cat + phys_active + current_smoker
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      1275     1564.5                     
## 2      1273     1563.5  2  0.99112   0.6092

4. Interpretation

Main Findings:

  1. Age is a significant predictor of hypertension (OR = 1.061 per year, p < 0.001). For each decade of age, the odds of hypertension increase by approximately 80%.

  2. BMI shows a strong dose-response relationship with hypertension:

    • Overweight adults have 3.24 times higher odds (p = 0.030)
    • Obese adults have 6.59 times higher odds (p < 0.001)
    • Normal weight did not differ significantly from underweight
  3. Sex shows a borderline association (OR = 1.27, p = 0.051), suggesting males may have higher odds of hypertension, though this did not reach conventional significance.

  4. Physical activity and smoking were not significantly associated with hypertension after adjusting for age, sex, and BMI.

  5. No significant interaction was found between age and BMI (p = 0.525), so there is no evidence that the age effect differs across BMI categories.
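The per-decade figure in finding 1 follows from compounding the per-year odds ratio, since odds ratios multiply across units of a continuous predictor. A short worked version, using the rounded OR of 1.061 from Table 2 (so the result is approximate):

\[\text{OR}_{10\text{ years}} = \left(\text{OR}_{1\text{ year}}\right)^{10} = 1.061^{10} = e^{10 \ln 1.061} \approx e^{0.592} \approx 1.81\]

That is, the odds of hypertension are roughly 81% higher per additional decade of age, which matches the "approximately 80%" stated above.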

Public Health Implications:

  • Weight management should be prioritized for hypertension prevention, with even greater urgency for obese individuals
  • Age-appropriate screening is important regardless of BMI category
  • The absence of a detectable age × BMI interaction simplifies risk assessment
  • Physical activity and smoking, while important for overall health, were not independently associated with hypertension in this sample after accounting for age, sex, and BMI; because the data are cross-sectional, this does not show that interventions on these behaviors have no effect on hypertension risk


5. Limitations

  1. Cross-sectional design: Cannot establish causality – we can only describe associations, not determine whether risk factors cause hypertension.

  2. Self-reported data: Physical activity and smoking status were self-reported, which may introduce recall bias or social desirability bias.

  3. Wide confidence intervals for some BMI categories (especially Obese: 2.39-21.18) indicate imprecision, likely due to small sample size in the underweight reference group.

  4. Limited generalizability: Results may not apply to populations different from this sample, such as other geographic regions or time periods.

  5. Unmeasured confounders: Variables like diet, medication use, family history of hypertension, and socioeconomic factors were not available in this dataset.

  6. Single year of data: Results may not reflect trends over time or long-term relationships.

  7. Underweight reference group: The small sample size in the underweight category (n < 50) may affect stability of BMI comparisons.


6. Conclusion

Age and BMI are the strongest predictors of hypertension in this population, with a clear dose-response relationship between increasing BMI and hypertension risk. Physical activity and smoking were not significant predictors after adjusting for age, sex, and BMI. The final model (Age + Sex + BMI) provides a parsimonious yet powerful tool for understanding hypertension risk factors, though the cross-sectional design limits causal inference.


Part 2: Student Lab Activity

Lab Instructions

Task 1: Explore the Outcome Variable

Table 1: Frequency of Hypertension Status
Status n percent
No 606 47.3
Yes 675 52.7

Table 2: Hypertension Prevalence by Age Group
age_group N hypertension_cases prevalence
18-24 12 1 8.3
25-34 77 15 19.5
35-44 138 42 30.4
45-54 161 61 37.9
55-64 266 137 51.5
65+ 627 419 66.8
## # A tibble: 1 × 3
##   total_n cases prevalence
##     <int> <dbl>      <dbl>
## 1    1281   675       52.7
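The prevalence figures above can be recomputed directly from the counts. The following is a minimal base-R sketch, with the counts copied from Table 2; the object names (`age_tab`, `overall_prev`, `fold_change`) are chosen here for illustration and are not taken from the original lab code.

```r
# Counts copied from Table 2 (hypertension prevalence by age group)
age_tab <- data.frame(
  age_group = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  N         = c(12, 77, 138, 161, 266, 627),
  cases     = c(1, 15, 42, 61, 137, 419)
)

# Prevalence (%) within each age group
age_tab$prevalence <- round(100 * age_tab$cases / age_tab$N, 1)

# Overall prevalence (%) across all age groups
overall_prev <- round(100 * sum(age_tab$cases) / sum(age_tab$N), 1)

# Ratio of prevalence in the oldest group to prevalence in the youngest group
fold_change <- age_tab$prevalence[6] / age_tab$prevalence[1]

print(age_tab)
cat("Overall prevalence:", overall_prev, "%\n")
cat("65+ vs 18-24 prevalence ratio:", round(fold_change, 1), "\n")
```

The youngest group has only 12 respondents and a single case, so its prevalence, and therefore the fold change, should be read with caution.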

Questions:

  1. What is the overall prevalence of hypertension in the dataset?

52.7% of adults in the sample have hypertension

  2. How does hypertension prevalence vary by age group?

Hypertension prevalence increases steadily and dramatically with age, from just 8.3% in young adults to 66.8% in older adults - an 8-fold increase.

Task 2: Build a Simple Logistic Regression Model

term estimate std.error statistic p.value conf.low conf.high
(Intercept) 0.048 0.296 -10.293 0 0.026 0.084
age_cont 1.055 0.005 10.996 0 1.045 1.065
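A simple model of this kind can be approximated from the grouped counts alone. The sketch below is not the original lab code: it fits a binomial GLM to the age-group counts from Task 1 and assumes that `age_cont` is the midpoint of each age group. Only the values 39.5 and 70.0 are visible in the data preview, so the other midpoints are assumptions, as are the object names.

```r
# Grouped counts from Task 1. The age midpoints are an ASSUMPTION:
# only 39.5 (35-44) and 70.0 (65+) are visible in the data preview.
grouped <- data.frame(
  age_mid = c(21, 29.5, 39.5, 49.5, 59.5, 70),
  N       = c(12, 77, 138, 161, 266, 627),
  cases   = c(1, 15, 42, 61, 137, 419)
)

# Binomial GLM on grouped data: (successes, failures) as a two-column response
fit <- glm(cbind(cases, N - cases) ~ age_mid, family = binomial, data = grouped)

# Log-odds coefficient and its standard error for age
beta <- unname(coef(fit)["age_mid"])
se   <- unname(sqrt(diag(vcov(fit)))["age_mid"])

# Exponentiate to get the odds ratio per year of age
or_age <- exp(beta)

# Wald 95% confidence interval, built on the log-odds scale and then exponentiated
wald_ci <- exp(beta + c(-1, 1) * qnorm(0.975) * se)

cat("OR per year:", round(or_age, 3), "\n")
cat("Wald 95% CI:", round(wald_ci, 3), "\n")
```

If the midpoint assumption holds, the odds ratio should land close to the value in the table. The interval in the table may have been computed by a different method (for example, profile likelihood), so small differences from the Wald interval are expected.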

Questions:

  1. What is the odds ratio for age? Interpret this value.

Odds ratio for age = 1.055

For each 1-year increase in age, the odds of hypertension increase by 5.5%

  2. Is the association statistically significant?

p < 0.001 (highly significant)

✅ Yes, the association is statistically significant

  3. What is the 95% confidence interval for the odds ratio?

Lower bound: 1.045

Upper bound: 1.065

Interpretation: The confidence interval does NOT contain 1, confirming the significant positive association between age and hypertension


Task 3: Create a Multiple Regression Model

Table 3: Multiple Logistic Regression Results
Term OR Std. Error z-statistic p-value 95% CI Lower 95% CI Upper
(Intercept) 0.008 0.653 -7.355 0.000 0.002 0.028
age_cont 1.061 0.005 11.234 0.000 1.050 1.073
sexMale 1.270 0.123 1.950 0.051 0.999 1.616
bmi_catNormal 2.097 0.546 1.356 0.175 0.759 6.756
bmi_catOverweight 3.241 0.543 2.166 0.030 1.183 10.385
bmi_catObese 6.585 0.545 3.459 0.001 2.394 21.176
phys_active 0.900 0.130 -0.808 0.419 0.697 1.162
current_smoker 1.071 0.139 0.495 0.621 0.817 1.407
## 
## 📊 **Age OR Comparison:**
## Simple model (age only): 1.055
## Multiple model (adjusted): 1.061
## Percent change: 0.6 %
## 
## 
## 📊 **BMI Category Results (Reference: Underweight):**
BMI Category Odds Ratios
Term OR Std. Error z-statistic p-value 95% CI Lower 95% CI Upper
bmi_catNormal 2.097 0.546 1.356 0.175 0.759 6.756
bmi_catOverweight 3.241 0.543 2.166 0.030 1.183 10.385
bmi_catObese 6.585 0.545 3.459 0.001 2.394 21.176
## 
## 
## 📊 **Strongest Predictors (Ranked by OR magnitude):**
Predictors Ranked by Effect Size
term p.value OR
bmi_catObese 0.000542 6.59
bmi_catOverweight 0.030291 3.24
bmi_catNormal 0.175212 2.10
sexMale 0.051141 1.27
current_smoker 0.620763 1.07
age_cont < 2e-16 1.06
phys_active 0.419260 0.90

Questions:

  1. How did the odds ratio for age change after adjusting for other variables?

The age OR increased slightly after adjustment, suggesting minimal confounding by the other variables.

  2. What does this suggest about confounding?

The minimal change in the age coefficient after adjustment suggests that the relationship between age and hypertension is largely independent of sex, BMI, physical activity, and smoking status. Age is a strong, independent risk factor for hypertension.

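  3. Which variables are the strongest predictors of hypertension?

BMI category has the largest odds ratios, with a graded pattern (OR = 2.10, 3.24, and 6.59 for normal, overweight, and obese versus underweight). Age has the strongest statistical evidence (z = 11.23, p < 2e-16); its OR of 1.06 looks small only because it is expressed per year of age, and it compounds to about 1.8 per decade. Odds ratios measured on different scales should not be ranked against each other directly.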

Task 4: Interpret Dummy Variables

Table 4a: Dummy Variable Coding for BMI Categories
BMI Category Dummy (Normal) Dummy (Overweight) Dummy (Obese)
Underweight 0 0 0
Normal 1 0 0
Overweight 0 1 0
Obese 0 0 1
## 
## ✅ **Reference category:** Underweight (all others compared to this group)

Table 4b: Odds Ratios for BMI Categories (Reference: Underweight)
Term OR Std. Error z-statistic p-value 95% CI Lower 95% CI Upper
bmi_catNormal 2.097 0.546 1.356 0.175 0.759 6.756
bmi_catOverweight 3.241 0.543 2.166 0.030 1.183 10.385
bmi_catObese 6.585 0.545 3.459 0.001 2.394 21.176

Table 4c: BMI Category Interpretation
Comparison Odds Ratio 95% Confidence Interval p-value Significant?
Normal vs Underweight 2.10 [0.76, 6.76] 0.175212 No
Overweight vs Underweight 3.24 [1.18, 10.38] 0.030291 Yes
Obese vs Underweight 6.59 [2.39, 21.18] 0.000542 Yes
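The dummy coding in Table 4a is what R builds automatically for a factor whose first level is Underweight. The sketch below uses `model.matrix()` on a toy four-row factor (not the BRFSS data) to show the indicator columns, and `relevel()` to show how a different reference group could be chosen; the object names are illustrative.

```r
# Toy factor with one observation per BMI category.
# Underweight is listed first, so it becomes the reference level.
bmi_cat <- factor(
  c("Underweight", "Normal", "Overweight", "Obese"),
  levels = c("Underweight", "Normal", "Overweight", "Obese")
)

# model.matrix() shows the dummy (indicator) columns R creates for the factor
X <- model.matrix(~ bmi_cat)
print(X)

# Changing the reference level changes which comparisons the model reports
bmi_ref_normal <- relevel(bmi_cat, ref = "Normal")
X_normal <- model.matrix(~ bmi_ref_normal)
print(colnames(X_normal))
```

The reference row (Underweight) has zeros in every dummy column, which is why its effect is absorbed into the intercept. Releveling does not change the model's fit, only which category the reported odds ratios are compared against.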

Questions:

  1. What is the reference category for BMI?

The reference category for BMI is Underweight. All odds ratios compare each BMI category to underweight individuals.

  2. Interpret the odds ratio for “Obese” compared to the reference category.

Three dummy variables were created:

  • Normal: 1 if Normal weight, 0 otherwise
  • Overweight: 1 if Overweight, 0 otherwise
  • Obese: 1 if Obese, 0 otherwise

Underweight serves as the reference group, with all dummy variables = 0. The odds ratio for Obese (6.59) therefore compares the odds of hypertension in obese adults with the odds in underweight adults, holding the other predictors constant.

  3. How would you explain this to a non-statistician?

After adjusting for age, sex, physical activity, and smoking:

Normal weight vs Underweight: OR = 2.10 (95% CI: 0.76-6.76, p = 0.175)

Normal weight adults have 2.1 times the odds of hypertension compared to underweight adults, but this difference is not statistically significant (p > 0.05). The wide confidence interval crossing 1 indicates imprecision, likely due to small sample size in the underweight reference group.

Overweight vs Underweight: OR = 3.24 (95% CI: 1.18-10.38, p = 0.030)

Overweight adults have 3.24 times the odds of hypertension compared to underweight adults. This difference is statistically significant (p < 0.05).

Obese vs Underweight: OR = 6.59 (95% CI: 2.39-21.18, p < 0.001)

Obese adults have 6.59 times the odds of hypertension compared to underweight adults. This is a strong and highly significant association (p < 0.001), although the wide confidence interval means the size of the effect is estimated imprecisely.
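Because underweight is an unusual comparison group, it can help to re-express the results relative to normal weight. Within a single model, the odds ratio between two non-reference categories is the ratio of their odds ratios against the shared reference. Using the rounded estimates from Table 4b (so the results are approximate):

\[\text{OR}_{\text{Obese vs Normal}} = \frac{\text{OR}_{\text{Obese vs Underweight}}}{\text{OR}_{\text{Normal vs Underweight}}} = \frac{6.585}{2.097} \approx 3.14\]

\[\text{OR}_{\text{Overweight vs Normal}} = \frac{3.241}{2.097} \approx 1.55\]

These ratios give point estimates only. Confidence intervals for them cannot be derived from the table, because they depend on the covariance between the coefficients; refitting the model with normal weight as the reference category would provide them.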


Task 5: Test for Interaction

Table 5a: Logistic Regression with Age × BMI Interaction
Term OR Std. Error z-statistic p-value 95% CI Lower 95% CI Upper
(Intercept) 0.235 2.558 -0.566 0.571 0.000 23.284
age_cont 1.005 0.042 0.117 0.907 0.930 1.110
bmi_catNormal 0.067 2.650 -1.020 0.308 0.001 40.725
bmi_catOverweight 0.073 2.624 -1.000 0.317 0.001 42.717
bmi_catObese 0.286 2.591 -0.484 0.629 0.003 161.547
sexMale 1.278 0.123 1.989 0.047 1.004 1.627
phys_active 0.894 0.131 -0.858 0.391 0.691 1.155
current_smoker 1.079 0.139 0.546 0.585 0.822 1.418
age_cont:bmi_catNormal 1.058 0.043 1.287 0.198 0.956 1.147
age_cont:bmi_catOverweight 1.064 0.043 1.431 0.152 0.962 1.152
age_cont:bmi_catObese 1.052 0.043 1.186 0.236 0.952 1.139

Table 5b: Age × BMI Interaction Terms
Interaction Term OR Std. Error z-statistic p-value 95% CI Lower 95% CI Upper
age_cont:bmi_catNormal 1.058 0.043 1.287 0.198 0.956 1.147
age_cont:bmi_catOverweight 1.064 0.043 1.431 0.152 0.962 1.152
age_cont:bmi_catObese 1.052 0.043 1.186 0.236 0.952 1.139

Table 5c: Likelihood Ratio Test for Interaction
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1273 1563.496 NA NA NA
1270 1561.260 3 2.236 0.525
## 
## 📊 **LIKELIHOOD RATIO TEST RESULTS:**
## Chi-squared statistic: 2.24
## Degrees of freedom: 3
## p-value: 0.5248
## ❌ **CONCLUSION:** The interaction is NOT statistically significant (p > 0.05).
## This means the effect of age on hypertension does NOT significantly differ by BMI category.
## The relationship between age and hypertension is consistent across BMI groups.
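The likelihood ratio test reported above can be checked by hand from the residual deviances in Table 5c: the test statistic is the drop in deviance, referred to a chi-squared distribution with degrees of freedom equal to the number of added parameters. A minimal sketch, with variable names chosen here for illustration:

```r
# Residual deviances and residual degrees of freedom from Table 5c
dev_main <- 1563.496; df_main <- 1273   # model without the interaction
dev_int  <- 1561.260; df_int  <- 1270   # model with the Age x BMI interaction

# LRT statistic: reduction in deviance when the interaction terms are added
lrt_stat <- dev_main - dev_int

# Degrees of freedom: number of extra parameters (three interaction terms)
lrt_df <- df_main - df_int

# Upper-tail chi-squared probability
p_value <- pchisq(lrt_stat, df = lrt_df, lower.tail = FALSE)

cat("Chi-squared:", round(lrt_stat, 2), " df:", lrt_df,
    " p-value:", round(p_value, 4), "\n")
```

With fitted model objects, a nested comparison of this kind is usually obtained with `anova(model_without, model_with, test = "Chisq")`, where the two model names are placeholders.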

## 
## 📊 **STRATIFIED ANALYSIS - Age Effect by BMI Category:**
## 
## Underweight: OR = 1 (95% CI: 0.93-1.11), p = 0.918
## 
## Normal: OR = 1.06 (95% CI: 1.04-1.09), p = 4.18e-08
## 
## Overweight: OR = 1.07 (95% CI: 1.05-1.09), p = 6.73e-12
## 
## Obese: OR = 1.06 (95% CI: 1.04-1.07), p = 4.76e-14

Questions:

  1. Is the interaction term statistically significant?

The likelihood ratio test comparing models with and without the Age × BMI interaction yielded a p-value of 0.525. Since this p-value is greater than 0.05, the interaction is NOT statistically significant.

  2. What does this mean in epidemiologic terms (effect modification)?

The non-significant interaction provides no evidence of effect modification: the data are consistent with the relationship between age and hypertension being the same across underweight, normal weight, overweight, and obese individuals. In epidemiologic terms, BMI does not appear to be an effect modifier of the age-hypertension relationship. A non-significant test is not proof that the effects are identical, and the underweight stratum in particular is estimated imprecisely (stratified OR = 1.00, 95% CI: 0.93-1.11). In the absence of a detectable interaction, the main effects of age and BMI can be interpreted separately.

  3. Create a visualization showing predicted probabilities by age and BMI category

The plot of predicted probabilities shows roughly parallel lines across BMI categories, with each line increasing at a similar slope. This visual pattern supports the statistical finding of no significant interaction. All BMI groups show the same pattern: as age increases, hypertension probability increases at approximately the same rate.
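Predicted probabilities of this kind can be sketched from the published coefficients without the raw data. The example below is an illustration rather than the original plotting code: it uses the rounded odds ratios from Table 3 (main-effects model), fixes the other predictors at their reference values (female, not physically active, non-smoker), and applies the inverse logit. Because the intercept is reported only as 0.008, the absolute probabilities are rough; the object names are assumptions.

```r
# Odds ratios copied from Table 3 (rounded, so results are approximate)
b0    <- log(0.008)   # intercept: odds for the reference profile at age 0
b_age <- log(1.061)   # change in log-odds per year of age
b_bmi <- log(c(Underweight = 1, Normal = 2.097, Overweight = 3.241, Obese = 6.585))

# Ages at which to predict
ages <- seq(20, 80, by = 5)

# Linear predictor (log-odds) for every age x BMI combination
eta <- outer(b0 + b_age * ages, b_bmi, "+")

# Inverse logit converts log-odds to probabilities
prob <- plogis(eta)
dimnames(prob) <- list(age = ages, bmi = names(b_bmi))

print(round(prob, 3))

# Draw the curves only in an interactive session
if (interactive()) {
  matplot(ages, prob, type = "l", lty = 1, col = 1:4,
          xlab = "Age (years)", ylab = "Predicted probability of hypertension")
  legend("topleft", legend = colnames(prob), col = 1:4, lty = 1)
}
```

In a main-effects model the curves are exactly parallel on the log-odds scale; on the probability scale they only look roughly parallel because the inverse logit is nonlinear.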


Task 6: Model Diagnostics

## ========================================
##                    GVIF Df GVIF^(1/(2*Df))
## age_cont       1.126628  1        1.061428
## sex            1.016509  1        1.008221
## bmi_cat        1.103045  3        1.016480
## phys_active    1.024820  1        1.012334
## current_smoker 1.073574  1        1.036134
## 
## VIF Interpretation:
## - VIF < 5: No concern
## - VIF 5-10: Moderate concern
## - VIF > 10: Serious concern
## ========================================
## Cook's D summary:
##   Min: 0
##   Max: 0.0331
##   Mean: 8e-04
##   Observations with Cook's D > 1: 0
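The last column of the VIF output, GVIF^(1/(2*Df)), adjusts the generalized VIF for the number of coefficients a term uses, which matters for the three-coefficient `bmi_cat` factor. The sketch below recomputes that column from the GVIF and Df values printed above; squaring it gives a quantity comparable to an ordinary VIF. Variable names are chosen for illustration.

```r
# GVIF and Df values copied from the output above
gvif <- c(age_cont = 1.126628, sex = 1.016509, bmi_cat = 1.103045,
          phys_active = 1.024820, current_smoker = 1.073574)
df_terms <- c(age_cont = 1, sex = 1, bmi_cat = 3,
              phys_active = 1, current_smoker = 1)

# Adjusted GVIF: GVIF^(1 / (2 * Df))
adj_gvif <- gvif^(1 / (2 * df_terms))

# Squaring puts the adjusted value on the usual VIF scale
vif_scale <- adj_gvif^2

print(round(cbind(GVIF = gvif, Df = df_terms,
                  adj_GVIF = adj_gvif, VIF_scale = vif_scale), 4))
```

For single-coefficient terms the squared value equals the GVIF itself, so the adjustment only changes the picture for multi-level factors such as BMI category.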

Questions:

  1. Are there any concerns about multicollinearity?

All VIF values are < 5, indicating no serious multicollinearity.

  2. Are there any influential observations that might affect your results?

None detected. The maximum Cook’s distance was 0.033, well below the threshold of 1, and the Residuals vs Leverage plot shows all points within the Cook’s distance contours, indicating that no single observation unduly influences the results.

  3. What would you do if you found serious violations?

If violations were found, I would:

  • For multicollinearity: remove or combine correlated variables
  • For influential points: conduct a sensitivity analysis with and without them
  • For non-normality or heteroscedasticity of residuals: note that these are not assumptions of logistic regression; the analogous check is linearity of the log-odds in continuous predictors such as age, which can be addressed by transforming or categorizing the predictor
  • Always document all decisions transparently


Task 7: Model Comparison

## ========================================
##         df      AIC
## model_A  2 1636.613
## model_B  6 1576.487
## model_C  8 1579.496
## 
## ✅ Best model by AIC: model_B
## ========================================
## 
## Model A vs Model B (Adding Sex + BMI):
## Analysis of Deviance Table
## 
## Model 1: hypertension ~ age_cont
## Model 2: hypertension ~ age_cont + sex + bmi_cat
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      1279     1632.6                          
## 2      1275     1564.5  4   68.126 5.643e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Model B vs Model C (Adding Physical Activity + Smoking):
## Analysis of Deviance Table
## 
## Model 1: hypertension ~ age_cont + sex + bmi_cat
## Model 2: hypertension ~ age_cont + sex + bmi_cat + phys_active + current_smoker
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      1275     1564.5                     
## 2      1273     1563.5  2  0.99112   0.6092
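The AIC values and the second likelihood ratio test can be reproduced from the deviance tables. For a logistic regression fitted to individual-level binary outcomes, the residual deviance equals -2 times the log-likelihood, so AIC is the residual deviance plus twice the number of estimated parameters. The sketch below uses the rounded deviances printed above, so it agrees with the reported AICs only to about one decimal place; the variable names are illustrative.

```r
# Residual deviances (rounded, from the deviance tables) and parameter counts
resid_dev <- c(model_A = 1632.6, model_B = 1564.5, model_C = 1563.5)
n_params  <- c(model_A = 2, model_B = 6, model_C = 8)

# AIC = -2 * logLik + 2 * k; for ungrouped binary data, -2 * logLik is the residual deviance
aic <- resid_dev + 2 * n_params
print(aic)
cat("Lowest AIC:", names(which.min(aic)), "\n")

# LRT for Model B vs Model C: drop in deviance on 2 degrees of freedom
p_b_vs_c <- pchisq(0.99112, df = 2, lower.tail = FALSE)
cat("Model B vs C p-value:", round(p_b_vs_c, 4), "\n")
```

Model C lowers the deviance by about 1 while adding 2 parameters, which costs 4 AIC points, so its AIC is higher than Model B's even though its deviance is slightly lower.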

Questions:

  1. Which model has the best fit based on AIC?

Based on the AIC values, Model B (Age + Sex + BMI) has the lowest AIC at 1576.49, indicating it provides the best fit to the data among the three models compared. Model C has a slightly higher AIC (1579.50), and Model A has the highest AIC (1636.61).

  2. Is the added complexity of the full model justified?

The likelihood ratio tests show that:

  • Adding sex and BMI (Model A → B) significantly improved model fit (χ² = 68.13, df = 4, p < 0.001). This indicates that sex and BMI are important predictors of hypertension.
  • Adding physical activity and smoking (Model B → C) did NOT significantly improve model fit (χ² = 0.99, df = 2, p = 0.609). The p-value of 0.609 is well above 0.05, indicating that physical activity and smoking do not add meaningful predictive value beyond age, sex, and BMI.

Therefore, the added complexity of the full model (Model C) is not justified by the data. The non-significant likelihood ratio test suggests that physical activity and smoking can be omitted without loss of predictive power.

  3. Which model would you choose for your final analysis? Why?

Based on these results, I select Model B (Age + Sex + BMI) as the final model. It has the lowest AIC, and the likelihood ratio test confirms that the additional variables in Model C do not significantly improve prediction. This model is both parsimonious and statistically sound, making it the most appropriate choice for addressing the research question.


Lab Report Guidelines

Write a brief report (1-2 pages) summarizing your findings:

  1. Introduction: State your research question
  2. Methods: Describe your analytic approach
  3. Results: Present key findings with tables and figures
  4. Interpretation: Explain what your results mean
  5. Limitations: Discuss potential issues with your analysis

Submission: Submit your completed R Markdown file and knitted HTML report.


Summary

Key Concepts Covered

  1. Statistical modeling describes relationships between variables
  2. Regression types depend on the outcome variable type
  3. Logistic regression is appropriate for binary outcomes
  4. Multiple regression controls for confounding
  5. Dummy variables represent categorical predictors
  6. Interactions test for effect modification
  7. Model diagnostics check assumptions and identify problems
  8. Model comparison helps select the best model

Important Formulas

Logistic Regression (p is the probability of the outcome, k the number of predictors):

\[\text{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k\]

Odds Ratio (for a one-unit increase in predictor X_i):

\[\text{OR}_i = e^{\beta_i}\]

Predicted Probability:

\[p = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k}}\]


References

  • Agresti, A. (2018). An Introduction to Categorical Data Analysis (3rd ed.). Wiley.
  • Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression (3rd ed.). Wiley.
  • Vittinghoff, E., Glidden, D. V., Shiboski, S. C., & McCulloch, C. E. (2012). Regression Methods in Biostatistics (2nd ed.). Springer.
  • Centers for Disease Control and Prevention. (2023). Behavioral Risk Factor Surveillance System.

Session Info

## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggeffects_2.3.2  car_3.1-5        carData_3.0-6    broom_1.0.12    
##  [5] kableExtra_1.4.0 knitr_1.51       lubridate_1.9.3  forcats_1.0.0   
##  [9] stringr_1.5.1    dplyr_1.1.4      purrr_1.0.2      readr_2.1.5     
## [13] tidyr_1.3.1      tibble_3.2.1     ggplot2_4.0.2    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.10        generics_0.1.4     xml2_1.3.6         stringi_1.8.4     
##  [5] hms_1.1.4          digest_0.6.37      magrittr_2.0.3     evaluate_1.0.5    
##  [9] grid_4.4.2         timechange_0.3.0   RColorBrewer_1.1-3 fastmap_1.2.0     
## [13] jsonlite_2.0.0     backports_1.5.0    Formula_1.2-5      viridisLite_0.4.3 
## [17] scales_1.4.0       textshaping_0.4.0  jquerylib_0.1.4    abind_1.4-8       
## [21] cli_3.6.3          rlang_1.1.4        withr_3.0.2        cachem_1.1.0      
## [25] yaml_2.3.10        otel_0.2.0         datawizard_1.3.0   tools_4.4.2       
## [29] tzdb_0.4.0         vctrs_0.6.5        R6_2.6.1           lifecycle_1.0.5   
## [33] insight_1.4.6      pkgconfig_2.0.3    pillar_1.11.1      bslib_0.10.0      
## [37] gtable_0.3.6       glue_1.8.0         systemfonts_1.3.1  haven_2.5.5       
## [41] xfun_0.56          tidyselect_1.2.1   rstudioapi_0.18.0  farver_2.1.2      
## [45] htmltools_0.5.8.1  labeling_0.4.3     rmarkdown_2.30     svglite_2.2.2     
## [49] compiler_4.4.2     S7_0.2.1