Loading BRFSS 2023 Data

The BRFSS is a large-scale telephone survey that collects data on health-related risk behaviors, chronic health conditions, and use of preventive services from U.S. residents.

##  [1] "diabetes"       "age_group"      "age_cont"       "sex"           
##  [5] "race"           "education"      "income"         "bmi_cat"       
##  [9] "phys_active"    "current_smoker" "gen_health"     "hypertension"  
## [13] "high_chol"

## Rows: 1,281
## Columns: 13
## $ diabetes       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0…
## $ age_group      <fct> 65+, 35-44, 65+, 65+, 65+, 65+, 65+, 65+, 65+, 65+, 45-…
## $ age_cont       <dbl> 70.0, 39.5, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 70.0, 7…
## $ sex            <fct> Female, Male, Male, Female, Female, Male, Male, Male, F…
## $ race           <fct> White, Black, White, White, White, White, White, Black,…
## $ education      <fct> Some college, Some college, College graduate, High scho…
## $ income         <fct> "$75,000+", "Unknown", "Unknown", "$50,000-$74,999", "$…
## $ bmi_cat        <fct> Obese, Obese, Normal, Normal, Overweight, Normal, Norma…
## $ phys_active    <dbl> 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0…
## $ current_smoker <dbl> 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1…
## $ gen_health     <fct> Good, Fair/Poor, Excellent/Very good, Good, Excellent/V…
## $ hypertension   <dbl> 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1…
## $ high_chol      <dbl> 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0…

1. Introduction

This lab investigates the association between demographic and behavioral factors and hypertension using data from the Behavioral Risk Factor Surveillance System (BRFSS). The primary research question is: What factors are associated with hypertension, and how do age, sex, BMI, physical activity, and smoking status predict hypertension risk?

Understanding these relationships is important for public health because hypertension is a major risk factor for cardiovascular disease, and identifying key predictors can inform targeted prevention strategies.

2. Methods

Dataset: I used the BRFSS 2023 subset data, which contains health information on adults. The analytic sample included 1281 adults with complete data on all variables of interest.

Variables:
- Outcome: Hypertension (binary: 0 = No, 1 = Yes)
- Predictors: Age (continuous), Sex (Male/Female), BMI category (Underweight/Normal/Overweight/Obese), Physical activity (Yes/No), Current smoking (Yes/No)

Statistical Analysis: I conducted logistic regression analysis in R, progressing from simple to multiple models. I tested for interaction (Age × BMI), performed model diagnostics, and compared models using AIC and likelihood ratio tests to select the most parsimonious yet well-fitting model.
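The workflow described above can be sketched in base R. The BRFSS subset is not reproduced here, so this sketch runs on simulated data: the data frame `sim`, its coefficients, and the seed are illustrative assumptions, and only the variable names follow the dataset. It shows the general pattern (nested `glm()` fits, odds ratios with Wald intervals, a likelihood ratio test, and AIC), not the exact code behind the results below.

```r
# Simulated stand-in for the analytic sample (assumption: values are made up)
set.seed(553)
n <- 500
bmi_levels <- c("Underweight", "Normal", "Overweight", "Obese")
sim <- data.frame(
  age_cont = runif(n, 18, 80),
  sex      = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  bmi_cat  = factor(sample(bmi_levels, n, replace = TRUE), levels = bmi_levels)
)
# Generate a binary outcome from an arbitrary logistic model
lin_pred <- -3 + 0.05 * sim$age_cont + 0.5 * (sim$bmi_cat == "Obese")
sim$hypertension <- rbinom(n, 1, plogis(lin_pred))

# Nested models: simple, then multiple logistic regression
model_a <- glm(hypertension ~ age_cont, data = sim, family = binomial)
model_b <- glm(hypertension ~ age_cont + sex + bmi_cat, data = sim, family = binomial)

# Odds ratios with Wald 95% confidence intervals (exponentiated coefficients)
or_table <- exp(cbind(OR = coef(model_b), confint.default(model_b)))

# Likelihood ratio test for the added terms, and AIC for both models
lrt       <- anova(model_a, model_b, test = "Chisq")
aic_table <- AIC(model_a, model_b)

print(round(or_table, 3))
print(lrt)
print(aic_table)
```

`confint.default()` gives Wald intervals; profile-likelihood intervals, which differ slightly in small groups, would be an alternative.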

3. Results

Descriptive Statistics

Table 1: Hypertension Prevalence by Age Group
Age Group	N	Prevalence (%)
18-24	12	8.3
25-34	77	19.5
35-44	138	30.4
45-54	161	37.9
55-64	266	51.5
65+	627	66.8

Overall hypertension prevalence was 52.7% in the sample.

Hypertension prevalence increases steadily with age, from 8.3% among adults aged 18-24 (n = 12) to 66.8% among adults aged 65 and older, an eight-fold difference.

Multiple Logistic Regression Results

Table 2: Adjusted Odds Ratios for Hypertension
Term	OR	95% CI	p-value
Age (per year)	1.06	[1.05, 1.07]	< 2e-16
Sex (Male vs Female)	1.27	[1.00, 1.62]	0.051141
BMI: Normal vs Underweight	2.10	[0.76, 6.76]	0.175212
BMI: Overweight vs Underweight	3.24	[1.18, 10.38]	0.030291
BMI: Obese vs Underweight	6.59	[2.39, 21.18]	0.000542
Physically Active	0.90	[0.70, 1.16]	0.419260
Current Smoker	1.07	[0.82, 1.41]	0.620763

Key Findings:
- Age: Each additional year of age increases the odds of hypertension by 6.1% (p < 0.001)
- BMI: Clear dose-response relationship, with odds rising across BMI categories (reference: Underweight)
- Overweight: 3.24 times the odds of underweight adults (p = 0.030)
- Obese: 6.59 times the odds of underweight adults (p < 0.001)
- Sex: Males had 27% higher odds (borderline significant, p = 0.051)
- Physical activity and smoking: Not significant in the adjusted model

BMI Dummy Variables

Table 3: Dummy Variable Coding for BMI Categories
BMI Category	Dummy (Normal)	Dummy (Overweight)	Dummy (Obese)
Underweight	0	0	0
Normal	1	0	0
Overweight	0	1	0
Obese	0	0	1
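As a minimal sketch of where this coding comes from: R builds these indicator columns automatically when a factor enters a model formula, using the first factor level as the reference. The level order below is an assumption chosen to match the table above.

```r
# Factor with Underweight as the first (reference) level
bmi_levels <- c("Underweight", "Normal", "Overweight", "Obese")
bmi_cat <- factor(bmi_levels, levels = bmi_levels)

# model.matrix() shows the treatment (dummy) coding R uses internally;
# dropping the first column removes the intercept
dummies <- model.matrix(~ bmi_cat)[, -1]
rownames(dummies) <- bmi_levels
print(dummies)
```

To compare against a different group, the reference level can be changed before fitting, for example with `relevel(bmi_cat, ref = "Normal")`.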

Table 4: BMI Category Odds Ratios (Reference: Underweight)
Comparison	OR	95% CI	p-value	Significant
Normal vs Underweight	2.10	[0.76, 6.76]	0.175212	No
Overweight vs Underweight	3.24	[1.18, 10.38]	0.030291	Yes
Obese vs Underweight	6.59	[2.39, 21.18]	0.000542	Yes

Interaction Test (Age × BMI)

Table 5: Likelihood Ratio Test for Interaction
Test	Chi_square	df	p_value
Age × BMI Interaction	2.24	3	0.525

The interaction is not statistically significant (p = 0.525), so there is no evidence that the effect of age on hypertension differs by BMI category. The relationship between age and hypertension appears consistent across BMI groups.

Model Diagnostics

##                    GVIF Df GVIF^(1/(2*Df))
## age_cont       1.126628  1        1.061428
## sex            1.016509  1        1.008221
## bmi_cat        1.103045  3        1.016480
## phys_active    1.024820  1        1.012334
## current_smoker 1.073574  1        1.036134

All VIF values were below 5, indicating no serious multicollinearity concerns.

Maximum Cook’s Distance was 0.033, with no observations exceeding the threshold of 1. No influential observations were detected.

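The diagnostic plots showed random scatter in the residuals and points close to the diagonal line in the Q-Q plot. Logistic regression does not assume normally distributed or constant-variance residuals, so these plots served as informal checks for outlying observations rather than formal tests of model assumptions.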

Table 6: Model Comparison by AIC
Model	AIC
Model A: Age only	1636.61
Model B: Age + Sex + BMI	1576.49
Model C: Full model	1579.50

## 
## Model A vs Model B (Adding Sex + BMI):

## Analysis of Deviance Table
## 
## Model 1: hypertension ~ age_cont
## Model 2: hypertension ~ age_cont + sex + bmi_cat
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      1279     1632.6                          
## 2      1275     1564.5  4   68.126 5.643e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Model B vs Model C (Adding Physical Activity + Smoking):

## Analysis of Deviance Table
## 
## Model 1: hypertension ~ age_cont + sex + bmi_cat
## Model 2: hypertension ~ age_cont + sex + bmi_cat + phys_active + current_smoker
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      1275     1564.5                     
## 2      1273     1563.5  2  0.99112   0.6092

4. Interpretation

Main Findings:

Age is a significant predictor of hypertension (OR = 1.061 per year, p < 0.001). For each decade of age, the odds of hypertension increase by approximately 80%.
BMI shows a strong dose-response relationship with hypertension:
- Overweight adults have 3.24 times the odds of underweight adults (p = 0.030)
- Obese adults have 6.59 times the odds of underweight adults (p < 0.001)
- Normal weight did not differ significantly from underweight
Sex shows a borderline association (OR = 1.27, p = 0.051), suggesting males may have higher odds of hypertension, though this did not reach conventional significance.
Physical activity and smoking were not significantly associated with hypertension after adjusting for age, sex, and BMI.
No significant interaction was found between age and BMI, indicating the age effect is consistent across BMI categories.
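The per-decade figure quoted for age follows from the per-year odds ratio, because effects add on the log-odds scale and therefore multiply on the odds scale. A short check, using the rounded OR of 1.061 from the results (so the last digit is approximate):

```r
# Per-year adjusted odds ratio for age, as reported (rounded)
or_per_year <- 1.061

# Ten years of age add 10 * beta on the log-odds scale
beta_age      <- log(or_per_year)
or_per_decade <- exp(10 * beta_age)   # same as or_per_year^10

# Percent increase in odds per decade
pct_increase_decade <- (or_per_decade - 1) * 100

round(or_per_decade, 2)        # about 1.81
round(pct_increase_decade, 0)  # about 81, i.e. roughly 80%
```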

Public Health Implications:
- Weight management should be prioritized for hypertension prevention, with even greater urgency for obese individuals
- Age-appropriate screening is important regardless of BMI category
- The consistent age effect across BMI groups simplifies risk assessment
- Interventions targeting physical activity and smoking, while important for overall health, may not directly impact hypertension risk in this population after accounting for age, sex, and BMI

5. Limitations

Cross-sectional design: Cannot establish causality – we can only describe associations, not determine whether risk factors cause hypertension.
Self-reported data: Physical activity and smoking status were self-reported, which may introduce recall bias or social desirability bias.
Wide confidence intervals for some BMI categories (especially Obese: 2.39-21.18) indicate imprecision, likely due to small sample size in the underweight reference group.
Limited generalizability: Results may not apply to populations different from this sample, such as other geographic regions or time periods.
Unmeasured confounders: Variables like diet, medication use, family history of hypertension, and socioeconomic factors were not available in this dataset.
Single year of data: Results may not reflect trends over time or long-term relationships.
Underweight reference group: The small sample size in the underweight category (n < 50) may affect stability of BMI comparisons.

6. Conclusion

Age and BMI are the strongest predictors of hypertension in this population, with a clear dose-response relationship between increasing BMI and hypertension risk. Physical activity and smoking were not significant predictors after adjusting for age, sex, and BMI. The final model (Age + Sex + BMI) provides a parsimonious yet powerful tool for understanding hypertension risk factors, though the cross-sectional design limits causal inference.

Part 2: Student Lab Activity

Lab Instructions

Task 1: Explore the Outcome Variable

Table 1: Frequency of Hypertension Status
Status	n	percent
No	606	47.3
Yes	675	52.7

Table 2: Hypertension Prevalence by Age Group
age_group	N	hypertension_cases	prevalence
18-24	12	1	8.3
25-34	77	15	19.5
35-44	138	42	30.4
45-54	161	61	37.9
55-64	266	137	51.5
65+	627	419	66.8

## # A tibble: 1 × 3
##   total_n cases prevalence
##     <int> <dbl>      <dbl>
## 1    1281   675       52.7
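The prevalence figures above can be re-derived from the counts in Table 2 alone. This is a small check built from the tabulated numbers, not the original code, which presumably summarised the row-level data:

```r
# Counts copied from Table 2
age_counts <- data.frame(
  age_group          = c("18-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  N                  = c(12, 77, 138, 161, 266, 627),
  hypertension_cases = c(1, 15, 42, 61, 137, 419)
)

# Prevalence (%) within each age group
age_counts$prevalence <- round(100 * age_counts$hypertension_cases / age_counts$N, 1)

# Overall prevalence (%) across all age groups
overall_prevalence <- round(
  100 * sum(age_counts$hypertension_cases) / sum(age_counts$N), 1
)

# Ratio of prevalence in the oldest group to the youngest group
fold_difference <- age_counts$prevalence[6] / age_counts$prevalence[1]

print(age_counts)
overall_prevalence   # 52.7
```

The youngest group contains only 12 adults (1 case), so its 8.3% estimate, and the fold difference based on it, is imprecise.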

Questions:

What is the overall prevalence of hypertension in the dataset?

52.7% of adults in the sample (675 of 1,281) have hypertension.

How does hypertension prevalence vary by age group?

Hypertension prevalence increases steadily with age, from 8.3% among adults aged 18-24 to 66.8% among adults aged 65 and older, an 8-fold difference.

Task 2: Build a Simple Logistic Regression Model

term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	0.048	0.296	-10.293	0	0.026	0.084
age_cont	1.055	0.005	10.996	0	1.045	1.065

Questions:

What is the odds ratio for age? Interpret this value.

Odds ratio for age = 1.055

For each 1-year increase in age, the odds of hypertension increase by 5.5%

Is the association statistically significant?

p < 0.001 (highly significant)

✅ Yes, the association is statistically significant

What is the 95% confidence interval for the odds ratio?

Lower bound: 1.045

Upper bound: 1.065

Interpretation: The confidence interval does NOT contain 1, confirming the significant positive association between age and hypertension
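The interpretation above can be reproduced from the two reported numbers for age. The sketch assumes the `std.error` column is on the log-odds scale (as it is when coefficients are exponentiated after fitting) and uses a Wald interval; the interval in the table may have been computed by profile likelihood, so agreement is only approximate.

```r
# Reported values for age in the simple model (rounded)
or_age    <- 1.055
se_log_or <- 0.005   # assumed to be the SE of the log odds ratio

# Percent change in odds per one-year increase in age
pct_change <- (or_age - 1) * 100

# Wald 95% CI: build it on the log scale, then exponentiate
beta_age <- log(or_age)
wald_ci  <- exp(beta_age + c(-1, 1) * qnorm(0.975) * se_log_or)

pct_change
round(wald_ci, 3)

# The interval excludes 1 when its lower bound is above 1
wald_ci[1] > 1
```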

Task 3: Create a Multiple Regression Model

Table 3: Multiple Logistic Regression Results
Term	OR	Std. Error	z-statistic	p-value	95% CI Lower	95% CI Upper
(Intercept)	0.008	0.653	-7.355	0.000	0.002	0.028
age_cont	1.061	0.005	11.234	0.000	1.050	1.073
sexMale	1.270	0.123	1.950	0.051	0.999	1.616
bmi_catNormal	2.097	0.546	1.356	0.175	0.759	6.756
bmi_catOverweight	3.241	0.543	2.166	0.030	1.183	10.385
bmi_catObese	6.585	0.545	3.459	0.001	2.394	21.176
phys_active	0.900	0.130	-0.808	0.419	0.697	1.162
current_smoker	1.071	0.139	0.495	0.621	0.817	1.407

## 
## 📊 **Age OR Comparison:**

## Simple model (age only): 1.055

## Multiple model (adjusted): 1.061

## Percent change: 0.6 %

## 
## 
## 📊 **BMI Category Results (Reference: Underweight):**

BMI Category Odds Ratios
Term	OR	Std. Error	z-statistic	p-value	95% CI Lower	95% CI Upper
bmi_catNormal	2.097	0.546	1.356	0.175	0.759	6.756
bmi_catOverweight	3.241	0.543	2.166	0.030	1.183	10.385
bmi_catObese	6.585	0.545	3.459	0.001	2.394	21.176

## 
## 
## 📊 **Strongest Predictors (Ranked by OR magnitude):**

Predictors Ranked by Effect Size
term	p.value	OR
bmi_catObese	0.000542	6.59
bmi_catOverweight	0.030291	3.24
bmi_catNormal	0.175212	2.10
sexMale	0.051141	1.27
current_smoker	0.620763	1.07
age_cont	< 2e-16	1.06
phys_active	0.419260	0.90

Questions:

How did the odds ratio for age change after adjusting for other variables?

The age OR increased slightly after adjustment, suggesting minimal confounding by the other variables.

What does this suggest about confounding?

The minimal change in the age coefficient after adjustment suggests that the relationship between age and hypertension is largely independent of sex, BMI, physical activity, and smoking status. Age is a strong, independent risk factor for hypertension.
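The percent change reported above is computed on the odds ratio scale. As a hedged illustration, the same comparison can also be made on the log-odds (coefficient) scale, where the change looks larger because the coefficients themselves are small. Both calculations use the rounded ORs from the output, so the coefficient-scale figure is sensitive to rounding.

```r
# Age odds ratios from the simple and the multiple model (rounded)
or_crude    <- 1.055
or_adjusted <- 1.061

# Change-in-estimate on the odds ratio scale (as reported above)
pct_change_or <- 100 * (or_adjusted - or_crude) / or_crude

# Change-in-estimate on the log-odds (coefficient) scale
pct_change_beta <- 100 * (log(or_adjusted) - log(or_crude)) / log(or_crude)

round(pct_change_or, 1)    # 0.6
round(pct_change_beta, 1)
```

Whichever scale is used, it is worth stating it explicitly when applying a change-in-estimate rule for confounding.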

Which variables are the strongest predictors of hypertension?

BMI is the strongest predictor of hypertension after age, with a clear dose-response relationship (higher BMI = higher odds).

Task 4: Interpret Dummy Variables

Table 4a: Dummy Variable Coding for BMI Categories
BMI Category	Dummy (Normal)	Dummy (Overweight)	Dummy (Obese)
Underweight	0	0	0
Normal	1	0	0
Overweight	0	1	0
Obese	0	0	1

## 
## ✅ **Reference category:** Underweight (all others compared to this group)

Table 4b: Odds Ratios for BMI Categories (Reference: Underweight)
Term	OR	Std. Error	z-statistic	p-value	95% CI Lower	95% CI Upper
bmi_catNormal	2.097	0.546	1.356	0.175	0.759	6.756
bmi_catOverweight	3.241	0.543	2.166	0.030	1.183	10.385
bmi_catObese	6.585	0.545	3.459	0.001	2.394	21.176

Table 4c: BMI Category Interpretation
Comparison	Odds Ratio	95% Confidence Interval	p-value	Significant?
Normal vs Underweight	2.10	[0.76, 6.76]	0.175212	No
Overweight vs Underweight	3.24	[1.18, 10.38]	0.030291	Yes
Obese vs Underweight	6.59	[2.39, 21.18]	0.000542	Yes

Questions:

What is the reference category for BMI?

The reference category for BMI is Underweight. All odds ratios compare each BMI category to underweight individuals.

Interpret the odds ratio for “Obese” compared to the reference category.

Three dummy variables were created:

Normal: 1 if Normal weight, 0 otherwise

Overweight: 1 if Overweight, 0 otherwise

Obese: 1 if Obese, 0 otherwise

Underweight serves as the reference group, with all dummy variables = 0.

How would you explain this to a non-statistician?

After adjusting for age, sex, physical activity, and smoking:

Normal weight vs Underweight: OR = 2.10 (95% CI: 0.76-6.76, p = 0.175)

Normal weight adults have 2.1 times higher odds of hypertension compared to underweight adults, but this difference is not statistically significant (p > 0.05). The wide confidence interval crossing 1 indicates imprecision, likely due to small sample size in the underweight reference group.

Overweight vs Underweight: OR = 3.24 (95% CI: 1.18-10.38, p = 0.030)

Overweight adults have 3.24 times higher odds of hypertension compared to underweight adults. This difference is statistically significant (p < 0.05).

Obese vs Underweight: OR = 6.59 (95% CI: 2.39-21.18, p < 0.001)

Obese adults have 6.59 times higher odds of hypertension compared to underweight adults. This represents a highly significant, strong association (p < 0.001).

Task 5: Test for Interaction

Table 5a: Logistic Regression with Age × BMI Interaction
Term	OR	Std. Error	z-statistic	p-value	95% CI Lower	95% CI Upper
(Intercept)	0.235	2.558	-0.566	0.571	0.000	23.284
age_cont	1.005	0.042	0.117	0.907	0.930	1.110
bmi_catNormal	0.067	2.650	-1.020	0.308	0.001	40.725
bmi_catOverweight	0.073	2.624	-1.000	0.317	0.001	42.717
bmi_catObese	0.286	2.591	-0.484	0.629	0.003	161.547
sexMale	1.278	0.123	1.989	0.047	1.004	1.627
phys_active	0.894	0.131	-0.858	0.391	0.691	1.155
current_smoker	1.079	0.139	0.546	0.585	0.822	1.418
age_cont:bmi_catNormal	1.058	0.043	1.287	0.198	0.956	1.147
age_cont:bmi_catOverweight	1.064	0.043	1.431	0.152	0.962	1.152
age_cont:bmi_catObese	1.052	0.043	1.186	0.236	0.952	1.139

Table 5b: Age × BMI Interaction Terms
Interaction Term	OR	Std. Error	z-statistic	p-value	95% CI Lower	95% CI Upper
age_cont:bmi_catNormal	1.058	0.043	1.287	0.198	0.956	1.147
age_cont:bmi_catOverweight	1.064	0.043	1.431	0.152	0.962	1.152
age_cont:bmi_catObese	1.052	0.043	1.186	0.236	0.952	1.139

Table 5c: Likelihood Ratio Test for Interaction
Resid. Df	Resid. Dev	Df	Deviance	Pr(>Chi)
1273	1563.496	NA	NA	NA
1270	1561.260	3	2.236	0.525

## 
## 📊 **LIKELIHOOD RATIO TEST RESULTS:**

## Chi-squared statistic: 2.24

## Degrees of freedom: 3

## p-value: 0.5248

## ❌ **CONCLUSION:** The interaction is NOT statistically significant (p > 0.05).
## This means the effect of age on hypertension does NOT significantly differ by BMI category.
## The relationship between age and hypertension is consistent across BMI groups.

## 
## 📊 **STRATIFIED ANALYSIS - Age Effect by BMI Category:**

## 
## Underweight: OR = 1 (95% CI: 0.93-1.11), p = 0.918

## 
## Normal: OR = 1.06 (95% CI: 1.04-1.09), p = 4.18e-08

## 
## Overweight: OR = 1.07 (95% CI: 1.05-1.09), p = 6.73e-12

## 
## Obese: OR = 1.06 (95% CI: 1.04-1.07), p = 4.76e-14

Questions:

Is the interaction term statistically significant?

The likelihood ratio test comparing models with and without the Age × BMI interaction yielded a p-value of 0.525. Since this p-value is greater than 0.05, the interaction is NOT statistically significant.
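The test statistic and p-value can be recovered directly from the residual deviances in Table 5c: the statistic is the drop in deviance, referred to a chi-squared distribution with degrees of freedom equal to the number of added interaction terms.

```r
# Residual deviances and residual df from Table 5c
dev_no_interaction   <- 1563.496
dev_with_interaction <- 1561.260
df_diff              <- 1273 - 1270   # three interaction terms added

# Likelihood ratio statistic = reduction in deviance
lrt_stat <- dev_no_interaction - dev_with_interaction

# Upper-tail chi-squared probability
p_value <- pchisq(lrt_stat, df = df_diff, lower.tail = FALSE)

round(lrt_stat, 2)   # 2.24
round(p_value, 3)
```

With fitted model objects, `anova(model_without, model_with, test = "Chisq")` performs the same calculation; the object names here are placeholders.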

What does this mean in epidemiologic terms (effect modification)?

The non-significant interaction provides no evidence of effect modification: the relationship between age and hypertension appears consistent across BMI categories. In other words, the effect of age on hypertension risk does not differ significantly between underweight, normal weight, overweight, and obese individuals.

In epidemiologic terms, we found no evidence that BMI is an effect modifier of the age-hypertension relationship. The absence of interaction simplifies interpretation: we can discuss the main effects of age and BMI independently without worrying about how their combination might alter risk.

Create a visualization showing predicted probabilities by age and BMI category

The plot of predicted probabilities shows roughly parallel lines across BMI categories, with each line increasing at a similar slope. This visual pattern supports the statistical finding of no significant interaction. All BMI groups show the same pattern: as age increases, hypertension probability increases at approximately the same rate.

Task 6: Model Diagnostics

## ========================================

##                    GVIF Df GVIF^(1/(2*Df))
## age_cont       1.126628  1        1.061428
## sex            1.016509  1        1.008221
## bmi_cat        1.103045  3        1.016480
## phys_active    1.024820  1        1.012334
## current_smoker 1.073574  1        1.036134

## 
## VIF Interpretation:

## - VIF < 5: No concern

## - VIF 5-10: Moderate concern

## - VIF > 10: Serious concern

## ========================================

## Cook's D summary:

##   Min: 0

##   Max: 0.0331

##   Mean: 8e-04

##   Observations with Cook's D > 1: 0

Questions:

Are there any concerns about multicollinearity?

All VIF values are < 5, indicating no serious multicollinearity.
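Because `bmi_cat` contributes three dummy variables, the output reports a generalized VIF (GVIF) together with the adjusted value GVIF^(1/(2*Df)), which puts terms with different degrees of freedom on a comparable scale. The adjusted column can be reproduced from the first two columns:

```r
# GVIF and degrees of freedom copied from the output above
gvif <- c(age_cont = 1.126628, sex = 1.016509, bmi_cat = 1.103045,
          phys_active = 1.024820, current_smoker = 1.073574)
dfs  <- c(age_cont = 1, sex = 1, bmi_cat = 3,
          phys_active = 1, current_smoker = 1)

# Adjusted GVIF: for a 1-df term this is just sqrt(VIF)
adj_gvif <- gvif^(1 / (2 * dfs))
round(adj_gvif, 6)

# Squaring the adjusted value gives a quantity comparable to the usual VIF cut-offs
all(adj_gvif^2 < 5)
```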

Are there any influential observations that might affect your results?

None detected. The Residuals vs Leverage plot shows all points within Cook’s distance contours, indicating no single observation unduly influences the results.

What would you do if you found serious violations?

If violations were found, I would:
- For multicollinearity: Remove or combine correlated variables
- For influential points: Conduct sensitivity analysis with/without them
- For non-normality: Rely on large sample robustness or use transformations
- For heteroscedasticity: Use robust standard errors
- Always document all decisions transparently

Task 7: Model Comparison

## ========================================

##         df      AIC
## model_A  2 1636.613
## model_B  6 1576.487
## model_C  8 1579.496

## 
## ✅ Best model by AIC: model_B

## ========================================

## 
## Model A vs Model B (Adding Sex + BMI):

## Analysis of Deviance Table
## 
## Model 1: hypertension ~ age_cont
## Model 2: hypertension ~ age_cont + sex + bmi_cat
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1      1279     1632.6                          
## 2      1275     1564.5  4   68.126 5.643e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Model B vs Model C (Adding Physical Activity + Smoking):

## Analysis of Deviance Table
## 
## Model 1: hypertension ~ age_cont + sex + bmi_cat
## Model 2: hypertension ~ age_cont + sex + bmi_cat + phys_active + current_smoker
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1      1275     1564.5                     
## 2      1273     1563.5  2  0.99112   0.6092

Questions:

Which model has the best fit based on AIC?

Based on the AIC values, Model B (Age + Sex + BMI) has the lowest AIC at 1576.49, indicating it provides the best fit to the data among the three models compared. Model C has a slightly higher AIC (1579.50), and Model A has the highest AIC (1636.61).

Is the added complexity of the full model justified?

The likelihood ratio tests show that:

- Adding sex and BMI (Model A → B) significantly improved model fit (χ² = 68.13, df = 4, p < 0.001). This indicates that sex and BMI are important predictors of hypertension.
- Adding physical activity and smoking (Model B → C) did NOT significantly improve model fit (χ² = 0.99, df = 2, p = 0.609). The p-value of 0.609 is well above 0.05, indicating that physical activity and smoking do not add meaningful predictive value beyond age, sex, and BMI.

Therefore, the added complexity of the full model (Model C) is not justified by the data. The non-significant likelihood ratio test suggests that physical activity and smoking can be omitted without loss of predictive power.
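The AIC values and the Model B vs Model C test can be cross-checked from the printed output. For a logistic model fitted to individual 0/1 outcomes, the residual deviance equals -2 × log-likelihood, so AIC = deviance + 2 × (number of parameters). The deviances below are the rounded values printed in the deviance tables, so the AICs agree only to about one decimal place.

```r
# Residual deviances (rounded, from the deviance tables) and parameter counts
resid_dev <- c(model_A = 1632.6, model_B = 1564.5, model_C = 1563.5)
n_params  <- c(model_A = 2, model_B = 6, model_C = 8)

# AIC = deviance + 2k for ungrouped binary data
aic <- resid_dev + 2 * n_params
best_model <- names(which.min(aic))

# Model B vs Model C: deviance change as printed, with 2 added parameters
lrt_bc <- 0.99112
p_bc   <- pchisq(lrt_bc, df = 2, lower.tail = FALSE)

round(aic, 1)
best_model        # "model_B"
round(p_bc, 4)    # 0.6092
```

The AIC penalty of 2 per parameter means Model C would need to lower the deviance by more than 4 to beat Model B; it lowers it by about 1.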

Which model would you choose for your final analysis? Why?

Based on these results, I select Model B (Age + Sex + BMI) as the final model. It has the lowest AIC, and the likelihood ratio test confirms that the additional variables in Model C do not significantly improve prediction. This model is both parsimonious and statistically sound, making it the most appropriate choice for addressing the research question.

Lab Report Guidelines

Write a brief report (1-2 pages) summarizing your findings:

Introduction: State your research question
Methods: Describe your analytic approach
Results: Present key findings with tables and figures
Interpretation: Explain what your results mean
Limitations: Discuss potential issues with your analysis

Submission: Submit your completed R Markdown file and knitted HTML report.

Summary

Key Concepts Covered

Statistical modeling describes relationships between variables
Regression types depend on the outcome variable type
Logistic regression is appropriate for binary outcomes
Multiple regression controls for confounding
Dummy variables represent categorical predictors
Interactions test for effect modification
Model diagnostics check assumptions and identify problems
Model comparison helps select the best model

Important Formulas

Logistic Regression:

\[\text{logit}(p) = \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p\]

Odds Ratio:

\[\text{OR} = e^{\beta_i}\]

Predicted Probability:

\[p = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}\]
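As a sketch of how these formulas fit together, the odds ratios from Table 3 can be turned back into coefficients and used to compute a predicted probability. The helper `predict_prob()` and the example person are hypothetical, and the inputs are the rounded ORs (the intercept in particular is rounded to 0.008), so the probabilities are illustrative rather than exact model predictions.

```r
# Odds ratios from Table 3 (rounded); beta = log(OR)
or <- c("(Intercept)" = 0.008, age_cont = 1.061, sexMale = 1.270,
        bmi_catNormal = 2.097, bmi_catOverweight = 3.241, bmi_catObese = 6.585,
        phys_active = 0.900, current_smoker = 1.071)
beta <- log(or)

# Hypothetical helper: predicted probability for one covariate pattern
predict_prob <- function(age, male, bmi, active, smoker) {
  # Underweight is the reference category, so it contributes 0
  bmi_term <- if (bmi == "Underweight") 0 else beta[[paste0("bmi_cat", bmi)]]
  eta <- beta[["(Intercept)"]] +
    beta[["age_cont"]] * age +
    beta[["sexMale"]] * male +
    bmi_term +
    beta[["phys_active"]] * active +
    beta[["current_smoker"]] * smoker
  plogis(eta)   # inverse logit: exp(eta) / (1 + exp(eta))
}

# Example: 60-year-old inactive, non-smoking man, obese vs underweight
p_obese <- predict_prob(60, male = 1, bmi = "Obese", active = 0, smoker = 0)
p_under <- predict_prob(60, male = 1, bmi = "Underweight", active = 0, smoker = 0)

# The ratio of the two odds recovers the obese-vs-underweight OR
odds <- function(p) p / (1 - p)
odds(p_obese) / odds(p_under)   # 6.585
```

The last line illustrates why an odds ratio is not a risk ratio: the odds differ by a factor of 6.585, while the ratio of the two probabilities is smaller.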

References

Agresti, A. (2018). An Introduction to Categorical Data Analysis (3rd ed.). Wiley.
Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied Logistic Regression (3rd ed.). Wiley.
Vittinghoff, E., Glidden, D. V., Shiboski, S. C., & McCulloch, C. E. (2012). Regression Methods in Biostatistics (2nd ed.). Springer.
Centers for Disease Control and Prevention. (2023). Behavioral Risk Factor Surveillance System.

Session Info

## R version 4.4.2 (2024-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 10 x64 (build 19045)
## 
## Matrix products: default
## 
## 
## locale:
## [1] LC_COLLATE=English_United States.utf8 
## [2] LC_CTYPE=English_United States.utf8   
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.utf8    
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] ggeffects_2.3.2  car_3.1-5        carData_3.0-6    broom_1.0.12    
##  [5] kableExtra_1.4.0 knitr_1.51       lubridate_1.9.3  forcats_1.0.0   
##  [9] stringr_1.5.1    dplyr_1.1.4      purrr_1.0.2      readr_2.1.5     
## [13] tidyr_1.3.1      tibble_3.2.1     ggplot2_4.0.2    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.10        generics_0.1.4     xml2_1.3.6         stringi_1.8.4     
##  [5] hms_1.1.4          digest_0.6.37      magrittr_2.0.3     evaluate_1.0.5    
##  [9] grid_4.4.2         timechange_0.3.0   RColorBrewer_1.1-3 fastmap_1.2.0     
## [13] jsonlite_2.0.0     backports_1.5.0    Formula_1.2-5      viridisLite_0.4.3 
## [17] scales_1.4.0       textshaping_0.4.0  jquerylib_0.1.4    abind_1.4-8       
## [21] cli_3.6.3          rlang_1.1.4        withr_3.0.2        cachem_1.1.0      
## [25] yaml_2.3.10        otel_0.2.0         datawizard_1.3.0   tools_4.4.2       
## [29] tzdb_0.4.0         vctrs_0.6.5        R6_2.6.1           lifecycle_1.0.5   
## [33] insight_1.4.6      pkgconfig_2.0.3    pillar_1.11.1      bslib_0.10.0      
## [37] gtable_0.3.6       glue_1.8.0         systemfonts_1.3.1  haven_2.5.5       
## [41] xfun_0.56          tidyselect_1.2.1   rstudioapi_0.18.0  farver_2.1.2      
## [45] htmltools_0.5.8.1  labeling_0.4.3     rmarkdown_2.30     svglite_2.2.2     
## [49] compiler_4.4.2     S7_0.2.1

Statistical Modeling in Epidemiology

EPI 553 - Advanced Epidemiologic Methods

Fizza Zaheer

`02/24/2026`
