Time: ~30 minutes
Goal: Learn to work with real public health survey data
in R
Learning Objectives:
The NHANES is the gold standard for population-based health and nutritional data in the United States, conducted by the CDC’s National Center for Health Statistics. It combines:
Real-world use: NHANES data informs Healthy People objectives, food and nutrition guidelines, and health disparities research.
Today’s task: Explore NHANES data on cardiovascular health, physical activity, and demographic disparities—key epidemiological outcomes.
# Select key variables for analysis
nhanes_analysis <- NHANES %>%
dplyr::select(
ID,
Gender, # Sex (Male/Female)
Age, # Age in years
Race1, # Race/ethnicity
Education, # Education level
BMI, # Body Mass Index
Pulse, # Resting heart rate
BPSys1, # Systolic blood pressure (1st reading)
BPDia1, # Diastolic blood pressure (1st reading)
PhysActive, # Physically active (Yes/No)
SmokeNow, # Current smoking status
Diabetes, # Diabetes diagnosis (Yes/No)
HealthGen # General health rating
) %>%
# Create a binary hypertension indicator (BPSys1 >= 140 OR BPDia1 >= 90)
mutate(
Hypertension = factor(ifelse(BPSys1 >= 140 | BPDia1 >= 90, "Yes", "No"))
)
# Create age groups for analysis
#mutate = create
nhanes_analysis <- nhanes_analysis %>%
mutate(
Age_Group = cut(Age,
breaks = c(0, 20, 35, 50, 65, 100),
labels = c("18-20", "21-35", "36-50", "51-65", "65+"))
)“How does hypertension prevalence vary by education level?”
Write code to:
# Your code here:
health_by_education <- nhanes_analysis %>%
group_by(Education) %>%
summarise(
N = n(),
Mean_SysBP = round(mean(BPSys1, na.rm = TRUE), 2),
Pct_Hypertension = round(
sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2)
)
print(health_by_education)## # A tibble: 6 × 4
## Education N Mean_SysBP Pct_Hypertension
## <fct> <int> <dbl> <dbl>
## 1 8th Grade 451 128. 28.3
## 2 9 - 11th Grade 888 124. 17.3
## 3 High School 1517 124. 18.9
## 4 Some College 2267 122. 16.6
## 5 College Grad 2098 119. 13.1
## 6 <NA> 2779 106. 0.72
Create a bar chart showing hypertension by education level:
# Create visualization here:
health_by_education %>%
filter(!is.na(Education)) %>%
ggplot(aes(x = Education, y = Pct_Hypertension)) +
geom_col(fill = "steelblue", alpha = 0.7) +
geom_text(aes(label = paste0(Pct_Hypertension, "%")),
vjust = -0.5, size = 3) +
labs(
title = "Hypertension Prevalence by Education Level",
x = "Education Level",
y = "Percent with Hypertension (%)",
caption = "Source: NHANES"
) +
ylim(0, 50) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))“What does this pattern tell us about health disparities and social determinants?”
Consider: - Which education groups have highest/lowest hypertension? - What might explain these differences? - Why does this matter for public health?
The data reveals that lower education groups have significantly higher hypertension rates, compared to the college education groups who have the lower rates of hypertension. Individuals who face inequalities in regards to social determinants of health, such as education, are more likely to have poorer health outcomes when compared to those who are more fortunate, when there are inequalities in regards to education individuals are less likely to have the amount of health literacy they may need to keep themselves in good health. This matters for public health because it indicates programs must shift focus toward undeserved populations to effectively combat rising hypertension rates in the population.
| Criteria | Excellent (Full Credit) | Adequate | Needs Work |
|---|---|---|---|
| Identifies pattern | Explicitly states which groups have highest/lowest rates | Mentions direction but lacks specificity | Vague or incorrect about pattern |
| Explains mechanism | References social determinants, access, or health literacy | Mentions inequality but lacks detail | No explanation provided |
| Public health relevance | Discusses implications for policy or programs | Notes importance but general | Missing public health connection |
| Writing quality | Clear, 2-3 well-written sentences | Adequate but could be clearer | Incomplete or unclear |
group_by() (5 pts)Lab01_NHANES_YourName.Rmd✓ Loading data from R packages
✓ Data exploration with str(), summary(),
head()
✓ Grouping and summarizing with group_by() and
summarise()
✓ Creating derived variables with mutate()
✓ Calculating epidemiological statistics
✓ Stratification to reveal disparities
✓ Professional visualization with ggplot2
✓ Publication-ready tables
→ Make sure you ran data(NHANES) after loading the
package
→ This is normal! Always use na.rm = TRUE in
calculations
→ Use filter(!is.na(Variable)) to remove missing
groups
## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kableExtra_1.4.0 knitr_1.50 NHANES_2.1.0 lubridate_1.9.4 forcats_1.0.0
## [6] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.4 readr_2.1.5 tidyr_1.3.1
## [11] tibble_3.2.1 ggplot2_3.5.2 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.2 tidyselect_1.2.1 xml2_1.3.8
## [6] jquerylib_0.1.4 textshaping_1.0.1 systemfonts_1.3.1 scales_1.4.0 yaml_2.3.10
## [11] fastmap_1.2.0 R6_2.6.1 labeling_0.4.3 generics_0.1.4 svglite_2.2.2
## [16] bslib_0.9.0 pillar_1.10.2 RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.1.6
## [21] utf8_1.2.5 cachem_1.1.0 stringi_1.8.7 xfun_0.52 sass_0.4.10
## [26] viridisLite_0.4.2 timechange_0.3.0 cli_3.6.5 withr_3.0.2 magrittr_2.0.3
## [31] digest_0.6.37 grid_4.5.2 rstudioapi_0.17.1 hms_1.1.3 lifecycle_1.0.4
## [36] vctrs_0.6.5 evaluate_1.0.3 glue_1.8.0 farver_2.1.2 rmarkdown_2.29
## [41] tools_4.5.2 pkgconfig_2.0.3 htmltools_0.5.8.1
Lab Activity 1 Complete!
Last updated: January 29, 2026