This document provides complete solutions for all three guided practice tasks. Students should complete their own work first, then use this key to check their understanding.
# Load required packages
library(tidyverse)
library(NHANES)
library(knitr)
library(kableExtra)
# Load the NHANES data
data(NHANES)
# Create analysis dataset (same as in lab)
nhanes_analysis <- NHANES %>%
  select(
    ID,
    Gender,
    Age,
    Race1,
    Education,
    BMI,
    Pulse,
    BPSys1,
    BPDia1,
    PhysActive,
    SmokeNow,
    Diabetes,
    HealthGen
  ) %>%
  mutate(
    Hypertension = ifelse(BPSys1 >= 140 | BPDia1 >= 90, "Yes", "No"),
    # Labels match the bins; include.lowest = TRUE keeps age 0 in the first bin
    Age_Group = cut(Age,
                    breaks = c(0, 20, 35, 50, 65, 100),
                    labels = c("0-20", "21-35", "36-50", "51-65", "66+"),
                    include.lowest = TRUE)
  )

“How does hypertension prevalence vary by education level?”
# Group by education level and calculate key statistics
health_by_education <- nhanes_analysis %>%
  group_by(Education) %>%
  summarise(
    N = n(),
    Mean_SysBP = round(mean(BPSys1, na.rm = TRUE), 2),
    Pct_Hypertension = round(
      sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
    .groups = 'drop'
  )
print(health_by_education)

## # A tibble: 6 × 4
##   Education          N Mean_SysBP Pct_Hypertension
##   <fct>          <int>      <dbl>            <dbl>
## 1 8th Grade        451       128.            28.3
## 2 9 - 11th Grade   888       124.            17.3
## 3 High School     1517       124.            18.9
## 4 Some College    2267       122.            16.6
## 5 College Grad    2098       119.            13.1
## 6 <NA>            2779       106.             0.72
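Key findings:
- Sample sizes vary across education groups (451 to 2,267, plus 2,779 participants with no recorded education level)
- Mean systolic BP generally increases as education decreases (about 119 mmHg among college graduates vs. about 128 mmHg in the 8th Grade group)
- Hypertension prevalence shows a clear social gradient (13.1% among college graduates vs. 28.3% in the 8th Grade group)

What to look for:
- Higher hypertension rates in lower education groups (social determinant)
- This is a classic example of health inequality
- Missing education data should be noted and reported. In NHANES, Education is recorded only for participants aged 20 and older, so the NA group is largely made up of children and adolescents, which is consistent with its low mean systolic BP and hypertension rate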
# Create bar chart
health_by_education %>%
  filter(!is.na(Education)) %>%
  ggplot(aes(x = Education, y = Pct_Hypertension)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  geom_text(aes(label = paste0(Pct_Hypertension, "%")),
            vjust = -0.5, size = 3) +
  labs(
    title = "Hypertension Prevalence by Education Level",
    x = "Education Level",
    y = "Percent with Hypertension (%)",
    caption = "Source: NHANES"
  ) +
  ylim(0, 50) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

| Code Element | Purpose |
|---|---|
| filter(!is.na(Education)) | Removes missing education data |
| geom_col() | Creates bar chart |
| geom_text() | Adds percentage labels on bars |
| vjust = -0.5 | Positions labels above bars |
| ylim(0, 50) | Sets y-axis limits (0 to 50) |
| angle = 45 | Rotates x-axis labels for readability |
“What does this pattern tell us about health disparities and social determinants?”

This pattern shows a clear education–health gradient, a classic example of how social determinants of health shape outcomes such as hypertension.

What the pattern tells us

1. Lower education is associated with higher hypertension prevalence. People with less formal education (e.g., 8th grade or less) show the highest rates of hypertension, while college graduates show the lowest. This suggests that education matters beyond academics: in these data, more education goes together with better cardiovascular health.

2. Education acts as a proxy for multiple social determinants. Education level is closely linked to:
   - Income and job stability → ability to afford healthy food, housing, and healthcare
   - Health literacy → understanding nutrition, medications, blood pressure screening, and risk factors
   - Access to healthcare → insurance coverage, preventive care, regular checkups
   - Chronic stress exposure → financial strain, job insecurity, unsafe neighborhoods

   These factors accumulate over time and increase cardiovascular risk.

3. The gradient (not just extremes) matters. Hypertension doesn’t drop only at college graduation. It generally declines as education increases, from 28.3% in the 8th Grade group to 13.1% among college graduates, although the decline is not strictly stepwise: the High School group (18.9%) sits slightly above the 9 - 11th Grade group (17.3%). That broad pattern suggests disparities are systemic, not driven by individual choice alone.

4. It reflects structural, not biological, differences. There is no biological reason education itself changes blood pressure. The pattern points to structural inequities in:
   - Economic opportunity
   - Access to preventive care
   - Environmental and occupational exposures
   - Long-term stress and allostatic load
Here’s a strong answer (2-3 sentences):
The data reveal a clear social gradient in hypertension prevalence across education levels, with lower-educated groups showing substantially higher hypertension rates than college-educated groups. This pattern reflects the broader concept of social determinants of health—factors like income, access to preventive care, stress, and health literacy that are closely linked to education. From a public health perspective, this disparity suggests that cardiovascular disease prevention programs should be targeted to reach underserved populations and that addressing education and socioeconomic inequality may be critical for reducing hypertension burden in the population.
| Criteria | Excellent (Full Credit) | Adequate | Needs Work |
|---|---|---|---|
| Identifies pattern | Explicitly states which groups have highest/lowest rates | Mentions direction but lacks specificity | Vague or incorrect about pattern |
| Explains mechanism | References social determinants, access, or health literacy | Mentions inequality but lacks detail | No explanation provided |
| Public health relevance | Discusses implications for policy or programs | Notes importance but general | Missing public health connection |
| Writing quality | Clear, 2-3 well-written sentences | Adequate but could be clearer | Incomplete or unclear |
# How do gender differences in hypertension vary by education?
health_strat <- nhanes_analysis %>%
  group_by(Education, Gender) %>%
  summarise(
    N = n(),
    Pct_Hypertension = round(
      sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
    .groups = 'drop'
  ) %>%
  filter(!is.na(Education))
print(health_strat)

## # A tibble: 10 × 4
##    Education      Gender     N Pct_Hypertension
##    <fct>          <fct>  <int>            <dbl>
##  1 8th Grade      female   209            22.2
##  2 8th Grade      male     242            33.3
##  3 9 - 11th Grade female   402            16.1
##  4 9 - 11th Grade male     486            18.3
##  5 High School    female   770            20.5
##  6 High School    male     747            17.3
##  7 Some College   female  1197            17.5
##  8 Some College   male    1070            15.8
##  9 College Grad   female  1099             9.85
## 10 College Grad   male     999            16.5
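
To put a number on the question this task asks, one option is to compute the male-minus-female difference in prevalence within each education level. The sketch below re-enters the percentages from the printed health_strat output so that it runs on its own. The names strat_values, gender_gap, and Male_Minus_Female are introduced here for illustration and are not part of the lab.

```r
library(dplyr)
library(tidyr)

# Percentages re-entered from the printed health_strat output (illustration only)
strat_values <- data.frame(
  Education = rep(c("8th Grade", "9 - 11th Grade", "High School",
                    "Some College", "College Grad"), each = 2),
  Gender = rep(c("female", "male"), times = 5),
  Pct_Hypertension = c(22.2, 33.3, 16.1, 18.3, 20.5, 17.3, 17.5, 15.8, 9.85, 16.5)
)

# One row per education level, one column per gender, plus the difference
gender_gap <- strat_values %>%
  pivot_wider(names_from = Gender, values_from = Pct_Hypertension) %>%
  mutate(Male_Minus_Female = round(male - female, 2))

print(gender_gap)
```

In these figures the gap is largest at the two ends of the education scale (11.1 points at 8th Grade, 6.65 at College Grad) and reverses slightly for High School and Some College. When working from the lab object itself, drop N first (health_strat %>% select(-N)) so that pivot_wider() collapses each education level into a single row.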
health_strat %>%
  ggplot(aes(x = Education, y = Pct_Hypertension, fill = Gender)) +
  geom_col(position = "dodge", alpha = 0.8) +
  labs(
    title = "Hypertension Prevalence by Education and Gender",
    x = "Education Level",
    y = "Prevalence (%)",
    fill = "Gender",
    caption = "Source: NHANES"
  ) +
  ylim(0, 60) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Omitting na.rm = TRUE

❌ Incorrect:
✅ Correct:
Why it matters: Without na.rm = TRUE, if there are any NA values in the Hypertension variable, sum() returns NA instead of a number.
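
A minimal sketch of the difference, using a small made-up vector instead of the NHANES data (the name htn is hypothetical):

```r
# Hypothetical hypertension indicator with one missing value
htn <- c("Yes", "No", NA, "Yes", "No")

# Without na.rm = TRUE: the comparison yields an NA and sum() propagates it
n_yes_bad <- sum(htn == "Yes")

# With na.rm = TRUE: the missing value is skipped
n_yes <- sum(htn == "Yes", na.rm = TRUE)

# Divide by the number of non-missing values, as the lab code does
pct <- round(n_yes / sum(!is.na(htn)) * 100, 2)

print(n_yes_bad)  # NA
print(n_yes)      # 2
print(pct)        # 50
```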
❌ Incorrect:
✅ Correct:
health_by_education %>%
  filter(!is.na(Education)) %>%
  ggplot(aes(x = Education, y = Pct_Hypertension)) +
  geom_col()

Why it matters: Without the filter, NA is plotted as its own category, adding an extra, nearly empty bar labeled NA that makes the chart harder to read.
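
The effect of the filter step can be checked on the table before plotting. This sketch uses a small made-up summary table (toy_summary is hypothetical, with values borrowed from the output earlier in this key) to show that the NA row is dropped:

```r
library(dplyr)

# Hypothetical summary table that includes a row for missing education
toy_summary <- data.frame(
  Education = factor(c("High School", "College Grad", NA)),
  Pct_Hypertension = c(18.9, 13.1, 0.72)
)

# Keep only rows where Education is recorded
toy_plot_data <- toy_summary %>%
  filter(!is.na(Education))

nrow(toy_summary)     # 3
nrow(toy_plot_data)   # 2
```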
❌ Incorrect:
✅ Correct:
Pct_Hypertension = round(
  sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2)

Why it matters: Rounding to 2 decimal places makes tables more readable and professional.
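
As a quick illustration with made-up counts (n_yes and n_valid are hypothetical, not NHANES values), compare the raw and rounded percentages:

```r
# Hypothetical counts: 17 hypertensive participants out of 60 with known status
n_yes <- 17
n_valid <- 60

pct_raw <- n_yes / n_valid * 100   # unrounded percentage
pct_rounded <- round(pct_raw, 2)   # rounded to 2 decimal places

print(pct_raw)      # 28.33333
print(pct_rounded)  # 28.33
```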
Omitting .groups = 'drop' in Grouped Summarise

❌ Incorrect:
✅ Correct:
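health_by_education <- nhanes_analysis %>%
  group_by(Education) %>%
  summarise(N = n(), .groups = 'drop')

Why it matters: With two or more grouping variables, such as group_by(Education, Gender), summarise() drops only the last grouping level by default and returns a tibble that is still grouped, which can cause unexpected behavior in downstream operations. With a single grouping variable the default result is already ungrouped, but setting .groups = 'drop' makes that explicit and keeps the code consistent.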
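
The sketch below uses a small made-up data frame (toy is hypothetical) to show how the grouping of the result differs with and without .groups = "drop" when there are two grouping variables:

```r
library(dplyr)

# Made-up data, for illustration only
toy <- data.frame(
  Education = c("HS", "HS", "College", "College"),
  Gender = c("female", "male", "female", "male"),
  Hypertension = c("Yes", "No", "No", "Yes")
)

# Default: only the last grouping variable (Gender) is dropped,
# and dplyr prints a message about the remaining grouping
still_grouped <- toy %>%
  group_by(Education, Gender) %>%
  summarise(N = n())

# Explicitly drop all grouping
ungrouped <- toy %>%
  group_by(Education, Gender) %>%
  summarise(N = n(), .groups = "drop")

group_vars(still_grouped)  # "Education"
group_vars(ungrouped)      # character(0)
```

If a later step such as mutate() or slice() is applied to still_grouped, it runs within each Education group instead of over the whole table.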
group_by() (5 pts)

By completing this lab, students should be able to:
Why did we focus on education as a determinant? (Answer: Education is a key social determinant strongly linked to health outcomes; it’s also reliably measured in surveys)
What other stratifications would be informative? (Answer: Income, occupation, healthcare access, geographic region, time trends)
How would you explain the education gradient to a public health administrator? (Answer: Emphasize actionable policy implications—targeted interventions for low-education groups)
What is the causal pathway between education and hypertension? (Answer: Discuss mechanisms: health literacy, income, access to care, stress, health behaviors)
Are there potential confounders we haven’t considered? (Answer: Age, race/ethnicity, gender—note how students might conduct stratified analyses)
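
One way students might run the stratified analyses mentioned in the last question is sketched below. The helper prevalence_by() and the toy data frame are hypothetical and made up for illustration; the summary logic mirrors the summarise() call used in this lab.

```r
library(dplyr)

# Hypothetical helper: hypertension prevalence within any combination of strata
prevalence_by <- function(data, ...) {
  data %>%
    group_by(...) %>%
    summarise(
      N = n(),
      Pct_Hypertension = round(
        sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
      .groups = "drop"
    )
}

# Made-up data, for illustration only (not NHANES values)
toy <- data.frame(
  Age_Group = c("21-35", "21-35", "21-35", "21-35",
                "51-65", "51-65", "51-65", "51-65"),
  Education = c("High School", "High School", "College Grad", "College Grad",
                "High School", "High School", "College Grad", "College Grad"),
  Hypertension = c("No", "Yes", "No", "No", "Yes", "Yes", "Yes", NA)
)

# Education gradient within each age group
strat <- prevalence_by(toy, Age_Group, Education)
print(strat)
```

With the lab data, the equivalent call would be prevalence_by(nhanes_analysis, Age_Group, Education). If the education gradient persists within age groups, age alone is unlikely to explain it.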
## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
##
## Matrix products: default
## BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] kableExtra_1.4.0 knitr_1.51 NHANES_2.1.0 lubridate_1.9.4 forcats_1.0.1
## [6] stringr_1.6.0 dplyr_1.1.4 purrr_1.2.1 readr_2.1.6 tidyr_1.3.2
## [11] tibble_3.3.1 ggplot2_4.0.1 tidyverse_2.0.0
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.2 tidyselect_1.2.1 xml2_1.5.2
## [6] jquerylib_0.1.4 textshaping_1.0.4 systemfonts_1.3.1 scales_1.4.0 yaml_2.3.12
## [11] fastmap_1.2.0 R6_2.6.1 labeling_0.4.3 generics_0.1.4 svglite_2.2.2
## [16] bslib_0.10.0 pillar_1.11.1 RColorBrewer_1.1-3 tzdb_0.5.0 rlang_1.1.7
## [21] utf8_1.2.6 stringi_1.8.7 cachem_1.1.0 xfun_0.56 sass_0.4.10
## [26] S7_0.2.1 viridisLite_0.4.2 timechange_0.3.0 cli_3.6.5 withr_3.0.2
## [31] magrittr_2.0.4 digest_0.6.39 grid_4.5.2 rstudioapi_0.18.0 hms_1.1.4
## [36] lifecycle_1.0.5 vctrs_0.7.1 evaluate_1.0.5 glue_1.8.0 farver_2.1.2
## [41] rmarkdown_2.30 tools_4.5.2 pkgconfig_2.0.3 htmltools_0.5.9
Answer Key Last Updated: January 29, 2026