Time: ~30 minutes
Goal: Learn to work with real public health survey data in R
Learning Objectives:
# Load required packages
library(tidyverse) # Data manipulation (dplyr, ggplot2, etc.)
library(NHANES) # NHANES dataset
library(knitr) # For professional table output
library(kableExtra) # Enhanced tables
Troubleshooting: If you see an error, run this once:
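A minimal sketch of a one-time install, assuming the four packages loaded above are the ones that may be missing and that a CRAN mirror is reachable. The helper name `install_if_missing` is hypothetical and does not come from the tutorial.

```r
# Hypothetical helper: install only the packages that are not yet available
install_if_missing <- function(pkgs) {
  # Compare the requested names against everything in the library paths
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs)
  }
  # Return the names that needed installing, without printing them
  invisible(missing_pkgs)
}

# Run once in the console, then re-run the library() calls above:
# install_if_missing(c("tidyverse", "NHANES", "knitr", "kableExtra"))
```

Checking first avoids reinstalling packages that are already present, which can be slow for tidyverse.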
Using the nhanes_analysis data, explore:
“How does hypertension prevalence vary by education level?”
Write code to:
# Group by education status then evaluate summary health statistics:
health_by_education <- nhanes_analysis %>%
  filter(!is.na(Education)) %>%   # drop respondents with missing education first
  group_by(Education) %>%
  summarise(
    N = n(),
    Mean_SysBP = round(mean(BPSys1, na.rm = TRUE), 2),
    # Prevalence among respondents with known hypertension status
    Pct_Hypertension = round(
      sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2)
  )
print(health_by_education)
## # A tibble: 5 × 4
##   Education          N Mean_SysBP Pct_Hypertension
##   <fct>          <int>      <dbl>            <dbl>
## 1 8th Grade        451       128.             28.3
## 2 9 - 11th Grade   888       124.             17.3
## 3 High School     1517       124.             18.9
## 4 Some College    2267       122.             16.6
## 5 College Grad    2098       119.             13.1
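The Pct_Hypertension expression divides the number of "Yes" values by the number of non-missing values, so respondents with unknown status are left out of the denominator. The sketch below uses a made-up vector (not NHANES data) to illustrate this, and suggests that `mean()` with `na.rm = TRUE` should give the same figure:

```r
# Toy vector for illustration only (not NHANES data)
status <- c("Yes", "No", NA, "No", "Yes", "No")

# Same form as the summarise() call: "Yes" count over non-missing count
pct_sum <- round(sum(status == "Yes", na.rm = TRUE) / sum(!is.na(status)) * 100, 2)

# Equivalent shortcut: the mean of a logical vector is the proportion of TRUEs
pct_mean <- round(mean(status == "Yes", na.rm = TRUE) * 100, 2)

# Dividing by length() instead would keep the NA in the denominator
pct_naive <- round(sum(status == "Yes", na.rm = TRUE) / length(status) * 100, 2)

print(c(pct_sum = pct_sum, pct_mean = pct_mean, pct_naive = pct_naive))
```

The naive version comes out lower because the missing value is treated as a respondent without hypertension.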
Create a bar chart showing hypertension by education level:
# Create visualization comparing education status and select health outcome (hypertension):
health_by_education %>%
  filter(!is.na(Education)) %>%
  ggplot(aes(x = Education, y = Pct_Hypertension)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  geom_text(aes(label = paste0(Pct_Hypertension, "%")),
            vjust = -0.5, size = 3) +
  labs(
    title = "Hypertension Prevalence by Education Level",
    x = "Education Level",
    y = "Percent with Hypertension (%)",
    caption = "Source: NHANES"
  ) +
  ylim(0, 50) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Write 2-3 sentences:
“What does this pattern tell us about health disparities and social determinants?”
Consider:
- Which education groups have the highest and lowest hypertension prevalence?
- What might explain these differences?
- Why does this matter for public health?
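To ground the first question, the extremes can be pulled out of the summary programmatically. This sketch re-enters the Pct_Hypertension values from the printed table into a base R data frame, rather than assuming `health_by_education` is still in memory. The names `prev`, `highest`, `lowest`, and `gap` are illustrative.

```r
# Values copied from the printed health_by_education table
prev <- data.frame(
  Education = c("8th Grade", "9 - 11th Grade", "High School",
                "Some College", "College Grad"),
  Pct_Hypertension = c(28.3, 17.3, 18.9, 16.6, 13.1)
)

# Rows with the largest and smallest prevalence
highest <- prev[which.max(prev$Pct_Hypertension), ]
lowest  <- prev[which.min(prev$Pct_Hypertension), ]

# Absolute gap between the two extremes, in percentage points
gap <- highest$Pct_Hypertension - lowest$Pct_Hypertension

print(highest)
print(lowest)
print(gap)
```

With the live object, `health_by_education %>% arrange(desc(Pct_Hypertension))` should give the same ranking.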