Lab Activity 1 Answer Key

This document provides complete solutions for all three guided practice tasks. Students should complete their own work first, then use this key to check their understanding.


Setup: Load Data and Packages

# Load required packages
library(tidyverse)
library(NHANES)
library(knitr)
library(kableExtra)

# Load the NHANES data
data(NHANES)

# Create analysis dataset (same as in lab)
nhanes_analysis <- NHANES %>%
  select(
    ID,
    Gender,
    Age,
    Race1,
    Education,
    BMI,
    Pulse,
    BPSys1,
    BPDia1,
    PhysActive,
    SmokeNow,
    Diabetes,
    HealthGen
  ) %>%
  mutate(
    Hypertension = ifelse(BPSys1 >= 140 | BPDia1 >= 90, "Yes", "No"),
    # cut() bins are right-closed: [0,20], (20,35], (35,50], (50,65], (65,100]
    Age_Group = cut(Age, 
                    breaks = c(0, 20, 35, 50, 65, 100),
                    labels = c("0-20", "21-35", "36-50", "51-65", "66+"),
                    include.lowest = TRUE)
  )
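
Because the age-group labels depend on how cut() treats bin edges, a quick check on a few hand-picked ages can confirm which interval each age lands in. This is a minimal base R sketch; the ages vector is made up for illustration and is not taken from NHANES.

```r
# Hypothetical ages chosen to sit on or near the bin edges
ages <- c(0, 5, 18, 20, 21, 35, 65, 66, 80)
breaks <- c(0, 20, 35, 50, 65, 100)

# Default: right-closed intervals (0,20], (20,35], ... so age 0 falls outside
bins_default <- cut(ages, breaks = breaks)

# include.lowest = TRUE closes the first interval on the left: [0,20]
bins_lowest <- cut(ages, breaks = breaks, include.lowest = TRUE)

check <- data.frame(age = ages, default = bins_default, with_lowest = bins_lowest)
print(check)
```

Ages 5, 18, and 20 share the first bin, and age 0 becomes NA under the defaults, which is worth keeping in mind when choosing labels for the bins.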

Task 1: Explore Health Disparities by Education

Research Question

“How does hypertension prevalence vary by education level?”

Solution Code

# Group by education level and calculate key statistics
health_by_education <- nhanes_analysis %>%
  group_by(Education) %>%
  summarise(
    N = n(),
    Mean_SysBP = round(mean(BPSys1, na.rm = TRUE), 2),
    Pct_Hypertension = round(
      sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
    .groups = 'drop'
  )

print(health_by_education)
## # A tibble: 6 × 4
##   Education          N Mean_SysBP Pct_Hypertension
##   <fct>          <int>      <dbl>            <dbl>
## 1 8th Grade        451       128.            28.3 
## 2 9 - 11th Grade   888       124.            17.3 
## 3 High School     1517       124.            18.9 
## 4 Some College    2267       122.            16.6 
## 5 College Grad    2098       119.            13.1 
## 6 <NA>            2779       106.             0.72

Interpretation Notes

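Key findings:

  • Sample sizes vary widely across education groups (451 to 2,267 among participants with a recorded education level).
  • Mean systolic BP is highest in the 8th Grade group (128) and lowest among college graduates (119).
  • Hypertension prevalence is highest in the 8th Grade group (28.3%) and lowest among college graduates (13.1%). The gradient is not strictly monotonic: High School (18.9%) is slightly above 9 - 11th Grade (17.3%).

What to look for:

  • Higher hypertension rates in lower education groups, consistent with education acting as a social determinant of health.
  • This is a classic example of health inequality.
  • Missing education data should be noted and reported. Here the <NA> row is large (N = 2,779), and its low mean systolic BP (106) and 0.72% prevalence suggest it is mostly children and adolescents, because the NHANES package records Education only for participants aged 20 and older.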
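
One way to report on missing education data is to cross-tabulate missingness against age. The sketch below uses a small made-up data frame (toy, hypothetical values) in place of nhanes_analysis so that it runs on its own.

```r
# Hypothetical toy data standing in for nhanes_analysis (values are made up)
toy <- data.frame(
  Age = c(8, 15, 19, 24, 37, 52, 68, 71),
  Education = c(NA, NA, NA, "High School", "College Grad", NA,
                "8th Grade", "Some College")
)

# Cross-tabulate "under 20" against missingness in Education
missing_by_age <- table(
  under_20 = toy$Age < 20,
  education_missing = is.na(toy$Education)
)
print(missing_by_age)
```

Applying the same table() call to nhanes_analysis$Age and nhanes_analysis$Education would show how much of the <NA> group is under 20 and how much is genuinely unreported among adults.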


Task 2: Create a Visualization

Solution Code

# Create bar chart
health_by_education %>%
  filter(!is.na(Education)) %>%
  ggplot(aes(x = Education, y = Pct_Hypertension)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  geom_text(aes(label = paste0(Pct_Hypertension, "%")), 
            vjust = -0.5, size = 3) +
  labs(
    title = "Hypertension Prevalence by Education Level",
    x = "Education Level",
    y = "Percent with Hypertension (%)",
    caption = "Source: NHANES"
  ) +
  ylim(0, 50) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Bar chart of hypertension prevalence by education level

Key Code Elements Explained

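| Code Element              | Purpose                                          |
|---------------------------|--------------------------------------------------|
| filter(!is.na(Education)) | Removes the row with missing education           |
| geom_col()                | Creates bar chart from pre-computed values       |
| geom_text()               | Adds percentage labels on bars                   |
| vjust = -0.5              | Positions labels above bars                      |
| ylim(0, 50)               | Sets the y-axis range, leaving room for labels   |
| angle = 45                | Rotates x-axis labels for readability            |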

Task 3: Data Interpretation

Research Question

“What does this pattern tell us about health disparities and social determinants?”

This pattern shows an education–health gradient, which is a classic example of how social determinants of health shape outcomes like hypertension.

What the pattern tells us

  1. Lower education is associated with higher hypertension prevalence. People with the least formal education (8th grade) show the highest rate of hypertension (28.3%), while college graduates show the lowest (13.1%). These are unadjusted, cross-sectional figures, so they show an association and do not by themselves establish that education protects health.

  2. Education acts as a proxy for multiple social determinants. Education level is closely linked to:

  • Income and job stability: ability to afford healthy food, housing, and healthcare
  • Health literacy: understanding nutrition, medications, blood pressure screening, and risk factors
  • Access to healthcare: insurance coverage, preventive care, regular checkups
  • Chronic stress exposure: financial strain, job insecurity, unsafe neighborhoods

These factors accumulate over time and increase cardiovascular risk.

  3. The gradient (not just the extremes) matters. Hypertension prevalence does not drop only at college graduation; it trends downward across education levels, although not at every step (High School, 18.9%, is slightly above 9 - 11th Grade, 17.3%). A broadly stepwise pattern suggests that disparities are systemic, not driven by individual choice alone.

  4. It reflects structural, not biological, differences. There is no biological reason education itself would change blood pressure. The pattern points to structural inequities in:

  • Economic opportunity
  • Access to preventive care
  • Environmental and occupational exposures
  • Long-term stress and allostatic load

Sample Interpretation

Here’s a strong answer (2-3 sentences):

The data reveal a social gradient in hypertension prevalence across education levels: prevalence is highest among participants with an 8th grade education (28.3%) and lowest among college graduates (13.1%). This pattern reflects the broader concept of social determinants of health, meaning factors like income, access to preventive care, stress, and health literacy that are closely linked to education. From a public health perspective, this disparity suggests that cardiovascular disease prevention programs should be targeted to reach underserved populations and that addressing education and socioeconomic inequality may be critical for reducing the hypertension burden in the population.
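
To put a number on the size of the disparity, the gap can be expressed as an absolute difference and as a prevalence ratio. This is a small arithmetic sketch using the two prevalence values printed in the Task 1 output; the variable names are illustrative.

```r
# Prevalence values copied from the Task 1 output table
pct_hyp <- c("8th Grade" = 28.3, "College Grad" = 13.1)

# Absolute gap in percentage points
abs_diff <- pct_hyp[["8th Grade"]] - pct_hyp[["College Grad"]]

# Relative gap (prevalence ratio)
prev_ratio <- pct_hyp[["8th Grade"]] / pct_hyp[["College Grad"]]

cat("Difference:", round(abs_diff, 1), "percentage points\n")
cat("Ratio:", round(prev_ratio, 2), "\n")
```

The difference is 15.2 percentage points and the ratio is about 2.16, so prevalence in the 8th Grade group is roughly double that of college graduates in these unadjusted figures.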

Grading Rubric for Task 3

| Criteria                | Excellent (Full Credit)                                      | Adequate                                  | Needs Work                        |
|-------------------------|--------------------------------------------------------------|-------------------------------------------|-----------------------------------|
| Identifies pattern      | Explicitly states which groups have highest/lowest rates     | Mentions direction but lacks specificity  | Vague or incorrect about pattern  |
| Explains mechanism      | References social determinants, access, or health literacy   | Mentions inequality but lacks detail      | No explanation provided           |
| Public health relevance | Discusses implications for policy or programs                | Notes importance but general              | Missing public health connection  |
| Writing quality         | Clear, 2-3 well-written sentences                            | Adequate but could be clearer             | Incomplete or unclear             |

Additional Results and Extensions

Stratified by Gender AND Education

# How do gender differences in hypertension vary by education?
health_strat <- nhanes_analysis %>%
  group_by(Education, Gender) %>%
  summarise(
    N = n(),
    Pct_Hypertension = round(
      sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
    .groups = 'drop'
  ) %>%
  filter(!is.na(Education))

print(health_strat)
## # A tibble: 10 × 4
##    Education      Gender     N Pct_Hypertension
##    <fct>          <fct>  <int>            <dbl>
##  1 8th Grade      female   209            22.2 
##  2 8th Grade      male     242            33.3 
##  3 9 - 11th Grade female   402            16.1 
##  4 9 - 11th Grade male     486            18.3 
##  5 High School    female   770            20.5 
##  6 High School    male     747            17.3 
##  7 Some College   female  1197            17.5 
##  8 Some College   male    1070            15.8 
##  9 College Grad   female  1099             9.85
## 10 College Grad   male     999            16.5
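
To answer the question in the code comment more directly, the male and female columns can be placed side by side and differenced. The sketch below re-enters the percentages from the printed output by hand (it does not recompute them), and the column name gap_male_minus_female is an illustrative choice.

```r
# Percentages copied from the stratified output above
strat <- data.frame(
  Education = c("8th Grade", "9 - 11th Grade", "High School",
                "Some College", "College Grad"),
  female = c(22.2, 16.1, 20.5, 17.5, 9.85),
  male   = c(33.3, 18.3, 17.3, 15.8, 16.5)
)

# Positive values: men higher; negative values: women higher
strat$gap_male_minus_female <- round(strat$male - strat$female, 2)
print(strat)
```

Men have higher prevalence in the 8th Grade, 9 - 11th Grade, and College Grad groups, while women are higher in the High School and Some College groups, so the gender gap is not uniform across education levels.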

Visualization: Education AND Gender

health_strat %>%
  ggplot(aes(x = Education, y = Pct_Hypertension, fill = Gender)) +
  geom_col(position = "dodge", alpha = 0.8) +
  labs(
    title = "Hypertension Prevalence by Education and Gender",
    x = "Education Level",
    y = "Prevalence (%)",
    fill = "Gender",
    caption = "Source: NHANES"
  ) +
  ylim(0, 60) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Bar chart showing hypertension by education and gender


Common Student Mistakes and Corrections

Mistake 1: Forgetting na.rm = TRUE

Incorrect:

Pct_Hypertension = sum(Hypertension == "Yes") / sum(!is.na(Hypertension)) * 100

Correct:

Pct_Hypertension = sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100

Why it matters: Without na.rm = TRUE, if there are any NA values in the Hypertension variable, the sum will return NA instead of a number.
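
The behavior is easy to see on a short vector. This is a minimal sketch with a made-up hyp vector (hypothetical, not from NHANES) containing one missing value.

```r
# Hypothetical status vector with one missing value
hyp <- c("Yes", "No", NA, "Yes", "No", "No")

# Without na.rm: the comparison yields an NA, and sum() propagates it
pct_no_narm <- sum(hyp == "Yes") / sum(!is.na(hyp)) * 100

# With na.rm: 2 "Yes" out of 5 non-missing values
pct_narm <- sum(hyp == "Yes", na.rm = TRUE) / sum(!is.na(hyp)) * 100

# Equivalent shortcut: the mean of a logical vector is a proportion
pct_mean <- mean(hyp == "Yes", na.rm = TRUE) * 100

print(c(no_narm = pct_no_narm, narm = pct_narm, via_mean = pct_mean))
```

The first value is NA and the other two are both 40, which also shows that mean(..., na.rm = TRUE) is a compact alternative to the sum-over-sum form.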


Mistake 2: Not Filtering Missing Categories

Incorrect:

ggplot(health_by_education, aes(x = Education, y = Pct_Hypertension)) +
  geom_col()

Correct:

health_by_education %>%
  filter(!is.na(Education)) %>%
  ggplot(aes(x = Education, y = Pct_Hypertension)) +
  geom_col()

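Why it matters: Leaving the NA row in adds an extra bar labeled NA. In this data that bar is nearly flat (0.72%) and does not represent a real education level, so it distorts the comparison and makes the chart harder to read.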


Mistake 3: Forgetting to Round Percentages

Incorrect:

Pct_Hypertension = sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100

Correct:

Pct_Hypertension = round(
  sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2)

Why it matters: Rounding to 2 decimal places makes tables more readable and professional.


Mistake 4: Not Using .groups = 'drop' in Grouped Summarise

Incorrect:

health_by_education <- nhanes_analysis %>%
  group_by(Education) %>%
  summarise(N = n())

Correct:

health_by_education <- nhanes_analysis %>%
  group_by(Education) %>%
  summarise(N = n(), .groups = 'drop')

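Why it matters: summarise() drops only the last grouping variable by default. With a single grouping variable, as here, both versions return an ungrouped tibble, so .groups = 'drop' mainly makes the intent explicit. With two or more grouping variables (for example, group_by(Education, Gender)), omitting it leaves the result grouped by Education and prints a message, which can cause unexpected behavior in downstream operations.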
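
A small check of which grouping remains after summarise() can make this concrete. The sketch below assumes dplyr is installed and uses a made-up four-row tibble (toy, hypothetical values); group_vars() reports the grouping variables still attached to a result.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy <- tibble(
  Education = c("High School", "High School", "College Grad", "College Grad"),
  Gender    = c("female", "male", "female", "male")
)

# One grouping variable: the default already returns an ungrouped tibble
one_var <- toy %>% group_by(Education) %>% summarise(N = n())

# Two grouping variables: the default keeps the result grouped by Education
two_var <- toy %>% group_by(Education, Gender) %>% summarise(N = n())

# Two grouping variables with .groups = "drop": fully ungrouped
two_var_drop <- toy %>%
  group_by(Education, Gender) %>%
  summarise(N = n(), .groups = "drop")

print(group_vars(one_var))
print(group_vars(two_var))
print(group_vars(two_var_drop))
```

Only the middle result still carries a grouping variable (Education), which is the case where a later mutate() or slice() would silently operate within groups.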


Assessment Rubric: Overall Lab Performance

Scoring Guide (100 points total)

Task 1: Code (25 points)

  • ✓ Correct group_by() (5 pts)
  • ✓ Calculates N correctly (5 pts)
  • ✓ Calculates mean systolic BP correctly (5 pts)
  • ✓ Calculates hypertension percentage correctly (10 pts)

Task 2: Visualization (25 points)

  • ✓ Filters missing values (5 pts)
  • ✓ Correct plot type and aesthetics (10 pts)
  • ✓ Proper labels and formatting (5 pts)
  • ✓ Readable axis labels (5 pts)

Task 3: Interpretation (25 points)

  • ✓ Identifies specific pattern in data (8 pts)
  • ✓ Explains mechanism/social determinants (8 pts)
  • ✓ Connects to public health implications (9 pts)

Overall Code Quality (25 points)

  • ✓ Comments explain code (5 pts)
  • ✓ Code runs without errors (10 pts)
  • ✓ Output is properly formatted (5 pts)
  • ✓ Submitted as HTML file (5 pts)

Learning Objectives Checklist

By completing this lab, students should be able to:


Discussion Questions for Instructors

  1. Why did we focus on education as a determinant? (Answer: Education is a key social determinant strongly linked to health outcomes; it’s also reliably measured in surveys)

  2. What other stratifications would be informative? (Answer: Income, occupation, healthcare access, geographic region, time trends)

  3. How would you explain the education gradient to a public health administrator? (Answer: Emphasize actionable policy implications—targeted interventions for low-education groups)

  4. What pathways could plausibly link education and hypertension? (Answer: Discuss mechanisms: health literacy, income, access to care, stress, health behaviors. Remind students that cross-sectional data show association, not causation)

  5. Are there potential confounders we haven’t considered? (Answer: Age, race/ethnicity, gender—note how students might conduct stratified analyses)


Additional Resources for Instructors

  • For follow-up: Students can explore multivariate logistic regression to model hypertension risk adjusted for multiple factors
  • Extension activity: Have students compare their findings to published literature on education and hypertension
  • Real-world connection: Link to Healthy People 2030 objectives on health equity and social determinants
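
For the logistic regression follow-up, a starting point might look like the sketch below. It is only an illustration under stated assumptions: the data are simulated (not NHANES), the outcome is generated to depend on age alone, and the variable names mirror the lab's columns for readability.

```r
set.seed(1)
n <- 500

# Simulated toy data (NOT NHANES); College Grad is the reference level
toy <- data.frame(
  Age = round(runif(n, 20, 80)),
  Education = factor(
    sample(c("College Grad", "High School", "8th Grade"), n, replace = TRUE),
    levels = c("College Grad", "High School", "8th Grade")
  )
)

# Assumption for the simulation: risk rises with age only
toy$Hypertension <- rbinom(n, 1, plogis(-4 + 0.05 * toy$Age))

# Logistic regression of hypertension on education, adjusted for age
fit <- glm(Hypertension ~ Education + Age, data = toy, family = binomial)

# Exponentiated coefficients are odds ratios
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 2))
```

With the real data, the outcome would first need recoding from "Yes"/"No" to 1/0, and further covariates such as Gender or Race1 could be added to the formula in the same way.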

sessionInfo()
## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] kableExtra_1.4.0 knitr_1.51       NHANES_2.1.0     lubridate_1.9.4  forcats_1.0.1   
##  [6] stringr_1.6.0    dplyr_1.1.4      purrr_1.2.1      readr_2.1.6      tidyr_1.3.2     
## [11] tibble_3.3.1     ggplot2_4.0.1    tidyverse_2.0.0 
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       jsonlite_2.0.0     compiler_4.5.2     tidyselect_1.2.1   xml2_1.5.2        
##  [6] jquerylib_0.1.4    textshaping_1.0.4  systemfonts_1.3.1  scales_1.4.0       yaml_2.3.12       
## [11] fastmap_1.2.0      R6_2.6.1           labeling_0.4.3     generics_0.1.4     svglite_2.2.2     
## [16] bslib_0.10.0       pillar_1.11.1      RColorBrewer_1.1-3 tzdb_0.5.0         rlang_1.1.7       
## [21] utf8_1.2.6         stringi_1.8.7      cachem_1.1.0       xfun_0.56          sass_0.4.10       
## [26] S7_0.2.1           viridisLite_0.4.2  timechange_0.3.0   cli_3.6.5          withr_3.0.2       
## [31] magrittr_2.0.4     digest_0.6.39      grid_4.5.2         rstudioapi_0.18.0  hms_1.1.4         
## [36] lifecycle_1.0.5    vctrs_0.7.1        evaluate_1.0.5     glue_1.8.0         farver_2.1.2      
## [41] rmarkdown_2.30     tools_4.5.2        pkgconfig_2.0.3    htmltools_0.5.9

Answer Key Last Updated: January 29, 2026