##Part 1: Setting Up Your Workspace - Loading the Libraries
# Load required packages
library(tidyverse) # Data manipulation (dplyr, ggplot2, etc.)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(NHANES) # NHANES dataset
## Warning: package 'NHANES' was built under R version 4.5.2
library(knitr) # For professional table output
library(kableExtra) # Enhanced tables
## Warning: package 'kableExtra' was built under R version 4.5.2
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
##Part 2: Loading and Exploring NHANES Data
# Load the NHANES data
data(NHANES)
##Part 3: Data Preparation and Exploration
# Select key variables for analysis
nhanes_analysis <- NHANES %>% ## |> is the newer native pipe (not my preference); RStudio shortcut for a pipe: Ctrl+Shift+M
  ## Can also be written dplyr::select() in case another loaded package has its own select()
  select(
    ID,         # Participant identifier
    Gender,     # Sex (Male/Female)
    Age,        # Age in years
    Race1,      # Race/ethnicity
    Education,  # Education level
    BMI,        # Body Mass Index
    Pulse,      # Resting heart rate
    BPSys1,     # Systolic blood pressure (1st reading)
    BPDia1,     # Diastolic blood pressure (1st reading)
    PhysActive, # Physically active (Yes/No)
    SmokeNow,   # Current smoking status
    Diabetes,   # Diabetes diagnosis (Yes/No)
    HealthGen   # General health rating
  ) %>%
  # Create a binary hypertension indicator (BPSys1 >= 140 OR BPDia1 >= 90)
  mutate(
    Hypertension = ifelse(BPSys1 >= 140 | BPDia1 >= 90, "Yes", "No")
  )
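# Keep only rows with no missing values in any selected column
# (stored separately; the lab below continues with nhanes_analysis)
nhanes_analysis2 <- nhanes_analysis %>%
  filter(complete.cases(.))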
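Because `complete.cases()` drops a row as soon as any one column is missing, it can remove more rows than expected. The following is a minimal sketch on a made-up data frame (the `toy` values are hypothetical, not NHANES rows) showing how to check how many rows survive and which columns account for the missing values:

```r
# Toy data frame (hypothetical values) with scattered missing entries
toy <- data.frame(
  Age       = c(34, 4, 49, 9),
  BMI       = c(32.2, 15.3, 30.6, NA),
  Education = c("High School", NA, "Some College", NA)
)

# TRUE for rows that have no NA in any column
keep <- complete.cases(toy)
toy_complete <- toy[keep, ]

# Number of rows that survive, and NA count per column
n_kept <- nrow(toy_complete)
na_per_column <- colSums(is.na(toy))

print(n_kept)
print(na_per_column)
```

Running the same two checks on `nhanes_analysis` before filtering would show which of the selected variables drive the row loss.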
# View the processed dataset
head(nhanes_analysis, 10)
## # A tibble: 10 × 14
## ID Gender Age Race1 Education BMI Pulse BPSys1 BPDia1 PhysActive
## <int> <fct> <int> <fct> <fct> <dbl> <int> <int> <int> <fct>
## 1 51624 male 34 White High School 32.2 70 114 88 No
## 2 51624 male 34 White High School 32.2 70 114 88 No
## 3 51624 male 34 White High School 32.2 70 114 88 No
## 4 51625 male 4 Other <NA> 15.3 NA NA NA <NA>
## 5 51630 female 49 White Some College 30.6 86 118 82 No
## 6 51638 male 9 White <NA> 16.8 82 84 50 <NA>
## 7 51646 male 8 White <NA> 20.6 72 114 46 <NA>
## 8 51647 female 45 White College Grad 27.2 62 106 62 Yes
## 9 51647 female 45 White College Grad 27.2 62 106 62 Yes
## 10 51647 female 45 White College Grad 27.2 62 106 62 Yes
## # ℹ 4 more variables: SmokeNow <fct>, Diabetes <fct>, HealthGen <fct>,
## # Hypertension <chr>
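One detail of the Hypertension flag created in Part 3 is how `ifelse()` behaves when one of the two blood pressure readings is missing. The sketch below uses five made-up readings (hypothetical values, not NHANES rows) to illustrate R's three-valued logic: `NA | TRUE` evaluates to `TRUE`, while `NA | FALSE` stays `NA`.

```r
# Toy readings (hypothetical values, not taken from NHANES)
sys <- c(150, NA, NA, 120, 118)
dia <- c(80, 95, 70, NA, 76)

# Same rule as in Part 3: systolic >= 140 OR diastolic >= 90
flag <- ifelse(sys >= 140 | dia >= 90, "Yes", "No")

print(flag)
```

In this sketch a single elevated reading is enough to produce "Yes" even when the other reading is missing, but one normal reading paired with a missing one yields `NA` rather than "No". This is why the summaries later in the lab use `na.rm = TRUE`.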
##Lab 1 - NHANES Exploration
Task 1: Explore Health Disparities by Education (15 minutes)
Using the nhanes_analysis data, explore:
“How does hypertension prevalence vary by education level?”
Write code to:
- Group by education level
- Calculate sample size, mean systolic BP, and percent with hypertension
- Print the results
# Your code here:
health_by_education <- nhanes_analysis %>%
  group_by(Education) %>%
  summarise(
    N = n(),
    Mean_SysBP = round(mean(BPSys1, na.rm = TRUE), 2),
    Pct_Hypertension = round(
      sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2)
  )
print(health_by_education)
## # A tibble: 6 × 4
## Education N Mean_SysBP Pct_Hypertension
## <fct> <int> <dbl> <dbl>
## 1 8th Grade 451 128. 28.3
## 2 9 - 11th Grade 888 124. 17.3
## 3 High School 1517 124. 18.9
## 4 Some College 2267 122. 16.6
## 5 College Grad 2098 119. 13.1
## 6 <NA> 2779 106. 0.72
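The percentage in Task 1 divides the count of "Yes" values by the count of non-missing values. As a small sketch on made-up flags (the `htn` vector is hypothetical), the same quantity can be written more compactly as the mean of a logical vector with `na.rm = TRUE`:

```r
# Toy hypertension flags (hypothetical values), including missing readings
htn <- c("Yes", "No", "No", NA, "Yes", "No", NA, "No")

# Formula used in Task 1: "Yes" count over non-missing count
pct_ratio <- sum(htn == "Yes", na.rm = TRUE) / sum(!is.na(htn)) * 100

# Equivalent shorthand: mean of a logical vector, ignoring NA
pct_mean <- mean(htn == "Yes", na.rm = TRUE) * 100

print(round(c(pct_ratio, pct_mean), 2))
```

Both expressions exclude missing flags from the denominator, so either form could be used inside `summarise()`.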
Task 2: Create a Visualization (10 minutes)
Create a bar chart showing hypertension by education level:
# Your visualization here:
health_by_education %>%
  filter(!is.na(Education)) %>%
  ggplot(aes(x = Education, y = Pct_Hypertension)) +
  geom_col(fill = "steelblue", alpha = 0.7) +
  geom_text(aes(label = paste0(Pct_Hypertension, "%")),
            vjust = -0.5, size = 3) +
  labs(
    title = "Hypertension Prevalence by Education Level",
    x = "Education Level",
    y = "Percent with Hypertension (%)",
    caption = "Source: NHANES"
  ) +
  ylim(0, 50) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Task 3: Write a Data Interpretation (5 minutes)
Write 2-3 sentences:
“What does this pattern tell us about health disparities and social determinants?”
Consider:
- Which education groups have highest/lowest hypertension?
- What might explain these differences?
- Why does this matter for public health?
Arielle’s Response: This pattern points to a health disparity tied to a social determinant of health: hypertension prevalence generally falls as education level rises. Individuals with an 8th grade education have roughly twice the hypertension prevalence of college graduates (28.3% versus 13.1%). This matters for public health because it is important to understand why people with less education are more prone to disease, in this case hypertension, and what can be done to help close the gap between groups.
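The "roughly twice" comparison can be checked directly from the Task 1 table. A short sketch, with the two percentages copied by hand from that output (the variable names are illustrative), and treating the result as a crude, unadjusted comparison:

```r
# Prevalence values copied from the Task 1 output table (percent)
pct_8th_grade    <- 28.3
pct_college_grad <- 13.1

# Prevalence ratio and absolute difference between the two groups
prevalence_ratio <- pct_8th_grade / pct_college_grad
prevalence_diff  <- pct_8th_grade - pct_college_grad

print(round(prevalence_ratio, 2))
print(round(prevalence_diff, 1))
```

The ratio comes out a little above 2 and the absolute gap is about 15 percentage points. Neither figure accounts for differences in age or other characteristics between the education groups.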