## # A tibble: 10 × 76
## ID SurveyYr Gender Age AgeDecade AgeMonths Race1 Race3 Education MaritalStatus HHIncome
## <int> <fct> <fct> <int> <fct> <int> <fct> <fct> <fct> <fct> <fct>
## 1 51624 2009_10 male 34 " 30-39" 409 White <NA> High School Married 25000-349…
## 2 51624 2009_10 male 34 " 30-39" 409 White <NA> High School Married 25000-349…
## 3 51624 2009_10 male 34 " 30-39" 409 White <NA> High School Married 25000-349…
## 4 51625 2009_10 male 4 " 0-9" 49 Other <NA> <NA> <NA> 20000-249…
## 5 51630 2009_10 female 49 " 40-49" 596 White <NA> Some College LivePartner 35000-449…
## 6 51638 2009_10 male 9 " 0-9" 115 White <NA> <NA> <NA> 75000-999…
## 7 51646 2009_10 male 8 " 0-9" 101 White <NA> <NA> <NA> 55000-649…
## 8 51647 2009_10 female 45 " 40-49" 541 White <NA> College Grad Married 75000-999…
## 9 51647 2009_10 female 45 " 40-49" 541 White <NA> College Grad Married 75000-999…
## 10 51647 2009_10 female 45 " 40-49" 541 White <NA> College Grad Married 75000-999…
## # ℹ 65 more variables: HHIncomeMid <int>, Poverty <dbl>, HomeRooms <int>, HomeOwn <fct>,
## # Work <fct>, Weight <dbl>, Length <dbl>, HeadCirc <dbl>, Height <dbl>, BMI <dbl>,
## # BMICatUnder20yrs <fct>, BMI_WHO <fct>, Pulse <int>, BPSysAve <int>, BPDiaAve <int>,
## # BPSys1 <int>, BPDia1 <int>, BPSys2 <int>, BPDia2 <int>, BPSys3 <int>, BPDia3 <int>,
## # Testosterone <dbl>, DirectChol <dbl>, TotChol <dbl>, UrineVol1 <int>, UrineFlow1 <dbl>,
## # UrineVol2 <int>, UrineFlow2 <dbl>, Diabetes <fct>, DiabetesAge <int>, HealthGen <fct>,
## # DaysPhysHlthBad <int>, DaysMentHlthBad <int>, LittleInterest <fct>, Depressed <fct>, …
## tibble [10,000 × 76] (S3: tbl_df/tbl/data.frame)
## $ ID : int [1:10000] 51624 51624 51624 51625 51630 51638 51646 51647 51647 51647 ...
## $ SurveyYr : Factor w/ 2 levels "2009_10","2011_12": 1 1 1 1 1 1 1 1 1 1 ...
## $ Gender : Factor w/ 2 levels "female","male": 2 2 2 2 1 2 2 1 1 1 ...
## $ Age : int [1:10000] 34 34 34 4 49 9 8 45 45 45 ...
## $ AgeDecade : Factor w/ 8 levels " 0-9"," 10-19",..: 4 4 4 1 5 1 1 5 5 5 ...
## $ AgeMonths : int [1:10000] 409 409 409 49 596 115 101 541 541 541 ...
## $ Race1 : Factor w/ 5 levels "Black","Hispanic",..: 4 4 4 5 4 4 4 4 4 4 ...
## $ Race3 : Factor w/ 6 levels "Asian","Black",..: NA NA NA NA NA NA NA NA NA NA ...
## $ Education : Factor w/ 5 levels "8th Grade","9 - 11th Grade",..: 3 3 3 NA 4 NA NA 5 5 5 ...
## $ MaritalStatus : Factor w/ 6 levels "Divorced","LivePartner",..: 3 3 3 NA 2 NA NA 3 3 3 ...
## $ HHIncome : Factor w/ 12 levels " 0-4999"," 5000-9999",..: 6 6 6 5 7 11 9 11 11 11 ...
## $ HHIncomeMid : int [1:10000] 30000 30000 30000 22500 40000 87500 60000 87500 87500 87500 ...
## $ Poverty : num [1:10000] 1.36 1.36 1.36 1.07 1.91 1.84 2.33 5 5 5 ...
## $ HomeRooms : int [1:10000] 6 6 6 9 5 6 7 6 6 6 ...
## $ HomeOwn : Factor w/ 3 levels "Own","Rent","Other": 1 1 1 1 2 2 1 1 1 1 ...
## $ Work : Factor w/ 3 levels "Looking","NotWorking",..: 2 2 2 NA 2 NA NA 3 3 3 ...
## $ Weight : num [1:10000] 87.4 87.4 87.4 17 86.7 29.8 35.2 75.7 75.7 75.7 ...
## $ Length : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
## $ HeadCirc : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
## $ Height : num [1:10000] 165 165 165 105 168 ...
## $ BMI : num [1:10000] 32.2 32.2 32.2 15.3 30.6 ...
## $ BMICatUnder20yrs: Factor w/ 4 levels "UnderWeight",..: NA NA NA NA NA NA NA NA NA NA ...
## $ BMI_WHO : Factor w/ 4 levels "12.0_18.5","18.5_to_24.9",..: 4 4 4 1 4 1 2 3 3 3 ...
## $ Pulse : int [1:10000] 70 70 70 NA 86 82 72 62 62 62 ...
## $ BPSysAve : int [1:10000] 113 113 113 NA 112 86 107 118 118 118 ...
## $ BPDiaAve : int [1:10000] 85 85 85 NA 75 47 37 64 64 64 ...
## $ BPSys1 : int [1:10000] 114 114 114 NA 118 84 114 106 106 106 ...
## $ BPDia1 : int [1:10000] 88 88 88 NA 82 50 46 62 62 62 ...
## $ BPSys2 : int [1:10000] 114 114 114 NA 108 84 108 118 118 118 ...
## $ BPDia2 : int [1:10000] 88 88 88 NA 74 50 36 68 68 68 ...
## $ BPSys3 : int [1:10000] 112 112 112 NA 116 88 106 118 118 118 ...
## $ BPDia3 : int [1:10000] 82 82 82 NA 76 44 38 60 60 60 ...
## $ Testosterone : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
## $ DirectChol : num [1:10000] 1.29 1.29 1.29 NA 1.16 1.34 1.55 2.12 2.12 2.12 ...
## $ TotChol : num [1:10000] 3.49 3.49 3.49 NA 6.7 4.86 4.09 5.82 5.82 5.82 ...
## $ UrineVol1 : int [1:10000] 352 352 352 NA 77 123 238 106 106 106 ...
## $ UrineFlow1 : num [1:10000] NA NA NA NA 0.094 ...
## $ UrineVol2 : int [1:10000] NA NA NA NA NA NA NA NA NA NA ...
## $ UrineFlow2 : num [1:10000] NA NA NA NA NA NA NA NA NA NA ...
## $ Diabetes : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ DiabetesAge : int [1:10000] NA NA NA NA NA NA NA NA NA NA ...
## $ HealthGen : Factor w/ 5 levels "Excellent","Vgood",..: 3 3 3 NA 3 NA NA 2 2 2 ...
## $ DaysPhysHlthBad : int [1:10000] 0 0 0 NA 0 NA NA 0 0 0 ...
## $ DaysMentHlthBad : int [1:10000] 15 15 15 NA 10 NA NA 3 3 3 ...
## $ LittleInterest : Factor w/ 3 levels "None","Several",..: 3 3 3 NA 2 NA NA 1 1 1 ...
## $ Depressed : Factor w/ 3 levels "None","Several",..: 2 2 2 NA 2 NA NA 1 1 1 ...
## $ nPregnancies : int [1:10000] NA NA NA NA 2 NA NA 1 1 1 ...
## $ nBabies : int [1:10000] NA NA NA NA 2 NA NA NA NA NA ...
## $ Age1stBaby : int [1:10000] NA NA NA NA 27 NA NA NA NA NA ...
## $ SleepHrsNight : int [1:10000] 4 4 4 NA 8 NA NA 8 8 8 ...
## $ SleepTrouble : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ PhysActive : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 2 2 2 ...
## $ PhysActiveDays : int [1:10000] NA NA NA NA NA NA NA 5 5 5 ...
## $ TVHrsDay : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
## $ CompHrsDay : Factor w/ 7 levels "0_hrs","0_to_1_hr",..: NA NA NA NA NA NA NA NA NA NA ...
## $ TVHrsDayChild : int [1:10000] NA NA NA 4 NA 5 1 NA NA NA ...
## $ CompHrsDayChild : int [1:10000] NA NA NA 1 NA 0 6 NA NA NA ...
## $ Alcohol12PlusYr : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ AlcoholDay : int [1:10000] NA NA NA NA 2 NA NA 3 3 3 ...
## $ AlcoholYear : int [1:10000] 0 0 0 NA 20 NA NA 52 52 52 ...
## $ SmokeNow : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA NA NA NA ...
## $ Smoke100 : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ Smoke100n : Factor w/ 2 levels "Non-Smoker","Smoker": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ SmokeAge : int [1:10000] 18 18 18 NA 38 NA NA NA NA NA ...
## $ Marijuana : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ AgeFirstMarij : int [1:10000] 17 17 17 NA 18 NA NA 13 13 13 ...
## $ RegularMarij : Factor w/ 2 levels "No","Yes": 1 1 1 NA 1 NA NA 1 1 1 ...
## $ AgeRegMarij : int [1:10000] NA NA NA NA NA NA NA NA NA NA ...
## $ HardDrugs : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 1 1 1 ...
## $ SexEver : Factor w/ 2 levels "No","Yes": 2 2 2 NA 2 NA NA 2 2 2 ...
## $ SexAge : int [1:10000] 16 16 16 NA 12 NA NA 13 13 13 ...
## $ SexNumPartnLife : int [1:10000] 8 8 8 NA 10 NA NA 20 20 20 ...
## $ SexNumPartYear : int [1:10000] 1 1 1 NA 1 NA NA 0 0 0 ...
## $ SameSex : Factor w/ 2 levels "No","Yes": 1 1 1 NA 2 NA NA 2 2 2 ...
## $ SexOrientation : Factor w/ 3 levels "Bisexual","Heterosexual",..: 2 2 2 NA 2 NA NA 1 1 1 ...
## $ PregnantNow : Factor w/ 3 levels "Yes","No","Unknown": NA NA NA NA NA NA NA NA NA NA ...
## [1] 10000 76
# Select key variables for analysis
nhanes_analysis <- NHANES %>%
dplyr::select(
ID,
Gender, # Sex (Male/Female)
Age, # Age in years
Race1, # Race/ethnicity
Education, # Education level
BMI, # Body Mass Index
Pulse, # Resting heart rate
BPSys1, # Systolic blood pressure (1st reading)
BPDia1, # Diastolic blood pressure (1st reading)
PhysActive, # Physically active (Yes/No)
SmokeNow, # Current smoking status
Diabetes, # Diabetes diagnosis (Yes/No)
HealthGen # General health rating
) %>%
# Create a binary hypertension indicator (BPSys1 >= 140 OR BPDia1 >= 90)
mutate(
Hypertension = factor(ifelse(BPSys1 >= 140 | BPDia1 >= 90, "Yes", "No"))
)
# Remove rows with missing values for key variables
nhanes_analysis2 <- nhanes_analysis %>%
filter(complete.cases(.)) # Complete cases only
# View the processed dataset
head(nhanes_analysis, 10)## # A tibble: 10 × 14
## ID Gender Age Race1 Education BMI Pulse BPSys1 BPDia1 PhysActive SmokeNow Diabetes
## <int> <fct> <int> <fct> <fct> <dbl> <int> <int> <int> <fct> <fct> <fct>
## 1 51624 male 34 White High School 32.2 70 114 88 No No No
## 2 51624 male 34 White High School 32.2 70 114 88 No No No
## 3 51624 male 34 White High School 32.2 70 114 88 No No No
## 4 51625 male 4 Other <NA> 15.3 NA NA NA <NA> <NA> No
## 5 51630 female 49 White Some College 30.6 86 118 82 No Yes No
## 6 51638 male 9 White <NA> 16.8 82 84 50 <NA> <NA> No
## 7 51646 male 8 White <NA> 20.6 72 114 46 <NA> <NA> No
## 8 51647 female 45 White College Grad 27.2 62 106 62 Yes <NA> No
## 9 51647 female 45 White College Grad 27.2 62 106 62 Yes <NA> No
## 10 51647 female 45 White College Grad 27.2 62 106 62 Yes <NA> No
## # ℹ 2 more variables: HealthGen <fct>, Hypertension <fct>
## [1] 10000 14
Using the nhanes_analysis data, explore:
“How does hypertension prevalence vary by education level?”
Write code to:
# Your code here:
health_by_education <- nhanes_analysis %>%
group_by(Education) %>%
summarise(
N = n(),
Mean_SysBP = round(mean(BPSys1, na.rm = TRUE), 2),
Pct_Hypertension = round(
sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2)
)
print(health_by_education)## # A tibble: 6 × 4
## Education N Mean_SysBP Pct_Hypertension
## <fct> <int> <dbl> <dbl>
## 1 8th Grade 451 128. 28.3
## 2 9 - 11th Grade 888 124. 17.3
## 3 High School 1517 124. 18.9
## 4 Some College 2267 122. 16.6
## 5 College Grad 2098 119. 13.1
## 6 <NA> 2779 106. 0.72
Create a bar chart showing hypertension by education level:
# Your visualization here:
health_by_education %>%
filter(!is.na(Education)) %>%
ggplot(aes(x = Education, y = Pct_Hypertension)) +
geom_col(fill = "steelblue", alpha = 0.7) +
geom_text(aes(label = paste0(Pct_Hypertension, "%")),
vjust = -0.5, size = 3) +
labs(
title = "Hypertension Prevalence by Education Level",
x = "Education Level",
y = "Percent with Hypertension (%)",
caption = "Source: NHANES"
) +
ylim(0, 50) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))Write 2-3 sentences:
“What does this pattern tell us about health disparities and social determinants?”
Higher Blood Pressure is more prevalent among people who completed only 8th grade of school (28.26) and less prevalent among other groups of completed education. So people with completed 9-11th school grade/college education might be more informed about risk factors and health outcomes like Hypertension. Or people with those education levels has higher income in comparison with people who only have 8th grade of school education and it’s might affect health outcomes as well. But to be sure we need to analyse with other variables like smoking status, BMI, chronic diseases and income.
This might be important for Public Health because people who don’t have completed college or any other high education might be a potential risk group, that needs to be studied and
Consider: - Which education groups have highest/lowest hypertension? - What might explain these differences? - Why does this matter for public health?
Research Question: Are there Physical activity differences in hypertension that differ by education?
# Stratified by Physical activity and Education
health_stratified <- nhanes_analysis %>%
group_by(PhysActive, Education) %>%
summarise(
N = n(),
Mean_SysBP = round(mean(BPSys1, na.rm = TRUE), 2),
Mean_BMI = round(mean(BMI, na.rm = TRUE), 2),
Pct_Hypertension = round(
sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
.groups = 'drop'
)
print(health_stratified)## # A tibble: 13 × 6
## PhysActive Education N Mean_SysBP Mean_BMI Pct_Hypertension
## <fct> <fct> <int> <dbl> <dbl> <dbl>
## 1 No 8th Grade 339 129. 29.7 29.8
## 2 No 9 - 11th Grade 610 124. 29.7 18.9
## 3 No High School 882 125. 30.1 20.2
## 4 No Some College 1056 123. 30.0 18.4
## 5 No College Grad 546 121. 29.1 18.0
## 6 No <NA> 244 109. 24.6 1.35
## 7 Yes 8th Grade 112 127. 29.0 23.5
## 8 Yes 9 - 11th Grade 278 122. 27.9 13.7
## 9 Yes High School 635 122. 28.5 17.2
## 10 Yes Some College 1211 121. 28.5 15.1
## 11 Yes College Grad 1552 118. 27.1 11.4
## 12 Yes <NA> 861 109. 23.6 0.74
## 13 <NA> <NA> 1674 101. 17.9 0.4
# Create visualization
health_by_PhysActive_education <- nhanes_analysis %>%
filter(
!is.na(PhysActive),
!is.na(Education),
!is.na(Hypertension)
) %>%
group_by(PhysActive, Education) %>%
summarise(
Pct_Hypertension = round(
sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
.groups = 'drop'
)
ggplot(health_by_PhysActive_education, aes(x = PhysActive, y = Pct_Hypertension, fill = Education)) +
geom_col(position = "dodge", alpha = 0.8) +
labs(
title = "Hypertension Prevalence by Physical activity and Education",
subtitle = "NHANES Data",
x = "PhysActive",
y = "Prevalence (%)",
fill = "Education",
caption = "Source: National Health and Nutrition Examination Survey"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
)Hypertension is still more prevalent among people with completed 8th grade level of education regardless of Physical activity status.
Research Question: Are there Diabetes differences in hypertension that differ by education?
# Stratified by Diabetes and Education
health_stratified <- nhanes_analysis %>%
group_by(Diabetes, Education) %>%
summarise(
N = n(),
Mean_SysBP = round(mean(BPSys1, na.rm = TRUE), 2),
Mean_BMI = round(mean(BMI, na.rm = TRUE), 2),
Pct_Hypertension = round(
sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
.groups = 'drop'
)
print(health_stratified)## # A tibble: 14 × 6
## Diabetes Education N Mean_SysBP Mean_BMI Pct_Hypertension
## <fct> <fct> <int> <dbl> <dbl> <dbl>
## 1 No 8th Grade 351 127. 28.8 25.6
## 2 No 9 - 11th Grade 781 122. 28.6 14.6
## 3 No High School 1352 123. 28.9 18.2
## 4 No Some College 2039 121. 28.7 15.0
## 5 No College Grad 1954 118. 27.3 11.8
## 6 No <NA> 2621 106. 20.5 0.73
## 7 Yes 8th Grade 100 134. 32.0 36.8
## 8 Yes 9 - 11th Grade 105 133. 33.1 36
## 9 Yes High School 165 127. 33.8 25
## 10 Yes Some College 228 129. 33.0 31.6
## 11 Yes College Grad 144 129. 31.3 29.6
## 12 Yes <NA> 18 100. 26.1 0
## 13 <NA> 9 - 11th Grade 2 142 40.8 100
## 14 <NA> <NA> 140 100 22.2 0
# Create visualization
health_by_Diabetes_education <- nhanes_analysis %>%
filter(
!is.na(Diabetes),
!is.na(Education),
!is.na(Hypertension)
) %>%
group_by(Diabetes, Education) %>%
summarise(
Pct_Hypertension = round(
sum(Hypertension == "Yes", na.rm = TRUE) / sum(!is.na(Hypertension)) * 100, 2),
.groups = 'drop'
)
ggplot(health_by_Diabetes_education, aes(x = Diabetes, y = Pct_Hypertension, fill = Education)) +
geom_col(position = "dodge", alpha = 0.8) +
labs(
title = "Hypertension Prevalence by Diabetes and Education",
subtitle = "NHANES Data",
x = "Diabetes",
y = "Prevalence (%)",
fill = "Education",
caption = "Source: National Health and Nutrition Examination Survey"
) +
theme_minimal() +
theme(
plot.title = element_text(size = 14, face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1)
)We can see that Hypertension is prevalent in all education level groups among people who have Diabetes. And hypertension is still more prevalent among people with completed 8th grade level of education even if a person does not have Diabetes