The National Health and Nutrition Examination Survey (NHANES), conducted by the National Center for Health Statistics (NCHS), is a cross-sectional, nationally representative survey of the US civilian noninstitutionalized population. The survey uses a complex, stratified, multistage probability cluster-sampling design (8). Participants are interviewed at their homes to obtain information on health history and health behaviors. Subsequently, they undergo a physical examination at a mobile examination center (MEC).

Among adults aged 20 years and older, the overall age-adjusted prevalence of diagnosed type II diabetes was 9.7% in 2013-2016. The majority of diabetes increased with age from a plurality of 1.8% among aged 20-39 years, 11.1% among 40-59 years old, and 21.0% among those aged 60 and older. ( Mendola ND, Chen T-C, Gu Q, Eberhardt MS, Saydah S. Prevalence of total, diagnosed, and undiagnosed diabetes among adults: United States, 2013–2016. NCHS Data Brief, no 319. Hyattsville, MD: National Center for Health Statistics. 2018).

First load packages and read the demographic data, keeping the variables necessary for the analysis.

pacman::p_load(gt, haven, patchwork, srvyr, survey, tidyverse,rstatix)

P_DEMO <- read_xpt("https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/P_DEMO.XPT") %>%
  select(SEQN, RIDAGEYR, SDMVPSU, SDMVSTRA, WTMECPRP)

Read the blood pressure data remove missing values and create mean systolic and diastolic.

P_BPXO <- read_xpt("https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/P_BPXO.XPT") %>%
  mutate(
    Systolic = rowMeans(cbind(BPXOSY1, BPXOSY2, BPXOSY3), na.rm = TRUE),
    Diastolic = rowMeans(cbind(BPXODI1, BPXODI2, BPXODI3), na.rm = TRUE)
  ) %>%
  select(SEQN, Systolic, Diastolic)

Read the diabetes questionnaire data and create diabetes indicator.

P_DIQ <- read_xpt("https://wwwn.cdc.gov/Nchs/Nhanes/2017-2018/P_DIQ.XPT") %>%
  mutate(
    Diabetes = case_when(DIQ010 == 1 ~ "Yes",
      DIQ010 %in% 2:3 ~ "No")
  ) %>%
  select(SEQN, Diabetes)

Join the files.

One <- left_join(P_DEMO, P_BPXO, by = "SEQN") %>%
  left_join(., P_DIQ, by = "SEQN")

It needs to be clarified how well GGPLOT graphs can use sample weights that account for and incorporate sampling design information; that is, the sample describes the US population. Is GGPLOT does give a different plot with a weight aesthetic?

g1 <- ggplot(One %>% filter(RIDAGEYR >= 18, !is.na(Systolic), !is.na(Diabetes), WTMECPRP > 0),
  aes(x = Systolic, y = Diabetes)) +
  geom_boxplot(fill = "#dee8ff") +
  coord_cartesian(xlim = c(80, 200)) +
  scale_x_continuous(breaks = seq(80, 200, 10)) +
  labs(title = "Unweighted",
    x = "Systolic Blood Pressure") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
    text = element_text(size = 14))

g2 <- ggplot(One %>% filter(RIDAGEYR >= 18, !is.na(Systolic), !is.na(Diabetes), WTMECPRP > 0),
  aes(x = Systolic, y = Diabetes, weight = WTMECPRP)) +
  geom_boxplot(fill = "#dee8ff") +
  coord_cartesian(xlim = c(80, 200)) +
  scale_x_continuous(breaks = seq(80, 200, 10)) +
  labs(title = "Weighted data represent the noninstitutionalized
US population",
    x = "Systolic Blood Pressure") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
    text = element_text(size = 14))

g1 / g2

### A t test for difference between mean systolic BP by diabetes status.

test <- One %>% t_test(Systolic ~ Diabetes)
options(scipen=999)
print(test)
## # A tibble: 1 × 10
##   .y.      group1 group2    n1    n2 statistic    df        p    p.adj p.adj.s…¹
## * <chr>    <chr>  <chr>  <int> <int>     <dbl> <dbl>    <dbl>    <dbl> <chr>    
## 1 Systolic No     Yes     9144  1203     -19.0 1463. 4.68e-72 4.68e-72 ****     
## # … with abbreviated variable name ¹​p.adj.signif
g3 <- ggplot(One %>% filter(RIDAGEYR >= 18, !is.na(Diastolic), !is.na(Diabetes), WTMECPRP > 0),
  aes(x = Systolic, y = Diabetes)) +
  geom_boxplot(fill = "#dee8ff") +
  coord_cartesian(xlim = c(40, 200)) +
  scale_x_continuous(breaks = seq(20, 200, 10)) +
  labs(title = "Unweighted",
    x = "Diastolic Blood Pressure") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
    text = element_text(size = 14))

g4 <- ggplot(One %>% filter(RIDAGEYR >= 18, !is.na(Diastolic), !is.na(Diabetes), WTMECPRP > 0),
  aes(x = Systolic, y = Diabetes, weight = WTMECPRP)) +
  geom_boxplot(fill = "#dee8ff") +
  coord_cartesian(xlim = c(40, 200)) +
  scale_x_continuous(breaks = seq(40, 200, 10)) +
  labs(title = "Weighted data represent the noninstitutionalized
US population",
    x = "Diastolic Blood Pressure") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
    text = element_text(size = 14))

g3 / g4

### A t test for difference between mean diastolic BP by diabetes status.

test <- One %>% t_test(Diastolic ~ Diabetes)
options(scipen=999)
print(test)
## # A tibble: 1 × 10
##   .y.       group1 group2    n1    n2 statistic    df          p   p.adj p.adj…¹
## * <chr>     <chr>  <chr>  <int> <int>     <dbl> <dbl>      <dbl>   <dbl> <chr>  
## 1 Diastolic No     Yes     9144  1203     -5.96 1555.    3.13e-9 3.13e-9 ****   
## # … with abbreviated variable name ¹​p.adj.signif

Conclusion

There is a statistical difference between non-diabetics and diabetics in systolic and diastolic BP. Individuals diagnosed as people with diabetes need to control their BP better. Also, there appears to be 2-3 mmHg difference between weighted and non-weighted graphics using GGPLOT.

Note:

I thank Jeffery Hughes, NCHS/CDC, for his programming help.