Introduction

This project examines whether there is a connection between health concerns and poverty. I work in public health, so I am curious about how social determinants of health affect our health. The data set that I used (described below) is widely used for this type of exploration. Here is a list of things that I wanted to explore using this data set:

1. Self-reported health ratings by poverty scores
2. The number of health concerns among those below the poverty threshold (a poverty score of less than 1 on a 0-5 scale)
3. The connection between the number of health concerns and poverty

The US National Health and Nutrition Examination Survey (NHANES)

The NHANES data set from 2011-2012 was used to perform the analysis. The survey data are collected by the US National Center for Health Statistics (NCHS), which asks participants a series of questions related to demographics, health, and lifestyle. A health examination is also conducted. More information about the survey and the data set can be found here: https://www.cdc.gov/nchs/nhanes/about_nhanes.htm

Variables

Demographic Variables:
SurveyYr: Survey year in which the participant took part.
Age: Age in years at screening of study participant.
Gender: Gender (sex) of study participant, coded as male or female.
Race3: Reported race of study participant.
Poverty: A ratio of family income to poverty guidelines. Smaller numbers indicate more poverty.

Health-related variables:
BMI: Body mass index (weight/height^2 in kg/m^2).
BPSysAve: Combined systolic blood pressure reading.
BPDiaAve: Combined diastolic blood pressure reading.
TotChol: Total cholesterol in mmol/L.
Diabetes: Study participant told by a doctor or health professional that they have diabetes.
Depressed: Self-reported number of days where participant felt down, depressed or hopeless.
SleepTrouble: Participant has told a doctor or other health professional that they had trouble sleeping.
HealthGen: Self-reported rating of participant’s health in general.
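As a small worked example of the units in the list above, the sketch below computes BMI from weight and height and converts a cholesterol value from mmol/L to mg/dL. The input values are made up, and the factor of 38.67 mg/dL per mmol/L is the commonly quoted approximation for cholesterol; it is an assumption here, not something taken from the NHANES documentation.

```r
# Hypothetical inputs, not taken from the NHANES data
weight_kg <- 70
height_m  <- 1.75

# BMI is weight divided by height squared (kg/m^2)
bmi <- weight_kg / height_m^2

# Assumed conversion factor for cholesterol: 1 mmol/L is about 38.67 mg/dL
chol_mmol <- 5.2
chol_mgdl <- chol_mmol * 38.67

round(bmi, 1)
round(chol_mgdl)
```

Under that assumed factor, 5.2 mmol/L corresponds to roughly 200 mg/dL.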

Disclaimers: For NHANES datasets, the use of sampling weights and sample design variables is recommended for all analyses because the sample design is a clustered design and incorporates differential probabilities of selection. If you fail to account for the sampling parameters, you may obtain biased estimates and overstate significance levels.

Please note that the data sets provided in this package are derived from the NHANES database and have been adapted for educational purposes. As such, they are NOT suitable for use as a research database. For research purposes you should download original data files from the NHANES website and follow the analysis instructions given there.
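To illustrate why the weighting caveat matters, here is a minimal sketch on made-up numbers (not NHANES data). The data frame `toy_weights` and its `weight` column are hypothetical. A real analysis would use the weight and design variables documented on the NHANES website.

```r
# Toy data (hypothetical): four respondents, one of whom represents
# many more people in the population than the others
toy_weights <- data.frame(
  poverty = c(0.5, 1.0, 3.0, 5.0),
  weight  = c(4000, 1000, 1000, 1000)  # assumed sampling weights
)

# Unweighted mean: every respondent counts equally
unweighted <- mean(toy_weights$poverty)

# Weighted mean: each respondent counts in proportion to the
# number of people they are assumed to represent
weighted <- weighted.mean(toy_weights$poverty, w = toy_weights$weight)

unweighted
weighted
```

In this toy case the unweighted mean is noticeably higher than the weighted one, because the low-income respondent carries the largest weight.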

Load packages

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(RColorBrewer)
library(viridis)

## Loading required package: viridisLite

library(viridisLite)
library(treemap)

Load NHANES dataset

setwd("/Users/smhenderson/Desktop/DATA110/R/Datasets")
nhanes <- read.csv("nhanes.csv")

Data Preparation

#Create a subset of the data set to be used during the analysis
nhanes2 <- nhanes %>%
  filter(SurveyYr == "2011_12") %>%
  filter(Age >=18) %>%
  select("HealthGen", "Gender", "Race3", "Poverty", "BMI", "BPSysAve", "BPDiaAve", "Diabetes", "TotChol", "Depressed", "SleepTrouble")

#summary(nhanes2)
#colnames(nhanes2)

Explore self-reported health rating by poverty score

First, I wanted to look at the self-reported health ratings by poverty scores. According to an article written by the American Academy of Family Physicians (AAFP), poverty has a significant impact on health as it restricts access to essential resources, such as nutritious food, suitable housing, safe environments to reside and work in, and other aspects that contribute to an individual’s overall well-being. People living in low-income or high-poverty areas often face health challenges due to the cumulative effect of these factors. The distributions in the boxplot below follow a similar trend.

#Handle NAs and recode the HealthGen variable
nhanes_healthgen <- nhanes2 %>%
  filter(!is.na(HealthGen), !is.na(Poverty)) %>%
  mutate(HealthGen = recode(HealthGen, "Vgood" = "Very Good"))

#Prepare the data so that it can be used to create a boxplot 
nhanes_healthgen$HealthGen <- factor(nhanes_healthgen$HealthGen,
                                levels = c("Poor", "Fair", "Good", "Very Good", "Excellent"))
num_colors <- length(levels(nhanes_healthgen$HealthGen))
colors <- viridis_pal(option = "D")(num_colors)

#Create boxplot
ggplot(nhanes_healthgen, aes(x = HealthGen, y = Poverty, fill = HealthGen)) +
  geom_boxplot() +
  scale_fill_manual(values = colors) +
  labs(x = "Health Ratings", y = "Poverty Scores", caption = "Poverty Scores: A value less than 1 indicates the family is below the poverty threshold.") +
  theme(plot.caption = element_text(hjust = 0, size = 7),
        plot.title = element_text(hjust = 0.5),  
        panel.background = element_rect(fill = "white", color = "gray"),
        panel.grid.minor = element_line(color = "gray"),
        legend.position = "none") + 
  ggtitle("Self-Reported General Health Ratings by Poverty Scores")
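The boxplot above can be complemented with a table of the same quantities it draws (median and quartiles per health rating). This is a minimal sketch on made-up values; `toy_health` is a hypothetical stand-in for `nhanes_healthgen` and uses only three of the five rating levels.

```r
library(dplyr)

# Toy data (hypothetical values, not NHANES results)
toy_health <- data.frame(
  HealthGen = factor(c("Poor", "Poor", "Good", "Good", "Excellent", "Excellent"),
                     levels = c("Poor", "Good", "Excellent")),
  Poverty   = c(0.8, 1.2, 2.0, 3.0, 4.0, 5.0)
)

# Median and quartiles of the poverty score within each health rating
poverty_summary <- toy_health %>%
  group_by(HealthGen) %>%
  summarise(
    n_resp         = n(),
    median_poverty = median(Poverty),
    q1             = unname(quantile(Poverty, 0.25)),
    q3             = unname(quantile(Poverty, 0.75)),
    .groups = "drop"
  )

poverty_summary
```

Swapping `toy_health` for the real data frame should give the numbers behind each box, which makes it easier to describe the trend in words.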

Create an interactive box plot

p <- ggplot(nhanes_healthgen, aes(x = HealthGen, y = Poverty, fill = HealthGen)) +
  geom_boxplot() +
  labs(x = "Health Ratings", y = "Poverty Scores",
       caption = "Poverty Scores: A value less than 1 indicates the family is below the poverty threshold.") +
  theme(plot.caption = element_text(hjust = 0, size = 7),
        plot.title = element_text(hjust = 0.5),
        panel.background = element_rect(fill = "white", color = "gray"),
        panel.grid.minor = element_line(color = "gray"),
        legend.position = "none") +
  ggtitle("Self-Reported General Health Ratings by Poverty Scores")

# Convert the plot to an interactive plotly object
p_interactive <- ggplotly(p, tooltip = "text")

# Show the interactive plot
p_interactive

Create indicators for health concerns

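nhanes_health <- nhanes2 %>%
  #BMI at or outside the 18.5-25 range (low or high) counts as a concern
  mutate(bmi2 = ifelse(BMI <= 18.5 | BMI >= 25, 1, 0)) %>%
  mutate(diabetes2 = ifelse(Diabetes == "Yes", 1, 0)) %>%
  #Blood pressure counts as a concern unless systolic <= 120 and diastolic <= 80
  mutate(BP = ifelse(BPSysAve <= 120 & BPDiaAve <= 80, 0, 1)) %>%
  mutate(sleeptrouble2 = ifelse(SleepTrouble == "Yes", 1, 0)) %>%
  #Cholesterol of 5.2 mmol/L or more counts as a concern
  mutate(totchol2 = ifelse(TotChol < 5.2, 0, 1)) %>%
  #Depression counts as a concern when reported as "Most" days
  mutate(depressed2 = ifelse(Depressed == "Most", 1, 0))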
#colnames(nhanes_health)
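One detail of the indicator rules above is how `ifelse()` treats missing measurements. The sketch below applies the same BMI and blood pressure rules to a few made-up rows; `toy_measures` is hypothetical and not drawn from NHANES.

```r
# Toy measurements (hypothetical), including missing values
toy_measures <- data.frame(
  BMI      = c(17.0, 22.0, 31.0, NA),
  BPSysAve = c(118, 135, 110, 125),
  BPDiaAve = c(76, 70, 85, NA)
)

# Same rule as above: BMI at or outside 18.5-25 is flagged
toy_measures$bmi2 <- ifelse(toy_measures$BMI <= 18.5 | toy_measures$BMI >= 25, 1, 0)

# Same rule as above: flagged unless systolic <= 120 and diastolic <= 80
toy_measures$BP <- ifelse(toy_measures$BPSysAve <= 120 & toy_measures$BPDiaAve <= 80, 0, 1)

toy_measures
```

A missing BMI stays missing, while the last row is still flagged for blood pressure: the systolic reading alone already fails the rule, so `FALSE & NA` evaluates to `FALSE`.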

Common health concerns by those below the poverty threshold

Next, I wanted to look at the health concerns of survey participants below the poverty threshold. I thought it would be interesting to see the most commonly indicated health concerns for this group. The treemap shows the health concerns of survey respondents below the poverty threshold, with elevated BMI being the most frequent health concern. This isn’t surprising given that poor dietary and fitness habits can be associated with poverty, as described in the AAFP article cited above.

#Filter the data set to those below the poverty threshold and then keep only the needed variables
nhanes_health2 <- nhanes_health %>%
  filter(Poverty <1) %>%
  arrange(desc(Poverty)) %>%
  select("bmi2", "diabetes2", "BP", "sleeptrouble2", "totchol2", "depressed2")

#Tally up the number of each health concern
nhanes_health3 <- pivot_longer(nhanes_health2, cols = bmi2:depressed2, names_to = "condition", values_to = "value", values_drop_na = TRUE)
nhanes_health4 <- nhanes_health3 %>%
  group_by(condition) %>%
  summarise(total = sum(value)) %>%
  arrange(desc(total))

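#Rename conditions by name (not by row position) so each label stays attached to the right total
condition_labels <- c(bmi2 = "Elevated Body Mass Index", BP = "Elevated Blood Pressure", totchol2 = "Elevated Cholesterol",
                      sleeptrouble2 = "Reported Sleep Troubles", depressed2 = "Reported Depression", diabetes2 = "Reported Diabetes")
nhanes_health4$condition_renamed <- unname(condition_labels[nhanes_health4$condition])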

#Create treemap
treemap(nhanes_health4, index = "condition_renamed", vSize = "total",
        vColor = "total", type = "manual",
        palette = viridis_pal(option = "D")(length(nhanes_health4$condition)),
        title = "Health Concerns of those Below the Poverty Threshold")
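Because missing values are dropped before tallying, each condition's total above is based on a different number of respondents. The sketch below shows one possible way to report a share alongside the count; it is an illustration on made-up flags, not part of the original analysis, and `toy_flags` and `toy_labels` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Toy 0/1 indicators (hypothetical), with some missing answers
toy_flags <- data.frame(
  bmi2      = c(1, 1, 0, NA),
  diabetes2 = c(0, 1, 0, 0),
  BP        = c(1, NA, NA, 0)
)

# Display labels looked up by condition name
toy_labels <- c(bmi2 = "Elevated Body Mass Index",
                diabetes2 = "Reported Diabetes",
                BP = "Elevated Blood Pressure")

toy_tally <- toy_flags %>%
  pivot_longer(everything(), names_to = "condition", values_to = "value",
               values_drop_na = TRUE) %>%
  group_by(condition) %>%
  summarise(total = sum(value),      # respondents with the concern
            answered = n(),          # respondents with a non-missing value
            share = total / answered,
            .groups = "drop") %>%
  mutate(condition_renamed = unname(toy_labels[condition])) %>%
  arrange(desc(total))

toy_tally
```

Here blood pressure and diabetes have the same count but different shares, which a treemap sized by `total` alone would not show.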

Create indicators for Race & Gender for easier analysis

#Create new variable that assigns Gender & Race to each survey respondent 
nhanes_demo <- nhanes_health %>%
  mutate(race_gender = case_when(
    Race3 == "Asian" & Gender == "female" ~ "Asian Women",
    Race3 == "Asian" & Gender == "male" ~ "Asian Men",
    Race3 == "Black" & Gender == "female" ~ "Black Women",
    Race3 == "Black" & Gender == "male" ~ "Black Men",
    Race3 %in% c("Hispanic", "Mexican") & Gender == "female" ~ "Latinx Women",
    Race3 %in% c("Hispanic", "Mexican") & Gender == "male" ~ "Latinx Men",
    Race3 == "White" & Gender == "female" ~ "White Women",
    Race3 == "White" & Gender == "male" ~ "White Men",
    TRUE ~ NA_character_)) %>%
  filter(!is.na(race_gender))

#Recode Latinx group
nhanes_demo2 <- nhanes_demo %>%
  mutate(Race3 =recode(Race3, "Mexican" = "Latinx", "Hispanic" = "Latinx"))

Tally the number of health concerns for each survey respondent

nhanes_demo3 <- nhanes_demo2 %>%
  #Add the six 0/1 indicators; the count is NA if any indicator is missing
  mutate(healthrisks_count = diabetes2 + bmi2 + BP + sleeptrouble2 + totchol2 + depressed2) %>%
  select("Poverty", "race_gender", "healthrisks_count") %>%
  filter(!is.na(healthrisks_count) & !is.na(Poverty) & !is.na(race_gender))
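The tally above returns NA for any respondent with a missing indicator, and those respondents are then filtered out. The sketch below contrasts that strict count with a more lenient alternative on made-up flags. The lenient version is an assumption for illustration only, not the approach used in this analysis, and `toy_counts` is a hypothetical data frame.

```r
# Toy 0/1 indicators (hypothetical), with some missing values
toy_counts <- data.frame(
  bmi2      = c(1, 1, 0),
  diabetes2 = c(0, NA, 0),
  BP        = c(1, 1, NA)
)

# Strict count: NA whenever any indicator is missing, as in the tally above
strict_count <- toy_counts$bmi2 + toy_counts$diabetes2 + toy_counts$BP

# Lenient alternative: count the concerns that were observed and
# record how many indicators were actually available for each row
lenient_count <- unname(rowSums(toy_counts, na.rm = TRUE))
n_observed    <- unname(rowSums(!is.na(toy_counts)))

data.frame(strict_count, lenient_count, n_observed)
```

The strict version keeps only complete cases, which is simpler to interpret but shrinks the sample. The lenient version keeps everyone but can undercount concerns for respondents with missing measurements.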

Create Facet-Wrap to visualize the relationship between the # of health concerns and poverty score

The last thing that I wanted to look at was the number of health concerns by poverty score and by race and gender. The visualization shows that a large share of participants have at least 3 health concerns, regardless of race/gender and poverty score. So, I took it one step further in the next chunk.

plot1 <- nhanes_demo3 %>%
 ggplot(aes(Poverty, healthrisks_count))+
 geom_point(aes(color = race_gender))+
 facet_wrap(~race_gender) +
 ggtitle("Number of Health Concerns by Poverty & by Race, Gender") +
  labs(x = "Poverty Score", y = "Number of Health Concerns", caption = "Poverty Score: A value less than 1 indicates the family is below the poverty threshold.") +
  theme(plot.title = element_text(hjust = 0.5),
         legend.position = "none",
         plot.caption = element_text(hjust = 0, size = 7))
plot1

Explore # of health concerns by poverty and by race and gender among survey respondents with 3+ health concerns using facet-wrap

When looking only at respondents with 3+ health concerns, a somewhat clearer picture emerges. Asian men and women reported the fewest health concerns. For most of the other groups, the number of health concerns appears to decrease slightly as the poverty score moves closer to 5.

#Filter to show 3+ health concerns 
nhanes_demo4 <- nhanes_demo3 %>%
  filter(healthrisks_count >=3)

#Create facet-wrap
plot2 <- nhanes_demo4 %>%
  ggplot(aes(Poverty, healthrisks_count)) +
  geom_point(aes(color = race_gender), size = 0.8) +
  facet_wrap(~race_gender) +
  ggtitle("Number of Health Concerns and Poverty by Race, Gender") +
  scale_y_continuous(limits = c(3, 6), breaks = seq(0, 6, 1)) +
  theme(
    panel.background = element_rect(fill = "white", color = "gray"),
    panel.grid.minor = element_line(color = "gray"),
    legend.position = "none",
    plot.title = element_text(hjust = 0.5),
    strip.background = element_rect(fill = "navyblue", color = "navyblue"),
    strip.text = element_text(color = "white"),
    plot.caption = element_text(hjust = 0, size = 7)) +
  labs(x = "Poverty Score", y = "Number of Health Concerns", caption = "Poverty Score: A value less than 1 indicates the family is below the poverty threshold.") +
  scale_color_brewer(palette = "Set1") +
geom_vline(xintercept = 1, linetype = "solid", color = "black")
plot2
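A visual impression such as "slightly decreases" can be checked with a rank correlation between the poverty score and the number of health concerns. The sketch below uses made-up values, so the result says nothing about NHANES; `toy_trend` is hypothetical, and with unweighted educational data any such number would be descriptive only.

```r
# Toy data (hypothetical): poverty score and number of health concerns
toy_trend <- data.frame(
  Poverty           = c(0.4, 0.9, 1.5, 2.2, 3.1, 4.0, 4.6, 5.0),
  healthrisks_count = c(5, 4, 4, 3, 4, 3, 3, 3)
)

# Spearman's rho suits a count outcome with many ties better than Pearson's r
rho <- cor(toy_trend$Poverty, toy_trend$healthrisks_count, method = "spearman")

# exact = FALSE avoids the warning about exact p-values with tied ranks
trend_test <- cor.test(toy_trend$Poverty, toy_trend$healthrisks_count,
                       method = "spearman", exact = FALSE)

rho
trend_test$p.value
```

A negative rho would match the pattern described above, with fewer health concerns at higher poverty scores. Running it separately within each `race_gender` group would mirror the facets.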

Conclusion

Overall, there does not appear to be a large difference in the number of health concerns across poverty scores. Based on the AAFP article, one would expect to see significantly fewer reported health concerns as the poverty score moves closer to 5. A couple of things to note: if this data set had been weighted, the findings may have been different. If I had more time, I would have dedicated effort to weighting the data set. Also, this data set was created for educational purposes only, and it is unclear how much data manipulation occurred.

References: https://www.aafp.org/about/policies/all/poverty-health.html

Project 2 - Is There A Connection Between the Health Concerns & Poverty?

Shalanda Henderson

2023-07-05