Research Question

Diabetes in Humboldt Park

Humboldt Park has a markedly higher rate of diabetes than both Chicago as a whole and the national average. Why is that? Which lifestyle factors and/or social determinants of health come into play?

This health issue is particularly critical for residents of Humboldt Park, given that diabetes ranks as the 8th leading cause of death in the United States.

Research indicates that diabetes disproportionately affects individuals with lower socioeconomic status (SES). Several factors contribute to this, including limited access to whole foods because of financial constraints. Studies also show that low-income neighborhoods often lack convenient access to healthier food options, prompting reliance on less nutritious alternatives such as fast food.

Moreover, the effects of socioeconomic status extend beyond income to lifestyle choices. Areas with a high concentration of low-SES households are often associated with safety concerns, which discourage outdoor physical activity.

That being said, although many of these variables likely contribute to the elevated diabetes rate in this community, I hypothesize that financial standing and a lack of exercise will have a substantially greater effect than the others.

To address these disparities, it is imperative to allocate government resources and services to support communities like Humboldt Park. By relieving the burdens faced by these populations, we can work towards diminishing health inequalities and fostering a healthier environment.

Library and Packages

library(tidyverse)


library(paletteer)
library(knitr)
library(kableExtra)


Uploading the Data

Exploratory Data Upload

data = read_csv("cha_data.csv")


Data were downloaded from the Chicago Health Atlas database, and a set of indicators was selected. Each indicator is a potential determinant that may be contributing to the prevalence of diabetes in Humboldt Park, and each is examined throughout the course of this project.

Above is the initial data set, acquired to identify patterns or indicators that may be adversely affecting the residents of Humboldt Park.

Explanatory Data Upload

data_overtime = read_csv("cha_data_overtime.csv")


This data set combines the six indicators that the exploratory analysis flagged as most likely to be affecting the residents of Humboldt Park, along with the adult diabetes rate. It includes multiple time intervals so that changes in each indicator over time can be examined.

Tidying the Data

Exploratory Analysis

data2 = 
  data %>% 
  select("Indicators", "Units", "Time Period", "Humboldt Park", "Chicago, IL") %>%
  mutate(
    Indicators = ifelse(grepl("%", Units), 
                        paste(Indicators, "(%)"), 
                   ifelse(Indicators == "Particulate matter (PM 2.5) concentration", 
                          paste(Indicators, "(µg/m3)"), 
                          ifelse(Indicators == "Median household income", 
                                 paste(Indicators, "($)", sep = ""), Indicators)))) %>% 
  filter(Indicators != "Particulate matter (PM 2.5) concentration (µg/m3)" &
         Indicators != "Median household income($)") %>%
  select(-"Units")

kable(data2, format = "html", caption = "Potential Causes of Diabetes") %>%
  kable_styling()

Potential Causes of Diabetes
Indicators	Time Period	Humboldt Park	Chicago, IL
Primary care provider rate (%)	2021-2022	80.0	82.0
Routine checkup rate (%)	2021-2022	79.0	77.1
Uninsured rate (%)	2017-2021	14.89	9.75
Received needed care rate (%)	2021-2022	60.8	77.7
Health care satisfaction rate (%)	2021-2022	46.2	54.8
Neighborhood violence rate (%)	2021-2022	46.5	33.2
Neighborhood sidewalk quality rate (%)	2021-2022	44.1	51.4
Adult diabetes rate (%)	2021-2022	18.2	10.2
Adult obesity rate (%)	2021-2022	37.9	33.2
Poverty rate (%)	2017-2021	23.44	17.06
Food stamps (SNAP) (%)	2017-2021	32.45	19.56
Adult fruit and vegetable servings rate (%)	2021-2022	29.4	29.9
Easy access to fruits and vegetables rate (%)	2021-2022	56.2	58.1
Adult physical inactivity rate (%)	2021-2022	40.3	28.1

The data set was streamlined by first selecting only the variables relevant to the research question.

Next, observations whose “Units” value contains a “%” were identified, and “(%)” was appended to their names in the “Indicators” variable to make the data set clearer and tidier. The indicators “Particulate matter (PM 2.5) concentration” and “Median household income” were initially included with their own units, but were then omitted because they are not percentages of the population; retaining only percentage-based indicators keeps the variables consistent and directly comparable.

As a final step, I removed the “Units” variable, since its contents are now part of each indicator’s name, resulting in a refined and focused data set.

Each row now represents a unique combination of indicator and time period, with the Humboldt Park and Chicago values stored in separate columns.
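Because the two non-percentage indicators are dropped anyway, the nested `ifelse()` chain could plausibly be collapsed into a single filter on the “Units” column. The base-R sketch below illustrates that idea on a hypothetical four-row miniature of the raw data; the `Units` strings and the `cha_raw` name are assumptions for illustration, and the shortcut is only equivalent if PM 2.5 and median household income are the only non-percentage rows (which matches the 16-to-14 row reduction in the table above).

```r
# Hypothetical miniature of the raw CSV; the Units strings are illustrative assumptions
cha_raw <- data.frame(
  Indicators = c("Poverty rate",
                 "Particulate matter (PM 2.5) concentration",
                 "Median household income",
                 "Adult diabetes rate"),
  Units = c("% of residents", "micrograms per m3", "dollars", "% of adults"),
  stringsAsFactors = FALSE
)

# Keep only percentage-based rows (fixed = TRUE treats "%" as a literal character)
is_pct <- grepl("%", cha_raw$Units, fixed = TRUE)
tidy <- cha_raw[is_pct, ]

# Append the unit label to the indicator name, then drop the now-redundant Units column
tidy$Indicators <- paste(tidy$Indicators, "(%)")
tidy$Units <- NULL

print(tidy)
```

Filtering before relabeling avoids attaching “(µg/m3)” and “($)” to rows that are removed in the next step.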

Explanatory Data Analysis

data_overtime2 = 
  data_overtime %>% 
  select("Indicators", "Units", "Time Period", "Humboldt Park", "Chicago, IL") %>%
  unite(Indicator, Indicators, Units, 
        sep = " ") %>%
  mutate(Indicator = sub("\\s%.*$", " (%)", Indicator),
         `Time Period` = str_extract(`Time Period`, "\\d+$"))

data_overtime3 = 
  data_overtime2 %>% 
  filter(Indicator != "Received needed care rate (%)" &
           Indicator != "Neighborhood violence rate (%)" &
           Indicator != "Adult physical inactivity rate (%)") %>% 
  arrange(Indicator, `Time Period`)

kable(data_overtime3, format = "html", caption = "Significant Indicators Over Time") %>%
  kable_styling()

Significant Indicators Over Time
Indicator	Time Period	Humboldt Park	Chicago, IL
Adult diabetes rate (%)	2021	8.00	12.20
Adult diabetes rate (%)	2022	18.20	10.20
Food stamps (SNAP) (%)	2013	41.80	18.41
Food stamps (SNAP) (%)	2017	40.13	19.48
Food stamps (SNAP) (%)	2021	32.45	19.56
Poverty rate (%)	2013	35.58	22.63
Poverty rate (%)	2017	30.77	20.64
Poverty rate (%)	2021	23.44	17.09
Uninsured rate (%)	2013	23.40	19.73
Uninsured rate (%)	2017	17.93	12.80
Uninsured rate (%)	2021	14.89	9.75

Once the exploratory analysis had shown which indicators disproportionately affect the residents of Humboldt Park, data for those indicators were collected across several time periods. As with the exploratory data, only the variables of interest were selected to form a new data set.

Then, this time using the “unite()” function, the “Indicators” and “Units” variables were combined into a single “Indicator” variable. Afterwards, everything from the “%” symbol onward was replaced with “(%)”, leaving each indicator’s name followed by its unit of measurement.
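The label cleanup can be illustrated with base R on a few hypothetical “Indicators”/“Units” pairs (the real “Units” text in the Chicago Health Atlas export may differ). `paste(..., sep = " ")` mirrors `unite(..., sep = " ")`, and the `sub()` call uses the same regular expression as the pipeline above: a whitespace character, a “%”, and everything after it are replaced with “ (%)”.

```r
# Hypothetical Indicators/Units pairs; the actual Units wording is an assumption
indicators <- c("Poverty rate", "Food stamps (SNAP)", "Uninsured rate")
units <- c("% of residents", "% of households", "% of residents")

# Equivalent of unite(Indicator, Indicators, Units, sep = " ")
combined <- paste(indicators, units, sep = " ")

# Replace the whitespace before "%", the "%" itself, and everything after it with " (%)"
labels <- sub("\\s%.*$", " (%)", combined)
print(labels)
```

Because the pattern requires a space immediately before the “%”, the parentheses in “(SNAP)” are left untouched and the result is “Food stamps (SNAP) (%)”.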

For plotting convenience, only the last year of each time interval was kept, since it is the final year of the period over which each value is averaged; it was extracted with the stringr function “str_extract()”.
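As a self-contained illustration, the base-R equivalent of `str_extract(x, "\\d+$")` is `regmatches()` combined with `regexpr()`. The two period strings below are taken from the “Time Period” column of the table above; converting the result with `as.numeric()` gives a numeric x-axis for the line plot.

```r
# Time Period strings as they appear in the source table
periods <- c("2017-2021", "2021-2022")

# Keep the trailing run of digits, i.e. the last year of each interval
end_year <- regmatches(periods, regexpr("\\d+$", periods))
print(end_year)

# Numeric version for plotting on a continuous axis
print(as.numeric(end_year))
```

Note that `regmatches()` silently drops elements with no match, whereas `str_extract()` returns `NA` for them, so the stringr version is the safer choice if any period string might lack a trailing year.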

Finally, because the database lacked data across multiple time periods for some indicators (Received needed care rate, Neighborhood violence rate and Adult physical inactivity rate), those indicators were removed to make the information easier for the reader to digest.

Coding

nonfinancial_data = 
  data2 %>% 
  filter(Indicators %in% c(
    "Primary care provider rate (%)",
    "Routine checkup rate (%)",
    "Received needed care rate (%)",
    "Health care satisfaction rate (%)",
    "Neighborhood violence rate (%)",
    "Neighborhood sidewalk quality rate (%)",
    "Adult diabetes rate (%)",
    "Adult obesity rate (%)",
    "Adult fruit and vegetable servings rate (%)",
    "Easy access to fruits and vegetables rate (%)",
    "Adult physical inactivity rate (%)"
  ))

A new object was created containing all the indicators that are not financially related. This separates them from the financial indicators and makes plotting this part of the analysis easier.

financial_data = 
  data2 %>% 
  filter(Indicators %in% c("Uninsured rate (%)", "Poverty rate (%)", "Food stamps (SNAP) (%)"))

Similarly, a separate data set was created containing only the three financially related indicators.
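Since the two groups partition the indicators, the non-financial list could alternatively be derived as the complement of the financial one rather than typed out in full. The base-R sketch below shows that idea using the 14 indicator names from the table above; in the dplyr pipeline the same idea would be written as `filter(!Indicators %in% financial)`. This is a suggested alternative, not the method used in the project.

```r
# The 14 indicators remaining in data2 (from the "Potential Causes of Diabetes" table)
indicators <- c(
  "Primary care provider rate (%)", "Routine checkup rate (%)",
  "Uninsured rate (%)", "Received needed care rate (%)",
  "Health care satisfaction rate (%)", "Neighborhood violence rate (%)",
  "Neighborhood sidewalk quality rate (%)", "Adult diabetes rate (%)",
  "Adult obesity rate (%)", "Poverty rate (%)", "Food stamps (SNAP) (%)",
  "Adult fruit and vegetable servings rate (%)",
  "Easy access to fruits and vegetables rate (%)",
  "Adult physical inactivity rate (%)"
)

financial <- c("Uninsured rate (%)", "Poverty rate (%)", "Food stamps (SNAP) (%)")

# Non-financial indicators are everything that is not financial (original order preserved)
nonfinancial <- setdiff(indicators, financial)
print(length(nonfinancial))
```

Deriving one list from the other guarantees that no indicator is accidentally left out of both plots or included in both.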

Data Visualization

Exploratory Analysis

Non-Financial Indicator Data

nonfin_data_reformat = 
  nonfinancial_data %>% 
  gather(key = "Location", 
         value = "Value", 
         `Humboldt Park`, 
         `Chicago, IL`)

ggplot(nonfin_data_reformat, aes(x = Indicators, 
                                 y = as.numeric(Value), 
                                 fill = Location)) +
  geom_bar(stat = "identity", 
           position = position_dodge(width = 0.9), 
           color = "black") +
  # Dodge labels by the same width as the bars so each label sits above its bar
  geom_text(aes(label = sprintf("%.2f", 
                                as.numeric(Value))),
            position = position_dodge(width = 0.9),
            vjust = -1, size = 2) +
  labs(title = "Comparison of Indicators between Humboldt Park and Chicago, IL",
       x = "Indicators",
       y = "Values") +
  scale_fill_manual(values = c("Humboldt Park" = "red", "Chicago, IL" = "lightblue")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, 
                                   hjust = 1)) +
  guides(fill = guide_legend(title = "Location"))

The bar plot above compares the indicators unrelated to the population’s financial status. Separating these from the financial components that may be affecting the residents of Humboldt Park and Chicago lets the reader see which lifestyle factors and behaviors may be contributing to diabetes.

Notably, the data reveal an alarming pattern: almost double the proportion of Humboldt Park residents have been diagnosed with diabetes compared with the rest of Chicago (18.2% vs. 10.2%), as shown by the first pair of bars. Furthermore, Humboldt Park residents fare worse on every indicator, with the exception of “Routine checkup rates”. The largest disparities are observed in (1) Adult Physical Inactivity Rates, (2) Neighborhood Violence Rates and (3) Received Needed Care Rates (from healthcare physicians).
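To make “almost double” and “largest disparities” concrete, the gaps can be computed directly from the values in the “Potential Causes of Diabetes” table. The base-R sketch below ranks indicators by the absolute percentage-point gap, which is only one possible definition of “largest disparity” (a ratio-based ranking would order them differently), and it does not test statistical significance.

```r
# Humboldt Park vs. Chicago values for the non-financial indicators (from the table)
nonfin <- data.frame(
  indicator = c("Primary care provider", "Routine checkup", "Received needed care",
                "Health care satisfaction", "Neighborhood violence",
                "Neighborhood sidewalk quality", "Adult diabetes", "Adult obesity",
                "Adult fruit and vegetable servings",
                "Easy access to fruits and vegetables", "Adult physical inactivity"),
  humboldt = c(80.0, 79.0, 60.8, 46.2, 46.5, 44.1, 18.2, 37.9, 29.4, 56.2, 40.3),
  chicago  = c(82.0, 77.1, 77.7, 54.8, 33.2, 51.4, 10.2, 33.2, 29.9, 58.1, 28.1),
  stringsAsFactors = FALSE
)

# Percentage-point gap and ratio, Humboldt Park relative to Chicago
nonfin$gap_pp <- nonfin$humboldt - nonfin$chicago
nonfin$ratio  <- nonfin$humboldt / nonfin$chicago

# Rank by the size of the gap, regardless of its direction
ranked <- nonfin[order(-abs(nonfin$gap_pp)), ]
print(head(ranked, 3))
```

With these values, the three largest absolute gaps are Received needed care (16.9 points), Neighborhood violence (13.3) and Adult physical inactivity (12.2), the same three indicators named above, and the diabetes ratio is about 1.78.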

Financial Indicator Data

fin_data_reformat = 
  financial_data %>% 
  gather(key = "Location", value = "Value", `Humboldt Park`, `Chicago, IL`)

ggplot(fin_data_reformat, aes(x = Indicators, 
                              y = as.numeric(Value), 
                              fill = Location)) +
  geom_bar(stat = "identity", 
           position = position_dodge(width = 0.9), 
           color = "black") +
  # Dodge labels by the same width as the bars so each label sits above its bar
  geom_text(aes(label = sprintf("%.2f", as.numeric(Value))),
            position = position_dodge(width = 0.9),
            vjust = -1, size = 2) +
  labs(title = "Comparison of Indicators between Humboldt Park and Chicago, IL",
       x = "Indicators",
       y = "Values") +
  scale_fill_manual(values = c("Humboldt Park" = "red", "Chicago, IL" = "lightblue")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  guides(fill = guide_legend(title = "Location"))

The bar plot above shows the financially related indicators for Humboldt Park and Chicago, IL. As mentioned before, financial factors such as income can drastically affect an individual’s or family’s ability to purchase whole foods, which can lead to unhealthier diets. Humboldt Park is worse off on all three indicators, and the difference is largest for food stamps (SNAP) (32.45% vs. 19.56%). Although the federal government encourages using food stamps for healthier options, studies have shown that over 20% of individuals use these benefits to purchase unhealthy foods and beverages because of their convenience.

It is worth noting that a serious issue with the SNAP program is that many individuals who are likely eligible for benefits do not receive them because of obstacles such as tedious applications and employment requirements. Individuals who live below the federal poverty line but do not receive benefits are therefore likely to be even more dependent on unhealthy food and beverage options than those who do.

Therefore, the six indicators that warrant further research are: (1) Adult Physical Inactivity Rates, (2) Neighborhood Violence Rates, (3) Received Needed Care Rates, (4) Food stamps (SNAP), (5) Poverty Rates and (6) Uninsured Rates. However, it is my belief that four of these indicators are

Explanatory Analysis

ggplot(data_overtime3, aes(x = as.numeric(`Time Period`), 
                           y = `Humboldt Park`, 
                           color = Indicator)) +
  geom_line(aes(group = Indicator)) +
  labs(title = "Indicators Over Time",
       x = "Time Period",
       y = "Values") +
  theme_minimal()

Although we had identified six indicators of interest, the analysis over time was narrowed to three of them, plotted alongside the adult diabetes rate, because the other three indicators (Received needed care, Neighborhood violence and Adult physical inactivity) only had data for 2021-2022. The results of this line plot are striking for several reasons.

As shown, the three longer lines represent the indicators that may be driving the elevated rate of diabetes: Food Stamps (SNAP), Poverty Rates and Uninsured Rates. Although all three have declined over time, the diabetes rate in Humboldt Park more than doubled between the two most recent periods (from 8.0% to 18.2%).
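The size of these opposing trends can be quantified from the “Significant Indicators Over Time” table. The base-R sketch below computes the absolute change (in percentage points) and the relative change between the first and last available period for each Humboldt Park series; the column names are illustrative, and the diabetes series covers only two periods, so its change is not directly comparable to the eight-year spans of the other indicators.

```r
# Humboldt Park values at the first and last available period (from the table)
trend <- data.frame(
  indicator  = c("Adult diabetes rate", "Food stamps (SNAP)", "Poverty rate", "Uninsured rate"),
  first_year = c(2021, 2013, 2013, 2013),
  last_year  = c(2022, 2021, 2021, 2021),
  first      = c(8.00, 41.80, 35.58, 23.40),
  last       = c(18.20, 32.45, 23.44, 14.89),
  stringsAsFactors = FALSE
)

# Absolute change in percentage points, and relative change in percent of the starting value
trend$change_pp  <- trend$last - trend$first
trend$change_pct <- 100 * trend$change_pp / trend$first

print(trend[, c("indicator", "first_year", "last_year", "change_pp", "change_pct")], digits = 3)
```

SNAP participation, poverty and uninsured rates fall by roughly 22%, 34% and 36% respectively, while the diabetes rate rises by about 127.5% over a much shorter window.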

Because these indicators do not track the rising levels of diabetes, other factors should be investigated in a future study. For instance, the rise in diabetes coincides with the peak of the COVID-19 pandemic, which negatively affected a large proportion of the population.

Final Project rmd

ChanKim

2023-12-06
