#Abstract

Diabetes is a disease with a high health cost to the individual and high monetary cost to our communities. New York state tracks diabetic rates at the county level along with other health and economic data. I leveraged this publicly available data to explore the heterogeneity in diabetic rates in New York state. I wanted to know if there was a correlation between diabetes, income, and obesity at the county level. The individual variables were visualized. Linear regression was used explore the relationship between diabetes (dependent variable) and income and obesity (independent variables). The R-squared was found to be 0.355, showing a correlation between the outcome and predictive variables. Both coefficients were found to significantly different from zero. The incidence of diabetes increases with the increase of the obesity rate for the county and decreases with the increase in average income for a county. Further exploration of the relationships could lead to better interventions to prevent diabetes. In further analysis, the obesity rates and average incomes should be weighted by county population.

Part 1 - Introduction

I was interested in exploring the hetrogeneity of diabetic rates in New York State and how it is related to income and obesity rates. New York is an intersting state for this analysis because there are so many different geographic regions within the state.

Research Question: Are the diabetic rates in New York state correlated with the obesity rates and average income at the county level?

Part 2 - Data

This data comes from the NY.GOV site as part of their open data sets

The Obesity and Diabetes data comes from the New York State Department of Health disease registries. These registries are based on multiple sources including hospital registries in a given geographic area. People who live outside a given area but receive treatment within a geographic region are not counted in these population based registries.

Obesity and Diabetes

The income data is collected by the New York State Department of Taxation and Finance. This data comes the New York State personal income tax returns that were filed in a timely fashion. This data is for full-time New York State residents.

Income

Part 3 - Code

We read in the data sets from GitHub.
We filter and rename some variables so that they can be used in the Shiny app.
We set up our ui an server. The ui slices the data so that we only visualize a subset of the data.

Income <- readr::read_delim("https://raw.githubusercontent.com/catfoodlover/Data607/main/Income.csv", delim = ';', show_col_types = FALSE)
Obese <- readr::read_delim("https://raw.githubusercontent.com/catfoodlover/Data607/main/Obese.csv", delim = ';', show_col_types = FALSE)
Diagnosed <- readr::read_delim("https://raw.githubusercontent.com/catfoodlover/Data607/main/Diagnosed.csv", delim = ';', show_col_types = FALSE)

data("county.regions")
Diagnosed <- rename(Diagnosed, County = "County Name")
Income <- rename(Income, County = "Place of Residence")
Income <- rename(Income, Year = "Tax Year")
Obese <- rename(Obese, County = "County Name")
Income$County <- as.character(Income$County)


Income <-mutate(Income, County=ifelse(County=="New York City - Bronx", "Bronx", County))
Income <-mutate(Income, County=ifelse(County=="New York City - Kings", "Kings", County))
Income <-mutate(Income, County=ifelse(County=="New York City - Manhattan", "New York", County))
Income <-mutate(Income, County=ifelse(County=="New York City - Queens", "Queens", County))
Income <-mutate(Income, County=ifelse(County=="New York City - Richmond", "Richmond", County))


Income09 <- filter(Income, Year == 2009)
Income09 <- mutate(Income09, County = ifelse(County == "new york city", "new york", County))


Obese$County <- tolower(Obese$County)
Income09$County <- tolower(Income09$County)
Diagnosed$County <- tolower(Diagnosed$County)


DiaDiagnosed <- filter(Diagnosed, Diagnosed$"Health Topic" %in% "Cirrhosis/Diabetes Indicators")


DiaDiagnosed <- rename(DiaDiagnosed, value = "Percent/Rate")
Income09 <- rename(Income09, value = "Average NY AGI of All Returns")

Obese <- rename(Obese, value = "Percentage/Rate")

Temp <- filter(county.regions, state.name %in% "new york")

#Income09 <- Income09 %>% mutate(County = as.character(County))

Temp_Obese <- left_join(Temp, Obese, by = c("county.name" = "County"))
Temp_Income <- left_join(Temp, Income09, by = c("county.name" = "County"))
Temp_Diab <- left_join(Temp, DiaDiagnosed, by = c("county.name" = "County"))


#fix missing decimals
Temp_Diab <- Temp_Diab %>% mutate(value = ifelse(value > 50, value/10, value), Input = 'Diabetes')
Temp_Obese <- Temp_Obese %>% mutate(value = ifelse(value > 100, value/10, value), Input = 'Obesity')
Temp_Income <- Temp_Income %>% mutate(Input = 'Income')

Final <- bind_rows(Temp_Diab, Temp_Obese, Temp_Income)

full_data <- bind_rows(Temp_Diab %>% select(region, value, Input), Temp_Obese %>% select(region, value, Input))
full_data <- inner_join(full_data, Temp_Income %>% select(region, income = value))


ui <- navbarPage(title = "NY State Income and Comorbidity Data",
                 tabPanel(title = 'Disease Explorer',
                          plotOutput("plot1"),
                          selectInput('Type', 'Input', choices = unique(Final$Input), selected='Diabetes')
                 ),

                 tabPanel(title = 'Map Explorer',
                          plotOutput("plot2"),
                          selectInput('Type2', 'Input', choices = unique(Final$Input), selected='Diabetes')),

                  tabPanel(title = 'Relation Explorer',
                           plotOutput("plot3"),
                           selectInput('Type3', 'Input', choices = unique(full_data$Input), selected='Diabetes')))




server <- function(input, output, session) {
    #mskRvis::set_msk_ggplot()
    data1 <- reactive({
        dfSlice <- Final %>%
            filter(Input == input$Type)
    })
    
    data2 <- reactive({
        dfSlice <- Final %>%
            filter(Input == input$Type2)
    })
    
    data3 <- reactive({
      dfSlice <- full_data %>%
        filter(Input == input$Type3)
    })
    
    
    output$plot1 <- renderPlot({
        
        ggplot(data1(), aes(x = reorder(county.name, desc(value)), y = value)) +
            geom_bar(stat="identity", color = 'blue', fill = 'blue') + #mskRvis::msk_colors["msk_blue"]) + 
        scale_y_continuous(labels = gtsummary::style_number,n.breaks = 7) +
        labs(x = "County") + 
        theme(axis.text.x = element_text(angle = 90))
    })
    
    output$plot2 <- renderPlot({
        
        progress = shiny::Progress$new()
        on.exit(progress$close())
        progress$set(message = "Creating image. Please wait.", value = 0)
        
        county_choropleth(data2(),
                          title      = "2009 New York State Rates",
                          legend     = "% Population",
                          num_colors = 9,
                          state_zoom = c("new york")) + scale_fill_brewer(palette="RdPu")
    })
    
    output$plot3 <- renderPlot({
      
      progress = shiny::Progress$new()
      on.exit(progress$close())
      progress$set(message = "Creating image. Please wait.", value = 0)
      
      ggplot(data3(), aes(y = value, x = income)) + geom_point() + geom_smooth(method = "loess", se =
                                                                                  TRUE, alpha = 0.2) + labs(title = "Income vs Comorbidity in NY State") + xlab("Mean Taxable Income by County") + ylab("Comorbidity Rate by County")
    })
    
    
    
}



shinyApp(ui = ui, server = server)

Part 4 - Conclusion

This analysis is important because type 2 diabetes is a debilitating desease that is largely preventable. To understand what other factors are related to its incidence may lead to better prevention methods.

We found that there is a correlation between diabetes and income and obesity. There are some limitations to the interpretability of these results. We used measurements captured at the county level, the populations within each county vary wildly. We can’t say what the relationship is at the population level of all people who live in New York state because these measurements are unweighted.

References

New York State Department of Health. “Community Health Obesity and Diabetes Related Indicators: 2008 - 2012: State of New York.” Community Health Obesity and Diabetes Related Indicators: 2008 - 2012 | State of New York, 1 July 2016, https://health.data.ny.gov/Health/Community-Health-Obesity-and-Diabetes-Related-Indi/tchg-ruva.

New York State Department of Taxation and Finance. “Average Income and Tax Liability of Full-Year Residents by County - Table 5: State of New York.” Average Income and Tax Liability of Full-Year Residents by County - Table 5 | State of New York, 6 Feb. 2017, https://data.ny.gov/Government-Finance/Average-Income-and-Tax-Liability-of-Full-Year-Resi/2w9v-ejxd.

DATA608 Final Project Write Up

William Aiken

12/11/2022

Part 1 - Introduction

Part 2 - Data

Part 3 - Code

Part 4 - Conclusion

References