Joining Data Frames: Final Dataset

This data set merges together the KFF Medicaid data, the CDC Wonder Mortality data, and the CDC Wonder population data. This also includes a new variable, mortality_rate that shows the mortality rate for heart and respiratory deaths for all 50 states.

merged_data<-merge(merge(kff_data, mortality, by="state"), population, by="state")

merged_data <- merged_data %>% 
  mutate(mortality_rate = round((heart_respiratory_deaths/Population*100000), 2))

str(merged_data)
## 'data.frame':    50 obs. of  7 variables:
##  $ state                   : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ non_senior_enrollment   : num  791900 153500 1299800 458100 8113700 ...
##  $ non_senior_spending     : num  1.31e+09 8.44e+08 3.96e+09 2.04e+09 1.97e+10 ...
##  $ spending_per_non_senior : num  1653 5498 3047 4460 2423 ...
##  $ heart_respiratory_deaths: num  5430 350 3493 2983 16730 ...
##  $ Population              : num  4151060 493770 5943270 2497790 33280744 ...
##  $ mortality_rate          : num  130.8 70.9 58.8 119.4 50.3 ...

Table 1: Interactive Data Table: Top 10 States with the Most Deaths per Medicaid Dollar Spent

top10 <- merged_data %>% 
     mutate(mortality_to_spending = round(10*(mortality_rate/spending_per_non_senior),2)) %>%
     arrange(desc(mortality_to_spending)) %>% 
     filter(mortality_to_spending >= 0.2991) %>%
  select(!c(non_senior_spending)) %>% 
  select(state, Population, heart_respiratory_deaths, mortality_rate, non_senior_enrollment, spending_per_non_senior, mortality_to_spending)

datatable(top10,
          options = list(
            pageLength = 10,
            columnDefs = list(
              list(className = 'dt-center', targets = 1:4)),
            dom = "t"),
          rownames = FALSE,
          colnames = c("State",
                       "State Population",
                       "Heart/Respiratory Deaths",
                       "Mortality Rate", 
                       "Medicaid Enrollment", 
                       "Spending Per Enrollee", 
                       "Mortality to Spending Ratio"),
          filter = "top") %>%
          formatCurrency(6, currency = "$", digits=2)

Interpretation: The table above shows the top 10 states with the highest number of deaths per medicaid dollar spent. The table shows population, respiratory/heart deaths, mortality rate, Medicaid enrollment, spending per Medicaid enrollee, and mortality to spending ratio. Alabama, West Virginia, South Carolina, Mississippi, and Georgia have the highest mortality to spending ratio.

Figure 1: Scatterplot of Medicaid spending and mortality rates

merged_data %>% 
  plot_ly(
    x = ~spending_per_non_senior, 
    y = ~mortality_rate, 
    split = ~factor(is_top5),
    mode = "markers",
    size = 25,
    type = "scatter",
    mode = "markers", 
    hoverinfo = 'text',
    text = ~paste('State:', state, '<br>Medicaid Spending per Non-Senior: $', spending_per_non_senior, '<br>Heart and Respiratory Mortality Rate:', mortality_rate,"per 100,000 non-seniors")
  ) %>% 
  layout(title = "Medicaid spending per non-senior and mortality rates <br> for heart and respiratory deaths, per 100,000 non-seniors, 2021",
         yaxis = list(title = "Mortality rate per 100,0000"), 
         xaxis = list(title = "State Medicaid spending per non-senior, in dollars"),
         legend = list(traceorder = "reversed", y = .9, bordercolor = "darkgrey", borderwidth = 1),
         annotations = list(x = 1.25, y = -.1,
                            text = " p = 0.315 ",
                            font = list(size = 10, color = "darkgrey"),
                            bordercolor = "darkgrey",
                            borderwidth = 1,
                            showarrow = F,
                            xref = 'paper',
                            yref = 'paper'
         ),
         margin = list(t = 50)
        ) 

Interpretation: There appears to be no correlation between average state Medicaid spending per non-senior and mortality rates for heart and respiratory related deaths per 100,000 non-seniors (p-value = .315)

Figure 2: Bar graph of 5 states with the highest mortality rate per million medicaid dollars spent

top5 %>% 
  plot_ly(
    x = ~factor(state, levels = c("Alabama", "West Virginia", "South Carolina", "Mississippi", "Georgia", "U.S.", ordered = T)),
    y = ~mortality_to_spending, 
    split = ~is_top5,
    type = "bar",
    mode = "markers", 
    hoverinfo = 'text',
    text = ~paste('State:', state, '<br>Medicaid Spending per Non-Senior: $', spending_per_non_senior, '<br>Heart and Respiratory Mortality Rate:', mortality_rate,"per 100,000 non-seniors")
  ) %>% 
  layout(title = "5 States with the Highest Ratios of <br> Heart- and Repiratory- Mortalities per 1 Million Medicaid Dollars Spent, <br> among Non-Seniors, 2021",
         yaxis = list(title = "Deaths per $1,000,000 Spent", dtick = .2), 
         xaxis = list(title = "State"),
         showlegend = F,
         margin = list(t = 80)
         )

Interpretation: The bar graph supplements the information in the scatterplot by “zooming in” on the 5 states with the highest number of deaths per medicaid dollar spent. In Alabama, for example, there appears to be about 4 times as many heart and respiratory deaths per medicaid dollar spent compared to the U.S. as a whole.