CSC303 Project Report

Introduction

CDC Project Overview

According to the American Statistical Association, drug overdose deaths are now the leading cause of injury-related death in the United States. As of 2016, 2.1 million Americans had an opioid use disorder.

The goal of this project extension is to look more closely at state populations, specifically at predictor variables such as racial breakdown and annual unemployment rate.

Where We Start

In the original CDC Data Project, we observed that West Virginia’s death percentage (deaths as a share of state population) was higher than that of most other states throughout the years observed (2005-2016), and markedly so after 2010.

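# Assumes dplyr (for %>%) and ggplot2 are attached, e.g. via library(tidyverse)
cdcDF %>%
  ggplot(aes(x = Year, y = deathPerc, group = State)) +
  geom_line() +
  # Grey out every state except West Virginia
  gghighlight::gghighlight(State == "West Virginia") +
  labs(y = "Death Percentage")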
## label_key: State

deathPercByStateData %>% 
  filter(Year == 2016) %>% 
  ggplot(aes(x = long, y = lat, group = group)) + 
  geom_polygon(aes(fill = deathPerc)) +
  coord_fixed(1.3) +
  ditch_the_axes + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  labs(title = "Ratio of Deaths per State Population - 2016", 
       fill = "Death Percentage")

Thus, for this report, we will look more extensively into the predictor variables mentioned in the overview and break those variables down for West Virginia in particular.

Analysis

Race Breakdown

cdcDF2016 %>% 
  arrange(desc(deathPerc)) %>% 
  head(5) %>% 
  select(State, deathPerc, White, Black, Hispanic, Asian,
         `American Indian/Alaska Native`, 
         `Native Hawaiian/Other Pacific Islander`, 
         `Two Or More Races`) %>% 
  DT::datatable(options = list(scrollX = TRUE), 
              rownames = FALSE)

Among the five states with the highest death percentage, West Virginia has both the highest rate of deaths per population and the highest percentage of white people in its population. New Hampshire also has a very high percentage of white people, but its death percentage is moderately lower and its percentage of Hispanic people is higher.

raceData %>% 
  filter(State %in% c("United States", "West Virginia")) %>% 
  knitr::kable()
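| State         | White | Black | Hispanic | Asian | American Indian/Alaska Native | Native Hawaiian/Other Pacific Islander | Two Or More Races | Total | Year |
|:--------------|------:|------:|---------:|------:|------------------------------:|---------------------------------------:|------------------:|------:|-----:|
| United States |  0.61 |  0.12 |     0.18 |  0.01 |                          0.05 |                                      0 |              0.03 |     1 | 2016 |
| West Virginia |  0.92 |  0.03 |     0.01 |  0.00 |                          0.01 |                                     NA |              0.02 |     1 | 2016 |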

Comparing the race breakdown of West Virginia against that of the United States as a whole, the state has a much higher proportion of white people (92% versus 61%).
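The size of that gap can be checked directly from the table values. The following is a minimal sketch that uses the rounded proportions shown in the table; the variable names are illustrative and do not come from the original analysis.

```r
# Rounded 2016 proportions of white residents, copied from the table above
white_us <- 0.61
white_wv <- 0.92

# Absolute gap in percentage points, and West Virginia relative to the US
gap_points <- (white_wv - white_us) * 100
ratio <- white_wv / white_us

cat(sprintf("Gap: %.0f percentage points; ratio: %.2f\n", gap_points, ratio))
```

Because the inputs are rounded to two decimals, the gap and ratio are approximate.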

Unemployment Rate Breakdown

cdcDF2016 %>% 
  arrange(desc(rate2016)) %>% 
  head(5) %>% 
  select(State, rate2016) %>% 
  knitr::kable(caption = "Top Five States - Unemployment Rate 2016")
Top Five States - Unemployment Rate 2016

| State         | rate2016 |
|:--------------|---------:|
| Alaska        |      6.9 |
| New Mexico    |      6.7 |
| West Virginia |      6.1 |
| Louisiana     |      6.0 |
| Alabama       |      5.9 |

While not the highest, West Virginia’s unemployment rate (6.1%) was the third highest in the United States in 2016.
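West Virginia's position can be confirmed from the five rates listed above. This is a small sketch that re-enters those values by hand; the data frame `top5` and the other names are illustrative, not objects from the original analysis.

```r
# Top five 2016 unemployment rates, copied from the table above
top5 <- data.frame(
  State = c("Alaska", "New Mexico", "West Virginia", "Louisiana", "Alabama"),
  rate2016 = c(6.9, 6.7, 6.1, 6.0, 5.9)
)

# Rank from highest (1) to lowest; tied states would share the smaller rank
top5$rank <- rank(-top5$rate2016, ties.method = "min")

is_wv <- top5$State == "West Virginia"
wv_rank <- top5$rank[is_wv]

# Distance, in percentage points, between West Virginia and the highest rate
gap_to_top <- max(top5$rate2016) - top5$rate2016[is_wv]

print(top5)
```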

cdcDF2016 %>% 
  inner_join(states, by = "region") %>% 
  ggplot(aes(x = long, y = lat, group = group)) + 
  geom_polygon(aes(fill = rate2016)) +
  coord_fixed(1.3) +
  ditch_the_axes + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  labs(title = "State Unemployment Rates - 2016", 
       fill = "Unemployment Rate")

The map shows that the higher unemployment rates are spread throughout the country rather than concentrated in one region. The range of unemployment rates is not large, but even a small difference in rate corresponds to a substantial number of people once the percentage is applied to population numbers.
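To illustrate how a small difference in rate translates into people, here is a sketch of the arithmetic. The labor-force size is a hypothetical round number chosen only for illustration, and the two rates are the highest and lowest values among the top five states listed earlier; none of the variable names come from the original analysis.

```r
# HYPOTHETICAL labor-force size, used only to illustrate the arithmetic;
# it is not a figure from the report's data
labor_force <- 1e6

# Highest and lowest rates among the top five states listed earlier (percent)
rate_high <- 6.9
rate_low <- 5.9

# Unemployed people implied by each rate for the hypothetical labor force
unemployed_high <- labor_force * rate_high / 100
unemployed_low <- labor_force * rate_low / 100

# People represented by the one-percentage-point difference
difference <- unemployed_high - unemployed_low
difference
```

With real labor-force counts per state in place of the hypothetical value, the same calculation would give each state's number of unemployed people.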

Numerical Model

modCDC <- lm(deathPerc ~ White + rate2016, data = cdcDF2016)
summary(modCDC)
## 
## Call:
## lm(formula = deathPerc ~ White + rate2016, data = cdcDF2016)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -0.0111508 -0.0074817 -0.0005783  0.0042798  0.0219710 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -0.007121   0.009882  -0.721  0.47471   
## White        0.017568   0.008243   2.131  0.03832 * 
## rate2016     0.003627   0.001317   2.754  0.00835 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.008492 on 47 degrees of freedom
## Multiple R-squared:  0.1609, Adjusted R-squared:  0.1251 
## F-statistic: 4.505 on 2 and 47 DF,  p-value: 0.01622
confint(modCDC)
##                     2.5 %      97.5 %
## (Intercept) -0.0270011093 0.012758684
## White        0.0009861185 0.034150026
## rate2016     0.0009774032 0.006276686
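The coefficient table above defines a fitted equation that can be evaluated by hand. The sketch below copies the rounded coefficients from the output and plugs in West Virginia's rounded values from the earlier tables, so the result only approximates what `predict(modCDC)` would return on the unrounded data. The helper name `predict_death_perc` is hypothetical.

```r
# Coefficients copied from the summary(modCDC) output above
b0 <- -0.007121      # intercept
b_white <- 0.017568  # White (shown as a proportion in the race table)
b_rate <- 0.003627   # rate2016 (unemployment rate, in percent)

# Hypothetical helper that evaluates the fitted equation
predict_death_perc <- function(white, rate) {
  b0 + b_white * white + b_rate * rate
}

# West Virginia, using the rounded values shown in the earlier tables
wv_fit <- predict_death_perc(white = 0.92, rate = 6.1)
wv_fit

# Effect of one extra percentage point of unemployment, White held fixed:
# this difference equals the rate2016 coefficient
rate_effect <- predict_death_perc(0.92, 7.1) - predict_death_perc(0.92, 6.1)
rate_effect
```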

In the linear model, there is evidence that unemployment rate is positively associated with death percentage: holding the proportion of white people constant, each one-point increase in rate2016 is associated with an estimated increase of 3.627e-03 in a state’s death percentage. There is weaker evidence that a higher proportion of white people within the population is associated with a higher rate of drug-related deaths, with an estimated coefficient of 1.757e-02 (the change associated with a one-unit change in White, which the race table shows as a proportion, holding rate2016 constant).

For the White estimate, the p-value of 0.038 means that, if the true coefficient were zero, there would be about a 3.8% chance of observing an estimate at least this far from zero. For the rate2016 estimate, the p-value of 0.008 means the corresponding chance is about 0.8%. Since both p-values are smaller than 0.05, the data provide evidence that unemployment rate and the proportion of white people within a state are associated with the percentage of drug-related deaths within the state population. The model as a whole is statistically significant at the 0.05 level (F-test p-value of 0.01622), although the multiple R-squared of 0.1609 shows that it explains only about 16% of the variation in death percentage.

Neither predictor’s 95% confidence interval contains 0, which is consistent with an association between each predictor (unemployment rate and proportion of white people within the state population) and the death percentage. The interval for the rate2016 coefficient runs from 9.774e-04 to 6.277e-03, and the interval for the White coefficient runs from 9.861e-04 to 3.415e-02. These are intervals for the coefficients, not for a state’s predicted death percentage.
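The t values, p-values, and confidence intervals in the output can be reproduced from the estimates and standard errors alone. The following is a sketch that copies the rounded figures from the `summary(modCDC)` output, so its results match `confint(modCDC)` only up to rounding; the variable names are illustrative.

```r
# Estimates and standard errors copied from the summary(modCDC) output
est <- c(White = 0.017568, rate2016 = 0.003627)
se  <- c(White = 0.008243, rate2016 = 0.001317)
df_resid <- 47  # residual degrees of freedom reported by the model

# t statistic: the estimate measured in standard errors away from zero
t_val <- est / se

# Two-sided p-value: probability, if the true coefficient were zero,
# of a t statistic at least this far from zero in either direction
p_val <- 2 * pt(-abs(t_val), df = df_resid)

# 95% confidence interval: estimate +/- critical t value * standard error
t_crit <- qt(0.975, df = df_resid)
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

round(cbind(t = t_val, p = p_val, ci), 5)
```

A 95% interval that excludes zero and a two-sided p-value below 0.05 are two views of the same test, which is why both checks agree for each predictor.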

# deathPerc is numeric, so rpart fits a regression tree (method = "anova")
tr <- rpart::rpart(deathPerc ~ White + rate2016, data = cdcDF2016)
plot(partykit::as.party(tr))

Based on the data from cdcDF2016, the regression tree (the response, deathPerc, is numeric) suggests that the percentage of drug-related deaths within the population increases as the unemployment rate and the proportion of white people within the state population increase. This is shown by the boxplots at the terminal nodes of the tree: the median value increases from left to right, just as the unemployment rate and the proportion of white people increase toward the right.
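For readers unfamiliar with how such a tree is read, the sketch below fits the same kind of model to synthetic data. The data frame `toy` and its values are made up purely for illustration and stand in for cdcDF2016; only the formula mirrors the report.

```r
library(rpart)  # recommended package that ships with standard R installations

# SYNTHETIC data standing in for cdcDF2016; all values are made up
set.seed(303)
n <- 50
toy <- data.frame(
  White = runif(n, 0.2, 0.95),
  rate2016 = runif(n, 2.5, 7)
)
toy$deathPerc <- 0.005 * toy$rate2016 + 0.02 * toy$White +
  rnorm(n, sd = 0.003)

# Numeric response, so rpart defaults to method = "anova" (a regression tree)
tr_toy <- rpart(deathPerc ~ White + rate2016, data = toy)
tr_toy$method

# Each terminal node predicts the mean response of the rows that fall into it
leaf_means <- tapply(toy$deathPerc, tr_toy$where, mean)
leaf_means
```

Each terminal node corresponds to one boxplot in a `partykit` plot of the tree, and the node's prediction is the mean of the responses routed to it.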

Conclusion

Based on the above analysis, there seems to be a slight to moderate association between the percentage of drug-related deaths within a state population and both the state’s unemployment rate and the percentage of white people within its population. However, this analysis was based on a single year (2016), in which West Virginia was an outlier in terms of death percentage, even though the line graph in the introduction shows that West Virginia’s high death percentage is not an isolated event. Given more time, I would gather data from a larger range of years in order to model the predictor variables more effectively, and would look into adding more predictor variables, such as age breakdown within the population and income.

References

  1. “Multiple Cause of Death Data on CDC WONDER.” Centers for Disease Control and Prevention, wonder.cdc.gov/mcd.html.

  2. “Population Distribution by Race/Ethnicity.” The Henry J. Kaiser Family Foundation, 29 Nov. 2018, www.kff.org/other/state-indicator/distribution-by-raceethnicity/?currentTimeframe=1&sortModel=%7B%22colId%22%3A%22Location%22%2C%22sort%22%3A%22desc%22%7D.

  3. “Unemployment Rates for States, Annual Averages.” U.S. Bureau of Labor Statistics, 27 Feb. 2018, www.bls.gov/lau/lastrk16.htm.

Maddy Ramser

12/4/2018