CSC303 Project

Introduction

This project analyzes the CDC Drug Related Deaths data. This data set gives data relating to people who have died from the years 1999-2016 due to a drug related cause. There are many variables in this data set inlcuding, but not limited to: year, state, age, deaths, population and region. When looking at this data I first wanted to organize the data so that I could look at the death percentages by state. By doing this I thought this would help me investigate some trends in drug related deaths. I then looked at the Medicare Part D prescribing data. This data set gives information about the opioid and extended release opioid prescribing rates for each state. I wanted to see if there was a correlation in the prescribing rate and the percentage of drug related deaths. In many areas of the United States the opioid crisis has become an epidemic. It was not until recently that actions were taken to regulate the opioid prescribing rate. I would like to see if the increase in opioid related deaths has any correlation to an increase in opioid prescribing rates. I chose to look at the calender year 2016 for both of these sets.

Analysis

First we must read in the .txt file for the CDC data.

cdcDeathCauses <- read_delim("cdcDeathCauses.txt", 
                             "\t", escape_double = FALSE, trim_ws = TRUE)

Within this text file there is a column labeled “Notes.” This column appears to not be relevant to the data that I am trying to look at therefore I will remove that column.

cdcDeathCauses <- cdcDeathCauses[,-1]

Now let’s rename some variables, tidy up the data to select for the certain parameters we want and create a data frame from the CDC data.

states <- map_data("state")
deathPercByStateData <-
  cdcDeathCauses %>%
  mutate(deathPerc = (Deaths/Population) * 100) %>%
  mutate(crudeRate = parse_number(`Crude Rate`)) %>%
  mutate(ageAdjRate = parse_number(`Age Adjusted Rate`)) %>%
  arrange(desc(deathPerc)) %>% 
  mutate(region = `State`, region = tolower(region)) %>% 
  filter(State != "District of Columbia") %>% 
  inner_join(states, by = "region") %>%
  select(State, Year, Deaths, Population, deathPerc, crudeRate, ageAdjRate, region, lat,
         long, group)
deathPercByStateData

## # A tibble: 279,486 x 11
##    State  Year Deaths Population deathPerc crudeRate ageAdjRate region
##    <chr> <int>  <int>      <int>     <dbl>     <dbl>      <dbl> <chr> 
##  1 West…  2016    973    1831102    0.0531      53.1       57.1 west …
##  2 West…  2016    973    1831102    0.0531      53.1       57.1 west …
##  3 West…  2016    973    1831102    0.0531      53.1       57.1 west …
##  4 West…  2016    973    1831102    0.0531      53.1       57.1 west …
##  5 West…  2016    973    1831102    0.0531      53.1       57.1 west …
##  6 West…  2016    973    1831102    0.0531      53.1       57.1 west …
##  7 West…  2016    973    1831102    0.0531      53.1       57.1 west …
##  8 West…  2016    973    1831102    0.0531      53.1       57.1 west …
##  9 West…  2016    973    1831102    0.0531      53.1       57.1 west …
## 10 West…  2016    973    1831102    0.0531      53.1       57.1 west …
## # ... with 279,476 more rows, and 3 more variables: lat <dbl>, long <dbl>,
## #   group <dbl>

Now let’s import the Medicare Part D prescribing data set

opioidPrescribing <- read_excel("Medicare_Part_D_Opioid_Prescribing_Geographic_2016.xlsx", 
    skip = 4)

Now let’s tidy up the Part D prescribing data set

prescribingData <- 
  opioidPrescribing %>%
  rename(state.name="State Name") %>%
  mutate(opPreRate = parse_number(`Opioid Prescribing Rate`)) %>%
  mutate(xrOpPreRate = parse_number(`Extended Release Opioid Prescribing Rate`)) %>% 
  select(state.name, opPreRate, xrOpPreRate)
DT::datatable(prescribingData)

I now want to remove the National row as I am just looking at the data for each state.

prescribingData <- prescribingData[-1, ]
DT::datatable(prescribingData)

Now I want to compare these two data sets.

First let’s look at the death percentages by state in the year 2016.

ditch_the_axes <- theme(
  axis.text = element_blank(),
  axis.line = element_blank(),
  axis.ticks = element_blank(),
  panel.border = element_blank(),
  panel.grid = element_blank(),
  axis.title = element_blank()
)
deathPercByStateData %>% 
  filter(Year == 2016) %>% 
  ggplot(aes(x = long, y = lat, group = group)) + 
  geom_polygon(aes(fill = deathPerc)) +
  coord_fixed(1.3) +
  ditch_the_axes + 
  theme(plot.title = element_text(hjust = 0.5)) + 
  labs(title = "Ratio of Deaths per State Population - 2016", 
       fill = "Death Percentage")

Now let’s arrange the opioid prescribing data for the year of 2016 to see if we can find trends among the two. First, it might help to take the average for each state.

I would like to arrange it based on the regular opioid prescribing rate

evalPrescribingData <- 
  prescribingData %>% 
  group_by(state.name) %>% 
  summarise(average = mean(opPreRate)) %>% 
  arrange(desc(average))
as.data.frame(evalPrescribingData)

##              state.name average
## 1               Alabama    7.47
## 2                  Utah    7.46
## 3                Nevada    7.44
## 4              Oklahoma    7.31
## 5                 Idaho    7.29
## 6              Colorado    7.02
## 7             Tennessee    6.97
## 8                Oregon    6.93
## 9                Alaska    6.74
## 10             Michigan    6.68
## 11           Washington    6.65
## 12              Arizona    6.51
## 13             Arkansas    6.37
## 14              Montana    6.30
## 15           New Mexico    6.17
## 16              Indiana    6.10
## 17          Mississippi    5.96
## 18             Delaware    5.94
## 19              Georgia    5.94
## 20       North Carolina    5.94
## 21       South Carolina    5.92
## 22              Wyoming    5.91
## 23               Kansas    5.86
## 24            Louisiana    5.78
## 25             Maryland    5.71
## 26                Maine    5.55
## 27            Wisconsin    5.54
## 28             Missouri    5.53
## 29             Kentucky    5.49
## 30                Texas    5.45
## 31              Vermont    5.41
## 32             Virginia    5.31
## 33        West Virginia    5.27
## 34              Florida    5.19
## 35        New Hampshire    5.11
## 36                 Ohio    5.04
## 37         South Dakota    5.03
## 38           California    5.02
## 39            Minnesota    4.75
## 40         Pennsylvania    4.69
## 41             Nebraska    4.67
## 42             Illinois    4.61
## 43                 Iowa    4.49
## 44         North Dakota    4.17
## 45          Connecticut    4.06
## 46           New Jersey    4.01
## 47 District of Columbia    3.91
## 48        Massachusetts    3.84
## 49               Hawaii    3.76
## 50         Rhode Island    3.48
## 51             New York    2.94

Let’s select the top 5 states.

averageTop5op <-
evalPrescribingData %>% 
  filter(state.name == c("Alabama", "Utah", "Nevada", "Oklahoma", "Idaho"))
as.data.frame(averageTop5op)

##   state.name average
## 1    Alabama    7.47
## 2       Utah    7.46
## 3     Nevada    7.44
## 4   Oklahoma    7.31
## 5      Idaho    7.29

A graphic to show this

evalPrescribingData %>% 
  filter(state.name == c("Alabama", "Utah", "Nevada", "Oklahoma", "Idaho")) %>% 
  ggplot(aes(x=state.name, y = average)) +
  geom_bar(stat="identity") +
  labs(x="Top 5 states", y="Average Prescription Rate", title = "Average regular opiod prescription rates for the top 5 states")

Now let’s organize it based on the extended release opioid prescribing rate

eval2PrescribingData <- 
  prescribingData %>% 
  group_by(state.name) %>% 
  summarise(average = mean(xrOpPreRate)) %>% 
  arrange(desc(average))
as.data.frame(eval2PrescribingData)

##              state.name average
## 1               Vermont   11.37
## 2                Alaska   10.99
## 3               Arizona   10.68
## 4                Nevada   10.67
## 5              Delaware   10.57
## 6            Washington   10.10
## 7         New Hampshire   10.08
## 8                Oregon   10.05
## 9              Maryland    9.82
## 10                Maine    9.74
## 11                Idaho    9.22
## 12                 Utah    9.15
## 13               Hawaii    9.02
## 14             Colorado    8.94
## 15              Florida    8.58
## 16            Tennessee    8.53
## 17              Montana    7.82
## 18            Wisconsin    7.78
## 19        Massachusetts    7.69
## 20           California    7.62
## 21             Oklahoma    7.51
## 22       North Carolina    7.26
## 23         Pennsylvania    7.20
## 24             Michigan    7.03
## 25             New York    6.91
## 26               Kansas    6.83
## 27              Wyoming    6.75
## 28           New Mexico    6.69
## 29          Connecticut    6.55
## 30           New Jersey    6.50
## 31              Indiana    6.30
## 32            Minnesota    6.18
## 33              Georgia    6.04
## 34             Virginia    5.93
## 35         South Dakota    5.90
## 36             Nebraska    5.85
## 37             Missouri    5.74
## 38             Arkansas    5.70
## 39       South Carolina    5.64
## 40              Alabama    5.59
## 41                 Iowa    5.42
## 42                 Ohio    5.31
## 43         North Dakota    5.20
## 44            Louisiana    5.04
## 45 District of Columbia    4.82
## 46         Rhode Island    4.80
## 47                Texas    4.72
## 48             Kentucky    4.49
## 49          Mississippi    4.23
## 50        West Virginia    4.16
## 51             Illinois    4.03

Let’s take these top 5 as well

averageTop5xr <-
  eval2PrescribingData %>% 
  filter(state.name == c("Vermont", "Alaska", "Arizona", "Nevada","Delaware"))
as.data.frame(averageTop5xr)

##   state.name average
## 1    Vermont   11.37
## 2     Alaska   10.99
## 3    Arizona   10.68
## 4     Nevada   10.67
## 5   Delaware   10.57

A graphic to show this

eval2PrescribingData %>% 
  filter(state.name == c("Vermont", "Alaska", "Arizona", "Nevada","Delaware")) %>% 
  ggplot(aes(x=state.name, y=average)) +
  geom_bar(stat="identity") +
  labs(x="Top 5 states", y="Average Prescription Rate", title = "Average extended release opioid prescription rates for the top 5 states")

Conclusions

When just looking at the CDC data and analyzing the percentage of drug related deaths by state, we can conclude that west Virginia has the highest percentage of drug related deaths. The following states are also on the upper end of death percentages: Ohio, Pennsylvania, Nevada, New Mexico, Oklahoma, Missouri, Louisiana and Kentucky. When comparing these findings to the Medicare Part D Opioid prescribing data set we see that for the regular opioid prescribing rate Nevada and Oklahoma are two states with greater death percentages that also have higher prescribing rates. For the extended release opioid prescribing rate, we see that Nevada is the only state that falls within the top 5 that also has a higher death percentage rate. These are not the exact results that I was hoping to see. However the CDC data set deals with all drug related deaths, not just opioids. This could be contributing to the lack of strong correlation between the two data sets. Another thought, is that often times many patients might first get addicted from a prescription; but then, they typically start to overuse illegally. This would not be recorded in the prescribing rate if they are getting their drugs elsewhere. It is also when they start to use illegally that then raises their change at overdose or death due to drug use.
Some further investigating for this might include: * Mutating the CDC data so that I am only evaluating the opioid related deaths. * Gathering information about the people who have died due to an opioid related death that have also been recieving prescriptions for opioids.

References

“Medicare Part D Opioid Prescribing Mapping Tool” CMS
“Multiple Causes of Death 1999-2016” CDC