Goal

The goal of this project is to look at rates of opioid overdoses within New York state and compare to a variety of socioeconomic factors to see if there is a correlation between them.

Introduction

The opioid crisis is a well-documented public health crisis. The rate of overdose deaths due to opioids (e.g. oxycodone, heroin, etc.) has been on the rise for years and local, state, and federal governments have struggled to find solutions.

Data Sources

Data for this analysis comes in three parts: data on overdose deaths, unemployment data, and poverty and income.

Overdose Deaths

New York State tracks opioid overdose deaths by county on their website1. To get this data we need to scrape it from the site.

library(rvest)
rawHTML <- read_html("https://www.health.ny.gov/statistics/opioid/data/d2.htm")

Once we have the raw HTML, we can then parse it to get our raw data:

# Get the table from the HTML
rawTable <- html_table(rawHTML)[[1]]

### Fix up the data frame ###
# Fix column names
colnames(rawTable) <- c("county","deaths_2013","deaths_2014",
                        "deaths_2015","total_deaths","avgPop",
                        "rate","adjRate")

# Remove region titles and region totals
opioids <- rawTable[-grep("[[:space:]]?Reg",rawTable$county),]

Next we tidy the data:

# Put years into rows
opioids <- gather(opioids, key="year", value = "deaths",
                  deaths_2013, deaths_2014, deaths_2015)

opioids$year <- as.numeric(str_extract(opioids$year,"[0-9]+"))

# Standardize our counties
opioids$county <- tolower(opioids$county)
opioids$county <- str_replace(opioids$county,"\\.","")

# Fix column types
opioids$total_deaths <- as.numeric(str_replace_all(opioids$total_deaths,"(\\*|,)",""))
opioids$deaths <- as.numeric(str_replace_all(opioids$deaths,"(\\*|,)",""))
opioids$avgPop <- as.numeric(str_replace_all(opioids$avgPop,"(\\*|,)",""))
opioids$rate <- as.numeric(str_replace_all(opioids$rate,"(\\*|,)",""))
opioids$adjRate <- as.numeric(str_replace_all(opioids$adjRate,"(\\*|,)",""))

opioids
##             county total_deaths   avgPop rate adjRate year deaths
## 1           nassau          446  1357374 11.0    11.5 2013    131
## 2          suffolk          618  1501431 13.7    14.3 2013    209
## 3            bronx          382  1437445  8.9     8.8 2013    105
## 4            kings          486  2616892  6.2     6.1 2013    126
## 5         new york          313  1635648  6.4     5.8 2013     94
## 6           queens          351  2318968  5.0     4.7 2013    120
## 7         richmond          199   473486 14.0    14.1 2013     61
## 8         dutchess          146   296416 16.4    17.1 2013     52
## 9           orange          170   376446 15.1    15.8 2013     46
## 10          putnam           34    99391 11.4    12.4 2013     12
## 11        rockland           66   323602  6.8     7.6 2013     23
## 12        sullivan           56    75828 24.6    26.7 2013     17
## 13          ulster           71   180529 13.1    13.9 2013     25
## 14     westchester          223   972611  7.6     7.6 2013     71
## 15          albany           77   308166  8.3     8.8 2013     24
## 16        columbia           22    61958 11.8    11.5 2013      7
## 17          greene           22    48016 15.3    16.6 2013      5
## 18      rensselaer           36   159986  7.5     8.1 2013     12
## 19        saratoga           49   225012  7.3     7.7 2013     12
## 20     schenectady           28   155224  6.0     5.6 2013     11
## 21          fulton            6    54228  3.7     3.8 2013      3
## 22        herkimer           17    63675  8.9    10.7 2013      7
## 23      montgomery           11    49773  7.4     8.9 2013      3
## 24          otsego           13    61149  7.1     8.3 2013      3
## 25       schoharie            4    31580  4.2     4.1 2013      2
## 26         clinton           20    81491  8.2     6.9 2013      5
## 27           essex           14    38640 12.1    13.4 2013      4
## 28        franklin            7    51203  4.6     4.7 2013      1
## 29        hamilton            1     4733  7.0     2.8 2013      0
## 30          warren            7    64999  3.6     3.1 2013      0
## 31      washington            5    62565  2.7     2.6 2013      1
## 32       jefferson           34   118747  9.5     9.8 2013     12
## 33           lewis            5    27109  6.1     6.6 2013      3
## 34     st lawrence           31   111457  9.3    10.2 2013      9
## 35          cayuga           32    78863 13.5    14.1 2013     12
## 36        cortland           12    48831  8.2     8.9 2013      0
## 37         madison           18    72200  8.3     9.8 2013      4
## 38          oneida           88   232985 12.6    13.4 2013     17
## 39        onondaga          174   468349 12.4    12.6 2013     47
## 40          oswego           48   120741 13.3    13.8 2013     18
## 41          broome           77   197150 13.0    14.1 2013     22
## 42        chenango           17    49258 11.5    13.3 2013      4
## 43        delaware            9    46452  6.5     8.4 2013      1
## 44           tioga           18    49855 12.0    13.6 2013      4
## 45        tompkins           32   104411 10.2    11.5 2013      7
## 46         chemung           26    87782  9.9    10.6 2013      9
## 47      livingston            9    64669  4.6     5.3 2013      3
## 48          monroe          237   749688 10.5    10.4 2013     68
## 49         ontario           25   109457  7.6     8.1 2013      9
## 50        schuyler            4    18375  7.3     8.2 2013      3
## 51          seneca            7    35042  6.7     7.7 2013      2
## 52         steuben           15    98225  5.1     5.1 2013      5
## 53           wayne            8    91990  2.9     2.8 2013      2
## 54           yates            5    25137  6.6     6.6 2013      1
## 55        allegany            4    47769  2.8     3.6 2013      1
## 56     cattaraugus           15    78471  6.4     7.1 2013      3
## 57      chautauqua           39   131971  9.9    10.9 2013     11
## 58            erie          444   921760 16.1    16.5 2013     94
## 59         genesee           13    59184  7.3     7.3 2013      2
## 60         niagara           94   213475 14.7    15.9 2013     26
## 61         orleans           10    41934  7.9     8.4 2013      5
## 62         wyoming           11    41244  8.9    10.0 2013      8
## 63  new york state         5461 19731048  9.2     9.0 2013   1604
## 64          nassau          446  1357374 11.0    11.5 2014    143
## 65         suffolk          618  1501431 13.7    14.3 2014    199
## 66           bronx          382  1437445  8.9     8.8 2014    102
## 67           kings          486  2616892  6.2     6.1 2014    163
## 68        new york          313  1635648  6.4     5.8 2014    106
## 69          queens          351  2318968  5.0     4.7 2014    113
## 70        richmond          199   473486 14.0    14.1 2014     73
## 71        dutchess          146   296416 16.4    17.1 2014     39
## 72          orange          170   376446 15.1    15.8 2014     58
## 73          putnam           34    99391 11.4    12.4 2014      9
## 74        rockland           66   323602  6.8     7.6 2014     16
## 75        sullivan           56    75828 24.6    26.7 2014     20
## 76          ulster           71   180529 13.1    13.9 2014     19
## 77     westchester          223   972611  7.6     7.6 2014     67
## 78          albany           77   308166  8.3     8.8 2014     23
## 79        columbia           22    61958 11.8    11.5 2014      9
## 80          greene           22    48016 15.3    16.6 2014      9
## 81      rensselaer           36   159986  7.5     8.1 2014      7
## 82        saratoga           49   225012  7.3     7.7 2014     19
## 83     schenectady           28   155224  6.0     5.6 2014      5
## 84          fulton            6    54228  3.7     3.8 2014      1
## 85        herkimer           17    63675  8.9    10.7 2014      4
## 86      montgomery           11    49773  7.4     8.9 2014      3
## 87          otsego           13    61149  7.1     8.3 2014      3
## 88       schoharie            4    31580  4.2     4.1 2014      0
## 89         clinton           20    81491  8.2     6.9 2014      6
## 90           essex           14    38640 12.1    13.4 2014      3
## 91        franklin            7    51203  4.6     4.7 2014      2
## 92        hamilton            1     4733  7.0     2.8 2014      1
## 93          warren            7    64999  3.6     3.1 2014      3
## 94      washington            5    62565  2.7     2.6 2014      3
## 95       jefferson           34   118747  9.5     9.8 2014     13
## 96           lewis            5    27109  6.1     6.6 2014      1
## 97     st lawrence           31   111457  9.3    10.2 2014     10
## 98          cayuga           32    78863 13.5    14.1 2014      5
## 99        cortland           12    48831  8.2     8.9 2014      5
## 100        madison           18    72200  8.3     9.8 2014      8
## 101         oneida           88   232985 12.6    13.4 2014     36
## 102       onondaga          174   468349 12.4    12.6 2014     56
## 103         oswego           48   120741 13.3    13.8 2014     15
## 104         broome           77   197150 13.0    14.1 2014     27
## 105       chenango           17    49258 11.5    13.3 2014      7
## 106       delaware            9    46452  6.5     8.4 2014      2
## 107          tioga           18    49855 12.0    13.6 2014      7
## 108       tompkins           32   104411 10.2    11.5 2014     14
## 109        chemung           26    87782  9.9    10.6 2014      7
## 110     livingston            9    64669  4.6     5.3 2014      2
## 111         monroe          237   749688 10.5    10.4 2014     88
## 112        ontario           25   109457  7.6     8.1 2014     10
## 113       schuyler            4    18375  7.3     8.2 2014      0
## 114         seneca            7    35042  6.7     7.7 2014      1
## 115        steuben           15    98225  5.1     5.1 2014      6
## 116          wayne            8    91990  2.9     2.8 2014      3
## 117          yates            5    25137  6.6     6.6 2014      2
## 118       allegany            4    47769  2.8     3.6 2014      1
## 119    cattaraugus           15    78471  6.4     7.1 2014      1
## 120     chautauqua           39   131971  9.9    10.9 2014     14
## 121           erie          444   921760 16.1    16.5 2014    112
## 122        genesee           13    59184  7.3     7.3 2014      2
## 123        niagara           94   213475 14.7    15.9 2014     25
## 124        orleans           10    41934  7.9     8.4 2014      2
## 125        wyoming           11    41244  8.9    10.0 2014      0
## 126 new york state         5461 19731048  9.2     9.0 2014   1710
## 127         nassau          446  1357374 11.0    11.5 2015    172
## 128        suffolk          618  1501431 13.7    14.3 2015    210
## 129          bronx          382  1437445  8.9     8.8 2015    175
## 130          kings          486  2616892  6.2     6.1 2015    197
## 131       new york          313  1635648  6.4     5.8 2015    113
## 132         queens          351  2318968  5.0     4.7 2015    118
## 133       richmond          199   473486 14.0    14.1 2015     65
## 134       dutchess          146   296416 16.4    17.1 2015     55
## 135         orange          170   376446 15.1    15.8 2015     66
## 136         putnam           34    99391 11.4    12.4 2015     13
## 137       rockland           66   323602  6.8     7.6 2015     27
## 138       sullivan           56    75828 24.6    26.7 2015     19
## 139         ulster           71   180529 13.1    13.9 2015     27
## 140    westchester          223   972611  7.6     7.6 2015     85
## 141         albany           77   308166  8.3     8.8 2015     30
## 142       columbia           22    61958 11.8    11.5 2015      6
## 143         greene           22    48016 15.3    16.6 2015      8
## 144     rensselaer           36   159986  7.5     8.1 2015     17
## 145       saratoga           49   225012  7.3     7.7 2015     18
## 146    schenectady           28   155224  6.0     5.6 2015     12
## 147         fulton            6    54228  3.7     3.8 2015      2
## 148       herkimer           17    63675  8.9    10.7 2015      6
## 149     montgomery           11    49773  7.4     8.9 2015      5
## 150         otsego           13    61149  7.1     8.3 2015      7
## 151      schoharie            4    31580  4.2     4.1 2015      2
## 152        clinton           20    81491  8.2     6.9 2015      9
## 153          essex           14    38640 12.1    13.4 2015      7
## 154       franklin            7    51203  4.6     4.7 2015      4
## 155       hamilton            1     4733  7.0     2.8 2015      0
## 156         warren            7    64999  3.6     3.1 2015      4
## 157     washington            5    62565  2.7     2.6 2015      1
## 158      jefferson           34   118747  9.5     9.8 2015      9
## 159          lewis            5    27109  6.1     6.6 2015      1
## 160    st lawrence           31   111457  9.3    10.2 2015     12
## 161         cayuga           32    78863 13.5    14.1 2015     15
## 162       cortland           12    48831  8.2     8.9 2015      7
## 163        madison           18    72200  8.3     9.8 2015      6
## 164         oneida           88   232985 12.6    13.4 2015     35
## 165       onondaga          174   468349 12.4    12.6 2015     71
## 166         oswego           48   120741 13.3    13.8 2015     15
## 167         broome           77   197150 13.0    14.1 2015     28
## 168       chenango           17    49258 11.5    13.3 2015      6
## 169       delaware            9    46452  6.5     8.4 2015      6
## 170          tioga           18    49855 12.0    13.6 2015      7
## 171       tompkins           32   104411 10.2    11.5 2015     11
## 172        chemung           26    87782  9.9    10.6 2015     10
## 173     livingston            9    64669  4.6     5.3 2015      4
## 174         monroe          237   749688 10.5    10.4 2015     81
## 175        ontario           25   109457  7.6     8.1 2015      6
## 176       schuyler            4    18375  7.3     8.2 2015      1
## 177         seneca            7    35042  6.7     7.7 2015      4
## 178        steuben           15    98225  5.1     5.1 2015      4
## 179          wayne            8    91990  2.9     2.8 2015      3
## 180          yates            5    25137  6.6     6.6 2015      2
## 181       allegany            4    47769  2.8     3.6 2015      2
## 182    cattaraugus           15    78471  6.4     7.1 2015     11
## 183     chautauqua           39   131971  9.9    10.9 2015     14
## 184           erie          444   921760 16.1    16.5 2015    238
## 185        genesee           13    59184  7.3     7.3 2015      9
## 186        niagara           94   213475 14.7    15.9 2015     43
## 187        orleans           10    41934  7.9     8.4 2015      3
## 188        wyoming           11    41244  8.9    10.0 2015      3
## 189 new york state         5461 19731048  9.2     9.0 2015   2147

Unemployment Data

Next we turn to unemployment rates. We can also get these data from NYS 2. This time, the data is in a much easier to retrieve CSV format:

unemployment <- read_csv("NY_unemployment.csv")
unemployment
## # A tibble: 186 x 6
##     year county      meanRate meanLabor meanEmployed meanUnemployed
##    <int> <chr>          <dbl>     <dbl>        <dbl>          <dbl>
##  1  2013 albany          6.06   160808.      151075           9750 
##  2  2013 allegany        7.5     23808.       22000           1783.
##  3  2013 bronx          11.8    603450       532425          71050 
##  4  2013 broome          7.76    91925        84767.          7158.
##  5  2013 cattaraugus     8.51    38108.       34850           3250 
##  6  2013 cayuga          7.37    38858.       35992.          2867.
##  7  2013 chautauqua      8.02    60233.       55400           4825 
##  8  2013 chemung         7.88    39600        36475           3133.
##  9  2013 chenango        7.32    24100        22333.          1758.
## 10  2013 clinton         8.31    37167.       34083.          3083.
## # ... with 176 more rows

This data is already in a tidy format, so we don’t need to do anything to it.

Poverty and Income Data

Finally, we gather our poverty and income data from the Census Bureau3. This data is also in a CSV format, which makes importing a bit easier:

rawPoverty <- read_csv(url("https://raw.githubusercontent.com/lysanthus/Data607/master/Final/poverty.csv"))

rawPoverty$county <- tolower(rawPoverty$county)

rawPoverty
## # A tibble: 125 x 44
##     year state countyID county   `All Ages SAIPE Pove… `All Ages in Pover…
##    <int> <int>    <int> <chr>                    <dbl>               <dbl>
##  1  2016    36    36001 albany                  293097               35585
##  2  2013    36    36001 albany                  291194               39857
##  3  2016    36    36003 allegany                 42697                7836
##  4  2013    36    36003 allegany                 43635                7296
##  5  2016    36    36005 bronx                  1418238              405516
##  6  2013    36    36005 bronx                  1381104              423904
##  7  2016    36    36007 broome                  184887               30417
##  8  2013    36    36007 broome                  187458               33205
##  9  2016    36    36009 cattara…                 75207               11014
## 10  2013    36    36009 cattara…                 76451               14442
## # ... with 115 more rows, and 38 more variables: `All Ages in Poverty
## #   Count LB 90%` <dbl>, `All Ages in Poverty Count UB 90%` <dbl>, `90%
## #   Confidence Interval (All Ages in Poverty Count)` <chr>, `All Ages in
## #   Poverty Percent` <dbl>, `All Ages in Poverty Percent LB 90%` <dbl>,
## #   `All Ages in Poverty Percent UB 90%` <dbl>, `90% Confidence Interval
## #   (All Ages in Poverty Percent)` <chr>, `Under Age 18 SAIPE Poverty
## #   Universe` <dbl>, `Under Age 18 in Poverty Count` <dbl>, `Under Age 18
## #   in Poverty Count LB 90%` <dbl>, `Under Age 18 in Poverty Count UB
## #   90%` <dbl>, `90% Confidence Interval (Under Age 18 in Poverty
## #   Count)` <chr>, `Under Age 18 in Poverty Percent` <dbl>, `Under Age 18
## #   in Poverty Percent LB 90%` <dbl>, `Under Age 18 in Poverty Percent UB
## #   90%` <dbl>, `90% Confidence Interval (Under Age 18 in Poverty
## #   Percent)` <chr>, `Ages 5 to 17 in Families SAIPE Poverty
## #   Universe` <dbl>, `Ages 5 to 17 in Families in Poverty Count` <dbl>,
## #   `Ages 5 to 17 in Families in Poverty Count LB 90%` <dbl>, `Ages 5 to
## #   17 in Families in Poverty Count UB 90%` <dbl>, `90% Confidence
## #   Interval (Ages 5 to 17 in Families in Poverty Count)` <chr>, `Ages 5
## #   to 17 in Families in Poverty Percent` <dbl>, `Ages 5 to 17 in Families
## #   in Poverty Percent LB 90%` <dbl>, `Ages 5 to 17 in Families in Poverty
## #   Percent UB 90%` <dbl>, `90% Confidence Interval (Ages 5 to 17 in
## #   Families in Poverty Percent)` <chr>, `Under Age 5 SAIPE Poverty
## #   Universe` <chr>, `Under Age 5 in Poverty Count` <chr>, `Under Age 5 in
## #   Poverty Count LB 90%` <chr>, `Under Age 5 in Poverty Count UB
## #   90%` <chr>, `90% Confidence Interval (Under Age 5 in Poverty
## #   Count)` <chr>, `Under Age 5 in Poverty Percent` <chr>, `Under Age 5 in
## #   Poverty Percent LB 90%` <chr>, `Under Age 5 in Poverty Percent UB
## #   90%` <chr>, `90% Confidence Interval (Under Age 5 in Poverty
## #   Percent)` <chr>, `Median Household Income in Dollars` <chr>, `Median
## #   Household Income in Dollars LB 90%` <chr>, `Median Household Income in
## #   Dollars UB 90%` <chr>, `90% Confidence Interval (Median Household
## #   Income in Dollars)` <chr>

The CSV contains several variables, however for this analysis we will look at only poverty rate and median incomes for each county.

Also, we have values for 2013 and 2016 only. So we will linearly impute the middle values of 2014 and 2015 as equally distant from 2013 and 2016:

# 2013 values
pov13 <- rawPoverty %>% filter(year == 2013) %>% select(county,pct = `All Ages in Poverty Percent`, inc = `Median Household Income in Dollars`)

# 2016 values
pov16 <- rawPoverty %>% filter(year == 2016) %>% select(county,pct = `All Ages in Poverty Percent`, inc = `Median Household Income in Dollars`)

# Fix income values by removing $ and ,
pov13$inc <- as.numeric(str_replace_all(pov13$inc,"\\$|,",""))
pov16$inc <- as.numeric(str_replace_all(pov16$inc,"\\$|,",""))

# Combine our data frames
poverty <- inner_join(pov13, pov16, by=c("county" = "county"),
                       suffix = c("_2013","_2016"))

# Compute changes and impute interim values
poverty <- poverty %>%
  mutate(povChg = pct_2016 - pct_2013, incChg = inc_2016 - inc_2013,
         povIncrement = povChg / 3, incIncrement = incChg / 3,
         pct_2014 = pct_2013 + povIncrement, pct_2015 = pct_2016 - povIncrement,
         inc_2014 = inc_2013 + incIncrement, inc_2015 = inc_2016 - incIncrement)

# Tidy the data
poverty <- poverty %>% 
  gather(key="year",
         value="value",
         pct_2013,pct_2014,pct_2015,pct_2016,
         inc_2013,inc_2014,inc_2015,inc_2016)

poverty <- poverty %>% separate(year,c("measure","yr"),"_")

poverty <- poverty %>% spread(key="measure",value="value") %>% select(county, year = yr, income = inc, poverty = pct)

poverty$year <- as.numeric(poverty$year)

# Adjust income to 1,000's scale
poverty$income <- round(poverty$income/1000,2)

poverty
## # A tibble: 248 x 4
##    county       year income poverty
##    <chr>       <dbl>  <dbl>   <dbl>
##  1 albany       2013   55.8    13.7
##  2 allegany     2013   41.8    16.7
##  3 bronx        2013   33.1    30.7
##  4 broome       2013   45.1    17.7
##  5 cattaraugus  2013   40.9    18.9
##  6 cayuga       2013   49.0    14.2
##  7 chautauqua   2013   40.5    19.1
##  8 chemung      2013   45.3    17  
##  9 chenango     2013   44.3    16.8
## 10 clinton      2013   45.9    15.7
## # ... with 238 more rows

Now our data is tidy.

Visualization

Now that we have all three data sets loaded, let’s look at them and see what we patterns we can easily detect.

First, we look at our variable of interest, overdose deaths, and plot them on a map of New York State:

## Breaking out data into bins
breaks <- c(0, seq(25,250,by=25))

opioids %>% filter(year == "2013") %>%
  NYMap("deaths","Opioid Overdose Deaths","2013",
        "Deaths\nPer 100,000", breaks)

opioids %>% filter(year == "2014") %>%
  NYMap("deaths","Opioid Overdose Deaths","2014",
        "Deaths\nPer 100,000", breaks)

opioids %>% filter(year == "2015") %>%
  NYMap("deaths","Opioid Overdose Deaths","2015",
        "Deaths\nPer 100,000", breaks)

Looking at the maps, there are a few counties with a larger number of overdose deaths than others. Specifically, Erie, Suffolk, and Nassau seem to be some of the worst areas.

Let’s do the same for our unemployment data:

breaks <- c(1, seq(2,12,by=1))

unemployment %>% filter(year == "2013") %>%
  NYMap("meanRate","Unemployment Rate","2013",
        "Rate", breaks)

unemployment %>% filter(year == "2014") %>%
  NYMap("meanRate","Unemployment Rate","2014",
        "Rate", breaks)

unemployment %>% filter(year == "2015") %>%
  NYMap("meanRate","Unemployment Rate","2015",
        "Rate", breaks)

Surprisingly, the unemployment data in some of the worst counties for opioid deaths isn’t very bad. In fact, it seems to get better from 2013 to 2015.

How about the poverty rate? Let’s map those as well:

breaks <- c(5, seq(10,35,by=5))

poverty %>% filter(year == "2013") %>%
  NYMap("poverty","Poverty Rate","2013",
        "Rate", breaks)

poverty %>% filter(year == "2014") %>%
  NYMap("poverty","Poverty Rate","2014",
        "Rate", breaks)

poverty %>% filter(year == "2015") %>%
  NYMap("poverty","Poverty Rate","2015",
        "Rate", breaks)

The poverty rate also appears to be low in counties where opioid deaths are high. We can also look at the median incomes of each county:

breaks <- c(30, seq(40,110,by=10))

poverty %>% filter(year == "2013") %>%
  NYMap("income","Median Income","2013",
        "Thousands of Dollars", breaks)

poverty %>% filter(year == "2014") %>%
  NYMap("income","Median Income","2014",
        "Thousands of Dollars", breaks)

poverty %>% filter(year == "2015") %>%
  NYMap("income","Median Income","2015",
        "Thousands of Dollars", breaks)

We see rather high median incomes in some of the problematic counties, which we mostly expected from the poverty levels above.

Analysis

To support our analysis, we will join our data frames so we have all the variables we will be using in a single data frame.

joint <- inner_join(opioids,unemployment,
                    by=c("county" = "county","year" = "year")) %>%
  inner_join(poverty, by=c("county" = "county","year" = "year"))

Let’s start by plotting deaths versus unemployment rates.

joint %>%
  ggplot(aes(x=meanRate, y=deaths, col=as.factor(year))) +
  geom_point() + 
  facet_wrap(~ year, nrow = 3) + scale_color_discrete("Year") +
  ylab("Opioid Deaths") + xlab("Unemployment Rate")

There doesn’t appear to be much of a linear relationship here. In fact, the variables do not seem to be very correlated:

cor(joint$deaths, joint$meanRate)
## [1] -0.1069796

Next we look at poverty rates:

joint %>%
  ggplot(aes(x=poverty, y=deaths, col=as.factor(year))) +
  geom_point() + 
  facet_wrap(~ year, nrow = 3) + scale_color_discrete("Year") +
  ylab("Opioid Deaths") + xlab("Poverty Rate")

The poverty variable seems to have little in the way of a linear relationship with the number of opioid deaths as well.

cor(joint$deaths, joint$poverty)
## [1] 0.008518504

Finally, we explore median income:

joint %>%
  ggplot(aes(x=income, y=deaths, col=as.factor(year))) +
  geom_point() + 
  facet_wrap(~ year, nrow = 3) + scale_color_discrete("Year") +
  ylab("Opioid Deaths") + xlab("Median Income (1,000's $)")

Strangely, income too appears to have somewhat of a relationship to the number of overdose deaths.

cor(joint$deaths, joint$income)
## [1] 0.4575836

Analysis

Let’s dig deeper into the income variable and see how it relates to the overdose deaths. Let’s take 2013 first:

joint %>%
  filter(year == "2013") %>%
  ggplot(aes(x=income, y=deaths, col=as.factor(year))) +
  geom_point() + 
  scale_color_discrete("Year") +
  ylab("Opioid Deaths") + xlab("Median Income (1,000's $)")

Let’s try to fit a linear regression line:

mod_income13 <- lm(deaths ~ income, data = joint[which(joint$year==2013),])
summary(mod_income13)
## 
## Call:
## lm(formula = deaths ~ income, data = joint[which(joint$year == 
##     2013), ])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -77.783 -16.162  -8.252  -0.015 130.679 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -59.3863    19.3503  -3.069  0.00322 ** 
## income        1.6212     0.3576   4.534 2.82e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 35.96 on 60 degrees of freedom
## Multiple R-squared:  0.2552, Adjusted R-squared:  0.2428 
## F-statistic: 20.56 on 1 and 60 DF,  p-value: 2.821e-05

There is some linear relationship:

joint %>%
  filter(year == "2013") %>%
  ggplot(aes(x=income, y=deaths, col=as.factor(year))) +
  geom_point() + 
  scale_color_discrete("Year") +
  ylab("Opioid Deaths") + xlab("Median Income (1,000's $)") +
  geom_line(aes(y=fitted(mod_income13)),linetype = 3)

However, what does that mean in context? Why would an increase in income be associated with somewhat of a mild increase in opioid deaths? Apparently the Median Income variable is acting here as a surrogate for something else.

Works Cited


  1. https://www.health.ny.gov/statistics/opioid/data/d2.htm

  2. https://data.ny.gov/Economic-Development/Local-Area-Unemployment-Statistics-Beginning-1976/5hyu-bdh8

  3. https://www.census.gov/data-tools/demo/saipe/saipe.html?s_appName=saipe&map_yearSelector=2013&map_geoSelector=aa_c&s_state=36&s_year=2016,2013