The goal of this project is to look at rates of opioid overdoses within New York state and compare to a variety of socioeconomic factors to see if there is a correlation between them.
The opioid crisis is a well-documented public health crisis. The rate of overdose deaths due to opioids (e.g. oxycodone, heroin, etc.) has been on the rise for years and local, state, and federal governments have struggled to find solutions.
Data for this analysis comes in three parts: data on overdose deaths, unemployment data, and poverty and income.
New York State tracks opioid overdose deaths by county on their website1. To get this data we need to scrape it from the site.
library(rvest)
rawHTML <- read_html("https://www.health.ny.gov/statistics/opioid/data/d2.htm")
Once we have the raw HTML, we can then parse it to get our raw data:
# Get the table from the HTML
rawTable <- html_table(rawHTML)[[1]]
### Fix up the data frame ###
# Fix column names
colnames(rawTable) <- c("county","deaths_2013","deaths_2014",
"deaths_2015","total_deaths","avgPop",
"rate","adjRate")
# Remove region titles and region totals
opioids <- rawTable[-grep("[[:space:]]?Reg",rawTable$county),]
Next we tidy the data:
# Put years into rows
opioids <- gather(opioids, key="year", value = "deaths",
deaths_2013, deaths_2014, deaths_2015)
opioids$year <- as.numeric(str_extract(opioids$year,"[0-9]+"))
# Standardize our counties
opioids$county <- tolower(opioids$county)
opioids$county <- str_replace(opioids$county,"\\.","")
# Fix column types
opioids$total_deaths <- as.numeric(str_replace_all(opioids$total_deaths,"(\\*|,)",""))
opioids$deaths <- as.numeric(str_replace_all(opioids$deaths,"(\\*|,)",""))
opioids$avgPop <- as.numeric(str_replace_all(opioids$avgPop,"(\\*|,)",""))
opioids$rate <- as.numeric(str_replace_all(opioids$rate,"(\\*|,)",""))
opioids$adjRate <- as.numeric(str_replace_all(opioids$adjRate,"(\\*|,)",""))
opioids
## county total_deaths avgPop rate adjRate year deaths
## 1 nassau 446 1357374 11.0 11.5 2013 131
## 2 suffolk 618 1501431 13.7 14.3 2013 209
## 3 bronx 382 1437445 8.9 8.8 2013 105
## 4 kings 486 2616892 6.2 6.1 2013 126
## 5 new york 313 1635648 6.4 5.8 2013 94
## 6 queens 351 2318968 5.0 4.7 2013 120
## 7 richmond 199 473486 14.0 14.1 2013 61
## 8 dutchess 146 296416 16.4 17.1 2013 52
## 9 orange 170 376446 15.1 15.8 2013 46
## 10 putnam 34 99391 11.4 12.4 2013 12
## 11 rockland 66 323602 6.8 7.6 2013 23
## 12 sullivan 56 75828 24.6 26.7 2013 17
## 13 ulster 71 180529 13.1 13.9 2013 25
## 14 westchester 223 972611 7.6 7.6 2013 71
## 15 albany 77 308166 8.3 8.8 2013 24
## 16 columbia 22 61958 11.8 11.5 2013 7
## 17 greene 22 48016 15.3 16.6 2013 5
## 18 rensselaer 36 159986 7.5 8.1 2013 12
## 19 saratoga 49 225012 7.3 7.7 2013 12
## 20 schenectady 28 155224 6.0 5.6 2013 11
## 21 fulton 6 54228 3.7 3.8 2013 3
## 22 herkimer 17 63675 8.9 10.7 2013 7
## 23 montgomery 11 49773 7.4 8.9 2013 3
## 24 otsego 13 61149 7.1 8.3 2013 3
## 25 schoharie 4 31580 4.2 4.1 2013 2
## 26 clinton 20 81491 8.2 6.9 2013 5
## 27 essex 14 38640 12.1 13.4 2013 4
## 28 franklin 7 51203 4.6 4.7 2013 1
## 29 hamilton 1 4733 7.0 2.8 2013 0
## 30 warren 7 64999 3.6 3.1 2013 0
## 31 washington 5 62565 2.7 2.6 2013 1
## 32 jefferson 34 118747 9.5 9.8 2013 12
## 33 lewis 5 27109 6.1 6.6 2013 3
## 34 st lawrence 31 111457 9.3 10.2 2013 9
## 35 cayuga 32 78863 13.5 14.1 2013 12
## 36 cortland 12 48831 8.2 8.9 2013 0
## 37 madison 18 72200 8.3 9.8 2013 4
## 38 oneida 88 232985 12.6 13.4 2013 17
## 39 onondaga 174 468349 12.4 12.6 2013 47
## 40 oswego 48 120741 13.3 13.8 2013 18
## 41 broome 77 197150 13.0 14.1 2013 22
## 42 chenango 17 49258 11.5 13.3 2013 4
## 43 delaware 9 46452 6.5 8.4 2013 1
## 44 tioga 18 49855 12.0 13.6 2013 4
## 45 tompkins 32 104411 10.2 11.5 2013 7
## 46 chemung 26 87782 9.9 10.6 2013 9
## 47 livingston 9 64669 4.6 5.3 2013 3
## 48 monroe 237 749688 10.5 10.4 2013 68
## 49 ontario 25 109457 7.6 8.1 2013 9
## 50 schuyler 4 18375 7.3 8.2 2013 3
## 51 seneca 7 35042 6.7 7.7 2013 2
## 52 steuben 15 98225 5.1 5.1 2013 5
## 53 wayne 8 91990 2.9 2.8 2013 2
## 54 yates 5 25137 6.6 6.6 2013 1
## 55 allegany 4 47769 2.8 3.6 2013 1
## 56 cattaraugus 15 78471 6.4 7.1 2013 3
## 57 chautauqua 39 131971 9.9 10.9 2013 11
## 58 erie 444 921760 16.1 16.5 2013 94
## 59 genesee 13 59184 7.3 7.3 2013 2
## 60 niagara 94 213475 14.7 15.9 2013 26
## 61 orleans 10 41934 7.9 8.4 2013 5
## 62 wyoming 11 41244 8.9 10.0 2013 8
## 63 new york state 5461 19731048 9.2 9.0 2013 1604
## 64 nassau 446 1357374 11.0 11.5 2014 143
## 65 suffolk 618 1501431 13.7 14.3 2014 199
## 66 bronx 382 1437445 8.9 8.8 2014 102
## 67 kings 486 2616892 6.2 6.1 2014 163
## 68 new york 313 1635648 6.4 5.8 2014 106
## 69 queens 351 2318968 5.0 4.7 2014 113
## 70 richmond 199 473486 14.0 14.1 2014 73
## 71 dutchess 146 296416 16.4 17.1 2014 39
## 72 orange 170 376446 15.1 15.8 2014 58
## 73 putnam 34 99391 11.4 12.4 2014 9
## 74 rockland 66 323602 6.8 7.6 2014 16
## 75 sullivan 56 75828 24.6 26.7 2014 20
## 76 ulster 71 180529 13.1 13.9 2014 19
## 77 westchester 223 972611 7.6 7.6 2014 67
## 78 albany 77 308166 8.3 8.8 2014 23
## 79 columbia 22 61958 11.8 11.5 2014 9
## 80 greene 22 48016 15.3 16.6 2014 9
## 81 rensselaer 36 159986 7.5 8.1 2014 7
## 82 saratoga 49 225012 7.3 7.7 2014 19
## 83 schenectady 28 155224 6.0 5.6 2014 5
## 84 fulton 6 54228 3.7 3.8 2014 1
## 85 herkimer 17 63675 8.9 10.7 2014 4
## 86 montgomery 11 49773 7.4 8.9 2014 3
## 87 otsego 13 61149 7.1 8.3 2014 3
## 88 schoharie 4 31580 4.2 4.1 2014 0
## 89 clinton 20 81491 8.2 6.9 2014 6
## 90 essex 14 38640 12.1 13.4 2014 3
## 91 franklin 7 51203 4.6 4.7 2014 2
## 92 hamilton 1 4733 7.0 2.8 2014 1
## 93 warren 7 64999 3.6 3.1 2014 3
## 94 washington 5 62565 2.7 2.6 2014 3
## 95 jefferson 34 118747 9.5 9.8 2014 13
## 96 lewis 5 27109 6.1 6.6 2014 1
## 97 st lawrence 31 111457 9.3 10.2 2014 10
## 98 cayuga 32 78863 13.5 14.1 2014 5
## 99 cortland 12 48831 8.2 8.9 2014 5
## 100 madison 18 72200 8.3 9.8 2014 8
## 101 oneida 88 232985 12.6 13.4 2014 36
## 102 onondaga 174 468349 12.4 12.6 2014 56
## 103 oswego 48 120741 13.3 13.8 2014 15
## 104 broome 77 197150 13.0 14.1 2014 27
## 105 chenango 17 49258 11.5 13.3 2014 7
## 106 delaware 9 46452 6.5 8.4 2014 2
## 107 tioga 18 49855 12.0 13.6 2014 7
## 108 tompkins 32 104411 10.2 11.5 2014 14
## 109 chemung 26 87782 9.9 10.6 2014 7
## 110 livingston 9 64669 4.6 5.3 2014 2
## 111 monroe 237 749688 10.5 10.4 2014 88
## 112 ontario 25 109457 7.6 8.1 2014 10
## 113 schuyler 4 18375 7.3 8.2 2014 0
## 114 seneca 7 35042 6.7 7.7 2014 1
## 115 steuben 15 98225 5.1 5.1 2014 6
## 116 wayne 8 91990 2.9 2.8 2014 3
## 117 yates 5 25137 6.6 6.6 2014 2
## 118 allegany 4 47769 2.8 3.6 2014 1
## 119 cattaraugus 15 78471 6.4 7.1 2014 1
## 120 chautauqua 39 131971 9.9 10.9 2014 14
## 121 erie 444 921760 16.1 16.5 2014 112
## 122 genesee 13 59184 7.3 7.3 2014 2
## 123 niagara 94 213475 14.7 15.9 2014 25
## 124 orleans 10 41934 7.9 8.4 2014 2
## 125 wyoming 11 41244 8.9 10.0 2014 0
## 126 new york state 5461 19731048 9.2 9.0 2014 1710
## 127 nassau 446 1357374 11.0 11.5 2015 172
## 128 suffolk 618 1501431 13.7 14.3 2015 210
## 129 bronx 382 1437445 8.9 8.8 2015 175
## 130 kings 486 2616892 6.2 6.1 2015 197
## 131 new york 313 1635648 6.4 5.8 2015 113
## 132 queens 351 2318968 5.0 4.7 2015 118
## 133 richmond 199 473486 14.0 14.1 2015 65
## 134 dutchess 146 296416 16.4 17.1 2015 55
## 135 orange 170 376446 15.1 15.8 2015 66
## 136 putnam 34 99391 11.4 12.4 2015 13
## 137 rockland 66 323602 6.8 7.6 2015 27
## 138 sullivan 56 75828 24.6 26.7 2015 19
## 139 ulster 71 180529 13.1 13.9 2015 27
## 140 westchester 223 972611 7.6 7.6 2015 85
## 141 albany 77 308166 8.3 8.8 2015 30
## 142 columbia 22 61958 11.8 11.5 2015 6
## 143 greene 22 48016 15.3 16.6 2015 8
## 144 rensselaer 36 159986 7.5 8.1 2015 17
## 145 saratoga 49 225012 7.3 7.7 2015 18
## 146 schenectady 28 155224 6.0 5.6 2015 12
## 147 fulton 6 54228 3.7 3.8 2015 2
## 148 herkimer 17 63675 8.9 10.7 2015 6
## 149 montgomery 11 49773 7.4 8.9 2015 5
## 150 otsego 13 61149 7.1 8.3 2015 7
## 151 schoharie 4 31580 4.2 4.1 2015 2
## 152 clinton 20 81491 8.2 6.9 2015 9
## 153 essex 14 38640 12.1 13.4 2015 7
## 154 franklin 7 51203 4.6 4.7 2015 4
## 155 hamilton 1 4733 7.0 2.8 2015 0
## 156 warren 7 64999 3.6 3.1 2015 4
## 157 washington 5 62565 2.7 2.6 2015 1
## 158 jefferson 34 118747 9.5 9.8 2015 9
## 159 lewis 5 27109 6.1 6.6 2015 1
## 160 st lawrence 31 111457 9.3 10.2 2015 12
## 161 cayuga 32 78863 13.5 14.1 2015 15
## 162 cortland 12 48831 8.2 8.9 2015 7
## 163 madison 18 72200 8.3 9.8 2015 6
## 164 oneida 88 232985 12.6 13.4 2015 35
## 165 onondaga 174 468349 12.4 12.6 2015 71
## 166 oswego 48 120741 13.3 13.8 2015 15
## 167 broome 77 197150 13.0 14.1 2015 28
## 168 chenango 17 49258 11.5 13.3 2015 6
## 169 delaware 9 46452 6.5 8.4 2015 6
## 170 tioga 18 49855 12.0 13.6 2015 7
## 171 tompkins 32 104411 10.2 11.5 2015 11
## 172 chemung 26 87782 9.9 10.6 2015 10
## 173 livingston 9 64669 4.6 5.3 2015 4
## 174 monroe 237 749688 10.5 10.4 2015 81
## 175 ontario 25 109457 7.6 8.1 2015 6
## 176 schuyler 4 18375 7.3 8.2 2015 1
## 177 seneca 7 35042 6.7 7.7 2015 4
## 178 steuben 15 98225 5.1 5.1 2015 4
## 179 wayne 8 91990 2.9 2.8 2015 3
## 180 yates 5 25137 6.6 6.6 2015 2
## 181 allegany 4 47769 2.8 3.6 2015 2
## 182 cattaraugus 15 78471 6.4 7.1 2015 11
## 183 chautauqua 39 131971 9.9 10.9 2015 14
## 184 erie 444 921760 16.1 16.5 2015 238
## 185 genesee 13 59184 7.3 7.3 2015 9
## 186 niagara 94 213475 14.7 15.9 2015 43
## 187 orleans 10 41934 7.9 8.4 2015 3
## 188 wyoming 11 41244 8.9 10.0 2015 3
## 189 new york state 5461 19731048 9.2 9.0 2015 2147
Next we turn to unemployment rates. We can also get these data from NYS 2. This time, the data is in a much easier to retrieve CSV format:
unemployment <- read_csv("NY_unemployment.csv")
unemployment
## # A tibble: 186 x 6
## year county meanRate meanLabor meanEmployed meanUnemployed
## <int> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 2013 albany 6.06 160808. 151075 9750
## 2 2013 allegany 7.5 23808. 22000 1783.
## 3 2013 bronx 11.8 603450 532425 71050
## 4 2013 broome 7.76 91925 84767. 7158.
## 5 2013 cattaraugus 8.51 38108. 34850 3250
## 6 2013 cayuga 7.37 38858. 35992. 2867.
## 7 2013 chautauqua 8.02 60233. 55400 4825
## 8 2013 chemung 7.88 39600 36475 3133.
## 9 2013 chenango 7.32 24100 22333. 1758.
## 10 2013 clinton 8.31 37167. 34083. 3083.
## # ... with 176 more rows
This data is already in a tidy format, so we don’t need to do anything to it.
Finally, we gather our poverty and income data from the Census Bureau3. This data is also in a CSV format, which makes importing a bit easier:
rawPoverty <- read_csv(url("https://raw.githubusercontent.com/lysanthus/Data607/master/Final/poverty.csv"))
rawPoverty$county <- tolower(rawPoverty$county)
rawPoverty
## # A tibble: 125 x 44
## year state countyID county `All Ages SAIPE Pove… `All Ages in Pover…
## <int> <int> <int> <chr> <dbl> <dbl>
## 1 2016 36 36001 albany 293097 35585
## 2 2013 36 36001 albany 291194 39857
## 3 2016 36 36003 allegany 42697 7836
## 4 2013 36 36003 allegany 43635 7296
## 5 2016 36 36005 bronx 1418238 405516
## 6 2013 36 36005 bronx 1381104 423904
## 7 2016 36 36007 broome 184887 30417
## 8 2013 36 36007 broome 187458 33205
## 9 2016 36 36009 cattara… 75207 11014
## 10 2013 36 36009 cattara… 76451 14442
## # ... with 115 more rows, and 38 more variables: `All Ages in Poverty
## # Count LB 90%` <dbl>, `All Ages in Poverty Count UB 90%` <dbl>, `90%
## # Confidence Interval (All Ages in Poverty Count)` <chr>, `All Ages in
## # Poverty Percent` <dbl>, `All Ages in Poverty Percent LB 90%` <dbl>,
## # `All Ages in Poverty Percent UB 90%` <dbl>, `90% Confidence Interval
## # (All Ages in Poverty Percent)` <chr>, `Under Age 18 SAIPE Poverty
## # Universe` <dbl>, `Under Age 18 in Poverty Count` <dbl>, `Under Age 18
## # in Poverty Count LB 90%` <dbl>, `Under Age 18 in Poverty Count UB
## # 90%` <dbl>, `90% Confidence Interval (Under Age 18 in Poverty
## # Count)` <chr>, `Under Age 18 in Poverty Percent` <dbl>, `Under Age 18
## # in Poverty Percent LB 90%` <dbl>, `Under Age 18 in Poverty Percent UB
## # 90%` <dbl>, `90% Confidence Interval (Under Age 18 in Poverty
## # Percent)` <chr>, `Ages 5 to 17 in Families SAIPE Poverty
## # Universe` <dbl>, `Ages 5 to 17 in Families in Poverty Count` <dbl>,
## # `Ages 5 to 17 in Families in Poverty Count LB 90%` <dbl>, `Ages 5 to
## # 17 in Families in Poverty Count UB 90%` <dbl>, `90% Confidence
## # Interval (Ages 5 to 17 in Families in Poverty Count)` <chr>, `Ages 5
## # to 17 in Families in Poverty Percent` <dbl>, `Ages 5 to 17 in Families
## # in Poverty Percent LB 90%` <dbl>, `Ages 5 to 17 in Families in Poverty
## # Percent UB 90%` <dbl>, `90% Confidence Interval (Ages 5 to 17 in
## # Families in Poverty Percent)` <chr>, `Under Age 5 SAIPE Poverty
## # Universe` <chr>, `Under Age 5 in Poverty Count` <chr>, `Under Age 5 in
## # Poverty Count LB 90%` <chr>, `Under Age 5 in Poverty Count UB
## # 90%` <chr>, `90% Confidence Interval (Under Age 5 in Poverty
## # Count)` <chr>, `Under Age 5 in Poverty Percent` <chr>, `Under Age 5 in
## # Poverty Percent LB 90%` <chr>, `Under Age 5 in Poverty Percent UB
## # 90%` <chr>, `90% Confidence Interval (Under Age 5 in Poverty
## # Percent)` <chr>, `Median Household Income in Dollars` <chr>, `Median
## # Household Income in Dollars LB 90%` <chr>, `Median Household Income in
## # Dollars UB 90%` <chr>, `90% Confidence Interval (Median Household
## # Income in Dollars)` <chr>
The CSV contains several variables, however for this analysis we will look at only poverty rate and median incomes for each county.
Also, we have values for 2013 and 2016 only. So we will linearly impute the middle values of 2014 and 2015 as equally distant from 2013 and 2016:
# 2013 values
pov13 <- rawPoverty %>% filter(year == 2013) %>% select(county,pct = `All Ages in Poverty Percent`, inc = `Median Household Income in Dollars`)
# 2016 values
pov16 <- rawPoverty %>% filter(year == 2016) %>% select(county,pct = `All Ages in Poverty Percent`, inc = `Median Household Income in Dollars`)
# Fix income values by removing $ and ,
pov13$inc <- as.numeric(str_replace_all(pov13$inc,"\\$|,",""))
pov16$inc <- as.numeric(str_replace_all(pov16$inc,"\\$|,",""))
# Combine our data frames
poverty <- inner_join(pov13, pov16, by=c("county" = "county"),
suffix = c("_2013","_2016"))
# Compute changes and impute interim values
poverty <- poverty %>%
mutate(povChg = pct_2016 - pct_2013, incChg = inc_2016 - inc_2013,
povIncrement = povChg / 3, incIncrement = incChg / 3,
pct_2014 = pct_2013 + povIncrement, pct_2015 = pct_2016 - povIncrement,
inc_2014 = inc_2013 + incIncrement, inc_2015 = inc_2016 - incIncrement)
# Tidy the data
poverty <- poverty %>%
gather(key="year",
value="value",
pct_2013,pct_2014,pct_2015,pct_2016,
inc_2013,inc_2014,inc_2015,inc_2016)
poverty <- poverty %>% separate(year,c("measure","yr"),"_")
poverty <- poverty %>% spread(key="measure",value="value") %>% select(county, year = yr, income = inc, poverty = pct)
poverty$year <- as.numeric(poverty$year)
# Adjust income to 1,000's scale
poverty$income <- round(poverty$income/1000,2)
poverty
## # A tibble: 248 x 4
## county year income poverty
## <chr> <dbl> <dbl> <dbl>
## 1 albany 2013 55.8 13.7
## 2 allegany 2013 41.8 16.7
## 3 bronx 2013 33.1 30.7
## 4 broome 2013 45.1 17.7
## 5 cattaraugus 2013 40.9 18.9
## 6 cayuga 2013 49.0 14.2
## 7 chautauqua 2013 40.5 19.1
## 8 chemung 2013 45.3 17
## 9 chenango 2013 44.3 16.8
## 10 clinton 2013 45.9 15.7
## # ... with 238 more rows
Now our data is tidy.
Now that we have all three data sets loaded, let’s look at them and see what we patterns we can easily detect.
First, we look at our variable of interest, overdose deaths, and plot them on a map of New York State:
## Breaking out data into bins
breaks <- c(0, seq(25,250,by=25))
opioids %>% filter(year == "2013") %>%
NYMap("deaths","Opioid Overdose Deaths","2013",
"Deaths\nPer 100,000", breaks)
opioids %>% filter(year == "2014") %>%
NYMap("deaths","Opioid Overdose Deaths","2014",
"Deaths\nPer 100,000", breaks)
opioids %>% filter(year == "2015") %>%
NYMap("deaths","Opioid Overdose Deaths","2015",
"Deaths\nPer 100,000", breaks)
Looking at the maps, there are a few counties with a larger number of overdose deaths than others. Specifically, Erie, Suffolk, and Nassau seem to be some of the worst areas.
Let’s do the same for our unemployment data:
breaks <- c(1, seq(2,12,by=1))
unemployment %>% filter(year == "2013") %>%
NYMap("meanRate","Unemployment Rate","2013",
"Rate", breaks)
unemployment %>% filter(year == "2014") %>%
NYMap("meanRate","Unemployment Rate","2014",
"Rate", breaks)
unemployment %>% filter(year == "2015") %>%
NYMap("meanRate","Unemployment Rate","2015",
"Rate", breaks)
Surprisingly, the unemployment data in some of the worst counties for opioid deaths isn’t very bad. In fact, it seems to get better from 2013 to 2015.
How about the poverty rate? Let’s map those as well:
breaks <- c(5, seq(10,35,by=5))
poverty %>% filter(year == "2013") %>%
NYMap("poverty","Poverty Rate","2013",
"Rate", breaks)
poverty %>% filter(year == "2014") %>%
NYMap("poverty","Poverty Rate","2014",
"Rate", breaks)
poverty %>% filter(year == "2015") %>%
NYMap("poverty","Poverty Rate","2015",
"Rate", breaks)
The poverty rate also appears to be low in counties where opioid deaths are high. We can also look at the median incomes of each county:
breaks <- c(30, seq(40,110,by=10))
poverty %>% filter(year == "2013") %>%
NYMap("income","Median Income","2013",
"Thousands of Dollars", breaks)
poverty %>% filter(year == "2014") %>%
NYMap("income","Median Income","2014",
"Thousands of Dollars", breaks)
poverty %>% filter(year == "2015") %>%
NYMap("income","Median Income","2015",
"Thousands of Dollars", breaks)
We see rather high median incomes in some of the problematic counties, which we mostly expected from the poverty levels above.
To support our analysis, we will join our data frames so we have all the variables we will be using in a single data frame.
joint <- inner_join(opioids,unemployment,
by=c("county" = "county","year" = "year")) %>%
inner_join(poverty, by=c("county" = "county","year" = "year"))
Let’s start by plotting deaths versus unemployment rates.
joint %>%
ggplot(aes(x=meanRate, y=deaths, col=as.factor(year))) +
geom_point() +
facet_wrap(~ year, nrow = 3) + scale_color_discrete("Year") +
ylab("Opioid Deaths") + xlab("Unemployment Rate")
There doesn’t appear to be much of a linear relationship here. In fact, the variables do not seem to be very correlated:
cor(joint$deaths, joint$meanRate)
## [1] -0.1069796
Next we look at poverty rates:
joint %>%
ggplot(aes(x=poverty, y=deaths, col=as.factor(year))) +
geom_point() +
facet_wrap(~ year, nrow = 3) + scale_color_discrete("Year") +
ylab("Opioid Deaths") + xlab("Poverty Rate")
The poverty variable seems to have little in the way of a linear relationship with the number of opioid deaths as well.
cor(joint$deaths, joint$poverty)
## [1] 0.008518504
Finally, we explore median income:
joint %>%
ggplot(aes(x=income, y=deaths, col=as.factor(year))) +
geom_point() +
facet_wrap(~ year, nrow = 3) + scale_color_discrete("Year") +
ylab("Opioid Deaths") + xlab("Median Income (1,000's $)")
Strangely, income too appears to have somewhat of a relationship to the number of overdose deaths.
cor(joint$deaths, joint$income)
## [1] 0.4575836
Let’s dig deeper into the income variable and see how it relates to the overdose deaths. Let’s take 2013 first:
joint %>%
filter(year == "2013") %>%
ggplot(aes(x=income, y=deaths, col=as.factor(year))) +
geom_point() +
scale_color_discrete("Year") +
ylab("Opioid Deaths") + xlab("Median Income (1,000's $)")
Let’s try to fit a linear regression line:
mod_income13 <- lm(deaths ~ income, data = joint[which(joint$year==2013),])
summary(mod_income13)
##
## Call:
## lm(formula = deaths ~ income, data = joint[which(joint$year ==
## 2013), ])
##
## Residuals:
## Min 1Q Median 3Q Max
## -77.783 -16.162 -8.252 -0.015 130.679
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -59.3863 19.3503 -3.069 0.00322 **
## income 1.6212 0.3576 4.534 2.82e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.96 on 60 degrees of freedom
## Multiple R-squared: 0.2552, Adjusted R-squared: 0.2428
## F-statistic: 20.56 on 1 and 60 DF, p-value: 2.821e-05
There is some linear relationship:
joint %>%
filter(year == "2013") %>%
ggplot(aes(x=income, y=deaths, col=as.factor(year))) +
geom_point() +
scale_color_discrete("Year") +
ylab("Opioid Deaths") + xlab("Median Income (1,000's $)") +
geom_line(aes(y=fitted(mod_income13)),linetype = 3)
However, what does that mean in context? Why would an increase in income be associated with somewhat of a mild increase in opioid deaths? Apparently the Median Income variable is acting here as a surrogate for something else.