Data and Programs

library(dplyr)
library(ggplot2)
library(kableExtra)
DPM <- tbl_df(read.csv(file = "Drug_Poisoning_Mortality_by_State__United_States.csv"))

Evaluated data set on drug poisoning deaths in the United States on the state level. Data provided at data.gov

Summary

summary (DPM)
##            State           Year              Sex          Age.Group   
##  United States:1944   Min.   :1999   Both Sexes:1566   All Ages:1134  
##  Alabama      :  18   1st Qu.:2003   Female    : 648   0–14  : 216  
##  Alaska       :  18   Median :2008   Male      : 648   15–24 : 216  
##  Arizona      :  18   Mean   :2008                     25–34 : 216  
##  Arkansas     :  18   3rd Qu.:2012                     35–44 : 216  
##  California   :  18   Max.   :2016                     45–54 : 216  
##  (Other)      : 828                                    (Other) : 648  
##           Race.and.Hispanic.Origin     Deaths          Population       
##  All Races-All Origins:1404        Min.   :    1.0   Min.   :   238163  
##  Hispanic             : 486        1st Qu.:  126.2   1st Qu.:  2730651  
##  Non-Hispanic Black   : 486        Median :  491.0   Median :  6423686  
##  Non-Hispanic White   : 486        Mean   : 1966.6   Mean   : 16650235  
##                                    3rd Qu.: 1526.8   3rd Qu.: 18165691  
##                                    Max.   :63632.0   Max.   :323127513  
##                                                                         
##  Crude.Death.Rate  Standard.Error.for.Crude.Rate
##  Min.   : 0.0389   Min.   :0.01428              
##  1st Qu.: 3.9705   1st Qu.:0.19991              
##  Median : 9.0742   Median :0.36146              
##  Mean   :10.6588   Mean   :0.43556              
##  3rd Qu.:14.8954   3rd Qu.:0.57133              
##  Max.   :68.3122   Max.   :2.40780              
##                                                 
##  Lower.Confidence.Limit.for.Crude.Rate
##  Min.   : 0.008                       
##  1st Qu.: 3.393                       
##  Median : 8.210                       
##  Mean   : 9.823                       
##  3rd Qu.:13.724                       
##  Max.   :66.889                       
##                                       
##  Upper.Confidence.Limit.for.Crude.Rate Age.adjusted.Rate
##  Min.   : 0.0951                       Min.   : 1.821   
##  1st Qu.: 4.6711                       1st Qu.: 7.522   
##  Median : 9.9497                       Median :10.956   
##  Mean   :11.5509                       Mean   :11.767   
##  3rd Qu.:16.0478                       3rd Qu.:14.627   
##  Max.   :69.7353                       Max.   :52.021   
##                                        NA's   :1728     
##  Standard.Error.for.Age.adjusted.Rate
##  Min.   :0.0467                      
##  1st Qu.:0.2631                      
##  Median :0.4664                      
##  Mean   :0.5528                      
##  3rd Qu.:0.7430                      
##  Max.   :2.4066                      
##  NA's   :1728                        
##  Lower.Confidence.Limit.for.Age.adjusted.Rate
##  Min.   : 0.9407                             
##  1st Qu.: 6.6791                             
##  Median :10.0210                             
##  Mean   :10.7010                             
##  3rd Qu.:13.3355                             
##  Max.   :48.5088                             
##  NA's   :1728                                
##  Upper.Confidence.Limit.for.Age.adjusted.Rate State.Crude.Rate.in.Range
##  Min.   : 2.210                                          :1728         
##  1st Qu.: 8.457                               10.3–12.6: 188         
##  Median :12.053                               1.8–7.1  : 187         
##  Mean   :12.886                               12.6–16  : 187         
##  3rd Qu.:15.893                               16–52    : 187         
##  Max.   :55.533                               7.1–10.2 : 187         
##  NA's   :1728                                 (Other)    : 198         
##  US.Crude.Rate    US.Age.adjusted.Rate
##  Min.   : 6.038   Min.   : 6.057      
##  1st Qu.: 8.888   1st Qu.: 8.877      
##  Median :11.970   Median :11.886      
##  Mean   :11.481   Mean   :11.441      
##  3rd Qu.:13.267   3rd Qu.:13.185      
##  Max.   :19.692   Max.   :19.785      
## 

Compare states

Data provided is from 1999 to 2016, getting a total count for each state. There are no NA’s but there is United States in the state column. Filter out United States and summarize by state.

State Death totals

Total.States <- DPM %>%
  filter(State != "United States") %>%
  select(State, Year, Deaths, Population) %>%
  group_by(State) %>%
  summarise(Total_Deaths = sum(Deaths)) %>%
  arrange(desc(Total_Deaths)) 

find the average population over the corse of time from 1999 to 2016 joined information from first set “Total.States” to current set Find pecentage of total deaths over average population

Pecents of total deaths per average state population from 1999 to 2016

pct_per_state <- DPM %>%
  filter(State != "United States") %>%
  select(State, Year, Deaths, Population) %>%
  group_by(State) %>%
  mutate(Mean_Deaths = mean(Deaths)) %>%
  summarise(Mean_Population = mean(Population)) %>%
  inner_join(y= Total.States, by = "State") %>%
  mutate(pct_population = (Total_Deaths/Mean_Population)*100) %>%
  arrange(desc(pct_population))

Select only the rows that have the highest death count per state count unique states is year 2016 and max death year

Pecent of states had their highest deathrate in 2016

max_pct <- DPM %>%
  filter(State != "United States") %>%
  select(State, Year, Deaths) %>%
  group_by(State) %>%
  mutate(max_death = max(Deaths)) %>%
  filter(Deaths == max_death) %>%
  ungroup() %>%
  mutate(year_max_death = (Year == 2016) & (Deaths == max_death)) %>%
  group_by() %>%
  summarise(sum(year_max_death)/(length(unique(State))))

lag find the top 50 times the percent of deaths jumped year to year find how many times each state is in the top 50. add column for difference between years remove all NA caused by not having previous years data pecentage of change from year to year arranged in order of highest to lowest

Top 50 times the percent of deaths jumped year to year

state_top_50 <- DPM %>%
  filter(State != "United States") %>%
  select(State, Year, Deaths) %>%
  group_by(State) %>%
  mutate(deaths_py = lag(Deaths)) %>%
  filter(!is.na(deaths_py)) %>%
  ungroup() %>%
  mutate(pct_change = ((Deaths - deaths_py)/Deaths)*100) %>%
  arrange(desc(pct_change)) %>%
  slice(1:50) %>%
  count(State) %>%
  arrange(desc(n))

conclution

DC and NC