Data and Programs
library(dplyr)
library(ggplot2)
library(kableExtra)
DPM <- tbl_df(read.csv(file = "Drug_Poisoning_Mortality_by_State__United_States.csv"))
Evaluated data set on drug poisoning deaths in the United States on the state level. Data provided at data.gov
Summary
summary (DPM)
## State Year Sex Age.Group
## United States:1944 Min. :1999 Both Sexes:1566 All Ages:1134
## Alabama : 18 1st Qu.:2003 Female : 648 0–14 : 216
## Alaska : 18 Median :2008 Male : 648 15–24 : 216
## Arizona : 18 Mean :2008 25–34 : 216
## Arkansas : 18 3rd Qu.:2012 35–44 : 216
## California : 18 Max. :2016 45–54 : 216
## (Other) : 828 (Other) : 648
## Race.and.Hispanic.Origin Deaths Population
## All Races-All Origins:1404 Min. : 1.0 Min. : 238163
## Hispanic : 486 1st Qu.: 126.2 1st Qu.: 2730651
## Non-Hispanic Black : 486 Median : 491.0 Median : 6423686
## Non-Hispanic White : 486 Mean : 1966.6 Mean : 16650235
## 3rd Qu.: 1526.8 3rd Qu.: 18165691
## Max. :63632.0 Max. :323127513
##
## Crude.Death.Rate Standard.Error.for.Crude.Rate
## Min. : 0.0389 Min. :0.01428
## 1st Qu.: 3.9705 1st Qu.:0.19991
## Median : 9.0742 Median :0.36146
## Mean :10.6588 Mean :0.43556
## 3rd Qu.:14.8954 3rd Qu.:0.57133
## Max. :68.3122 Max. :2.40780
##
## Lower.Confidence.Limit.for.Crude.Rate
## Min. : 0.008
## 1st Qu.: 3.393
## Median : 8.210
## Mean : 9.823
## 3rd Qu.:13.724
## Max. :66.889
##
## Upper.Confidence.Limit.for.Crude.Rate Age.adjusted.Rate
## Min. : 0.0951 Min. : 1.821
## 1st Qu.: 4.6711 1st Qu.: 7.522
## Median : 9.9497 Median :10.956
## Mean :11.5509 Mean :11.767
## 3rd Qu.:16.0478 3rd Qu.:14.627
## Max. :69.7353 Max. :52.021
## NA's :1728
## Standard.Error.for.Age.adjusted.Rate
## Min. :0.0467
## 1st Qu.:0.2631
## Median :0.4664
## Mean :0.5528
## 3rd Qu.:0.7430
## Max. :2.4066
## NA's :1728
## Lower.Confidence.Limit.for.Age.adjusted.Rate
## Min. : 0.9407
## 1st Qu.: 6.6791
## Median :10.0210
## Mean :10.7010
## 3rd Qu.:13.3355
## Max. :48.5088
## NA's :1728
## Upper.Confidence.Limit.for.Age.adjusted.Rate State.Crude.Rate.in.Range
## Min. : 2.210 :1728
## 1st Qu.: 8.457 10.3–12.6: 188
## Median :12.053 1.8–7.1 : 187
## Mean :12.886 12.6–16 : 187
## 3rd Qu.:15.893 16–52 : 187
## Max. :55.533 7.1–10.2 : 187
## NA's :1728 (Other) : 198
## US.Crude.Rate US.Age.adjusted.Rate
## Min. : 6.038 Min. : 6.057
## 1st Qu.: 8.888 1st Qu.: 8.877
## Median :11.970 Median :11.886
## Mean :11.481 Mean :11.441
## 3rd Qu.:13.267 3rd Qu.:13.185
## Max. :19.692 Max. :19.785
##
Compare states
Data provided is from 1999 to 2016, getting a total count for each state. There are no NA’s but there is United States in the state column. Filter out United States and summarize by state.
Total.States <- DPM %>%
filter(State != "United States") %>%
select(State, Year, Deaths, Population) %>%
group_by(State) %>%
summarise(Total_Deaths = sum(Deaths)) %>%
arrange(desc(Total_Deaths))
find the average population over the corse of time from 1999 to 2016 joined information from first set “Total.States” to current set Find pecentage of total deaths over average population
pct_per_state <- DPM %>%
filter(State != "United States") %>%
select(State, Year, Deaths, Population) %>%
group_by(State) %>%
mutate(Mean_Deaths = mean(Deaths)) %>%
summarise(Mean_Population = mean(Population)) %>%
inner_join(y= Total.States, by = "State") %>%
mutate(pct_population = (Total_Deaths/Mean_Population)*100) %>%
arrange(desc(pct_population))
Select only the rows that have the highest death count per state count unique states is year 2016 and max death year
max_pct <- DPM %>%
filter(State != "United States") %>%
select(State, Year, Deaths) %>%
group_by(State) %>%
mutate(max_death = max(Deaths)) %>%
filter(Deaths == max_death) %>%
ungroup() %>%
mutate(year_max_death = (Year == 2016) & (Deaths == max_death)) %>%
group_by() %>%
summarise(sum(year_max_death)/(length(unique(State))))
lag find the top 50 times the percent of deaths jumped year to year find how many times each state is in the top 50. add column for difference between years remove all NA caused by not having previous years data pecentage of change from year to year arranged in order of highest to lowest
state_top_50 <- DPM %>%
filter(State != "United States") %>%
select(State, Year, Deaths) %>%
group_by(State) %>%
mutate(deaths_py = lag(Deaths)) %>%
filter(!is.na(deaths_py)) %>%
ungroup() %>%
mutate(pct_change = ((Deaths - deaths_py)/Deaths)*100) %>%
arrange(desc(pct_change)) %>%
slice(1:50) %>%
count(State) %>%
arrange(desc(n))
conclution
DC and NC