Introduction

This data set provides counts and age-adjusted mortality rates for drug poisoning deaths at the national and state level for the United States. Because of a large number of unresolved pending cases or misclassification of the kind of poisoning, the death rates for some states and years may be underestimated. The data is made available by the National Center for Health Statistics (NCHS), which makes it a reliable source for our visualizations. There are 19 variables and more than 2,000 observations, including both categorical and quantitative variables, as listed below. I concentrated on deaths by age group, year, and state, filtering the data to 10 states and to the relevant age groups. I explain below why I chose these states. I really like how R lets me look through so many years of data: the relationship between drug poisoning deaths, state, and year is easy to see when the change unfolds gradually in front of you. I also wanted to compare deaths by age group in 2000 and 2015 directly, without showing all the changes in between.

According to the National Center for Health Statistics, poisoning is the most common cause of injury-related mortality in the US, and the majority of poisoning deaths are caused by drugs, both legal and illegal. We are taught from a very young age, I believe since 5th grade, the importance of staying away from harmful drugs. There are entire courses and school presentations that raise awareness of substance abuse and addiction. In fact, there are numerous facilities dedicated to the subject, all pointing to the consequences of overdose. I believe this is a very important topic that everyone, in every age group, should be aware of, as the data below will show. It is especially meaningful for me because I know more than one person who in turn knows more than one person, and so on, going through these situations at home or with their loved ones. Everyone likely knows at least one person they care about who has lost their fight against drugs. Others might not be addicted yet, but we all fear that day. Negative stigma around drugs is ingrained in people's minds, and no one realizes how difficult the situation can be until they see someone close to them face it.

Load the libraries and import the data

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(gganimate)
library(RColorBrewer)

# Set the working directory to the folder that contains the CSV file (for example with setwd()) before reading it
drugs <- read_csv("drug poisoning mortality by state NCHS.csv")
## Rows: 2862 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (6): State, Sex, Age Group, Race and Hispanic Origin, State Crude Rate ...
## dbl (13): Year, Deaths, Population, Crude Death Rate, Standard Error for Cru...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Load libraries for the animated ggplot

library(gifski)
library(png)

Metadata

State - categorical variable
Year - categorical variable (read in as a number)
Sex - categorical variable; there is a single category for every state except the United States, where the rows are divided into different categories. The same holds for all the categorical variables of this data set.
Age Group - categorical variable
Race - categorical variable

Deaths - quantitative variable, number of deaths by drug poisoning
Population - quantitative variable, entire population
Crude death rate - quantitative variable, death rate relative to the population
Standard error for crude death rate - quantitative variable, the standard error of the crude death rate
Lower confidence limit for crude death rate - quantitative variable, lower bound of the confidence interval for the crude death rate
Upper confidence limit for crude death rate - quantitative variable, upper bound of the confidence interval for the crude death rate

Age-adjusted rate - quantitative variable, a rate computed with a technique that allows populations to be compared when their age profiles are quite different; it is a summary measure adjusted for differences in age distributions
Standard error for age-adjusted rate - quantitative variable
Lower confidence limit for age-adjusted rate - quantitative variable, lower bound of the confidence interval for the age-adjusted rate
Upper confidence limit for age-adjusted rate - quantitative variable, upper bound of the confidence interval for the age-adjusted rate
State crude rate - categorical variable, the range in which the state crude rate falls
US crude rate - quantitative variable, crude rate of the entire US
US age-adjusted rate - quantitative variable
Unit - categorical variable, units per 100,000 population
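To make the idea of an age-adjusted rate more concrete, here is a minimal sketch of direct age standardization in base R. The three age bands, the age-specific rates, the populations, and the standard population are made-up illustrative values, not NCHS figures, and `weighted_rate()` is a hypothetical helper name.

```r
# Direct age standardization: a weighted average of age-specific rates.
# All numbers below are hypothetical and only illustrate the idea.
weighted_rate <- function(age_specific_rates, population) {
  stopifnot(length(age_specific_rates) == length(population))
  weights <- population / sum(population)  # share of each age band
  sum(age_specific_rates * weights)
}

# Same age-specific death rates (per 100,000) in both places
rates <- c(young = 2, middle = 10, old = 4)

# Different age structures: place A is younger, place B is older
pop_a    <- c(young = 500000, middle = 300000, old = 200000)
pop_b    <- c(young = 200000, middle = 300000, old = 500000)
standard <- c(young = 400000, middle = 400000, old = 200000)

# Crude rates weight by each place's own population, so they differ
crude_a <- weighted_rate(rates, pop_a)  # 4.8
crude_b <- weighted_rate(rates, pop_b)  # 5.4

# Age-adjusted rates weight by the shared standard population, so they agree
adjusted_a <- weighted_rate(rates, standard)  # 5.6
adjusted_b <- weighted_rate(rates, standard)  # 5.6
```

The crude rates differ only because the two places have different age profiles; once both are weighted by the same standard population, the difference disappears.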

Explore the data

head(drugs)
## # A tibble: 6 × 19
##   State    Year Sex       Age G…¹ Race …² Deaths Popul…³ Crude…⁴ Stand…⁵ Lower…⁶
##   <chr>   <dbl> <chr>     <chr>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Alabama  1999 Both Sex… All Ag… All Ra…    169 4430143    3.81   0.293    3.24
## 2 Alabama  2000 Both Sex… All Ag… All Ra…    197 4447100    4.43   0.316    3.81
## 3 Alabama  2001 Both Sex… All Ag… All Ra…    216 4467634    4.83   0.329    4.19
## 4 Alabama  2002 Both Sex… All Ag… All Ra…    211 4480089    4.71   0.324    4.07
## 5 Alabama  2003 Both Sex… All Ag… All Ra…    197 4503491    4.37   0.312    3.76
## 6 Alabama  2004 Both Sex… All Ag… All Ra…    283 4530729    6.25   0.371    5.52
## # … with 9 more variables: `Upper Confidence Limit for Crude Rate` <dbl>,
## #   `Age-adjusted Rate` <dbl>, `Standard Error for Age-adjusted Rate` <dbl>,
## #   `Lower Confidence Limit for Age-adjusted Rate` <dbl>,
## #   `Upper Confidence Limit for Age-adjusted Rate` <dbl>,
## #   `State Crude Rate in Range` <chr>, `US Crude Rate` <dbl>,
## #   `US Age-adjusted Rate` <dbl>, Unit <chr>, and abbreviated variable names
## #   ¹​`Age Group`, ²​`Race and Hispanic Origin`, ³​Population, …
tail(drugs)
## # A tibble: 6 × 19
##   State    Year Sex       Age G…¹ Race …² Deaths Popul…³ Crude…⁴ Stand…⁵ Lower…⁶
##   <chr>   <dbl> <chr>     <chr>   <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Wyoming  2011 Both Sex… All Ag… All Ra…     85  568158    15.0    1.62    12.0
## 2 Wyoming  2012 Both Sex… All Ag… All Ra…     98  576412    17.0    1.72    13.8
## 3 Wyoming  2013 Both Sex… All Ag… All Ra…     98  582658    16.8    1.70    13.7
## 4 Wyoming  2014 Both Sex… All Ag… All Ra…    109  584153    18.7    1.79    15.2
## 5 Wyoming  2015 Both Sex… All Ag… All Ra…     96  586107    16.4    1.67    13.3
## 6 Wyoming  2016 Both Sex… All Ag… All Ra…     99  585501    16.9    1.70    13.7
## # … with 9 more variables: `Upper Confidence Limit for Crude Rate` <dbl>,
## #   `Age-adjusted Rate` <dbl>, `Standard Error for Age-adjusted Rate` <dbl>,
## #   `Lower Confidence Limit for Age-adjusted Rate` <dbl>,
## #   `Upper Confidence Limit for Age-adjusted Rate` <dbl>,
## #   `State Crude Rate in Range` <chr>, `US Crude Rate` <dbl>,
## #   `US Age-adjusted Rate` <dbl>, Unit <chr>, and abbreviated variable names
## #   ¹​`Age Group`, ²​`Race and Hispanic Origin`, ³​Population, …
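The rate columns in the output above can be related to the Deaths and Population columns. As a rough check, the sketch below recomputes the crude rate, its standard error, and the lower confidence limit for the first row (Alabama, 1999). The standard error formula (rate divided by the square root of the death count) and the normal-approximation 95% interval are assumptions here; they reproduce the displayed values for this row, but NCHS may use a different interval method, for example when death counts are small.

```r
# Values copied from the first row of head(drugs): Alabama, 1999
deaths     <- 169
population <- 4430143

# Crude rate per 100,000 population
crude_rate <- deaths / population * 100000

# Assumed formulas: SE = rate / sqrt(deaths), 95% CI = rate -/+ 1.96 * SE
std_error <- crude_rate / sqrt(deaths)
lower_95  <- crude_rate - 1.96 * std_error

# Rounded to three significant digits, as the tibble prints them:
# 3.81, 0.293 and 3.24, matching the first row shown above
signif(c(crude_rate, std_error, lower_95), 3)
```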

Filter data according to 10 states.

When I studied at the Arabic Academy for a few years, we had a presentation on drug awareness, covering the harmful side effects that drugs have on individuals as well as how they affect the people around them. In that presentation, a few people volunteered to share stories from their personal lives about someone they knew who had gone through this. These 10 states stuck with me, as the stories I heard were very deep and inspiring. Therefore, I decided to filter my data to these 10 states and look further into their death rates.

drugs2 <- drugs %>%
  filter(State %in% c("Maryland", "Florida", "Hawaii", "Michigan", "Missouri",
                      "California", "Texas", "New York", "Pennsylvania", "Tennessee"))

Statistical analysis through a boxplot

library(ggplot2)
ggplot(data = drugs2, mapping = aes(x = reorder(State, Deaths), y = Deaths, fill = State)) +
  geom_boxplot() +
  labs(x=NULL, y="Deaths") +
  coord_flip() + # flip the boxplots to make them horizontal
  geom_hline(yintercept = 493, linetype = "dashed", color = "red", linewidth = 1) + # dashed red reference line at 493 deaths
  theme_bw()

Rates in this data set are expressed per 100,000 population; the Deaths values plotted here are counts.

As this boxplot shows, there are very few outliers among these 10 states, which makes the data easier to trust and the plots easier to interpret.
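For reference, by default geom_boxplot() draws as outliers the points that lie more than 1.5 times the interquartile range beyond the box. The sketch below applies that rule by hand in base R; `flag_outliers()` is a hypothetical helper and `toy_deaths` is made-up data, not values from the data set.

```r
# Flag values more than 1.5 * IQR below the first quartile
# or above the third quartile (the usual boxplot rule).
flag_outliers <- function(x) {
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr]
}

# Hypothetical yearly death counts with one unusually large value
toy_deaths <- c(10, 12, 13, 15, 16, 18, 60)

flag_outliers(toy_deaths)  # returns 60
```

Running the same function on the Deaths values of a single state would list the points that the boxplot draws individually.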

Filter data with only United States

In this data frame, every state other than the United States has only “All Ages” listed under the age group category, so the age group breakdown is not available at the state level. Therefore, I filtered the data to only the United States rows.

drugs4 <- filter(drugs, State == "United States")

Scatter plot for Death Rates According to Age Groups in 2000

“According to the latest data from the Centers for Disease Control and Prevention, individuals of the 50-and-under age group are more likely to die from an overdose than any other cause including heart disease and cancer, the leading two causes of death among the overall population.”

Source: https://www.thefreedomcenter.com/accidental-overdose-the-leading-cause-of-death-under-50/

drugs6 <- drugs4 %>%
  filter(`Age Group` != "All Ages") %>%
  # filter(Deaths >= 5000) %>% # I wanted to apply this filter, but it would remove the 0-14 category, which says a lot.
  filter(Year == 2000) %>% # Year is numeric, so compare it with a number
  ggplot(aes(y=Deaths, x=`Age Group`, color = `Age Group`)) +
  ggtitle("Death Rates According to Age Groups in 2000") +
  xlab("Age groups") +
  ylab("Deaths") +
  theme_minimal(base_size = 12) + 
  geom_point() +
  scale_color_hue(name = "Age Groups", labels = c("0-14", "15-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75+"))
drugs6

Scatter plot for Death Rates According to Age Groups in 2015

drugs7 <- drugs4 %>%
  filter(`Age Group` != "All Ages") %>% # drop the "All Ages" total so it does not distort the plot
  filter(Year == 2015) %>% # Year is numeric, so compare it with a number
  ggplot(aes(y=Deaths, x=`Age Group`, color = `Age Group`)) +
  ggtitle("Death Rates According to Age Groups in 2015") +
  xlab("Age groups") +
  ylab("Deaths") +
  theme_minimal(base_size = 12) + 
  geom_point() +
  scale_color_hue(name = "Age Groups", labels = c("0-14", "15-24", "25-34", "35-44", "45-54", "55-64", "65-74", "75+"))
drugs7

Rates in this data set are expressed per 100,000 population; the Deaths values plotted here are counts.

Comparing the two plots, you can see how the number of deaths in each age group changed over the 15-year gap.
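One way to put a number on the gap between the two plots is to compute the absolute change, the percent change, and the ratio between two counts. The sketch below is only illustrative: `change_between()` is a hypothetical helper, and 6,000 and 10,000 are round stand-ins for values read off the plots, not exact figures from the data set.

```r
# Summarise the change between two counts in three common ways
change_between <- function(before, after) {
  c(absolute = after - before,
    percent  = (after - before) / before * 100,
    ratio    = after / before)
}

# Round, approximate readings from the 2000 and 2015 plots
change <- change_between(before = 6000, after = 10000)
change  # absolute = 4000, percent is about 66.7, ratio is about 1.67
```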

Final graphs - Total Deaths per state over the years 1999 - 2016

drugs3 <- drugs2 %>%
  plot_ly( x = ~State, y = ~Deaths, frame = ~Year, # one animation frame per year so we can see the change
    type = 'bar', colors = "Blues")

drugs3 <- drugs3 %>% layout(title = "Total Deaths per State Because of Drug Poisoning",
         yaxis = list(title = "Deaths"),
         legend = list(title = list(text = "States in the US"))) # plotly expects the legend title as a list with a text entry

drugs3

This bar graph shows the same data as an animated gif

ggplot(drugs2, aes(x = State, y = Deaths, fill = State)) +
    geom_bar(stat = "identity") + # bar height is taken directly from the Deaths values
    theme_bw() +
    labs(title = "Total Deaths per State Because Of Drug Poisoning", subtitle = "Year: {as.integer(frame_time)}") + 
    ylab("Deaths") +
    xlab("States In the US") +
    transition_time(Year, range = c(2000, 2016)) + # animate through the years 2000 to 2016
    ease_aes('sine-in-out') + # control the rate of change between transition states
    theme(axis.text.x = element_blank(), axis.ticks = element_blank()) # hide the x-axis labels and axis ticks; the fill legend identifies each state

anim_save("Deaths per state.gif")

Conclusion

The conclusions to be drawn from these visualizations are quite devastating. The number of people dying from drug poisoning is alarming, especially as the count keeps increasing. From 1999 to 2016, drug-poisoning death rates more than tripled, as you can see in the bar graphs above. In the animation, too, the numbers keep going up as the years pass. You would assume that, with all the awareness created, there would at least be a small decrease in deaths. In 2000, the highest number of deaths, in the 35-44 age group, is a little above 6,000. In the visualization for 2015, however, the highest count goes above 10,000. Also, as mentioned in the source quoted earlier, individuals in the 50-and-under age group are more likely to die from an overdose than from any other cause. The dot plots are consistent with this, as the number of deaths is considerably higher for ages under 50. Still, from 1999 to 2016, deaths increased for ALL age groups, putting everyone at risk. Something that surprised me was that the deaths for Hawaii were quite low throughout the years, considering I always assumed it was more of a party state. But something to keep in mind is that population plays a role in this data as well. Similarly, California has the largest population in the US, so it is natural to expect it to be the state with the most deaths due to drug poisoning.
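The point about population can be checked with two rows that appear in the head() and tail() output above. These two states are not among the 10 I filtered; they are used here only because their Deaths and Population values are visible in the printed output. This is a minimal sketch in base R, and the data frame name `compare` is just a placeholder.

```r
# Values copied from the printed rows: Alabama 2004 and Wyoming 2016
compare <- data.frame(
  state      = c("Alabama", "Wyoming"),
  year       = c(2004, 2016),
  deaths     = c(283, 99),
  population = c(4530729, 585501)
)

# Deaths per 100,000 population, so states of different sizes are comparable
compare$rate_per_100k <- compare$deaths / compare$population * 100000

# Sort from highest to lowest rate
compare[order(-compare$rate_per_100k), ]
```

Alabama has almost three times as many deaths in its row, yet Wyoming's rate per 100,000 (about 16.9) is well above Alabama's (about 6.25), which is why counts alone can be misleading when populations differ.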

I was actually quite surprised that I was able to make a visualization that animates the data over the years. I believe it is a great way to represent all of the data for the selected states over the span of years covered. I also wanted to make a map of the United States with data for all 50 states, but I was not able to because of time constraints. Something else I wanted to explore was the death rates by sex, and whether being male or female matters in drug-poisoning deaths.