Project

This project explores drug-related deaths in the state of Connecticut. I chose this data for several reasons. After completing his master’s degree in Seattle, my father moved our family to Hartford, Connecticut, where we lived for one year, and I wanted to explore the conditions and trends of a state I previously lived in. Connecticut’s state ranking also prompted me to select it for further analysis. According to the CDC, Connecticut ranks 13th in the United States for drug-related deaths, despite being the 6th wealthiest state in the US. This interested me because a state that wealthy should have the funding for public health programs that help prevent such deaths. However, according to NCDAS, Connecticut’s overdose (OD) death rate is 67.63% above the national average. This suggests a need to better understand the variables behind these larger statistics and to explore trends by race, gender, county, and so on.

The data set I am using is the “Accidental Drug Related Deaths 2012-2022” data set made available through Data.gov. Its variables include: Date of Death, Type of Date (e.g., Date of Death vs. Date Reported), Age, Sex, Race, Residence City, Residence State, Residence County, Cause of Death, Whether or not the drug was an opiate, Type of Drug, Place of Initial Injury, Description of Injury, City of Death, Death Location, Death State, and Manner of Death. Most of these variables are qualitative, such as a location or sex. The only quantitative variable is age. This data did not require much cleaning. In the Race column, the same group appeared as both “Black” and “Black or African American”, so I used the dplyr command ‘mutate’ to combine these two labels into one.

# load the needed libraries
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.3     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(leaflet)

setwd("/Users/blossomanyanwu/Documents/Data 110 (Fall 2023)")
read_csv("Accidental_Drug_Related_Deaths_2012-2022.csv")

Rows: 10654 Columns: 48
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (47): Date, Date Type, Sex, Race, Ethnicity, Residence City, Residence C...
dbl  (1): Age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# A tibble: 10,654 × 48
   Date       `Date Type`     Age Sex    Race  Ethnicity `Residence City`
   <chr>      <chr>         <dbl> <chr>  <chr> <chr>     <chr>           
 1 05/29/2012 Date of death    37 Male   Black <NA>      STAMFORD        
 2 06/27/2012 Date of death    37 Male   White <NA>      NORWICH         
 3 03/24/2014 Date of death    28 Male   White <NA>      HEBRON          
 4 12/31/2014 Date of death    26 Female White <NA>      BALTIC          
 5 01/16/2016 Date of death    41 Male   White <NA>      SHELTON         
 6 06/13/2017 Date reported    57 Male   White <NA>      BLANDFORD       
 7 10/20/2015 Date reported    26 Male   White <NA>      DANBURY         
 8 02/02/2017 Date reported    64 Male   White <NA>      MILFORD         
 9 07/03/2018 Date of death    33 Male   <NA>  <NA>      <NA>            
10 05/08/2013 Date of death    23 Male   White <NA>      BETHEL          
# ℹ 10,644 more rows
# ℹ 41 more variables: `Residence County` <chr>, `Residence State` <chr>,
#   `Injury City` <chr>, `Injury County` <chr>, `Injury State` <chr>,
#   `Injury Place` <chr>, `Description of Injury` <chr>, `Death City` <chr>,
#   `Death County` <chr>, `Death State` <chr>, Location <chr>,
#   `Location if Other` <chr>, `Cause of Death` <chr>, `Manner of Death` <chr>,
#   `Other Significant Conditions` <chr>, Heroin <chr>, …

drug_deaths <- read_csv("Accidental_Drug_Related_Deaths_2012-2022.csv")

Rows: 10654 Columns: 48
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (47): Date, Date Type, Sex, Race, Ethnicity, Residence City, Residence C...
dbl  (1): Age

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Simple plot explorations

# Combine "Black" and "Black or African American"
drugs_combined <- drug_deaths %>%
  mutate(Race = fct_recode(as.factor(Race), "Black" = "Black or African American"))
# Combine "Native American, Other" and "American Indian or Alaska Native"
drugs_combined <- drugs_combined %>%
  mutate(Race = fct_recode(as.factor(Race), "Native American" = "American Indian or Alaska Native", "Native American" = "Native American, Other"))
# Simple Preliminary Visualization
ggplot(drugs_combined, aes(x = Race)) +
  geom_bar(stat = "count") +
  labs(title = "Drug Deaths by Race in Connecticut (2012-2022)", x = "Race", y = "Count") +
  # Attach the theme with `+` so the rotated, smaller axis text applies to this plot
  theme(axis.text = element_text(angle = 90, size = 5))

Statistical Analysis: Histogram and Box-plot

ggplot(drugs_combined, aes(x = Sex, y = Age, fill = Sex)) +
  geom_boxplot() +
  labs(title = "Spread of Overdoses by Age and Sex", x = "Sex", y = "Age") +
  theme_minimal()

Warning: Removed 2 rows containing non-finite values (`stat_boxplot()`).

drugs_combined |>
  drop_na(Sex) |>
  ggplot(aes(x = Sex, y = Age, fill = Sex)) +
  geom_boxplot() +
  labs(title = "Spread of Drug Deaths by Age and Sex", x = "Sex", y = "Age") +
  theme_minimal() +
  scale_y_continuous(breaks = seq(0, 100, by = 10))  # Adjust the breaks as needed

Warning: Removed 1 rows containing non-finite values (`stat_boxplot()`).

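ggplot(drugs_combined, aes(x = Age)) +
  # boundary = 0 and closed = "left" make the bins [0, 10), [10, 20), ... so they match the labels
  geom_histogram(binwidth = 10, boundary = 0, closed = "left", fill = "steelblue", color = "black", alpha = 0.9) +
  labs(title = "Histogram of Age", x = "Age", y = "Frequency") +
  # Place one label at the midpoint of each 10-year bin
  scale_x_continuous(breaks = seq(5, 95, by = 10), labels = c("0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79", "80-89", "90-99")) +
  theme_minimal()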

Warning: Removed 2 rows containing non-finite values (`stat_bin()`).

Explanation of Visualization: For my statistical visualizations I created a box plot and a histogram. The purpose of these visualizations is to see the spread in the data set. The box plot shows the age distribution separately for each sex. In the initial visualization there was an N/A category containing 8 data points with no assigned sex. I decided to remove these and create a second visualization. Because only 8 observations were affected and the data set is so large, I do not believe this took away from the integrity of my exploration. The second statistical visualization is a histogram, which displays the age distribution of drug deaths without splitting by sex. Combined, these graphs show that the spread in age is very similar for each sex. The histogram shows that the highest number of drug deaths falls within the age ranges of 40-49 and 50-59, while, surprisingly, the count is lower in the 20-29 range.
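
The comparison of age spread by sex described above can also be checked numerically. The following is a minimal sketch, not the analysis actually run for this project: `toy_deaths` is a made-up stand-in for the real data (only the `Sex` and `Age` column names come from the data set), and `summarise_age_by_sex()` is a hypothetical helper name.

```r
library(dplyr)
library(tibble)

# Hypothetical rows that mimic the Sex and Age columns; these are NOT real records.
toy_deaths <- tibble(
  Sex = c("Male", "Male", "Male", "Female", "Female", NA),
  Age = c(30, 40, 50, 35, 45, 60)
)

# Count, median, and interquartile range of Age for each Sex,
# skipping rows where Sex is missing (as the second box plot does).
summarise_age_by_sex <- function(data) {
  data |>
    filter(!is.na(Sex)) |>
    group_by(Sex) |>
    summarise(
      n = n(),
      median_age = median(Age, na.rm = TRUE),
      iqr_age = IQR(Age, na.rm = TRUE),
      .groups = "drop"
    )
}

age_summary <- summarise_age_by_sex(toy_deaths)
print(age_summary)
```

Calling `summarise_age_by_sex(drugs_combined)` on the real data would give one row per sex; similar medians and interquartile ranges would support the reading of the box plots.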

Primary Visualizations. Based on what I discovered, I wanted to explore the causes of death among those who passed away. I created an interactive bar chart.

mainvis <- ggplot(drugs_combined, aes(x = `Injury County`)) +
  geom_bar(fill = "orange", color = "black") +
  labs(title = "County Injury Rates",
       x = "County of Injury",
       y = "Count") +
  theme_minimal() +
  # Attach the theme with `+` so the angled county labels are part of mainvis
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(mainvis)
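
Printing a ggplot object produces a static image. One way to make a bar chart like this interactive is to pass it to `ggplotly()` from the plotly package loaded earlier. The sketch below assumes that approach; `toy_counties`, `static_plot`, and `interactive_plot` are illustrative names, and the rows are made up rather than taken from the data set.

```r
library(ggplot2)
library(plotly)

# Hypothetical rows standing in for drugs_combined; these are NOT real counts.
toy_counties <- data.frame(
  `Injury County` = c("HARTFORD", "HARTFORD", "NEW HAVEN", "FAIRFIELD"),
  check.names = FALSE
)

# Same layers as the county bar chart, built on the toy data
static_plot <- ggplot(toy_counties, aes(x = `Injury County`)) +
  geom_bar(fill = "orange", color = "black") +
  labs(x = "County of Injury", y = "Count") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# ggplotly() converts the ggplot object into an interactive plotly widget
# with hover tooltips and zooming.
interactive_plot <- ggplotly(static_plot)
```

In an interactive session or a rendered document, evaluating `interactive_plot` on its own line displays the widget; `ggplotly(mainvis)` would do the same for the chart above.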

After seeing that there was not much variation by gender and age in Connecticut’s drug deaths, I decided to explore location to see whether drug deaths were concentrated in certain areas.

Essay

Drug-related deaths have become a pressing issue in Connecticut, mirroring the nationwide increase in substance abuse and drug-related deaths. This project aimed to explore the variables involved and gain a deeper understanding of the problem. As mentioned earlier, according to the Chamber of Commerce, Connecticut ranks as one of the wealthiest states in the US. Despite this, the state also ranks high in drug abuse and drug-related deaths. This was perplexing because a state with a large economy should have the means to invest in public health facilities and programs that can fight this issue.

As a result, I explored factors like location to understand this issue. When I initially used Tableau to develop a map, I noticed that a substantial portion of the drug deaths involved people from out of state. A fair number of those who died were from up and down the East Coast, with a high concentration in Florida. Of the out-of-state residents who passed away in Florida, many of the incidents occurred in hotels or motels. This does not negate the fact that a substantial portion of drug deaths are still experienced by CT citizens. Through Tableau, however, I concluded that these deaths were occurring in lower-income areas where budgets are not as high. My final visualization looked at the counts for each of the counties.
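
The residency and county breakdowns described above were explored in Tableau, but a rough equivalent can be sketched in R. This is an illustration only: the rows in `toy_deaths` are made up, and the assumption that Connecticut appears as "CT" in the `Residence State` column has not been checked against the data set.

```r
library(dplyr)
library(tibble)

# Hypothetical rows using two real column names; these are NOT real records.
toy_deaths <- tibble(
  `Residence State` = c("CT", "CT", "CT", "NY", "MA", NA),
  `Injury County` = c("HARTFORD", "NEW HAVEN", "HARTFORD",
                      "FAIRFIELD", "HARTFORD", "NEW HAVEN")
)

# Classify each record by residency (assumes "CT" is the in-state code),
# then count records and compute each group's share of the total.
residency_counts <- toy_deaths |>
  mutate(residency = case_when(
    is.na(`Residence State`) ~ "Unknown",
    `Residence State` == "CT" ~ "Connecticut resident",
    TRUE ~ "Out-of-state resident"
  )) |>
  count(residency, name = "deaths") |>
  mutate(share = deaths / sum(deaths))

# Number of records per county of injury, largest first
county_counts <- toy_deaths |>
  count(`Injury County`, name = "deaths", sort = TRUE)

print(residency_counts)
print(county_counts)
```

Replacing `toy_deaths` with `drugs_combined` would produce the same two tables for the full data set, which is one way to check how large the out-of-state share actually is.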

Another interesting trend also emerged: drug deaths involving opiates have been on the rise in recent years. With this data in mind, several legislative responses could follow, such as increased funding in certain areas, along with outreach programs and public health initiatives that target older populations, who seem to be suffering the most in this epidemic.
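
A yearly trend like the one described above could be tabulated by parsing the `Date` column, which is read in as text in month/day/year form. The sketch below is an assumption-laden illustration: the column name `Any Opioid` and its "Y" coding are guesses at how the opiate flag is stored, and the flag values in `toy_deaths` are made up.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Dates follow the mm/dd/yyyy format seen in the data; the opioid flags are
# hypothetical, and the `Any Opioid` column name is an assumption.
toy_deaths <- tibble(
  Date = c("05/29/2012", "06/27/2012", "03/24/2014", "12/31/2014", "01/16/2016"),
  `Any Opioid` = c("Y", NA, "Y", "Y", "Y")
)

# Parse the text dates, extract the year, and count all deaths
# and opioid-flagged deaths per year.
opioid_by_year <- toy_deaths |>
  mutate(year = year(mdy(Date))) |>
  group_by(year) |>
  summarise(
    deaths = n(),
    opioid_deaths = sum(`Any Opioid` == "Y", na.rm = TRUE),
    .groups = "drop"
  )

print(opioid_by_year)
```

Because the `Date Type` column distinguishes dates of death from dates reported, a fuller version might filter or note that distinction before reading a trend off the yearly counts.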

“How Rich Is Each US State? | Chamber of Commerce.” Chamber of Commerce, www.chamberofcommerce.org/how-rich-is-each-us-state/.