SYNOPSIS

The number of drug-related deaths is increasing day by day. Let’s explore the dataset to find the number of drug-related death cases, the gender most involved in these cases, the age group of the people involved, and the top 10 drugs associated with these deaths.

Data Processing:

In the data processing step, we load the file that contains the data for our analysis.

readData <- read.csv("Accidental_Drug_Related_Deaths__2012-2017.csv",header = TRUE,stringsAsFactors = FALSE)
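To see what header = TRUE and stringsAsFactors = FALSE do, here is a minimal sketch that reads a tiny inline CSV instead of the real file. The two sample rows and the names sampleCsv and sampleData are made up for illustration.

```r
# Hypothetical two-row CSV, passed inline through the text argument
sampleCsv <- "CaseNumber,Date,Sex\n13-16336,11/9/2013,Female\n12-18447,12/29/2012,Male"

sampleData <- read.csv(text = sampleCsv, header = TRUE, stringsAsFactors = FALSE)

# header = TRUE takes the column names from the first line
names(sampleData)

# stringsAsFactors = FALSE keeps text columns as character vectors, not factors
class(sampleData$Sex)

# dim() reports how many rows and columns were loaded
dim(sampleData)
```

Checking dim() and the column classes right after loading is a quick way to confirm the file was read the way you expect.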

Let’s dive into our data.

1. The head function shows the first 6 records of the data.

2. The tail function shows the last 6 records of the data.

head(readData)
##   CaseNumber       Date    Sex  Race Age Residence.City Residence.State
## 1   13-16336  11/9/2013 Female White  53         GROTON                
## 2   12-18447 12/29/2012   Male White  30        WOLCOTT                
## 3    14-2758  2/18/2014   Male White  43        ENFIELD                
## 4   14-13497   9/7/2014 Female White  24    WALLINGFORD                
## 5   13-14421  10/4/2013 Female White  26     WEST HAVEN                
## 6   13-18018 12/10/2013 Female White  45     RIDGEFIELD                
##   Residence.County  Death.City Death.State Death.County  Location
## 1       NEW LONDON      GROTON               NEW LONDON Residence
## 2        NEW HAVEN   WATERBURY                NEW HAVEN  Hospital
## 3                      ENFIELD                          Residence
## 4                  WALLINGFORD                NEW HAVEN Residence
## 5        NEW HAVEN  WEST HAVEN                NEW HAVEN     Other
## 6        FAIRFIELD     DANBURY                FAIRFIELD  Hospital
##   DescriptionofInjury          InjuryPlace
## 1                                         
## 2                                Residence
## 3                                Residence
## 4                                Residence
## 5                     Residential Building
## 6                                Residence
##                                                                                                     ImmediateCauseA
## 1 Acute Combined Oxycodone, Oxymorphone, Carisoprodol,, Meprobamate, Diazepam, Nordiazepam and Trazodone Toxicities
## 2                                                                                    Cocaine and Oxycodone Toxicity
## 3                               Acute Intoxication due to the Combined Effects of Alcohol, Lorazepam and Citalopram
## 4                                                                                  Heroin and Fentanyl Intoxication
## 5                                                                                             Acute Heroin Toxicity
## 6                  Acute Intoxication due to the Combined Effects of Ethanol, Clonazepam, Alprazolam, and Trazodone
##   Heroin Cocaine Fentanyl Oxycodone Oxymorphone EtOH Hydrocodone
## 1                                 Y           Y                 
## 2              Y                  Y                             
## 3                                                  Y            
## 4      Y                Y                                       
## 5      Y                                                        
## 6                                                  Y            
##   Benzodiazepine Methadone Amphet Tramad Morphine..not.heroin. Other
## 1              Y                                                    
## 2                                                                   
## 3              Y                                                    
## 4                                                                   
## 5                                                                   
## 6              Y                                                    
##   Any.Opioid MannerofDeath AmendedMannerofDeath
## 1                 Accident                     
## 2                 Accident                     
## 3                 Accident                     
## 4                 Accident                     
## 5                 Accident                     
## 6                 Accident                     
##                                   DeathLoc
## 1       GROTON, CT\n(41.343693, -72.07877)
## 2   WATERBURY, CT\n(41.554261, -73.043069)
## 3     ENFIELD, CT\n(41.976501, -72.591985)
## 4 WALLINGFORD, CT\n(41.454408, -72.818414)
## 5  WEST HAVEN, CT\n(41.272336, -72.949817)
## 6     DANBURY, CT\n(41.393666, -73.451539)
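Both head and tail accept an n argument if you want a different number of records. Here is a small sketch on a made-up data frame (toy is a stand-in, not the real dataset):

```r
# Toy data frame with 10 rows (values are made up)
toy <- data.frame(id = 1:10, value = letters[1:10])

head(toy, n = 3)  # first 3 rows
tail(toy, n = 3)  # last 3 rows

# Keep the ids so the selection is easy to check
firstIds <- head(toy, n = 3)$id
lastIds <- tail(toy, n = 3)$id
```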

Now that we understand our data, let’s define the questions we want to answer through this analysis.

Q1: How many drug cases occur each year, and which gender is most involved in them?

Extract the data you need for the analysis.

i) We can select the columns we want to use from the dataset.

ii) You can look up the column numbers with the “names” function.

names(readData)
##  [1] "CaseNumber"            "Date"                 
##  [3] "Sex"                   "Race"                 
##  [5] "Age"                   "Residence.City"       
##  [7] "Residence.State"       "Residence.County"     
##  [9] "Death.City"            "Death.State"          
## [11] "Death.County"          "Location"             
## [13] "DescriptionofInjury"   "InjuryPlace"          
## [15] "ImmediateCauseA"       "Heroin"               
## [17] "Cocaine"               "Fentanyl"             
## [19] "Oxycodone"             "Oxymorphone"          
## [21] "EtOH"                  "Hydrocodone"          
## [23] "Benzodiazepine"        "Methadone"            
## [25] "Amphet"                "Tramad"               
## [27] "Morphine..not.heroin." "Other"                
## [29] "Any.Opioid"            "MannerofDeath"        
## [31] "AmendedMannerofDeath"  "DeathLoc"
extractData <- readData[,c(1,2,3)]
head(extractData)
##   CaseNumber       Date    Sex
## 1   13-16336  11/9/2013 Female
## 2   12-18447 12/29/2012   Male
## 3    14-2758  2/18/2014   Male
## 4   14-13497   9/7/2014 Female
## 5   13-14421  10/4/2013 Female
## 6   13-18018 12/10/2013 Female
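Selecting columns by position works, but it breaks silently if the column order ever changes. As a hedged alternative, the sketch below selects the same columns by name on a made-up data frame (toy is hypothetical and only mimics the first four columns).

```r
# Toy data frame mimicking the first four columns of the dataset
toy <- data.frame(CaseNumber = c("13-16336", "12-18447"),
                  Date = c("11/9/2013", "12/29/2012"),
                  Sex = c("Female", "Male"),
                  Race = c("White", "White"),
                  stringsAsFactors = FALSE)

byPosition <- toy[, c(1, 2, 3)]                   # select by column number
byName <- toy[, c("CaseNumber", "Date", "Sex")]   # select by column name

# Both approaches give the same result here
sameResult <- identical(byPosition, byName)

# match() looks up the position of a column from its name
sexPosition <- match("Sex", names(toy))
```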

Since I want to analyse the data by year, I will extract the year from the date.

i) You can use the strptime function to parse the date string, and then use the format function to extract the year, month, or day from it.

extractData$year <- format(strptime(extractData$Date, format="%m/%d/%Y"),"%Y")
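Here is a small sketch of the same parsing step on three made-up date strings, including an empty one, to show where the NA years come from. The tz = "UTC" argument is my addition so the result does not depend on the local time zone.

```r
# Two valid dates and one empty string (made-up sample)
dates <- c("11/9/2013", "12/29/2012", "")

# strptime() parses text into date-time values; unparseable input becomes NA
parsed <- strptime(dates, format = "%m/%d/%Y", tz = "UTC")

# format() turns the parsed dates back into text, keeping only the requested part
years <- format(parsed, "%Y")
months <- format(parsed, "%m")

years
months
```

The empty string yields NA, which is why records without a usable date have to be filtered out before counting by year.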

Let’s calculate the yearly count of drug cases.

i) The “count” function comes from the “dplyr” package.

ii) The “subset” function extracts the rows that satisfy certain conditions. Here it drops the records whose date could not be parsed.

library(dplyr)
drugCount <- count(extractData, year)
subsetDrugCount <- subset(drugCount, !is.na(year))
head(subsetDrugCount)
## # A tibble: 6 x 2
##   year      n
##   <chr> <int>
## 1 2012    355
## 2 2013    490
## 3 2014    558
## 4 2015    723
## 5 2016    917
## 6 2017   1036
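If you do not have dplyr at hand, roughly the same yearly count can be sketched in base R with table. This is an alternative to the approach above, not the code used for the output, and the years vector is made up.

```r
# Made-up vector of years, with one missing value
years <- c("2012", "2013", "2013", NA, "2014", "2013")

# table() counts each distinct value and leaves out NA by default
yearTable <- table(years)

# Convert the table to a data frame with the same column names as before
yearCounts <- as.data.frame(yearTable, stringsAsFactors = FALSE)
names(yearCounts) <- c("year", "n")

yearCounts
```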

Now create a graph showing the number of drug deaths per year.

i) The “ggplot” function comes from the “ggplot2” package, which is used to create beautiful visualizations.

ii) You can use various arguments and layers with the ggplot function to change the look of the plot, for example fill, theme, and color.

library(ggplot2)
p <- ggplot(data = subsetDrugCount, aes(x = year, y = n)) +
  geom_bar(stat = "identity", fill = "steelblue", width = 0.4) +
  theme_minimal() + ylab("Number of Drug Cases") +
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "steelblue")) +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
p  # display the bar chart
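Two details of ggplot2 are worth a quick sketch: geom_col() is a shorthand for geom_bar(stat = "identity"), and a plot stored in a variable is only drawn once it is printed. The toyCounts data below is made up.

```r
library(ggplot2)

# Made-up yearly counts, standing in for the real summary table
toyCounts <- data.frame(year = c("2012", "2013", "2014"), n = c(3, 5, 8))

# geom_col() uses the y values as bar heights, like geom_bar(stat = "identity")
barPlot <- ggplot(toyCounts, aes(x = year, y = n)) +
  geom_col(fill = "steelblue", width = 0.4) +
  ylab("Number of Drug Cases") +
  theme_minimal()

# Assigning the plot does not draw it; printing does
print(barPlot)
```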

Let’s also get an idea of the gender most involved in drug cases.

i) The “group_by” function comes from the “dplyr” package. “group_by()” takes an existing tbl and converts it into a grouped tbl where operations are performed “by group”; “ungroup()” removes the grouping.

ii) “geom_line” is used to create a line chart.

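# femaleData and maleData are not defined in the original post; this is an assumed reconstruction
genderData <- subset(extractData, !is.na(year)) %>% group_by(year, Sex) %>% summarize(CaseNumber = n())
femaleData <- subset(genderData, Sex == "Female")
maleData <- subset(genderData, Sex == "Male")
p + geom_line(data = femaleData, aes(x = year, y = CaseNumber, color = Sex), group = 1, size = 1, linetype = "dashed") +
  geom_line(data = maleData, aes(x = year, y = CaseNumber, color = Sex), group = 1, size = 1, linetype = "dashed") +
  theme(legend.position = "top")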
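To make the group_by idea concrete, here is a minimal sketch that counts cases per year and sex on a made-up data frame. The toy data and the bySex name are hypothetical.

```r
library(dplyr)

# Made-up records: one row per case
toy <- data.frame(year = c("2012", "2012", "2013", "2013", "2013"),
                  Sex = c("Male", "Female", "Male", "Male", "Female"),
                  stringsAsFactors = FALSE)

bySex <- toy %>%
  group_by(year, Sex) %>%        # one group per year and sex combination
  summarize(CaseNumber = n()) %>% # n() counts the rows in each group
  ungroup()                       # drop the remaining grouping

bySex
```

After summarize() the result is still grouped by year, so calling ungroup() avoids surprises in later steps.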

Q2 : Which Age Group is most involved in drug deaths?

ageData <- readData[,c(1,2,5)]
head(ageData)
##   CaseNumber       Date Age
## 1   13-16336  11/9/2013  53
## 2   12-18447 12/29/2012  30
## 3    14-2758  2/18/2014  43
## 4   14-13497   9/7/2014  24
## 5   13-14421  10/4/2013  26
## 6   13-18018 12/10/2013  45

Create the age buckets.

library(data.table)  # provides setDT and :=
setDT(ageData)[Age <= 29, agegroup := "Young"]
ageData[Age >= 30 & Age <= 50, agegroup := "Middle-Aged"]
ageData[Age >= 51 & Age <= 99, agegroup := "Old"]

# Take the case number and year from extractData; its rows are in the same order as ageData
ageData1 <- extractData[, c("CaseNumber", "year")]
ageData1$agegroup <- ageData$agegroup

ageGroup <- ageData1 %>% group_by(agegroup, year) %>% summarize(CaseNumber = n())

subsetAge <- subset(ageGroup, !is.na(year) & !is.na(agegroup))
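The same age buckets can also be sketched in base R with cut, which assigns every value to an interval in one call. This is an alternative to the data.table approach above, and it assumes ages are whole numbers; the ages vector is made up.

```r
# Made-up ages, including the bucket boundaries and a missing value
ages <- c(24, 29, 30, 50, 51, 99, NA)

# Intervals are (-Inf, 29], (29, 50], (50, 99]; anything above 99 becomes NA
agegroup <- cut(ages,
                breaks = c(-Inf, 29, 50, 99),
                labels = c("Young", "Middle-Aged", "Old"))

groupLabels <- as.character(agegroup)
groupLabels
```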

Let’s create a Stacked barplot.

ggplot(data=subsetAge, aes(x=year, y=CaseNumber, fill=agegroup,width = 0.5)) +
  geom_bar(stat="identity") + theme_minimal()+ylab("Number of Drug Cases")+
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "steelblue"))+
  theme(legend.position = "top")

Result: The plot shows that the middle-aged group (ages 30 to 50) is the most involved in drug deaths.

Q3. What are the top 10 Drugs associated with the deaths?

library(tidyr)

# Keep the case number, the date, and the drug indicator columns (Heroin through Any.Opioid)
drugExtract <- readData[, c(1, 2, 16:29)]
drugExtract$year <- format(strptime(drugExtract$Date, format = "%m/%d/%Y"), "%Y")
# Reshape from one column per drug to one row per case and drug
drugData <- gather(drugExtract, Drugs, Yes, Heroin:Any.Opioid, factor_key = TRUE)

# Keep only the drugs flagged "Y" for each case
subData <- subset(drugData, Yes == "Y")

finalDrugData <- subData %>% group_by(year, Drugs) %>% summarize(Yes = n())

finalDrugData$year <- factor(finalDrugData$year)

finalDrugData <- subset(finalDrugData, !is.na(year))
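The tidyr documentation now recommends pivot_longer over gather for new code. As a hedged sketch, here is the same wide-to-long reshape with pivot_longer on a made-up table with three drug columns (toy, longToy and positives are hypothetical names).

```r
library(tidyr)

# Made-up wide table: one row per case, one "Y"/blank column per drug
toy <- data.frame(CaseNumber = c("a", "b", "c"),
                  Heroin = c("Y", "", "Y"),
                  Cocaine = c("", "Y", "Y"),
                  Fentanyl = c("", "", "Y"),
                  stringsAsFactors = FALSE)

# One row per case and drug; the old column names go into Drugs
longToy <- pivot_longer(toy, cols = Heroin:Fentanyl,
                        names_to = "Drugs", values_to = "Yes")

# Keep only the drugs that were flagged for a case
positives <- subset(longToy, Yes == "Y")

drugTable <- table(positives$Drugs)
drugTable
```

Unlike gather with factor_key = TRUE, this sketch leaves Drugs as a character column.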

Now let’s create a chart

Text theme for every plot in this project

text_theme <- theme(text = element_text(size = 10, 
                                        family = "Verdana", 
                                        face = "italic"),
                    plot.title = element_text(hjust = 0.5))

Let’s find the top 10 drugs contributing to deaths.

sumDrugData <- aggregate(finalDrugData$Yes,by=list(Drugs = finalDrugData$Drugs),FUN = sum)

topData <- head(arrange(sumDrugData,desc(x)),n=10)

library(RColorBrewer)  # provides brewer.pal
colourCount <- length(unique(topData$Drugs))
getPalette <- colorRampPalette(brewer.pal(9, "Set1"))

topData$Drugs <- factor(topData$Drugs,levels = topData$Drugs[order(topData$x)])

ggplot(topData, aes(x = Drugs,y = x)) +
  geom_bar(aes(fill = Drugs), 
           alpha = 0.9,stat = "identity") +
  coord_flip()+  theme_minimal()+
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "steelblue")) + text_theme +
  scale_fill_manual(values = colorRampPalette(brewer.pal(8, "Spectral"))(colourCount)) +
  labs(title = "Top 10 drugs associated with deaths", x = "Drugs", y = "Number of deaths")
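The ranking step on its own can be sketched in a few lines of base R: count the "Y" flags per drug column, sort, and keep the top entries. This is a simplified alternative to the aggregate and arrange steps above, on a made-up table (drugFlags is hypothetical), and it keeps the top 2 instead of the top 10.

```r
# Made-up indicator columns: "Y" means the drug was involved in that case
drugFlags <- data.frame(Heroin = c("Y", "Y", "", "Y"),
                        Cocaine = c("", "Y", "", ""),
                        Fentanyl = c("Y", "", "Y", ""),
                        stringsAsFactors = FALSE)

# Comparing with "Y" gives TRUE/FALSE; colSums() counts the TRUE values per column
drugTotals <- colSums(drugFlags == "Y")

# Sort from most to least frequent and keep the first two
topDrugs <- head(sort(drugTotals, decreasing = TRUE), n = 2)
topDrugs
```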

Result: The plot shows the top ten drugs associated with deaths, with heroin being the most involved.

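Let’s create a line chart of the number of cases by race.

# subsetRaceData is not defined in the original post; this is an assumed reconstruction
raceData <- readData[, c("Date", "Race")]
raceData$year <- format(strptime(raceData$Date, format = "%m/%d/%Y"), "%Y")
subsetRaceData <- raceData %>% group_by(year, Race) %>% summarize(Count = n()) %>% subset(!is.na(year))
ggplot() +
  geom_line(data = subsetRaceData, aes(x = as.numeric(year), y = Count, color = Race), size = 0.8) +
  xlab("year") +
  ylab("Number of cases") +
  theme_minimal() +
  theme(panel.border = element_blank(), panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.line = element_line(colour = "steelblue"))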

Result: The line chart shows that White individuals are the most involved in drug cases.