In this post, I will be discussing converting the dataset to wide and long forms. Dataset I will be using is from Connecticut Accidental Drug Related Deaths 2012-2017. Dataset has 4066 individual cases.

Let’s load dataset.

library(tidyverse)
library(knitr)
library(kableExtra)
library(reshape2)

CTDrug.data.df <- as.data.frame(read.csv("https://raw.githubusercontent.com/akulapa/Data608-FinalProject/master/CT_Drug.csv", header=T, stringsAsFactors = F, na.strings=c("","NA")))

CTDrug.df <- CTDrug.data.df %>% select(-Age,-AgeClass,-zip,-Longitude,-Latitude,-DeathLoc,-DeathState)
CTDrug.df$Id <- seq(1:nrow(CTDrug.df))

 CTDrug.df %>% head(20) %>% 
  kable("html",caption = "Sample Data") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 12) %>%
  scroll_box(width = "100%", height = "200px")
Sample Data
Year Sex Race Heroin Cocaine Fentanyl Oxycodone Oxymorphone EtOH Hydrocodone Benzodiazepine Amphet Tramad Morphine Other Opioid AgeGroup Id
2013 Female White N N N Y Y N N Y N N N N N 45-54 1
2012 Male White N Y N Y N N N N N N N N N 25-34 2
2014 Male White N N N N N Y N Y N N N N N 35-44 3
2014 Female White Y N Y N N N N N N N N N N 15-24 4
2013 Female White Y N N N N N N N N N N N N 25-34 5
2013 Female White N N N N N Y N Y N N N N N 45-54 6
2013 Male White Y N N N N Y N Y N N N N N 55-64 7
2013 Male Hispanic Y Y N N N N N N N N N N N 45-54 8
2014 Female White N Y N N N N N N N N N N N 45-54 9
2013 Male White Y N Y N N N N N N N N N N 25-34 10
2012 Male White Y N N N N N N Y N N N N N 15-24 11
2014 Male White Y N N N N Y N N N N N N N 25-34 12
2015 Male White Y Y N N N N N N N N N N Y 25-34 13
2013 Male Hispanic Y Y N N N N N N N N N N N 35-44 14
2015 Male White Y Y N N N Y N N N N N N Y 45-54 15
2013 Male Other Y N N N N N N N N N N N N 55-64 16
2015 Male White N N Y N N N N N N N N N Y 15-24 17
2014 Male White Y N Y N N N N N N N N N N 45-54 18
2015 Male Hispanic Y N Y N N Y N N N N N N Y 35-44 19
2015 Male White N Y N N N N N N N N N N N 45-54 20

The dataset contains four different race values. Let’s convert row value of Race to four individual columns.

unique(CTDrug.df$Race)
## [1] "White"    "Hispanic" "Other"    "Black"
#Convert Race to Columns
CTDrug.wide.df <- CTDrug.df %>% 
  gather(Race,name,starts_with('Race')) %>% 
  mutate(Race = name) %>%
  mutate(present = 1) %>% 
  select(-name) %>% 
  spread(Race,present,fill = 0)

 CTDrug.wide.df %>% head(20) %>% 
  kable("html",caption = "Sample Data") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 12) %>%
  scroll_box(width = "100%", height = "200px")
Sample Data
Year Sex Heroin Cocaine Fentanyl Oxycodone Oxymorphone EtOH Hydrocodone Benzodiazepine Amphet Tramad Morphine Opioid AgeGroup Id Black Hispanic Other White
2013 Female N N N Y Y N N Y N N N N 45-54 1 0 0 0 1
2012 Male N Y N Y N N N N N N N N 25-34 2 0 0 0 1
2014 Male N N N N N Y N Y N N N N 35-44 3 0 0 0 1
2014 Female Y N Y N N N N N N N N N 15-24 4 0 0 0 1
2013 Female Y N N N N N N N N N N N 25-34 5 0 0 0 1
2013 Female N N N N N Y N Y N N N N 45-54 6 0 0 0 1
2013 Male Y N N N N Y N Y N N N N 55-64 7 0 0 0 1
2013 Male Y Y N N N N N N N N N N 45-54 8 0 1 0 0
2014 Female N Y N N N N N N N N N N 45-54 9 0 0 0 1
2013 Male Y N Y N N N N N N N N N 25-34 10 0 0 0 1
2012 Male Y N N N N N N Y N N N N 15-24 11 0 0 0 1
2014 Male Y N N N N Y N N N N N N 25-34 12 0 0 0 1
2015 Male Y Y N N N N N N N N N Y 25-34 13 0 0 0 1
2013 Male Y Y N N N N N N N N N N 35-44 14 0 1 0 0
2015 Male Y Y N N N Y N N N N N Y 45-54 15 0 0 0 1
2013 Male Y N N N N N N N N N N N 55-64 16 0 0 1 0
2015 Male N N Y N N N N N N N N Y 15-24 17 0 0 0 1
2014 Male Y N Y N N N N N N N N N 45-54 18 0 0 0 1
2015 Male Y N Y N N Y N N N N N Y 35-44 19 0 1 0 0
2015 Male N Y N N N N N N N N N N 45-54 20 0 0 0 1

The data format is good if we are studying an individual case. In case, you want to generate drug summary it would be better if we had data in long form. Converting drug data into a long form.

Drug.df <- CTDrug.df %>% 
  melt(id.vars = c('Id','Year','Sex','Race','AgeGroup')) %>% 
  rename(Drug=variable,Observed=value) %>% 
  arrange(Id)

 Drug.df %>% head(20) %>% 
  kable("html",caption = "Sample Data") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, position = "left", font_size = 12) %>%
  scroll_box(width = "100%", height = "200px")
Sample Data
Id Year Sex Race AgeGroup Drug Observed
1 2013 Female White 45-54 Heroin N
1 2013 Female White 45-54 Cocaine N
1 2013 Female White 45-54 Fentanyl N
1 2013 Female White 45-54 Oxycodone Y
1 2013 Female White 45-54 Oxymorphone Y
1 2013 Female White 45-54 EtOH N
1 2013 Female White 45-54 Hydrocodone N
1 2013 Female White 45-54 Benzodiazepine Y
1 2013 Female White 45-54 Amphet N
1 2013 Female White 45-54 Tramad N
1 2013 Female White 45-54 Morphine N
1 2013 Female White 45-54 Other N
1 2013 Female White 45-54 Opioid N
2 2012 Male White 25-34 Heroin N
2 2012 Male White 25-34 Cocaine Y
2 2012 Male White 25-34 Fentanyl N
2 2012 Male White 25-34 Oxycodone Y
2 2012 Male White 25-34 Oxymorphone N
2 2012 Male White 25-34 EtOH N
2 2012 Male White 25-34 Hydrocodone N

To conclude, data in long form makes it easy to generate summaries, whereas wide form is useful if we want to study individual observations.