It covers 13 drugs across 17 age groups.
| Header | Definition |
|---|---|
| alcohol-use | Percentage of those in an age group who used alcohol in the past 12 months |
| alcohol-frequency | Median number of times a user in an age group used alcohol in the past 12 months |
| marijuana-use | Percentage of those in an age group who used marijuana in the past 12 months |
| marijuana-frequency | Median number of times a user in an age group used marijuana in the past 12 months |
| cocaine-use | Percentage of those in an age group who used cocaine in the past 12 months |
| cocaine-frequency | Median number of times a user in an age group used cocaine in the past 12 months |
| crack-use | Percentage of those in an age group who used crack in the past 12 months |
| crack-frequency | Median number of times a user in an age group used crack in the past 12 months |
| heroin-use | Percentage of those in an age group who used heroin in the past 12 months |
| heroin-frequency | Median number of times a user in an age group used heroin in the past 12 months |
| hallucinogen-use | Percentage of those in an age group who used hallucinogens in the past 12 months |
| hallucinogen-frequency | Median number of times a user in an age group used hallucinogens in the past 12 months |
| inhalant-use | Percentage of those in an age group who used inhalants in the past 12 months |
| inhalant-frequency | Median number of times a user in an age group used inhalants in the past 12 months |
| pain-releiver-use | Percentage of those in an age group who used pain relievers in the past 12 months |
| pain-releiver-frequency | Median number of times a user in an age group used pain relievers in the past 12 months |
| oxycontin-use | Percentage of those in an age group who used OxyContin in the past 12 months |
| oxycontin-frequency | Median number of times a user in an age group used OxyContin in the past 12 months |
| tranquilizer-use | Percentage of those in an age group who used tranquilizers in the past 12 months |
| tranquilizer-frequency | Median number of times a user in an age group used tranquilizers in the past 12 months |
| stimulant-use | Percentage of those in an age group who used stimulants in the past 12 months |
| stimulant-frequency | Median number of times a user in an age group used stimulants in the past 12 months |
| meth-use | Percentage of those in an age group who used meth in the past 12 months |
| meth-frequency | Median number of times a user in an age group used meth in the past 12 months |
| sedative-use | Percentage of those in an age group who used sedatives in the past 12 months |
| sedative-frequency | Median number of times a user in an age group used sedatives in the past 12 months |
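Every header follows a `<drug>-use` / `<drug>-frequency` pattern. The drug name and the measure can therefore be recovered from a header with a regular expression. The sketch below uses `tidyr::extract()` on a few of the headers above. The greedy `(.*)` keeps inner hyphens such as the one in `pain-releiver`, which a plain split on "-" would break. The `headers` table is an illustrative assumption, not part of the dataset.

```r
library(tibble)
library(tidyr)

# A few column headers from the table above (illustrative subset)
headers <- tibble(header = c("alcohol-use", "crack-frequency",
                             "pain-releiver-use", "pain-releiver-frequency"))

# Greedy (.*) keeps inner hyphens in the drug name; the suffix becomes the measure
split_headers <- extract(headers, header, into = c("drug", "measure"),
                         regex = "^(.*)-(use|frequency)$")
print(split_headers)
```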
Install and load the tidyverse package, then load the dataset.
#install.packages("tidyverse")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.1.0 ✔ purrr 0.3.2
## ✔ tibble 2.1.1 ✔ dplyr 0.8.0.1
## ✔ tidyr 0.8.3 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Using readr to read csv
df_drug <- read_csv('https://raw.githubusercontent.com/fivethirtyeight/data/master/drug-use-by-age/drug-use-by-age.csv')
## Parsed with column specification:
## cols(
## .default = col_double(),
## age = col_character(),
## `cocaine-frequency` = col_character(),
## `crack-frequency` = col_character(),
## `heroin-frequency` = col_character(),
## `inhalant-frequency` = col_character(),
## `oxycontin-frequency` = col_character(),
## `meth-frequency` = col_character()
## )
## See spec(...) for full column specifications.
Inspect the data with dplyr::glimpse():
#View(df_drug)
dplyr::glimpse(df_drug)
## Observations: 17
## Variables: 28
## $ age <chr> "12", "13", "14", "15", "16", "17", "1…
## $ n <dbl> 2798, 2757, 2792, 2956, 3058, 3038, 24…
## $ `alcohol-use` <dbl> 3.9, 8.5, 18.1, 29.2, 40.1, 49.3, 58.7…
## $ `alcohol-frequency` <dbl> 3, 6, 5, 6, 10, 13, 24, 36, 48, 52, 52…
## $ `marijuana-use` <dbl> 1.1, 3.4, 8.7, 14.5, 22.5, 28.0, 33.7,…
## $ `marijuana-frequency` <dbl> 4, 15, 24, 25, 30, 36, 52, 60, 60, 52,…
## $ `cocaine-use` <dbl> 0.1, 0.1, 0.1, 0.5, 1.0, 2.0, 3.2, 4.1…
## $ `cocaine-frequency` <chr> "5.0", "1.0", "5.5", "4.0", "7.0", "5.…
## $ `crack-use` <dbl> 0.0, 0.0, 0.0, 0.1, 0.0, 0.1, 0.4, 0.5…
## $ `crack-frequency` <chr> "-", "3.0", "-", "9.5", "1.0", "21.0",…
## $ `heroin-use` <dbl> 0.1, 0.0, 0.1, 0.2, 0.1, 0.1, 0.4, 0.5…
## $ `heroin-frequency` <chr> "35.5", "-", "2.0", "1.0", "66.5", "64…
## $ `hallucinogen-use` <dbl> 0.2, 0.6, 1.6, 2.1, 3.4, 4.8, 7.0, 8.6…
## $ `hallucinogen-frequency` <dbl> 52, 6, 3, 4, 3, 3, 4, 3, 2, 4, 3, 2, 3…
## $ `inhalant-use` <dbl> 1.6, 2.5, 2.6, 2.5, 3.0, 2.0, 1.8, 1.4…
## $ `inhalant-frequency` <chr> "19.0", "12.0", "5.0", "5.5", "3.0", "…
## $ `pain-releiver-use` <dbl> 2.0, 2.4, 3.9, 5.5, 6.2, 8.5, 9.2, 9.4…
## $ `pain-releiver-frequency` <dbl> 36, 14, 12, 10, 7, 9, 12, 12, 10, 15, …
## $ `oxycontin-use` <dbl> 0.1, 0.1, 0.4, 0.8, 1.1, 1.4, 1.7, 1.5…
## $ `oxycontin-frequency` <chr> "24.5", "41.0", "4.5", "3.0", "4.0", "…
## $ `tranquilizer-use` <dbl> 0.2, 0.3, 0.9, 2.0, 2.4, 3.5, 4.9, 4.2…
## $ `tranquilizer-frequency` <dbl> 52.0, 25.5, 5.0, 4.5, 11.0, 7.0, 12.0,…
## $ `stimulant-use` <dbl> 0.2, 0.3, 0.8, 1.5, 1.8, 2.8, 3.0, 3.3…
## $ `stimulant-frequency` <dbl> 2.0, 4.0, 12.0, 6.0, 9.5, 9.0, 8.0, 6.…
## $ `meth-use` <dbl> 0.0, 0.1, 0.1, 0.3, 0.3, 0.6, 0.5, 0.4…
## $ `meth-frequency` <chr> "-", "5.0", "24.0", "10.5", "36.0", "4…
## $ `sedative-use` <dbl> 0.2, 0.1, 0.2, 0.4, 0.2, 0.5, 0.4, 0.3…
## $ `sedative-frequency` <dbl> 13.0, 19.0, 16.5, 30.0, 3.0, 6.5, 10.0…
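Several frequency columns, such as `crack-frequency` and `meth-frequency`, are read as `<chr>`. This is because missing values are coded as "-". The sketch below assumes "-" is the only non-numeric marker. It parses such values with readr's `parse_double()` and its `na` argument, using the first `crack-frequency` values shown by `glimpse()` above. Passing the same marker to `read_csv()` (for example `na = c("", "NA", "-")`) should make those columns numeric when the file is loaded.

```r
library(readr)

# First values of `crack-frequency` as shown by glimpse(); "-" marks a missing value
crack_freq_raw <- c("-", "3.0", "-", "9.5", "1.0", "21.0")

# Treat "-" as NA so the remaining strings parse cleanly as doubles
crack_freq <- parse_double(crack_freq_raw, na = "-")
print(crack_freq)
```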
Using the dplyr package, select age, n and the columns whose names end with "use":
drug_use <- df_drug %>%
  select(age, n, ends_with("use"))
Preview the drug-use columns:
head(drug_use)
## # A tibble: 6 x 15
## age n `alcohol-use` `marijuana-use` `cocaine-use` `crack-use`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 12 2798 3.9 1.1 0.1 0
## 2 13 2757 8.5 3.4 0.1 0
## 3 14 2792 18.1 8.7 0.1 0
## 4 15 2956 29.2 14.5 0.5 0.1
## 5 16 3058 40.1 22.5 1 0
## 6 17 3038 49.3 28 2 0.1
## # … with 9 more variables: `heroin-use` <dbl>, `hallucinogen-use` <dbl>,
## # `inhalant-use` <dbl>, `pain-releiver-use` <dbl>,
## # `oxycontin-use` <dbl>, `tranquilizer-use` <dbl>,
## # `stimulant-use` <dbl>, `meth-use` <dbl>, `sedative-use` <dbl>
Reshape drug_use into long format by gathering the use columns into a key column, drugUse_name, and a value column, drugUse:
#drug_use
# Every column except age and n is gathered into (drugUse_name, drugUse) pairs
drug_use <- drug_use %>%
  gather(key = "drugUse_name", value = "drugUse", -age, -n)
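The `gather()` call above turns the 13 `*-use` columns into key-value pairs, with one row per age group and drug. The sketch below runs it on a two-drug subset, using the first two rows of `head(drug_use)`, to show the reshaped shape. It also shows `pivot_longer()`, which supersedes `gather()` from tidyr 1.0.0 onward. `pivot_longer()` is not available in the tidyr 0.8.3 loaded above.

```r
library(tibble)
library(tidyr)

# Two age groups and two drugs, copied from head(drug_use)
wide <- tibble(
  age = c("12", "13"),
  n = c(2798, 2757),
  `alcohol-use` = c(3.9, 8.5),
  `marijuana-use` = c(1.1, 3.4)
)

# Every column except age and n becomes a (drugUse_name, drugUse) pair
long_gather <- gather(wide, key = "drugUse_name", value = "drugUse", -age, -n)

# tidyr >= 1.0.0 equivalent; gather() is superseded there
long_pivot <- pivot_longer(wide, cols = -c(age, n),
                           names_to = "drugUse_name", values_to = "drugUse")
print(long_gather)
print(long_pivot)
```

Both produce the same rows in a different order. `gather()` stacks one column at a time, while `pivot_longer()` keeps each original row's values together.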
Using the dplyr package, select age, n and the columns whose names end with "frequency":
drug_freq <- df_drug %>%
  select(age, n, ends_with("frequency"))
head(drug_freq)
## # A tibble: 6 x 15
## age n `alcohol-freque… `marijuana-freq… `cocaine-freque…
## <chr> <dbl> <dbl> <dbl> <chr>
## 1 12 2798 3 4 5.0
## 2 13 2757 6 15 1.0
## 3 14 2792 5 24 5.5
## 4 15 2956 6 25 4.0
## 5 16 3058 10 30 7.0
## 6 17 3038 13 36 5.0
## # … with 10 more variables: `crack-frequency` <chr>,
## # `heroin-frequency` <chr>, `hallucinogen-frequency` <dbl>,
## # `inhalant-frequency` <chr>, `pain-releiver-frequency` <dbl>,
## # `oxycontin-frequency` <chr>, `tranquilizer-frequency` <dbl>,
## # `stimulant-frequency` <dbl>, `meth-frequency` <chr>,
## # `sedative-frequency` <dbl>
Reshape drug_freq the same way, with drugFreq_name as the key column and drugFreq as the value column:
# Every column except age and n is gathered into (drugFreq_name, drugFreq) pairs
drug_freq <- drug_freq %>%
  gather(key = "drugFreq_name", value = "drugFreq", -age, -n)
Combine drug_use and drug_freq into a single data frame, tidy_drug_data, with dplyr's full_join(), joining on age and n:
tidy_drug_data <- full_join(drug_use, drug_freq, by = c("age", "n"))
head(tidy_drug_data)
## # A tibble: 6 x 6
## age n drugUse_name drugUse drugFreq_name drugFreq
## <chr> <dbl> <chr> <dbl> <chr> <chr>
## 1 12 2798 alcohol-use 3.9 alcohol-frequency 3
## 2 12 2798 alcohol-use 3.9 marijuana-frequency 4
## 3 12 2798 alcohol-use 3.9 cocaine-frequency 5.0
## 4 12 2798 alcohol-use 3.9 crack-frequency -
## 5 12 2798 alcohol-use 3.9 heroin-frequency 35.5
## 6 12 2798 alcohol-use 3.9 hallucinogen-frequency 52
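The join key is only age and n, so every use row for an age group is matched with every frequency row for that group. That gives 13 × 13 = 169 rows per age group, which is why `alcohol-use` is paired with `marijuana-frequency`, `cocaine-frequency` and so on above. One way to pair each drug's use with its own frequency is to derive a shared key with `stringr::str_remove()` and add it to `by`. This sketch assumes the column names always follow the `<drug>-use` / `<drug>-frequency` pattern. The key name `drug` and the two-drug tables are illustrative; their values come from the age-12 outputs above.

```r
library(tibble)
library(dplyr)
library(stringr)

# Long-format use and frequency rows for age 12 (values from the outputs above)
use_long <- tibble(age = "12", n = 2798,
                   drugUse_name = c("alcohol-use", "marijuana-use"),
                   drugUse = c(3.9, 1.1))
freq_long <- tibble(age = "12", n = 2798,
                    drugFreq_name = c("alcohol-frequency", "marijuana-frequency"),
                    drugFreq = c("3", "4"))

# Joining on age and n alone pairs every use row with every frequency row
cross <- full_join(use_long, freq_long, by = c("age", "n"))

# Strip the suffixes so both tables share a drug key, then join on it as well
paired <- full_join(
  mutate(use_long, drug = str_remove(drugUse_name, "-use$")),
  mutate(freq_long, drug = str_remove(drugFreq_name, "-frequency$")),
  by = c("age", "n", "drug")
)
print(paired)
```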
Use ggplot() along with facet_wrap() to plot how the use of each drug varies with age, one panel per drug:
drugUse_plot <- ggplot(tidy_drug_data, aes(x = age, y = drugUse, color = drugUse_name)) +
  geom_point() +
  facet_wrap(~ drugUse_name, nrow = 5) +
  # age is character (discrete), so group = 1 is needed to fit one smooth per panel
  geom_smooth(aes(group = 1), color = "black")
drugUse_plot
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Plotting each drug in its own panel gives a clear picture of which drugs are used more than others at particular ages.
n is the number of people surveyed in each age group. Mapping it to point size draws larger samples as larger points:
ggplot(data = tidy_drug_data,
       mapping = aes(x = age, y = drugUse)) +
  geom_point(aes(fill = drugUse_name, size = n), shape = 21, color = "white") +
  # age is character (discrete), so group = 1 is needed for a single smooth
  geom_smooth(aes(group = 1)) +
  labs(
    x = "Age",
    y = "Drug use rate (%)",
    title = "Drug use by age",
    subtitle = "Share of each age group that used each drug in the past 12 months",
    caption = "Source: FiveThirtyEight drug-use-by-age dataset") +
  scale_size(range = c(0, 12)) +
  guides(size = guide_legend(override.aes = list(col = "black")),
         fill = guide_legend(override.aes = list(size = 5))) +
  theme_bw()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Use dplyr's filter() to keep the rows where drug use is greater than 60%:
head(tidy_drug_data %>%
  filter(drugUse > 60))
## # A tibble: 6 x 6
## age n drugUse_name drugUse drugFreq_name drugFreq
## <chr> <dbl> <chr> <dbl> <chr> <chr>
## 1 19 2223 alcohol-use 64.6 alcohol-frequency 36
## 2 19 2223 alcohol-use 64.6 marijuana-frequency 60
## 3 19 2223 alcohol-use 64.6 cocaine-frequency 5.5
## 4 19 2223 alcohol-use 64.6 crack-frequency 2.0
## 5 19 2223 alcohol-use 64.6 heroin-frequency 180.0
## 6 19 2223 alcohol-use 64.6 hallucinogen-frequency 3
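The join was on age and n only, so each row that passes `filter()` is repeated once per frequency column. That is why the output above shows the same age-19 `alcohol-use` value of 64.6 six times. The sketch below keeps one row per age group and drug with `dplyr::distinct()` before the result is inspected. The `joined` table is illustrative and reproduces the first three rows of the output above.

```r
library(tibble)
library(dplyr)

# Rows as produced by the age-and-n join: one use value repeated per frequency column
joined <- tibble(
  age = "19", n = 2223,
  drugUse_name = "alcohol-use", drugUse = 64.6,
  drugFreq_name = c("alcohol-frequency", "marijuana-frequency", "cocaine-frequency"),
  drugFreq = c("36", "60", "5.5")
)

# Keep rows above 60%, then collapse duplicates to one row per age group and drug
high_use <- joined %>%
  filter(drugUse > 60) %>%
  distinct(age, drugUse_name, drugUse)
print(high_use)
```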
As the graphs and the filtered data show:
Alcohol is the most commonly used drug in the 22-23 age group.
More than 80% of people aged 22-23 used alcohol in the past 12 months.
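These conclusions can be checked directly by keeping, for each age group, the drug with the highest use rate. The sketch below does this with dplyr's `group_by()` and `filter()`. Its input holds only the age 12 to 14 values for three drugs, taken from `head(drug_use)`, so it illustrates the technique rather than reproducing the full result. Applied to the long-format `drug_use` table built earlier, the same pipeline should return the top drug for every age group.

```r
library(tibble)
library(dplyr)

# Long-format use rates for ages 12-14 (values from head(drug_use))
use_long <- tibble(
  age = rep(c("12", "13", "14"), times = 3),
  drugUse_name = rep(c("alcohol-use", "marijuana-use", "cocaine-use"), each = 3),
  drugUse = c(3.9, 8.5, 18.1, 1.1, 3.4, 8.7, 0.1, 0.1, 0.1)
)

# For each age group, keep the drug(s) with the highest use rate
top_drug <- use_long %>%
  group_by(age) %>%
  filter(drugUse == max(drugUse)) %>%
  ungroup() %>%
  arrange(age)
print(top_drug)
```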