library(tidyverse)
library(reshape)
For this assignment I used the dataset behind the article “How Baby Boomers get high”. The article link is here: link. The article analyzes drug usage (including alcohol) by age group. It focuses on baby boomers, defined in the article as people aged 50-64. The general finding was that boomers use drugs at lower rates than younger generations, but at higher rates than their parents did.
Just for fun, in this exercise I will compare my age group (50-64) with my daughter’s age group (21 years old). Let’s see what I find.
Load the data directly from the site. In this case we will use the drug-use-by-age data set.
drug_use <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/drug-use-by-age/drug-use-by-age.csv")
## Rows: 17 Columns: 28
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (7): age, cocaine-frequency, crack-frequency, heroin-frequency, inhalan...
## dbl (21): n, alcohol-use, alcohol-frequency, marijuana-use, marijuana-freque...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
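The column specification above reports seven character columns, all of them frequency measures. One plausible cause, and it is only an assumption here because the raw file is not shown, is that missing values are coded with a placeholder such as "-". The sketch below uses a tiny inline sample (the heroin-frequency values are invented for illustration) to show how the `na` argument of `read_csv()` changes the inferred type. The report itself drops the frequency columns, so its results do not depend on this.

```r
library(readr)

# Hypothetical two-row sample; the heroin-frequency values are invented.
sample_csv <- I("age,alcohol-use,heroin-frequency\n21,83.2,30.0\n50-64,67.2,-\n")

# Default parsing: the "-" placeholder forces heroin-frequency to character.
raw <- read_csv(sample_csv, show_col_types = FALSE)

# Treating "-" as missing lets readr infer a numeric column instead.
clean <- read_csv(sample_csv, na = c("", "NA", "-"), show_col_types = FALSE)

class(raw$`heroin-frequency`)
class(clean$`heroin-frequency`)
```

Setting `show_col_types = FALSE` also silences the column-specification message, as the message itself suggests.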
Now we will remove the columns we don’t need. We will focus on the usage numbers, so the frequency columns go.
drug_use <- drug_use %>% select(-contains("frequency"))
Now we will keep only the two age groups of interest: mine (50-64) and my daughter’s (21).
drug_use <- drug_use %>% filter(age == "50-64" | age == "21")
Let’s also remove the n column, which we don’t need.
drug_use <- drug_use %>% select(-c(n))
Let’s transpose for ease of viewing and save the result to another data frame.
drug_use2 <- as_tibble(cbind(drug = names(drug_use), t(drug_use)))
## Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
## Using compatibility `.name_repair`.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
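The warning comes from converting an unnamed matrix to a tibble. As a possible alternative, not the approach used in this report, the same reshaping can be sketched with `pivot_longer()` and `pivot_wider()`, which keeps the values numeric and names the columns explicitly. The small `drug_use_small` tibble is a stand-in built from two drugs in the report's own results, and the `Group_` prefix is an illustrative choice.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Two-column stand-in for the filtered data (values taken from the report's results).
drug_use_small <- tibble(
  age = c("21", "50-64"),
  `alcohol-use` = c(83.2, 67.2),
  `marijuana-use` = c(33, 7.3)
)

drug_use_t <- drug_use_small %>%
  # one row per age/drug pair
  pivot_longer(-age, names_to = "drug", values_to = "pct") %>%
  # one column per age group, one row per drug
  pivot_wider(names_from = age, values_from = pct, names_prefix = "Group_")

drug_use_t
```

With this route there is no `age` row to filter out afterwards and no `as.numeric()` conversion, because the values never pass through a character matrix.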
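Remove the first row (age).
drug_use2 <- filter(drug_use2, drug != 'age')
Convert to numeric. The transpose turned every value into text, so the two value columns (V2 for age 21, V3 for ages 50-64) need to be converted back.
drug_use2$V2 <- as.numeric(drug_use2$V2)
drug_use2$V3 <- as.numeric(drug_use2$V3)
Let’s sort by the 21-year-old column, in descending order, and give the columns readable names.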
drug_use2 <- arrange(drug_use2, desc(V2)) %>%
  dplyr::rename(Group_21_YO = V2, Group_50_64_YO = V3)
drug_use2
## # A tibble: 13 x 3
##    drug              Group_21_YO Group_50_64_YO
##    <chr>                   <dbl>          <dbl>
##  1 alcohol-use              83.2           67.2
##  2 marijuana-use            33              7.3
##  3 pain-releiver-use         9              2.5
##  4 hallucinogen-use          6.3            0.3
##  5 cocaine-use               4.8            0.9
##  6 stimulant-use             4.1            0.3
##  7 tranquilizer-use          3.9            1.4
##  8 inhalant-use              1.4            0.2
##  9 oxycontin-use             1.3            0.4
## 10 heroin-use                0.6            0.1
## 11 meth-use                  0.6            0.2
## 12 crack-use                 0.5            0.4
## 13 sedative-use              0.3            0.2
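To put a number on the gap between the two groups, a short sketch can add the difference in percentage points and the ratio between the columns. The first five rows of the table above are copied in by hand so the example runs on its own; `top5`, `diff_pts` and `ratio` are illustrative names, not part of the original analysis.

```r
library(dplyr)
library(tibble)

# First five rows of the table above, copied by hand.
top5 <- tribble(
  ~drug,               ~Group_21_YO, ~Group_50_64_YO,
  "alcohol-use",       83.2,         67.2,
  "marijuana-use",     33,           7.3,
  "pain-releiver-use", 9,            2.5,
  "hallucinogen-use",  6.3,          0.3,
  "cocaine-use",       4.8,          0.9
)

gaps <- top5 %>%
  mutate(
    diff_pts = Group_21_YO - Group_50_64_YO,  # gap in percentage points
    ratio    = Group_21_YO / Group_50_64_YO   # how many times higher the 21 group is
  ) %>%
  arrange(desc(ratio))

gaps
```

On these rows the ratio is smallest for alcohol (about 1.2) and largest for hallucinogens (about 21), so the relative gap is widest for the less common drugs.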
Graph them, but first convert from wide format to long format.
data_long <- gather(drug_use2, age_group, drug_use_pct, Group_21_YO:Group_50_64_YO, factor_key=TRUE)
ggplot(data_long,                                   # the data that I am using
       aes(x = drug,                                # 'aesthetic' includes x
           y = drug_use_pct, fill = age_group)) +   # and y, with fill by age group
  geom_bar(position = "dodge", stat = "identity") + # use ACTUAL y for bar height
  coord_flip()
I would recommend repeating the study, since it has been 10 years since it came out. One of its findings was that baby boomers were consuming more drugs than their parents did. After 10 years we could repeat it and see whether the new cohort of 50-64 year olds has continued that upward trend or not.
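For the reshaping and plotting step, a variant can be sketched with `pivot_longer()`, which tidyr documents as the replacement for the superseded `gather()`, and with `geom_col()`, which is equivalent to `geom_bar(stat = "identity")`. The three-row `usage` table is copied from the results above so the example runs on its own, and ordering the bars with `reorder()` is an added suggestion rather than something the original plot does.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Three rows copied from the results table, enough to show the idea.
usage <- tribble(
  ~drug,           ~Group_21_YO, ~Group_50_64_YO,
  "alcohol-use",   83.2,         67.2,
  "marijuana-use", 33,           7.3,
  "cocaine-use",   4.8,          0.9
)

# Wide to long: one row per drug/age-group pair.
usage_long <- usage %>%
  pivot_longer(
    cols = starts_with("Group_"),
    names_to = "age_group",
    values_to = "drug_use_pct"
  )

# Dodged horizontal bars, with drugs ordered by their usage values.
p <- ggplot(usage_long,
            aes(x = reorder(drug, drug_use_pct),
                y = drug_use_pct,
                fill = age_group)) +
  geom_col(position = "dodge") +
  coord_flip() +
  labs(x = "drug", y = "use (%)")

p
```

Ordering the bars by value makes the large gap in alcohol and marijuana use easier to read against the much smaller bars for the other drugs.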
It was evident that my age group uses drugs far less than 21-year-olds do. In fact, apart from alcohol and marijuana, my age group shows very little use of other drugs compared with 21-year-olds. The article mentioned that for younger people drug use is more about getting high, whereas older people who use tend to do so to cope with stress and other underlying issues.
…