## 1 Apply tidyverse tools (dplyr, ggplot2) to manipulate and visualize real world public data As described in the README.md file, I am interested in substance use disorders and how they differ by gender and other factors such as socioeconomic status in the community. I sought to apply dplyr and ggplot2 tools to data I downloaded from Nutritional Health and Nutrition Examination Survey (NHANES, https://wwwn.cdc.gov/nchs/nhanes/).The income:family size ratio was used as a marker of degree of poverty (higher numbers = less poverty) and the substances considered in this exploratory project include marijuana, cocaine, heroin, and methamphetamine. There are 4 cohorts from different time points (year of survey participation) included: 2009-2010, 2011-2012, 2013-2014, 2015-2016. DATA IS NOT WEIGHTED BY DEMOGRAPHICS AND THEREFORE MAY NOT ACCURATELY REFLECT US CIVILIAN POPULATION (see https://wwwn.cdc.gov/nchs/data/nhanes/analyticguidelines/11-16-analytic-guidelines.pdf). Descriptions interpreting data are directly below plots.

## 2 Packages Required

#for data wrangling; dplyr::select, mutate, select, recode
library(dplyr) 
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
#for data visualization with various plots
library(ggplot2)

# 3 Dataset For the current project, I focused on adults 18+ years, from cross-sectional cohorts assessed at 4 different time points (2009-2010, 2011-2012, 2013-2014, 2015-2016). Demographic (“Demographic Variables and Sample Weights” data file) and questionnaire (“Drug Use” data file) data. These data sets were merged in Power Query (inner join), saved as a csv file and imported into RStudio.

#import dataset csv

library(readr)
use_this_July26_2023 <- read_csv("use_this_July26.2023.csv", 
    col_types = cols(Year = col_factor(levels = c("2009_2010", 
        "2011_2012", "2013_2014", "2015_2016")), 
        RIAGENDR = col_factor(levels = c("1", 
            "2")), RIDAGEYR = col_number(), 
        INDFMPIR = col_number(), nhanes_drug_use_csv.DUQ200 = col_factor(levels = c("1", 
            "2")), nhanes_drug_use_csv.DUQ210 = col_number(), 
        nhanes_drug_use_csv.DUQ213 = col_number(), 
        nhanes_drug_use_csv.DUQ230 = col_number(), 
        nhanes_drug_use_csv.DUQ240 = col_factor(levels = c("1", 
            "2")), nhanes_drug_use_csv.DUQ260 = col_number(), 
        nhanes_drug_use_csv.DUQ280 = col_number(), 
        nhanes_drug_use_csv.DUQ300 = col_number(), 
        nhanes_drug_use_csv.DUQ320 = col_number(), 
        nhanes_drug_use_csv.DUQ340 = col_number(), 
        nhanes_drug_use_csv.DUQ360 = col_number()), 
    na = "NA")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
dataset<-use_this_July26_2023

dataset
## # A tibble: 19,997 × 52
##     SEQN Year    RIAGE…¹ RIDAG…² RIDRE…³ DMDED…⁴ WTINT…⁵ WTMEC…⁶ SDMVPSU SDMVS…⁷
##    <dbl> <fct>   <fct>     <dbl>   <dbl>   <dbl> <lgl>   <lgl>     <dbl>   <dbl>
##  1 52298 2009_2… 1            18      NA      NA NA      NA            1      78
##  2 53441 2009_2… 2            18      NA      NA NA      NA            2      88
##  3 54851 2009_2… 1            18      NA      NA NA      NA            1      81
##  4 55394 2009_2… 1            18      NA      NA NA      NA            2      78
##  5 55786 2009_2… 2            18      NA      NA NA      NA            2      78
##  6 56181 2009_2… 2            18      NA      NA NA      NA            2      75
##  7 56446 2009_2… 2            18      NA      NA NA      NA            2      85
##  8 56604 2009_2… 2            18      NA      NA NA      NA            1      76
##  9 57619 2009_2… 2            18      NA      NA NA      NA            2      77
## 10 59518 2009_2… 1            18      NA      NA NA      NA            2      82
## # … with 19,987 more rows, 42 more variables: INDFMPIR <dbl>, RIDRETH1 <dbl>,
## #   DMDEDUC3 <dbl>, WTINT2YR <dbl>, WTMEC2YR <dbl>,
## #   NHANES_physical_csv.BMXWT <dbl>, NHANES_physical_csv.BMXHT <dbl>,
## #   NHANES_physical_csv.BMXBMI <dbl>, NHANES_physical_csv.BMXWAIST <dbl>,
## #   NHANES_physical_csv.BMXTRI <dbl>, nhanes_drug_use_csv.DUQ200 <fct>,
## #   nhanes_drug_use_csv.DUQ210 <dbl>, nhanes_drug_use_csv.DUQ211 <dbl>,
## #   nhanes_drug_use_csv.DUQ213 <dbl>, nhanes_drug_use_csv.DUQ215Q <dbl>, …
#name column variables for ease of use and understanding
dataset$Year -> year
dataset$RIAGENDR -> gender
dataset$RIDAGEYR -> age_years
dataset$INDFMPIR -> ratio_income_to_poverty_family
dataset$nhanes_drug_use_csv.DUQ200 -> mj_hashish_ever
dataset$nhanes_drug_use_csv.DUQ210 -> age_1st_mj_years
dataset$nhanes_drug_use_csv.DUQ213 -> age_start_reg_use_mj_years
dataset$nhanes_drug_use_csv.DUQ230 -> number_of_days_used_mj_in_last_month
dataset$nhanes_drug_use_csv.DUQ240 -> coc_her_meth_ever
dataset$nhanes_drug_use_csv.DUQ250 -> coc_ever
dataset$nhanes_drug_use_csv.DUQ260 -> age_1st_try_coc_years
dataset$nhanes_drug_use_csv.DUQ280 -> number_of_days_used_coc_in_last_month
dataset$nhanes_drug_use_csv.DUQ290 -> heroin_ever
dataset$nhanes_drug_use_csv.DUQ300 -> age_1st_her_years
dataset$nhanes_drug_use_csv.DUQ320 -> number_of_days_used_her_in_last_month
dataset$nhanes_drug_use_csv.DUQ330 -> meth_ever
dataset$nhanes_drug_use_csv.DUQ340 -> age_1st_meth_years
dataset$nhanes_drug_use_csv.DUQ360 -> number_of_days_used_meth_in_last_month
dataset$nhanes_drug_use_csv.DUQ370 -> needle_use_ever
#Housekeeping

#change factor levels to labels where appropriate

levels(dataset$RIAGENDR) <- c('male', 'female')
levels(gender) <- c('male', 'female')

# 4 Descriptive statistics and visualizations (NOT WEIGHTED BY DEMOGRAPHICS) for each drug of interest_____________________

#Marijuana________________________________________________________________________________________

##Descriptive statistics and boxplots

###Age first tried marijuana/hashish by gender (self-report, n = 19997-12583 = 7,414 participants)

dataset_by_gender <- dataset %>%
  group_by(RIAGENDR) 

dataset_by_gender %>%
  summarize(mean_age_1st_mj_use = mean(nhanes_drug_use_csv.DUQ210, na.rm = TRUE),     sd_age_1st_mj = sd(nhanes_drug_use_csv.DUQ210, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_age_1st_mj_use sd_age_1st_mj
##   <fct>                  <dbl>         <dbl>
## 1 male                    16.8          4.10
## 2 female                  17.4          4.48
ggplot(dataset, aes(x = gender, y = age_1st_mj_years, color = gender)) + 
          geom_boxplot(na.rm = TRUE)+ 
        facet_wrap(year) +
        scale_color_brewer(palette = "Set3")

Description of above: Males and females both self-reported that they first tried marijuana/hashish ~age 17.However, many people in the sample reported first trying marijuana/hashish at older ages (up to ~age 50). This pattern was consistent across time points.

###Age first regularly used marijuana/hashish by gender (self-report, n = 19997-16357 = 3640 participants)

dataset_by_gender %>%
  summarize(mean_age_reg_mj_use = mean(nhanes_drug_use_csv.DUQ213, na.rm = TRUE),     sd_age_reg_mj_use = sd(nhanes_drug_use_csv.DUQ213, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_age_reg_mj_use sd_age_reg_mj_use
##   <fct>                  <dbl>             <dbl>
## 1 male                    17.3              4.05
## 2 female                  18.4              5.49
ggplot(dataset, aes(x = gender, y = age_start_reg_use_mj_years, color = gender)) + 
  geom_boxplot(na.rm = TRUE) + 
  facet_wrap(year) + 
  scale_color_brewer(palette = "Set3")

Description of above: Males and females self-reported becoming regular users of marijuana/hashish ~17 or 18 years, soon after trying it for the first time. This pattern was consistent across time points.

###Number of days used marijuana/hashish in last month by gender (self-report, n = 19997-17836 = 2161 participants)

dataset_by_gender %>%
  summarize(mean_days_used_mj_in_last_mon = mean(nhanes_drug_use_csv.DUQ230, na.rm = TRUE),     sd_days_used_mj_in_last_mon = sd(nhanes_drug_use_csv.DUQ230, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_days_used_mj_in_last_mon sd_days_used_mj_in_last_mon
##   <fct>                            <dbl>                       <dbl>
## 1 male                              13.2                        11.4
## 2 female                            11.3                        11.4
ggplot(dataset, aes(x = gender, y = number_of_days_used_mj_in_last_month, color = gender)) + 
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: Males self-reported using marijuana/hashish slightly more past days (~13) during the last month than females (~11).This pattern was consistent across time points.

##Bar plots

###Ever tried marijuana/hashish? (self-report, yes or no, n = 14183 participants)

percent_mj_use_by_gender <- dataset %>%
  filter(complete.cases(nhanes_drug_use_csv.DUQ200)) %>%
  count(RIAGENDR, nhanes_drug_use_csv.DUQ200)%>%
  group_by(RIAGENDR)%>%
  transmute(nhanes_drug_use_csv.DUQ200, Percentage=n/sum(n)*100)

percent_mj_use_ever <- percent_mj_use_by_gender %>%
  filter(nhanes_drug_use_csv.DUQ200 == 1)

percent_mj_use_ever$RIAGENDR -> gender
percent_mj_use_ever$Percentage -> percentage_yes

ggplot(percent_mj_use_ever, aes(x = gender, y = percentage_yes, fill = gender)) +
  geom_col(na.rm = TRUE) +
  scale_fill_brewer(palette = "Set3")

Description of above: More than half of males reported ever trying marijuana/hashish, while slightly less than half of females reported ever trying marijuana/hashish.

##Histograms

###Age first tried marijuana/hashish? (self-report, n = 7414 participants)


ggplot(dataset, aes(x = age_1st_mj_years)) +
  geom_histogram(bins = 30,na.rm = TRUE, color = "darkblue", fill = "lightblue") + 
  facet_wrap(year) 

Description of above: Consistent with the box plots, most adults reported first trying marijuana/hashish in late teens/early twenties. However, the histogram is positively skewed, showing that some individuals were older when they first tried marijuana/hashish.These patterns are consistent across time points.

###Age first started regularly using marijuana/hashish? (self-report, n = 3640 participants)

ggplot(dataset, aes(x = age_start_reg_use_mj_years)) +
  geom_histogram(bins = 30,na.rm = TRUE, color = "darkblue", fill = "lightblue") + 
  facet_wrap(year)

Description of above: Similar to age at first use, most adults reported first becoming regular users of marijuana/hashish in late teens/early twenties. However, the histogram is positively skewed, showing that some individuals were older when they first became regular users.These patterns are consistent across time points.

##Number of days used marijuana/hashish in past month? (self-report, n = 2161 participants) 

ggplot(dataset, aes(x = number_of_days_used_mj_in_last_month)) +
  geom_histogram(bins = 30,na.rm = TRUE, color = "darkblue", fill = "lightblue") + 
  facet_wrap(year)

Description of above: Interestingly, current users of marijuana/hashsish reported a wide range of use during the past month, with many reporting one day of use and many reporting using every day. This pattern is consistent over each time point.

##Further exploration with scatterplots: income:family size ratio (higher is less poverty) and gender

dataset$RIAGENDR -> gender

ggplot(dataset,aes(x = ratio_income_to_poverty_family, y = age_1st_mj_years, color = gender)) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(year) + 
  scale_color_brewer(palette = "Set3")

Description of above: Regardless of income:family size ratio (where higher values = less poverty), men and women tended to try marijuana/hashish at late teens/early twenties. This pattern was consistent over time points.

ggplot(dataset, aes(x = ratio_income_to_poverty_family, y = number_of_days_used_mj_in_last_month, color = gender) ) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: Unlike first age at trying marijuana/hashish, current self-reported lower income:family size ratio (where higher values = less poverty) tended to relate to a higher number of days of use over the past month in men and women. This pattern was consistent over time points. \(~\) #Cocaine__________________________________________________________________________________________

##Descriptive statistics and boxplots

###Age first tried cocaine by gender (self-report, n = 19997-17698 = 2299 participants)

dataset_by_gender <- dataset %>%
  group_by(RIAGENDR) 

dataset_by_gender %>%
  summarize(mean_age_1st_cocaine_use = mean(nhanes_drug_use_csv.DUQ260, na.rm = TRUE),     sd_age_1st_mj = sd(nhanes_drug_use_csv.DUQ260, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_age_1st_cocaine_use sd_age_1st_mj
##   <fct>                       <dbl>         <dbl>
## 1 male                         21.5          5.53
## 2 female                       21.7          6.00
ggplot(dataset, aes(x = gender, y = age_1st_try_coc_years, color = gender)) + 
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: Both men and women reported being a few years older (~22) when first trying cocaine as compared to first trying marijuana/hashish. This pattern was consistent over time points.

###Number of days used cocaine in last month by gender (self-report, n = 19997-19760 = 237 participants)

dataset_by_gender %>%
  summarize(mean_days_used_cocaine_in_last_mon = mean(nhanes_drug_use_csv.DUQ280, na.rm = TRUE),sd_days_used_mj_in_last_mon = sd(nhanes_drug_use_csv.DUQ280, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_days_used_cocaine_in_last_mon sd_days_used_mj_in_last_mon
##   <fct>                                 <dbl>                       <dbl>
## 1 male                                   3.99                        5.60
## 2 female                                 4.95                        6.08
ggplot(dataset, aes(x = gender, y = number_of_days_used_coc_in_last_month, color = gender)) + 
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: Unlike marijuana/hashish, men and women self-reported using cocaine consistently fewer days (~4-5) during the past month. Some individuals did report using cocaine more often during the past month, especially at the first time point (2009-2010).

##Bar plots

###Ever tried cocaine, heroin and/or methamphetamines? (self-report, yes or no)

percent_polydrug_use_by_gender <- dataset %>%
  filter(complete.cases(nhanes_drug_use_csv.DUQ240)) %>%
  count(RIAGENDR, nhanes_drug_use_csv.DUQ240)%>%
  group_by(RIAGENDR)%>%
  transmute(nhanes_drug_use_csv.DUQ240, Percentage=n/sum(n)*100)

percent_polydrug_use_ever <- percent_polydrug_use_by_gender %>%
  filter(nhanes_drug_use_csv.DUQ240 == 1)

percent_polydrug_use_ever$RIAGENDR -> gender
percent_polydrug_use_ever$Percentage -> percentage_yes

ggplot(percent_polydrug_use_ever, aes(x = gender, y = percentage_yes, fill = gender)) +
  geom_col() + 
  scale_color_brewer(palette = "Set3")

Description of above: Self-reported use of ever trying cocaine, heroin or methamphetamine was low (~12-20%) compared to marijuana/hashish. Men reported ever trying cocaine at a higher percentage (~22%) compared to women (~12%).

##Histograms

###Age first tried cocaine? (self-report, n =  19997 - 17698 = 2299 participants)

ggplot(dataset, aes(x = age_1st_try_coc_years)) +
  geom_histogram(bins = 30, na.rm = TRUE,color = "darkblue", fill = "lightblue" )+
  facet_wrap(year) 

Description of above: As the boxplots showed, people tended to report first trying cocaine in their early twenties. Similar to marijuana/hashish, the distribution is positively skewed, indicating some adults reported first trying cocaine at older ages. These patters were similar across time points.

##Number of days used cocaine in past month? (self-report, n = 19997 - 19760 = 237 participants) 

ggplot(dataset, aes(x = number_of_days_used_coc_in_last_month)) +
  geom_histogram(bins = 30, na.rm = TRUE, color = "darkblue", fill = "lightblue") +
  facet_wrap(year)

Description of above: As the boxplots showed, cocaine use was reported as used less than 10 days during the past month by most participants. This was consistent across time points.

##Further exploration with scatterplots: income:family size ratio (higher is less poverty) and gender  

dataset$RIAGENDR -> gender #re-run this line, otherwise error results: "problem while computing aesthetics"

ggplot(dataset, aes(x = ratio_income_to_poverty_family, y = age_1st_try_coc_years, color = gender)) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(year) + 
  scale_color_brewer(palette = "Set3")

Description of above: Unlike marijuana/hashish, men and women who reported first trying cocaine during their younger years tended to have lower current income:family size ratio. However, the pattern is subtle and it should be noted that this a small sample size. No statistics have been performed to test whether this relationship is statistically significant. The pattern tends to be similar across time points.

ggplot(dataset, aes(x = ratio_income_to_poverty_family, y = number_of_days_used_coc_in_last_month, color = gender)) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: Similar to marijuana/hashish, there appears to be a relationship between greater reported number of days that cocaine was used and lower current income:family size ratio. Note this is a small sample and no statistical analysis has been done. The pattern is similar across time points.

#Heroin________________________________________________________________________________________________________________________________

##Descriptive statistics and boxplots

###Age first tried heroin by gender (self-report, n = 19997-19669 = 328 participants)

dataset_by_gender <- dataset %>%
  group_by(RIAGENDR) 

dataset_by_gender %>%
  summarize(mean_age_1st_her_use = mean(nhanes_drug_use_csv.DUQ300, na.rm = TRUE),     sd_age_1st_her = sd(nhanes_drug_use_csv.DUQ300, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_age_1st_her_use sd_age_1st_her
##   <fct>                   <dbl>          <dbl>
## 1 male                     24.3           7.78
## 2 female                   24.6           8.59
ggplot(dataset, aes(x = gender, y = age_1st_her_years, color = gender)) + 
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: Reported age at first trying heroin (~24-25) is older compared to ages at first trying cocaine or marijuana/hashish.This pattern is especially true for 2011-2012 and 2013-2014 cohorts.

###Number of days used heroin in last month by gender (self-report, n = 19997-19760 = 237 participants)

dataset_by_gender %>%
  summarize(mean_days_used_heroin_in_last_mon = mean(nhanes_drug_use_csv.DUQ320, na.rm = TRUE),     sd_days_used_her_in_last_mon = sd(nhanes_drug_use_csv.DUQ320, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_days_used_heroin_in_last_mon sd_days_used_her_in_last_mon
##   <fct>                                <dbl>                        <dbl>
## 1 male                                 12.9                          11.8
## 2 female                                7.25                         11.3
ggplot(dataset, aes(x = gender, y = number_of_days_used_her_in_last_month, color = gender)) + 
  geom_boxplot(na.rm = TRUE) + 
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: Similar to reported cocaine use, heroin was reported to be used less frequently during the past month compared to marijuana/hashish. Although not seen in the 2009-2010 cohort, females tended to report less frequent use (~7 days) compared to males (~13 days).

##Histograms

###Age first tried heroin? (self-report, n =  19997 - 19669 = 328 participants)

ggplot(dataset, aes(x = age_1st_her_years)) +
  geom_histogram(bins = 30, na.rm = TRUE,color = "darkblue", fill = "lightblue" ) +
  facet_wrap(year)

Description of above: Consistent with the boxplots and similar to reported first age of trying cocaine, many participants reported first trying heroin in their early twenties. However, some individuals reported first trying heroin at older ages, also similar to marijuana and cocaine. This pattern was similar across time points.

##Number of days used heroin in past month? (self-report, n = 19997 - 19959 = 38 participants) 

ggplot(dataset, aes(x = number_of_days_used_her_in_last_month)) +
  geom_histogram(bins = 30, na.rm = TRUE,color = "darkblue", fill = "lightblue") +
  facet_wrap(year)

Description of above: In this small sample, similar to marijuana, participants tended to report using heroin very few days or almost daily during the past month. This pattern was similar across all time points.

##Further exploration with scatterplots: income:family size ratio (higher is less poverty) and gender

ggplot(dataset, aes(x = ratio_income_to_poverty_family, y = age_1st_her_years, color = gender)) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(year) + 
  scale_color_brewer(palette = "Set3")

Description of above: There appears to be a relationship between younger reported age at first trying heroin and lower current income:family size ratio. Note this is a small sample and no statistical analysis has been done. The pattern is similar across time points.

ggplot(dataset, aes(x = ratio_income_to_poverty_family, y = number_of_days_used_her_in_last_month, color = gender)) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: This is a very small sample and is composed mostly of males. There may or may not be a relationship between greater reported number of days that heroin was used and lower current income:family size ratio. Note this is a small sample and no statistical analysis has been done. The pattern is similar across time points except for 2011-2012. \(~\) #Methamphetamines______________________________________________________________________________________________________________________

##Descriptive statistics and boxplots

###Age first tried heroin by gender (self-report, n = 19997-19090 = 907 participants)

dataset_by_gender <- dataset %>%
  group_by(RIAGENDR) 

dataset_by_gender %>%
  summarize(mean_age_1st_meth_use = mean(nhanes_drug_use_csv.DUQ340, na.rm = TRUE),     sd_age_1st_meth = sd(nhanes_drug_use_csv.DUQ340, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_age_1st_meth_use sd_age_1st_meth
##   <fct>                    <dbl>           <dbl>
## 1 male                      22.3            6.90
## 2 female                    21.9            6.95
ggplot(dataset, aes(x = gender, y = age_1st_meth_years, fill = gender)) + 
  geom_boxplot(na.rm = TRUE) +
  facet_wrap(year) +
    scale_fill_brewer(palette = "Set3")

Description of above: Reported age at first trying methamphetamine (~22) is similar to ages at first trying cocaine or marijuana/hashish. This pattern is true in men and women across time points.

###Number of days used heroin in last month by gender (self-report, n = 19997-19915 = 82 participants)

dataset_by_gender %>%
  summarize(mean_days_used_meth_in_last_mon = mean(nhanes_drug_use_csv.DUQ360, na.rm = TRUE),     sd_days_used_meth_in_last_mon = sd(nhanes_drug_use_csv.DUQ360, na.rm = TRUE))
## # A tibble: 2 × 3
##   RIAGENDR mean_days_used_meth_in_last_mon sd_days_used_meth_in_last_mon
##   <fct>                              <dbl>                         <dbl>
## 1 male                                6.22                          7.61
## 2 female                              6.83                          7.23
ggplot(dataset, aes(x = gender, y = number_of_days_used_meth_in_last_month, fill = gender)) + 
  geom_boxplot(na.rm = TRUE) + 
  facet_wrap(year) +
    scale_fill_brewer(palette = "Set3")

Description of above: Similar to reported cocaine and heroin use, methamphetamine was reported to be used less frequently (~6 days) during the past month compared to marijuana/hashish. This pattern was consistent across gender and time points.

##Histograms

###Age first tried heroin? (self-report, n =  907 participants)

ggplot(dataset, aes(x = age_1st_meth_years)) +
  geom_histogram(bins = 30, na.rm = TRUE, color = "darkblue", fill = "lightblue") +
  facet_wrap(year) 

Description of above: Consistent with the boxplots and similar to reported first age of trying marijuana, many participants reported first trying methamphetamine in their early twenties or late teens. However, some individuals reported first trying methamphetamine at older ages, also similar to marijuana, cocaine and heroin. This pattern was similar across time points.

##Number of days used heroin in past month? (self-report, n = 19997 - 19959 = 82 participants) 

ggplot(dataset, aes(x = number_of_days_used_meth_in_last_month)) +
  geom_histogram(bins = 30, na.rm = TRUE, color = "darkblue", fill = "lightblue") +
  facet_wrap(year)

Description of above: In this small sample, similar to cocaine, participants tended to report using methamphetamine <10 days during the past month. This pattern was similar across all time points.

##Further exploration with scatterplots: income:family size ratio (higher is less poverty) and gender

ggplot(dataset, aes(x = ratio_income_to_poverty_family, y = age_1st_meth_years, color = gender)) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(year) +
  scale_color_brewer(palette = "Set3")

Description of above: In men and women, there appears to be a relationship between younger reported age at first trying methamphetamine and lower current income:family size ratio, similar to cocaine and heroin.The pattern is similar across time points.

ggplot(dataset, aes(x = ratio_income_to_poverty_family, y = number_of_days_used_meth_in_last_month, color = gender)) + 
  geom_point(na.rm = TRUE) +
  facet_wrap(year)+
  scale_color_brewer(palette = "Set3")

Description of above: This is a very small sample and is composed mostly of males. There may or may not be a relationship between lower current income:family size ratio and greater number of reported days using methamphetamine (or currently using amphetamine at all) during the past month. Note this is a small sample and no statistical analysis has been done. The pattern is similar across time points.