Drug Use by Age

Illicit drug use has increased over the last decade. Because these drugs are easy to obtain, many surveys have examined the issue. The fivethirtyeight.com article "How Baby Boomers Get High" analyzed the trends between age and the use of specific drugs, drawing its dataset from the National Survey on Drug Use and Health, 2012.

The drugs dataset contains one row per age group, with the percentage of each group that used a given drug within the last year, along with a frequency measure for each drug.

drugs<-read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/drug-use-by-age/drug-use-by-age.csv", na.strings=c("NA", "NULL"))

However, I noticed that alcohol use may be correlated with the use of other drugs across the different age groups.

This survey was collected and prepared for release by Research Triangle Institute, Research Triangle Park, North Carolina. According to the Substance Abuse & Mental Health Data Archive, the National Survey on Drug Use and Health (NSDUH) used audio computer-assisted self-interviewing (ACASI) to administer the survey to participants.

## [1] "There were 55268 participants in this study."
# Fixed typos here!
names(drugs)[names(drugs) == "pain.releiver.use"] <- "pain.reliever.use"
names(drugs)[names(drugs) == "pain.releiver.frequency"] <- "pain.reliever.frequency"
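
Before modelling, it may be worth checking the column types. In the ANOVA output further down, several predictors carry many degrees of freedom (16 for heroin.frequency, for example), which suggests those columns were read as text rather than as numbers. The sketch below shows one way to coerce such columns. It assumes the non-numeric entries are a "-" placeholder, and it uses a tiny invented data frame named toy rather than the real data.

```r
# Hypothetical miniature of the data; every value here is invented for illustration.
toy <- data.frame(
  age = c("12", "13", "14"),
  alcohol.frequency = c(3, 6, 5),
  heroin.frequency = c("35.5", "-", "2.0"),
  stringsAsFactors = FALSE
)

# Find the frequency columns and coerce each one to numeric,
# turning the assumed "-" placeholder into NA first.
freq_cols <- grep("\\.frequency$", names(toy), value = TRUE)
toy[freq_cols] <- lapply(toy[freq_cols], function(x) {
  x <- as.character(x)
  x[x == "-"] <- NA
  as.numeric(x)
})

str(toy)
```

Under the same assumption, an alternative is to add "-" to the na.strings argument of read.csv() so the columns are parsed as numeric on import.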

ANOVA Test

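The hypotheses are \(H_0: \mu_1 = \mu_2 = \dots = \mu_k\) and \(H_A\): at least two group means differ, where \(\mu\) = group mean and k = number of groups.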

aov1 <- aov(alcohol.frequency ~ marijuana.frequency, data = drugs)
summary(aov1)
##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## marijuana.frequency  1   4875    4875   30.51 5.84e-05 ***
## Residuals           15   2397     160                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov2 <- aov(alcohol.frequency ~ cocaine.frequency, data = drugs)
summary(aov2)
##                   Df Sum Sq Mean Sq F value Pr(>F)
## cocaine.frequency  9   4320   480.0   1.138  0.442
## Residuals          7   2952   421.7
aov3 <- aov(alcohol.frequency ~ crack.frequency, data = drugs)
summary(aov3)
##                 Df Sum Sq Mean Sq F value Pr(>F)
## crack.frequency 12   5726   477.2   1.235  0.458
## Residuals        4   1546   386.5
aov4 <- aov(alcohol.frequency ~ heroin.frequency, data = drugs)
summary(aov4)
##                  Df Sum Sq Mean Sq
## heroin.frequency 16   7272   454.5
aov5 <- aov(alcohol.frequency ~ hallucinogen.frequency, data = drugs)
summary(aov5)
##                        Df Sum Sq Mean Sq F value Pr(>F)
## hallucinogen.frequency  1    222   221.7   0.472  0.503
## Residuals              15   7050   470.0
aov6 <- aov(alcohol.frequency ~ inhalant.frequency, data = drugs)
summary(aov6)
##                    Df Sum Sq Mean Sq F value Pr(>F)
## inhalant.frequency 10   5621   562.1   2.043  0.197
## Residuals           6   1651   275.1
aov7 <- aov(alcohol.frequency ~ pain.reliever.frequency, data = drugs)
summary(aov7)
##                         Df Sum Sq Mean Sq F value Pr(>F)
## pain.reliever.frequency  1      0     0.4   0.001  0.978
## Residuals               15   7272   484.8
aov8 <- aov(alcohol.frequency ~ oxycontin.frequency, data = drugs)
summary(aov8)
##                     Df Sum Sq Mean Sq F value  Pr(>F)   
## oxycontin.frequency 14   7264   518.8   129.7 0.00768 **
## Residuals            2      8     4.0                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov9 <- aov(alcohol.frequency ~ tranquilizer.frequency, data = drugs)
summary(aov9)
##                        Df Sum Sq Mean Sq F value Pr(>F)  
## tranquilizer.frequency  1   1253  1252.6   3.121 0.0976 .
## Residuals              15   6019   401.3                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov10 <- aov(alcohol.frequency ~ stimulant.frequency, data = drugs)
summary(aov10)
aov11 <- aov(alcohol.frequency ~ meth.frequency, data = drugs)
summary(aov11)
##                Df Sum Sq Mean Sq F value Pr(>F)
## meth.frequency 13   5783   444.9   0.897  0.621
## Residuals       3   1488   496.2
aov12 <- aov(alcohol.frequency ~ sedative.frequency, data = drugs)
summary(aov12)
##                    Df Sum Sq Mean Sq F value Pr(>F)
## sedative.frequency  1    295   295.0   0.634  0.438
## Residuals          15   6977   465.1

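At the 0.05 level, two of the models returned a statistically significant result: marijuana.frequency (p = 5.84e-05) and oxycontin.frequency (p = 0.00768). For these we can reject \(H_0\) in favor of \(H_A\), which is that at least two group means are statistically different from each other. The oxycontin model leaves only 2 residual degrees of freedom, so that result should be read with caution. The remaining models shown give no evidence against \(H_0\), and the heroin model has no residual degrees of freedom, so no F statistic could be computed for it.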
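
The models above all follow one pattern, so their p-values could be gathered in a single pass instead of twelve separate calls. The following is a minimal sketch of that idea, not the code used for this report. The data frame toy and all of its values are invented for illustration, and only two predictors are included to keep it short.

```r
# Invented illustration data; these are NOT values from the survey.
toy <- data.frame(
  alcohol.frequency   = c(3, 6, 5, 6, 10, 13, 24, 36, 48, 52),
  marijuana.frequency = c(4, 15, 24, 25, 30, 36, 52, 60, 60, 52),
  sedative.frequency  = c(13, 19, 16.5, 30, 3, 6.5, 10, 6, 4, 9)
)

# Every column except the response is treated as a predictor.
predictors <- setdiff(names(toy), "alcohol.frequency")

# Fit one model per predictor and pull the p-value from the first row
# of the ANOVA table returned by summary().
p_values <- sapply(predictors, function(v) {
  fit <- aov(reformulate(v, response = "alcohol.frequency"), data = toy)
  summary(fit)[[1]][["Pr(>F)"]][1]
})

# Names of the predictors that are significant at the 0.05 level.
names(p_values)[p_values < 0.05]
```

Collecting the p-values in one named vector also avoids printing the summary of the wrong model by accident.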

Correlation Test

Second Dataframe Created

library(dplyr)  # provides %>%, select(), arrange()
drugs2 <- drugs %>%
    select(age, alcohol.use, cocaine.use, marijuana.use, crack.use, heroin.use,
           hallucinogen.use, inhalant.use, pain.reliever.use, oxycontin.use,
           tranquilizer.use, stimulant.use, meth.use, sedative.use) %>%
    arrange(desc(alcohol.use))  # highest alcohol use first
View(drugs2)

The 22 to 23 age group had the highest percentage of alcohol use over the course of twelve months.

library(ggplot2)
ggplot(drugs2, aes(age, alcohol.use)) + geom_point(color = 'blue') +
    xlab("Age") + ylab("Alcohol Use") +
    ggtitle("Age vs Alcohol Use") + theme_bw()

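Since drugs2$age is stored as a character string, a correlation coefficient cannot be computed for it directly. Reviewing the graph instead, alcohol usage clearly changes with age, so we reject \(H_0\) and conclude there is a nonzero association between age and alcohol usage, with the caveat that this is a visual judgment rather than a formal test.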
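
One way to put a number on this relationship would be to convert the age labels to numeric values first. The sketch below assumes labels that are single ages, ranges such as "22-23", or open-ended groups such as "65+"; it maps each label to the midpoint of its range. The helper age_midpoint() is hypothetical, and the alcohol values are invented for illustration.

```r
# Assumed label formats and invented alcohol-use values, for illustration only.
age_labels  <- c("12", "16", "21", "22-23", "30-34", "65+")
alcohol_toy <- c(4, 40, 83, 84, 77, 49)

# Hypothetical helper: strip a trailing "+", split on "-", and average the bounds.
age_midpoint <- function(label) {
  cleaned <- sub("+", "", label, fixed = TRUE)
  bounds  <- as.numeric(strsplit(cleaned, "-", fixed = TRUE)[[1]])
  mean(bounds)
}

age_num <- vapply(age_labels, age_midpoint, numeric(1), USE.NAMES = FALSE)

# Rank-based test of association between numeric age and alcohol use.
result <- cor.test(age_num, alcohol_toy, method = "spearman")
result$estimate
```

A single correlation coefficient measures monotone association, so it can understate a pattern that rises and then falls with age; the plot remains a useful companion to any such test.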

Correlation Test between Alcohol and the Other Drugs

cor(drugs2$alcohol.use, drugs2$cocaine.use, method="pearson")
## [1] 0.7734581
cor(drugs2$alcohol.use, drugs2$marijuana.use, method="pearson")
## [1] 0.5941651
cor(drugs2$alcohol.use, drugs2$crack.use, method="pearson")
## [1] 0.877378
cor(drugs2$alcohol.use, drugs2$heroin.use, method="pearson")
## [1] 0.6776138
cor(drugs2$alcohol.use, drugs2$hallucinogen.use, method="pearson")
## [1] 0.4637019
cor(drugs2$alcohol.use, drugs2$inhalant.use, method="pearson")
## [1] -0.6482481
cor(drugs2$alcohol.use, drugs2$pain.reliever.use, method="pearson")
## [1] 0.6175227
cor(drugs2$alcohol.use, drugs2$oxycontin.use, method="pearson")
## [1] 0.5892193
cor(drugs2$alcohol.use, drugs2$tranquilizer.use, method="pearson")
## [1] 0.7357849
cor(drugs2$alcohol.use, drugs2$stimulant.use, method="pearson")
## [1] 0.5822415
cor(drugs2$alcohol.use, drugs2$meth.use, method="pearson")
## [1] 0.6825311
cor(drugs2$alcohol.use, drugs2$sedative.use, method="pearson")
## [1] 0.3182684
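
The calls above report only the coefficient. As a hedged sketch, cor.test() could be looped over the other columns to obtain a p-value alongside each coefficient. The data frame toy and its values are invented for illustration and are not taken from the survey.

```r
# Invented illustration data; these are NOT values from the survey.
toy <- data.frame(
  alcohol.use  = c(3.9, 8.5, 18.1, 29.2, 40.1, 49.3),
  cocaine.use  = c(0.1, 0.1, 0.1, 0.5, 1.0, 2.0),
  inhalant.use = c(1.6, 2.5, 2.6, 2.5, 3.0, 2.0)
)

others <- setdiff(names(toy), "alcohol.use")

# One row per drug: Pearson coefficient and the p-value of the test r = 0.
results <- do.call(rbind, lapply(others, function(v) {
  ct <- cor.test(toy$alcohol.use, toy[[v]], method = "pearson")
  data.frame(drug = v, r = unname(ct$estimate), p.value = ct$p.value)
}))

results
```

With only a handful of age groups, even a fairly large coefficient can fail to reach significance, which is why the p-value is worth reporting next to it.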

Omitting Age Column Variable

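library(corrplot)
library(PerformanceAnalytics)

# Correlation matrix of the usage columns (column 1, age, is dropped)
drugs3 <- cor(drugs2[, -1])
corrplot(drugs3, type = "full", order = "hclust",
         tl.col = "black", tl.srt = 45)

col <- colorRampPalette(c("blue", "white", "red"))(40)
heatmap(x = drugs3, col = col, symm = TRUE)

# chart.Correlation() expects the raw data, not a correlation matrix
chart.Correlation(drugs2[, -1], histogram = TRUE, pch = 19)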

Linear Regression Test

#Comparing Alcohol usage with other drug variables

lm1 <- lm(alcohol.use ~ marijuana.use + cocaine.use + crack.use + heroin.use + hallucinogen.use + inhalant.use + pain.reliever.use + oxycontin.use + tranquilizer.use + stimulant.use + meth.use + sedative.use, data = drugs2)
summary(lm1)
## 
## Call:
## lm(formula = alcohol.use ~ marijuana.use + cocaine.use + crack.use + 
##     heroin.use + hallucinogen.use + inhalant.use + pain.reliever.use + 
##     oxycontin.use + tranquilizer.use + stimulant.use + meth.use + 
##     sedative.use, data = drugs2)
## 
## Residuals:
##       1       2       3       4       5       6       7       8       9      10 
## -1.1814  0.1298 -4.0238  7.0832 -1.0339 -1.8100  0.4503  3.0268  3.2428 -4.3926 
##      11      12      13      14      15      16      17 
## -1.9223 -0.2091  3.0287 -0.2669  1.5217  0.3444 -3.9878 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)   
## (Intercept)         41.698      6.267   6.654  0.00265 **
## marijuana.use        4.128      3.490   1.183  0.30236   
## cocaine.use        -18.171     12.049  -1.508  0.20601   
## crack.use           49.927     42.122   1.185  0.30150   
## heroin.use          14.813     44.466   0.333  0.75577   
## hallucinogen.use    -7.791      2.893  -2.693  0.05450 . 
## inhalant.use       -18.750      4.455  -4.209  0.01360 * 
## pain.reliever.use    4.580      3.733   1.227  0.28711   
## oxycontin.use      -41.763     31.580  -1.322  0.25656   
## tranquilizer.use     4.441     11.413   0.389  0.71701   
## stimulant.use        9.715     22.333   0.435  0.68600   
## meth.use            18.614     25.047   0.743  0.49866   
## sedative.use       -71.361     44.283  -1.611  0.18236   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.971 on 4 degrees of freedom
## Multiple R-squared:  0.9877, Adjusted R-squared:  0.9507 
## F-statistic: 26.68 on 12 and 4 DF,  p-value: 0.003067
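
The summary above reports 4 residual degrees of freedom, which follows from fitting 12 predictors plus an intercept to 17 observations. The short calculation below reproduces the adjusted R-squared from the printed R-squared using the standard formula; the variable names are illustrative, and the inputs are the rounded values shown in the output.

```r
# Values read from the printed summary of lm1.
r_squared <- 0.9877
n_obs     <- 17   # residuals are numbered 1 to 17
n_pred    <- 12   # predictors in the formula, excluding the intercept

# Residual degrees of freedom: observations minus predictors minus intercept.
resid_df <- n_obs - n_pred - 1

# Adjusted R-squared penalises R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / resid_df
adj_r_squared
```

The result agrees with the printed 0.9507 up to rounding of the inputs. With so few residual degrees of freedom, a high R-squared is expected almost regardless of the predictors, so it should not be read as strong evidence on its own.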

Linear Equations
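
Reading the estimates off the coefficient table above, the fitted model can be written as follows, with coefficients rounded as printed. Only the intercept and inhalant.use are significant at the 0.05 level, so most of these terms are poorly determined.

```latex
\[
\begin{aligned}
\widehat{\text{alcohol.use}} = {} & 41.698 + 4.128\,\text{marijuana.use} - 18.171\,\text{cocaine.use} + 49.927\,\text{crack.use} \\
& + 14.813\,\text{heroin.use} - 7.791\,\text{hallucinogen.use} - 18.750\,\text{inhalant.use} \\
& + 4.580\,\text{pain.reliever.use} - 41.763\,\text{oxycontin.use} + 4.441\,\text{tranquilizer.use} \\
& + 9.715\,\text{stimulant.use} + 18.614\,\text{meth.use} - 71.361\,\text{sedative.use}
\end{aligned}
\]
```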

Residual Analysis

plot(lm1)

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
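
The expression in the warnings involves (1 - hh)/hh, which becomes negative when hh exceeds 1, so one plausible reading is that the leverage axis of the residuals-versus-leverage plot extends to or beyond 1. That can happen when a model has nearly as many coefficients as observations. The sketch below illustrates the underlying property on invented random data: leverages always sum to the number of coefficients, so their average is coefficients divided by observations.

```r
# Invented random data: 8 observations, 3 predictors plus an intercept.
set.seed(42)
toy <- data.frame(y = rnorm(8), x1 = rnorm(8), x2 = rnorm(8), x3 = rnorm(8))
fit <- lm(y ~ x1 + x2 + x3, data = toy)

# Leverage (hat value) of each observation.
lev <- hatvalues(fit)

# The leverages sum to the number of fitted coefficients (4 here),
# so the mean leverage is 4 / 8.
mean_leverage <- mean(lev)
mean_leverage
```

Applied to lm1, the same property gives a mean leverage of 13/17, so every observation is close to being fitted exactly. Inspecting hatvalues(lm1) would be a reasonable first check.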

Conclusion

After running several tests, the results conflict. Although there is evidence of correlation between alcohol usage and the use of other drugs across age groups, some evidence points in the opposite direction. Further tests are needed, since the information provided in the survey relates to baby boomers. Patterns of drug use are also likely to change over time.