Over the last decade, illicit drug use has increased. Because these drugs are easy to obtain, many surveys have been conducted on the subject. The fivethirtyeight.com article How Baby Boomers Get High analyzed trends between age and the use of specific drugs, using data extracted from the 2012 National Survey on Drug Use and Health.
The drugs dataset contains, for each age group, the percentage of people who used each drug within the last year, along with how frequently they used it.
drugs<-read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/drug-use-by-age/drug-use-by-age.csv", na.strings=c("NA", "NULL"))
However, I noticed that there may also be a correlation between alcohol use and the use of other drugs across age groups.
The survey data were collected and prepared for release by Research Triangle Institute, Research Triangle Park, North Carolina. According to the Substance Abuse & Mental Health Data Archive, the National Survey on Drug Use and Health (NSDUH) administered the survey to participants using audio computer-assisted self-interviewing (ACASI).
## [1] "There were 55268 participants in this study."
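The participant count above was presumably produced by summing the per-age-group sample sizes. The report does not show that step, so the following is only a minimal sketch. It assumes the sample-size column is named `n`, and it uses a small made-up data frame (`toy`) in place of the real data.

```r
# Hypothetical stand-in for the drugs data frame; the values are invented
toy <- data.frame(age = c("12", "13", "14"), n = c(100, 150, 250))

# Sum the per-group sample sizes and build the message
total <- sum(toy$n)
msg <- paste("There were", total, "participants in this study.")
print(msg)
```

Applied to the real data, the same idea would be `sum(drugs$n)`, again assuming that column name.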
# Fix misspelled column names in the source data
names(drugs)[names(drugs) == "pain.releiver.use"] <- "pain.reliever.use"
names(drugs)[names(drugs) == "pain.releiver.frequency"] <- "pain.reliever.frequency"
\(H_0: \mu_1 = \mu_2 = \dots = \mu_k\) and \(H_A\): at least two group means differ, where \(\mu\) = group mean and \(k\) = number of groups.
aov1 <- aov(alcohol.frequency ~ marijuana.frequency, data = drugs)
summary(aov1)
## Df Sum Sq Mean Sq F value Pr(>F)
## marijuana.frequency 1 4875 4875 30.51 5.84e-05 ***
## Residuals 15 2397 160
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
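The F value in the marijuana.frequency output above can be checked by hand, since it is the ratio of the model mean square to the residual mean square. The sketch below uses only the rounded sums of squares and degrees of freedom printed in that table, so the result agrees with the reported 30.51 only up to rounding. The variable names are illustrative.

```r
# Values copied from the printed ANOVA table (rounded as shown there)
ss_model <- 4875   # Sum Sq for marijuana.frequency
df_model <- 1      # Df for marijuana.frequency
ss_resid <- 2397   # Sum Sq for Residuals
df_resid <- 15     # Df for Residuals

# F = (model mean square) / (residual mean square)
f_value <- (ss_model / df_model) / (ss_resid / df_resid)

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_model, df_resid, lower.tail = FALSE)

print(round(f_value, 2))
print(p_value)
```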
aov2 <- aov(alcohol.frequency ~ cocaine.frequency, data = drugs)
summary(aov2)
## Df Sum Sq Mean Sq F value Pr(>F)
## cocaine.frequency 9 4320 480.0 1.138 0.442
## Residuals 7 2952 421.7
aov3 <- aov(alcohol.frequency ~ crack.frequency, data = drugs)
summary(aov3)
## Df Sum Sq Mean Sq F value Pr(>F)
## crack.frequency 12 5726 477.2 1.235 0.458
## Residuals 4 1546 386.5
aov4 <- aov(alcohol.frequency ~ heroin.frequency, data = drugs)
summary(aov4)
## Df Sum Sq Mean Sq
## heroin.frequency 16 7272 454.5
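The heroin.frequency table above has 16 degrees of freedom, no residual row, and no F value. This suggests that the column was treated as a categorical variable with a separate level for every row, rather than as a number. One possible cause, stated here as an assumption rather than something the report confirms, is that the source file marks missing values with a placeholder such as "-", which `read.csv` would read as text. The sketch below uses an invented data frame (`toy`) to show how such a column could be recoded before fitting.

```r
# Hypothetical toy data; the values are invented for illustration only
toy <- data.frame(
  alcohol.frequency = c(3, 6, 5, 6, 10, 13, 24, 36),
  heroin.frequency  = c("35.5", "-", "2.0", "1.0", "66.5", "64.0", "46.0", "180.0"),
  stringsAsFactors  = FALSE
)

# The placeholder forces the whole column to be character
print(is.character(toy$heroin.frequency))

# Recode the placeholder to NA, then convert the column to numeric
toy$heroin.frequency[toy$heroin.frequency == "-"] <- NA
toy$heroin.frequency <- as.numeric(toy$heroin.frequency)

# With a numeric predictor the model uses 1 degree of freedom,
# leaving residual degrees of freedom for an F test
fit <- aov(alcohol.frequency ~ heroin.frequency, data = toy)
print(summary(fit))
```

If the placeholder really is "-", an alternative would be to pass `na.strings = c("NA", "NULL", "-")` to `read.csv` so the columns are numeric from the start. Either change would alter the ANOVA tables shown in this report.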
aov5 <- aov(alcohol.frequency ~ hallucinogen.frequency, data = drugs)
summary(aov5)
## Df Sum Sq Mean Sq F value Pr(>F)
## hallucinogen.frequency 1 222 221.7 0.472 0.503
## Residuals 15 7050 470.0
aov6 <- aov(alcohol.frequency ~ inhalant.frequency, data = drugs)
summary(aov6)
## Df Sum Sq Mean Sq F value Pr(>F)
## inhalant.frequency 10 5621 562.1 2.043 0.197
## Residuals 6 1651 275.1
aov7 <- aov(alcohol.frequency ~ pain.reliever.frequency, data = drugs)
summary(aov7)
## Df Sum Sq Mean Sq F value Pr(>F)
## pain.reliever.frequency 1 0 0.4 0.001 0.978
## Residuals 15 7272 484.8
aov8 <- aov(alcohol.frequency ~ oxycontin.frequency, data = drugs)
summary(aov8)
## Df Sum Sq Mean Sq F value Pr(>F)
## oxycontin.frequency 14 7264 518.8 129.7 0.00768 **
## Residuals 2 8 4.0
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov9 <- aov(alcohol.frequency ~ tranquilizer.frequency, data = drugs)
summary(aov9)
## Df Sum Sq Mean Sq F value Pr(>F)
## tranquilizer.frequency 1 1253 1252.6 3.121 0.0976 .
## Residuals 15 6019 401.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
aov10 <- aov(alcohol.frequency ~ stimulant.frequency, data = drugs)
summary(aov10)
aov11 <- aov(alcohol.frequency ~ meth.frequency, data = drugs)
summary(aov11)
## Df Sum Sq Mean Sq F value Pr(>F)
## meth.frequency 13 5783 444.9 0.897 0.621
## Residuals 3 1488 496.2
aov12 <- aov(alcohol.frequency ~ sedative.frequency, data = drugs)
summary(aov12)
## Df Sum Sq Mean Sq F value Pr(>F)
## sedative.frequency 1 295 295.0 0.634 0.438
## Residuals 15 6977 465.1
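Two of the one-way ANOVA tests shown above returned statistically significant results at the 0.05 level: marijuana.frequency (p = 5.84e-05) and oxycontin.frequency (p = 0.00768). For these, we can reject \(H_0\) in favor of \(H_A\), which is that at least two group means are statistically different from each other. For the other tests shown, we fail to reject \(H_0\).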
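library(dplyr)  # provides %>%, select(), arrange()
# Keep age and the percentage-use columns, sorted by alcohol use (highest first)
drugs2 <- drugs %>%
  select(age, alcohol.use, cocaine.use, marijuana.use, crack.use, heroin.use,
         hallucinogen.use, inhalant.use, pain.reliever.use, oxycontin.use,
         tranquilizer.use, stimulant.use, meth.use, sedative.use) %>%
  arrange(desc(alcohol.use))
View(drugs2)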
The 22 to 23 age group had the highest percentage of alcohol use over the course of twelve months.
library(ggplot2)  # provides ggplot() and the plotting layers
ggplot(drugs2, aes(age, alcohol.use)) + geom_point(color = 'blue') +
  xlab("Age") + ylab("Alcohol Use") +
  ggtitle("Age vs Alcohol Use") + theme_bw()
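Because drugs2$age is stored as a character string, a correlation coefficient between age and alcohol use cannot be computed directly. Judging from the graph instead, we reject \(H_0\) and conclude that there is a nonzero association between age and alcohol usage.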
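One way to put age on a numeric scale, so that a correlation could be computed rather than read off the graph, is to replace each age label with the midpoint of its range. This is only a sketch: the helper `age_midpoint` is hypothetical, and the label formats it handles (a single age such as "12", a range such as "22-23", and an open-ended group such as "65+") are assumptions about how the age column is written.

```r
# Hypothetical helper: convert age-group labels to numeric midpoints
age_midpoint <- function(age) {
  age <- gsub("\\+", "", age)               # "65+" becomes "65"
  parts <- strsplit(age, "-", fixed = TRUE) # "22-23" becomes c("22", "23")
  vapply(parts, function(p) mean(as.numeric(p)), numeric(1))
}

# Example labels (assumed formats, not taken from the data file)
midpoints <- age_midpoint(c("12", "22-23", "65+"))
print(midpoints)
```

With such a column in place, `cor(age_midpoint(drugs2$age), drugs2$alcohol.use)` would give a numeric summary, though an open-ended group such as "65+" has no true midpoint and any value chosen for it is a judgment call.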
cor(drugs2$alcohol.use, drugs2$cocaine.use, method="pearson")
## [1] 0.7734581
cor(drugs2$alcohol.use, drugs2$marijuana.use, method="pearson")
## [1] 0.5941651
cor(drugs2$alcohol.use, drugs2$crack.use, method="pearson")
## [1] 0.877378
cor(drugs2$alcohol.use, drugs2$heroin.use, method="pearson")
## [1] 0.6776138
cor(drugs2$alcohol.use, drugs2$hallucinogen.use, method="pearson")
## [1] 0.4637019
cor(drugs2$alcohol.use, drugs2$inhalant.use, method="pearson")
## [1] -0.6482481
cor(drugs2$alcohol.use, drugs2$pain.reliever.use, method="pearson")
## [1] 0.6175227
cor(drugs2$alcohol.use, drugs2$oxycontin.use, method="pearson")
## [1] 0.5892193
cor(drugs2$alcohol.use, drugs2$tranquilizer.use, method="pearson")
## [1] 0.7357849
cor(drugs2$alcohol.use, drugs2$stimulant.use, method="pearson")
## [1] 0.5822415
cor(drugs2$alcohol.use, drugs2$meth.use, method="pearson")
## [1] 0.6825311
cor(drugs2$alcohol.use, drugs2$sedative.use, method="pearson")
## [1] 0.3182684
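The coefficients above describe the strength of each linear relationship but not whether it is statistically distinguishable from zero. For a Pearson correlation, the usual test statistic is \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) with \(n-2\) degrees of freedom. The sketch below applies it to the reported alcohol and cocaine coefficient. It assumes n = 17 age groups, which is inferred from the 17 residuals in the regression output rather than stated in the text.

```r
# Reported Pearson correlation between alcohol.use and cocaine.use
r <- 0.7734581
n <- 17  # assumed number of age groups (rows)

# t statistic for H0: the true correlation is zero
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df = n - 2)

print(t_stat)
print(p_value)
```

On the data itself, `cor.test(drugs2$alcohol.use, drugs2$cocaine.use)` performs the same test and also reports a confidence interval.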
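library(corrplot)              # corrplot()
library(PerformanceAnalytics)  # chart.Correlation()
# Correlation matrix of the 13 percentage-use columns (columns 2 to 14)
drugs3 <- cor(drugs2[, 2:14])
corrplot(drugs3, type = "full", order = "hclust",
         tl.col = "black", tl.srt = 45)
col <- colorRampPalette(c("blue", "white", "red"))(40)
heatmap(x = drugs3, col = col, symm = TRUE)
# chart.Correlation() expects the data columns, not a correlation matrix
chart.Correlation(drugs2[, 2:14], histogram = TRUE, pch = 19)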
# Comparing alcohol usage with other drug variables
lm1 <- lm(alcohol.use ~ marijuana.use + cocaine.use + crack.use + heroin.use + hallucinogen.use + inhalant.use + pain.reliever.use + oxycontin.use + tranquilizer.use + stimulant.use + meth.use + sedative.use, data = drugs2)
summary(lm1)
##
## Call:
## lm(formula = alcohol.use ~ marijuana.use + cocaine.use + crack.use +
## heroin.use + hallucinogen.use + inhalant.use + pain.reliever.use +
## oxycontin.use + tranquilizer.use + stimulant.use + meth.use +
## sedative.use, data = drugs2)
##
## Residuals:
## 1 2 3 4 5 6 7 8 9 10
## -1.1814 0.1298 -4.0238 7.0832 -1.0339 -1.8100 0.4503 3.0268 3.2428 -4.3926
## 11 12 13 14 15 16 17
## -1.9223 -0.2091 3.0287 -0.2669 1.5217 0.3444 -3.9878
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.698 6.267 6.654 0.00265 **
## marijuana.use 4.128 3.490 1.183 0.30236
## cocaine.use -18.171 12.049 -1.508 0.20601
## crack.use 49.927 42.122 1.185 0.30150
## heroin.use 14.813 44.466 0.333 0.75577
## hallucinogen.use -7.791 2.893 -2.693 0.05450 .
## inhalant.use -18.750 4.455 -4.209 0.01360 *
## pain.reliever.use 4.580 3.733 1.227 0.28711
## oxycontin.use -41.763 31.580 -1.322 0.25656
## tranquilizer.use 4.441 11.413 0.389 0.71701
## stimulant.use 9.715 22.333 0.435 0.68600
## meth.use 18.614 25.047 0.743 0.49866
## sedative.use -71.361 44.283 -1.611 0.18236
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.971 on 4 degrees of freedom
## Multiple R-squared: 0.9877, Adjusted R-squared: 0.9507
## F-statistic: 26.68 on 12 and 4 DF, p-value: 0.003067
plot(lm1)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
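The gap between the multiple and adjusted R-squared in the model summary above reflects how many predictors the model uses relative to the number of observations. The adjusted value follows \(1 - (1 - R^2)(n - 1)/(n - p - 1)\). The sketch below recomputes it from the printed figures, taking n = 17 from the residual list and p = 12 from the coefficient table. Because the inputs are rounded, the result matches the reported 0.9507 only approximately.

```r
# Values read from the printed model summary
r_squared <- 0.9877  # Multiple R-squared
n <- 17              # observations (one per residual shown)
p <- 12              # predictors, excluding the intercept

# Adjusted R-squared penalizes the fit for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

print(round(adj_r_squared, 4))
print(n - p - 1)  # residual degrees of freedom
```

With only 4 residual degrees of freedom, the individual coefficient estimates rest on very little information, which is consistent with the large standard errors in the table.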
After running several tests, the results were conflicting. Although there is evidence of a correlation between alcohol use and the use of other drugs across age groups, some evidence points in the opposite direction. Further tests are needed, since the information provided in the survey relates to baby boomers. It is also possible that drug preferences change over time.