HYPOTHESIS: Frequent cannabis use may increase depressive tendencies in American adults.
The dataset comes from the National Survey on Drug Use and Health (NSDUH), released by the Substance Abuse and Mental Health Services Administration (SAMHSA), part of the U.S. Department of Health and Human Services. It is accompanied by a codebook, has been stripped of personal, health, and other sensitive information, and is authorized for public distribution and disclosure. It also provides nationally representative data.
This is relevant to public administration because it is a public health issue. While drug prohibition has proved useless, education can help the general and vulnerable populations make more informed decisions.
EXPLORATORY STATISTICS:
load("NSDUH_2022.Rdata")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
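# Keep adults (AGE3 >= 4) with a valid count of cannabis-use days (at most 365)
mjadults <- NSDUH_2022 %>%
  filter(mjyrtot <= 365, AGE3 >= 4)
cannabis <- mjadults$mjyrtot
# For each dependent variable, keep only valid answers on the 1-to-5 scale
hopless <- mjadults %>%
  filter(DSTHOP12 <= 5)
hope <- hopless$DSTHOP12
cheered <- mjadults %>%
  filter(DSTCHR12 <= 5)
nocheer <- cheered$DSTCHR12
alleff <- mjadults %>%
  filter(DSTEFF12 <= 5)
effort <- alleff$DSTEFF12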
SUMMARIES:
summary(cannabis)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 10.0 100.0 147.4 312.0 365.0
summary(effort)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 1.00 2.00 2.48 3.00 5.00
summary(hope)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 2.754 4.000 5.000
summary(nocheer)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 2.00 3.00 2.77 4.00 5.00
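The summaries confirm that the filters left only valid values: the number of days of cannabis use ranges from 1 to 365, and each of the three dependent variables ranges from 1 to 5. Because the minimum number of days is 1, the sample consists of adults who reported at least one day of cannabis use.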
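The same range check can also be done programmatically. The sketch below runs on a small invented data frame rather than the NSDUH file; the column names follow the code above, while the values, including 991 and 99 as stand-ins for non-valid survey codes, are assumptions made only for illustration.

```r
# Toy stand-in for NSDUH_2022; every value here is invented.
toy <- data.frame(
  mjyrtot  = c(12, 365, 991, 100, 1, 52),  # 991 stands in for a non-valid code
  AGE3     = c(4, 7, 9, 2, 5, 8),          # values below 4 are dropped, as in the filter above
  DSTHOP12 = c(5, 3, 99, 4, 99, 1)         # 99 stands in for a non-valid code
)

# Same filtering logic as the analysis, written in base R so the sketch
# needs no packages.
adults <- subset(toy, mjyrtot <= 365 & AGE3 >= 4)
hop    <- subset(adults, DSTHOP12 <= 5)

# Stop with an error if any out-of-range value survived the filters.
stopifnot(all(adults$mjyrtot >= 1 & adults$mjyrtot <= 365))
stopifnot(all(hop$DSTHOP12 %in% 1:5))

nrow(adults)  # rows kept after the cannabis and age filters
nrow(hop)     # rows kept after also requiring a valid hopelessness answer
```

Unlike reading the summary() output by eye, stopifnot() halts the script if a filter is ever changed in a way that lets an invalid code through.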
HISTOGRAMS AND BAR GRAPHS:
Below is a histogram representing the number of days that respondents used cannabis:
hist(cannabis)
The bar graphs below represent the dependent variables:
## HOPELESSNESS
ggplot(hopless, aes(x = hope)) +
  geom_bar(fill = "Gray") +
  labs(title = "How often did you feel hopeless?",
       x = "1 = All of the time, 5 = None of the time",
       y = "Number of respondents") +
  theme_minimal()
## CHEERED UP
ggplot(cheered, aes(x = nocheer)) +
  geom_bar(fill = "Gray") +
  labs(title = "So sad or depressed nothing could cheer you up",
       x = "1 = All of the time, 5 = None of the time",
       y = "Number of respondents") +
  theme_minimal()
## EVERYTHING AN EFFORT
ggplot(alleff, aes(x = effort)) +
  geom_bar(fill = "Gray") +
  labs(title = "Felt like everything was an effort",
       x = "1 = All of the time, 5 = None of the time",
       y = "Number of respondents") +
  theme_minimal()
To compare the independent and dependent variables, the two vectors in each pair must be the same length. For this reason, the cannabis-use vector is extracted again from each filtered subset, so that it contains only the respondents who gave a valid answer to that dependent variable. Additionally, each dependent variable is converted to a categorical variable (a factor) below:
cannabishop <- hopless$mjyrtot
hopecat <- factor(hope)
cannabische <- cheered$mjyrtot
nocheercat <- factor(nocheer)
cannabiseff <- alleff$mjyrtot
effortcat <- factor(effort)
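Why the lengths must match can be seen on a small example. This is a sketch on invented values, not the NSDUH data; the 99 is a placeholder for a non-valid code, and only the two endpoint labels of the factor come from the survey wording used in the plots.

```r
# Toy stand-in for the filtered adult data; all values are invented.
toy <- data.frame(
  mjyrtot  = c(12, 365, 100, 1, 52),
  DSTHOP12 = c(5, 3, 99, 2, 1)   # 99 is a placeholder for a non-valid code
)

# Filter once, then take both vectors from the same rows so they stay paired.
valid <- subset(toy, DSTHOP12 <= 5)
days  <- valid$mjyrtot
hop   <- valid$DSTHOP12

# The unfiltered cannabis vector is longer, so it could not be paired with hop.
length(toy$mjyrtot) == length(hop)   # FALSE
length(days) == length(hop)          # TRUE

# Convert the 1-to-5 answers to a factor; intermediate levels are left as
# bare numbers because their wording is not quoted in this report.
hop_cat <- factor(hop, levels = 1:5,
                  labels = c("1 (all of the time)", "2", "3", "4",
                             "5 (none of the time)"))
table(hop_cat)
```

Setting levels = 1:5 explicitly keeps all five categories in tables and plots even if one of them happens to have no respondents.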
The graphs below plot each dependent variable against the number of days of cannabis use:
plot(hope,cannabishop)
plot(nocheer,cannabische)
plot(effort,cannabiseff)
With the help of ChatGPT, I was also able to put together violin plots using ggplot:
### DAYS CANNABIS CONSUMED V HOPELESSNESS
ggplot(hopless, aes(x = hopecat, y = cannabishop)) +
  geom_violin() +
  labs(title = "Cannabis consumed v hopelessness",
       x = "Hopeless: 1 = All of the time, 5 = None of the time",
       y = "Days cannabis consumed") +
  theme_minimal()
### DAYS CANNABIS CONSUMED V CHEERED UP
ggplot(cheered, aes(x = nocheercat, y = cannabische)) +
  geom_violin() +
  labs(title = "Cannabis consumed v not cheered up",
       x = "Not able to cheer up: 1 = All of the time, 5 = None of the time",
       y = "Days cannabis consumed") +
  theme_minimal()
### DAYS CANNABIS CONSUMED V EVERYTHING AN EFFORT
ggplot(alleff, aes(x = effortcat, y = cannabiseff)) +
  geom_violin() +
  labs(title = "Cannabis consumed v everything an effort",
       x = "Everything was an effort: 1 = All of the time, 5 = None of the time",
       y = "Days cannabis consumed") +
  theme_minimal()
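Violin plots show the shape of each distribution, but small differences between categories can be hard to judge by eye. One possible complement, not part of the original analysis, is a table of the median number of days and the group size per response category. The sketch below uses invented values only, so its numbers say nothing about the NSDUH results.

```r
# Toy data: days of cannabis use and a 1-to-5 hopelessness answer.
# All values are invented for illustration.
toy <- data.frame(
  mjyrtot  = c(300, 200, 250, 150, 100, 120, 50, 60, 10, 20),
  DSTHOP12 = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
)

# Median days of use within each response category.
med_days <- tapply(toy$mjyrtot, factor(toy$DSTHOP12), median)

# Number of respondents in each category, to show how much data backs each median.
n_resp <- table(toy$DSTHOP12)

# Combine both into one small summary table.
data.frame(category    = names(med_days),
           median_days = as.numeric(med_days),
           n           = as.integer(n_resp))
```

With the real data, the same two calls would take the filtered vectors defined earlier, for example tapply(cannabishop, hopecat, median).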
CORRELATION BETWEEN VARIABLES:
Correlation between the number of days cannabis was consumed and the amount of time that respondents felt hopeless:
cor(hope,cannabishop)
## [1] -0.1273356
A coefficient of -0.13 indicates a weak relationship. Because the hopelessness scale runs from 1 (all of the time) to 5 (none of the time), the negative sign means that more days of cannabis use go with more frequent hopelessness; read on a scale where higher means more hopeless, the correlation is a weak positive 0.13. As the graphs above show, the relationship is barely visible, if at all. The other two correlations are similarly weak and point in the same direction:
cor(nocheer,cannabische)
## [1] -0.1204951
cor(effort,cannabiseff)
## [1] -0.09742517
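The sign reversal described above can be made explicit by reverse-scoring the item (6 minus the answer), so that higher values mean more distress. This flips the sign of the correlation without changing its size. The sketch below uses invented values, and it also computes Spearman's rank correlation, which is sometimes preferred for ordinal 1-to-5 items; neither step is part of the original analysis.

```r
# Toy paired data; all values are invented for illustration.
days <- c(300, 12, 150, 365, 40, 5, 220, 90)   # days of cannabis use
hop  <- c(2, 5, 3, 1, 4, 5, 3, 2)              # 1 = all of the time, 5 = none of the time

# Pearson correlation on the original coding.
r_raw <- cor(hop, days)

# Reverse-score so that 5 = all of the time and 1 = none of the time.
hop_rev <- 6 - hop
r_rev   <- cor(hop_rev, days)

# Same magnitude, opposite sign.
all.equal(r_rev, -r_raw)   # TRUE

# Spearman's rank correlation on the reverse-scored item.
rho <- cor(hop_rev, days, method = "spearman")
```

With the real data, the equivalent call would be cor(6 - hope, cannabishop), which should return the reported coefficient with a positive sign.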
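Based on the results above, the initial hypothesis is likely incorrect: all three correlations are weak (none exceeds 0.13 in absolute value), so the level of depressive tendencies does not seem to vary meaningfully with the level of cannabis use among the adults in this sample.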