Data Selection W5

HYPOTHESIS: Frequent cannabis use may increase depressive tendencies in American adults.

The dataset is from the National Survey on Drug Use and Health (NSDUH), released by the Substance Abuse and Mental Health Services Administration (SAMHSA), part of the U.S. Department of Health and Human Services. It is accompanied by a codebook, has been cleaned of personal, health, and other sensitive information, and is authorized for public distribution and disclosure. Additionally, it provides nationally representative data.

This is relevant to public administration because it is a public health issue. While drug prohibition has proved useless, education can help the general and vulnerable populations make more informed decisions.

EXPLORATORY STATISTICS:

The first step is to load the dataset and create an object from it. The dataset used here was already in R format, available on SAMHSA’s website. Once the working directory is set, clicking on the Rdata file generates a command in the console to “load()” the file from its location and creates an object named after the Rdata file (NSDUH_2022). The full dataset contains 59,069 observations and 2,605 variables.

load("NSDUH_2022.Rdata")
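Because load() restores objects under the names they were saved with, it can help to confirm what was actually created. The sketch below is only an illustration: it uses a small made-up data frame (toy_survey, not part of NSDUH) saved to a temporary file. With the real file, the same checks would apply to NSDUH_2022.

```r
# Toy stand-in for the survey data (hypothetical values, not NSDUH).
toy_survey <- data.frame(id = 1:4, days_used = c(3, 120, 365, 40))

# Save it to a temporary .Rdata file, then remove it from the workspace.
tmp <- tempfile(fileext = ".Rdata")
save(toy_survey, file = tmp)
rm(toy_survey)

# load() invisibly returns the names of the objects it restored.
loaded <- load(tmp)
print(loaded)

# dim() reports the number of observations (rows) and variables (columns).
print(dim(toy_survey))
```

Running dim(NSDUH_2022) in the same way is a quick check against the 59,069 observations and 2,605 variables reported above.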

Second, load tidyverse, which provides the filter(), %>%, and ggplot() functions used below:

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

The chunk below extracts a subset of the population containing only adults (AGE3 >= 4, where “4” is the code for respondents aged 18) who reported having consumed cannabis products on at least one day in the previous 12 months (mjyrtot <= 365, that is, 365 days or fewer):

mjadults <- NSDUH_2022 %>%
    filter(mjyrtot <= 365, AGE3 >= 4)
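Survey files often store non-answers as out-of-range codes, so a range filter such as the one above also decides which respondents are silently dropped. The sketch below illustrates this with made-up data and base R. The column names and the value 991 are placeholders, not taken from the NSDUH codebook, which should be consulted for the actual special codes.

```r
# Toy data (hypothetical): days of use and an age-category code.
toy <- data.frame(
  days_used = c(12, 365, 991, NA, 200),  # 991 = placeholder non-response code
  age_cat   = c(4, 3, 5, 6, 7)
)

# Tabulate values outside 1-365 before filtering, to see what will be dropped.
print(table(toy$days_used[toy$days_used > 365], useNA = "ifany"))

# subset() keeps rows where the condition is TRUE; rows where it is FALSE
# or NA are dropped, matching the behaviour of dplyr::filter().
toy_adults <- subset(toy, days_used <= 365 & age_cat >= 4)
print(toy_adults)
```

Comparing nrow() before and after the filter shows how many respondents were excluded by the two conditions.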

The chunk below extracts the independent variable: “Total number of days in which respondents reported having used marijuana or any cannabis products.”

cannabis <- mjadults$mjyrtot

Three dependent variables will be used to gauge depressive tendencies, based on “one month in the [prior] 12 months when [respondents] were the most depressed, anxious, or emotionally stressed”:

How often respondents felt hopeless, where “5” means “none of the time,” and “1” represents “all of the time”:

hopless <- mjadults %>%
    filter(DSTHOP12 <= 5)
hope <- hopless$DSTHOP12

How often respondents felt “so sad or depressed that nothing could cheer [them] up,” where “5” means “none of the time,” and “1” represents “all of the time”:

cheered <- mjadults %>%
    filter(DSTCHR12 <= 5)
nocheer <- cheered$DSTCHR12

How often respondents felt that everything was an effort, where “5” means “none of the time,” and “1” represents “all of the time”:

alleff <- mjadults %>%
    filter(DSTEFF12 <= 5)
effort <- alleff$DSTEFF12
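The three chunks above repeat the same pattern: keep the rows with a valid 1 to 5 answer, then pull out the column. As a sketch, that pattern could be wrapped in a small helper. Everything here is hypothetical: the toy data, the column names, the helper keep_valid, and the value 99 used as a stand-in for a non-response code.

```r
# Toy data (hypothetical): days of use plus two 1-5 items; 99 stands in
# for a non-response code and is not taken from the NSDUH codebook.
toy <- data.frame(
  days_used = c(10, 50, 300, 365, 7),
  item_a    = c(1, 5, 99, 3, NA),
  item_b    = c(2, 99, 99, 4, 99)
)

# Keep only rows with a valid answer (at most max_code) on the named item.
# keep_valid is a hypothetical helper, not part of any package.
keep_valid <- function(data, item, max_code = 5) {
  values <- data[[item]]
  data[!is.na(values) & values <= max_code, , drop = FALSE]
}

valid_a <- keep_valid(toy, "item_a")
valid_b <- keep_valid(toy, "item_b")

# Each item can leave a different number of respondents.
print(c(item_a = nrow(valid_a), item_b = nrow(valid_b)))
```

Because each item can have a different number of valid answers, the resulting subsets generally differ in size, which matters whenever two variables are compared.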

EXPLORATORY STATISTICS AND THEIR COMMANDS

SUMMARIES:

summary(cannabis)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0    10.0   100.0   147.4   312.0   365.0

summary(effort)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    1.00    2.00    2.48    3.00    5.00

summary(hope)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   2.754   4.000   5.000

summary(nocheer)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    2.00    3.00    2.77    4.00    5.00

The summaries confirm that each variable contains only values in the expected range: 1 to 365 days for cannabis use, and 1 to 5 for each of the three dependent variables.
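Since the three dependent variables are ordered categories rather than measurements, a frequency table can complement the mean and quartiles. The following is a minimal sketch on made-up responses, not on the NSDUH data:

```r
# Toy 1-5 responses (hypothetical values).
responses <- c(1, 2, 2, 3, 3, 3, 4, 5, 5, 2)

# Counts per response level; factor() with explicit levels keeps
# categories that happen to have zero responses.
counts <- table(factor(responses, levels = 1:5))
print(counts)

# Share of respondents in each level, rounded to two decimals.
shares <- round(prop.table(counts), 2)
print(shares)
```

The same two calls could be applied to hope, nocheer, and effort.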

HISTOGRAMS AND BAR GRAPHS:

Below is a histogram representing the number of days that respondents used cannabis:

hist(cannabis)

The bar graphs below represent the dependent variables:

## HOPELESSNESS
ggplot(hopless, aes(x=hope)) +
  geom_bar(fill = "Gray") +
  labs(title = "How often did you feel hopeless?",
    x = "'1' = All of the time  -  '5' = None of the time",
    y = "Number of respondents") +
  theme_minimal()

## CHEERED UP
ggplot(cheered, aes(x=nocheer)) +
  geom_bar(fill = "Gray") +
  labs(title = "So sad or depressed nothing could cheer you up",
    x = "'1' = All of the time  -  '5' = None of the time",
    y = "Number of respondents") +
  theme_minimal()

## EVERYTHING AN EFFORT
ggplot(alleff, aes(x=effort)) +
  geom_bar(fill = "Gray") +
  labs(title = "Felt like everything was an effort",
    x = "'1' = All of the time  -  '5' = None of the time",
    y = "Number of respondents") +
  theme_minimal()

To compare the independent and dependent variables, the vectors must be the same length. For this reason, the days of cannabis use are taken from each of the filtered subsets, so that they match the valid responses to each dependent variable. Additionally, each of the dependent variables is converted into a categorical variable (factor) below:

cannabishop <- hopless$mjyrtot
hopecat <- factor(hope)
cannabische <- cheered$mjyrtot
nocheercat <- factor(nocheer)
cannabiseff <- alleff$mjyrtot
effortcat <- factor(effort)
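One way to guarantee matching lengths is to keep both variables as columns of the same data frame. The sketch below shows this on made-up data and also attaches text labels to the factor levels. Only the labels for 1 and 5 come from the descriptions above; the three intermediate labels are assumptions and should be checked against the codebook.

```r
# Toy subset (hypothetical values) with both variables in the same rows.
toy <- data.frame(
  days_used = c(10, 200, 365, 30, 120),
  hopeless  = c(5, 2, 1, 4, 3)
)

# Adding the factor as a new column keeps it aligned with days_used,
# so the two can never differ in length.
# Labels for levels 2-4 are assumed, not taken from the source.
toy$hopeless_cat <- factor(
  toy$hopeless,
  levels = 1:5,
  labels = c("All", "Most", "Some", "A little", "None")
)

print(levels(toy$hopeless_cat))
print(toy)
```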

The graphs below compare the independent and dependent variables:

plot(hope,cannabishop)

plot(nocheer,cannabische)

plot(effort,cannabiseff)

With the help of ChatGPT, I was also able to put together violin plots using ggplot:

### DAYS CANNABIS CONSUMED V HOPELESSNESS
ggplot(hopless, aes(x=hopecat, y=cannabishop)) +
  geom_violin() +
  labs(title = "Cannabis consumed v hopelessness",
    x = "Hopeless: '1' = All of the time - '5' = None of the time",
    y = "Days cannabis consumed") +
  theme_minimal()

### DAYS CANNABIS CONSUMED V CHEERED UP
ggplot(cheered, aes(x=nocheercat, y=cannabische)) +
  geom_violin() +
  labs(title = "Cannabis consumed v not cheered up",
    x = "Not able to cheer up: '1' = All of the time - '5' = None of the time",
    y = "Days cannabis consumed") +
  theme_minimal()

### DAYS CANNABIS CONSUMED V EVERYTHING AN EFFORT
ggplot(alleff, aes(x=effortcat, y=cannabiseff)) +
  geom_violin() +
  labs(title = "Cannabis consumed v everything effort",
    x = "Everything was an effort: '1' = All of the time - '5' = None of the time",
    y = "Days cannabis consumed") +
  theme_minimal()
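The shapes in the violin plots can be hard to compare by eye, so a numeric summary per response level may help. The sketch below computes the median number of days and the group size for each level with base R. The data and column names are made up for illustration.

```r
# Toy data (hypothetical values): days of use and a 1-5 response.
toy <- data.frame(
  days_used = c(10, 20, 300, 365, 50, 150, 100, 5),
  hopeless  = c(5, 5, 1, 1, 3, 3, 2, 4)
)

# Median and count of days used within each response level.
medians <- tapply(toy$days_used, toy$hopeless, median)
counts  <- tapply(toy$days_used, toy$hopeless, length)

# One column per response level, one row per statistic.
print(rbind(median_days = medians, n = counts))
```

With the real subsets, the same call would take the form tapply(cannabishop, hope, median).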

CORRELATION BETWEEN VARIABLES:

Correlation between the number of days cannabis was consumed and the amount of time that respondents felt hopeless:

cor(hope,cannabishop)

## [1] -0.1273356

The coefficient of -0.13 is negative only because the scale is reversed: the amount of time feeling hopeless is highest at 1 and lowest at 5. In terms of hopelessness itself, the sign flips, giving a weak positive correlation of 0.13. As the graphs above show, there is no visually apparent relationship between the two variables. The other two correlations are similarly weak and, once the sign is flipped, positive:

cor(nocheer,cannabische)

## [1] -0.1204951

cor(effort,cannabiseff)

## [1] -0.09742517
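The sign issue can be avoided by reverse-coding the 1 to 5 items before correlating, and because those items are ordered categories, a rank-based correlation is a common alternative to the default Pearson coefficient. The sketch below illustrates both on made-up data. It is not a re-analysis of the NSDUH variables, and swapping in Spearman's method is a suggestion rather than part of the original analysis.

```r
# Toy data (hypothetical values): days of use and a 1-5 item where
# 1 = all of the time and 5 = none of the time.
days_used <- c(5, 30, 60, 120, 200, 250, 300, 365)
item      <- c(5, 4, 5, 3, 4, 2, 3, 1)

# Reverse-code so that higher values mean more frequent distress.
item_rev <- 6 - item

# Reverse-coding flips the sign of the correlation and nothing else.
print(cor(days_used, item))
print(cor(days_used, item_rev))

# Spearman's rank correlation uses only the ordering of the values.
# exact = FALSE avoids the warning about ties in the exact p-value.
rank_test <- cor.test(days_used, item_rev, method = "spearman", exact = FALSE)
print(rank_test$estimate)
print(rank_test$p.value)
```

With the real data, the equivalent calls would be cor(6 - hope, cannabishop) and cor.test(6 - hope, cannabishop, method = "spearman", exact = FALSE).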

Based on the results above, the initial hypothesis is likely incorrect: all three correlations are weak (absolute values between 0.10 and 0.13), so the level of depressive tendencies does not seem to vary meaningfully with the level of cannabis use.

Lester Velazquez-Po

2024-09-22