library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)

#load() restores the saved objects into the workspace; duh_data only stores their names.
duh_data <- load("NSDUH 2022.Rdata")

nsduh <- puf2022_110424 

nsduh_clean <- nsduh |> filter(ALCDAYS < 32, DSTNRV30 < 6)

#ALCDAYS = During the past 30 days, on how many days did you drink one or more drinks of an alcoholic beverage?
#DSTNRV30 = During the past 30 days, how often did you feel nervous?

cor(nsduh_clean$ALCDAYS, nsduh_clean$DSTNRV30)
## [1] 0.02035272
pairs(~ALCDAYS+DSTNRV30, data=nsduh_clean)

cor.test(nsduh_clean$ALCDAYS,nsduh_clean$DSTNRV30,method="spearman")
## Warning in cor.test.default(nsduh_clean$ALCDAYS, nsduh_clean$DSTNRV30, method =
## "spearman"): Cannot compute exact p-value with ties
## 
##  Spearman's rank correlation rho
## 
## data:  nsduh_clean$ALCDAYS and nsduh_clean$DSTNRV30
## S = 2.3958e+12, p-value = 0.1292
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##         rho 
## 0.009715473
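
#The sketch below illustrates what the Spearman test computes and why the ties warning appears. It uses small made-up vectors (x and y are illustrative, not NSDUH data) and assumes only base R. Spearman's rho is Pearson's r applied to the ranks, and passing exact = FALSE asks cor.test() for the asymptotic p-value directly, which is what it falls back to when there are ties.

```r
# Toy vectors (illustrative only, not NSDUH data) with many tied values
x <- c(0, 0, 1, 2, 2, 5, 10, 30)
y <- c(1, 2, 1, 1, 3, 2, 4, 5)

# Spearman's rho is Pearson's r computed on the (average) ranks
rho_builtin <- cor(x, y, method = "spearman")
rho_by_rank <- cor(rank(x), rank(y))
all.equal(rho_builtin, rho_by_rank)

# exact = FALSE requests the asymptotic p-value, so no ties warning is raised
res <- cor.test(x, y, method = "spearman", exact = FALSE)
res$estimate
res$p.value
```

#With the real data, the same argument could be added to the call above: cor.test(nsduh_clean$ALCDAYS, nsduh_clean$DSTNRV30, method = "spearman", exact = FALSE).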

#The variables are not normally distributed, and the histograms showed they are strongly skewed. Most respondents reported low drinking frequency and low nervousness. The sample is also very large. For these reasons, I believe that a non-parametric test like Spearman’s correlation was most appropriate to determine the relationship between the two variables. Spearman's rho was about 0.01 (p = 0.13), so the test gives no evidence of a monotonic relationship. The results genuinely surprised me; I thought for sure that there would be a stronger positive relationship between alcohol consumption and psychological distress.
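
#A minimal sketch of how the skew check mentioned above could be done, assuming base R only. The vector toy_days is simulated stand-in data (not NSDUH values), chosen so that most values are small. With the real data, toy_days would be replaced by nsduh_clean$ALCDAYS or nsduh_clean$DSTNRV30.

```r
# Simulated skewed counts capped at 30 (illustrative stand-in, not real data)
set.seed(1)
toy_days <- pmin(rgeom(500, prob = 0.2), 30)

# One bin per whole number of days; plot = FALSE returns counts without drawing
h <- hist(toy_days, breaks = seq(-0.5, 30.5, by = 1), plot = FALSE)
head(h$counts)

# A mean well above the median is a quick numeric sign of right skew
skew_check <- c(mean = mean(toy_days), median = median(toy_days))
skew_check
```

#Dropping plot = FALSE draws the histogram itself, which is the visual check described in the note above.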