library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
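# load() restores the saved object(s) into the workspace and returns only their names;
# the 2022 public-use data frame it creates is called puf2022_110424
duh_data <- load("NSDUH 2022.Rdata")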
nsduh <- puf2022_110424
nsduh_clean <- nsduh |> filter(ALCDAYS<32, DSTNRV30<6)
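# ALCDAYS  = During the past 30 days, on how many days did you drink one or more drinks
#            of an alcoholic beverage? (valid answers 1-30)
# DSTNRV30 = During the past 30 days, how often did you feel nervous?
#            (1 = All of the time, 2 = Most, 3 = Some, 4 = A little, 5 = None of the time)
# Larger values on either item are NSDUH skip/missing codes, which the filter above drops.
# As a result, nsduh_clean keeps only respondents who drank on at least one of the past
# 30 days and gave a valid answer to the nervousness item.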
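Because DSTNRV30 runs from "All of the time" (1) to "None of the time" (5), a higher code means *less* nervousness. The sketch below shows one way to drop the reserve codes and reverse the scale so that larger values mean more nervousness. It uses a small made-up data frame (not NSDUH values) and base R's `subset()` in place of `dplyr::filter()` so that it runs on its own; the `nervous` column and its 0-4 scoring are illustrative assumptions, not NSDUH variables.

```r
# Toy data standing in for the NSDUH file (values are made up for illustration);
# codes above the valid ranges mimic NSDUH skip/missing codes
toy <- data.frame(
  ALCDAYS  = c(2, 30, 93, 5, 91, 12, 1),
  DSTNRV30 = c(5, 1, 4, 97, 3, 2, 5)
)

# Keep only valid answers: 1-30 days for ALCDAYS, 1-5 for DSTNRV30
toy_clean <- subset(toy, ALCDAYS <= 30 & DSTNRV30 <= 5)

# Hypothetical recode: reverse DSTNRV30 so that 0 = none of the time and
# 4 = all of the time, i.e. larger values mean more frequent nervousness
toy_clean$nervous <- 5 - toy_clean$DSTNRV30

toy_clean
```

Reversing the scale does not change the size of any correlation. It only flips its sign, which makes a "more drinking, more nervousness" pattern read as a positive coefficient.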
cor(nsduh_clean$ALCDAYS, nsduh_clean$DSTNRV30)
## [1] 0.02035272
pairs(~ALCDAYS+DSTNRV30, data=nsduh_clean)
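With only 30 possible ALCDAYS values and 5 possible DSTNRV30 values, `pairs()` stacks thousands of respondents on the same few points, so the scatterplot hides how the answers are distributed. The discussion at the end refers to histograms. Here is a minimal sketch of that check. It runs on *simulated* drinking-day counts (a geometric draw capped at 30, chosen only to produce a long right tail), not on NSDUH data.

```r
# Simulated stand-in for ALCDAYS (NOT NSDUH data): many light drinkers, long right tail
set.seed(2022)
alc_days <- pmin(rgeom(500, prob = 0.15) + 1, 30)

# Counts per band make the skew visible without a plot
table(cut(alc_days, breaks = c(0, 5, 10, 20, 30)))

# A mean well above the median is a quick sign of right skew
c(mean = mean(alc_days), median = median(alc_days))

# One histogram per variable instead of the overplotted scatterplot
hist(alc_days, breaks = 0:30, main = "Simulated ALCDAYS",
     xlab = "Days drank in past 30 days")
```

On the real data the same idea would be `hist(nsduh_clean$ALCDAYS, breaks = 0:30)`. For the five-category DSTNRV30, `barplot(table(nsduh_clean$DSTNRV30))` is the discrete equivalent.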
cor.test(nsduh_clean$ALCDAYS, nsduh_clean$DSTNRV30, method = "spearman")
## Warning in cor.test.default(nsduh_clean$ALCDAYS, nsduh_clean$DSTNRV30, method =
## "spearman"): Cannot compute exact p-value with ties
##
## Spearman's rank correlation rho
##
## data: nsduh_clean$ALCDAYS and nsduh_clean$DSTNRV30
## S = 2.3958e+12, p-value = 0.1292
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.009715473
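The printed output already contains enough information to recover the sample size and check the p-value. This sketch assumes, as R's `cor.test()` does for Spearman's test, that S is reported as (n³ − n)(1 − ρ)/6, and that for large n the p-value comes from a t approximation with n − 2 degrees of freedom. Only the two numbers copied from the output above go in. Under these assumptions the implied n is roughly 24,400 and the recomputed p-value matches the reported 0.1292.

```r
# Values copied from the cor.test() output above
S   <- 2.3958e12
rho <- 0.009715473

# Solve S = (n^3 - n) * (1 - rho) / 6 for n
f <- function(n) (n^3 - n) * (1 - rho) / 6 - S
n_implied <- round(uniroot(f, interval = c(2, 1e6))$root)
n_implied

# Large-sample approximation: t = rho * sqrt((n - 2) / (1 - rho^2)) on n - 2 df
t_stat <- rho * sqrt((n_implied - 2) / (1 - rho^2))
p_two_sided <- 2 * pt(abs(t_stat), df = n_implied - 2, lower.tail = FALSE)
p_two_sided
```

The tie warning is harmless here. Passing `exact = FALSE` to `cor.test()` requests this asymptotic p-value directly and suppresses the warning without changing the result.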
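# The variables are not normally distributed, and the histograms showed that both are strongly
# skewed: most respondents reported low drinking frequency and low nervousness (low nervousness
# corresponds to high DSTNRV30 codes). DSTNRV30 is also a 5-point ordinal scale, so both
# variables contain many tied values. For these reasons, I believe a non-parametric, rank-based
# test like Spearman's correlation was most appropriate for assessing the relationship between
# the two variables. The sample is also very large, so even a very small monotonic association
# would likely reach statistical significance; despite that, rho = 0.0097 (p = 0.13) is
# essentially zero. Because DSTNRV30 runs from 1 = "All of the time" to 5 = "None of the time",
# more drinking paired with more nervousness would appear as a negative rho, and no meaningful
# association appears in either direction. The results genuinely surprised me; I thought for
# sure that there would be a stronger positive relationship between alcohol consumption and
# psychological distress.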
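The choice of Spearman over Pearson rests on two facts, which the next sketch makes concrete. First, Spearman's rho is Pearson's r computed on ranks, with tied values sharing their average rank. Second, because only ranks matter, a monotone transformation of a skewed variable leaves rho unchanged, while Pearson's r moves. The ten data points below are made up for illustration and are not drawn from NSDUH.

```r
# Made-up, right-skewed drinking-day counts and 1-5 nervousness codes
days    <- c(1, 1, 2, 2, 3, 4, 5, 8, 15, 30)
nervous <- c(5, 4, 5, 3, 4, 4, 2, 3, 1, 2)

# Spearman's rho is Pearson's r applied to ranks (ties get average ranks)
rho_ranks   <- cor(rank(days), rank(nervous))
rho_builtin <- cor(days, nervous, method = "spearman")

# A monotone transformation (log) changes Pearson's r, because the long tail
# pulls on the mean and variance, but it keeps every rank, so rho is unchanged
r_raw   <- cor(days, nervous)
r_log   <- cor(log(days), nervous)
rho_log <- cor(log(days), nervous, method = "spearman")

c(rho = rho_builtin, pearson_raw = r_raw, pearson_log = r_log)
```

This is why a few very frequent drinkers cannot dominate Spearman's rho the way they can dominate Pearson's r. It is consistent with the two estimates above (r = 0.020, rho = 0.0097) both being close to zero.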