PAD 6833 HW5

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readxl)

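# load() restores the saved objects (here, puf2022_110424) into the workspace;
# its return value is only a character vector of the restored object names.
duh_data <- load("NSDUH 2022.Rdata")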

nsduh <- puf2022_110424 

nsduh_clean <- nsduh |> filter(ALCDAYS<32, DSTNRV30<6)

#ALCDAYS = During the past 30 days, on how many days did you drink one or more drinks of an alcoholic beverage?
#DSTNRV30 = During the past 30 days, how often did you feel nervous?
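The two thresholds in the filter step keep only values in the substantive answer range and drop any larger codes. A minimal sketch of that behaviour on made-up data follows; the values 91, 93, 94 and 99 are stand-ins for special codes (an assumption for illustration, so check the codebook for the real ones), and `toy`, `toy_clean` and `dropped` are hypothetical names. It uses base R subsetting so it runs without any packages.

```r
# Toy data only: 91, 93, 94 and 99 are assumed stand-ins for special codes,
# not values taken from the survey codebook.
toy <- data.frame(
  ALCDAYS  = c(2, 30, 91, 93, 5, 94),
  DSTNRV30 = c(1, 5, 3, 2, 99, 4)
)

# Same two conditions as the filter() call, written in base R.
toy_clean <- toy[toy$ALCDAYS < 32 & toy$DSTNRV30 < 6, ]

# Count how many rows each condition would remove on its own.
dropped <- c(
  ALCDAYS  = sum(toy$ALCDAYS >= 32),
  DSTNRV30 = sum(toy$DSTNRV30 >= 6)
)

print(toy_clean)
print(dropped)
```

Running the same counts on the real data frame before filtering is one way to see how much of the sample each condition excludes.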

cor(nsduh_clean$ALCDAYS, nsduh_clean$DSTNRV30)

## [1] 0.02035272

pairs(~ALCDAYS+DSTNRV30, data=nsduh_clean)

cor.test(nsduh_clean$ALCDAYS,nsduh_clean$DSTNRV30,method="spearman")

## Warning in cor.test.default(nsduh_clean$ALCDAYS, nsduh_clean$DSTNRV30, method =
## "spearman"): Cannot compute exact p-value with ties

## 
##  Spearman's rank correlation rho
## 
## data:  nsduh_clean$ALCDAYS and nsduh_clean$DSTNRV30
## S = 2.3958e+12, p-value = 0.1292
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##         rho 
## 0.009715473
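Spearman's rho is Pearson's correlation computed on the ranks of the data, with tied values given their average rank. The sketch below checks this on made-up vectors (`x` and `y` are hypothetical, not survey data). It also passes `exact = FALSE` to `cor.test()`, which requests the approximate p-value directly and so avoids the ties warning shown above.

```r
# Made-up vectors with ties, for illustration only.
x <- c(1, 2, 2, 5, 30, 1, 3)
y <- c(5, 4, 4, 3, 1, 5, 2)

# Built-in Spearman correlation.
rho_builtin <- cor(x, y, method = "spearman")

# Pearson correlation of the ranks; rank() averages tied values by default.
rho_by_hand <- cor(rank(x), rank(y))

# exact = FALSE asks for the approximate p-value, so no ties warning is raised.
res <- cor.test(x, y, method = "spearman", exact = FALSE)

print(c(builtin = rho_builtin, by_hand = rho_by_hand))
print(res$p.value)
```

Because the two estimates agree, the warning in the real output does not affect rho itself; it only indicates that the p-value is an approximation.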

#The variables are not normally distributed, and the histograms showed that they are strongly skewed. Most respondents reported low drinking frequency and low nervousness. This is also a very large sample. For these reasons, I believe that a non-parametric test like Spearman’s correlation was most appropriate for assessing the relationship between the two variables. The test returned rho = 0.0097 with p = 0.1292, which is not statistically significant at the 0.05 level. The results genuinely surprised me; I thought for sure that there would be a stronger positive relationship between alcohol consumption and psychological distress.


Erik Mata

2025-10-15