library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)

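# load() restores the saved objects into the workspace; its return value
# (stored in duh_data) is only a character vector of the object names.
duh_data <- load("NSDUH 2022.Rdata")
nsduh <- puf2022_110424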
pairs(~DSTNRV30 + BNGDRMDAYS + MJDAY30A + CIG30USE + COCUS30A + METHAM30N + PNRNM30FQ, data = nsduh)

model1 <- lm(DSTNRV30 ~ BNGDRMDAYS + MJDAY30A + CIG30USE + COCUS30A + METHAM30N + PNRNM30FQ, data = nsduh)
summary(model1)
## 
## Call:
## lm(formula = DSTNRV30 ~ BNGDRMDAYS + MJDAY30A + CIG30USE + COCUS30A + 
##     METHAM30N + PNRNM30FQ, data = nsduh)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -31.592 -26.435 -20.387   0.808 107.863 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -15.955689   3.635631  -4.389 1.14e-05 ***
## BNGDRMDAYS    4.716930   0.114330  41.257  < 2e-16 ***
## MJDAY30A      0.072222   0.005881  12.281  < 2e-16 ***
## CIG30USE      0.150844   0.006936  21.748  < 2e-16 ***
## COCUS30A     -0.020433   0.022978  -0.889    0.374    
## METHAM30N     0.038633   0.031141   1.241    0.215    
## PNRNM30FQ     0.009348   0.021026   0.445    0.657    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 38.55 on 59062 degrees of freedom
## Multiple R-squared:  0.05059,    Adjusted R-squared:  0.05049 
## F-statistic: 524.5 on 6 and 59062 DF,  p-value: < 2.2e-16

5. The R-squared is about 0.05, which means the independent variables explain only about 5% of the variation in the psychological distress score. This may be expected in social and behavioral data, since human behavior is influenced by many factors besides substance use. According to their p-values, binge drinking, marijuana use, and cigarette use are all statistically significant in relation to nervousness (p < 2e-16 for each). The insignificant variables are cocaine use (p = 0.374), meth use (p = 0.215), and pain reliever misuse (p = 0.657).
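
The figures quoted in this answer can also be pulled from the model object rather than read off the printed summary. The following is a minimal sketch that uses the built-in `mtcars` data as a stand-in, because the NSDUH file is not bundled here; `fit`, `s`, and `coefs` are illustrative names, and the 5% cutoff is an assumed convention.

```r
# Stand-in model on built-in data; replace `fit` with model1 for the NSDUH values.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit)

r2 <- s$r.squared          # share of outcome variance explained by the model
adj_r2 <- s$adj.r.squared  # same, penalised for the number of predictors

# Coefficient table as a data frame, one row per term
coefs <- as.data.frame(s$coefficients)
names(coefs) <- c("estimate", "std_error", "t_value", "p_value")

# Flag predictors significant at the (assumed) 5% level, dropping the intercept
coefs$significant <- coefs$p_value < 0.05
coefs[rownames(coefs) != "(Intercept)", c("estimate", "p_value", "significant")]
```

Applied to `model1`, the same steps should reproduce the R-squared and p-values shown in the summary output above.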

6. According to the beta coefficients, binge drinking has the strongest positive association with nervousness of the variables in the model. Each additional binge drinking day is linked to an increase of about 4.7 points in the distress score, holding the other variables constant. Cigarette and marijuana use are also associated with higher distress, but with much smaller coefficients (about 0.15 and 0.07 points per one-unit increase, respectively).
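
Raw coefficients depend on the units of each predictor, so one way to compare their relative strength is to standardize them. The sketch below is an illustration on the built-in `mtcars` data, not the NSDUH model; `fit_raw`, `fit_std`, and `beta_std` are hypothetical names. It shows two equivalent routes: refitting on z-scored variables, and rescaling each raw slope by sd(x) / sd(y).

```r
# Stand-in data: built-in mtcars, because the NSDUH file is not bundled here.
fit_raw <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Route 1: z-score the outcome and every predictor, then refit. Each slope is
# now "standard deviations of outcome per standard deviation of predictor".
vars <- c("mpg", "wt", "hp", "qsec")
scaled <- as.data.frame(scale(mtcars[, vars]))
fit_std <- lm(mpg ~ wt + hp + qsec, data = scaled)

# Route 2: rescale the raw slopes directly, beta_std = beta_raw * sd(x) / sd(y)
sds <- sapply(mtcars[, c("wt", "hp", "qsec")], sd)
beta_std <- coef(fit_raw)[-1] * sds / sd(mtcars$mpg)

# Rank predictors by the size of their standardized slope
sort(abs(beta_std), decreasing = TRUE)
```

With the NSDUH model, the standard deviations would need to be computed on the rows `lm()` actually used, for example from `model.frame(model1)`, since rows with missing values are dropped during fitting.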

plot(model1, which = 1)

7. The residuals-vs-fitted plot shows a visible downward trend in the points, which makes me unsure whether the model strictly meets the assumption of linearity. However, the red line is mostly straight and doesn’t show a strong curve. Overall, I’d say the model probably meets the assumption, but not perfectly.
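
One possible reason for a downward pattern alongside a fairly flat red line is an outcome that takes only a few distinct values: because residual = observed - fitted, all points sharing the same observed value fall on a line with slope -1. The sketch below illustrates this with simulated data; the 1 to 5 outcome scale, the variable names, and the coefficients are assumptions made for the example, not properties confirmed for the NSDUH variables.

```r
set.seed(1)

# Simulated predictor and a hypothetical outcome restricted to the integers 1..5
x <- runif(500, 0, 30)
y <- pmin(pmax(round(2 + 0.05 * x + rnorm(500)), 1), 5)

fit <- lm(y ~ x)
fv <- fitted(fit)
res <- resid(fit)

# Residuals vs fitted, with the same kind of lowess smoother plot.lm() draws
plot(fv, res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
lines(lowess(fv, res), col = "red")

# Each band is the line res = y - fv for one outcome level, so adding the
# fitted value back to the residual recovers the observed value.
max_gap <- max(abs((res + fv) - y))
max_gap  # numerically zero
```

In a plot like this, the smoother is the part that speaks to linearity; the diagonal bands only reflect how the outcome is recorded.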