library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
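# load() restores the saved objects (here puf2022_110424) into the workspace;
# its return value, kept in duh_data, is only a character vector of their names
duh_data <- load("NSDUH 2022.Rdata")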
nsduh <- puf2022_110424
pairs(~DSTNRV30 + BNGDRMDAYS + MJDAY30A + CIG30USE + COCUS30A + METHAM30N + PNRNM30FQ, data = nsduh)

model1 <- lm(DSTNRV30 ~ BNGDRMDAYS + MJDAY30A + CIG30USE + COCUS30A + METHAM30N + PNRNM30FQ, data = nsduh)
summary(model1)
##
## Call:
## lm(formula = DSTNRV30 ~ BNGDRMDAYS + MJDAY30A + CIG30USE + COCUS30A +
## METHAM30N + PNRNM30FQ, data = nsduh)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31.592 -26.435 -20.387 0.808 107.863
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15.955689 3.635631 -4.389 1.14e-05 ***
## BNGDRMDAYS 4.716930 0.114330 41.257 < 2e-16 ***
## MJDAY30A 0.072222 0.005881 12.281 < 2e-16 ***
## CIG30USE 0.150844 0.006936 21.748 < 2e-16 ***
## COCUS30A -0.020433 0.022978 -0.889 0.374
## METHAM30N 0.038633 0.031141 1.241 0.215
## PNRNM30FQ 0.009348 0.021026 0.445 0.657
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 38.55 on 59062 degrees of freedom
## Multiple R-squared: 0.05059, Adjusted R-squared: 0.05049
## F-statistic: 524.5 on 6 and 59062 DF, p-value: < 2.2e-16
5. The R-squared is about 0.05, which means that the independent
variables explain only about 5% of the variation in the psychological
distress score. This may be expected in social and behavioral data,
since human behavior is influenced by many factors besides substance
use. According to their p-values, binge drinking, marijuana use, and
cigarette use are all statistically significant in relation to
nervousness (each p < 2e-16). The variables that are not significant
are cocaine use (p = 0.374), meth use (p = 0.215), and pain reliever
misuse (p = 0.657).
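The significance calls above can also be read off the fitted object instead of the printed table. The following is a minimal sketch, assuming the model is still in the workspace as `model1`; the helper name `split_by_significance` and the 0.05 cutoff are illustrative choices, not part of the original analysis.

```r
# Hypothetical helper: split a fitted lm's predictors by p-value at `alpha`.
split_by_significance <- function(model, alpha = 0.05) {
  coefs <- summary(model)$coefficients
  # Drop the intercept so that only predictors are classified
  coefs <- coefs[rownames(coefs) != "(Intercept)", , drop = FALSE]
  is_sig <- coefs[, "Pr(>|t|)"] < alpha
  list(
    r_squared       = summary(model)$r.squared,
    significant     = rownames(coefs)[is_sig],
    not_significant = rownames(coefs)[!is_sig]
  )
}

# Runs only when the model from this analysis is in the workspace
if (exists("model1")) print(split_by_significance(model1))
```

Listing the terms programmatically makes it harder to overlook a predictor when summarizing a long coefficient table.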
6. According to the beta coefficients, binge drinking has the largest
positive coefficient, and so the strongest estimated association with
nervousness, of any variable in the model. Each additional binge
drinking day is linked to an increase of about 4.7 points in the
distress score, holding the other variables constant. Cigarette and
marijuana use are also associated with higher distress, but with much
smaller coefficients (about 0.15 and 0.07 points per one-unit increase).
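Raw coefficients depend on the units of each variable, so one way to compare predictors on a common footing is to rescale each slope into standard deviations of the outcome per standard deviation of the predictor. This is a rough sketch rather than part of the original analysis; `standardized_betas` is a hypothetical helper, and it assumes every predictor is numeric and enters the formula untransformed, as in `model1`.

```r
# Hypothetical helper: standardized slopes, b * sd(x) / sd(y).
# Assumes all predictors are numeric and untransformed in the formula.
standardized_betas <- function(model) {
  mf <- model.frame(model)                # first column is the response
  sd_y <- sd(mf[[1]])
  sd_x <- vapply(mf[-1], sd, numeric(1))  # one SD per predictor column
  b_std <- coef(model)[names(sd_x)] * sd_x / sd_y
  # Order from largest to smallest absolute standardized slope
  b_std[order(abs(b_std), decreasing = TRUE)]
}

# Runs only when the model from this analysis is in the workspace
if (exists("model1")) print(standardized_betas(model1))
```

If the ranking of predictors changes after standardizing, the difference in raw coefficients was partly a matter of measurement scale.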
plot(model1, which = 1)

7. The plot shows a visible downward trend, which makes me unsure
whether the model strictly meets the assumption of linearity. The red
line is mostly straight, though, and doesn’t show a strong curve.
Overall, I’d say the model probably meets the assumption, but not
perfectly.
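Reading a residual plot by eye is subjective, so a numeric companion can help. The sketch below is an illustration, not part of the original analysis: `binned_residual_means` is a hypothetical helper that groups observations into bins of fitted values and averages the residuals within each bin. Under a well-specified linear model those averages should hover near zero with no systematic pattern across bins.

```r
# Hypothetical helper: mean residual within quantile bins of the fitted values.
binned_residual_means <- function(model, bins = 10) {
  f <- fitted(model)
  # Quantile-based break points; unique() guards against tied quantiles
  breaks <- unique(quantile(f, probs = seq(0, 1, length.out = bins + 1)))
  groups <- cut(f, breaks = breaks, include.lowest = TRUE)
  tapply(resid(model), groups, mean)
}

# Runs only when the model from this analysis is in the workspace
if (exists("model1")) print(binned_residual_means(model1))
```

A run of bin means that drifts steadily in one direction, or that is positive at both ends and negative in the middle, would point toward a violation of linearity.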