library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:dplyr':
##
## select
duh_data <- load("NSDUH 2022.Rdata")
nsduh <- puf2022_110424
model1<-lm(DSTNRV30 ~ BNGDRMDAYS + MJDAY30A + CIG30USE + COCUS30A + METHAM30N + PNRNM30FQ, data = nsduh)
plot(model1,which=1)

raintest(model1)
##
## Rainbow test
##
## data: model1
## Rain = 1.0151, df1 = 29535, df2 = 29527, p-value = 0.09906
because the pvalue is greater than .05, there is no evidence of
nonlinearity and the model meets the linearity assumption.
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
durbinWatsonTest(model1)
## lag Autocorrelation D-W Statistic p-value
## 1 -0.005955922 2.011897 0.162
## Alternative hypothesis: rho != 0
because the pvalue is greater than .05, this means the errors are
independent and the model meets the independence of errors
assumption
plot(model1,which=3)

bptest(model1)
##
## studentized Breusch-Pagan test
##
## data: model1
## BP = 4189.2, df = 6, p-value < 2.2e-16
the plot shows a noticable curved pattern which indicates that the
variance is not constant across the model; additionaly, the BP test has
a pvalue well below .05 meaning homoscedasticity is rejected and
violates the assumption
plot(model1,which=2)

ks.test(model1$residuals,"pnorm")
## Warning in ks.test.default(model1$residuals, "pnorm"): ties should not be
## present for the one-sample Kolmogorov-Smirnov test
##
## Asymptotic one-sample Kolmogorov-Smirnov test
##
## data: model1$residuals
## D = 0.72628, p-value < 2.2e-16
## alternative hypothesis: two-sided
my plot shows a strong S shape which means they deviate from the
line. This indicates that the residuals are not normally distributed and
the normality assumption is violated. Because the dataset is so large,
the sample size exceeds the shapiro test limit and could not be
run.
vif(model1)
## BNGDRMDAYS MJDAY30A CIG30USE COCUS30A METHAM30N PNRNM30FQ
## 1.064535 1.093127 1.077673 1.023520 1.034364 1.025844