Research Question: What individual and behavioral factors are associated with the number of mentally unhealthy days reported by U.S. adults, and does the association between current smoking and mental health differ by sex?

Part 0: Data Acquistion and Preparation (15 points)

0.1 Packages and Import

# Load required packages
library(tidyverse)
library(haven)
library(kableExtra)
library(broom)
library(car)
library(gtsummary)
library(ggeffects)
library(dplyr)
#upload BRFSS datatset
library(haven)
brfss_raw <- read_xpt("~/Desktop/EPI553/data/LLCP2023.XPT ")
# Quick check
glimpse(brfss_raw)
## Rows: 433,323
## Columns: 350
## $ `_STATE`   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ FMONTH     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ IDATE      <chr> "03012023", "01062023", "03082023", "03062023", "01062023",…
## $ IMONTH     <chr> "03", "01", "03", "03", "01", "01", "03", "01", "03", "01",…
## $ IDAY       <chr> "01", "06", "08", "06", "06", "09", "21", "06", "15", "08",…
## $ IYEAR      <chr> "2023", "2023", "2023", "2023", "2023", "2023", "2023", "20…
## $ DISPCODE   <dbl> 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100,…
## $ SEQNO      <chr> "2023000001", "2023000002", "2023000003", "2023000004", "20…
## $ `_PSU`     <dbl> 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.02…
## $ CTELENM1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ PVTRESD1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ COLGHOUS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STATERE1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ CELPHON1   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ LADULT1    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ NUMADULT   <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1,…
## $ RESPSLC1   <dbl> 1, NA, NA, 1, NA, 1, NA, NA, NA, 2, 1, 1, 1, 1, 1, NA, NA, …
## $ LANDSEX2   <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ LNDSXBRT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SAFETIME   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CTELNUM1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELLFON5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CADULT1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELLSEX2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELSXBRT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PVTRESD3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CCLGHOUS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSTATE1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LANDLINE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HHADULT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SEXVAR     <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ GENHLTH    <dbl> 2, 2, 4, 2, 4, 3, 4, 4, 3, 3, 3, 2, 2, 3, 3, 4, 5, 3, 3, 3,…
## $ PHYSHLTH   <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30,…
## $ MENTHLTH   <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ POORHLTH   <dbl> NA, NA, 1, 88, NA, 88, 88, 1, 88, NA, NA, 88, NA, NA, NA, 1…
## $ PRIMINS1   <dbl> 3, 3, 3, 3, 3, 7, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ PERSDOC3   <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,…
## $ MEDCOST1   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHECKUP1   <dbl> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ EXERANY2   <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1,…
## $ EXRACT12   <dbl> NA, 1, 1, 1, 1, 1, NA, NA, NA, 1, 1, NA, 1, NA, NA, 1, NA, …
## $ EXEROFT1   <dbl> NA, 106, 205, 103, 102, 212, NA, NA, NA, 230, 105, NA, 105,…
## $ EXERHMM1   <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA,…
## $ EXRACT22   <dbl> NA, 88, 88, 88, 8, 88, NA, NA, NA, 88, 77, NA, 3, NA, NA, 8…
## $ EXEROFT2   <dbl> NA, NA, NA, NA, 107, NA, NA, NA, NA, NA, NA, NA, 105, NA, N…
## $ EXERHMM2   <dbl> NA, NA, NA, NA, 100, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA…
## $ STRENGTH   <dbl> 888, 888, 205, 888, 888, 203, 777, 888, 888, 888, 105, 888,…
## $ BPHIGH6    <dbl> 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1,…
## $ BPMEDS1    <dbl> 1, 2, 1, NA, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, 1, 1, 1, 1…
## $ CHOLCHK3   <dbl> 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,…
## $ TOLDHI3    <dbl> 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ CHOLMED3   <dbl> 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1,…
## $ CVDINFR4   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CVDCRHD4   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CVDSTRK3   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2,…
## $ ASTHMA3    <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ ASTHNOW    <dbl> NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CHCSCNC1   <dbl> 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHCOCNC1   <dbl> 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ CHCCOPD3   <dbl> 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2,…
## $ ADDEPEV3   <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHCKDNY2   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 7, 1, 2, 2,…
## $ HAVARTH4   <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ DIABETE4   <dbl> 1, 3, 3, 3, 1, 3, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1,…
## $ DIABAGE4   <dbl> 57, NA, NA, NA, 68, NA, NA, 98, NA, 60, NA, NA, NA, NA, NA,…
## $ MARITAL    <dbl> 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 3, 3,…
## $ EDUCA      <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4,…
## $ RENTHOM1   <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ NUMHHOL4   <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ NUMPHON4   <dbl> NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CPDEMO1C   <dbl> 1, 1, 8, 1, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ VETERAN3   <dbl> 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2,…
## $ EMPLOY1    <dbl> 7, 7, 7, 7, 8, 7, 8, 7, 7, 7, 4, 1, 7, 7, 7, 5, 7, 7, 7, 7,…
## $ CHILDREN   <dbl> 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88,…
## $ INCOME3    <dbl> 99, 99, 2, 99, 7, 7, 6, 99, 6, 7, 7, 99, 9, 9, 9, 3, 77, 99…
## $ PREGNANT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WEIGHT2    <dbl> 172, 132, 130, 170, 170, 165, 180, 135, 195, 160, 300, 145,…
## $ HEIGHT3    <dbl> 503, 409, 504, 506, 508, 502, 600, 411, 504, 510, 511, 505,…
## $ DEAF       <dbl> 2, 1, 7, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,…
## $ BLIND      <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ DECIDE     <dbl> 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 7,…
## $ DIFFWALK   <dbl> 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1,…
## $ DIFFDRES   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2,…
## $ DIFFALON   <dbl> 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ FALL12MN   <dbl> 88, 88, 88, 88, 3, 88, 8, 88, 88, 88, 88, 88, 88, 1, 88, 1,…
## $ FALLINJ5   <dbl> NA, NA, NA, NA, 88, NA, 88, NA, NA, NA, NA, NA, NA, 1, NA, …
## $ SMOKE100   <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1,…
## $ SMOKDAY2   <dbl> NA, NA, 3, NA, NA, NA, 3, NA, NA, 3, NA, NA, 3, 3, 3, NA, N…
## $ USENOW3    <dbl> 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ ECIGNOW2   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 1, 4, 1,…
## $ ALCDAY4    <dbl> 888, 888, 888, 888, 202, 205, 888, 888, 888, 888, 888, 888,…
## $ AVEDRNK3   <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 2, NA, NA,…
## $ DRNK3GE5   <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA,…
## $ MAXDRNKS   <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 3, NA, NA,…
## $ FLUSHOT7   <dbl> 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ FLSHTMY3   <dbl> NA, 92023, 112022, 102022, NA, NA, 32023, 102023, 777777, 1…
## $ PNEUVAC4   <dbl> 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 1,…
## $ SHINGLE2   <dbl> 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2,…
## $ HIVTST7    <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2,…
## $ HIVTSTD3   <dbl> NA, NA, NA, 777777, NA, 777777, 92022, NA, NA, NA, NA, NA, …
## $ SEATBELT   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DRNKDRI2   <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA,…
## $ COVIDPO1   <dbl> 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 1,…
## $ COVIDSM1   <dbl> NA, NA, NA, NA, NA, 2, NA, 2, 2, NA, 1, 2, NA, 2, 2, NA, NA…
## $ COVIDACT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, …
## $ PDIABTS1   <dbl> NA, 2, 1, 1, NA, 1, 1, NA, 1, NA, 1, 1, 1, 1, 1, 1, 7, 1, 8…
## $ PREDIAB2   <dbl> NA, 3, 3, 3, NA, 3, 3, NA, 1, NA, 3, 3, 1, 3, 3, 3, 3, 3, 3…
## $ DIABTYPE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INSULIN1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CHKHEMO3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ EYEEXAM1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DIABEYE1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DIABEDU1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FEETSORE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHEXER   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHEDU    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LMTJOIN3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHDIS2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ JOINPAI2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSFIRST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSLAST    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSNUMCG   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSCTSC1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSSCNCR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSCTWHN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADMAM     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HOWLONG    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CERVSCRN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLCNC   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLPAP   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLHPV   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADHYST2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSATEST1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSATIME1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PCPSARS2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSASUGS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PCSTALK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADSIGM4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNSIGM   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNTES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIGMTES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LASTSIG4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNCNCR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ VIRCOLO1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ VCLNTES2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SMALSTOL   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STOLTEST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STOOLDN2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BLDSTFIT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SDNATES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRDIFF   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRAGE    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRTYP2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVTRT3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVDOC1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVSUM    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVRTRN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVINST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVINSR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVDEIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVCLIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVPAIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVCTL2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDORTAN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ NUMBURN3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SUNPRTCT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WKDAYOUT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WKENDOUT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CIMEMLO1   <dbl> 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, NA, 2, 2, 2, 1, 2, 2…
## $ CDWORRY    <dbl> NA, NA, 2, NA, 1, 1, NA, NA, NA, NA, NA, NA, 7, NA, NA, NA,…
## $ CDDISCU1   <dbl> NA, NA, 1, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CDHOUS1    <dbl> NA, NA, 2, NA, 2, 1, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CDSOCIA1   <dbl> NA, NA, 2, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CAREGIV1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVREL4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVLNG1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVHRS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVPRB3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVALZD   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVPER1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVHOU1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVEXPT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LASTSMK2   <dbl> NA, NA, 7, NA, NA, NA, 7, NA, NA, 7, NA, NA, 7, NA, 7, NA, …
## $ STOPSMK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MENTCIGS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MENTECIG   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HEATTBCO   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FIREARM5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ GUNLOAD    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LOADULK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP2    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP3    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP4    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP5    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP6    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP6   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FIRSTAID   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ASPIRIN    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BIRTHSEX   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SOMALE     <dbl> NA, NA, NA, NA, NA, NA, 2, NA, NA, 2, 2, NA, NA, 2, NA, NA,…
## $ SOFEMALE   <dbl> 2, 2, 2, 2, 2, 2, NA, 2, 2, NA, NA, 2, 2, NA, 2, 2, 2, NA, …
## $ TRNSGNDR   <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ MARIJAN1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJSMOK   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJEAT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJVAPE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJDAB    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJOTHR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ USEMRJN4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDEPRS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDRINK   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDRUGS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEPRISN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDIVRC   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEPUNCH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEHURT1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACESWEAR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACETOUCH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACETTHEM   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEHVSEX   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEADSAF   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEADNED   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ IMFVPLA4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HPVADVC4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HPVADSHT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TETANUS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVIDVA1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVACGE1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVIDNU2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LSATISFY   <dbl> 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, NA, 1, 2, 3, 1, 4, 2…
## $ EMTSUPRT   <dbl> 1, 2, 4, 1, 2, 2, 1, 1, 1, 3, 1, 1, 2, NA, 1, 1, 2, 1, 4, 2…
## $ SDLONELY   <dbl> 5, 3, 3, 3, 2, 3, 4, 5, 3, 3, 4, 5, 4, NA, 4, 3, 4, 5, 1, 2…
## $ SDHEMPLY   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ FOODSTMP   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHFOOD1   <dbl> 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 5, NA, 5, 5, 5, 5, 5, 5…
## $ SDHBILLS   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHUTILS   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHTRNSP   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHSTRE1   <dbl> 5, 5, 3, 5, 2, 3, 5, 5, 4, 4, 4, 4, 4, NA, 4, 5, 4, 5, 4, 4…
## $ RRCLASS3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRCOGNT2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRTREAT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRATWRK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRHCARE4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRPHYSM2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSGEND1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSXBRTH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSRLTN2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CASTHDX2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CASTHNO2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ QSTVER     <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ QSTLANG    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_METSTAT` <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2,…
## $ `_URBSTAT` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MSCODE     <dbl> 1, 1, 3, 2, 5, 1, 2, 5, 3, 1, 1, 1, 1, 5, 1, 2, 5, 5, 5, 1,…
## $ `_STSTR`   <dbl> 11011, 11012, 11011, 11011, 11011, 11011, 11011, 11012, 110…
## $ `_STRWT`   <dbl> 42.79158, 170.42939, 42.79158, 42.79158, 42.79158, 42.79158…
## $ `_RAWRAKE` <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1,…
## $ `_WT2RAKE` <dbl> 85.58316, 170.42939, 42.79158, 85.58316, 42.79158, 85.58316…
## $ `_IMPRACE` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_CHISPNC` <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## $ `_CRACE1`  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CAGEG      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_CLLCPWT` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_DUALUSE` <dbl> 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DUALCOR` <dbl> 0.5026388, 0.5026388, NA, 0.5026388, 0.5026388, 0.5026388, …
## $ `_LLCPWT2` <dbl> 941.1640, 1874.2239, 1151.6032, 941.1640, 470.5820, 941.164…
## $ `_LLCPWT`  <dbl> 605.4279, 1121.9927, 600.9633, 605.4279, 281.7110, 1060.335…
## $ `_RFHLTH`  <dbl> 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1,…
## $ `_PHYS14D` <dbl> 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 3, 3, 1, 1,…
## $ `_MENT14D` <dbl> 1, 1, 2, 1, 1, 2, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1,…
## $ `_HLTHPL1` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_HCVU653` <dbl> 9, 9, 9, 9, 9, 1, 9, 9, 9, 9, 1, 9, 9, 9, 9, 9, 1, 9, 9, 9,…
## $ `_TOTINDA` <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1,…
## $ METVL12_   <dbl> NA, 35, 35, 35, 35, 35, NA, NA, NA, 35, 35, NA, 35, NA, NA,…
## $ METVL22_   <dbl> NA, NA, NA, NA, 33, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA,…
## $ MAXVO21_   <dbl> 1840, 1803, 1322, 1914, 1988, 2506, 1545, 1914, 1655, 1875,…
## $ FC601_     <dbl> 315, 309, 227, 328, 341, 430, 265, 328, 284, 321, 472, 328,…
## $ ACTIN13_   <dbl> NA, 2, 2, 2, 2, 1, NA, NA, NA, 2, 1, NA, 1, NA, NA, 2, NA, …
## $ ACTIN23_   <dbl> NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, N…
## $ PADUR1_    <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA,…
## $ PADUR2_    <dbl> NA, NA, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA,…
## $ PAFREQ1_   <dbl> NA, 6000, 1167, 3000, 2000, 2800, NA, NA, NA, 7000, 5000, N…
## $ PAFREQ2_   <dbl> NA, NA, NA, NA, 7000, NA, NA, NA, NA, NA, NA, NA, 5000, NA,…
## $ `_MINAC12` <dbl> NA, 180, 18, 90, 90, 126, NA, NA, NA, 210, 150, NA, 225, NA…
## $ `_MINAC22` <dbl> NA, 0, 0, 0, 420, 0, NA, NA, NA, 0, NA, NA, 225, NA, NA, 0,…
## $ STRFREQ_   <dbl> 0, 0, 1167, 0, 0, 700, NA, 0, 0, 0, 5000, 0, 0, 2000, 2000,…
## $ PAMISS3_   <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 9, 0, 0, 1,…
## $ PAMIN13_   <dbl> NA, 360, 36, 180, 180, 126, NA, NA, NA, 420, 150, NA, 225, …
## $ PAMIN23_   <dbl> NA, NA, NA, NA, 420, NA, NA, NA, NA, NA, NA, NA, 450, NA, N…
## $ PA3MIN_    <dbl> NA, 360, 36, 180, 600, 126, NA, NA, NA, 420, 150, NA, 675, …
## $ PAVIG13_   <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 0, NA, NA, …
## $ PAVIG23_   <dbl> NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 225, NA, NA,…
## $ PA3VIGM_   <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 225, NA, NA…
## $ `_PACAT3`  <dbl> 4, 1, 9, 9, 1, 9, 4, 4, 4, 1, 9, 4, 1, 4, 4, 9, 9, 4, 4, 9,…
## $ `_PAINDX3` <dbl> 2, 1, 9, 1, 1, 9, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1,…
## $ `_PA150R4` <dbl> 3, 1, 9, 1, 1, 9, 3, 3, 3, 1, 1, 3, 1, 3, 3, 1, 9, 3, 3, 1,…
## $ `_PA300R4` <dbl> 3, 1, 9, 9, 1, 9, 3, 3, 3, 1, 9, 3, 1, 3, 3, 9, 9, 3, 3, 9,…
## $ `_PA30023` <dbl> 2, 1, 9, 9, 1, 9, 2, 2, 2, 1, 9, 2, 1, 2, 2, 9, 9, 2, 2, 9,…
## $ `_PASTRNG` <dbl> 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2,…
## $ `_PAREC3`  <dbl> 4, 2, 9, 2, 2, 9, 9, 4, 4, 2, 1, 4, 2, 3, 3, 2, 9, 4, 4, 2,…
## $ `_PASTAE3` <dbl> 2, 2, 9, 2, 2, 9, 9, 2, 2, 2, 1, 2, 2, 2, 2, 2, 9, 2, 2, 2,…
## $ `_RFHYPE6` <dbl> 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2,…
## $ `_CHOLCH3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_RFCHOL3` <dbl> 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2,…
## $ `_MICHD`   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_LTASTH1` <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,…
## $ `_CASTHM1` <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 1, 1, 1,…
## $ `_ASTHMS1` <dbl> 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 9, 3, 3, 3,…
## $ `_DRDXAR2` <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ `_MRACE1`  <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_HISPANC` <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_RACE`    <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEG21` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEGR3` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEPRV` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_SEX`     <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ `_AGEG5YR` <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 1…
## $ `_AGE65YR` <dbl> 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ `_AGE80`   <dbl> 80, 80, 80, 78, 76, 62, 80, 78, 80, 75, 59, 78, 68, 79, 79,…
## $ `_AGE_G`   <dbl> 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 5, 6, 6, 6,…
## $ HTIN4      <dbl> 63, 57, 64, 66, 68, 62, 72, 59, 64, 70, 71, 65, 68, 73, 62,…
## $ HTM4       <dbl> 160, 145, 163, 168, 173, 157, 183, 150, 163, 178, 180, 165,…
## $ WTKG3      <dbl> 7802, 5987, 5897, 7711, 7711, 7484, 8165, 6123, 8845, 7257,…
## $ `_BMI5`    <dbl> 3047, 2856, 2231, 2744, 2585, 3018, 2441, 2727, 3347, 2296,…
## $ `_BMI5CAT` <dbl> 4, 3, 2, 3, 3, 4, 2, 3, 4, 2, 4, 2, 3, 2, 3, 3, 4, 3, 3, 3,…
## $ `_RFBMI5`  <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,…
## $ `_CHLDCNT` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_EDUCAG`  <dbl> 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 2, 4, 4, 3, 2,…
## $ `_INCOMG1` <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4,…
## $ `_SMOKER3` <dbl> 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3, 3, 4, 4, 4, 1, 3,…
## $ `_RFSMOK3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ `_CURECI2` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DRNKANY6   <dbl> 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1,…
## $ DROCDY4_   <dbl> 0, 0, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 29, 100, 0, 0, 0, 0, 3…
## $ `_RFBING6` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DRNKWK2` <dbl> 0, 0, 0, 0, 47, 233, 0, 0, 0, 0, 0, 0, 200, 1400, 0, 0, 0, …
## $ `_RFDRHV8` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_FLSHOT7` <dbl> 2, 1, 1, 1, 2, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 2, NA, 1, 2,…
## $ `_PNEUMO3` <dbl> 2, 1, 1, 1, 1, NA, 1, 1, 1, 1, NA, 2, 1, 1, 2, 1, NA, 2, 1,…
## $ `_AIDTST4` <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2,…
## $ `_RFSEAT2` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_RFSEAT3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DRNKDRV` <dbl> 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 2, 2,…

In the raw dataset there are 433,323 rows and 350 columns

0.2 Variable selection and recoding

#create new dataset with variables of interest (9 variables)
brfss_new <- brfss_raw %>%
  select(MENTHLTH, PHYSHLTH,'_BMI5', SEXVAR, EXERANY2, '_AGEG5YR', '_INCOMG1', EDUCA, '_SMOKER3')

#check new dataset 
glimpse(brfss_new)
## Rows: 433,323
## Columns: 9
## $ MENTHLTH   <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ PHYSHLTH   <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30,…
## $ `_BMI5`    <dbl> 3047, 2856, 2231, 2744, 2585, 3018, 2441, 2727, 3347, 2296,…
## $ SEXVAR     <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ EXERANY2   <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1,…
## $ `_AGEG5YR` <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 1…
## $ `_INCOMG1` <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4,…
## $ EDUCA      <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4,…
## $ `_SMOKER3` <dbl> 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3, 3, 4, 4, 4, 1, 3,…
# Recode variables

brfss_clean <- brfss_new %>%
  mutate(
    
    # meantally unhealthy days
    menthlth_days = case_when(
      MENTHLTH == 88 ~ 0,
      MENTHLTH %in% 77:99 ~ NA_real_,
      MENTHLTH >= 1 & MENTHLTH <= 30 ~ as.numeric(MENTHLTH),
      TRUE ~ NA_real_
    ),
    
     # Physically unhealthy days 
    physhlth_days = case_when(
      PHYSHLTH == 88 ~ 0,
      PHYSHLTH %in% 77:99 ~ NA_real_,
      PHYSHLTH >= 1 & PHYSHLTH <= 30 ~ as.numeric(PHYSHLTH),
      TRUE ~ NA_real_
    ),
    
    # BMI 
    bmi = ifelse(`_BMI5` > 0, `_BMI5` / 100, NA_real_),
    
    # sex
    sex = factor(SEXVAR,
                 levels = c(1, 2),
                 labels = c("Male", "Female")),
    
     # exercise 
    exercise = factor(case_when(
      EXERANY2 == 1 ~ "Yes",
      EXERANY2 == 2 ~ "No",
      TRUE          ~ NA_character_
    ), levels = c("No", "Yes")),
    
    # age groups 
    age_group=factor(case_when(
      `_AGEG5YR` == 1 ~ "18-24",
      `_AGEG5YR` == 2 ~ "25-29",
      `_AGEG5YR` == 3 ~ "30-34",
      `_AGEG5YR` == 4 ~ "35-39",
      `_AGEG5YR` == 5 ~ "40-44",
      `_AGEG5YR` == 6 ~ "45-49",
      `_AGEG5YR` == 7 ~ "50-54",
      `_AGEG5YR` == 8 ~ "55-59",
      `_AGEG5YR` == 9 ~ "60-64",
      `_AGEG5YR` == 10 ~ "65-69",
      `_AGEG5YR` == 11 ~ "70-74",
      `_AGEG5YR` == 12 ~ "75-79",
      `_AGEG5YR` == 13 ~ "80+",
      `_AGEG5YR` == 14 ~ NA_character_,
      TRUE ~ NA_character_
    )),
    
    # income
    income= factor(case_when(
      `_INCOMG1` == 1 ~ "Less than $15,000",
      `_INCOMG1` == 2 ~ "$15,000–$24,999",
      `_INCOMG1` == 3 ~ "$25,000–$34,999",
      `_INCOMG1` == 4 ~ "$35,000–$49,999",
      `_INCOMG1` == 5 ~ "$50,000–$99,999",
      `_INCOMG1` == 6 ~ "$100,000–$199,999",
      `_INCOMG1` == 7 ~ "$200,000 or more",
      `_INCOMG1` == 9 ~ NA_character_,
      TRUE ~ NA_character_
    )),

    # education 
    education = factor(case_when(
      EDUCA %in% c(1, 2) ~ "Less than HS",
      EDUCA == 3 ~ "HS diploma or GED",
      EDUCA == 4 ~ "Some college or technical school",
      EDUCA == 5 ~ "College graduate",
      EDUCA == 6 ~ "Graduate or professional degree",
      EDUCA == 9 ~ NA_character_,
      TRUE ~ NA_character_
    )), 

     # Smoking status
    smoking = factor(case_when(
      `_SMOKER3` == 1 ~ "Current daily smoker",
      `_SMOKER3` == 2 ~ "Current some-day smoker",
      `_SMOKER3` == 3 ~ "Former smoker",
      `_SMOKER3` == 4 ~ "Never smoker",
      `_SMOKER3` == 9 ~ NA_character_,
      TRUE ~ NA_character_
    ))
    
    )

0.3 Missingness Report

# Report missing values for the key variables
cat("Total N:", nrow(brfss_clean), "\n\n")
## Total N: 433323
miss_info <- function(x) {
  n <- sum(is.na(x))
  pct <- round(100 * n / length(x), 2)
  paste0(n, " (", pct, "%)")
}

cat("Missing mentally unhealth days:", miss_info(brfss_clean$menthlth_days), "\n")
## Missing mentally unhealth days: 8108 (1.87%)
cat("Missing physically unhealthy days:", miss_info(brfss_clean$physhlth_days), "\n")
## Missing physically unhealthy days: 10785 (2.49%)
cat("Missing bmi:", miss_info(brfss_clean$bmi), "\n")
## Missing bmi: 40535 (9.35%)
cat("Missing sex:", miss_info(brfss_clean$sex), "\n")
## Missing sex: 0 (0%)
cat("Missing exercise:", miss_info(brfss_clean$exercise), "\n")
## Missing exercise: 1251 (0.29%)
cat("Missing age:", miss_info(brfss_clean$age_group), "\n")
## Missing age: 7779 (1.8%)
cat("Missing income:", miss_info(brfss_clean$income), "\n")
## Missing income: 86623 (19.99%)
cat("Missing education:", miss_info(brfss_clean$education), "\n")
## Missing education: 2325 (0.54%)
cat("Missing smoking:", miss_info(brfss_clean$smoking), "\n")
## Missing smoking: 23062 (5.32%)
brfss_clean <- brfss_clean %>%
  mutate(
    income = relevel(income, ref = "Less than $15,000")
  )

0.4 Analytic Sample

set.seed(1220)
brfss_analytic <- brfss_clean |>
drop_na(menthlth_days, physhlth_days, bmi, sex, exercise,
age_group, income, education, smoking) |>
slice_sample(n = 8000)

summary(brfss_analytic)
##     MENTHLTH        PHYSHLTH         _BMI5          SEXVAR         EXERANY2   
##  Min.   : 1.00   Min.   : 1.00   Min.   :1226   Min.   :1.000   Min.   :1.00  
##  1st Qu.:10.00   1st Qu.:10.00   1st Qu.:2433   1st Qu.:1.000   1st Qu.:1.00  
##  Median :88.00   Median :88.00   Median :2750   Median :2.000   Median :1.00  
##  Mean   :56.29   Mean   :57.36   Mean   :2870   Mean   :1.508   Mean   :1.22  
##  3rd Qu.:88.00   3rd Qu.:88.00   3rd Qu.:3189   3rd Qu.:2.000   3rd Qu.:1.00  
##  Max.   :88.00   Max.   :88.00   Max.   :9402   Max.   :2.000   Max.   :2.00  
##                                                                               
##     _AGEG5YR         _INCOMG1         EDUCA          _SMOKER3    
##  Min.   : 1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 5.000   1st Qu.:4.000   1st Qu.:4.000   1st Qu.:3.000  
##  Median : 8.000   Median :5.000   Median :5.000   Median :4.000  
##  Mean   : 7.696   Mean   :4.598   Mean   :5.119   Mean   :3.404  
##  3rd Qu.:11.000   3rd Qu.:6.000   3rd Qu.:6.000   3rd Qu.:4.000  
##  Max.   :13.000   Max.   :7.000   Max.   :6.000   Max.   :4.000  
##                                                                  
##  menthlth_days    physhlth_days         bmi            sex       exercise  
##  Min.   : 0.000   Min.   : 0.000   Min.   :12.26   Male  :3936   No :1760  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:24.33   Female:4064   Yes:6240  
##  Median : 0.000   Median : 0.000   Median :27.50                           
##  Mean   : 4.318   Mean   : 4.426   Mean   :28.70                           
##  3rd Qu.: 5.000   3rd Qu.: 4.000   3rd Qu.:31.89                           
##  Max.   :30.000   Max.   :30.000   Max.   :94.02                           
##                                                                            
##    age_group                  income    
##  65-69  : 901   Less than $15,000: 407  
##  70-74  : 808   $100,000–$199,999:1865  
##  60-64  : 787   $15,000–$24,999  : 641  
##  75-79  : 663   $200,000 or more : 638  
##  55-59  : 660   $25,000–$34,999  : 871  
##  80+    : 631   $35,000–$49,999  :1067  
##  (Other):3550   $50,000–$99,999  :2511  
##                             education                       smoking    
##  College graduate                :2138   Current daily smoker   : 658  
##  Graduate or professional degree :3659   Current some-day smoker: 268  
##  HS diploma or GED               : 252   Former smoker          :2262  
##  Less than HS                    : 124   Never smoker           :4812  
##  Some college or technical school:1827                                 
##                                                                        
## 

0.5 Descriptive Table

brfss_analytic %>%
  select(menthlth_days, physhlth_days, bmi,
         income, sex, exercise, age_group, education, smoking) %>%
  tbl_summary(
    label = list(
      menthlth_days ~ "Mentally unhealthy days (past 30)",
      physhlth_days ~ "Physically unhealthy days (past 30)",
      bmi ~ "body mass index",
      sex ~ "Sex",
      exercise ~ "Any physical activity (past 30 days)",
      age_group ~ "Age group in 5 year categories",
      income ~ "Annual household income",
      education ~ "Highest level of education completed",
      smoking ~ "Smoking status"
    ),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 1,
    missing = "no"
  ) %>%
  add_n() %>%
  # add_overall() %>%
  bold_labels() %>%
  italicize_levels() %>%
  modify_caption("**Table 1. Descriptive Statistics — BRFSS 2023 Analytic Sample (n = 8,000)**") %>%
  as_flex_table()
**Table 1. Descriptive Statistics — BRFSS 2023 Analytic Sample (n = 8,000)**

Characteristic

N

N = 8,0001

Mentally unhealthy days (past 30)

8,000

4.3 (8.1)

Physically unhealthy days (past 30)

8,000

4.4 (8.7)

body mass index

8,000

28.7 (6.5)

Annual household income

8,000

Less than $15,000

407 (5.1%)

$100,000–$199,999

1,865 (23%)

$15,000–$24,999

641 (8.0%)

$200,000 or more

638 (8.0%)

$25,000–$34,999

871 (11%)

$35,000–$49,999

1,067 (13%)

$50,000–$99,999

2,511 (31%)

Sex

8,000

Male

3,936 (49%)

Female

4,064 (51%)

Any physical activity (past 30 days)

8,000

6,240 (78%)

Age group in 5 year categories

8,000

18-24

406 (5.1%)

25-29

408 (5.1%)

30-34

463 (5.8%)

35-39

565 (7.1%)

40-44

582 (7.3%)

45-49

518 (6.5%)

50-54

608 (7.6%)

55-59

660 (8.3%)

60-64

787 (9.8%)

65-69

901 (11%)

70-74

808 (10%)

75-79

663 (8.3%)

80+

631 (7.9%)

Highest level of education completed

8,000

College graduate

2,138 (27%)

Graduate or professional degree

3,659 (46%)

HS diploma or GED

252 (3.2%)

Less than HS

124 (1.6%)

Some college or technical school

1,827 (23%)

Smoking status

8,000

Current daily smoker

658 (8.2%)

Current some-day smoker

268 (3.4%)

Former smoker

2,262 (28%)

Never smoker

4,812 (60%)

1Mean (SD); n (%)


Part 1: Multiple Linear Regression (25 points)

Research Question: What is the independent association of each predictor with the number of mentally unhealthy days in the past 30 days?

1.1 Model fit and output

model <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + income + education +smoking, 
            data= brfss_analytic
            )
summary(model)
## 
## Call:
## lm(formula = menthlth_days ~ physhlth_days + bmi + sex + exercise + 
##     age_group + income + education + smoking, data = brfss_analytic)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.080  -3.865  -1.597   0.712  30.471 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                               11.44144    0.73022  15.669  < 2e-16
## physhlth_days                              0.26558    0.01007  26.384  < 2e-16
## bmi                                        0.03338    0.01321   2.527 0.011510
## sexFemale                                  1.39038    0.16750   8.301  < 2e-16
## exerciseYes                               -0.65116    0.21472  -3.033 0.002432
## age_group25-29                            -1.05660    0.51938  -2.034 0.041950
## age_group30-34                            -1.09395    0.50646  -2.160 0.030803
## age_group35-39                            -1.81103    0.48851  -3.707 0.000211
## age_group40-44                            -2.89307    0.48749  -5.935 3.07e-09
## age_group45-49                            -3.05618    0.49769  -6.141 8.61e-10
## age_group50-54                            -3.51674    0.48323  -7.277 3.72e-13
## age_group55-59                            -4.49597    0.47555  -9.454  < 2e-16
## age_group60-64                            -4.52215    0.45848  -9.863  < 2e-16
## age_group65-69                            -5.57795    0.45019 -12.390  < 2e-16
## age_group70-74                            -6.02536    0.45741 -13.173  < 2e-16
## age_group75-79                            -6.28656    0.47484 -13.239  < 2e-16
## age_group80+                              -6.81968    0.47684 -14.302  < 2e-16
## income$100,000–$199,999                   -3.49984    0.42923  -8.154 4.07e-16
## income$15,000–$24,999                     -1.63703    0.46899  -3.491 0.000485
## income$200,000 or more                    -3.78409    0.50036  -7.563 4.38e-14
## income$25,000–$34,999                     -2.07637    0.44797  -4.635 3.63e-06
## income$35,000–$49,999                     -2.54685    0.43819  -5.812 6.40e-09
## income$50,000–$99,999                     -3.05048    0.41069  -7.428 1.22e-13
## educationGraduate or professional degree  -0.05310    0.21116  -0.251 0.801448
## educationHS diploma or GED                -1.71101    0.49969  -3.424 0.000620
## educationLess than HS                     -1.79091    0.69119  -2.591 0.009585
## educationSome college or technical school -0.71412    0.23816  -2.998 0.002722
## smokingCurrent some-day smoker            -1.58670    0.53523  -2.965 0.003041
## smokingFormer smoker                      -1.98971    0.33713  -5.902 3.74e-09
## smokingNever smoker                       -2.93681    0.32162  -9.131  < 2e-16
##                                              
## (Intercept)                               ***
## physhlth_days                             ***
## bmi                                       *  
## sexFemale                                 ***
## exerciseYes                               ** 
## age_group25-29                            *  
## age_group30-34                            *  
## age_group35-39                            ***
## age_group40-44                            ***
## age_group45-49                            ***
## age_group50-54                            ***
## age_group55-59                            ***
## age_group60-64                            ***
## age_group65-69                            ***
## age_group70-74                            ***
## age_group75-79                            ***
## age_group80+                              ***
## income$100,000–$199,999                   ***
## income$15,000–$24,999                     ***
## income$200,000 or more                    ***
## income$25,000–$34,999                     ***
## income$35,000–$49,999                     ***
## income$50,000–$99,999                     ***
## educationGraduate or professional degree     
## educationHS diploma or GED                ***
## educationLess than HS                     ** 
## educationSome college or technical school ** 
## smokingCurrent some-day smoker            ** 
## smokingFormer smoker                      ***
## smokingNever smoker                       ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.352 on 7970 degrees of freedom
## Multiple R-squared:  0.1853, Adjusted R-squared:  0.1824 
## F-statistic: 62.52 on 29 and 7970 DF,  p-value: < 2.2e-16

1.2 Fitted regression equation

\[\widehat{\text{Mental Health Days}} = 11.441 + 0.266(physhlth_days) + 0.033(bmi) + 1.390(sexFemale) + -0.651(exerciseYes) + -1.057(age_group25-29) + -1.094(age_group30-34) + -1.811(age_group35-39) + -2.893(age_group40-44)+ -3.056(age_group45-49) + -3.517(age_group50-54) + -4.496(age_group55-59) + -4.522(age_group60-64) + -5.578(age_group65-69) + -6.025(age_group70-74) + -6.287(age_group75-79) + -6.820(age_group80+) + -3.500(income$100,000-$199,999) + -1.637(income$15,000-$24,999) + -3.784(income$200,000 or more) + -2.076(income$25,000-$34,999) + -2.547(income$35,000-$49,999) + -3.050(income$50,000-$99,999) + -0.053(educationGraduate or professional degree) + -1.711(educationHS diploma or GED) + -1.791(educationLess than HS) + -0.714(educationSome college or technical school) + -1.587(smokingCurrent some-day smoker) + -1.990(smokingFormer smoker) + -2.937(smokingNever smoker)\]

1.3 Continuous coefficient interpretation

Physical health days(\(\hat{\beta}\) = 0.266) Each additional day of poor physical health is associated with an average increase of 0.266 mentally unhealthy days, holding all other variables constant

BMI(\(\hat{\beta}\) = 0.0.033) Each one-unit increase in BMI is associated with an average increase of 0.033 meantaly unhealthy days, holding all else constant.

1.4 Categorical coefficient interpretation

Sex(\(\hat{\beta}\) = 0.1.390) In comparison to males (the reference group), females have an average of 1.390 more mentally unhealthy days, holding all else constant.

Exercise(\(\hat{\beta}\) = -0.651) Individuals who engaged in any physical activity in the past 30 days had on average 0.651 less mentally unhealthy days in comparing to those who didn’t exercise, holding all other variables constant. 80+ age group(\(\hat{\beta}\) = -6.820) Individuals 80 years or older had an estimated 6.820 less mentally unhealthy days in comparison to individuals in the 18-24 age group, holding all other variables constant.

15,000-24,999 income group(\(\hat{\beta}\) = -1.637) Individuals with an annual household income of $15,000 to $24,999 have roughly 1.637 less mentally unhealthy days in comparison to those with an annual income of less than $15,000, holding all other variables constant.

200,000 or more income group(\(\hat{\beta}\) = -3.784) In comparison to individuals with an average household income of less than $15,000, those that have $200,000 or more have 3.784 less mentally unhealthy days, holding all else constant.

1.5 R-squared and Adjusted R-squared

glance(model) %>%
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  pivot_longer(everything(), names_to = "Statistic", values_to = "Value") %>%
  mutate(Statistic = dplyr::recode(Statistic,
    "r.squared"     = "R²",
    "adj.r.squared" = "Adjusted R²",
    "sigma"         = "Residual Std. Error (Root MSE)",
    "statistic"     = "F-statistic",
    "p.value"       = "p-value (overall F-test)",
    "df"            = "Model df (p)",
    "df.residual"   = "Residual df (n − p − 1)",
    "nobs"          = "n (observations)"
  )) %>%
  kable(caption = "Overall Model Summary") %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Overall Model Summary
Statistic Value
0.1853
Adjusted R² 0.1824
Residual Std. Error (Root MSE) 7.3516
F-statistic 62.5234
p-value (overall F-test) 0.0000
Model df (p) 29.0000
Residual df (n − p − 1) 7970.0000
n (observations) 8000.0000

R^2= 0.1853 The R^2 value is 0.1853, therefore approximately 18.5% of the total variance (variability) in mentally unhealthy days in explained by the predictors in the model.

Adjusted R^2= 0.1824 The adjusted R^2 is 0.1824, which is slightly smaller than the R^2. The adjusted R^2 penalizes for the amount of predictors and can decrease if a useless predictor is added to the model.

Root MSE= 7.3516 The root MSE is 7.3516, meaning on average the model’s prediction is off by 7.3516 days.

1.6 Overall F-test

H0= All regression coefficients= 0 H1= At least one regression coefficient is not equal to 0 (meaning at least one is associated with mentally unhealthy days) F-statistic= 62.5234 df(numerator)= 29 df(denominator)= 7970 p-value= 2.2e-16

The f statistic is 62.5234 and a p-value less than 0.001 which means we reject the null hypothesis. This indicates that the model is statistically significant and at least one of the predictors is associated with the number of mentally unhealthy days.

Part 2: Test of Hypotheses (20 points)

2.1 Type III SS Table

# Type III using car::Anova()
anova_type3 <- Anova(model, type = "III")

anova_type3 %>%
  as.data.frame() %>%
  rownames_to_column("Source") %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Type III (Partial) Sums of Squares — car::Anova()",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.")
Type III (Partial) Sums of Squares — car::Anova()
Source Sum of Sq df F value p-value
(Intercept) 13268.5959 1 245.5036 0.0000
physhlth_days 37622.7952 1 696.1198 0.0000
bmi 345.2408 1 6.3879 0.0115
sex 3723.8662 1 68.9012 0.0000
exercise 497.0434 1 9.1966 0.0024
age_group 30092.1774 12 46.3986 0.0000
income 4733.8943 6 14.5982 0.0000
education 1265.1504 4 5.8521 0.0001
smoking 5101.1076 3 31.4613 0.0000
Residuals 430750.0872 7970 NA NA
Note:
Each F-test uses the MSE from the full model. Each variable is tested given ALL other variables.

Using type III sum of squares, all predictors have a p-value less than 0.05, which means each variable has a statistically significant partial association with the outcome at the 0.05 level. Therefore, physically unhealthy days, BMI, sex, exercise, age_group, income, education, and smoking are statistically significant predictors in the model.

2.2 Chunk test for income: fit

# Reduced model (excludes income)
m_no_income <- lm(menthlth_days ~ physhlth_days + age_group + bmi + exercise + sex + education + smoking,
                 data = brfss_analytic)

# Full model (includes income)
m_full <- lm(menthlth_days ~ physhlth_days + age_group + bmi + sex + exercise + education + smoking + income,
             data = brfss_analytic)

# Compare using anova() — this performs the partial F-test
anova(m_no_income, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no income)", "Full (with income)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Partial F-Test: Does income add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Partial F-Test: Does income add to the model?
Model Res. df RSS df Sum of Sq F p-value
Reduced (no income) 7976 435484.0 NA NA NA NA
Full (with income) 7970 430750.1 6 4733.894 14.5982 0

2.3 Chunk test for income: inference H0= 𝛽∗= 0 (Income does not add to the model given the other variables) H1= 𝛽∗≠ 0 (Income does add to the model)

F-statistic= 14.5982 df= 6 p-value= 0

Since the p-value<0.05, we reject H0 and conclude that income adds statistically significant information to the prediction of mental health days, given that physical health, age, exercise, sex, smoking, BMI, and education are in the model.

2.4 Chunk test for education

# Reduced model (excludes income)
m_no_education <- lm(menthlth_days ~ physhlth_days + age_group + bmi + exercise + sex + income + smoking,
                 data = brfss_analytic)

# Compare using anova() — this performs the partial F-test
anova(m_no_education, m_full) %>%
  as.data.frame() %>%
  rownames_to_column("Model") %>%
  mutate(
    Model = c("Reduced (no education)", "Full (with education)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  kable(
    caption = "Partial F-Test: Does education add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Partial F-Test: Does education add to the model?
Model Res. df RSS df Sum of Sq F p-value
Reduced (no education) 7974 432015.2 NA NA NA NA
Full (with education) 7970 430750.1 4 1265.15 5.8521 1e-04

H0= 𝛽∗= 0 (Education does not add to the model given the other variables) H1= 𝛽∗≠ 0 (Education does add to the model)

F-statistic= 5.8521 df= 4 p-value= 1e-04

Since the p-value<0.05, we reject H0 and conclude that education adds statistically significant information to the prediction of mental health days, given that physical health, age, exercise, sex, smoking, BMI, and income are in the model.

2.5 Written synthesis

The type III sum of squares tests the contribution of each variable to the model given all the other variables already in the model. Based on the type III SS, all variables, including physical health, age_group, exercise, sex, smoking, BMI, income, and education, were statistically significant at the a= 0.05 level. This indicates each variable has a significant partial association with mentally unhealthy days. The two chunk tests indicated that income and education both significantly improve the model after adjusting for other variables. Both variables have p-values less than 0.05, which mean the results of each test were statistically significant. The f-statistic for income is 14.5982, which strongly suggests income adds important predictive information to the model. The f statistic for education is smaller (5.8521), but it still shows that a model with education explains more variation than the model without. In comparison to the type III sum of squares test, the chunk test compares the influence of a variable based on all categories. The type III sum of squares tests the statistical significance of each predictor (each level of income separately) while the chunk test removes all categories of a variable to allow for the overall importance of the group to be seen.

Part 3: Interaction Analysis (25 Points)

3.1 smoker_current variable

# Create binary variable
brfss_binary <- brfss_analytic %>%
  mutate(smoker_current = case_when(
    `_SMOKER3` %in% c(1, 2) ~ "Current smoker",
    `_SMOKER3` %in% c(3, 4) ~ "Non-smoker",
    TRUE ~ NA_character_
  )) %>%

 mutate(smoker_current = factor(smoker_current,
                                levels = c("Non-smoker", "Current smoker")))
summary(brfss_binary$smoker_current)
##     Non-smoker Current smoker 
##           7074            926

3.2 Model A fit

model_A <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + income + education + smoker_current,
              data = brfss_binary)

3.3 Model B fit

model_B <- lm(menthlth_days ~ physhlth_days + bmi + sex * smoker_current + exercise + age_group + income + education,
              data = brfss_binary)

3.4 Interaction Test

# Compare the two models
anova(model_A, model_B) %>%
  as_tibble() %>%
  mutate(
    Model = c("Model A (no interaction)", 
              "Model B (sex x smoker_current)"),
    across(where(is.numeric), ~ round(., 4))
  ) %>%
  select(Model, everything()) %>%
  kable(
    caption   = "F-Test Comparing Models With and Without Interaction",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value"),
    format    = "html"
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) %>%
  footnote(general = "NA indicates the baseline model — no comparison is made against itself.")
F-Test Comparing Models With and Without Interaction
Model Res. df RSS df Sum of Sq F p-value
Model A (no interaction) 7972 432508.9 NA NA NA NA
Model B (sex x smoker_current) 7971 432174.9 1 333.9749 6.1598 0.0131
Note:
NA indicates the baseline model — no comparison is made against itself.

H0= the interaction between sex and smoking_current is not significant H1= the interaction between sex and smoking current is significant

F-statistic= 6.1598 df= 1 p-value= 0.0131

We reject H0 since the p-value<0.05 (0.0131). The interaction between sex and smoking_current is statistically significant and indicates that the association between smoking status and amount of mentally unhealthy days differs by sex.

3.5 Interaction Interpretation

tidy(model_B, conf.int = TRUE) %>%
  mutate(across(where(is.numeric), ~ round(., 4))) %>%
  kable(
    caption = "Model with Sex × Smoking Status Interaction",
    col.names = c("Term", "Estimate", "Std. Error", "t-statistic", "p-value", "95% CI Lower", "95% CI Upper")) %>%
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Model with Sex × Smoking Status Interaction
Term Estimate Std. Error t-statistic p-value 95% CI Lower 95% CI Upper
(Intercept) 8.7173 0.6962 12.5204 0.0000 7.3524 10.0821
physhlth_days 0.2686 0.0101 26.6788 0.0000 0.2489 0.2883
bmi 0.0331 0.0132 2.5017 0.0124 0.0072 0.0590
sexFemale 1.1855 0.1775 6.6784 0.0000 0.8376 1.5335
smoker_currentCurrent smoker 1.5208 0.3654 4.1621 0.0000 0.8045 2.2371
exerciseYes -0.6728 0.2150 -3.1301 0.0018 -1.0942 -0.2515
age_group25-29 -0.9202 0.5196 -1.7709 0.0766 -1.9388 0.0984
age_group30-34 -0.8924 0.5059 -1.7640 0.0778 -1.8841 0.0993
age_group35-39 -1.5929 0.4875 -3.2673 0.0011 -2.5485 -0.6372
age_group40-44 -2.6286 0.4856 -5.4127 0.0000 -3.5806 -1.6766
age_group45-49 -2.8425 0.4969 -5.7209 0.0000 -3.8165 -1.8686
age_group50-54 -3.2778 0.4820 -6.8012 0.0000 -4.2226 -2.3331
age_group55-59 -4.2499 0.4740 -8.9652 0.0000 -5.1791 -3.3206
age_group60-64 -4.2640 0.4567 -9.3364 0.0000 -5.1593 -3.3688
age_group65-69 -5.2506 0.4466 -11.7563 0.0000 -6.1261 -4.3751
age_group70-74 -5.7111 0.4544 -12.5686 0.0000 -6.6018 -4.8203
age_group75-79 -5.9076 0.4702 -12.5646 0.0000 -6.8292 -4.9859
age_group80+ -6.4995 0.4736 -13.7239 0.0000 -7.4278 -5.5711
income$100,000–$199,999 -3.5097 0.4300 -8.1623 0.0000 -4.3526 -2.6668
income$15,000–$24,999 -1.6357 0.4700 -3.4804 0.0005 -2.5570 -0.7144
income$200,000 or more -3.8405 0.5010 -7.6651 0.0000 -4.8226 -2.8583
income$25,000–$34,999 -2.0746 0.4486 -4.6243 0.0000 -2.9540 -1.1952
income$35,000–$49,999 -2.5455 0.4392 -5.7964 0.0000 -3.4064 -1.6847
income$50,000–$99,999 -3.0430 0.4116 -7.3935 0.0000 -3.8498 -2.2362
educationGraduate or professional degree -0.1488 0.2105 -0.7069 0.4797 -0.5615 0.2639
educationHS diploma or GED -1.6923 0.5005 -3.3815 0.0007 -2.6734 -0.7113
educationLess than HS -1.8179 0.6928 -2.6239 0.0087 -3.1760 -0.4598
educationSome college or technical school -0.7000 0.2385 -2.9351 0.0033 -1.1675 -0.2325
sexFemale:smoker_currentCurrent smoker 1.2833 0.5171 2.4819 0.0131 0.2697 2.2968

The coefficient for smoker_currentCurrentsmoker is 1.5208, indicating that among males, current smokers are estimated to have 1.5208 more mentally unhealthy days in comparison to non-smokers, holding all other variables constant. For females, the estimated effect of current smoking is 2.8041 (1.2833+1.5208) more mentally unhealthy days compared to non-smokers.Based on the estimates, smoking is more strongly associated with mentally unhealthy days in females than in males. There is a 1.2833 increase in mentally unhealthy days among females.

3.6 Visualization

ggpredict(model_B, terms = c("smoker_current", "sex")) %>%
  plot() +
  labs(
    title    = "Predicted Mental Health Days by Smoking Status and Sex",
    subtitle = "Interaction model: gaps are allowed to differ by sex",
    x        = "Current Smoking Status",
    y        = "Predicted Mentally Unhealthy Days",
    color    = "Sex", 
    fill = "Sex"
  ) +
  theme_minimal(base_size = 13)

3.7 Public Health Paragraph

Based on the regression model that included the interaction between sex and smoking status (model B), there is a difference by sex in the relationship between smoking and mentally unhealthy days. Current smokers tend to report more mentally unhealthy days in comparison to non-smokers in both males and females, but females experience a greater increase in mentally unhealthy days associated with smoking. In other words, smoking is more strongly associated with worse mental health outcomes among women. This difference among males and females highlights an important area for consideration when designing public health interventions that target smoking cessation and mental health.

Part 4: Reflection (15 points)

Based on the results from the multiple linear regression models, there are several determinants of mental health among U.S. adults, including physical health, sex, age, BMI, exercise, income, education, and smoking status. All predictors in the model were statistically significant, indicating that each contributes to the variation in mentally unhealthy days. Based on the F-statistics, physical health (696.1198), sex (68.9012), and age_group (46.3986) were most strongly associated with mental health outcomes, as larger F-statistics indicate a greater contribution to explaining the variance. Also, higher age groups were associated with less mentally unhealthy days, suggesting that mental health outcomes are better in older age if all other variables are held constant. Since I used cross-sectional survey data, causality cannot be established because exposure and outcome were measured at the same time, but associations can be observed between variables. In addition, confounders that were not analyzed can influence the data, such as access to healthcare, stress, and sleep duration. It is possible that higher stress levels or short sleep duration can negatively influence mental health status and predictors like physical health or exercise. Adjusted R^2 indicates the proportion of variance in the outcome explained by the model but it penalizes for the number of predictors, making it less than or equal to R^2. The adjusted R^2 can be useful when comparing different models with varying predictors since it adjusts for the differences without simply increasing as more predictors are added, like regular R^2 does. Analyzing the adjusted R^2 is especially important when adding multiple categorical predictors since each category has dummy variables, which can increase regular R^2. It is useful to test groups of predictors collectively with chunk tests because they asses if group variables significantly improve the model. Individual t-tests only analyze each group separately so whole group contribution to the outcome may be missed. It is possible that a single group is significantly associated with an outcome but overall (on a group level) there may be no association. Throughout this assignment I used AI assistance to help debug some parts of my code. During my recoding in part 0 I was receiving error codes I did not understand, but it pointed out that I was using ” then ’ instead of ` (backticks) around some of my variables. To solve this issue I input the error code I received when the chunk would not run. Also, when doing part 1 I was getting an error I could not identify myself but it turned out to be an extra comma. My prompts into AI assistance tools to debug include the error code I got, based off the error AI tends to know what went wrong. No AI was used to write my code for me or write my reflections.

Example of my debugging conversation I am using R Studio and receiving this error message Error in select(., MENTHLTH, PHYSHLTH, “_BMI5”, SEXVAR, EXERANY2, “_AGEG5YR”, : unused arguments (MENTHLTH, PHYSHLTH, “_BMI5”, SEXVAR, EXERANY2, “_AGEG5YR”, “_INCOMG1”, EDUCA, “_SMOKER3”) what am I doing wrong? AI said “Use backticks (not quotes) for variables with underscores” to which I still made the mistake of not using backticks and used apostrophes but after another message that was basically the same I realized my mistake.