Part 0: Data Acquisition and Preparation (15 Points)

Load Packages

library(tidyverse)
library(haven)
library(broom)
library(kableExtra)
library(car)
library(gtsummary)
library(ggeffects)

Import Raw Data

brfss_raw <- read_xpt("LLCP2023.XPT")
glimpse(brfss_raw)
## Rows: 433,323
## Columns: 350
## $ `_STATE`   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ FMONTH     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ IDATE      <chr> "03012023", "01062023", "03082023", "03062023", "01062023",…
## $ IMONTH     <chr> "03", "01", "03", "03", "01", "01", "03", "01", "03", "01",…
## $ IDAY       <chr> "01", "06", "08", "06", "06", "09", "21", "06", "15", "08",…
## $ IYEAR      <chr> "2023", "2023", "2023", "2023", "2023", "2023", "2023", "20…
## $ DISPCODE   <dbl> 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100,…
## $ SEQNO      <chr> "2023000001", "2023000002", "2023000003", "2023000004", "20…
## $ `_PSU`     <dbl> 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.02…
## $ CTELENM1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ PVTRESD1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ COLGHOUS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STATERE1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ CELPHON1   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ LADULT1    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ NUMADULT   <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1,…
## $ RESPSLC1   <dbl> 1, NA, NA, 1, NA, 1, NA, NA, NA, 2, 1, 1, 1, 1, 1, NA, NA, …
## $ LANDSEX2   <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ LNDSXBRT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SAFETIME   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CTELNUM1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELLFON5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CADULT1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELLSEX2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELSXBRT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PVTRESD3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CCLGHOUS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSTATE1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LANDLINE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HHADULT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SEXVAR     <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ GENHLTH    <dbl> 2, 2, 4, 2, 4, 3, 4, 4, 3, 3, 3, 2, 2, 3, 3, 4, 5, 3, 3, 3,…
## $ PHYSHLTH   <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30,…
## $ MENTHLTH   <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ POORHLTH   <dbl> NA, NA, 1, 88, NA, 88, 88, 1, 88, NA, NA, 88, NA, NA, NA, 1…
## $ PRIMINS1   <dbl> 3, 3, 3, 3, 3, 7, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ PERSDOC3   <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,…
## $ MEDCOST1   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHECKUP1   <dbl> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ EXERANY2   <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1,…
## $ EXRACT12   <dbl> NA, 1, 1, 1, 1, 1, NA, NA, NA, 1, 1, NA, 1, NA, NA, 1, NA, …
## $ EXEROFT1   <dbl> NA, 106, 205, 103, 102, 212, NA, NA, NA, 230, 105, NA, 105,…
## $ EXERHMM1   <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA,…
## $ EXRACT22   <dbl> NA, 88, 88, 88, 8, 88, NA, NA, NA, 88, 77, NA, 3, NA, NA, 8…
## $ EXEROFT2   <dbl> NA, NA, NA, NA, 107, NA, NA, NA, NA, NA, NA, NA, 105, NA, N…
## $ EXERHMM2   <dbl> NA, NA, NA, NA, 100, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA…
## $ STRENGTH   <dbl> 888, 888, 205, 888, 888, 203, 777, 888, 888, 888, 105, 888,…
## $ BPHIGH6    <dbl> 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1,…
## $ BPMEDS1    <dbl> 1, 2, 1, NA, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, 1, 1, 1, 1…
## $ CHOLCHK3   <dbl> 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,…
## $ TOLDHI3    <dbl> 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ CHOLMED3   <dbl> 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1,…
## $ CVDINFR4   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CVDCRHD4   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CVDSTRK3   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2,…
## $ ASTHMA3    <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ ASTHNOW    <dbl> NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CHCSCNC1   <dbl> 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHCOCNC1   <dbl> 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ CHCCOPD3   <dbl> 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2,…
## $ ADDEPEV3   <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHCKDNY2   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 7, 1, 2, 2,…
## $ HAVARTH4   <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ DIABETE4   <dbl> 1, 3, 3, 3, 1, 3, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1,…
## $ DIABAGE4   <dbl> 57, NA, NA, NA, 68, NA, NA, 98, NA, 60, NA, NA, NA, NA, NA,…
## $ MARITAL    <dbl> 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 3, 3,…
## $ EDUCA      <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4,…
## $ RENTHOM1   <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ NUMHHOL4   <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ NUMPHON4   <dbl> NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CPDEMO1C   <dbl> 1, 1, 8, 1, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ VETERAN3   <dbl> 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2,…
## $ EMPLOY1    <dbl> 7, 7, 7, 7, 8, 7, 8, 7, 7, 7, 4, 1, 7, 7, 7, 5, 7, 7, 7, 7,…
## $ CHILDREN   <dbl> 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88,…
## $ INCOME3    <dbl> 99, 99, 2, 99, 7, 7, 6, 99, 6, 7, 7, 99, 9, 9, 9, 3, 77, 99…
## $ PREGNANT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WEIGHT2    <dbl> 172, 132, 130, 170, 170, 165, 180, 135, 195, 160, 300, 145,…
## $ HEIGHT3    <dbl> 503, 409, 504, 506, 508, 502, 600, 411, 504, 510, 511, 505,…
## $ DEAF       <dbl> 2, 1, 7, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,…
## $ BLIND      <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ DECIDE     <dbl> 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 7,…
## $ DIFFWALK   <dbl> 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1,…
## $ DIFFDRES   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2,…
## $ DIFFALON   <dbl> 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ FALL12MN   <dbl> 88, 88, 88, 88, 3, 88, 8, 88, 88, 88, 88, 88, 88, 1, 88, 1,…
## $ FALLINJ5   <dbl> NA, NA, NA, NA, 88, NA, 88, NA, NA, NA, NA, NA, NA, 1, NA, …
## $ SMOKE100   <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1,…
## $ SMOKDAY2   <dbl> NA, NA, 3, NA, NA, NA, 3, NA, NA, 3, NA, NA, 3, 3, 3, NA, N…
## $ USENOW3    <dbl> 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ ECIGNOW2   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 1, 4, 1,…
## $ ALCDAY4    <dbl> 888, 888, 888, 888, 202, 205, 888, 888, 888, 888, 888, 888,…
## $ AVEDRNK3   <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 2, NA, NA,…
## $ DRNK3GE5   <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA,…
## $ MAXDRNKS   <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 3, NA, NA,…
## $ FLUSHOT7   <dbl> 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ FLSHTMY3   <dbl> NA, 92023, 112022, 102022, NA, NA, 32023, 102023, 777777, 1…
## $ PNEUVAC4   <dbl> 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 1,…
## $ SHINGLE2   <dbl> 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2,…
## $ HIVTST7    <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2,…
## $ HIVTSTD3   <dbl> NA, NA, NA, 777777, NA, 777777, 92022, NA, NA, NA, NA, NA, …
## $ SEATBELT   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DRNKDRI2   <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA,…
## $ COVIDPO1   <dbl> 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 1,…
## $ COVIDSM1   <dbl> NA, NA, NA, NA, NA, 2, NA, 2, 2, NA, 1, 2, NA, 2, 2, NA, NA…
## $ COVIDACT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, …
## $ PDIABTS1   <dbl> NA, 2, 1, 1, NA, 1, 1, NA, 1, NA, 1, 1, 1, 1, 1, 1, 7, 1, 8…
## $ PREDIAB2   <dbl> NA, 3, 3, 3, NA, 3, 3, NA, 1, NA, 3, 3, 1, 3, 3, 3, 3, 3, 3…
## $ DIABTYPE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INSULIN1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CHKHEMO3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ EYEEXAM1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DIABEYE1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DIABEDU1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FEETSORE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHEXER   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHEDU    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LMTJOIN3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHDIS2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ JOINPAI2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSFIRST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSLAST    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSNUMCG   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSCTSC1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSSCNCR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSCTWHN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADMAM     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HOWLONG    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CERVSCRN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLCNC   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLPAP   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLHPV   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADHYST2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSATEST1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSATIME1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PCPSARS2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSASUGS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PCSTALK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADSIGM4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNSIGM   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNTES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIGMTES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LASTSIG4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNCNCR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ VIRCOLO1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ VCLNTES2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SMALSTOL   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STOLTEST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STOOLDN2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BLDSTFIT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SDNATES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRDIFF   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRAGE    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRTYP2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVTRT3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVDOC1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVSUM    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVRTRN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVINST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVINSR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVDEIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVCLIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVPAIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVCTL2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDORTAN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ NUMBURN3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SUNPRTCT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WKDAYOUT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WKENDOUT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CIMEMLO1   <dbl> 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, NA, 2, 2, 2, 1, 2, 2…
## $ CDWORRY    <dbl> NA, NA, 2, NA, 1, 1, NA, NA, NA, NA, NA, NA, 7, NA, NA, NA,…
## $ CDDISCU1   <dbl> NA, NA, 1, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CDHOUS1    <dbl> NA, NA, 2, NA, 2, 1, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CDSOCIA1   <dbl> NA, NA, 2, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CAREGIV1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVREL4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVLNG1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVHRS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVPRB3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVALZD   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVPER1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVHOU1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVEXPT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LASTSMK2   <dbl> NA, NA, 7, NA, NA, NA, 7, NA, NA, 7, NA, NA, 7, NA, 7, NA, …
## $ STOPSMK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MENTCIGS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MENTECIG   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HEATTBCO   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FIREARM5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ GUNLOAD    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LOADULK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP2    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP3    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP4    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP5    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP6    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP6   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FIRSTAID   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ASPIRIN    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BIRTHSEX   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SOMALE     <dbl> NA, NA, NA, NA, NA, NA, 2, NA, NA, 2, 2, NA, NA, 2, NA, NA,…
## $ SOFEMALE   <dbl> 2, 2, 2, 2, 2, 2, NA, 2, 2, NA, NA, 2, 2, NA, 2, 2, 2, NA, …
## $ TRNSGNDR   <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ MARIJAN1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJSMOK   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJEAT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJVAPE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJDAB    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJOTHR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ USEMRJN4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDEPRS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDRINK   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDRUGS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEPRISN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDIVRC   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEPUNCH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEHURT1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACESWEAR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACETOUCH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACETTHEM   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEHVSEX   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEADSAF   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEADNED   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ IMFVPLA4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HPVADVC4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HPVADSHT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TETANUS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVIDVA1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVACGE1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVIDNU2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LSATISFY   <dbl> 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, NA, 1, 2, 3, 1, 4, 2…
## $ EMTSUPRT   <dbl> 1, 2, 4, 1, 2, 2, 1, 1, 1, 3, 1, 1, 2, NA, 1, 1, 2, 1, 4, 2…
## $ SDLONELY   <dbl> 5, 3, 3, 3, 2, 3, 4, 5, 3, 3, 4, 5, 4, NA, 4, 3, 4, 5, 1, 2…
## $ SDHEMPLY   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ FOODSTMP   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHFOOD1   <dbl> 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 5, NA, 5, 5, 5, 5, 5, 5…
## $ SDHBILLS   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHUTILS   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHTRNSP   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHSTRE1   <dbl> 5, 5, 3, 5, 2, 3, 5, 5, 4, 4, 4, 4, 4, NA, 4, 5, 4, 5, 4, 4…
## $ RRCLASS3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRCOGNT2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRTREAT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRATWRK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRHCARE4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRPHYSM2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSGEND1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSXBRTH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSRLTN2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CASTHDX2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CASTHNO2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ QSTVER     <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ QSTLANG    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_METSTAT` <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2,…
## $ `_URBSTAT` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MSCODE     <dbl> 1, 1, 3, 2, 5, 1, 2, 5, 3, 1, 1, 1, 1, 5, 1, 2, 5, 5, 5, 1,…
## $ `_STSTR`   <dbl> 11011, 11012, 11011, 11011, 11011, 11011, 11011, 11012, 110…
## $ `_STRWT`   <dbl> 42.79158, 170.42939, 42.79158, 42.79158, 42.79158, 42.79158…
## $ `_RAWRAKE` <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1,…
## $ `_WT2RAKE` <dbl> 85.58316, 170.42939, 42.79158, 85.58316, 42.79158, 85.58316…
## $ `_IMPRACE` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_CHISPNC` <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## $ `_CRACE1`  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CAGEG      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_CLLCPWT` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_DUALUSE` <dbl> 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DUALCOR` <dbl> 0.5026388, 0.5026388, NA, 0.5026388, 0.5026388, 0.5026388, …
## $ `_LLCPWT2` <dbl> 941.1640, 1874.2239, 1151.6032, 941.1640, 470.5820, 941.164…
## $ `_LLCPWT`  <dbl> 605.4279, 1121.9927, 600.9633, 605.4279, 281.7110, 1060.335…
## $ `_RFHLTH`  <dbl> 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1,…
## $ `_PHYS14D` <dbl> 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 3, 3, 1, 1,…
## $ `_MENT14D` <dbl> 1, 1, 2, 1, 1, 2, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1,…
## $ `_HLTHPL1` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_HCVU653` <dbl> 9, 9, 9, 9, 9, 1, 9, 9, 9, 9, 1, 9, 9, 9, 9, 9, 1, 9, 9, 9,…
## $ `_TOTINDA` <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1,…
## $ METVL12_   <dbl> NA, 35, 35, 35, 35, 35, NA, NA, NA, 35, 35, NA, 35, NA, NA,…
## $ METVL22_   <dbl> NA, NA, NA, NA, 33, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA,…
## $ MAXVO21_   <dbl> 1840, 1803, 1322, 1914, 1988, 2506, 1545, 1914, 1655, 1875,…
## $ FC601_     <dbl> 315, 309, 227, 328, 341, 430, 265, 328, 284, 321, 472, 328,…
## $ ACTIN13_   <dbl> NA, 2, 2, 2, 2, 1, NA, NA, NA, 2, 1, NA, 1, NA, NA, 2, NA, …
## $ ACTIN23_   <dbl> NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, N…
## $ PADUR1_    <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA,…
## $ PADUR2_    <dbl> NA, NA, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA,…
## $ PAFREQ1_   <dbl> NA, 6000, 1167, 3000, 2000, 2800, NA, NA, NA, 7000, 5000, N…
## $ PAFREQ2_   <dbl> NA, NA, NA, NA, 7000, NA, NA, NA, NA, NA, NA, NA, 5000, NA,…
## $ `_MINAC12` <dbl> NA, 180, 18, 90, 90, 126, NA, NA, NA, 210, 150, NA, 225, NA…
## $ `_MINAC22` <dbl> NA, 0, 0, 0, 420, 0, NA, NA, NA, 0, NA, NA, 225, NA, NA, 0,…
## $ STRFREQ_   <dbl> 0, 0, 1167, 0, 0, 700, NA, 0, 0, 0, 5000, 0, 0, 2000, 2000,…
## $ PAMISS3_   <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 9, 0, 0, 1,…
## $ PAMIN13_   <dbl> NA, 360, 36, 180, 180, 126, NA, NA, NA, 420, 150, NA, 225, …
## $ PAMIN23_   <dbl> NA, NA, NA, NA, 420, NA, NA, NA, NA, NA, NA, NA, 450, NA, N…
## $ PA3MIN_    <dbl> NA, 360, 36, 180, 600, 126, NA, NA, NA, 420, 150, NA, 675, …
## $ PAVIG13_   <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 0, NA, NA, …
## $ PAVIG23_   <dbl> NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 225, NA, NA,…
## $ PA3VIGM_   <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 225, NA, NA…
## $ `_PACAT3`  <dbl> 4, 1, 9, 9, 1, 9, 4, 4, 4, 1, 9, 4, 1, 4, 4, 9, 9, 4, 4, 9,…
## $ `_PAINDX3` <dbl> 2, 1, 9, 1, 1, 9, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1,…
## $ `_PA150R4` <dbl> 3, 1, 9, 1, 1, 9, 3, 3, 3, 1, 1, 3, 1, 3, 3, 1, 9, 3, 3, 1,…
## $ `_PA300R4` <dbl> 3, 1, 9, 9, 1, 9, 3, 3, 3, 1, 9, 3, 1, 3, 3, 9, 9, 3, 3, 9,…
## $ `_PA30023` <dbl> 2, 1, 9, 9, 1, 9, 2, 2, 2, 1, 9, 2, 1, 2, 2, 9, 9, 2, 2, 9,…
## $ `_PASTRNG` <dbl> 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2,…
## $ `_PAREC3`  <dbl> 4, 2, 9, 2, 2, 9, 9, 4, 4, 2, 1, 4, 2, 3, 3, 2, 9, 4, 4, 2,…
## $ `_PASTAE3` <dbl> 2, 2, 9, 2, 2, 9, 9, 2, 2, 2, 1, 2, 2, 2, 2, 2, 9, 2, 2, 2,…
## $ `_RFHYPE6` <dbl> 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2,…
## $ `_CHOLCH3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_RFCHOL3` <dbl> 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2,…
## $ `_MICHD`   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_LTASTH1` <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,…
## $ `_CASTHM1` <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 1, 1, 1,…
## $ `_ASTHMS1` <dbl> 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 9, 3, 3, 3,…
## $ `_DRDXAR2` <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ `_MRACE1`  <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_HISPANC` <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_RACE`    <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEG21` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEGR3` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEPRV` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_SEX`     <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ `_AGEG5YR` <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 1…
## $ `_AGE65YR` <dbl> 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ `_AGE80`   <dbl> 80, 80, 80, 78, 76, 62, 80, 78, 80, 75, 59, 78, 68, 79, 79,…
## $ `_AGE_G`   <dbl> 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 5, 6, 6, 6,…
## $ HTIN4      <dbl> 63, 57, 64, 66, 68, 62, 72, 59, 64, 70, 71, 65, 68, 73, 62,…
## $ HTM4       <dbl> 160, 145, 163, 168, 173, 157, 183, 150, 163, 178, 180, 165,…
## $ WTKG3      <dbl> 7802, 5987, 5897, 7711, 7711, 7484, 8165, 6123, 8845, 7257,…
## $ `_BMI5`    <dbl> 3047, 2856, 2231, 2744, 2585, 3018, 2441, 2727, 3347, 2296,…
## $ `_BMI5CAT` <dbl> 4, 3, 2, 3, 3, 4, 2, 3, 4, 2, 4, 2, 3, 2, 3, 3, 4, 3, 3, 3,…
## $ `_RFBMI5`  <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,…
## $ `_CHLDCNT` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_EDUCAG`  <dbl> 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 2, 4, 4, 3, 2,…
## $ `_INCOMG1` <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4,…
## $ `_SMOKER3` <dbl> 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3, 3, 4, 4, 4, 1, 3,…
## $ `_RFSMOK3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ `_CURECI2` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DRNKANY6   <dbl> 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1,…
## $ DROCDY4_   <dbl> 0, 0, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 29, 100, 0, 0, 0, 0, 3…
## $ `_RFBING6` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DRNKWK2` <dbl> 0, 0, 0, 0, 47, 233, 0, 0, 0, 0, 0, 0, 200, 1400, 0, 0, 0, …
## $ `_RFDRHV8` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_FLSHOT7` <dbl> 2, 1, 1, 1, 2, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 2, NA, 1, 2,…
## $ `_PNEUMO3` <dbl> 2, 1, 1, 1, 1, NA, 1, 1, 1, 1, NA, 2, 1, 1, 2, 1, NA, 2, 1,…
## $ `_AIDTST4` <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2,…
## $ `_RFSEAT2` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_RFSEAT3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DRNKDRV` <dbl> 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 2, 2,…
brfss_selected <- brfss_raw |>
  select(
    MENTHLTH,
    PHYSHLTH,
    `_BMI5`,
    SEXVAR,
    EXERANY2,
    `_AGEG5YR`,
    `_INCOMG1`,
    EDUCA,
    `_SMOKER3`
  )

Recode Variables

brfss_clean <- brfss_selected |>
  mutate(
    # Outcome: mentally unhealthy days
    menthlth_days = case_when(
      MENTHLTH == 88                    ~ 0,
      MENTHLTH >= 1 & MENTHLTH <= 30   ~ as.numeric(MENTHLTH),
      TRUE                              ~ NA_real_
    ),

    # Physically unhealthy days
    physhlth_days = case_when(
      PHYSHLTH == 88                    ~ 0,
      PHYSHLTH >= 1 & PHYSHLTH <= 30   ~ as.numeric(PHYSHLTH),
      TRUE                              ~ NA_real_
    ),

    # BMI: divide by 100; 9999 = NA
    bmi = case_when(
      `_BMI5` == 9999 ~ NA_real_,
      TRUE            ~ as.numeric(`_BMI5`) / 100
    ),

    # Sex
    sex = factor(case_when(
      SEXVAR == 1 ~ "Male",
      SEXVAR == 2 ~ "Female",
      TRUE        ~ NA_character_
    ), levels = c("Male", "Female")),

    # Exercise
    exercise = factor(case_when(
      EXERANY2 == 1 ~ "Yes",
      EXERANY2 == 2 ~ "No",
      TRUE          ~ NA_character_
    ), levels = c("No", "Yes")),

    # Age group
    age_group = factor(case_when(
      `_AGEG5YR` == 1  ~ "18-24",
      `_AGEG5YR` == 2  ~ "25-29",
      `_AGEG5YR` == 3  ~ "30-34",
      `_AGEG5YR` == 4  ~ "35-39",
      `_AGEG5YR` == 5  ~ "40-44",
      `_AGEG5YR` == 6  ~ "45-49",
      `_AGEG5YR` == 7  ~ "50-54",
      `_AGEG5YR` == 8  ~ "55-59",
      `_AGEG5YR` == 9  ~ "60-64",
      `_AGEG5YR` == 10 ~ "65-69",
      `_AGEG5YR` == 11 ~ "70-74",
      `_AGEG5YR` == 12 ~ "75-79",
      `_AGEG5YR` == 13 ~ "80+",
      TRUE             ~ NA_character_
    ), levels = c("18-24","25-29","30-34","35-39","40-44",
                  "45-49","50-54","55-59","60-64","65-69",
                  "70-74","75-79","80+")),

    # Income
    income = factor(case_when(
      `_INCOMG1` == 1 ~ "Less than $15,000",
      `_INCOMG1` == 2 ~ "$15,000-$24,999",
      `_INCOMG1` == 3 ~ "$25,000-$34,999",
      `_INCOMG1` == 4 ~ "$35,000-$49,999",
      `_INCOMG1` == 5 ~ "$50,000-$99,999",
      `_INCOMG1` == 6 ~ "$100,000-$199,999",
      `_INCOMG1` == 7 ~ "$200,000 or more",
      TRUE            ~ NA_character_
    ), levels = c("Less than $15,000", "$15,000-$24,999",
                  "$25,000-$34,999",  "$35,000-$49,999",
                  "$50,000-$99,999",  "$100,000-$199,999",
                  "$200,000 or more")),

    # Education
    education = factor(case_when(
      EDUCA %in% c(1, 2) ~ "Less than high school",
      EDUCA == 3         ~ "High school diploma or GED",
      EDUCA == 4         ~ "Some college or technical school",
      EDUCA == 5         ~ "College graduate",
      EDUCA == 6         ~ "Graduate or professional degree",
      TRUE               ~ NA_character_
    ), levels = c("Less than high school",
                  "High school diploma or GED",
                  "Some college or technical school",
                  "College graduate",
                  "Graduate or professional degree")),

    # Smoking status
    smoking = factor(case_when(
      `_SMOKER3` == 1 ~ "Current daily smoker",
      `_SMOKER3` == 2 ~ "Current some-day smoker",
      `_SMOKER3` == 3 ~ "Former smoker",
      `_SMOKER3` == 4 ~ "Never smoker",
      TRUE            ~ NA_character_
    ), levels = c("Current daily smoker", "Current some-day smoker",
                  "Former smoker", "Never smoker"))
  )

Report Missing Values

tibble(
  Variable = c("menthlth_days", "physhlth_days", "bmi",
               "sex", "exercise", "smoking"),
  `N Missing` = c(
    sum(is.na(brfss_clean$menthlth_days)),
    sum(is.na(brfss_clean$physhlth_days)),
    sum(is.na(brfss_clean$bmi)),
    sum(is.na(brfss_clean$sex)),
    sum(is.na(brfss_clean$exercise)),
    sum(is.na(brfss_clean$smoking))
  ),
  `% Missing` = round(c(
    mean(is.na(brfss_clean$menthlth_days)),
    mean(is.na(brfss_clean$physhlth_days)),
    mean(is.na(brfss_clean$bmi)),
    mean(is.na(brfss_clean$sex)),
    mean(is.na(brfss_clean$exercise)),
    mean(is.na(brfss_clean$smoking))
  ) * 100, 2)
) |>
  kable(caption = "Table 0a. Missing Values Before Exclusions") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 0a. Missing Values Before Exclusions
Variable N Missing % Missing
menthlth_days 8108 1.87
physhlth_days 10785 2.49
bmi 40535 9.35
sex 0 0.00
exercise 1251 0.29
smoking 23062 5.32

Create Analytic Dataset

set.seed(1220)
brfss_analytic <- brfss_clean |>
  drop_na(menthlth_days, physhlth_days, bmi, sex, exercise,
          age_group, income, education, smoking) |>
  slice_sample(n = 8000)

cat("Final analytic n:", nrow(brfss_analytic), "\n")
## Final analytic n: 8000

Descriptive Statistics Table

brfss_analytic |>
  dplyr::select(menthlth_days, physhlth_days, bmi, exercise,
         age_group, income, education, smoking) |>
  tbl_summary(
    by = NULL,
    label = list(
      menthlth_days ~ "Mentally unhealthy days (past 30)",
      physhlth_days ~ "Physically unhealthy days (past 30)",
      bmi           ~ "Body mass index (kg/m^2^)",
      exercise      ~ "Any exercise (past 30 days)",
      age_group     ~ "Age group",
      income        ~ "Annual household income",
      education     ~ "Education level",
      smoking       ~ "Smoking status"
    ),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits  = all_continuous() ~ 1,
    missing = "no"
  ) |>
  bold_labels()
Characteristic N = 8,0001
Mentally unhealthy days (past 30) 4.3 (8.1)
Physically unhealthy days (past 30) 4.4 (8.7)
Body mass index (kg/m^2^) 28.7 (6.5)
Any exercise (past 30 days) 6,240 (78%)
Age group
    18-24 406 (5.1%)
    25-29 408 (5.1%)
    30-34 463 (5.8%)
    35-39 565 (7.1%)
    40-44 582 (7.3%)
    45-49 518 (6.5%)
    50-54 608 (7.6%)
    55-59 660 (8.3%)
    60-64 787 (9.8%)
    65-69 901 (11%)
    70-74 808 (10%)
    75-79 663 (8.3%)
    80+ 631 (7.9%)
Annual household income
    Less than $15,000 407 (5.1%)
    $15,000-$24,999 641 (8.0%)
    $25,000-$34,999 871 (11%)
    $35,000-$49,999 1,067 (13%)
    $50,000-$99,999 2,511 (31%)
    $100,000-$199,999 1,865 (23%)
    $200,000 or more 638 (8.0%)
Education level
    Less than high school 124 (1.6%)
    High school diploma or GED 252 (3.2%)
    Some college or technical school 1,827 (23%)
    College graduate 2,138 (27%)
    Graduate or professional degree 3,659 (46%)
Smoking status
    Current daily smoker 658 (8.2%)
    Current some-day smoker 268 (3.4%)
    Former smoker 2,262 (28%)
    Never smoker 4,812 (60%)
1 Mean (SD); n (%)
# Stratified by sex
brfss_analytic |>
  dplyr::select(menthlth_days, physhlth_days, bmi, exercise,
         age_group, income, education, smoking, sex) |>
  tbl_summary(
    by = sex,
    label = list(
      menthlth_days ~ "Mentally unhealthy days (past 30)",
      physhlth_days ~ "Physically unhealthy days (past 30)",
      bmi           ~ "Body mass index (kg/m^2^)",
      exercise      ~ "Any exercise (past 30 days)",
      age_group     ~ "Age group",
      income        ~ "Annual household income",
      education     ~ "Education level",
      smoking       ~ "Smoking status"
    ),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits  = all_continuous() ~ 1,
    missing = "no"
  ) |>
  bold_labels() |>
  modify_caption("**Table 1. Descriptive Statistics by Sex -- BRFSS 2023 (n = 8,000)**")
Table 1. Descriptive Statistics by Sex – BRFSS 2023 (n = 8,000)
Characteristic Male
N = 3,936
1
Female
N = 4,064
1
Mentally unhealthy days (past 30) 3.5 (7.5) 5.1 (8.6)
Physically unhealthy days (past 30) 4.0 (8.4) 4.9 (8.9)
Body mass index (kg/m^2^) 28.7 (6.0) 28.7 (7.0)
Any exercise (past 30 days) 3,146 (80%) 3,094 (76%)
Age group

    18-24 235 (6.0%) 171 (4.2%)
    25-29 219 (5.6%) 189 (4.7%)
    30-34 253 (6.4%) 210 (5.2%)
    35-39 263 (6.7%) 302 (7.4%)
    40-44 290 (7.4%) 292 (7.2%)
    45-49 266 (6.8%) 252 (6.2%)
    50-54 305 (7.7%) 303 (7.5%)
    55-59 308 (7.8%) 352 (8.7%)
    60-64 408 (10%) 379 (9.3%)
    65-69 418 (11%) 483 (12%)
    70-74 382 (9.7%) 426 (10%)
    75-79 325 (8.3%) 338 (8.3%)
    80+ 264 (6.7%) 367 (9.0%)
Annual household income

    Less than $15,000 160 (4.1%) 247 (6.1%)
    $15,000-$24,999 271 (6.9%) 370 (9.1%)
    $25,000-$34,999 376 (9.6%) 495 (12%)
    $35,000-$49,999 482 (12%) 585 (14%)
    $50,000-$99,999 1,251 (32%) 1,260 (31%)
    $100,000-$199,999 996 (25%) 869 (21%)
    $200,000 or more 400 (10%) 238 (5.9%)
Education level

    Less than high school 75 (1.9%) 49 (1.2%)
    High school diploma or GED 130 (3.3%) 122 (3.0%)
    Some college or technical school 950 (24%) 877 (22%)
    College graduate 1,018 (26%) 1,120 (28%)
    Graduate or professional degree 1,763 (45%) 1,896 (47%)
Smoking status

    Current daily smoker 339 (8.6%) 319 (7.8%)
    Current some-day smoker 151 (3.8%) 117 (2.9%)
    Former smoker 1,207 (31%) 1,055 (26%)
    Never smoker 2,239 (57%) 2,573 (63%)
1 Mean (SD); n (%)

Part 1: Multiple Linear Regression (25 Points)

Step 1: Fit the Full Model

model_full <- lm(
  menthlth_days ~ physhlth_days + bmi + sex + exercise +
    age_group + income + education + smoking,
  data = brfss_analytic
)

summary(model_full)
## 
## Call:
## lm(formula = menthlth_days ~ physhlth_days + bmi + sex + exercise + 
##     age_group + income + education + smoking, data = brfss_analytic)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.080  -3.865  -1.597   0.712  30.471 
## 
## Coefficients:
##                                           Estimate Std. Error t value Pr(>|t|)
## (Intercept)                                9.65053    0.95407  10.115  < 2e-16
## physhlth_days                              0.26558    0.01007  26.384  < 2e-16
## bmi                                        0.03338    0.01321   2.527 0.011510
## sexFemale                                  1.39038    0.16750   8.301  < 2e-16
## exerciseYes                               -0.65116    0.21472  -3.033 0.002432
## age_group25-29                            -1.05660    0.51938  -2.034 0.041950
## age_group30-34                            -1.09395    0.50646  -2.160 0.030803
## age_group35-39                            -1.81103    0.48851  -3.707 0.000211
## age_group40-44                            -2.89307    0.48749  -5.935 3.07e-09
## age_group45-49                            -3.05618    0.49769  -6.141 8.61e-10
## age_group50-54                            -3.51674    0.48323  -7.277 3.72e-13
## age_group55-59                            -4.49597    0.47555  -9.454  < 2e-16
## age_group60-64                            -4.52215    0.45848  -9.863  < 2e-16
## age_group65-69                            -5.57795    0.45019 -12.390  < 2e-16
## age_group70-74                            -6.02536    0.45741 -13.173  < 2e-16
## age_group75-79                            -6.28656    0.47484 -13.239  < 2e-16
## age_group80+                              -6.81968    0.47684 -14.302  < 2e-16
## income$15,000-$24,999                     -1.63703    0.46899  -3.491 0.000485
## income$25,000-$34,999                     -2.07637    0.44797  -4.635 3.63e-06
## income$35,000-$49,999                     -2.54685    0.43819  -5.812 6.40e-09
## income$50,000-$99,999                     -3.05048    0.41069  -7.428 1.22e-13
## income$100,000-$199,999                   -3.49984    0.42923  -8.154 4.07e-16
## income$200,000 or more                    -3.78409    0.50036  -7.563 4.38e-14
## educationHigh school diploma or GED        0.07991    0.81066   0.099 0.921484
## educationSome college or technical school  1.07679    0.68973   1.561 0.118520
## educationCollege graduate                  1.79091    0.69119   2.591 0.009585
## educationGraduate or professional degree   1.73781    0.69250   2.509 0.012111
## smokingCurrent some-day smoker            -1.58670    0.53523  -2.965 0.003041
## smokingFormer smoker                      -1.98971    0.33713  -5.902 3.74e-09
## smokingNever smoker                       -2.93681    0.32162  -9.131  < 2e-16
##                                              
## (Intercept)                               ***
## physhlth_days                             ***
## bmi                                       *  
## sexFemale                                 ***
## exerciseYes                               ** 
## age_group25-29                            *  
## age_group30-34                            *  
## age_group35-39                            ***
## age_group40-44                            ***
## age_group45-49                            ***
## age_group50-54                            ***
## age_group55-59                            ***
## age_group60-64                            ***
## age_group65-69                            ***
## age_group70-74                            ***
## age_group75-79                            ***
## age_group80+                              ***
## income$15,000-$24,999                     ***
## income$25,000-$34,999                     ***
## income$35,000-$49,999                     ***
## income$50,000-$99,999                     ***
## income$100,000-$199,999                   ***
## income$200,000 or more                    ***
## educationHigh school diploma or GED          
## educationSome college or technical school    
## educationCollege graduate                 ** 
## educationGraduate or professional degree  *  
## smokingCurrent some-day smoker            ** 
## smokingFormer smoker                      ***
## smokingNever smoker                       ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.352 on 7970 degrees of freedom
## Multiple R-squared:  0.1853, Adjusted R-squared:  0.1824 
## F-statistic: 62.52 on 29 and 7970 DF,  p-value: < 2.2e-16

Step 2: Fitted Regression Equation

tidy(model_full, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption   = "Table 2. Full Model Coefficients",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 2. Full Model Coefficients
Term Estimate SE t p-value CI Lower CI Upper
(Intercept) 9.6505 0.9541 10.1151 0.0000 7.7803 11.5208
physhlth_days 0.2656 0.0101 26.3841 0.0000 0.2459 0.2853
bmi 0.0334 0.0132 2.5274 0.0115 0.0075 0.0593
sexFemale 1.3904 0.1675 8.3007 0.0000 1.0620 1.7187
exerciseYes -0.6512 0.2147 -3.0326 0.0024 -1.0721 -0.2303
age_group25-29 -1.0566 0.5194 -2.0343 0.0420 -2.0747 -0.0385
age_group30-34 -1.0939 0.5065 -2.1600 0.0308 -2.0867 -0.1012
age_group35-39 -1.8110 0.4885 -3.7072 0.0002 -2.7686 -0.8534
age_group40-44 -2.8931 0.4875 -5.9346 0.0000 -3.8487 -1.9375
age_group45-49 -3.0562 0.4977 -6.1408 0.0000 -4.0318 -2.0806
age_group50-54 -3.5167 0.4832 -7.2775 0.0000 -4.4640 -2.5695
age_group55-59 -4.4960 0.4755 -9.4543 0.0000 -5.4282 -3.5638
age_group60-64 -4.5221 0.4585 -9.8633 0.0000 -5.4209 -3.6234
age_group65-69 -5.5779 0.4502 -12.3903 0.0000 -6.4604 -4.6955
age_group70-74 -6.0254 0.4574 -13.1728 0.0000 -6.9220 -5.1287
age_group75-79 -6.2866 0.4748 -13.2392 0.0000 -7.2174 -5.3557
age_group80+ -6.8197 0.4768 -14.3019 0.0000 -7.7544 -5.8850
income$15,000-$24,999 -1.6370 0.4690 -3.4905 0.0005 -2.5564 -0.7177
income$25,000-$34,999 -2.0764 0.4480 -4.6351 0.0000 -2.9545 -1.1982
income$35,000-$49,999 -2.5469 0.4382 -5.8122 0.0000 -3.4058 -1.6879
income$50,000-$99,999 -3.0505 0.4107 -7.4277 0.0000 -3.8555 -2.2454
income$100,000-$199,999 -3.4998 0.4292 -8.1537 0.0000 -4.3413 -2.6584
income$200,000 or more -3.7841 0.5004 -7.5628 0.0000 -4.7649 -2.8033
educationHigh school diploma or GED 0.0799 0.8107 0.0986 0.9215 -1.5092 1.6690
educationSome college or technical school 1.0768 0.6897 1.5612 0.1185 -0.2753 2.4288
educationCollege graduate 1.7909 0.6912 2.5911 0.0096 0.4360 3.1458
educationGraduate or professional degree 1.7378 0.6925 2.5095 0.0121 0.3803 3.0953
smokingCurrent some-day smoker -1.5867 0.5352 -2.9645 0.0030 -2.6359 -0.5375
smokingFormer smoker -1.9897 0.3371 -5.9020 0.0000 -2.6506 -1.3289
smokingNever smoker -2.9368 0.3216 -9.1313 0.0000 -3.5673 -2.3063
b <- round(coef(model_full), 3)

Fitted equation (selected key terms – full equation includes all dummy variable terms):

\[\widehat{\text{menthlth\_days}} = \hat{\beta}_0 + 0.266(\text{physhlth\_days}) + 0.033(\text{BMI}) + 1.390(\text{Female}) + -0.651(\text{Exercise: Yes}) + \cdots\]

The full equation includes all category-specific dummy coefficients for age_group, income, education, and smoking as shown in Table 2.

Step 3: Coefficient Interpretations

# Extract key coefficients for interpretation
b_phys    <- round(coef(model_full)["physhlth_days"], 3)
b_bmi     <- round(coef(model_full)["bmi"], 3)
b_female  <- round(coef(model_full)["sexFemale"], 3)
b_exer    <- round(coef(model_full)["exerciseYes"], 3)

physhlth_days (0.266): For each additional day of poor physical health the expected number of mentally unhealthy days increases by 0.266 days on average, holding all other variables constant. This positive association is the strongest continuous predictor in the model; reflects the well-established link between physical and mental health.

bmi (0.033): For one unit increase in BMI (kg/m2), the expected number of mentally unhealthy days increases by 0.033 days on average, holding all other variables constant. The association is statistically significant (p = 0.012) and small in practical terms. A 10 unit increase in BMI would correspond to only approximately one-third of an additional mentally unhealthy day. This suggests that BMI’s contribution to mental health burden is modest (after adjusting for physical health days and other covariates).

sex: Female vs. Male (1.390): Compared to males, females report an estimated 1.390 more mentally unhealthy days on average, holding all other variables constant.

exercise: Yes vs. No (-0.651): Compared to those who did not exercise in the past 30 days, those who did exercise report an estimated 0.651 fewer mentally unhealthy days on average, holding all other variables constant.

One age_group coefficient – age 55-59 (-4.496): Compared to adults aged 18-24 (the reference group), those aged 55-59 report an estimated 4.50 fewer mentally unhealthy days on average, holding all other variables constant. This large negative coefficient is consistent with middle-aged and older adults having better emotional regulation and fewer days of psychological distress than younger adults. And this is despite having a worse physical fitness.

Two income coefficients: The coefficient for the $100,000-$199,999 income group is -3.50. This means that compared to those earning less than $15,000 per year (the reference group) individuals in this income bracket report an estimated 3.50 fewer mentally unhealthy days on average, holding all other variables constant. The coefficient for $200,000 group or more is -3.784. This is indicating that the highest earners report approximately 3.78 fewer mentally unhealthy days compared to the lowest income group, holding all other variables constant. Together these two coefficients demonstrate the strong inverse income gradient in mental health as higher income is associated with meaningfully better mental health.

Step 4: Model-Level Statistics

glance(model_full) |>
  dplyr::select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) |>
  pivot_longer(everything(), names_to = "Metric", values_to = "Value") |>
  mutate(
    Value  = round(Value, 4),
    Metric = dplyr::recode(Metric,
      "r.squared"     = "R-squared",
      "adj.r.squared" = "Adjusted R-squared",
      "sigma"         = "Residual Standard Error",
      "statistic"     = "F-statistic (overall)",
      "p.value"       = "p-value (overall F-test)",
      "df"            = "Model df",
      "df.residual"   = "Residual df",
      "nobs"          = "N"
    )
  ) |>
  kable(caption = "Table 3. Overall Model Summary") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 3. Overall Model Summary
Metric Value
R-squared 0.1853
Adjusted R-squared 0.1824
Residual Standard Error 7.3516
F-statistic (overall) 62.5234
p-value (overall F-test) 0.0000
Model df 29.0000
Residual df 7970.0000
N 8000.0000
r2     <- round(summary(model_full)$r.squared, 4)
adj_r2 <- round(summary(model_full)$adj.r.squared, 4)
rse    <- round(summary(model_full)$sigma, 3)
f_stat <- summary(model_full)$fstatistic

R-squared = 0.1853: The model explains 18.53% of the variance in mentally unhealthy days.This indicates that a big proportion of variation in mental health is unexplained by the included predictors, despite statistical significance.

Adjusted R-squared = 0.1824: The Adjusted R-squared penalizes for the number of predictors in the model. It is lower than R-squared (0.1853 vs. 0.1824) and reflects the adjustment for model complexity. The small difference indicates that the included predictors are contributing to the explanatory power of the model rather than inflating R-squared through the addition of uninformative variables.

Residual Standard Error = 7.352 days: On average, the predictions of the model are off by approximately 7.352 days. Given that mentally unhealthy days range from 0 to 30, this represents a meaningful prediction error. It also reinforces that individual mental health is difficult to predict from survey based predictors alone.

Overall F-test: H0: all regression coefficients equal zero. None of the predictors is associated with mentally unhealthy days. F(29, 7970) = 62.52, p < 0.0001. We reject H0 and conclude that the model as a whole explains a statistically significant amount of variation in mentally unhealthy days.


Part 2: Tests of Hypotheses (20 Points)

Step 1: Type III Partial F-Tests

anova_type3 <- Anova(model_full, type = "III")

anova_type3 |>
  as.data.frame() |>
  rownames_to_column("Source") |>
  mutate(
    Significant = ifelse(`Pr(>F)` < 0.05, "Yes", "No"),
    across(where(is.numeric), \(x) round(x, 4))
  ) |>
  kable(
    caption   = "Table 4. Type III Partial F-Tests -- Full Model",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value", "Significant (a=0.05)")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) |>
  column_spec(6, bold = TRUE)
Table 4. Type III Partial F-Tests – Full Model
Source Sum of Sq df F value p-value Significant (a=0.05)
(Intercept) 5529.7722 1 102.3152 0.0000 Yes
physhlth_days 37622.7952 1 696.1198 0.0000 Yes
bmi 345.2408 1 6.3879 0.0115 Yes
sex 3723.8662 1 68.9012 0.0000 Yes
exercise 497.0434 1 9.1966 0.0024 Yes
age_group 30092.1774 12 46.3986 0.0000 Yes
income 4733.8943 6 14.5982 0.0000 Yes
education 1265.1504 4 5.8521 0.0001 Yes
smoking 5101.1076 3 31.4613 0.0000 Yes
Residuals 430750.0872 7970 NA NA NA

Significant predictors at alpha = 0.05: All eight predictors in the model are statistically significant at alpha = 0.05: physhlth_days (F = 696.12, p < 0.0001), bmi (F = 6.39, p = 0.0115), sex (F = 68.90, p < 0.0001), exercise (F = 9.20, p = 0.0024), age_group (F = 46.40, p < 0.0001), income (F = 14.60, p < 0.0001), education (F = 5.85, p = 0.0001), and smoking (F = 31.46, p < 0.0001). Every predictor included in the model contributes significantly to the prediction of mentally unhealthy days after adjusting for all other variables.

Step 2: Chunk Test for Income

# Reduced model: all predictors except income
model_no_income <- lm(
  menthlth_days ~ physhlth_days + bmi + sex + exercise +
    age_group + education + smoking,
  data = brfss_analytic
)

anova(model_no_income, model_full) |>
  as.data.frame() |>
  rownames_to_column("Model") |>
  mutate(
    Model = c("Reduced (no income)", "Full (with income)"),
    across(where(is.numeric), \(x) round(x, 4))
  ) |>
  kable(
    caption   = "Table 5. Chunk Test: Does Income Add to the Model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 5. Chunk Test: Does Income Add to the Model?
Model Res. df RSS df Sum of Sq F p-value
Reduced (no income) 7976 435484.0 NA NA NA NA
Full (with income) 7970 430750.1 6 4733.894 14.5982 0

H0: \(\beta_{\$15k-\$24.9k} = \beta_{\$25k-\$34.9k} = \beta_{\$35k-\$49.9k} = \beta_{\$50k-\$99.9k} = \beta_{\$100k-\$199.9k} = \beta_{\$200k+} = 0\) –> income as a group does not improve prediction of mentally unhealthy days after adjusting for all other predictors.

HA: At least one income coefficient differs from zero –> income adds predictive information.

F(6, 7970) = 14.60, p < 0.0001. Since p < 0.05, we reject H0 and conclude that income as a group significantly improves the prediction of mentally unhealthy days after adjusting for physical health, BMI, sex, exercise, age, education, and smoking. The six income dummy variables explain a statistically meaningful amount of additional variance in mental health besides what the other predictors already capture.

Step 3: Chunk Test for Education

# Reduced model: all predictors except education
model_no_educ <- lm(
  menthlth_days ~ physhlth_days + bmi + sex + exercise +
    age_group + income + smoking,
  data = brfss_analytic
)

anova(model_no_educ, model_full) |>
  as.data.frame() |>
  rownames_to_column("Model") |>
  mutate(
    Model = c("Reduced (no education)", "Full (with education)"),
    across(where(is.numeric), \(x) round(x, 4))
  ) |>
  kable(
    caption   = "Table 6. Chunk Test: Does Education Add to the Model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 6. Chunk Test: Does Education Add to the Model?
Model Res. df RSS df Sum of Sq F p-value
Reduced (no education) 7974 432015.2 NA NA NA NA
Full (with education) 7970 430750.1 4 1265.15 5.8521 1e-04

H0: All education dummy coefficients equal zero –> education as a group does not improve prediction of mentally unhealthy days after adjusting for all other predictors.

HA: At least one education coefficient differs from zero –> education collectively adds predictive information.

F(4, 7970) = 5.85, p = 0.0001. Since p < 0.05, we reject H0 and conclude that education as a group significantly improves the prediction of mentally unhealthy days after adjusting for all other predictors. it;s important to note that the individual t-tests in Table 2 showed that the first two education categories (High school diploma or GED; Some college or technical school) were not significant individually compared to the “Less than high school” reference. However, the chunk test shows that the four education dummies together explain a meaningful amount of additional variance –> this is why chunk tests are necessary and why relying only on individual t-tests can be misleading.

Step 4: Written Synthesis

The Type III partial F-tests identify which predictors are independently associated with mentally unhealthy days after adjusting for all other variables; significance of each assessed after removing confounding by the full set of covariates. Physical health days and smoking status consistently appear as the strongest individual predictors (large F-statistics and highly significant p-values). The chunk tests for income and education provide an important addition to the individual t-tests from summary(). Even if some individual income or education category comparisons do not reach significance on their own (small cell sizes, overlapping confidence intervals) the chunk test evaluates whether the entire categorical variable contributes meaningfully as a unit. A categorical variable with six or seven levels can matter even if no single coefficient stands out. Additionally, the chunk test is the appropriate way to find that out. The individual t-tests answer “does this specific category differ from the reference?” while the chunk test answers “does this variable matter at all?”


Part 3: Interaction Analysis (25 Points)

Step 1: Create Binary Smoking Variable

brfss_analytic <- brfss_analytic |>
  mutate(
    smoker_current = factor(case_when(
      smoking %in% c("Current daily smoker", "Current some-day smoker") ~ "Current smoker",
      smoking %in% c("Former smoker", "Never smoker")                   ~ "Non-smoker",
      TRUE                                                               ~ NA_character_
    ), levels = c("Non-smoker", "Current smoker"))
  )

brfss_analytic |>
  count(smoker_current) |>
  mutate(`%` = round(n / sum(n) * 100, 1)) |>
  kable(
    caption   = "Table 7. Distribution of Binary Smoking Variable",
    col.names = c("Smoking Status", "N", "%")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 7. Distribution of Binary Smoking Variable
Smoking Status N %
Non-smoker 7074 88.4
Current smoker 926 11.6

Step 2: Fit Model A (Main Effects) and Model B (With Interaction)

# Model A: main effects only
model_A <- lm(
  menthlth_days ~ physhlth_days + bmi + sex + smoker_current +
    exercise + age_group + income + education,
  data = brfss_analytic
)

# Model B: with sex * smoker_current interaction
model_B <- lm(
  menthlth_days ~ physhlth_days + bmi + sex * smoker_current +
    exercise + age_group + income + education,
  data = brfss_analytic
)

tidy(model_A, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption   = "Table 8. Model A: Main Effects Only",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 8. Model A: Main Effects Only
Term Estimate SE t p-value CI Lower CI Upper
(Intercept) 6.7553 0.9271 7.2869 0.0000 4.9381 8.5726
physhlth_days 0.2686 0.0101 26.6731 0.0000 0.2489 0.2884
bmi 0.0334 0.0132 2.5250 0.0116 0.0075 0.0593
sexFemale 1.3331 0.1673 7.9675 0.0000 1.0051 1.6611
smoker_currentCurrent smoker 2.1287 0.2712 7.8489 0.0000 1.5971 2.6604
exerciseYes -0.6725 0.2150 -3.1274 0.0018 -1.0940 -0.2510
age_group25-29 -0.9149 0.5198 -1.7602 0.0784 -1.9338 0.1040
age_group30-34 -0.8823 0.5061 -1.7434 0.0813 -1.8743 0.1097
age_group35-39 -1.5810 0.4877 -3.2421 0.0012 -2.5369 -0.6251
age_group40-44 -2.6157 0.4858 -5.3847 0.0000 -3.5680 -1.6635
age_group45-49 -2.8246 0.4970 -5.6836 0.0000 -3.7988 -1.8504
age_group50-54 -3.2600 0.4821 -6.7628 0.0000 -4.2050 -2.3151
age_group55-59 -4.2301 0.4741 -8.9219 0.0000 -5.1595 -3.3007
age_group60-64 -4.2484 0.4568 -9.3000 0.0000 -5.1439 -3.3529
age_group65-69 -5.2338 0.4467 -11.7163 0.0000 -6.1095 -4.3582
age_group70-74 -5.7023 0.4545 -12.5457 0.0000 -6.5933 -4.8113
age_group75-79 -5.8977 0.4703 -12.5401 0.0000 -6.8197 -4.9758
age_group80+ -6.4888 0.4737 -13.6975 0.0000 -7.4174 -5.5602
income$15,000-$24,999 -1.6797 0.4698 -3.5754 0.0004 -2.6007 -0.7588
income$25,000-$34,999 -2.1023 0.4486 -4.6861 0.0000 -2.9817 -1.2229
income$35,000-$49,999 -2.5869 0.4390 -5.8931 0.0000 -3.4474 -1.7264
income$50,000-$99,999 -3.0823 0.4114 -7.4921 0.0000 -3.8887 -2.2758
income$100,000-$199,999 -3.5360 0.4300 -8.2232 0.0000 -4.3789 -2.6931
income$200,000 or more -3.8625 0.5011 -7.7078 0.0000 -4.8448 -2.8802
educationHigh school diploma or GED 0.2139 0.8119 0.2634 0.7922 -1.3776 1.8053
educationSome college or technical school 1.1965 0.6907 1.7323 0.0833 -0.1575 2.5505
educationCollege graduate 1.9035 0.6922 2.7499 0.0060 0.5466 3.2604
educationGraduate or professional degree 1.7456 0.6938 2.5160 0.0119 0.3856 3.1057
tidy(model_B, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption   = "Table 9. Model B: With Sex x Smoking Interaction",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 9. Model B: With Sex x Smoking Interaction
Term Estimate SE t p-value CI Lower CI Upper
(Intercept) 6.8994 0.9286 7.4301 0.0000 5.0791 8.7196
physhlth_days 0.2686 0.0101 26.6788 0.0000 0.2489 0.2883
bmi 0.0331 0.0132 2.5017 0.0124 0.0072 0.0590
sexFemale 1.1855 0.1775 6.6784 0.0000 0.8376 1.5335
smoker_currentCurrent smoker 1.5208 0.3654 4.1621 0.0000 0.8045 2.2371
exerciseYes -0.6728 0.2150 -3.1301 0.0018 -1.0942 -0.2515
age_group25-29 -0.9202 0.5196 -1.7709 0.0766 -1.9388 0.0984
age_group30-34 -0.8924 0.5059 -1.7640 0.0778 -1.8841 0.0993
age_group35-39 -1.5929 0.4875 -3.2673 0.0011 -2.5485 -0.6372
age_group40-44 -2.6286 0.4856 -5.4127 0.0000 -3.5806 -1.6766
age_group45-49 -2.8425 0.4969 -5.7209 0.0000 -3.8165 -1.8686
age_group50-54 -3.2778 0.4820 -6.8012 0.0000 -4.2226 -2.3331
age_group55-59 -4.2499 0.4740 -8.9652 0.0000 -5.1791 -3.3206
age_group60-64 -4.2640 0.4567 -9.3364 0.0000 -5.1593 -3.3688
age_group65-69 -5.2506 0.4466 -11.7563 0.0000 -6.1261 -4.3751
age_group70-74 -5.7111 0.4544 -12.5686 0.0000 -6.6018 -4.8203
age_group75-79 -5.9076 0.4702 -12.5646 0.0000 -6.8292 -4.9859
age_group80+ -6.4995 0.4736 -13.7239 0.0000 -7.4278 -5.5711
income$15,000-$24,999 -1.6357 0.4700 -3.4804 0.0005 -2.5570 -0.7144
income$25,000-$34,999 -2.0746 0.4486 -4.6243 0.0000 -2.9540 -1.1952
income$35,000-$49,999 -2.5455 0.4392 -5.7964 0.0000 -3.4064 -1.6847
income$50,000-$99,999 -3.0430 0.4116 -7.3935 0.0000 -3.8498 -2.2362
income$100,000-$199,999 -3.5097 0.4300 -8.1623 0.0000 -4.3526 -2.6668
income$200,000 or more -3.8405 0.5010 -7.6651 0.0000 -4.8226 -2.8583
educationHigh school diploma or GED 0.1256 0.8124 0.1546 0.8772 -1.4669 1.7180
educationSome college or technical school 1.1179 0.6912 1.6172 0.1059 -0.2371 2.4729
educationCollege graduate 1.8179 0.6928 2.6239 0.0087 0.4598 3.1760
educationGraduate or professional degree 1.6691 0.6943 2.4040 0.0162 0.3081 3.0300
sexFemale:smoker_currentCurrent smoker 1.2833 0.5171 2.4819 0.0131 0.2697 2.2968

Step 3: Test Whether the Interaction is Significant

anova(model_A, model_B) |>
  as.data.frame() |>
  rownames_to_column("Model") |>
  mutate(
    Model = c("Model A (main effects)", "Model B (+ interaction)"),
    across(where(is.numeric), \(x) round(x, 4))
  ) |>
  kable(
    caption   = "Table 10. Partial F-Test: Is the Sex x Smoking Interaction Significant?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Table 10. Partial F-Test: Is the Sex x Smoking Interaction Significant?
Model Res. df RSS df Sum of Sq F p-value
Model A (main effects) 7972 432508.9 NA NA NA NA
Model B (+ interaction) 7971 432174.9 1 333.9749 6.1598 0.0131
int_result <- anova(model_A, model_B)

H0: \(\beta_{\text{sexFemale:smoker\_currentCurrent smoker}} = 0\) – the association between current smoking and mentally unhealthy days is the same for men and women (no interaction).

HA: \(\beta_{\text{sexFemale:smoker\_currentCurrent smoker}} \neq 0\) – the association between current smoking and mentally unhealthy days differs by sex (interaction is present).

F(1, 7971) = 6.16, p = 0.0131. Since p < 0.05, we reject H0 and conclude that the association between current smoking and mentally unhealthy days does differ significantly by sex. The interaction is statistically significant at alpha = 0.05 –> a single pooled smoking coefficient (as in Model A) would be misleading –> the mental health burden associated with current smoking is larger for women than for men.

Step 4: Interpret the Interaction Coefficients

b_B <- round(coef(model_B), 3)

b_smoker   <- b_B["smoker_currentCurrent smoker"]
b_int_term <- b_B["sexFemale:smoker_currentCurrent smoker"]
b_female_smoker <- b_smoker + b_int_term

cat("Effect of current smoking among men:", b_smoker, "days\n")
## Effect of current smoking among men: 1.521 days
cat("Interaction term (Female x Current smoker):", b_int_term, "days\n")
## Interaction term (Female x Current smoker): 1.283 days
cat("Effect of current smoking among women:", b_female_smoker, "days\n")
## Effect of current smoking among women: 2.804 days

Effect of current smoking among men (1.521 days): Among males, current smokers report an estimated 1.521 more mentally unhealthy days compared to non-smokers, holding all other variables constant. This represents the smoking-mental health association specifically for men (reference category).

Effect of current smoking among women (2.804 days): Among females, current smokers report an estimated 2.804 more mentally unhealthy days compared to non-smokers, holding all other variables constant. This is calculated as the sum of the main effect for smoking (1.521) and the interaction term (1.283).

What these estimates mean together: The difference in the smoking effect between women and men is 1.283 days. If statistically significant then it indicates that current smoking is more strongly associated with poor mental health among women than among men (by ~ 1.283 days per month). This type of effect modification is important for targeting mental health interventions. If the smoking mental health link is stronger for one sex, programs should deliver greater mental health benefits for that group.

Step 5: Visualize the Interaction

pred_int <- ggpredict(model_B, terms = c("smoker_current", "sex"))

ggplot(pred_int, aes(x = x, y = predicted, color = group, fill = group)) +
  geom_point(size = 4) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high),
                width = 0.15, linewidth = 0.9) +
  geom_line(aes(group = group), linewidth = 0.8, linetype = "dashed") +
  labs(
    title    = "Predicted Mentally Unhealthy Days by Smoking Status and Sex",
    subtitle = "From Model B: adjusted for physical health, BMI, exercise, age, income, and education",
    x        = "Smoking Status",
    y        = "Predicted Mentally Unhealthy Days (Past 30)",
    color    = "Sex",
    fill     = "Sex"
  ) +
  theme_minimal(base_size = 13) +
  scale_color_brewer(palette = "Set1")
Figure 1. Predicted Mentally Unhealthy Days by Smoking Status and Sex

Figure 1. Predicted Mentally Unhealthy Days by Smoking Status and Sex

Step 6: Plain-Language Interpretation of the Interaction

The analysis found that the association between current smoking and poor mental health is significantly stronger for female than for male. Among male participants, current smokers reported approximately 1.5 more mentally unhealthy days per month compared to non-smokers, while among female group the difference was almost double (approximately 2.8 more days). This gap was statistically significant after accounting for physical health, age, income, education, and exercise. This suggests that smoking can carry a heavier psychological burden for women, and that smoking targeted programs for women could potentially achieve greater mental health benefits than the same programs targeting men. These findings should be interpreted with caution because the data are cross-sectional. Therefore we cannot determine if smoking worsened mental health, people in poor mental health were more likely to smoke, or both. The future longitudinal research is needed to establish the direction of this relationship.


Part 4: Reflection (15 Points)

A. Findings and Limitations

The results suggest that physically unhealthy days, smoking status, and sex are the strongest independent predictors of mentally unhealthy days among U.S. adults. This is consistent with established public health literature linking physical and mental health burden, tobacco use and psychological distress, and sex-based differences in mental health outcomes. Income and education also showed meaningful associations –> higher socioeconomic status was associated with fewer mentally unhealthy days. Predictors like BMI showed weaker or less consistent associations after adjusting for all other covariates.

The primary limitation of this analysis is its cross-sectional design. Because all variables are measured at a single point in time, we cannot establish the direction of causality. For example, the association between smoking and mental health days could reflect that smoking worsens mental health; people with poor mental health are more likely to smoke as a coping mechanism, or both. Two confounders that could bias these estimates are stress exposure and access to mental health care. High chronic stress is associated with both increased smoking and worse mental health outcomes. If unmeasured it would inflate the apparent effect of smoking on mental health. Similarly, access to mental health care is correlated with both income and mental health outcomes –> those with better access may report fewer mentally unhealthy days because of care utilization –> income coefficients may overstate direct effect of income.

B. From Simple to Multiple Regression

Adjusted R-squared penalizes the model for each additional predictor added. Regular R-squared increases or stays the same every time any variable is added, even if that variable has no relationship with the outcome. When building models with multiple categorical predictors like age group (12 dummy variables), income (6 dummies), and education (4 dummies), regular R-squared will appear to improve because of the large number of parameters. Adjusted R-squared corrects for this by deflating the measure whenever variables do not carry their own weight. This makes it a better measure of how well the predictors jointly explain the outcome. Chunk tests for income and education are valuable because they address a limitation of individual t-tests. When a categorical variable is represented by many dummy variables, some levels may not differ significantly from the reference on their own yet the variable as a whole could be strongly predictive. The chunk test evaluates the joint contribution of the entire variable in a single test. This is why the appropriate question for model building is “does income matter?”.

C. AI Transparency (5 points)

I seeked Ai assistance for issues with knitting. I pasted my specific error output that I received after a knitting attempt: “"C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS Untitled1.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output EPI553_HW03_Yatsiv_Viktoriya.html --lua-filter "C:\Users\vikya\AppData\Local\R\win-library\4.5\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\vikya\AppData\Local\R\win-library\4.5\rmarkdown\rmarkdown\lua\latex-div.lua" --lua-filter "C:\Users\vikya\AppData\Local\R\win-library\4.5\rmarkdown\rmarkdown\lua\table-classes.lua" --embed-resources --standalone --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 3 --variable toc_float=1 --variable toc_selectors=h1,h2,h3 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template "C:\Users\vikya\AppData\Local\R\win-library\4.5\rmarkdown\rmd\h\default.html" --highlight-style tango --variable theme=flatly --mathjax --variable "mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --include-in-header "C:\Users\vikya\AppData\Local\Temp\RtmpUhvWI0\rmarkdown-str451416f7b67.html" --variable code_folding=show --variable code_menu=1 YAML parse exception at line 4, column 0, while scanning for the next token: found character that cannot start any token Error: pandoc document conversion failed with error 64 Execution halted" I made corrections that were suggested to me by the AI assistant and after a few attempts the knitting process went through. That's how i verified that AI suggestions were correct..

End of Homework 3