Research Question: What individual, behavioral, and health factors are associated with physical health burden among U.S. adults? How do findings compare when the outcome is modeled as the number of 2 physically unhealthy days (continuous) versus as an indicator of frequent physical distress (binary: 14 or more days in the past 30)?
Packages and Import
# Load required packages
library(tidyverse)
library(haven)
library(kableExtra)
library(broom)
library(car)
library(gtsummary)
library(ggeffects)
library(leaps)
library(pROC)
library(ResourceSelection)# Upload BRFSS data
brfss_raw <- read_xpt("~/Desktop/EPI553/data/LLCP2023.XPT ")
# Look at data
glimpse(brfss_raw)## Rows: 433,323
## Columns: 350
## $ `_STATE` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ FMONTH <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ IDATE <chr> "03012023", "01062023", "03082023", "03062023", "01062023",…
## $ IMONTH <chr> "03", "01", "03", "03", "01", "01", "03", "01", "03", "01",…
## $ IDAY <chr> "01", "06", "08", "06", "06", "09", "21", "06", "15", "08",…
## $ IYEAR <chr> "2023", "2023", "2023", "2023", "2023", "2023", "2023", "20…
## $ DISPCODE <dbl> 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100,…
## $ SEQNO <chr> "2023000001", "2023000002", "2023000003", "2023000004", "20…
## $ `_PSU` <dbl> 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.02…
## $ CTELENM1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ PVTRESD1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ COLGHOUS <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STATERE1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ CELPHON1 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ LADULT1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ NUMADULT <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1,…
## $ RESPSLC1 <dbl> 1, NA, NA, 1, NA, 1, NA, NA, NA, 2, 1, 1, 1, 1, 1, NA, NA, …
## $ LANDSEX2 <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ LNDSXBRT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SAFETIME <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CTELNUM1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELLFON5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CADULT1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELLSEX2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELSXBRT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PVTRESD3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CCLGHOUS <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSTATE1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LANDLINE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HHADULT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SEXVAR <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ GENHLTH <dbl> 2, 2, 4, 2, 4, 3, 4, 4, 3, 3, 3, 2, 2, 3, 3, 4, 5, 3, 3, 3,…
## $ PHYSHLTH <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30,…
## $ MENTHLTH <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ POORHLTH <dbl> NA, NA, 1, 88, NA, 88, 88, 1, 88, NA, NA, 88, NA, NA, NA, 1…
## $ PRIMINS1 <dbl> 3, 3, 3, 3, 3, 7, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ PERSDOC3 <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,…
## $ MEDCOST1 <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHECKUP1 <dbl> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ EXERANY2 <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1,…
## $ EXRACT12 <dbl> NA, 1, 1, 1, 1, 1, NA, NA, NA, 1, 1, NA, 1, NA, NA, 1, NA, …
## $ EXEROFT1 <dbl> NA, 106, 205, 103, 102, 212, NA, NA, NA, 230, 105, NA, 105,…
## $ EXERHMM1 <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA,…
## $ EXRACT22 <dbl> NA, 88, 88, 88, 8, 88, NA, NA, NA, 88, 77, NA, 3, NA, NA, 8…
## $ EXEROFT2 <dbl> NA, NA, NA, NA, 107, NA, NA, NA, NA, NA, NA, NA, 105, NA, N…
## $ EXERHMM2 <dbl> NA, NA, NA, NA, 100, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA…
## $ STRENGTH <dbl> 888, 888, 205, 888, 888, 203, 777, 888, 888, 888, 105, 888,…
## $ BPHIGH6 <dbl> 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1,…
## $ BPMEDS1 <dbl> 1, 2, 1, NA, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, 1, 1, 1, 1…
## $ CHOLCHK3 <dbl> 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,…
## $ TOLDHI3 <dbl> 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ CHOLMED3 <dbl> 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1,…
## $ CVDINFR4 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CVDCRHD4 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CVDSTRK3 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2,…
## $ ASTHMA3 <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ ASTHNOW <dbl> NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CHCSCNC1 <dbl> 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHCOCNC1 <dbl> 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ CHCCOPD3 <dbl> 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2,…
## $ ADDEPEV3 <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHCKDNY2 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 7, 1, 2, 2,…
## $ HAVARTH4 <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ DIABETE4 <dbl> 1, 3, 3, 3, 1, 3, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1,…
## $ DIABAGE4 <dbl> 57, NA, NA, NA, 68, NA, NA, 98, NA, 60, NA, NA, NA, NA, NA,…
## $ MARITAL <dbl> 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 3, 3,…
## $ EDUCA <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4,…
## $ RENTHOM1 <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ NUMHHOL4 <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ NUMPHON4 <dbl> NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CPDEMO1C <dbl> 1, 1, 8, 1, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ VETERAN3 <dbl> 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2,…
## $ EMPLOY1 <dbl> 7, 7, 7, 7, 8, 7, 8, 7, 7, 7, 4, 1, 7, 7, 7, 5, 7, 7, 7, 7,…
## $ CHILDREN <dbl> 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88,…
## $ INCOME3 <dbl> 99, 99, 2, 99, 7, 7, 6, 99, 6, 7, 7, 99, 9, 9, 9, 3, 77, 99…
## $ PREGNANT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WEIGHT2 <dbl> 172, 132, 130, 170, 170, 165, 180, 135, 195, 160, 300, 145,…
## $ HEIGHT3 <dbl> 503, 409, 504, 506, 508, 502, 600, 411, 504, 510, 511, 505,…
## $ DEAF <dbl> 2, 1, 7, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,…
## $ BLIND <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ DECIDE <dbl> 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 7,…
## $ DIFFWALK <dbl> 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1,…
## $ DIFFDRES <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2,…
## $ DIFFALON <dbl> 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ FALL12MN <dbl> 88, 88, 88, 88, 3, 88, 8, 88, 88, 88, 88, 88, 88, 1, 88, 1,…
## $ FALLINJ5 <dbl> NA, NA, NA, NA, 88, NA, 88, NA, NA, NA, NA, NA, NA, 1, NA, …
## $ SMOKE100 <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1,…
## $ SMOKDAY2 <dbl> NA, NA, 3, NA, NA, NA, 3, NA, NA, 3, NA, NA, 3, 3, 3, NA, N…
## $ USENOW3 <dbl> 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ ECIGNOW2 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 1, 4, 1,…
## $ ALCDAY4 <dbl> 888, 888, 888, 888, 202, 205, 888, 888, 888, 888, 888, 888,…
## $ AVEDRNK3 <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 2, NA, NA,…
## $ DRNK3GE5 <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA,…
## $ MAXDRNKS <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 3, NA, NA,…
## $ FLUSHOT7 <dbl> 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ FLSHTMY3 <dbl> NA, 92023, 112022, 102022, NA, NA, 32023, 102023, 777777, 1…
## $ PNEUVAC4 <dbl> 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 1,…
## $ SHINGLE2 <dbl> 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2,…
## $ HIVTST7 <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2,…
## $ HIVTSTD3 <dbl> NA, NA, NA, 777777, NA, 777777, 92022, NA, NA, NA, NA, NA, …
## $ SEATBELT <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DRNKDRI2 <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA,…
## $ COVIDPO1 <dbl> 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 1,…
## $ COVIDSM1 <dbl> NA, NA, NA, NA, NA, 2, NA, 2, 2, NA, 1, 2, NA, 2, 2, NA, NA…
## $ COVIDACT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, …
## $ PDIABTS1 <dbl> NA, 2, 1, 1, NA, 1, 1, NA, 1, NA, 1, 1, 1, 1, 1, 1, 7, 1, 8…
## $ PREDIAB2 <dbl> NA, 3, 3, 3, NA, 3, 3, NA, 1, NA, 3, 3, 1, 3, 3, 3, 3, 3, 3…
## $ DIABTYPE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INSULIN1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CHKHEMO3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ EYEEXAM1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DIABEYE1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DIABEDU1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FEETSORE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHEXER <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHEDU <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LMTJOIN3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHDIS2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ JOINPAI2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSFIRST <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSLAST <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSNUMCG <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSCTSC1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSSCNCR <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSCTWHN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADMAM <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HOWLONG <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CERVSCRN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLCNC <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLPAP <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLHPV <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADHYST2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSATEST1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSATIME1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PCPSARS2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSASUGS1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PCSTALK2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADSIGM4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNSIGM <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNTES1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIGMTES1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LASTSIG4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNCNCR <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ VIRCOLO1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ VCLNTES2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SMALSTOL <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STOLTEST <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STOOLDN2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BLDSTFIT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SDNATES1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRDIFF <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRAGE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRTYP2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVTRT3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVDOC1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVSUM <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVRTRN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVINST <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVINSR <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVDEIN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVCLIN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVPAIN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVCTL2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDORTAN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ NUMBURN3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SUNPRTCT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WKDAYOUT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WKENDOUT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CIMEMLO1 <dbl> 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, NA, 2, 2, 2, 1, 2, 2…
## $ CDWORRY <dbl> NA, NA, 2, NA, 1, 1, NA, NA, NA, NA, NA, NA, 7, NA, NA, NA,…
## $ CDDISCU1 <dbl> NA, NA, 1, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CDHOUS1 <dbl> NA, NA, 2, NA, 2, 1, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CDSOCIA1 <dbl> NA, NA, 2, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CAREGIV1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVREL4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVLNG1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVHRS1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVPRB3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVALZD <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVPER1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVHOU1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVEXPT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LASTSMK2 <dbl> NA, NA, 7, NA, NA, NA, 7, NA, NA, 7, NA, NA, 7, NA, 7, NA, …
## $ STOPSMK2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MENTCIGS <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MENTECIG <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HEATTBCO <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FIREARM5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ GUNLOAD <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LOADULK2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP6 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP6 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FIRSTAID <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ASPIRIN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BIRTHSEX <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SOMALE <dbl> NA, NA, NA, NA, NA, NA, 2, NA, NA, 2, 2, NA, NA, 2, NA, NA,…
## $ SOFEMALE <dbl> 2, 2, 2, 2, 2, 2, NA, 2, 2, NA, NA, 2, 2, NA, 2, 2, 2, NA, …
## $ TRNSGNDR <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ MARIJAN1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJSMOK <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJEAT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJVAPE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJDAB <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJOTHR <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ USEMRJN4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDEPRS <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDRINK <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDRUGS <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEPRISN <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDIVRC <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEPUNCH <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEHURT1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACESWEAR <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACETOUCH <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACETTHEM <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEHVSEX <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEADSAF <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEADNED <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ IMFVPLA4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HPVADVC4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HPVADSHT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TETANUS1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVIDVA1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVACGE1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVIDNU2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LSATISFY <dbl> 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, NA, 1, 2, 3, 1, 4, 2…
## $ EMTSUPRT <dbl> 1, 2, 4, 1, 2, 2, 1, 1, 1, 3, 1, 1, 2, NA, 1, 1, 2, 1, 4, 2…
## $ SDLONELY <dbl> 5, 3, 3, 3, 2, 3, 4, 5, 3, 3, 4, 5, 4, NA, 4, 3, 4, 5, 1, 2…
## $ SDHEMPLY <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ FOODSTMP <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHFOOD1 <dbl> 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 5, NA, 5, 5, 5, 5, 5, 5…
## $ SDHBILLS <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHUTILS <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHTRNSP <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHSTRE1 <dbl> 5, 5, 3, 5, 2, 3, 5, 5, 4, 4, 4, 4, 4, NA, 4, 5, 4, 5, 4, 4…
## $ RRCLASS3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRCOGNT2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRTREAT <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRATWRK2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRHCARE4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRPHYSM2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSGEND1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSXBRTH <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSRLTN2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CASTHDX2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CASTHNO2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ QSTVER <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ QSTLANG <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_METSTAT` <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2,…
## $ `_URBSTAT` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MSCODE <dbl> 1, 1, 3, 2, 5, 1, 2, 5, 3, 1, 1, 1, 1, 5, 1, 2, 5, 5, 5, 1,…
## $ `_STSTR` <dbl> 11011, 11012, 11011, 11011, 11011, 11011, 11011, 11012, 110…
## $ `_STRWT` <dbl> 42.79158, 170.42939, 42.79158, 42.79158, 42.79158, 42.79158…
## $ `_RAWRAKE` <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1,…
## $ `_WT2RAKE` <dbl> 85.58316, 170.42939, 42.79158, 85.58316, 42.79158, 85.58316…
## $ `_IMPRACE` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_CHISPNC` <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## $ `_CRACE1` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CAGEG <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_CLLCPWT` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_DUALUSE` <dbl> 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DUALCOR` <dbl> 0.5026388, 0.5026388, NA, 0.5026388, 0.5026388, 0.5026388, …
## $ `_LLCPWT2` <dbl> 941.1640, 1874.2239, 1151.6032, 941.1640, 470.5820, 941.164…
## $ `_LLCPWT` <dbl> 605.4279, 1121.9927, 600.9633, 605.4279, 281.7110, 1060.335…
## $ `_RFHLTH` <dbl> 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1,…
## $ `_PHYS14D` <dbl> 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 3, 3, 1, 1,…
## $ `_MENT14D` <dbl> 1, 1, 2, 1, 1, 2, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1,…
## $ `_HLTHPL1` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_HCVU653` <dbl> 9, 9, 9, 9, 9, 1, 9, 9, 9, 9, 1, 9, 9, 9, 9, 9, 1, 9, 9, 9,…
## $ `_TOTINDA` <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1,…
## $ METVL12_ <dbl> NA, 35, 35, 35, 35, 35, NA, NA, NA, 35, 35, NA, 35, NA, NA,…
## $ METVL22_ <dbl> NA, NA, NA, NA, 33, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA,…
## $ MAXVO21_ <dbl> 1840, 1803, 1322, 1914, 1988, 2506, 1545, 1914, 1655, 1875,…
## $ FC601_ <dbl> 315, 309, 227, 328, 341, 430, 265, 328, 284, 321, 472, 328,…
## $ ACTIN13_ <dbl> NA, 2, 2, 2, 2, 1, NA, NA, NA, 2, 1, NA, 1, NA, NA, 2, NA, …
## $ ACTIN23_ <dbl> NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, N…
## $ PADUR1_ <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA,…
## $ PADUR2_ <dbl> NA, NA, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA,…
## $ PAFREQ1_ <dbl> NA, 6000, 1167, 3000, 2000, 2800, NA, NA, NA, 7000, 5000, N…
## $ PAFREQ2_ <dbl> NA, NA, NA, NA, 7000, NA, NA, NA, NA, NA, NA, NA, 5000, NA,…
## $ `_MINAC12` <dbl> NA, 180, 18, 90, 90, 126, NA, NA, NA, 210, 150, NA, 225, NA…
## $ `_MINAC22` <dbl> NA, 0, 0, 0, 420, 0, NA, NA, NA, 0, NA, NA, 225, NA, NA, 0,…
## $ STRFREQ_ <dbl> 0, 0, 1167, 0, 0, 700, NA, 0, 0, 0, 5000, 0, 0, 2000, 2000,…
## $ PAMISS3_ <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 9, 0, 0, 1,…
## $ PAMIN13_ <dbl> NA, 360, 36, 180, 180, 126, NA, NA, NA, 420, 150, NA, 225, …
## $ PAMIN23_ <dbl> NA, NA, NA, NA, 420, NA, NA, NA, NA, NA, NA, NA, 450, NA, N…
## $ PA3MIN_ <dbl> NA, 360, 36, 180, 600, 126, NA, NA, NA, 420, 150, NA, 675, …
## $ PAVIG13_ <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 0, NA, NA, …
## $ PAVIG23_ <dbl> NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 225, NA, NA,…
## $ PA3VIGM_ <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 225, NA, NA…
## $ `_PACAT3` <dbl> 4, 1, 9, 9, 1, 9, 4, 4, 4, 1, 9, 4, 1, 4, 4, 9, 9, 4, 4, 9,…
## $ `_PAINDX3` <dbl> 2, 1, 9, 1, 1, 9, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1,…
## $ `_PA150R4` <dbl> 3, 1, 9, 1, 1, 9, 3, 3, 3, 1, 1, 3, 1, 3, 3, 1, 9, 3, 3, 1,…
## $ `_PA300R4` <dbl> 3, 1, 9, 9, 1, 9, 3, 3, 3, 1, 9, 3, 1, 3, 3, 9, 9, 3, 3, 9,…
## $ `_PA30023` <dbl> 2, 1, 9, 9, 1, 9, 2, 2, 2, 1, 9, 2, 1, 2, 2, 9, 9, 2, 2, 9,…
## $ `_PASTRNG` <dbl> 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2,…
## $ `_PAREC3` <dbl> 4, 2, 9, 2, 2, 9, 9, 4, 4, 2, 1, 4, 2, 3, 3, 2, 9, 4, 4, 2,…
## $ `_PASTAE3` <dbl> 2, 2, 9, 2, 2, 9, 9, 2, 2, 2, 1, 2, 2, 2, 2, 2, 9, 2, 2, 2,…
## $ `_RFHYPE6` <dbl> 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2,…
## $ `_CHOLCH3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_RFCHOL3` <dbl> 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2,…
## $ `_MICHD` <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_LTASTH1` <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,…
## $ `_CASTHM1` <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 1, 1, 1,…
## $ `_ASTHMS1` <dbl> 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 9, 3, 3, 3,…
## $ `_DRDXAR2` <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ `_MRACE1` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_HISPANC` <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_RACE` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEG21` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEGR3` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEPRV` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_SEX` <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ `_AGEG5YR` <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 1…
## $ `_AGE65YR` <dbl> 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ `_AGE80` <dbl> 80, 80, 80, 78, 76, 62, 80, 78, 80, 75, 59, 78, 68, 79, 79,…
## $ `_AGE_G` <dbl> 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 5, 6, 6, 6,…
## $ HTIN4 <dbl> 63, 57, 64, 66, 68, 62, 72, 59, 64, 70, 71, 65, 68, 73, 62,…
## $ HTM4 <dbl> 160, 145, 163, 168, 173, 157, 183, 150, 163, 178, 180, 165,…
## $ WTKG3 <dbl> 7802, 5987, 5897, 7711, 7711, 7484, 8165, 6123, 8845, 7257,…
## $ `_BMI5` <dbl> 3047, 2856, 2231, 2744, 2585, 3018, 2441, 2727, 3347, 2296,…
## $ `_BMI5CAT` <dbl> 4, 3, 2, 3, 3, 4, 2, 3, 4, 2, 4, 2, 3, 2, 3, 3, 4, 3, 3, 3,…
## $ `_RFBMI5` <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,…
## $ `_CHLDCNT` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_EDUCAG` <dbl> 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 2, 4, 4, 3, 2,…
## $ `_INCOMG1` <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4,…
## $ `_SMOKER3` <dbl> 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3, 3, 4, 4, 4, 1, 3,…
## $ `_RFSMOK3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ `_CURECI2` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DRNKANY6 <dbl> 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1,…
## $ DROCDY4_ <dbl> 0, 0, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 29, 100, 0, 0, 0, 0, 3…
## $ `_RFBING6` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DRNKWK2` <dbl> 0, 0, 0, 0, 47, 233, 0, 0, 0, 0, 0, 0, 200, 1400, 0, 0, 0, …
## $ `_RFDRHV8` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_FLSHOT7` <dbl> 2, 1, 1, 1, 2, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 2, NA, 1, 2,…
## $ `_PNEUMO3` <dbl> 2, 1, 1, 1, 1, NA, 1, 1, 1, 1, NA, 2, 1, 1, 2, 1, NA, 2, 1,…
## $ `_AIDTST4` <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2,…
## $ `_RFSEAT2` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_RFSEAT3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DRNKDRV` <dbl> 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 2, 2,…
There are 433,323 rows and 350 columns in the raw dataset
Select variables
# New dataset with variables of interest (9 variables)
brfss_new <- brfss_raw %>%
select(PHYSHLTH, MENTHLTH, SEXVAR, EXERANY2, ADDEPEV3, '_AGEG5YR', '_INCOMG1', EDUCA, MARITAL)
#check new data
glimpse(brfss_new)## Rows: 433,323
## Columns: 9
## $ PHYSHLTH <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30,…
## $ MENTHLTH <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ SEXVAR <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ EXERANY2 <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1,…
## $ ADDEPEV3 <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_AGEG5YR` <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 1…
## $ `_INCOMG1` <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4,…
## $ EDUCA <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4,…
## $ MARITAL <dbl> 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 3, 3,…
My dataset includes physical health days, mental health days, sex, exercise, depression, age, income, education, and marital status. I chose to include marital status and depression because previous assignments did not include them and I am curious about their affect on physical health days.
Recode Variables*
# Recode variables
brfss_clean <- brfss_new %>%
mutate(
# mentally unhealthy days
menthlth_days = case_when(
MENTHLTH == 88 ~ 0,
MENTHLTH %in% 77:99 ~ NA_real_,
MENTHLTH >= 1 & MENTHLTH <= 30 ~ as.numeric(MENTHLTH),
TRUE ~ NA_real_
),
# physically unhealthy days
physhlth_days = case_when(
PHYSHLTH == 88 ~ 0,
PHYSHLTH %in% 77:99 ~ NA_real_,
PHYSHLTH >= 1 & PHYSHLTH <= 30 ~ as.numeric(PHYSHLTH),
TRUE ~ NA_real_
),
# sex
sex = factor(SEXVAR,
levels = c(1, 2),
labels = c("Male", "Female")),
# exercise
exercise = factor(case_when(
EXERANY2 == 1 ~ "Yes",
EXERANY2 == 2 ~ "No",
TRUE ~ NA_character_
), levels = c("No", "Yes")),
# depression
depression = factor(case_when(
ADDEPEV3 == 1 ~ "Yes",
ADDEPEV3 == 2 ~ "No",
TRUE ~ NA_character_
), levels = c("No", "Yes")),
# age groups
age_group=factor(case_when(
`_AGEG5YR` %in% c(1,2,3) ~ "18-34",
`_AGEG5YR` %in% c(4,5,6) ~ "35-49",
`_AGEG5YR` %in% c(7,8,9) ~ "50-64",
`_AGEG5YR` %in% c(10,11,12,13) ~ "65+",
`_AGEG5YR` == 14 ~ NA_character_,
TRUE ~ NA_character_
)),
# income
income= factor(case_when(
`_INCOMG1` %in% c(1,2) ~ "Less than $25K",
`_INCOMG1` %in% c(3,4) ~ "$25K-$49K",
`_INCOMG1` %in% c(5,6) ~ "$50K- $99K",
`_INCOMG1` == 7 ~ "$100K+",
`_INCOMG1` == 9 ~ NA_character_,
TRUE ~ NA_character_
)),
# education
education = factor(case_when(
EDUCA %in% c(1,2,3) ~ "Less than HS",
EDUCA == 4 ~ "HS/GED",
EDUCA == 5 ~ "Some college",
EDUCA == 6 ~ "College graduate",
EDUCA == 9 ~ NA_character_,
TRUE ~ NA_character_
)),
# marital status
marital = factor(case_when(
MARITAL %in% c(1,6) ~ "Married/Partnered",
MARITAL %in% c(2,3,4) ~ "Previously married",
MARITAL == 5 ~ "Never married",
MARITAL == 9 ~ NA_character_,
TRUE ~ NA_character_
))
)Set references
brfss_clean <- brfss_clean %>%
mutate(
sex = relevel(sex, ref = "Male"),
exercise = relevel(exercise, ref = "No"),
depression = relevel(depression, ref = "No"),
age_group = relevel(age_group, ref= "18-34"),
income = relevel(income, ref = "Less than $25K"),
education = relevel(education, ref = "Less than HS"),
marital = relevel(marital, ref = "Married/Partnered")
)Missingness
## Total N: 433323
miss_info <- function(x) {
n <- sum(is.na(x))
pct <- round(100 * n / length(x), 2)
paste0(n, " (", pct, "%)")
}
cat("Missing mentally unhealthy days:", miss_info(brfss_clean$menthlth_days), "\n")## Missing mentally unhealthy days: 8108 (1.87%)
## Missing physically unhealthy days: 10785 (2.49%)
## Missing sex: 0 (0%)
## Missing exercise: 1251 (0.29%)
## Missing depression: 2587 (0.6%)
## Missing age: 7779 (1.8%)
## Missing income: 86623 (19.99%)
## Missing education: 2325 (0.54%)
## Missing marital: 4289 (0.99%)
There were 10,785 missing observations for physically unhealthy days (2.49%). There was 2,587 missing observations for depression (0.6%) and 4289 missing for marital status (0.99%)
Drop Missing Observations
brfss_final <- brfss_clean %>%
drop_na(menthlth_days, physhlth_days, sex, exercise, depression, age_group, income, education, marital)Random Sample
set.seed(1220)
brfss_analytic <- brfss_final |>
drop_na(menthlth_days, physhlth_days, sex, exercise, depression, age_group, income, education, marital) |>
slice_sample(n = 8000)
summary(brfss_analytic)## PHYSHLTH MENTHLTH SEXVAR EXERANY2
## Min. : 1.00 Min. : 1.00 Min. :1.000 Min. :1.000
## 1st Qu.:10.00 1st Qu.:10.00 1st Qu.:1.000 1st Qu.:1.000
## Median :88.00 Median :88.00 Median :2.000 Median :1.000
## Mean :57.77 Mean :56.98 Mean :1.512 Mean :1.227
## 3rd Qu.:88.00 3rd Qu.:88.00 3rd Qu.:2.000 3rd Qu.:1.000
## Max. :88.00 Max. :88.00 Max. :2.000 Max. :2.000
## ADDEPEV3 _AGEG5YR _INCOMG1 EDUCA
## Min. :1.000 Min. : 1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.: 5.000 1st Qu.:4.000 1st Qu.:4.000
## Median :2.000 Median : 8.000 Median :5.000 Median :5.000
## Mean :1.796 Mean : 7.617 Mean :4.579 Mean :5.105
## 3rd Qu.:2.000 3rd Qu.:11.000 3rd Qu.:6.000 3rd Qu.:6.000
## Max. :2.000 Max. :13.000 Max. :7.000 Max. :6.000
## MARITAL menthlth_days physhlth_days sex exercise
## Min. :1.000 Min. : 0.000 Min. : 0.000 Male :3908 No :1819
## 1st Qu.:1.000 1st Qu.: 0.000 1st Qu.: 0.000 Female:4092 Yes:6181
## Median :1.000 Median : 0.000 Median : 0.000
## Mean :2.282 Mean : 4.221 Mean : 4.292
## 3rd Qu.:3.000 3rd Qu.: 4.000 3rd Qu.: 3.000
## Max. :6.000 Max. :30.000 Max. :30.000
## depression age_group income education
## No :6369 18-34:1305 Less than $25K:1066 Less than HS : 384
## Yes:1631 35-49:1680 $100K+ : 640 College graduate:3628
## 50-64:2161 $25K-$49K :1967 HS/GED :1879
## 65+ :2854 $50K- $99K :4327 Some college :2109
##
##
## marital
## Married/Partnered :4675
## Never married :1376
## Previously married:1949
##
##
##
Descriptive summary table
brfss_analytic %>%
select(menthlth_days, physhlth_days, income, sex, exercise, age_group, education, marital, depression) %>%
tbl_summary(
label = list(
menthlth_days ~ "Mentally unhealthy days (past 30)",
physhlth_days ~ "Physically unhealthy days (past 30)",
depression ~ "Ever told depressive disorder",
sex ~ "Sex",
exercise ~ "Any physical activity (past 30 days)",
age_group ~ "Age group in 5 year categories",
income ~ "Annual household income",
education ~ "Highest level of education completed",
marital ~ "Marital Status"
),
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
digits = all_continuous() ~ 1,
missing = "no"
) %>%
add_n() %>%
# add_overall() %>%
bold_labels() %>%
italicize_levels() %>%
modify_caption("**Table 1. Descriptive Statistics — BRFSS 2023 Analytic Sample (n = 8,000)**") %>%
as_flex_table()Characteristic | N | N = 8,0001 |
|---|---|---|
Mentally unhealthy days (past 30) | 8,000 | 4.2 (8.1) |
Physically unhealthy days (past 30) | 8,000 | 4.3 (8.6) |
Annual household income | 8,000 | |
Less than $25K | 1,066 (13%) | |
$100K+ | 640 (8.0%) | |
$25K-$49K | 1,967 (25%) | |
$50K- $99K | 4,327 (54%) | |
Sex | 8,000 | |
Male | 3,908 (49%) | |
Female | 4,092 (51%) | |
Any physical activity (past 30 days) | 8,000 | 6,181 (77%) |
Age group in 5 year categories | 8,000 | |
18-34 | 1,305 (16%) | |
35-49 | 1,680 (21%) | |
50-64 | 2,161 (27%) | |
65+ | 2,854 (36%) | |
Highest level of education completed | 8,000 | |
Less than HS | 384 (4.8%) | |
College graduate | 3,628 (45%) | |
HS/GED | 1,879 (23%) | |
Some college | 2,109 (26%) | |
Marital Status | 8,000 | |
Married/Partnered | 4,675 (58%) | |
Never married | 1,376 (17%) | |
Previously married | 1,949 (24%) | |
Ever told depressive disorder | 8,000 | 1,631 (20%) |
1Mean (SD); n (%) | ||
Final analytic sample size= 8000
1.1 Fit the Maximum Model
model <- lm(physhlth_days ~ menthlth_days + depression + sex + exercise + age_group + income + education + marital,
data= brfss_analytic
)
tidy(model, conf.int = TRUE) |>
mutate(across(where(is.numeric), \(x) round(x, 4))) |>
kable(
caption = "Maximum Model: All Candidate Predictors",
col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Term | Estimate | SE | t | p-value | CI Lower | CI Upper |
|---|---|---|---|---|---|---|
| (Intercept) | 6.5427 | 0.5343 | 12.2456 | 0.0000 | 5.4954 | 7.5901 |
| menthlth_days | 0.2977 | 0.0120 | 24.8162 | 0.0000 | 0.2741 | 0.3212 |
| depressionYes | 1.7365 | 0.2375 | 7.3115 | 0.0000 | 1.2709 | 2.2021 |
| sexFemale | -0.5121 | 0.1777 | -2.8810 | 0.0040 | -0.8605 | -0.1637 |
| exerciseYes | -3.5096 | 0.2169 | -16.1799 | 0.0000 | -3.9348 | -3.0844 |
| age_group35-49 | 1.3398 | 0.2997 | 4.4704 | 0.0000 | 0.7523 | 1.9273 |
| age_group50-64 | 2.0828 | 0.2946 | 7.0700 | 0.0000 | 1.5053 | 2.6603 |
| age_group65+ | 3.2354 | 0.2958 | 10.9392 | 0.0000 | 2.6556 | 3.8151 |
| income$100K+ | -3.5158 | 0.4364 | -8.0557 | 0.0000 | -4.3713 | -2.6603 |
| income$25K-$49K | -2.7422 | 0.3023 | -9.0706 | 0.0000 | -3.3349 | -2.1496 |
| income$50K- $99K | -3.4587 | 0.3022 | -11.4454 | 0.0000 | -4.0511 | -2.8663 |
| educationCollege graduate | -0.1959 | 0.4402 | -0.4451 | 0.6563 | -1.0588 | 0.6669 |
| educationHS/GED | -0.1456 | 0.4395 | -0.3313 | 0.7404 | -1.0071 | 0.7159 |
| educationSome college | 0.4655 | 0.4413 | 1.0549 | 0.2915 | -0.3995 | 1.3304 |
| maritalNever married | -0.4080 | 0.2696 | -1.5136 | 0.1302 | -0.9365 | 0.1204 |
| maritalPreviously married | 0.0460 | 0.2275 | 0.2022 | 0.8398 | -0.4000 | 0.4920 |
glance(model) |>
dplyr::select(r.squared, adj.r.squared, sigma, AIC, BIC, df.residual) |>
mutate(across(everything(), \(x) round(x, 3))) |>
kable(caption = "Linear Regression Model Fit Statistics") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| r.squared | adj.r.squared | sigma | AIC | BIC | df.residual |
|---|---|---|---|---|---|
| 0.199 | 0.197 | 7.727 | 55436.52 | 55555.31 | 7984 |
R-squared= 0.199 Adjusted R-squared= 0.197 AIC= 55,436.52 BIC= 55,555.31
The model explains approximately 19.90% of the variance in physically unhealthy days. The adjusted R-squared is 0.197 which is very similar to the R-squared value, indicating the predictors in the model are all contributing meaningfully. The AIC and BIC values serve as baselines for comparing similar models but BIC penalizes complexity more heavily.
1.2 Best Subsets Regression
best_subsets <- regsubsets(
physhlth_days ~ menthlth_days + depression + sex + exercise + age_group + income + education + marital,
data = brfss_analytic,
nvmax = 15,
method = "exhaustive"
)
best_summary <- summary(best_subsets)subset_metrics <- tibble(
p = 1:length(best_summary$adjr2),
`Adj. R²` = best_summary$adjr2,
BIC = best_summary$bic,
Cp = best_summary$cp
)
p1 <- ggplot(subset_metrics, aes(x = p, y = `Adj. R²`)) +
geom_line(linewidth = 1, color = "lightblue") +
geom_point(size = 3, color = "lightblue") +
geom_vline(xintercept = which.max(best_summary$adjr2),
linetype = "dashed", color = "lightgreen") +
labs(title = "Adjusted R² by Model Size", x = "Number of Variables", y = "Adjusted R²") +
theme_minimal(base_size = 12)
p2 <- ggplot(subset_metrics, aes(x = p, y = BIC)) +
geom_line(linewidth = 1, color = "lightblue") +
geom_point(size = 3, color = "lightblue") +
geom_vline(xintercept = which.min(best_summary$bic),
linetype = "dashed", color = "lightgreen") +
labs(title = "BIC by Model Size", x = "Number of Variables", y = "BIC") +
theme_minimal(base_size = 12)
gridExtra::grid.arrange(p1, p2, ncol = 2)## Best model by Adj. R²: 12 variables
## Best model by BIC: 10 variables
The model size that maximizes adjusted R^2 includes 12 variables, while the model size that minimizes BIC is 10 variables. The two criteria do not agree on the best model, the best model by BIC favors the simpler model.
1.3 Stepwise Selection Method
## Start: AIC=32731.51
## physhlth_days ~ menthlth_days + depression + sex + exercise +
## age_group + income + education + marital
##
## Df Sum of Sq RSS AIC
## - marital 2 158 476855 32730
## <none> 476696 32732
## - education 3 619 477315 32736
## - sex 1 496 477192 32738
## - depression 1 3192 479888 32783
## - age_group 3 7864 484561 32856
## - income 3 7949 484646 32858
## - exercise 1 15631 492327 32988
## - menthlth_days 1 36770 513466 33324
##
## Step: AIC=32730.16
## physhlth_days ~ menthlth_days + depression + sex + exercise +
## age_group + income + education
##
## Df Sum of Sq RSS AIC
## <none> 476855 32730
## - education 3 636 477490 32735
## - sex 1 448 477303 32736
## - depression 1 3197 480051 32782
## - income 3 8351 485205 32863
## - age_group 3 10876 487731 32905
## - exercise 1 15628 492483 32986
## - menthlth_days 1 36747 513602 33322
tidy(mod_backward, conf.int = TRUE) |>
mutate(across(where(is.numeric), \(x) round(x, 4))) |>
kable(
caption = "Backward Elimination Result (AIC-based)",
col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Term | Estimate | SE | t | p-value | CI Lower | CI Upper |
|---|---|---|---|---|---|---|
| (Intercept) | 6.2945 | 0.5020 | 12.5388 | 0.0000 | 5.3105 | 7.2786 |
| menthlth_days | 0.2974 | 0.0120 | 24.8076 | 0.0000 | 0.2739 | 0.3209 |
| depressionYes | 1.7359 | 0.2373 | 7.3168 | 0.0000 | 1.2709 | 2.2010 |
| sexFemale | -0.4839 | 0.1766 | -2.7403 | 0.0062 | -0.8300 | -0.1377 |
| exerciseYes | -3.5081 | 0.2168 | -16.1782 | 0.0000 | -3.9332 | -3.0830 |
| age_group35-49 | 1.4775 | 0.2875 | 5.1385 | 0.0000 | 0.9138 | 2.0411 |
| age_group50-64 | 2.2587 | 0.2740 | 8.2434 | 0.0000 | 1.7216 | 2.7958 |
| age_group65+ | 3.4446 | 0.2652 | 12.9893 | 0.0000 | 2.9248 | 3.9645 |
| income$100K+ | -3.4166 | 0.4204 | -8.1273 | 0.0000 | -4.2407 | -2.5925 |
| income$25K-$49K | -2.7055 | 0.2999 | -9.0206 | 0.0000 | -3.2935 | -2.1176 |
| income$50K- $99K | -3.3886 | 0.2885 | -11.7468 | 0.0000 | -3.9541 | -2.8231 |
| educationCollege graduate | -0.2331 | 0.4395 | -0.5305 | 0.5958 | -1.0946 | 0.6283 |
| educationHS/GED | -0.1770 | 0.4387 | -0.4035 | 0.6866 | -1.0370 | 0.6830 |
| educationSome college | 0.4388 | 0.4407 | 0.9957 | 0.3194 | -0.4251 | 1.3027 |
mod_null <- lm(physhlth_days ~ 1, data = brfss_analytic)
mod_forward <- step(mod_null,
scope = list(lower = mod_null, upper = model),
direction = "forward", trace = 1)## Start: AIC=34475.35
## physhlth_days ~ 1
##
## Df Sum of Sq RSS AIC
## + menthlth_days 1 64650 530379 33557
## + exercise 1 37363 557666 33959
## + income 3 32952 562077 34026
## + depression 1 25412 569616 34128
## + education 3 9857 585171 34348
## + marital 2 8213 586816 34368
## + age_group 3 7080 587949 34386
## + sex 1 762 594267 34467
## <none> 595029 34475
##
## Step: AIC=33557.21
## physhlth_days ~ menthlth_days
##
## Df Sum of Sq RSS AIC
## + exercise 1 27526.5 502853 33133
## + income 3 19479.9 510899 33264
## + age_group 3 16910.8 513468 33304
## + marital 2 6168.3 524211 33468
## + education 3 5117.1 525262 33486
## + depression 1 3733.6 526646 33503
## <none> 530379 33557
## + sex 1 18.6 530361 33559
##
## Step: AIC=33132.85
## physhlth_days ~ menthlth_days + exercise
##
## Df Sum of Sq RSS AIC
## + age_group 3 11694.1 491159 32951
## + income 3 11486.1 491367 32954
## + marital 2 3668.8 499184 33078
## + depression 1 3401.5 499451 33081
## + education 3 1793.6 501059 33110
## <none> 502853 33133
## + sex 1 11.1 502842 33135
##
## Step: AIC=32950.6
## physhlth_days ~ menthlth_days + exercise + age_group
##
## Df Sum of Sq RSS AIC
## + income 3 10201.3 480957 32789
## + depression 1 3826.6 487332 32890
## + education 3 1954.8 489204 32925
## + marital 2 976.2 490182 32939
## <none> 491159 32951
## + sex 1 62.9 491096 32952
##
## Step: AIC=32788.69
## physhlth_days ~ menthlth_days + exercise + age_group + income
##
## Df Sum of Sq RSS AIC
## + depression 1 3020.05 477937 32740
## + education 3 670.49 480287 32784
## + sex 1 226.14 480731 32787
## <none> 480957 32789
## + marital 2 138.22 480819 32790
##
## Step: AIC=32740.3
## physhlth_days ~ menthlth_days + exercise + age_group + income +
## depression
##
## Df Sum of Sq RSS AIC
## + sex 1 446.80 477490 32735
## + education 3 634.14 477303 32736
## <none> 477937 32740
## + marital 2 126.28 477811 32742
##
## Step: AIC=32734.82
## physhlth_days ~ menthlth_days + exercise + age_group + income +
## depression + sex
##
## Df Sum of Sq RSS AIC
## + education 3 635.74 476855 32730
## <none> 477490 32735
## + marital 2 175.11 477315 32736
##
## Step: AIC=32730.16
## physhlth_days ~ menthlth_days + exercise + age_group + income +
## depression + sex + education
##
## Df Sum of Sq RSS AIC
## <none> 476855 32730
## + marital 2 158.21 476696 32732
tidy(mod_forward, conf.int = TRUE) |>
mutate(across(where(is.numeric), \(x) round(x, 4))) |>
kable(
caption = "Forward Selection Result (AIC-based)",
col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Term | Estimate | SE | t | p-value | CI Lower | CI Upper |
|---|---|---|---|---|---|---|
| (Intercept) | 6.2945 | 0.5020 | 12.5388 | 0.0000 | 5.3105 | 7.2786 |
| menthlth_days | 0.2974 | 0.0120 | 24.8076 | 0.0000 | 0.2739 | 0.3209 |
| exerciseYes | -3.5081 | 0.2168 | -16.1782 | 0.0000 | -3.9332 | -3.0830 |
| age_group35-49 | 1.4775 | 0.2875 | 5.1385 | 0.0000 | 0.9138 | 2.0411 |
| age_group50-64 | 2.2587 | 0.2740 | 8.2434 | 0.0000 | 1.7216 | 2.7958 |
| age_group65+ | 3.4446 | 0.2652 | 12.9893 | 0.0000 | 2.9248 | 3.9645 |
| income$100K+ | -3.4166 | 0.4204 | -8.1273 | 0.0000 | -4.2407 | -2.5925 |
| income$25K-$49K | -2.7055 | 0.2999 | -9.0206 | 0.0000 | -3.2935 | -2.1176 |
| income$50K- $99K | -3.3886 | 0.2885 | -11.7468 | 0.0000 | -3.9541 | -2.8231 |
| depressionYes | 1.7359 | 0.2373 | 7.3168 | 0.0000 | 1.2709 | 2.2010 |
| sexFemale | -0.4839 | 0.1766 | -2.7403 | 0.0062 | -0.8300 | -0.1377 |
| educationCollege graduate | -0.2331 | 0.4395 | -0.5305 | 0.5958 | -1.0946 | 0.6283 |
| educationHS/GED | -0.1770 | 0.4387 | -0.4035 | 0.6866 | -1.0370 | 0.6830 |
| educationSome college | 0.4388 | 0.4407 | 0.9957 | 0.3194 | -0.4251 | 1.3027 |
mod_stepwise <- step(mod_null,
scope = list(lower = mod_null, upper = model),
direction = "both", trace = 1)## Start: AIC=34475.35
## physhlth_days ~ 1
##
## Df Sum of Sq RSS AIC
## + menthlth_days 1 64650 530379 33557
## + exercise 1 37363 557666 33959
## + income 3 32952 562077 34026
## + depression 1 25412 569616 34128
## + education 3 9857 585171 34348
## + marital 2 8213 586816 34368
## + age_group 3 7080 587949 34386
## + sex 1 762 594267 34467
## <none> 595029 34475
##
## Step: AIC=33557.21
## physhlth_days ~ menthlth_days
##
## Df Sum of Sq RSS AIC
## + exercise 1 27527 502853 33133
## + income 3 19480 510899 33264
## + age_group 3 16911 513468 33304
## + marital 2 6168 524211 33468
## + education 3 5117 525262 33486
## + depression 1 3734 526646 33503
## <none> 530379 33557
## + sex 1 19 530361 33559
## - menthlth_days 1 64650 595029 34475
##
## Step: AIC=33132.85
## physhlth_days ~ menthlth_days + exercise
##
## Df Sum of Sq RSS AIC
## + age_group 3 11694 491159 32951
## + income 3 11486 491367 32954
## + marital 2 3669 499184 33078
## + depression 1 3401 499451 33081
## + education 3 1794 501059 33110
## <none> 502853 33133
## + sex 1 11 502842 33135
## - exercise 1 27527 530379 33557
## - menthlth_days 1 54813 557666 33959
##
## Step: AIC=32950.6
## physhlth_days ~ menthlth_days + exercise + age_group
##
## Df Sum of Sq RSS AIC
## + income 3 10201 480957 32789
## + depression 1 3827 487332 32890
## + education 3 1955 489204 32925
## + marital 2 976 490182 32939
## <none> 491159 32951
## + sex 1 63 491096 32952
## - age_group 3 11694 502853 33133
## - exercise 1 22310 513468 33304
## - menthlth_days 1 62634 553792 33909
##
## Step: AIC=32788.69
## physhlth_days ~ menthlth_days + exercise + age_group + income
##
## Df Sum of Sq RSS AIC
## + depression 1 3020 477937 32740
## + education 3 670 480287 32784
## + sex 1 226 480731 32787
## <none> 480957 32789
## + marital 2 138 480819 32790
## - income 3 10201 491159 32951
## - age_group 3 10409 491367 32954
## - exercise 1 16128 497085 33051
## - menthlth_days 1 53213 534171 33626
##
## Step: AIC=32740.3
## physhlth_days ~ menthlth_days + exercise + age_group + income +
## depression
##
## Df Sum of Sq RSS AIC
## + sex 1 447 477490 32735
## + education 3 634 477303 32736
## <none> 477937 32740
## + marital 2 126 477811 32742
## - depression 1 3020 480957 32789
## - income 3 9395 487332 32890
## - age_group 3 10790 488728 32913
## - exercise 1 16038 493975 33002
## - menthlth_days 1 37070 515007 33336
##
## Step: AIC=32734.82
## physhlth_days ~ menthlth_days + exercise + age_group + income +
## depression + sex
##
## Df Sum of Sq RSS AIC
## + education 3 636 476855 32730
## <none> 477490 32735
## + marital 2 175 477315 32736
## - sex 1 447 477937 32740
## - depression 1 3241 480731 32787
## - income 3 9614 487105 32888
## - age_group 3 10946 488436 32910
## - exercise 1 16134 493624 32999
## - menthlth_days 1 37286 514777 33334
##
## Step: AIC=32730.16
## physhlth_days ~ menthlth_days + exercise + age_group + income +
## depression + sex + education
##
## Df Sum of Sq RSS AIC
## <none> 476855 32730
## + marital 2 158 476696 32732
## - education 3 636 477490 32735
## - sex 1 448 477303 32736
## - depression 1 3197 480051 32782
## - income 3 8351 485205 32863
## - age_group 3 10876 487731 32905
## - exercise 1 15628 492483 32986
## - menthlth_days 1 36747 513602 33322
tidy(mod_stepwise, conf.int = TRUE) |>
mutate(across(where(is.numeric), \(x) round(x, 4))) |>
kable(
caption = "Stepwise Selection Result (AIC-based)",
col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Term | Estimate | SE | t | p-value | CI Lower | CI Upper |
|---|---|---|---|---|---|---|
| (Intercept) | 6.2945 | 0.5020 | 12.5388 | 0.0000 | 5.3105 | 7.2786 |
| menthlth_days | 0.2974 | 0.0120 | 24.8076 | 0.0000 | 0.2739 | 0.3209 |
| exerciseYes | -3.5081 | 0.2168 | -16.1782 | 0.0000 | -3.9332 | -3.0830 |
| age_group35-49 | 1.4775 | 0.2875 | 5.1385 | 0.0000 | 0.9138 | 2.0411 |
| age_group50-64 | 2.2587 | 0.2740 | 8.2434 | 0.0000 | 1.7216 | 2.7958 |
| age_group65+ | 3.4446 | 0.2652 | 12.9893 | 0.0000 | 2.9248 | 3.9645 |
| income$100K+ | -3.4166 | 0.4204 | -8.1273 | 0.0000 | -4.2407 | -2.5925 |
| income$25K-$49K | -2.7055 | 0.2999 | -9.0206 | 0.0000 | -3.2935 | -2.1176 |
| income$50K- $99K | -3.3886 | 0.2885 | -11.7468 | 0.0000 | -3.9541 | -2.8231 |
| depressionYes | 1.7359 | 0.2373 | 7.3168 | 0.0000 | 1.2709 | 2.2010 |
| sexFemale | -0.4839 | 0.1766 | -2.7403 | 0.0062 | -0.8300 | -0.1377 |
| educationCollege graduate | -0.2331 | 0.4395 | -0.5305 | 0.5958 | -1.0946 | 0.6283 |
| educationHS/GED | -0.1770 | 0.4387 | -0.4035 | 0.6866 | -1.0370 | 0.6830 |
| educationSome college | 0.4388 | 0.4407 | 0.9957 | 0.3194 | -0.4251 | 1.3027 |
method_comparison <- tribble(
~Method, ~`Variables selected`, ~`Adj. R²`, ~AIC, ~BIC,
"Maximum model",
length(coef(model)) - 1,
round(glance(model)$adj.r.squared, 4),
round(AIC(model), 1),
round(BIC(model), 1),
"Backward (AIC)",
length(coef(mod_backward)) - 1,
round(glance(mod_backward)$adj.r.squared, 4),
round(AIC(mod_backward), 1),
round(BIC(mod_backward), 1),
"Forward (AIC)",
length(coef(mod_forward)) - 1,
round(glance(mod_forward)$adj.r.squared, 4),
round(AIC(mod_forward), 1),
round(BIC(mod_forward), 1),
"Stepwise (AIC)",
length(coef(mod_stepwise)) - 1,
round(glance(mod_stepwise)$adj.r.squared, 4),
round(AIC(mod_stepwise), 1),
round(BIC(mod_stepwise), 1)
)
method_comparison |>
kable(caption = "Comparison of Variable Selection Methods") |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Method | Variables selected | Adj. R² | AIC | BIC |
|---|---|---|---|---|
| Maximum model | 15 | 0.1974 | 55436.5 | 55555.3 |
| Backward (AIC) | 13 | 0.1973 | 55435.2 | 55540.0 |
| Forward (AIC) | 13 | 0.1973 | 55435.2 | 55540.0 |
| Stepwise (AIC) | 13 | 0.1973 | 55435.2 | 55540.0 |
All 3 models agreed on the same final model with 13 variables. All three models excluded the marital variables (maritalNevermarried and maritalPreviouslymarried). The remaining variables were menthlth_days, exerciseYes, all age groups, all income groups, depressionYes, sexFemale, and the education groups.
1.4 Final Model Selection and Interpretation
I chose the stepwise model as my final model. All three methods (backward, forward, and stepwise) ended with the same 13 variables which showed an improved model fit from the original maximum model.
## # A tibble: 14 × 7
## term estimate std.error statistic p.value conf.low conf.high
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 6.29 0.502 12.5 9.92e- 36 5.31 7.28
## 2 menthlth_days 0.297 0.0120 24.8 6.10e-131 0.274 0.321
## 3 exerciseYes -3.51 0.217 -16.2 5.96e- 58 -3.93 -3.08
## 4 age_group35-49 1.48 0.288 5.14 2.83e- 7 0.914 2.04
## 5 age_group50-64 2.26 0.274 8.24 1.94e- 16 1.72 2.80
## 6 age_group65+ 3.44 0.265 13.0 3.42e- 38 2.92 3.96
## 7 income$100K+ -3.42 0.420 -8.13 5.05e- 16 -4.24 -2.59
## 8 income$25K-$49K -2.71 0.300 -9.02 2.31e- 19 -3.29 -2.12
## 9 income$50K- $99K -3.39 0.288 -11.7 1.33e- 31 -3.95 -2.82
## 10 depressionYes 1.74 0.237 7.32 2.79e- 13 1.27 2.20
## 11 sexFemale -0.484 0.177 -2.74 6.15e- 3 -0.830 -0.138
## 12 educationCollege g… -0.233 0.439 -0.531 5.96e- 1 -1.09 0.628
## 13 educationHS/GED -0.177 0.439 -0.403 6.87e- 1 -1.04 0.683
## 14 educationSome coll… 0.439 0.441 0.996 3.19e- 1 -0.425 1.30
Holding all other variable constant, each additional day of poor mental health days is associated with an increase of 0.30 physically unhealth days (95% CI: 0.27, 0.32). This means that worse mental health is associated with worse physical health. Individuals who reported exercising in the past 30 days had 3.51 fewer physically unhealthy days in comparison to those who did not exercise, holding all other variables constant. Therefore, exercising is associated with better physical health. Holding all other variable constant, individuals with depression had 1.74 more physically unhealthy days in comparison to those without depression. This indicated that depression is associated with poor physical health outcomes.
1.5 LINE Assumption Check
Residual vs. Fitted: There is evidence of mild non-linearlity since residuals are centered around zero with a little bit of a curve.
Q-Q: The residuals don’t follow the normal distribution. The top tail leaves the straight line which indicates non-normality.
Scale location: The spread of residuals increases, indicating some heteroscedasticity.
Residuals vs Leverage: Based on the plot there are no highly influential observations because there are no extreme outliers, all points are clustered together.
Overall, the LINE assumptions are reasonably satisfied with some violations. There is non-normality and heteroscedasticity indicated by the Normal Q-Q and Scale-Location plots. Based on the Residuals vs Leverage plot there is no evidence of influential observations. There is some non-linearity present in the Residual vs Fitted plot but it mild.
2.1 Create the Binary Outcome
brfss_analytic <- brfss_analytic %>%
mutate(
frequent_distress= case_when(
physhlth_days >= 14 ~ "Yes",
physhlth_days < 14 ~ "No",
TRUE ~ NA_character_
),
frequent_distress= factor(frequent_distress,
levels= c("No", "Yes"))
)
brfss_analytic %>%
count(frequent_distress) %>%
mutate(percent = n/sum(n) * 100)## # A tibble: 2 × 3
## frequent_distress n percent
## <fct> <int> <dbl>
## 1 No 6944 86.8
## 2 Yes 1056 13.2
1056 (13.2%) individuals reported frequent physical distress, greater than or equal to 14 physically unhealthy days. While 6944 (86.8%) individuals reported no frequent physical distress (fewer than 14 physically unhealthy days). Overall, there is a smaller portion of individuals experiencing frequent physical distress.
2.2 Simple (Unadjusted) Logistic Regression
I chose exercise from my final linear model as the predictor because it strongly related to physical health and was significantly associated with physically unhealthy days in the linear regression model.
mod_simple <- glm(frequent_distress ~ exercise,
data = brfss_analytic, family = binomial)
summary(mod_simple)##
## Call:
## glm(formula = frequent_distress ~ exercise, family = binomial,
## data = brfss_analytic)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.97554 0.05258 -18.55 <2e-16 ***
## exerciseYes -1.33472 0.06881 -19.40 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6242.7 on 7999 degrees of freedom
## Residual deviance: 5883.3 on 7998 degrees of freedom
## AIC: 5887.3
##
## Number of Fisher Scoring iterations: 5
## (Intercept) exerciseYes
## 0.3769871 0.2632325
## 2.5 % 97.5 %
## (Intercept) 0.3397881 0.4175879
## exerciseYes 0.2300233 0.3012592
Odds ratio: 0.26 95% CI: 0.23, 0.30
In comparison to those who did not exercise, those who reported exercise have 0.26 times the odds of frequent mental distress, 95% CI[0.23, 0.30]. The association is statistically significant at a= 0.05 because the CI does not include 1.0.
2.3 Multiple Logisitic Regression
mod_logistic <- glm(frequent_distress ~ menthlth_days + depression + sex + exercise + age_group + income + education,
data = brfss_analytic, family = binomial)
summary(mod_logistic)##
## Call:
## glm(formula = frequent_distress ~ menthlth_days + depression +
## sex + exercise + age_group + income + education, family = binomial,
## data = brfss_analytic)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.064007 0.199530 -10.344 < 2e-16 ***
## menthlth_days 0.067828 0.003926 17.278 < 2e-16 ***
## depressionYes 0.499690 0.087818 5.690 1.27e-08 ***
## sexFemale -0.107139 0.075060 -1.427 0.153
## exerciseYes -0.927579 0.077931 -11.903 < 2e-16 ***
## age_group35-49 0.811292 0.147113 5.515 3.49e-08 ***
## age_group50-64 1.050798 0.139227 7.547 4.44e-14 ***
## age_group65+ 1.434023 0.135785 10.561 < 2e-16 ***
## income$100K+ -1.094527 0.199595 -5.484 4.16e-08 ***
## income$25K-$49K -0.650636 0.102657 -6.338 2.33e-10 ***
## income$50K- $99K -0.949700 0.101806 -9.329 < 2e-16 ***
## educationCollege graduate -0.159053 0.163952 -0.970 0.332
## educationHS/GED -0.007249 0.159248 -0.046 0.964
## educationSome college 0.189565 0.159785 1.186 0.235
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6242.7 on 7999 degrees of freedom
## Residual deviance: 5110.2 on 7986 degrees of freedom
## AIC: 5138.2
##
## Number of Fisher Scoring iterations: 5
| term | estimate | std.error | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| (Intercept) | 0.127 | 0.200 | -10.344 | 0.000 | 0.085 | 0.187 |
| menthlth_days | 1.070 | 0.004 | 17.278 | 0.000 | 1.062 | 1.078 |
| depressionYes | 1.648 | 0.088 | 5.690 | 0.000 | 1.386 | 1.956 |
| sexFemale | 0.898 | 0.075 | -1.427 | 0.153 | 0.775 | 1.041 |
| exerciseYes | 0.396 | 0.078 | -11.903 | 0.000 | 0.340 | 0.461 |
| age_group35-49 | 2.251 | 0.147 | 5.515 | 0.000 | 1.692 | 3.014 |
| age_group50-64 | 2.860 | 0.139 | 7.547 | 0.000 | 2.186 | 3.775 |
| age_group65+ | 4.196 | 0.136 | 10.561 | 0.000 | 3.232 | 5.505 |
| income$100K+ | 0.335 | 0.200 | -5.484 | 0.000 | 0.224 | 0.490 |
| income$25K-$49K | 0.522 | 0.103 | -6.338 | 0.000 | 0.427 | 0.638 |
| income$50K- $99K | 0.387 | 0.102 | -9.329 | 0.000 | 0.317 | 0.472 |
| educationCollege graduate | 0.853 | 0.164 | -0.970 | 0.332 | 0.621 | 1.181 |
| educationHS/GED | 0.993 | 0.159 | -0.046 | 0.964 | 0.729 | 1.362 |
| educationSome college | 1.209 | 0.160 | 1.186 | 0.235 | 0.887 | 1.660 |
Holding all other variables constant, each additional day of poor mental health days is associated with 1.07 times the odds of frequent physical distress, 95% CI [1.06, 1.08]. Therefore, worse mental health is associated with greater odds of experiencing frequent physical distress. Individuals who reported being depressed have 1.65 times the odds of frequent physical distress in comparison to those who reported not being depressed, holding all other variables constant. Indicating that depression is associated with greater odds of frequent physical distress. In comparison to males, females have 0.40 times the odds of frequent mental distress, holding all other variables constant. This indicates that females have greater odds of frequent mental distress compared to males.
1.4 Likelihood Ratio Test
mod_reduced <- glm(frequent_distress ~ menthlth_days + depression + sex + exercise + age_group + income,
family= binomial,
data= brfss_analytic)
lr_test <- anova(mod_reduced, mod_logistic, test = "Chisq")
lr_test |>
kable(digits = 3, caption = "Likelihood Ratio Test: Full Model vs. Reduced") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Resid. Df | Resid. Dev | Df | Deviance | Pr(>Chi) |
|---|---|---|---|---|
| 7989 | 5124.252 | NA | NA | NA |
| 7986 | 5110.171 | 3 | 14.081 | 0.003 |
H0: education isn’t associated with the frequent physical distress and doesnt improve the model H1: At least one education category is associated with frequent mental distress and improves the model fit
Residual deviance: 14.081 df: 3 p-value: 0.003
At a= 0.05, we reject the null hypothesis, so this group of predictors (education) significantly improves the model
2.5 ROC Curve and AUC
roc_obj <- roc(brfss_analytic$frequent_distress, fitted(mod_logistic))
plot(roc_obj, main = "ROC Curve: Frequent Physical Distress Model",
print.auc = TRUE)## Area under the curve: 0.7877
## 95% CI: 0.7725-0.8028 (DeLong)
AUC= 0.788 95% CI: [0.7725, 0.8028]
The AUC= 0.788, which indicates that the model has acceptable discrimination. This means the model has a decently good ability to discriminate between individuals with and without frequent physical distress.
2.6 Hosmer-Lemeshow Goodness-of-Fit Test
##
## Hosmer and Lemeshow goodness of fit (GOF) test
##
## data: mod_logistic$y, fitted(mod_logistic)
## X-squared = 12.767, df = 8, p-value = 0.1201
H0: The model fits the data H1: The model does not fit the data well
t-statistic: 12.77 df: 8 p-value: 0.12
At alpha= 0.05, we fail to reject the null hypothesis, therefore there is no evidence of poor model fit. The Hosmer and Lemeshow goodness of fit test complements to ROC/AUC findings because it suggest that the model fits the data well as well as has the acceptable ability to distinguish between groups.
The results from this analysis suggest that the demographic and behavioral factors are associated with physical health burden among U.S. adults. Mentally unhealthy days and depression were the strongest predictors in both the linear and logistic models. Indicating that those with frequent mentally unhealthy days and depression have worse physically unhealthy days. This is cross-sectional survey data, so causation cannot be assumed due to correlation between predictors and the outcome because data was measured at the same time. Also, there are other potential confounders that were not measured in these models such as chronic conditions (other comorbidites) and behaviors such as smoking and alcohol consumption.
The linear regression model and logistic regression model give different information about the same research question. The linear regression model examined the effect predictors had on the continuous physical health outcome. The logistic regression model estimates how the predictors influenced the odds of frequent mental distress (binary outcome). Linear regression provide R^2 and the adjusted R^2 to examine how much variance a predictor provides in physically unhealthy days. Logistic regression allows for odds ratio and adjusted OR to be calculated and interpreted as well as AUC and calibration tests that examine discrimination and model fit. Linear regression is preferred when the outcome is continuous and logistic is preferred when it is binary.
During this assignment I sought AI assistance a few times to help with errors i was receiving when I tried to run my code. In every case it was a missing comma or misspelling that I couldn’t identify. I told AI i was using R studio and I was receivng an error, I would then input a copy of the error code and AI helped me with what R studio was having an issue with unexpected symbols that I did not realize were a result of a missing comma. Another issue I had was when I was doing the logistic regression. I was trying to use physhlth_days and family=binomial at the same time and it helped me realize I wasn’t using the correct variable because it wasn’t binary and frequent_distress was.