Research Question: What individual, behavioral, and health factors are associated with physical health burden among U.S. adults? How do findings compare when the outcome is modeled as the number of 2 physically unhealthy days (continuous) versus as an indicator of frequent physical distress (binary: 14 or more days in the past 30)?

Part 0: Data Preparation (10 points)

Packages and Import

# Load required packages
library(tidyverse)
library(haven)
library(kableExtra)
library(broom)
library(car)
library(gtsummary)
library(ggeffects)
library(leaps)
library(pROC)
library(ResourceSelection)
# Upload BRFSS data
brfss_raw <- read_xpt("~/Desktop/EPI553/data/LLCP2023.XPT ")

# Look at data
glimpse(brfss_raw)
## Rows: 433,323
## Columns: 350
## $ `_STATE`   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ FMONTH     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ IDATE      <chr> "03012023", "01062023", "03082023", "03062023", "01062023",…
## $ IMONTH     <chr> "03", "01", "03", "03", "01", "01", "03", "01", "03", "01",…
## $ IDAY       <chr> "01", "06", "08", "06", "06", "09", "21", "06", "15", "08",…
## $ IYEAR      <chr> "2023", "2023", "2023", "2023", "2023", "2023", "2023", "20…
## $ DISPCODE   <dbl> 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100,…
## $ SEQNO      <chr> "2023000001", "2023000002", "2023000003", "2023000004", "20…
## $ `_PSU`     <dbl> 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.02…
## $ CTELENM1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ PVTRESD1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ COLGHOUS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STATERE1   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ CELPHON1   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ LADULT1    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ NUMADULT   <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1,…
## $ RESPSLC1   <dbl> 1, NA, NA, 1, NA, 1, NA, NA, NA, 2, 1, 1, 1, 1, 1, NA, NA, …
## $ LANDSEX2   <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ LNDSXBRT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SAFETIME   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CTELNUM1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELLFON5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CADULT1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELLSEX2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CELSXBRT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PVTRESD3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CCLGHOUS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSTATE1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LANDLINE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HHADULT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SEXVAR     <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ GENHLTH    <dbl> 2, 2, 4, 2, 4, 3, 4, 4, 3, 3, 3, 2, 2, 3, 3, 4, 5, 3, 3, 3,…
## $ PHYSHLTH   <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30,…
## $ MENTHLTH   <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ POORHLTH   <dbl> NA, NA, 1, 88, NA, 88, 88, 1, 88, NA, NA, 88, NA, NA, NA, 1…
## $ PRIMINS1   <dbl> 3, 3, 3, 3, 3, 7, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ PERSDOC3   <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,…
## $ MEDCOST1   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHECKUP1   <dbl> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ EXERANY2   <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1,…
## $ EXRACT12   <dbl> NA, 1, 1, 1, 1, 1, NA, NA, NA, 1, 1, NA, 1, NA, NA, 1, NA, …
## $ EXEROFT1   <dbl> NA, 106, 205, 103, 102, 212, NA, NA, NA, 230, 105, NA, 105,…
## $ EXERHMM1   <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA,…
## $ EXRACT22   <dbl> NA, 88, 88, 88, 8, 88, NA, NA, NA, 88, 77, NA, 3, NA, NA, 8…
## $ EXEROFT2   <dbl> NA, NA, NA, NA, 107, NA, NA, NA, NA, NA, NA, NA, 105, NA, N…
## $ EXERHMM2   <dbl> NA, NA, NA, NA, 100, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA…
## $ STRENGTH   <dbl> 888, 888, 205, 888, 888, 203, 777, 888, 888, 888, 105, 888,…
## $ BPHIGH6    <dbl> 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1,…
## $ BPMEDS1    <dbl> 1, 2, 1, NA, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, 1, 1, 1, 1…
## $ CHOLCHK3   <dbl> 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,…
## $ TOLDHI3    <dbl> 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ CHOLMED3   <dbl> 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1,…
## $ CVDINFR4   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CVDCRHD4   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CVDSTRK3   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2,…
## $ ASTHMA3    <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ ASTHNOW    <dbl> NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CHCSCNC1   <dbl> 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHCOCNC1   <dbl> 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ CHCCOPD3   <dbl> 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2,…
## $ ADDEPEV3   <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ CHCKDNY2   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 7, 1, 2, 2,…
## $ HAVARTH4   <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ DIABETE4   <dbl> 1, 3, 3, 3, 1, 3, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1,…
## $ DIABAGE4   <dbl> 57, NA, NA, NA, 68, NA, NA, 98, NA, 60, NA, NA, NA, NA, NA,…
## $ MARITAL    <dbl> 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 3, 3,…
## $ EDUCA      <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4,…
## $ RENTHOM1   <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ NUMHHOL4   <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1,…
## $ NUMPHON4   <dbl> NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ CPDEMO1C   <dbl> 1, 1, 8, 1, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ VETERAN3   <dbl> 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2,…
## $ EMPLOY1    <dbl> 7, 7, 7, 7, 8, 7, 8, 7, 7, 7, 4, 1, 7, 7, 7, 5, 7, 7, 7, 7,…
## $ CHILDREN   <dbl> 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88,…
## $ INCOME3    <dbl> 99, 99, 2, 99, 7, 7, 6, 99, 6, 7, 7, 99, 9, 9, 9, 3, 77, 99…
## $ PREGNANT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WEIGHT2    <dbl> 172, 132, 130, 170, 170, 165, 180, 135, 195, 160, 300, 145,…
## $ HEIGHT3    <dbl> 503, 409, 504, 506, 508, 502, 600, 411, 504, 510, 511, 505,…
## $ DEAF       <dbl> 2, 1, 7, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,…
## $ BLIND      <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ DECIDE     <dbl> 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 7,…
## $ DIFFWALK   <dbl> 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1,…
## $ DIFFDRES   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2,…
## $ DIFFALON   <dbl> 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ FALL12MN   <dbl> 88, 88, 88, 88, 3, 88, 8, 88, 88, 88, 88, 88, 88, 1, 88, 1,…
## $ FALLINJ5   <dbl> NA, NA, NA, NA, 88, NA, 88, NA, NA, NA, NA, NA, NA, 1, NA, …
## $ SMOKE100   <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1,…
## $ SMOKDAY2   <dbl> NA, NA, 3, NA, NA, NA, 3, NA, NA, 3, NA, NA, 3, 3, 3, NA, N…
## $ USENOW3    <dbl> 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3,…
## $ ECIGNOW2   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 1, 4, 1,…
## $ ALCDAY4    <dbl> 888, 888, 888, 888, 202, 205, 888, 888, 888, 888, 888, 888,…
## $ AVEDRNK3   <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 2, NA, NA,…
## $ DRNK3GE5   <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA,…
## $ MAXDRNKS   <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 3, NA, NA,…
## $ FLUSHOT7   <dbl> 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1,…
## $ FLSHTMY3   <dbl> NA, 92023, 112022, 102022, NA, NA, 32023, 102023, 777777, 1…
## $ PNEUVAC4   <dbl> 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 1,…
## $ SHINGLE2   <dbl> 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2,…
## $ HIVTST7    <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2,…
## $ HIVTSTD3   <dbl> NA, NA, NA, 777777, NA, 777777, 92022, NA, NA, NA, NA, NA, …
## $ SEATBELT   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DRNKDRI2   <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA,…
## $ COVIDPO1   <dbl> 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 1,…
## $ COVIDSM1   <dbl> NA, NA, NA, NA, NA, 2, NA, 2, 2, NA, 1, 2, NA, 2, 2, NA, NA…
## $ COVIDACT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, …
## $ PDIABTS1   <dbl> NA, 2, 1, 1, NA, 1, 1, NA, 1, NA, 1, 1, 1, 1, 1, 1, 7, 1, 8…
## $ PREDIAB2   <dbl> NA, 3, 3, 3, NA, 3, 3, NA, 1, NA, 3, 3, 1, 3, 3, 3, 3, 3, 3…
## $ DIABTYPE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INSULIN1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CHKHEMO3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ EYEEXAM1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DIABEYE1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ DIABEDU1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FEETSORE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHEXER   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHEDU    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LMTJOIN3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ARTHDIS2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ JOINPAI2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSFIRST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSLAST    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSNUMCG   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSCTSC1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSSCNCR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LCSCTWHN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADMAM     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HOWLONG    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CERVSCRN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLCNC   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLPAP   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRVCLHPV   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADHYST2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSATEST1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSATIME1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PCPSARS2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PSASUGS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ PCSTALK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HADSIGM4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNSIGM   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNTES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIGMTES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LASTSIG4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COLNCNCR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ VIRCOLO1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ VCLNTES2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SMALSTOL   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STOLTEST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STOOLDN2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BLDSTFIT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SDNATES1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRDIFF   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRAGE    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CNCRTYP2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVTRT3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVDOC1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVSUM    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVRTRN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVINST   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVINSR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVDEIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVCLIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVPAIN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CSRVCTL2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ INDORTAN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ NUMBURN3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SUNPRTCT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WKDAYOUT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WKENDOUT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CIMEMLO1   <dbl> 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, NA, 2, 2, 2, 1, 2, 2…
## $ CDWORRY    <dbl> NA, NA, 2, NA, 1, 1, NA, NA, NA, NA, NA, NA, 7, NA, NA, NA,…
## $ CDDISCU1   <dbl> NA, NA, 1, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CDHOUS1    <dbl> NA, NA, 2, NA, 2, 1, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CDSOCIA1   <dbl> NA, NA, 2, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ CAREGIV1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVREL4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVLNG1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVHRS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVPRB3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVALZD   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVPER1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVHOU1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CRGVEXPT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LASTSMK2   <dbl> NA, NA, 7, NA, NA, NA, 7, NA, NA, 7, NA, NA, 7, NA, 7, NA, …
## $ STOPSMK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MENTCIGS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MENTECIG   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HEATTBCO   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FIREARM5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ GUNLOAD    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LOADULK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP1    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP2    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP3    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP4    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP5    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HASYMP6    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP5   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STRSYMP6   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FIRSTAID   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ASPIRIN    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BIRTHSEX   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SOMALE     <dbl> NA, NA, NA, NA, NA, NA, 2, NA, NA, 2, 2, NA, NA, 2, NA, NA,…
## $ SOFEMALE   <dbl> 2, 2, 2, 2, 2, 2, NA, 2, 2, NA, NA, 2, 2, NA, 2, 2, 2, NA, …
## $ TRNSGNDR   <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ MARIJAN1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJSMOK   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJEAT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJVAPE   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJDAB    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ MARJOTHR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ USEMRJN4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDEPRS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDRINK   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDRUGS   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEPRISN   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEDIVRC   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEPUNCH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEHURT1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACESWEAR   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACETOUCH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACETTHEM   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEHVSEX   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEADSAF   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ACEADNED   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ IMFVPLA4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HPVADVC4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ HPVADSHT   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TETANUS1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVIDVA1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVACGE1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COVIDNU2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LSATISFY   <dbl> 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, NA, 1, 2, 3, 1, 4, 2…
## $ EMTSUPRT   <dbl> 1, 2, 4, 1, 2, 2, 1, 1, 1, 3, 1, 1, 2, NA, 1, 1, 2, 1, 4, 2…
## $ SDLONELY   <dbl> 5, 3, 3, 3, 2, 3, 4, 5, 3, 3, 4, 5, 4, NA, 4, 3, 4, 5, 1, 2…
## $ SDHEMPLY   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ FOODSTMP   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHFOOD1   <dbl> 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 5, NA, 5, 5, 5, 5, 5, 5…
## $ SDHBILLS   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHUTILS   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHTRNSP   <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2…
## $ SDHSTRE1   <dbl> 5, 5, 3, 5, 2, 3, 5, 5, 4, 4, 4, 4, 4, NA, 4, 5, 4, 5, 4, 4…
## $ RRCLASS3   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRCOGNT2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRTREAT    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRATWRK2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRHCARE4   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RRPHYSM2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSGEND1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSXBRTH   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RCSRLTN2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CASTHDX2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CASTHNO2   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ QSTVER     <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ QSTLANG    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_METSTAT` <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2,…
## $ `_URBSTAT` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ MSCODE     <dbl> 1, 1, 3, 2, 5, 1, 2, 5, 3, 1, 1, 1, 1, 5, 1, 2, 5, 5, 5, 1,…
## $ `_STSTR`   <dbl> 11011, 11012, 11011, 11011, 11011, 11011, 11011, 11012, 110…
## $ `_STRWT`   <dbl> 42.79158, 170.42939, 42.79158, 42.79158, 42.79158, 42.79158…
## $ `_RAWRAKE` <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1,…
## $ `_WT2RAKE` <dbl> 85.58316, 170.42939, 42.79158, 85.58316, 42.79158, 85.58316…
## $ `_IMPRACE` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_CHISPNC` <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,…
## $ `_CRACE1`  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CAGEG      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_CLLCPWT` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ `_DUALUSE` <dbl> 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DUALCOR` <dbl> 0.5026388, 0.5026388, NA, 0.5026388, 0.5026388, 0.5026388, …
## $ `_LLCPWT2` <dbl> 941.1640, 1874.2239, 1151.6032, 941.1640, 470.5820, 941.164…
## $ `_LLCPWT`  <dbl> 605.4279, 1121.9927, 600.9633, 605.4279, 281.7110, 1060.335…
## $ `_RFHLTH`  <dbl> 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1,…
## $ `_PHYS14D` <dbl> 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 3, 3, 1, 1,…
## $ `_MENT14D` <dbl> 1, 1, 2, 1, 1, 2, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1,…
## $ `_HLTHPL1` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_HCVU653` <dbl> 9, 9, 9, 9, 9, 1, 9, 9, 9, 9, 1, 9, 9, 9, 9, 9, 1, 9, 9, 9,…
## $ `_TOTINDA` <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1,…
## $ METVL12_   <dbl> NA, 35, 35, 35, 35, 35, NA, NA, NA, 35, 35, NA, 35, NA, NA,…
## $ METVL22_   <dbl> NA, NA, NA, NA, 33, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA,…
## $ MAXVO21_   <dbl> 1840, 1803, 1322, 1914, 1988, 2506, 1545, 1914, 1655, 1875,…
## $ FC601_     <dbl> 315, 309, 227, 328, 341, 430, 265, 328, 284, 321, 472, 328,…
## $ ACTIN13_   <dbl> NA, 2, 2, 2, 2, 1, NA, NA, NA, 2, 1, NA, 1, NA, NA, 2, NA, …
## $ ACTIN23_   <dbl> NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, N…
## $ PADUR1_    <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA,…
## $ PADUR2_    <dbl> NA, NA, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA,…
## $ PAFREQ1_   <dbl> NA, 6000, 1167, 3000, 2000, 2800, NA, NA, NA, 7000, 5000, N…
## $ PAFREQ2_   <dbl> NA, NA, NA, NA, 7000, NA, NA, NA, NA, NA, NA, NA, 5000, NA,…
## $ `_MINAC12` <dbl> NA, 180, 18, 90, 90, 126, NA, NA, NA, 210, 150, NA, 225, NA…
## $ `_MINAC22` <dbl> NA, 0, 0, 0, 420, 0, NA, NA, NA, 0, NA, NA, 225, NA, NA, 0,…
## $ STRFREQ_   <dbl> 0, 0, 1167, 0, 0, 700, NA, 0, 0, 0, 5000, 0, 0, 2000, 2000,…
## $ PAMISS3_   <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 9, 0, 0, 1,…
## $ PAMIN13_   <dbl> NA, 360, 36, 180, 180, 126, NA, NA, NA, 420, 150, NA, 225, …
## $ PAMIN23_   <dbl> NA, NA, NA, NA, 420, NA, NA, NA, NA, NA, NA, NA, 450, NA, N…
## $ PA3MIN_    <dbl> NA, 360, 36, 180, 600, 126, NA, NA, NA, 420, 150, NA, 675, …
## $ PAVIG13_   <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 0, NA, NA, …
## $ PAVIG23_   <dbl> NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 225, NA, NA,…
## $ PA3VIGM_   <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 225, NA, NA…
## $ `_PACAT3`  <dbl> 4, 1, 9, 9, 1, 9, 4, 4, 4, 1, 9, 4, 1, 4, 4, 9, 9, 4, 4, 9,…
## $ `_PAINDX3` <dbl> 2, 1, 9, 1, 1, 9, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1,…
## $ `_PA150R4` <dbl> 3, 1, 9, 1, 1, 9, 3, 3, 3, 1, 1, 3, 1, 3, 3, 1, 9, 3, 3, 1,…
## $ `_PA300R4` <dbl> 3, 1, 9, 9, 1, 9, 3, 3, 3, 1, 9, 3, 1, 3, 3, 9, 9, 3, 3, 9,…
## $ `_PA30023` <dbl> 2, 1, 9, 9, 1, 9, 2, 2, 2, 1, 9, 2, 1, 2, 2, 9, 9, 2, 2, 9,…
## $ `_PASTRNG` <dbl> 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2,…
## $ `_PAREC3`  <dbl> 4, 2, 9, 2, 2, 9, 9, 4, 4, 2, 1, 4, 2, 3, 3, 2, 9, 4, 4, 2,…
## $ `_PASTAE3` <dbl> 2, 2, 9, 2, 2, 9, 9, 2, 2, 2, 1, 2, 2, 2, 2, 2, 9, 2, 2, 2,…
## $ `_RFHYPE6` <dbl> 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2,…
## $ `_CHOLCH3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_RFCHOL3` <dbl> 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2,…
## $ `_MICHD`   <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_LTASTH1` <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1,…
## $ `_CASTHM1` <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 1, 1, 1,…
## $ `_ASTHMS1` <dbl> 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 9, 3, 3, 3,…
## $ `_DRDXAR2` <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1,…
## $ `_MRACE1`  <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_HISPANC` <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_RACE`    <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEG21` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEGR3` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_RACEPRV` <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1,…
## $ `_SEX`     <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ `_AGEG5YR` <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 1…
## $ `_AGE65YR` <dbl> 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2,…
## $ `_AGE80`   <dbl> 80, 80, 80, 78, 76, 62, 80, 78, 80, 75, 59, 78, 68, 79, 79,…
## $ `_AGE_G`   <dbl> 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 5, 6, 6, 6,…
## $ HTIN4      <dbl> 63, 57, 64, 66, 68, 62, 72, 59, 64, 70, 71, 65, 68, 73, 62,…
## $ HTM4       <dbl> 160, 145, 163, 168, 173, 157, 183, 150, 163, 178, 180, 165,…
## $ WTKG3      <dbl> 7802, 5987, 5897, 7711, 7711, 7484, 8165, 6123, 8845, 7257,…
## $ `_BMI5`    <dbl> 3047, 2856, 2231, 2744, 2585, 3018, 2441, 2727, 3347, 2296,…
## $ `_BMI5CAT` <dbl> 4, 3, 2, 3, 3, 4, 2, 3, 4, 2, 4, 2, 3, 2, 3, 3, 4, 3, 3, 3,…
## $ `_RFBMI5`  <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2,…
## $ `_CHLDCNT` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_EDUCAG`  <dbl> 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 2, 4, 4, 3, 2,…
## $ `_INCOMG1` <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4,…
## $ `_SMOKER3` <dbl> 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3, 3, 4, 4, 4, 1, 3,…
## $ `_RFSMOK3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1,…
## $ `_CURECI2` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ DRNKANY6   <dbl> 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1,…
## $ DROCDY4_   <dbl> 0, 0, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 29, 100, 0, 0, 0, 0, 3…
## $ `_RFBING6` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DRNKWK2` <dbl> 0, 0, 0, 0, 47, 233, 0, 0, 0, 0, 0, 0, 200, 1400, 0, 0, 0, …
## $ `_RFDRHV8` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_FLSHOT7` <dbl> 2, 1, 1, 1, 2, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 2, NA, 1, 2,…
## $ `_PNEUMO3` <dbl> 2, 1, 1, 1, 1, NA, 1, 1, 1, 1, NA, 2, 1, 1, 2, 1, NA, 2, 1,…
## $ `_AIDTST4` <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2,…
## $ `_RFSEAT2` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_RFSEAT3` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ `_DRNKDRV` <dbl> 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 2, 2,…

There are 433,323 rows and 350 columns in the raw dataset

Select variables

# New dataset with variables of interest (9 variables)
brfss_new <- brfss_raw %>%
  select(PHYSHLTH, MENTHLTH, SEXVAR, EXERANY2, ADDEPEV3, '_AGEG5YR', '_INCOMG1', EDUCA, MARITAL)

#check new data 
glimpse(brfss_new)
## Rows: 433,323
## Columns: 9
## $ PHYSHLTH   <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30,…
## $ MENTHLTH   <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ SEXVAR     <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1,…
## $ EXERANY2   <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1,…
## $ ADDEPEV3   <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ `_AGEG5YR` <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 1…
## $ `_INCOMG1` <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4,…
## $ EDUCA      <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4,…
## $ MARITAL    <dbl> 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 3, 3,…
#save as a working data frame
saveRDS(brfss_new, "brfss_new.rds")

My dataset includes physical health days, mental health days, sex, exercise, depression, age, income, education, and marital status. I chose to include marital status and depression because previous assignments did not include them and I am curious about their affect on physical health days.

Recode Variables*

# Recode variables
brfss_clean <- brfss_new %>%
  mutate(
    
    # mentally unhealthy days
    menthlth_days = case_when(
      MENTHLTH == 88 ~ 0,
      MENTHLTH %in% 77:99 ~ NA_real_,
      MENTHLTH >= 1 & MENTHLTH <= 30 ~ as.numeric(MENTHLTH),
      TRUE ~ NA_real_
    ),
    
     # physically unhealthy days 
    physhlth_days = case_when(
      PHYSHLTH == 88 ~ 0,
      PHYSHLTH %in% 77:99 ~ NA_real_,
      PHYSHLTH >= 1 & PHYSHLTH <= 30 ~ as.numeric(PHYSHLTH),
      TRUE ~ NA_real_
    ),
    
    # sex
    sex = factor(SEXVAR,
                 levels = c(1, 2),
                 labels = c("Male", "Female")),
    
     # exercise 
    exercise = factor(case_when(
      EXERANY2 == 1 ~ "Yes",
      EXERANY2 == 2 ~ "No",
      TRUE          ~ NA_character_
    ), levels = c("No", "Yes")),
    
    # depression 
    depression = factor(case_when(
      ADDEPEV3 == 1 ~ "Yes",
      ADDEPEV3 == 2 ~ "No",
      TRUE          ~ NA_character_
    ), levels = c("No", "Yes")),
    
    # age groups 
    age_group=factor(case_when(
      `_AGEG5YR` %in% c(1,2,3) ~ "18-34",
      `_AGEG5YR` %in% c(4,5,6) ~ "35-49",
      `_AGEG5YR` %in% c(7,8,9) ~ "50-64",
      `_AGEG5YR` %in% c(10,11,12,13) ~ "65+",
      `_AGEG5YR` == 14 ~ NA_character_,
      TRUE ~ NA_character_
    )),
    
    # income
    income= factor(case_when(
      `_INCOMG1` %in% c(1,2) ~ "Less than $25K",
      `_INCOMG1` %in% c(3,4) ~ "$25K-$49K",
      `_INCOMG1` %in% c(5,6) ~ "$50K- $99K",
      `_INCOMG1` == 7 ~ "$100K+",
      `_INCOMG1` == 9 ~ NA_character_,
      TRUE ~ NA_character_
    )),

    # education 
    education = factor(case_when(
      EDUCA %in% c(1,2,3) ~ "Less than HS",
      EDUCA == 4 ~ "HS/GED",
      EDUCA == 5 ~ "Some college",
      EDUCA == 6 ~ "College graduate",
      EDUCA == 9 ~ NA_character_,
      TRUE ~ NA_character_
    )), 

     # marital status
    marital = factor(case_when(
      MARITAL %in% c(1,6) ~ "Married/Partnered",
      MARITAL %in% c(2,3,4) ~ "Previously married",
      MARITAL == 5 ~ "Never married",
      MARITAL == 9 ~ NA_character_,
      TRUE ~ NA_character_
    ))
    
    )

Set references

brfss_clean <- brfss_clean %>%
  mutate(
    sex = relevel(sex, ref = "Male"),
    exercise = relevel(exercise, ref = "No"),
    depression = relevel(depression, ref = "No"),
    age_group = relevel(age_group, ref= "18-34"),
    income = relevel(income, ref = "Less than $25K"),
    education = relevel(education, ref = "Less than HS"),
    marital = relevel(marital, ref = "Married/Partnered")
  )

Missingness

# Report missing values for the key variables
cat("Total N:", nrow(brfss_clean), "\n\n")
## Total N: 433323
miss_info <- function(x) {
  n <- sum(is.na(x))
  pct <- round(100 * n / length(x), 2)
  paste0(n, " (", pct, "%)")
}

cat("Missing mentally unhealthy days:", miss_info(brfss_clean$menthlth_days), "\n")
## Missing mentally unhealthy days: 8108 (1.87%)
cat("Missing physically unhealthy days:", miss_info(brfss_clean$physhlth_days), "\n")
## Missing physically unhealthy days: 10785 (2.49%)
cat("Missing sex:", miss_info(brfss_clean$sex), "\n")
## Missing sex: 0 (0%)
cat("Missing exercise:", miss_info(brfss_clean$exercise), "\n")
## Missing exercise: 1251 (0.29%)
cat("Missing depression:", miss_info(brfss_clean$depression), "\n")
## Missing depression: 2587 (0.6%)
cat("Missing age:", miss_info(brfss_clean$age_group), "\n")
## Missing age: 7779 (1.8%)
cat("Missing income:", miss_info(brfss_clean$income), "\n")
## Missing income: 86623 (19.99%)
cat("Missing education:", miss_info(brfss_clean$education), "\n")
## Missing education: 2325 (0.54%)
cat("Missing marital:", miss_info(brfss_clean$marital), "\n")
## Missing marital: 4289 (0.99%)

There were 10,785 missing observations for physically unhealthy days (2.49%). There was 2,587 missing observations for depression (0.6%) and 4289 missing for marital status (0.99%)

Drop Missing Observations

brfss_final <- brfss_clean %>%
  drop_na(menthlth_days, physhlth_days, sex, exercise, depression, age_group, income, education, marital)

Random Sample

set.seed(1220)
brfss_analytic <- brfss_final |>
drop_na(menthlth_days, physhlth_days, sex, exercise, depression, age_group, income, education, marital) |>
slice_sample(n = 8000)

summary(brfss_analytic)
##     PHYSHLTH        MENTHLTH         SEXVAR         EXERANY2    
##  Min.   : 1.00   Min.   : 1.00   Min.   :1.000   Min.   :1.000  
##  1st Qu.:10.00   1st Qu.:10.00   1st Qu.:1.000   1st Qu.:1.000  
##  Median :88.00   Median :88.00   Median :2.000   Median :1.000  
##  Mean   :57.77   Mean   :56.98   Mean   :1.512   Mean   :1.227  
##  3rd Qu.:88.00   3rd Qu.:88.00   3rd Qu.:2.000   3rd Qu.:1.000  
##  Max.   :88.00   Max.   :88.00   Max.   :2.000   Max.   :2.000  
##     ADDEPEV3        _AGEG5YR         _INCOMG1         EDUCA      
##  Min.   :1.000   Min.   : 1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:2.000   1st Qu.: 5.000   1st Qu.:4.000   1st Qu.:4.000  
##  Median :2.000   Median : 8.000   Median :5.000   Median :5.000  
##  Mean   :1.796   Mean   : 7.617   Mean   :4.579   Mean   :5.105  
##  3rd Qu.:2.000   3rd Qu.:11.000   3rd Qu.:6.000   3rd Qu.:6.000  
##  Max.   :2.000   Max.   :13.000   Max.   :7.000   Max.   :6.000  
##     MARITAL      menthlth_days    physhlth_days        sex       exercise  
##  Min.   :1.000   Min.   : 0.000   Min.   : 0.000   Male  :3908   No :1819  
##  1st Qu.:1.000   1st Qu.: 0.000   1st Qu.: 0.000   Female:4092   Yes:6181  
##  Median :1.000   Median : 0.000   Median : 0.000                           
##  Mean   :2.282   Mean   : 4.221   Mean   : 4.292                           
##  3rd Qu.:3.000   3rd Qu.: 4.000   3rd Qu.: 3.000                           
##  Max.   :6.000   Max.   :30.000   Max.   :30.000                           
##  depression age_group               income                education   
##  No :6369   18-34:1305   Less than $25K:1066   Less than HS    : 384  
##  Yes:1631   35-49:1680   $100K+        : 640   College graduate:3628  
##             50-64:2161   $25K-$49K     :1967   HS/GED          :1879  
##             65+  :2854   $50K- $99K    :4327   Some college    :2109  
##                                                                       
##                                                                       
##                marital    
##  Married/Partnered :4675  
##  Never married     :1376  
##  Previously married:1949  
##                           
##                           
## 

Descriptive summary table

brfss_analytic %>%
  select(menthlth_days, physhlth_days, income, sex, exercise, age_group, education, marital, depression) %>%
  tbl_summary(
    label = list(
      menthlth_days ~ "Mentally unhealthy days (past 30)",
      physhlth_days ~ "Physically unhealthy days (past 30)",
      depression ~ "Ever told depressive disorder",
      sex ~ "Sex",
      exercise ~ "Any physical activity (past 30 days)",
      age_group ~ "Age group in 5 year categories",
      income ~ "Annual household income",
      education ~ "Highest level of education completed",
      marital ~ "Marital Status"
    ),
    statistic = list(
      all_continuous()  ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 1,
    missing = "no"
  ) %>%
  add_n() %>%
  # add_overall() %>%
  bold_labels() %>%
  italicize_levels() %>%
  modify_caption("**Table 1. Descriptive Statistics — BRFSS 2023 Analytic Sample (n = 8,000)**") %>%
  as_flex_table()
**Table 1. Descriptive Statistics — BRFSS 2023 Analytic Sample (n = 8,000)**

Characteristic

N

N = 8,0001

Mentally unhealthy days (past 30)

8,000

4.2 (8.1)

Physically unhealthy days (past 30)

8,000

4.3 (8.6)

Annual household income

8,000

Less than $25K

1,066 (13%)

$100K+

640 (8.0%)

$25K-$49K

1,967 (25%)

$50K- $99K

4,327 (54%)

Sex

8,000

Male

3,908 (49%)

Female

4,092 (51%)

Any physical activity (past 30 days)

8,000

6,181 (77%)

Age group in 5 year categories

8,000

18-34

1,305 (16%)

35-49

1,680 (21%)

50-64

2,161 (27%)

65+

2,854 (36%)

Highest level of education completed

8,000

Less than HS

384 (4.8%)

College graduate

3,628 (45%)

HS/GED

1,879 (23%)

Some college

2,109 (26%)

Marital Status

8,000

Married/Partnered

4,675 (58%)

Never married

1,376 (17%)

Previously married

1,949 (24%)

Ever told depressive disorder

8,000

1,631 (20%)

1Mean (SD); n (%)

Final analytic sample size= 8000

Part 1: Model Selection for Linear Regression (35 Points)

1.1 Fit the Maximum Model

model <- lm(physhlth_days ~ menthlth_days + depression + sex + exercise + age_group + income + education + marital, 
            data= brfss_analytic
            )
tidy(model, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Maximum Model: All Candidate Predictors",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Maximum Model: All Candidate Predictors
Term Estimate SE t p-value CI Lower CI Upper
(Intercept) 6.5427 0.5343 12.2456 0.0000 5.4954 7.5901
menthlth_days 0.2977 0.0120 24.8162 0.0000 0.2741 0.3212
depressionYes 1.7365 0.2375 7.3115 0.0000 1.2709 2.2021
sexFemale -0.5121 0.1777 -2.8810 0.0040 -0.8605 -0.1637
exerciseYes -3.5096 0.2169 -16.1799 0.0000 -3.9348 -3.0844
age_group35-49 1.3398 0.2997 4.4704 0.0000 0.7523 1.9273
age_group50-64 2.0828 0.2946 7.0700 0.0000 1.5053 2.6603
age_group65+ 3.2354 0.2958 10.9392 0.0000 2.6556 3.8151
income$100K+ -3.5158 0.4364 -8.0557 0.0000 -4.3713 -2.6603
income$25K-$49K -2.7422 0.3023 -9.0706 0.0000 -3.3349 -2.1496
income$50K- $99K -3.4587 0.3022 -11.4454 0.0000 -4.0511 -2.8663
educationCollege graduate -0.1959 0.4402 -0.4451 0.6563 -1.0588 0.6669
educationHS/GED -0.1456 0.4395 -0.3313 0.7404 -1.0071 0.7159
educationSome college 0.4655 0.4413 1.0549 0.2915 -0.3995 1.3304
maritalNever married -0.4080 0.2696 -1.5136 0.1302 -0.9365 0.1204
maritalPreviously married 0.0460 0.2275 0.2022 0.8398 -0.4000 0.4920
glance(model) |>
  dplyr::select(r.squared, adj.r.squared, sigma, AIC, BIC, df.residual) |>
  mutate(across(everything(), \(x) round(x, 3))) |>
  kable(caption = "Linear Regression Model Fit Statistics") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Linear Regression Model Fit Statistics
r.squared adj.r.squared sigma AIC BIC df.residual
0.199 0.197 7.727 55436.52 55555.31 7984

R-squared= 0.199 Adjusted R-squared= 0.197 AIC= 55,436.52 BIC= 55,555.31

The model explains approximately 19.90% of the variance in physically unhealthy days. The adjusted R-squared is 0.197 which is very similar to the R-squared value, indicating the predictors in the model are all contributing meaningfully. The AIC and BIC values serve as baselines for comparing similar models but BIC penalizes complexity more heavily.

1.2 Best Subsets Regression

best_subsets <- regsubsets(
  physhlth_days ~ menthlth_days + depression + sex + exercise + age_group +   income + education + marital,
  data = brfss_analytic,
  nvmax = 15,
  method = "exhaustive"
)

best_summary <- summary(best_subsets)
subset_metrics <- tibble(
  p = 1:length(best_summary$adjr2),
  `Adj. R²` = best_summary$adjr2,
  BIC = best_summary$bic,
  Cp = best_summary$cp
)

p1 <- ggplot(subset_metrics, aes(x = p, y = `Adj. R²`)) +
  geom_line(linewidth = 1, color = "lightblue") +
  geom_point(size = 3, color = "lightblue") +
  geom_vline(xintercept = which.max(best_summary$adjr2),
             linetype = "dashed", color = "lightgreen") +
  labs(title = "Adjusted R² by Model Size", x = "Number of Variables", y = "Adjusted R²") +
  theme_minimal(base_size = 12)

p2 <- ggplot(subset_metrics, aes(x = p, y = BIC)) +
  geom_line(linewidth = 1, color = "lightblue") +
  geom_point(size = 3, color = "lightblue") +
  geom_vline(xintercept = which.min(best_summary$bic),
             linetype = "dashed", color = "lightgreen") +
  labs(title = "BIC by Model Size", x = "Number of Variables", y = "BIC") +
  theme_minimal(base_size = 12)

gridExtra::grid.arrange(p1, p2, ncol = 2)

cat("Best model by Adj. R²:", which.max(best_summary$adjr2), "variables\n")
## Best model by Adj. R²: 12 variables
cat("Best model by BIC:", which.min(best_summary$bic), "variables\n")
## Best model by BIC: 10 variables

The model size that maximizes adjusted R^2 includes 12 variables, while the model size that minimizes BIC is 10 variables. The two criteria do not agree on the best model, the best model by BIC favors the simpler model.

1.3 Stepwise Selection Method

mod_backward <- step(model, direction = "backward", trace = 1)
## Start:  AIC=32731.51
## physhlth_days ~ menthlth_days + depression + sex + exercise + 
##     age_group + income + education + marital
## 
##                 Df Sum of Sq    RSS   AIC
## - marital        2       158 476855 32730
## <none>                       476696 32732
## - education      3       619 477315 32736
## - sex            1       496 477192 32738
## - depression     1      3192 479888 32783
## - age_group      3      7864 484561 32856
## - income         3      7949 484646 32858
## - exercise       1     15631 492327 32988
## - menthlth_days  1     36770 513466 33324
## 
## Step:  AIC=32730.16
## physhlth_days ~ menthlth_days + depression + sex + exercise + 
##     age_group + income + education
## 
##                 Df Sum of Sq    RSS   AIC
## <none>                       476855 32730
## - education      3       636 477490 32735
## - sex            1       448 477303 32736
## - depression     1      3197 480051 32782
## - income         3      8351 485205 32863
## - age_group      3     10876 487731 32905
## - exercise       1     15628 492483 32986
## - menthlth_days  1     36747 513602 33322
tidy(mod_backward, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Backward Elimination Result (AIC-based)",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Backward Elimination Result (AIC-based)
Term Estimate SE t p-value CI Lower CI Upper
(Intercept) 6.2945 0.5020 12.5388 0.0000 5.3105 7.2786
menthlth_days 0.2974 0.0120 24.8076 0.0000 0.2739 0.3209
depressionYes 1.7359 0.2373 7.3168 0.0000 1.2709 2.2010
sexFemale -0.4839 0.1766 -2.7403 0.0062 -0.8300 -0.1377
exerciseYes -3.5081 0.2168 -16.1782 0.0000 -3.9332 -3.0830
age_group35-49 1.4775 0.2875 5.1385 0.0000 0.9138 2.0411
age_group50-64 2.2587 0.2740 8.2434 0.0000 1.7216 2.7958
age_group65+ 3.4446 0.2652 12.9893 0.0000 2.9248 3.9645
income$100K+ -3.4166 0.4204 -8.1273 0.0000 -4.2407 -2.5925
income$25K-$49K -2.7055 0.2999 -9.0206 0.0000 -3.2935 -2.1176
income$50K- $99K -3.3886 0.2885 -11.7468 0.0000 -3.9541 -2.8231
educationCollege graduate -0.2331 0.4395 -0.5305 0.5958 -1.0946 0.6283
educationHS/GED -0.1770 0.4387 -0.4035 0.6866 -1.0370 0.6830
educationSome college 0.4388 0.4407 0.9957 0.3194 -0.4251 1.3027
mod_null <- lm(physhlth_days ~ 1, data = brfss_analytic)

mod_forward <- step(mod_null,
                    scope = list(lower = mod_null, upper = model),
                    direction = "forward", trace = 1)
## Start:  AIC=34475.35
## physhlth_days ~ 1
## 
##                 Df Sum of Sq    RSS   AIC
## + menthlth_days  1     64650 530379 33557
## + exercise       1     37363 557666 33959
## + income         3     32952 562077 34026
## + depression     1     25412 569616 34128
## + education      3      9857 585171 34348
## + marital        2      8213 586816 34368
## + age_group      3      7080 587949 34386
## + sex            1       762 594267 34467
## <none>                       595029 34475
## 
## Step:  AIC=33557.21
## physhlth_days ~ menthlth_days
## 
##              Df Sum of Sq    RSS   AIC
## + exercise    1   27526.5 502853 33133
## + income      3   19479.9 510899 33264
## + age_group   3   16910.8 513468 33304
## + marital     2    6168.3 524211 33468
## + education   3    5117.1 525262 33486
## + depression  1    3733.6 526646 33503
## <none>                    530379 33557
## + sex         1      18.6 530361 33559
## 
## Step:  AIC=33132.85
## physhlth_days ~ menthlth_days + exercise
## 
##              Df Sum of Sq    RSS   AIC
## + age_group   3   11694.1 491159 32951
## + income      3   11486.1 491367 32954
## + marital     2    3668.8 499184 33078
## + depression  1    3401.5 499451 33081
## + education   3    1793.6 501059 33110
## <none>                    502853 33133
## + sex         1      11.1 502842 33135
## 
## Step:  AIC=32950.6
## physhlth_days ~ menthlth_days + exercise + age_group
## 
##              Df Sum of Sq    RSS   AIC
## + income      3   10201.3 480957 32789
## + depression  1    3826.6 487332 32890
## + education   3    1954.8 489204 32925
## + marital     2     976.2 490182 32939
## <none>                    491159 32951
## + sex         1      62.9 491096 32952
## 
## Step:  AIC=32788.69
## physhlth_days ~ menthlth_days + exercise + age_group + income
## 
##              Df Sum of Sq    RSS   AIC
## + depression  1   3020.05 477937 32740
## + education   3    670.49 480287 32784
## + sex         1    226.14 480731 32787
## <none>                    480957 32789
## + marital     2    138.22 480819 32790
## 
## Step:  AIC=32740.3
## physhlth_days ~ menthlth_days + exercise + age_group + income + 
##     depression
## 
##             Df Sum of Sq    RSS   AIC
## + sex        1    446.80 477490 32735
## + education  3    634.14 477303 32736
## <none>                   477937 32740
## + marital    2    126.28 477811 32742
## 
## Step:  AIC=32734.82
## physhlth_days ~ menthlth_days + exercise + age_group + income + 
##     depression + sex
## 
##             Df Sum of Sq    RSS   AIC
## + education  3    635.74 476855 32730
## <none>                   477490 32735
## + marital    2    175.11 477315 32736
## 
## Step:  AIC=32730.16
## physhlth_days ~ menthlth_days + exercise + age_group + income + 
##     depression + sex + education
## 
##           Df Sum of Sq    RSS   AIC
## <none>                 476855 32730
## + marital  2    158.21 476696 32732
tidy(mod_forward, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Forward Selection Result (AIC-based)",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Forward Selection Result (AIC-based)
Term Estimate SE t p-value CI Lower CI Upper
(Intercept) 6.2945 0.5020 12.5388 0.0000 5.3105 7.2786
menthlth_days 0.2974 0.0120 24.8076 0.0000 0.2739 0.3209
exerciseYes -3.5081 0.2168 -16.1782 0.0000 -3.9332 -3.0830
age_group35-49 1.4775 0.2875 5.1385 0.0000 0.9138 2.0411
age_group50-64 2.2587 0.2740 8.2434 0.0000 1.7216 2.7958
age_group65+ 3.4446 0.2652 12.9893 0.0000 2.9248 3.9645
income$100K+ -3.4166 0.4204 -8.1273 0.0000 -4.2407 -2.5925
income$25K-$49K -2.7055 0.2999 -9.0206 0.0000 -3.2935 -2.1176
income$50K- $99K -3.3886 0.2885 -11.7468 0.0000 -3.9541 -2.8231
depressionYes 1.7359 0.2373 7.3168 0.0000 1.2709 2.2010
sexFemale -0.4839 0.1766 -2.7403 0.0062 -0.8300 -0.1377
educationCollege graduate -0.2331 0.4395 -0.5305 0.5958 -1.0946 0.6283
educationHS/GED -0.1770 0.4387 -0.4035 0.6866 -1.0370 0.6830
educationSome college 0.4388 0.4407 0.9957 0.3194 -0.4251 1.3027
mod_stepwise <- step(mod_null,
                     scope = list(lower = mod_null, upper = model),
                     direction = "both", trace = 1)
## Start:  AIC=34475.35
## physhlth_days ~ 1
## 
##                 Df Sum of Sq    RSS   AIC
## + menthlth_days  1     64650 530379 33557
## + exercise       1     37363 557666 33959
## + income         3     32952 562077 34026
## + depression     1     25412 569616 34128
## + education      3      9857 585171 34348
## + marital        2      8213 586816 34368
## + age_group      3      7080 587949 34386
## + sex            1       762 594267 34467
## <none>                       595029 34475
## 
## Step:  AIC=33557.21
## physhlth_days ~ menthlth_days
## 
##                 Df Sum of Sq    RSS   AIC
## + exercise       1     27527 502853 33133
## + income         3     19480 510899 33264
## + age_group      3     16911 513468 33304
## + marital        2      6168 524211 33468
## + education      3      5117 525262 33486
## + depression     1      3734 526646 33503
## <none>                       530379 33557
## + sex            1        19 530361 33559
## - menthlth_days  1     64650 595029 34475
## 
## Step:  AIC=33132.85
## physhlth_days ~ menthlth_days + exercise
## 
##                 Df Sum of Sq    RSS   AIC
## + age_group      3     11694 491159 32951
## + income         3     11486 491367 32954
## + marital        2      3669 499184 33078
## + depression     1      3401 499451 33081
## + education      3      1794 501059 33110
## <none>                       502853 33133
## + sex            1        11 502842 33135
## - exercise       1     27527 530379 33557
## - menthlth_days  1     54813 557666 33959
## 
## Step:  AIC=32950.6
## physhlth_days ~ menthlth_days + exercise + age_group
## 
##                 Df Sum of Sq    RSS   AIC
## + income         3     10201 480957 32789
## + depression     1      3827 487332 32890
## + education      3      1955 489204 32925
## + marital        2       976 490182 32939
## <none>                       491159 32951
## + sex            1        63 491096 32952
## - age_group      3     11694 502853 33133
## - exercise       1     22310 513468 33304
## - menthlth_days  1     62634 553792 33909
## 
## Step:  AIC=32788.69
## physhlth_days ~ menthlth_days + exercise + age_group + income
## 
##                 Df Sum of Sq    RSS   AIC
## + depression     1      3020 477937 32740
## + education      3       670 480287 32784
## + sex            1       226 480731 32787
## <none>                       480957 32789
## + marital        2       138 480819 32790
## - income         3     10201 491159 32951
## - age_group      3     10409 491367 32954
## - exercise       1     16128 497085 33051
## - menthlth_days  1     53213 534171 33626
## 
## Step:  AIC=32740.3
## physhlth_days ~ menthlth_days + exercise + age_group + income + 
##     depression
## 
##                 Df Sum of Sq    RSS   AIC
## + sex            1       447 477490 32735
## + education      3       634 477303 32736
## <none>                       477937 32740
## + marital        2       126 477811 32742
## - depression     1      3020 480957 32789
## - income         3      9395 487332 32890
## - age_group      3     10790 488728 32913
## - exercise       1     16038 493975 33002
## - menthlth_days  1     37070 515007 33336
## 
## Step:  AIC=32734.82
## physhlth_days ~ menthlth_days + exercise + age_group + income + 
##     depression + sex
## 
##                 Df Sum of Sq    RSS   AIC
## + education      3       636 476855 32730
## <none>                       477490 32735
## + marital        2       175 477315 32736
## - sex            1       447 477937 32740
## - depression     1      3241 480731 32787
## - income         3      9614 487105 32888
## - age_group      3     10946 488436 32910
## - exercise       1     16134 493624 32999
## - menthlth_days  1     37286 514777 33334
## 
## Step:  AIC=32730.16
## physhlth_days ~ menthlth_days + exercise + age_group + income + 
##     depression + sex + education
## 
##                 Df Sum of Sq    RSS   AIC
## <none>                       476855 32730
## + marital        2       158 476696 32732
## - education      3       636 477490 32735
## - sex            1       448 477303 32736
## - depression     1      3197 480051 32782
## - income         3      8351 485205 32863
## - age_group      3     10876 487731 32905
## - exercise       1     15628 492483 32986
## - menthlth_days  1     36747 513602 33322
tidy(mod_stepwise, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Stepwise Selection Result (AIC-based)",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Stepwise Selection Result (AIC-based)
Term Estimate SE t p-value CI Lower CI Upper
(Intercept) 6.2945 0.5020 12.5388 0.0000 5.3105 7.2786
menthlth_days 0.2974 0.0120 24.8076 0.0000 0.2739 0.3209
exerciseYes -3.5081 0.2168 -16.1782 0.0000 -3.9332 -3.0830
age_group35-49 1.4775 0.2875 5.1385 0.0000 0.9138 2.0411
age_group50-64 2.2587 0.2740 8.2434 0.0000 1.7216 2.7958
age_group65+ 3.4446 0.2652 12.9893 0.0000 2.9248 3.9645
income$100K+ -3.4166 0.4204 -8.1273 0.0000 -4.2407 -2.5925
income$25K-$49K -2.7055 0.2999 -9.0206 0.0000 -3.2935 -2.1176
income$50K- $99K -3.3886 0.2885 -11.7468 0.0000 -3.9541 -2.8231
depressionYes 1.7359 0.2373 7.3168 0.0000 1.2709 2.2010
sexFemale -0.4839 0.1766 -2.7403 0.0062 -0.8300 -0.1377
educationCollege graduate -0.2331 0.4395 -0.5305 0.5958 -1.0946 0.6283
educationHS/GED -0.1770 0.4387 -0.4035 0.6866 -1.0370 0.6830
educationSome college 0.4388 0.4407 0.9957 0.3194 -0.4251 1.3027
method_comparison <- tribble(
  ~Method, ~`Variables selected`, ~`Adj. R²`, ~AIC, ~BIC,
  "Maximum model",
    length(coef(model)) - 1,
    round(glance(model)$adj.r.squared, 4),
    round(AIC(model), 1),
    round(BIC(model), 1),
  "Backward (AIC)",
    length(coef(mod_backward)) - 1,
    round(glance(mod_backward)$adj.r.squared, 4),
    round(AIC(mod_backward), 1),
    round(BIC(mod_backward), 1),
  "Forward (AIC)",
    length(coef(mod_forward)) - 1,
    round(glance(mod_forward)$adj.r.squared, 4),
    round(AIC(mod_forward), 1),
    round(BIC(mod_forward), 1),
  "Stepwise (AIC)",
    length(coef(mod_stepwise)) - 1,
    round(glance(mod_stepwise)$adj.r.squared, 4),
    round(AIC(mod_stepwise), 1),
    round(BIC(mod_stepwise), 1)
)

method_comparison |>
  kable(caption = "Comparison of Variable Selection Methods") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Comparison of Variable Selection Methods
Method Variables selected Adj. R² AIC BIC
Maximum model 15 0.1974 55436.5 55555.3
Backward (AIC) 13 0.1973 55435.2 55540.0
Forward (AIC) 13 0.1973 55435.2 55540.0
Stepwise (AIC) 13 0.1973 55435.2 55540.0

All 3 models agreed on the same final model with 13 variables. All three models excluded the marital variables (maritalNevermarried and maritalPreviouslymarried). The remaining variables were menthlth_days, exerciseYes, all age groups, all income groups, depressionYes, sexFemale, and the education groups.

1.4 Final Model Selection and Interpretation

I chose the stepwise model as my final model. All three methods (backward, forward, and stepwise) ended with the same 13 variables which showed an improved model fit from the original maximum model.

tidy(mod_stepwise, conf.int = TRUE)
## # A tibble: 14 × 7
##    term                estimate std.error statistic   p.value conf.low conf.high
##    <chr>                  <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
##  1 (Intercept)            6.29     0.502     12.5   9.92e- 36    5.31      7.28 
##  2 menthlth_days          0.297    0.0120    24.8   6.10e-131    0.274     0.321
##  3 exerciseYes           -3.51     0.217    -16.2   5.96e- 58   -3.93     -3.08 
##  4 age_group35-49         1.48     0.288      5.14  2.83e-  7    0.914     2.04 
##  5 age_group50-64         2.26     0.274      8.24  1.94e- 16    1.72      2.80 
##  6 age_group65+           3.44     0.265     13.0   3.42e- 38    2.92      3.96 
##  7 income$100K+          -3.42     0.420     -8.13  5.05e- 16   -4.24     -2.59 
##  8 income$25K-$49K       -2.71     0.300     -9.02  2.31e- 19   -3.29     -2.12 
##  9 income$50K- $99K      -3.39     0.288    -11.7   1.33e- 31   -3.95     -2.82 
## 10 depressionYes          1.74     0.237      7.32  2.79e- 13    1.27      2.20 
## 11 sexFemale             -0.484    0.177     -2.74  6.15e-  3   -0.830    -0.138
## 12 educationCollege g…   -0.233    0.439     -0.531 5.96e-  1   -1.09      0.628
## 13 educationHS/GED       -0.177    0.439     -0.403 6.87e-  1   -1.04      0.683
## 14 educationSome coll…    0.439    0.441      0.996 3.19e-  1   -0.425     1.30

Holding all other variable constant, each additional day of poor mental health days is associated with an increase of 0.30 physically unhealth days (95% CI: 0.27, 0.32). This means that worse mental health is associated with worse physical health. Individuals who reported exercising in the past 30 days had 3.51 fewer physically unhealthy days in comparison to those who did not exercise, holding all other variables constant. Therefore, exercising is associated with better physical health. Holding all other variable constant, individuals with depression had 1.74 more physically unhealthy days in comparison to those without depression. This indicated that depression is associated with poor physical health outcomes.

1.5 LINE Assumption Check

par(mfrow = c(2, 2))
plot(mod_stepwise)

par(mfrow = c(1, 1))
  1. Residual vs. Fitted: There is evidence of mild non-linearlity since residuals are centered around zero with a little bit of a curve.

  2. Q-Q: The residuals don’t follow the normal distribution. The top tail leaves the straight line which indicates non-normality.

  3. Scale location: The spread of residuals increases, indicating some heteroscedasticity.

  4. Residuals vs Leverage: Based on the plot there are no highly influential observations because there are no extreme outliers, all points are clustered together.

Overall, the LINE assumptions are reasonably satisfied with some violations. There is non-normality and heteroscedasticity indicated by the Normal Q-Q and Scale-Location plots. Based on the Residuals vs Leverage plot there is no evidence of influential observations. There is some non-linearity present in the Residual vs Fitted plot but it mild.

Part 2:Logitic Regression (40 points)

2.1 Create the Binary Outcome

brfss_analytic <- brfss_analytic %>%
  mutate(
    frequent_distress= case_when(
      physhlth_days >= 14 ~ "Yes", 
      physhlth_days < 14 ~ "No",
      TRUE ~ NA_character_
    ),
    frequent_distress= factor(frequent_distress,
                              levels= c("No", "Yes"))
  )

brfss_analytic %>%
  count(frequent_distress) %>%
  mutate(percent = n/sum(n) * 100)
## # A tibble: 2 × 3
##   frequent_distress     n percent
##   <fct>             <int>   <dbl>
## 1 No                 6944    86.8
## 2 Yes                1056    13.2

1056 (13.2%) individuals reported frequent physical distress, greater than or equal to 14 physically unhealthy days. While 6944 (86.8%) individuals reported no frequent physical distress (fewer than 14 physically unhealthy days). Overall, there is a smaller portion of individuals experiencing frequent physical distress.

2.2 Simple (Unadjusted) Logistic Regression

I chose exercise from my final linear model as the predictor because it strongly related to physical health and was significantly associated with physically unhealthy days in the linear regression model.

mod_simple <- glm(frequent_distress ~ exercise,
data = brfss_analytic, family = binomial)

summary(mod_simple)
## 
## Call:
## glm(formula = frequent_distress ~ exercise, family = binomial, 
##     data = brfss_analytic)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.97554    0.05258  -18.55   <2e-16 ***
## exerciseYes -1.33472    0.06881  -19.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6242.7  on 7999  degrees of freedom
## Residual deviance: 5883.3  on 7998  degrees of freedom
## AIC: 5887.3
## 
## Number of Fisher Scoring iterations: 5
exp(coef(mod_simple)) 
## (Intercept) exerciseYes 
##   0.3769871   0.2632325
exp(confint(mod_simple))
##                 2.5 %    97.5 %
## (Intercept) 0.3397881 0.4175879
## exerciseYes 0.2300233 0.3012592

Odds ratio: 0.26 95% CI: 0.23, 0.30

In comparison to those who did not exercise, those who reported exercise have 0.26 times the odds of frequent mental distress, 95% CI[0.23, 0.30]. The association is statistically significant at a= 0.05 because the CI does not include 1.0.

2.3 Multiple Logisitic Regression

mod_logistic <- glm(frequent_distress ~ menthlth_days + depression + sex + exercise + age_group + income + education,
data = brfss_analytic, family = binomial)

summary(mod_logistic)
## 
## Call:
## glm(formula = frequent_distress ~ menthlth_days + depression + 
##     sex + exercise + age_group + income + education, family = binomial, 
##     data = brfss_analytic)
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)               -2.064007   0.199530 -10.344  < 2e-16 ***
## menthlth_days              0.067828   0.003926  17.278  < 2e-16 ***
## depressionYes              0.499690   0.087818   5.690 1.27e-08 ***
## sexFemale                 -0.107139   0.075060  -1.427    0.153    
## exerciseYes               -0.927579   0.077931 -11.903  < 2e-16 ***
## age_group35-49             0.811292   0.147113   5.515 3.49e-08 ***
## age_group50-64             1.050798   0.139227   7.547 4.44e-14 ***
## age_group65+               1.434023   0.135785  10.561  < 2e-16 ***
## income$100K+              -1.094527   0.199595  -5.484 4.16e-08 ***
## income$25K-$49K           -0.650636   0.102657  -6.338 2.33e-10 ***
## income$50K- $99K          -0.949700   0.101806  -9.329  < 2e-16 ***
## educationCollege graduate -0.159053   0.163952  -0.970    0.332    
## educationHS/GED           -0.007249   0.159248  -0.046    0.964    
## educationSome college      0.189565   0.159785   1.186    0.235    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6242.7  on 7999  degrees of freedom
## Residual deviance: 5110.2  on 7986  degrees of freedom
## AIC: 5138.2
## 
## Number of Fisher Scoring iterations: 5
tidy(mod_logistic, conf.int = TRUE, exponentiate = TRUE) |>
kable(digits = 3)
term estimate std.error statistic p.value conf.low conf.high
(Intercept) 0.127 0.200 -10.344 0.000 0.085 0.187
menthlth_days 1.070 0.004 17.278 0.000 1.062 1.078
depressionYes 1.648 0.088 5.690 0.000 1.386 1.956
sexFemale 0.898 0.075 -1.427 0.153 0.775 1.041
exerciseYes 0.396 0.078 -11.903 0.000 0.340 0.461
age_group35-49 2.251 0.147 5.515 0.000 1.692 3.014
age_group50-64 2.860 0.139 7.547 0.000 2.186 3.775
age_group65+ 4.196 0.136 10.561 0.000 3.232 5.505
income$100K+ 0.335 0.200 -5.484 0.000 0.224 0.490
income$25K-$49K 0.522 0.103 -6.338 0.000 0.427 0.638
income$50K- $99K 0.387 0.102 -9.329 0.000 0.317 0.472
educationCollege graduate 0.853 0.164 -0.970 0.332 0.621 1.181
educationHS/GED 0.993 0.159 -0.046 0.964 0.729 1.362
educationSome college 1.209 0.160 1.186 0.235 0.887 1.660

Holding all other variables constant, each additional day of poor mental health days is associated with 1.07 times the odds of frequent physical distress, 95% CI [1.06, 1.08]. Therefore, worse mental health is associated with greater odds of experiencing frequent physical distress. Individuals who reported being depressed have 1.65 times the odds of frequent physical distress in comparison to those who reported not being depressed, holding all other variables constant. Indicating that depression is associated with greater odds of frequent physical distress. In comparison to males, females have 0.40 times the odds of frequent mental distress, holding all other variables constant. This indicates that females have greater odds of frequent mental distress compared to males.

1.4 Likelihood Ratio Test

mod_reduced <- glm(frequent_distress ~ menthlth_days + depression + sex + exercise + age_group + income,
                   family= binomial,
                   data= brfss_analytic)

lr_test <- anova(mod_reduced, mod_logistic, test = "Chisq") 
lr_test |>
  kable(digits = 3, caption = "Likelihood Ratio Test: Full Model vs. Reduced") |>
  kable_styling(bootstrap_options = "striped", full_width = FALSE)
Likelihood Ratio Test: Full Model vs. Reduced
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
7989 5124.252 NA NA NA
7986 5110.171 3 14.081 0.003

H0: education isn’t associated with the frequent physical distress and doesnt improve the model H1: At least one education category is associated with frequent mental distress and improves the model fit

Residual deviance: 14.081 df: 3 p-value: 0.003

At a= 0.05, we reject the null hypothesis, so this group of predictors (education) significantly improves the model

2.5 ROC Curve and AUC

roc_obj <- roc(brfss_analytic$frequent_distress, fitted(mod_logistic))
plot(roc_obj, main = "ROC Curve: Frequent Physical Distress Model",
print.auc = TRUE)

auc(roc_obj)
## Area under the curve: 0.7877
ci.auc(roc_obj)
## 95% CI: 0.7725-0.8028 (DeLong)

AUC= 0.788 95% CI: [0.7725, 0.8028]

The AUC= 0.788, which indicates that the model has acceptable discrimination. This means the model has a decently good ability to discriminate between individuals with and without frequent physical distress.

2.6 Hosmer-Lemeshow Goodness-of-Fit Test

hoslem.test(mod_logistic$y, fitted(mod_logistic), g = 10)
## 
##  Hosmer and Lemeshow goodness of fit (GOF) test
## 
## data:  mod_logistic$y, fitted(mod_logistic)
## X-squared = 12.767, df = 8, p-value = 0.1201

H0: The model fits the data H1: The model does not fit the data well

t-statistic: 12.77 df: 8 p-value: 0.12

At alpha= 0.05, we fail to reject the null hypothesis, therefore there is no evidence of poor model fit. The Hosmer and Lemeshow goodness of fit test complements to ROC/AUC findings because it suggest that the model fits the data well as well as has the acceptable ability to distinguish between groups.

Part 3: Reflection

The results from this analysis suggest that the demographic and behavioral factors are associated with physical health burden among U.S. adults. Mentally unhealthy days and depression were the strongest predictors in both the linear and logistic models. Indicating that those with frequent mentally unhealthy days and depression have worse physically unhealthy days. This is cross-sectional survey data, so causation cannot be assumed due to correlation between predictors and the outcome because data was measured at the same time. Also, there are other potential confounders that were not measured in these models such as chronic conditions (other comorbidites) and behaviors such as smoking and alcohol consumption.

The linear regression model and logistic regression model give different information about the same research question. The linear regression model examined the effect predictors had on the continuous physical health outcome. The logistic regression model estimates how the predictors influenced the odds of frequent mental distress (binary outcome). Linear regression provide R^2 and the adjusted R^2 to examine how much variance a predictor provides in physically unhealthy days. Logistic regression allows for odds ratio and adjusted OR to be calculated and interpreted as well as AUC and calibration tests that examine discrimination and model fit. Linear regression is preferred when the outcome is continuous and logistic is preferred when it is binary.

During this assignment I sought AI assistance a few times to help with errors i was receiving when I tried to run my code. In every case it was a missing comma or misspelling that I couldn’t identify. I told AI i was using R studio and I was receivng an error, I would then input a copy of the error code and AI helped me with what R studio was having an issue with unexpected symbols that I did not realize were a result of a missing comma. Another issue I had was when I was doing the logistic regression. I was trying to use physhlth_days and family=binomial at the same time and it helped me realize I wasn’t using the correct variable because it wasn’t binary and frequent_distress was.