EPI 553 — Homework 3: Multiple Linear Regression, Tests of Hypoteheses, and Interaction Analysis

Submission: Knit this file to HTML, publish to RPubs with the title epi553_hw03_lastname_firstname, and paste the RPubs URL in the Brightspace submission comments. Also upload this .Rmd file to Brightspace.

AI Policy: AI tools are permitted on this assignment for debugging not coding. See the assignment description for full details.

Part 0: Data Preparation (10 points)

Section 6: Load required packages: tidyverse, haven, broom, kableExtra, car, summary, ggeffects

# Load required packages
library(kableExtra)
library(broom)
library(haven)
library(janitor)
library(plotly)
library(broom)
library(ggeffects)
library(ggstats)
library(gtsummary)
library(GGally)
library(car)
library(lmtest)
library(corrplot)
library(tidyverse)

Section 7: Import using read_xpt. Report the total number of rows and columns in the raw dataset

# Import the dataset — update the path if needed
brfss_raw <- read_xpt("LLCP2023.XPT") |> clean_names()

# Quick check
glimpse(brfss_raw)

## Rows: 433,323
## Columns: 350
## $ state    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ fmonth   <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ idate    <chr> "03012023", "01062023", "03082023", "03062023", "01062023", "…
## $ imonth   <chr> "03", "01", "03", "03", "01", "01", "03", "01", "03", "01", "…
## $ iday     <chr> "01", "06", "08", "06", "06", "09", "21", "06", "15", "08", "…
## $ iyear    <chr> "2023", "2023", "2023", "2023", "2023", "2023", "2023", "2023…
## $ dispcode <dbl> 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1…
## $ seqno    <chr> "2023000001", "2023000002", "2023000003", "2023000004", "2023…
## $ psu      <dbl> 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.023e…
## $ ctelenm1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pvtresd1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ colghous <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ statere1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ celphon1 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ ladult1  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ numadult <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1, 1…
## $ respslc1 <dbl> 1, NA, NA, 1, NA, 1, NA, NA, NA, 2, 1, 1, 1, 1, 1, NA, NA, 1,…
## $ landsex2 <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2…
## $ lndsxbrt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ safetime <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ ctelnum1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cellfon5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cadult1  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cellsex2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ celsxbrt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pvtresd3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cclghous <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cstate1  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ landline <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hhadult  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sexvar   <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2…
## $ genhlth  <dbl> 2, 2, 4, 2, 4, 3, 4, 4, 3, 3, 3, 2, 2, 3, 3, 4, 5, 3, 3, 3, 3…
## $ physhlth <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30, 1…
## $ menthlth <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 88,…
## $ poorhlth <dbl> NA, NA, 1, 88, NA, 88, 88, 1, 88, NA, NA, 88, NA, NA, NA, 1, …
## $ primins1 <dbl> 3, 3, 3, 3, 3, 7, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2…
## $ persdoc3 <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2…
## $ medcost1 <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ checkup1 <dbl> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1…
## $ exerany2 <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1, 1…
## $ exract12 <dbl> NA, 1, 1, 1, 1, 1, NA, NA, NA, 1, 1, NA, 1, NA, NA, 1, NA, NA…
## $ exeroft1 <dbl> NA, 106, 205, 103, 102, 212, NA, NA, NA, 230, 105, NA, 105, N…
## $ exerhmm1 <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA, 1…
## $ exract22 <dbl> NA, 88, 88, 88, 8, 88, NA, NA, NA, 88, 77, NA, 3, NA, NA, 88,…
## $ exeroft2 <dbl> NA, NA, NA, NA, 107, NA, NA, NA, NA, NA, NA, NA, 105, NA, NA,…
## $ exerhmm2 <dbl> NA, NA, NA, NA, 100, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA, …
## $ strength <dbl> 888, 888, 205, 888, 888, 203, 777, 888, 888, 888, 105, 888, 8…
## $ bphigh6  <dbl> 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1, 1…
## $ bpmeds1  <dbl> 1, 2, 1, NA, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, 1, 1, 1, 1, …
## $ cholchk3 <dbl> 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2…
## $ toldhi3  <dbl> 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2…
## $ cholmed3 <dbl> 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2…
## $ cvdinfr4 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ cvdcrhd4 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ cvdstrk3 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2…
## $ asthma3  <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2…
## $ asthnow  <dbl> NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ chcscnc1 <dbl> 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ chcocnc1 <dbl> 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1…
## $ chccopd3 <dbl> 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2…
## $ addepev3 <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ chckdny2 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 7, 1, 2, 2, 2…
## $ havarth4 <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1…
## $ diabete4 <dbl> 1, 3, 3, 3, 1, 3, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1…
## $ diabage4 <dbl> 57, NA, NA, NA, 68, NA, NA, 98, NA, 60, NA, NA, NA, NA, NA, N…
## $ marital  <dbl> 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 3, 3, 2…
## $ educa    <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4, 4…
## $ renthom1 <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ numhhol4 <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2…
## $ numphon4 <dbl> NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ cpdemo1c <dbl> 1, 1, 8, 1, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ veteran3 <dbl> 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2…
## $ employ1  <dbl> 7, 7, 7, 7, 8, 7, 8, 7, 7, 7, 4, 1, 7, 7, 7, 5, 7, 7, 7, 7, 7…
## $ children <dbl> 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ income3  <dbl> 99, 99, 2, 99, 7, 7, 6, 99, 6, 7, 7, 99, 9, 9, 9, 3, 77, 99, …
## $ pregnant <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ weight2  <dbl> 172, 132, 130, 170, 170, 165, 180, 135, 195, 160, 300, 145, 1…
## $ height3  <dbl> 503, 409, 504, 506, 508, 502, 600, 411, 504, 510, 511, 505, 5…
## $ deaf     <dbl> 2, 1, 7, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2…
## $ blind    <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ decide   <dbl> 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 7, 2…
## $ diffwalk <dbl> 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 1…
## $ diffdres <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2, 2…
## $ diffalon <dbl> 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1…
## $ fall12mn <dbl> 88, 88, 88, 88, 3, 88, 8, 88, 88, 88, 88, 88, 88, 1, 88, 1, 3…
## $ fallinj5 <dbl> NA, NA, NA, NA, 88, NA, 88, NA, NA, NA, NA, NA, NA, 1, NA, 88…
## $ smoke100 <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1…
## $ smokday2 <dbl> NA, NA, 3, NA, NA, NA, 3, NA, NA, 3, NA, NA, 3, 3, 3, NA, NA,…
## $ usenow3  <dbl> 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ ecignow2 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 1, 4, 1, 4…
## $ alcday4  <dbl> 888, 888, 888, 888, 202, 205, 888, 888, 888, 888, 888, 888, 1…
## $ avedrnk3 <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 2, NA, NA, N…
## $ drnk3ge5 <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA, N…
## $ maxdrnks <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 3, NA, NA, N…
## $ flushot7 <dbl> 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1…
## $ flshtmy3 <dbl> NA, 92023, 112022, 102022, NA, NA, 32023, 102023, 777777, 112…
## $ pneuvac4 <dbl> 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1…
## $ shingle2 <dbl> 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1…
## $ hivtst7  <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2, 1…
## $ hivtstd3 <dbl> NA, NA, NA, 777777, NA, 777777, 92022, NA, NA, NA, NA, NA, NA…
## $ seatbelt <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ drnkdri2 <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA, N…
## $ covidpo1 <dbl> 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 1, 2…
## $ covidsm1 <dbl> NA, NA, NA, NA, NA, 2, NA, 2, 2, NA, 1, 2, NA, 2, 2, NA, NA, …
## $ covidact <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA…
## $ pdiabts1 <dbl> NA, 2, 1, 1, NA, 1, 1, NA, 1, NA, 1, 1, 1, 1, 1, 1, 7, 1, 8, …
## $ prediab2 <dbl> NA, 3, 3, 3, NA, 3, 3, NA, 1, NA, 3, 3, 1, 3, 3, 3, 3, 3, 3, …
## $ diabtype <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ insulin1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ chkhemo3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ eyeexam1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ diabeye1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ diabedu1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ feetsore <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ arthexer <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ arthedu  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lmtjoin3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ arthdis2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ joinpai2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsfirst <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcslast  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsnumcg <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsctsc1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsscncr <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsctwhn <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hadmam   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ howlong  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cervscrn <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crvclcnc <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crvclpap <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crvclhpv <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hadhyst2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ psatest1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ psatime1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pcpsars2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ psasugs1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pcstalk2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hadsigm4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ colnsigm <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ colntes1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sigmtes1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lastsig4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ colncncr <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ vircolo1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ vclntes2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ smalstol <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ stoltest <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ stooldn2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ bldstfit <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sdnates1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cncrdiff <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cncrage  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cncrtyp2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvtrt3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvdoc1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvsum  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvrtrn <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvinst <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvinsr <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvdein <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvclin <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvpain <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvctl2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ indortan <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ numburn3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sunprtct <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wkdayout <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wkendout <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cimemlo1 <dbl> 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, NA, 2, 2, 2, 1, 2, 2, …
## $ cdworry  <dbl> NA, NA, 2, NA, 1, 1, NA, NA, NA, NA, NA, NA, 7, NA, NA, NA, N…
## $ cddiscu1 <dbl> NA, NA, 1, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, N…
## $ cdhous1  <dbl> NA, NA, 2, NA, 2, 1, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, N…
## $ cdsocia1 <dbl> NA, NA, 2, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, N…
## $ caregiv1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvrel4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvlng1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvhrs1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvprb3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvalzd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvper1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvhou1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvexpt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lastsmk2 <dbl> NA, NA, 7, NA, NA, NA, 7, NA, NA, 7, NA, NA, 7, NA, 7, NA, NA…
## $ stopsmk2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ mentcigs <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ mentecig <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ heattbco <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ firearm5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ gunload  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ loadulk2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp1  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp2  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp3  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp4  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp5  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp6  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp6 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ firstaid <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aspirin  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ birthsex <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ somale   <dbl> NA, NA, NA, NA, NA, NA, 2, NA, NA, 2, 2, NA, NA, 2, NA, NA, N…
## $ sofemale <dbl> 2, 2, 2, 2, 2, 2, NA, 2, 2, NA, NA, 2, 2, NA, 2, 2, 2, NA, 9,…
## $ trnsgndr <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4…
## $ marijan1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjsmok <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjeat  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjvape <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjdab  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjothr <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ usemrjn4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acedeprs <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acedrink <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acedrugs <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aceprisn <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acedivrc <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acepunch <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acehurt1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aceswear <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acetouch <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acetthem <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acehvsex <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aceadsaf <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aceadned <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ imfvpla4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hpvadvc4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hpvadsht <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ tetanus1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ covidva1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ covacge1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ covidnu2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lsatisfy <dbl> 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, NA, 1, 2, 3, 1, 4, 2, …
## $ emtsuprt <dbl> 1, 2, 4, 1, 2, 2, 1, 1, 1, 3, 1, 1, 2, NA, 1, 1, 2, 1, 4, 2, …
## $ sdlonely <dbl> 5, 3, 3, 3, 2, 3, 4, 5, 3, 3, 4, 5, 4, NA, 4, 3, 4, 5, 1, 2, …
## $ sdhemply <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ foodstmp <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ sdhfood1 <dbl> 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 5, NA, 5, 5, 5, 5, 5, 5, …
## $ sdhbills <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ sdhutils <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ sdhtrnsp <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ sdhstre1 <dbl> 5, 5, 3, 5, 2, 3, 5, 5, 4, 4, 4, 4, 4, NA, 4, 5, 4, 5, 4, 4, …
## $ rrclass3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rrcognt2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rrtreat  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rratwrk2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rrhcare4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rrphysm2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rcsgend1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rcsxbrth <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rcsrltn2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ casthdx2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ casthno2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ qstver   <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
## $ qstlang  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ metstat  <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1…
## $ urbstat  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ mscode   <dbl> 1, 1, 3, 2, 5, 1, 2, 5, 3, 1, 1, 1, 1, 5, 1, 2, 5, 5, 5, 1, 1…
## $ ststr    <dbl> 11011, 11012, 11011, 11011, 11011, 11011, 11011, 11012, 11011…
## $ strwt    <dbl> 42.79158, 170.42939, 42.79158, 42.79158, 42.79158, 42.79158, …
## $ rawrake  <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1, 1…
## $ wt2rake  <dbl> 85.58316, 170.42939, 42.79158, 85.58316, 42.79158, 85.58316, …
## $ imprace  <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ chispnc  <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9…
## $ crace1   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cageg    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cllcpwt  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ dualuse  <dbl> 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ dualcor  <dbl> 0.5026388, 0.5026388, NA, 0.5026388, 0.5026388, 0.5026388, 0.…
## $ llcpwt2  <dbl> 941.1640, 1874.2239, 1151.6032, 941.1640, 470.5820, 941.1640,…
## $ llcpwt   <dbl> 605.4279, 1121.9927, 600.9633, 605.4279, 281.7110, 1060.3352,…
## $ rfhlth   <dbl> 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1…
## $ phys14d  <dbl> 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 3, 3, 1, 1, 2…
## $ ment14d  <dbl> 1, 1, 2, 1, 1, 2, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1…
## $ hlthpl1  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ hcvu653  <dbl> 9, 9, 9, 9, 9, 1, 9, 9, 9, 9, 1, 9, 9, 9, 9, 9, 1, 9, 9, 9, 9…
## $ totinda  <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1, 1…
## $ metvl12  <dbl> NA, 35, 35, 35, 35, 35, NA, NA, NA, 35, 35, NA, 35, NA, NA, 3…
## $ metvl22  <dbl> NA, NA, NA, NA, 33, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA, N…
## $ maxvo21  <dbl> 1840, 1803, 1322, 1914, 1988, 2506, 1545, 1914, 1655, 1875, 2…
## $ fc601    <dbl> 315, 309, 227, 328, 341, 430, 265, 328, 284, 321, 472, 328, 3…
## $ actin13  <dbl> NA, 2, 2, 2, 2, 1, NA, NA, NA, 2, 1, NA, 1, NA, NA, 2, NA, NA…
## $ actin23  <dbl> NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ padur1   <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA, 6…
## $ padur2   <dbl> NA, NA, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA, N…
## $ pafreq1  <dbl> NA, 6000, 1167, 3000, 2000, 2800, NA, NA, NA, 7000, 5000, NA,…
## $ pafreq2  <dbl> NA, NA, NA, NA, 7000, NA, NA, NA, NA, NA, NA, NA, 5000, NA, N…
## $ minac12  <dbl> NA, 180, 18, 90, 90, 126, NA, NA, NA, 210, 150, NA, 225, NA, …
## $ minac22  <dbl> NA, 0, 0, 0, 420, 0, NA, NA, NA, 0, NA, NA, 225, NA, NA, 0, N…
## $ strfreq  <dbl> 0, 0, 1167, 0, 0, 700, NA, 0, 0, 0, 5000, 0, 0, 2000, 2000, 0…
## $ pamiss3  <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 9, 0, 0, 1, 1…
## $ pamin13  <dbl> NA, 360, 36, 180, 180, 126, NA, NA, NA, 420, 150, NA, 225, NA…
## $ pamin23  <dbl> NA, NA, NA, NA, 420, NA, NA, NA, NA, NA, NA, NA, 450, NA, NA,…
## $ pa3min   <dbl> NA, 360, 36, 180, 600, 126, NA, NA, NA, 420, 150, NA, 675, NA…
## $ pavig13  <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 0, NA, NA, 12…
## $ pavig23  <dbl> NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 225, NA, NA, N…
## $ pa3vigm  <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 225, NA, NA, …
## $ pacat3   <dbl> 4, 1, 9, 9, 1, 9, 4, 4, 4, 1, 9, 4, 1, 4, 4, 9, 9, 4, 4, 9, 9…
## $ paindx3  <dbl> 2, 1, 9, 1, 1, 9, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1, 1…
## $ pa150r4  <dbl> 3, 1, 9, 1, 1, 9, 3, 3, 3, 1, 1, 3, 1, 3, 3, 1, 9, 3, 3, 1, 1…
## $ pa300r4  <dbl> 3, 1, 9, 9, 1, 9, 3, 3, 3, 1, 9, 3, 1, 3, 3, 9, 9, 3, 3, 9, 9…
## $ pa30023  <dbl> 2, 1, 9, 9, 1, 9, 2, 2, 2, 1, 9, 2, 1, 2, 2, 9, 9, 2, 2, 9, 9…
## $ pastrng  <dbl> 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1…
## $ parec3   <dbl> 4, 2, 9, 2, 2, 9, 9, 4, 4, 2, 1, 4, 2, 3, 3, 2, 9, 4, 4, 2, 1…
## $ pastae3  <dbl> 2, 2, 9, 2, 2, 9, 9, 2, 2, 2, 1, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1…
## $ rfhype6  <dbl> 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2…
## $ cholch3  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ rfchol3  <dbl> 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1…
## $ michd    <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ ltasth1  <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1…
## $ casthm1  <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 1, 1, 1, 1…
## $ asthms1  <dbl> 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 9, 3, 3, 3, 3…
## $ drdxar2  <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1…
## $ mrace1   <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ hispanc  <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ race     <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ raceg21  <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ racegr3  <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ raceprv  <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ sex      <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2…
## $ ageg5yr  <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 13,…
## $ age65yr  <dbl> 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2…
## $ age80    <dbl> 80, 80, 80, 78, 76, 62, 80, 78, 80, 75, 59, 78, 68, 79, 79, 8…
## $ age_g    <dbl> 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 5, 6, 6, 6, 6…
## $ htin4    <dbl> 63, 57, 64, 66, 68, 62, 72, 59, 64, 70, 71, 65, 68, 73, 62, 6…
## $ htm4     <dbl> 160, 145, 163, 168, 173, 157, 183, 150, 163, 178, 180, 165, 1…
## $ wtkg3    <dbl> 7802, 5987, 5897, 7711, 7711, 7484, 8165, 6123, 8845, 7257, 1…
## $ bmi5     <dbl> 3047, 2856, 2231, 2744, 2585, 3018, 2441, 2727, 3347, 2296, 4…
## $ bmi5cat  <dbl> 4, 3, 2, 3, 3, 4, 2, 3, 4, 2, 4, 2, 3, 2, 3, 3, 4, 3, 3, 3, 4…
## $ rfbmi5   <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2…
## $ chldcnt  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ educag   <dbl> 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 2, 4, 4, 3, 2, 2…
## $ incomg1  <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4, 3…
## $ smoker3  <dbl> 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3, 3, 4, 4, 4, 1, 3, 3…
## $ rfsmok3  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1…
## $ cureci2  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ drnkany6 <dbl> 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1, 2…
## $ drocdy4  <dbl> 0, 0, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 29, 100, 0, 0, 0, 0, 33,…
## $ rfbing6  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ drnkwk2  <dbl> 0, 0, 0, 0, 47, 233, 0, 0, 0, 0, 0, 0, 200, 1400, 0, 0, 0, 0,…
## $ rfdrhv8  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ flshot7  <dbl> 2, 1, 1, 1, 2, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 2, NA, 1, 2, 1…
## $ pneumo3  <dbl> 2, 1, 1, 1, 1, NA, 1, 1, 1, 1, NA, 2, 1, 1, 2, 1, NA, 2, 1, 1…
## $ aidtst4  <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1…
## $ rfseat2  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ rfseat3  <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ drnkdrv  <dbl> 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 2, 2, 9…

tibble(Metric = c("Observations", "Variables"),
       Value  = c(nrow(brfss_raw), ncol(brfss_raw))) |> 
  kable(caption = "Full Dataset Dimensions") |> 
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

Full Dataset Dimensions
Metric	Value
Observations	433323
Variables	350

Section 8: Select the nine variables listed in the table

# Select only the variables we want to work with
brfss_select <- brfss_raw |> 
  select(menthlth, physhlth, bmi5, sexvar, exerany2, ageg5yr, incomg1, educa, smoker3) 

tibble(Metric = c("Observations", "Variables"),
       Value  = c(nrow(brfss_select), ncol(brfss_select))) |> 
  kable(caption = "Only Necessary Dataset Dimensions") |> 
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

Only Necessary Dataset Dimensions
Metric	Value
Observations	433323
Variables	9

# Save necessary data and variables as an RDS file
saveRDS(brfss_select,
  "/Users/zoya_hayes/Desktop/EPI553/EPI553_Coding/Assignments/HW03/brfss_2023_select.rds")

Section 9: Recode all variables as described. Store all categorical variables as labeled factors

# Recode all the variables selected
brfss_recode <- brfss_select |> 
  mutate(
    # Outcome: Mentally unheathy days in past 30 
    menthlth_days = case_when(
      menthlth == 88                    ~ 0,
      menthlth >= 1 & menthlth <= 30   ~ as.numeric(menthlth),
      menthlth %in% c(77, 99) ~ NA_real_,
      TRUE                             ~ NA_real_
    ),
    # Physical health days
    physhlth_days = case_when(
      physhlth == 88                    ~ 0,
      physhlth >= 1 & physhlth <= 30   ~ as.numeric(physhlth),
      physhlth %in% c(77, 99) ~ NA_real_,
      TRUE                             ~ NA_real_
    ),
    # BMI
     bmi = case_when(
     bmi5 == 9999 ~ NA_real_,
    TRUE         ~ bmi5 / 100
    ),
    # Sex
    sex = factor(sexvar, levels = c(1, 2), labels = c("Male", "Female")),
    # Exercise in past 30 days
    exercise = factor(case_when(
      exerany2 == 2 ~ "No",
      exerany2 == 1 ~ "Yes",
      TRUE          ~ NA_character_
    ), levels = c("No", "Yes")),
    # Age
      age_group = factor(case_when(
      ageg5yr == 1 ~ "18-24",
      ageg5yr == 2 ~ "25-29",
      ageg5yr == 3 ~ "30-34",
      ageg5yr == 4 ~ "35-39",
      ageg5yr == 5 ~ "40-44",
      ageg5yr == 6 ~ "45-49",
      ageg5yr == 7 ~ "50-54",
      ageg5yr == 8 ~ "55-59",
      ageg5yr == 9 ~ "60-64",
      ageg5yr == 10 ~ "65-69",
      ageg5yr == 11 ~ "70-74",
      ageg5yr == 12 ~ "75-79",
      ageg5yr == 13 ~ "80+",
      ageg5yr == 14 ~ NA_character_,
      TRUE          ~ NA_character_
      ), 
      levels = c("18-24", "25-29","30-34","35-39","40-44", "45-49","50-54","55-59","60-64", "65-69", "70-74", "75-79", "80+")),
    # Income
      income = factor(case_when(
      incomg1 == 1 ~ "Less than $15,000",
      incomg1 == 2 ~ "$15,000 to < $25,000",
      incomg1 == 3 ~ "$25,000 to < $35,000",
      incomg1 == 4 ~ "$35,000 to < $50,000",
      incomg1 == 5 ~ "$50,000 to < $100,000", 
      incomg1 == 6 ~ "$100,000 to < $200,000", 
      incomg1 == 7 ~ "More than $200,000",
      incomg1 == 9 ~ NA_character_,
      TRUE          ~ NA_character_
      ), 
      levels = c("Less than $15,000", "$15,000 to < $25,000", "$25,000 to < $35,000", "$35,000 to < $50,000", "$50,000 to < $100,000", "$100,000 to < $200,000", "More than $200,000")), 
    # Education 
      education = factor(case_when(
      educa %in% c(1, 2) ~ "Less than high school",
      educa == 3 ~ "High school diploma or GED",
      educa == 4 ~ "Some college or technical school",
      educa == 5 ~ "College graduate",
      educa == 6 ~ "Graduate or professional degree",
      educa == 9 ~ NA_character_,
      TRUE          ~ NA_character_), 
      levels = c("Less than high school", "High school diploma or GED", "Some college or technical school", "College graduate", "Graduate or professional degree")),
    # Smoker Status 
      smoking = factor(case_when(
        smoker3 == 1 ~ "Current daily smoker",
        smoker3 == 2 ~ "Current some-day smoker",
        smoker3 == 3 ~ "Former smoker",
        smoker3 == 4 ~ "Never smoker",
        smoker3 == 9 ~ NA_character_,
        TRUE          ~ NA_character_),
        levels = c("Current daily smoker", "Current some-day smoker", "Former smoker", "Never smoker")),
  )

brfss_analytic <- brfss_recode |> 
  select (menthlth_days, physhlth_days, bmi, sex, exercise, age_group, income, education, smoking)

Section 10: Report the number and percentage of missing observations for menthlth_days and at least two other key variables, before removing any missing value

# Report missing values for the key variables and print out total sample size
missing_summary <- data.frame(
  Variable = names(brfss_analytic),
  Missing_Count = colSums(is.na(brfss_analytic)),
  Missing_Percent = round(colSums(is.na(brfss_analytic)) / nrow(brfss_analytic) * 100, 2)
)

# Show variables with the most missing data
print(missing_summary[order(-missing_summary$Missing_Count), ][1:9, ]) |>  
  kable(
    caption = "Variables by Missingness"
  ) |> 
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width = FALSE,
    position = "center"
  )

##                    Variable Missing_Count Missing_Percent
## income               income         86623           19.99
## bmi                     bmi         40535            9.35
## smoking             smoking         23062            5.32
## physhlth_days physhlth_days         10785            2.49
## menthlth_days menthlth_days          8108            1.87
## age_group         age_group          7779            1.80
## education         education          2325            0.54
## exercise           exercise          1251            0.29
## sex                     sex             0            0.00

Variables by Missingness
	Variable	Missing_Count	Missing_Percent
income	income	86623	19.99
bmi	bmi	40535	9.35
smoking	smoking	23062	5.32
physhlth_days	physhlth_days	10785	2.49
menthlth_days	menthlth_days	8108	1.87
age_group	age_group	7779	1.80
education	education	2325	0.54
exercise	exercise	1251	0.29
sex	sex	0	0.00

The missing count for menthlth_days is 8108 or 1.87%. For BMI, the missing count is 40,535 with 9.35% missing. Income also has a relatively high missing count with 86,623 missing or 19.99%.

Section 11: Create the analytic dataset using drop_na() and slice_sample() with set.seed(1220). Report the final analytic n.

# Create analytic dataset
set.seed(1220)
brfss_analytic <- brfss_analytic |> 
  drop_na(menthlth_days, physhlth_days, bmi, sex, exercise, age_group, income, education, smoking) |> 
  slice_sample(n = 8000)

# Display subset dimensions
cat("Working subset dimensions:",
    nrow(brfss_analytic), "observations,",
    ncol(brfss_analytic), "variables\n")

## Working subset dimensions: 8000 observations, 9 variables

# Save for later use
saveRDS(brfss_analytic,
  "/Users/zoya_hayes/Desktop/EPI553/EPI553_Coding/Assignments/HW03/brfss_2023_analytic.rds")

# Print out total sample size
cat("Final Analytic N:", nrow(brfss_analytic), "\n")

## Final Analytic N: 8000

tibble(Metric = c("Observations", "Variables"),
       Value  = c(nrow(brfss_analytic), ncol(brfss_analytic))) |> 
  kable(caption = "Analytic Dataset Dimensions") |> 
  kable_styling(bootstrap_options = "striped", full_width = FALSE)

Analytic Dataset Dimensions
Metric	Value
Observations	8000
Variables	9

The final analytic n is 8,000.

Section 12: Produce a descriptive statistics table using gtsummary::tbl_summary(), stratified by sex.

# Create Table 1 Descriptive Statistics
brfss_analytic |>  
  select(menthlth_days, physhlth_days, bmi, sex, exercise, age_group, income, education, smoking) |>  
  tbl_summary(
    by = sex,
    type = list(exercise ~ "categorical"),
    label = list(
      menthlth_days ~ "Mentally Unhealthy Days (Last 30 Days)",
      physhlth_days ~ "Physically Unhealthy Days (Last 30 Days)",
      bmi ~ "Body Mass Index (kg/m2)",
      exercise ~ "Exercise Status",
      age_group ~ "Age Category",
      income ~ "Income Category",
      education ~ "Education Category",
      smoking ~ "Smoking Status"
    ),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})",
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = all_continuous() ~ 1,
    missing = "no"
  ) |> 
  add_n() |> 
  bold_labels() |> 
  italicize_levels() |> 
  modify_caption("**Table 1. Descriptive Statistics — BRFSS 2023 Analytic Sample (n = 8,000)**") |>  
  modify_footnote(
    all_stat_cols() ~ 
      "Mean (SD); n(%); Analytic sample includes 8,000 complete-case observations. Missing values were excluded listwise for all variables included in this table."
  ) |> 
  as_flex_table()

**Table 1. Descriptive Statistics — BRFSS 2023 Analytic Sample (n = 8,000)**
Characteristic	N	Male N = 3,9361	Female N = 4,0641
Mentally Unhealthy Days (Last 30 Days)	8,000	3.5 (7.5)	5.1 (8.6)
Physically Unhealthy Days (Last 30 Days)	8,000	4.0 (8.4)	4.9 (8.9)
Body Mass Index (kg/m2)	8,000	28.7 (6.0)	28.7 (7.0)
Exercise Status	8,000
No		790 (20%)	970 (24%)
Yes		3,146 (80%)	3,094 (76%)
Age Category	8,000
18-24		235 (6.0%)	171 (4.2%)
25-29		219 (5.6%)	189 (4.7%)
30-34		253 (6.4%)	210 (5.2%)
35-39		263 (6.7%)	302 (7.4%)
40-44		290 (7.4%)	292 (7.2%)
45-49		266 (6.8%)	252 (6.2%)
50-54		305 (7.7%)	303 (7.5%)
55-59		308 (7.8%)	352 (8.7%)
60-64		408 (10%)	379 (9.3%)
65-69		418 (11%)	483 (12%)
70-74		382 (9.7%)	426 (10%)
75-79		325 (8.3%)	338 (8.3%)
80+		264 (6.7%)	367 (9.0%)
Income Category	8,000
Less than $15,000		160 (4.1%)	247 (6.1%)
$15,000 to < $25,000		271 (6.9%)	370 (9.1%)
$25,000 to < $35,000		376 (9.6%)	495 (12%)
$35,000 to < $50,000		482 (12%)	585 (14%)
$50,000 to < $100,000		1,251 (32%)	1,260 (31%)
$100,000 to < $200,000		996 (25%)	869 (21%)
More than $200,000		400 (10%)	238 (5.9%)
Education Category	8,000
Less than high school		75 (1.9%)	49 (1.2%)
High school diploma or GED		130 (3.3%)	122 (3.0%)
Some college or technical school		950 (24%)	877 (22%)
College graduate		1,018 (26%)	1,120 (28%)
Graduate or professional degree		1,763 (45%)	1,896 (47%)
Smoking Status	8,000
Current daily smoker		339 (8.6%)	319 (7.8%)
Current some-day smoker		151 (3.8%)	117 (2.9%)
Former smoker		1,207 (31%)	1,055 (26%)
Never smoker		2,239 (57%)	2,573 (63%)
1Mean (SD); n(%); Analytic sample includes 8,000 complete-case observations. Missing values were excluded listwise for all variables included in this table.

Part 1: Multiple Linear Regression (25 points)

Research Question: What is the independent association of each predictor with the number of mentally unhealthy days in the past 30 days

Step 1: Fit a multiple linear regression model with menthlth_days as the outcome and the following predictors: physhlth_days, bmi, sex, exercise, age_group, income, education, and smoking. Display the full summary() output.

# Fitting full multiple linear regression 
model_full <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + income + education + smoking, data = brfss_analytic)

tidy(model_full, conf.int = TRUE) |> 
  mutate(
    term     = dplyr::recode(term,
      "(Intercept)" = "Intercept",
      "physhlth_days" = "Physical health days",
      "bmi"           = "Body Mass Index",
      "sexFemale"     = "Sex: Female (ref = Male)",
      "exerciseYes"   = "Exercise: Yes (ref = No)",
    ),
    across(where(is.numeric), ~ round(., 3))
  ) |> 
  kable(
    caption  = "Table 3. Multiple Linear Regression: Mentally Unhealthy Days ~ Multiple Predictors (BRFSS 2023, n = 8,000)",
    col.names = c("Term", "Estimate(β)", "Std. Error", "t-statistic",
                  "p-value", "95% CI Lower", "95% CI Upper")
  ) |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Table 3. Multiple Linear Regression: Mentally Unhealthy Days ~ Multiple Predictors (BRFSS 2023, n = 8,000)
Term	Estimate(β)	Std. Error	t-statistic	p-value	95% CI Lower	95% CI Upper
Intercept	9.651	0.954	10.115	0.000	7.780	11.521
Physical health days	0.266	0.010	26.384	0.000	0.246	0.285
Body Mass Index	0.033	0.013	2.527	0.012	0.007	0.059
Sex: Female (ref = Male)	1.390	0.168	8.301	0.000	1.062	1.719
Exercise: Yes (ref = No)	-0.651	0.215	-3.033	0.002	-1.072	-0.230
age_group25-29	-1.057	0.519	-2.034	0.042	-2.075	-0.038
age_group30-34	-1.094	0.506	-2.160	0.031	-2.087	-0.101
age_group35-39	-1.811	0.489	-3.707	0.000	-2.769	-0.853
age_group40-44	-2.893	0.487	-5.935	0.000	-3.849	-1.937
age_group45-49	-3.056	0.498	-6.141	0.000	-4.032	-2.081
age_group50-54	-3.517	0.483	-7.277	0.000	-4.464	-2.569
age_group55-59	-4.496	0.476	-9.454	0.000	-5.428	-3.564
age_group60-64	-4.522	0.458	-9.863	0.000	-5.421	-3.623
age_group65-69	-5.578	0.450	-12.390	0.000	-6.460	-4.695
age_group70-74	-6.025	0.457	-13.173	0.000	-6.922	-5.129
age_group75-79	-6.287	0.475	-13.239	0.000	-7.217	-5.356
age_group80+	-6.820	0.477	-14.302	0.000	-7.754	-5.885
income$15,000 to < $25,000	-1.637	0.469	-3.491	0.000	-2.556	-0.718
income$25,000 to < $35,000	-2.076	0.448	-4.635	0.000	-2.955	-1.198
income$35,000 to < $50,000	-2.547	0.438	-5.812	0.000	-3.406	-1.688
income$50,000 to < $100,000	-3.050	0.411	-7.428	0.000	-3.856	-2.245
income$100,000 to < $200,000	-3.500	0.429	-8.154	0.000	-4.341	-2.658
incomeMore than $200,000	-3.784	0.500	-7.563	0.000	-4.765	-2.803
educationHigh school diploma or GED	0.080	0.811	0.099	0.921	-1.509	1.669
educationSome college or technical school	1.077	0.690	1.561	0.119	-0.275	2.429
educationCollege graduate	1.791	0.691	2.591	0.010	0.436	3.146
educationGraduate or professional degree	1.738	0.693	2.509	0.012	0.380	3.095
smokingCurrent some-day smoker	-1.587	0.535	-2.965	0.003	-2.636	-0.538
smokingFormer smoker	-1.990	0.337	-5.902	0.000	-2.651	-1.329
smokingNever smoker	-2.937	0.322	-9.131	0.000	-3.567	-2.306

Step 2: Write the fitted regression equation including all terms, with coefficients rounded to three decimal places.

The model is:

Menthlth_days = 9.651 + 0.266(physhlth_days) + 0.033(bmi) + 1.390(sex: Female, ref = male) + -0.651(exercise: Yes, ref = no), + -1.057(age_group25-29, ref = 18-24) + -1.094(age_group30-34, ref = 18-24) + -1.811(age_group35-39, ref = 18-24) + -2.893(age_group40-44, ref = 18-24) + -3.056(age_group45-49, ref = 18-24) + -3.517(age_group50-54, ref = 18-24) + -4.496(age_group55-59, ref = 18-24) + -4.522(age_group60-64, ref = 18-24) + -5.578(age_group65-69, ref = 18-24) + -6.025(age_group70-74, ref = 18-24) + -6.287(age_group75-79, ref = 18-24) + -6.820(age_group80+, ref = 18-24) + - 1.637(income15,000 to < 25,000, ref less than 15,000) + - 2.076(income25,000 to < 35,000, ref less than 15,000) + - 2.547(income35,000 to < 50,000, ref less than 15,000) + - 3.050(income50,000 to < 100,000, ref less than 15,000) + - 3.500(income100,000 to < 200,000, ref less than 15,000) + - 3.784(incomeMore than 200,000, ref less than 15,000) + 0.080(educationHigh school diploma or GED, ref less than high school) + 1.077(educationSome college or technical school, ref less than high school) + 1.791(educationCollege graduate, ref less than high school) + 1.738(educationGraduate or professional degree, ref less than high school) + - 1.587(smokingCurrent some-day smoker, ref = current everyday smoker) + - 1.990(smokingFormer smoker, ref = current everyday smoker) + - 2.937(smokingNever smoker, ref = current everyday smoker)

Step 3: Interpret the following coefficients in plain language. For each, state the direction, magnitude, and meaning in terms of mentally unhealthy days, holding all other variables constant

physhlth_days ($\hat{\beta}$ = 0.266): Each additional day of poor physical health is associated with an estimated 0.266 additional mentally unhealthy days on average, holding bmi, sex, exercise, age, income,education, and smoking constant.

bmi($\hat{\beta}$ = 0.033): Each one unit increase in bmi, is associated with an estimated 0.033 additional mentally unhealthy days on average, adjusting for all other covariates.

sex: Female vs. Male(reference) ($\hat{\beta}$ = 1.39): Compared to males (the reference group), females report an estiamted 1.39 more mentally unhealthy days on average, holding all other varaibles constant.

exercise: Yes vs. No(reference) ($\hat{\beta}$ = -0.651): People who engaged in any physical activity in the past 30 days report an estimated 0.651 fewer mentally unhealthy days compared to those who did not exercise, adjusting for all other covariates.

Age_group 80+ ($\hat{\beta}$ = -6.820): Compared to 18-24 years olds (reference group), individuals aged 80 and older report 6.820 fewer mentally unhealthy days on average, holding all other variables constant.

Income 15,000 - < 25,000 (ref = less than 15,000) ($\hat{\beta}$ = -1.637): Compared to the less than 15,000 (reference group), individuals who make 15,000 to 25,000 report 1.637 fewer mentally unhealthy days on average, holding all other variables constant.

Income More than 200,000 (ref = less than 15,000) ($\hat{\beta}$ = -3.784): Compared to the less than 15,000 (reference group), individuals who make more than 200,000 report 3.784 fewer mentally unhealthy days on average, holding all other variables constant.

Step 4:Report and interpret each of the following model-level statistics:

• R-squared: proportion of variance in mentally unhealthy days explained by all predictors • Adjusted R-squared: how it differs from R-squared and what it adds • Root MSE (residual standard error): average prediction error of the model • Overall F-test: state H0, F-statistic, degrees of freedom (numerator and denominator), p-value, and conclusion

glance(model_full) |> 
  select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) |> 
  mutate(across(where(is.numeric), ~ round(., 4))) |> 
  pivot_longer(everything(), names_to = "Statistic", values_to = "Value") |> 
  mutate(Statistic = dplyr::recode(Statistic,
    "r.squared"     = "R²",
    "adj.r.squared" = "Adjusted R²",
    "sigma"         = "Residual Std. Error (Root MSE)",
    "statistic"     = "F-statistic",
    "p.value"       = "p-value (overall F-test)",
    "df"            = "Model df (p)",
    "df.residual"   = "Residual df (n − p − 1)",
    "nobs"          = "n (observations)"
  )) |> 
  kable(caption = "Overall Model Summary — Full Model") |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Overall Model Summary — Full Model
Statistic	Value
R²	0.1853
Adjusted R²	0.1824
Residual Std. Error (Root MSE)	7.3516
F-statistic	62.5234
p-value (overall F-test)	0.0000
Model df (p)	29.0000
Residual df (n − p − 1)	7970.0000
n (observations)	8000.0000

R-squared = 0.1853 It appears that the full model with all predictors explains 18.53% of the variance in mentally unhealthy days. R-squared is not the best to use because it always increases when you add a predictor, even if it is random noise and should not be used to compare models with different numbers of predictors.

Adjusted R-squared = 0.1824 It appears that the full model with all predictors explains 18.24% of the variance in mentally unhealthy days when using adjusted r-squared which penalizes for the number of predictors and only increase when a new predictor improves the model by more than chance alone.

Root MSE (residual standard error) = 7.3516 On average, the full models prediction is off by 7.3156 mentally unhealthy days. Root MSE is the estimated standard deviation of errors and also goes by the residual standard error.

Overall F-test state H0, F-statistic, degrees of freedom (numerator and denominator), p-value, and conclusion

\[H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{(no predictor is associated with mentally unhealthy days)}\]

\[H_1: \text{At least one } \beta_j \neq 0\] The f-statistic is 62.5234. The numerator degrees of freedom are 29 and the denominator degrees of freedom are 7970. The p-value is less than .001 for the overall F-test. In this case, we would reject the null hypothesis that no predictor is associated with mentally unhealthy days. We would instead consider the evidence that at least one predictor is significantly associated with mentally unhealthy days (alternative hypothesis). —

Part 2: Tests of Hypotheses (20 points)

Step 1 — Obtain Type III (partial) sums of squares for all predictors using car::Anova(model, type = “III”). Display the full output. Identify which predictors have statistically significant partial associations at α = 0.05.

# Type III using car::Anova()
anova_type3 <- Anova(model_full, type = "III")

anova_type3 |> 
  as.data.frame() |> 
  rownames_to_column("Source") |> 
  mutate(across(where(is.numeric), ~ round(., 4))) |> 
  kable(
    caption = "Type III (Partial) Sums of Squares — car::Anova()",
    col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
  ) |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Type III (Partial) Sums of Squares — car::Anova()
Source	Sum of Sq	df	F value	p-value
(Intercept)	5529.7722	1	102.3152	0.0000
physhlth_days	37622.7952	1	696.1198	0.0000
bmi	345.2408	1	6.3879	0.0115
sex	3723.8662	1	68.9012	0.0000
exercise	497.0434	1	9.1966	0.0024
age_group	30092.1774	12	46.3986	0.0000
income	4733.8943	6	14.5982	0.0000
education	1265.1504	4	5.8521	0.0001
smoking	5101.1076	3	31.4613	0.0000
Residuals	430750.0872	7970	NA	NA

All predictors have statistically significant partial associations at alpha = 0.05 this includes physhlth_days, bmi, sex, exercise, age_group, income, education, and smoking.

Step 2 — Chunk test for income: Test whether income collectively improves the model significantly, after accounting for all other predictors:

Section 13: Fit a reduced model that includes all predictors except income.

# Reduced model: Without Income
model_reduced <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + education + smoking, data = brfss_analytic)

# Full model: With Income
model_full <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + education + smoking + income, data = brfss_analytic)

Section 14: Use anova(model_reduced, model_full) to compare the two models.

# Chunk Test Income
anova(model_reduced, model_full) |> 
  as.data.frame() |> 
  rownames_to_column("Model") |> 
  mutate(
    Model = c("Reduced (without income)", "Full (+ income)"),
    across(where(is.numeric), ~ round(., 4))
  ) |> 
  kable(
    caption = "Chunk Test: Does income collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Chunk Test: Does income collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (without income)	7976	435484.0	NA	NA	NA	NA
Full (+ income)	7970	430750.1	6	4733.894	14.5982	0

Section 15: State H0 and HA for this test clearly

\[H_0: \beta_{\text{income}} = 0\text{(income has no effect on the relationship, holding all other variables constant)}\] \[H_1: \beta_{\text{income}} \neq 0\text{(income has a signficant effect on the relationship)}\] Section 16: Report the F-statistic, degrees of freedom, and p-value.

The f-statistic is 14.5982, the numerator degrees of freedom was 6 and the denominator degrees of freedom was 7970. The p-value for this chunk test was less than 0.001.

Section 17: State your conclusion at α = 0.05.

In this case, we would reject the null hypothesis that income does not have a significant effect on the relationship holding all else constant. We would instead consider the evidence that income has a significant effect on the relationship with mentally unhealthy days holding all else constant (alternative hypothesis).

Step 3 — Chunk Test for Education: Repeat the same procedure for education as a group of predictors.

# Reduced model: Without Education
model_reduced <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + income + smoking, data = brfss_analytic)

# Full model: With Education
model_full <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + income + education + smoking, data = brfss_analytic)

# Chunk Test Education
anova(model_reduced, model_full) |> 
  as.data.frame() |> 
  rownames_to_column("Model") |> 
  mutate(
    Model = c("Reduced (without education)", "Full (+ education)"),
    across(where(is.numeric), ~ round(., 4))
  ) |> 
  kable(
    caption = "Chunk Test: Does education collectively add to the model?",
    col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
  ) |> 
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Chunk Test: Does education collectively add to the model?
Model	Res. df	RSS	df	Sum of Sq	F	p-value
Reduced (without education)	7974	432015.2	NA	NA	NA	NA
Full (+ education)	7970	430750.1	4	1265.15	5.8521	1e-04

Chunk Test Education State H0 and HA for this test clearly

\[H_0: \beta_{\text{education}} = 0\text{(education has no effect on the relationship, holding all other variables constant)}\] \[H_1: \beta_{\text{education}} \neq 0\text{(education has a signficant effect on the relationship)}\] Chunk Test Education Report the F-statistic, degrees of freedom, and p-value.

The f-statistic is 5.8521, the numerator degrees of freedom was 4 and the denominator degrees of freedom was 7970. The p-value for this chunk test was 1e-04 which is less than .001.

Chunk Test Education State your conclusion at α = 0.05.

In this case, we would reject the null hypothesis that education does not have a significant effect on the relationship holding all else constant. We would instead consider the evidence that education has a significant effect on the relationship with mentally unhealthy days holding all else constant (alternative hypothesis).

Step 4 Write a 3 to 5 sentence synthesis interpreting the Type III results and your chunk test findings. Which predictors make the strongest independent contributions? What do the chunk tests add beyond the individual t-tests from summary()?

According to the Type III results, all of the predictors have partial assocaitions that are significant at the α = 0.05 level. The predictor that makes the strongest independent contribution is physhlth_days because it produces the larges Type III sum of square (37622.7952) and the largest f-value (696.1198) and sex also has a relatively large f-value of 68.9012. According to the chunk tests performed, income and education do have a significant effect on the relationship with mentally unhealthy days holding all else constant. Chunk tests are great to use because they can assess a group of variables impact together similar to the omnibus chi-squared test earlier in the semester as well as being able to test all levels of a categorical variable. If using individual t-tests, type I error increases while chunk tests can test everyone at one significance level.

Part 3: Interaction Analysis (25 points)

Step 1: Create a binary smoking variable called smoker_current:

• “Current smoker” includes current daily smokers and current some-day smokers. • “Non-smoker” includes former smokers and never smokers. Use “Non-smoker” as the reference category.

brfss_smoker <- brfss_analytic |> 
  mutate(
    smoker_current = factor(case_when(
      smoking %in% c ("Former smoker", "Never smoker") ~ "Non-Smoker",
      smoking %in% c("Current daily smoker", "Current some-day smoker") ~ "Current Smoker"),
    levels = c("Non-Smoker", "Current Smoker")
    )
  )

Step 2: Fit two models:

• Model A (main effects only): regress menthlth_days on physhlth_days, bmi, sex, smoker_current, exercise, age_group, income, and education. • Model B (with interaction): same as Model A, but use sex * smoker_current in the formula to include the interaction term.

# Model with Main Effects Only
model_A <- lm(menthlth_days ~ physhlth_days + bmi + exercise + age_group + income + education + sex + smoker_current, data = brfss_smoker)

# Model with interaction
model_B <- lm(menthlth_days ~ physhlth_days + bmi + exercise + age_group + income + education + sex * smoker_current, data = brfss_smoker)

Step 3: Test whether the interaction is statistically significant:

Section 18: Use anova(model_A, model_B) to compare the two models.

# Test for Interaction
anova(model_A, model_B) |>
  tidy() |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(caption = "Partial F-test: Is the Sex x Smoker Interaction Significant?") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Partial F-test: Is the Sex x Smoker Interaction Significant?
term	df.residual	rss	df	sumsq	statistic	p.value
menthlth_days ~ physhlth_days + bmi + exercise + age_group + income + education + sex + smoker_current	7972	432508.9	NA	NA	NA	NA
menthlth_days ~ physhlth_days + bmi + exercise + age_group + income + education + sex * smoker_current	7971	432174.9	1	333.9749	6.1598	0.0131

Section 19: State H0 and HA for this test.

\[H_0:\beta_3 = 0\text{(slopes are equal, lines are parallel, no interaction between sex and smoker status)}\] \[H_1:\beta_3 \neq 0\text{(slopes differ, interaction is present between sex and smoker status)}\] Section 20: Report the F-statistic, degrees of freedom, and p-value.

The f-statistic is 6.1598, the numerator degrees of freedom are 1, the denominator degrees of freedom ar 7971, and the p-value is 0.0131.

Section 21: State your conclusion at α = 0.05.

Since the p-value is less than the conventional α = 0.05 threshold, we would reject the null hypothesis that the sex slope is equal across all smoker levels and instead cosnider the evidence that an interaction is present between sex and smoker status.

Step 4: Interpret the interaction using the coefficients from Model B:

tidy(model_B, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(
    caption = "Interaction Model: Sex * Smoker_Current → Mentally UnHealthy Days",
    col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Interaction Model: Sex * Smoker_Current → Mentally UnHealthy Days
Term	Estimate	SE	t	p-value	CI Lower	CI Upper
(Intercept)	6.8994	0.9286	7.4301	0.0000	5.0791	8.7196
physhlth_days	0.2686	0.0101	26.6788	0.0000	0.2489	0.2883
bmi	0.0331	0.0132	2.5017	0.0124	0.0072	0.0590
exerciseYes	-0.6728	0.2150	-3.1301	0.0018	-1.0942	-0.2515
age_group25-29	-0.9202	0.5196	-1.7709	0.0766	-1.9388	0.0984
age_group30-34	-0.8924	0.5059	-1.7640	0.0778	-1.8841	0.0993
age_group35-39	-1.5929	0.4875	-3.2673	0.0011	-2.5485	-0.6372
age_group40-44	-2.6286	0.4856	-5.4127	0.0000	-3.5806	-1.6766
age_group45-49	-2.8425	0.4969	-5.7209	0.0000	-3.8165	-1.8686
age_group50-54	-3.2778	0.4820	-6.8012	0.0000	-4.2226	-2.3331
age_group55-59	-4.2499	0.4740	-8.9652	0.0000	-5.1791	-3.3206
age_group60-64	-4.2640	0.4567	-9.3364	0.0000	-5.1593	-3.3688
age_group65-69	-5.2506	0.4466	-11.7563	0.0000	-6.1261	-4.3751
age_group70-74	-5.7111	0.4544	-12.5686	0.0000	-6.6018	-4.8203
age_group75-79	-5.9076	0.4702	-12.5646	0.0000	-6.8292	-4.9859
age_group80+	-6.4995	0.4736	-13.7239	0.0000	-7.4278	-5.5711
income$15,000 to < $25,000	-1.6357	0.4700	-3.4804	0.0005	-2.5570	-0.7144
income$25,000 to < $35,000	-2.0746	0.4486	-4.6243	0.0000	-2.9540	-1.1952
income$35,000 to < $50,000	-2.5455	0.4392	-5.7964	0.0000	-3.4064	-1.6847
income$50,000 to < $100,000	-3.0430	0.4116	-7.3935	0.0000	-3.8498	-2.2362
income$100,000 to < $200,000	-3.5097	0.4300	-8.1623	0.0000	-4.3526	-2.6668
incomeMore than $200,000	-3.8405	0.5010	-7.6651	0.0000	-4.8226	-2.8583
educationHigh school diploma or GED	0.1256	0.8124	0.1546	0.8772	-1.4669	1.7180
educationSome college or technical school	1.1179	0.6912	1.6172	0.1059	-0.2371	2.4729
educationCollege graduate	1.8179	0.6928	2.6239	0.0087	0.4598	3.1760
educationGraduate or professional degree	1.6691	0.6943	2.4040	0.0162	0.3081	3.0300
sexFemale	1.1855	0.1775	6.6784	0.0000	0.8376	1.5335
smoker_currentCurrent Smoker	1.5208	0.3654	4.1621	0.0000	0.8045	2.2371
sexFemale:smoker_currentCurrent Smoker	1.2833	0.5171	2.4819	0.0131	0.2697	2.2968

Section 22: Report the estimated effect of current smoking on mentally unhealthy days among men (the coefficient for smoker_current). Interpret in plain language.

Being a current smoker is associated with 1.5208 more mentally unhealthy days among males holding all else constant.

Section 23: Compute and report the estimated effect among women (smoker_current coefficient plus the interaction coefficient). Interpret in plain language.

Effect for women = smoker_current + interaction term (sexFemale:smoker_current) = 1.5208 + 1.2833 = 2.8041

Being a current smoker is associated with 2.8041 more mentally unhealthy days among women holding all else constant.

Section 24: In 2 to 3 sentences, explain what these estimates mean together: is smoking more strongly associated with poor mental health in one sex than the other, and by how much?

Smoking is more strongly associated with poor mental health days in women, who demonstrate roughly 1.28 more mentally unhealthy days over the past thirty days holding all else constant. Although both sexes appear to have worsened mental health if they are current smokers, the effect is greater in women.

Step 5: Visualize the interaction using ggeffects::ggpredict(model_B, terms = c(“smoker_current”, “sex”)). The plot should show predicted mentally unhealthy days for each combination of smoking status and sex, with labeled axes, a descriptive title, and a legend identifying men and women.

int_B <- ggpredict(model_B, terms = c("smoker_current", "sex"))

ggplot(int_B, aes (x = x, y = predicted, color = group, fill = group, group = group)) +
  geom_line(linewidth = 1.2) +
  geom_point(size = 2) +
  geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.12, color = NA) +
  labs(
    title = "Predicted Mentally Unhealthy Days by Smoking Status and Sex",
    subtitle = "From the interaction model: sex * smoker_curernt",
    x = "Current Soking Status",
    y = "Predicted Mentally Unhealthy Days (past 30 days)",
    color = "Sex",
    fill = "Sex"
  ) +
  theme_minimal(base_size = 13) +
  scale_color_brewer(palette = "Set2")

Step 6: Write 3 to 5 sentences interpreting the interaction for a general public health audience. Do not include coefficient values or p-values. Focus on the substantive finding and its implications.

It appears that current smokers, whether male or female have increased levels of mentally unhealthy days when compared to nonsmokers of the same gender. It is important to note that there is an interaction because females who are current smokers seem to be affected more by smoking status than males are. This is apparent because female mentally health days increases more when going from non-smoker to current smoker than when compared to males going from non-smoker to current smoker. This is a substantive finding because it suggests who public health campaigns to limit smoking should be directed to and focused on (females). In addition, mental health screening for smoking women could be beneficial to understand what societal, environmental, or physical factors may be influencing the difference between men and women. —

Part 4: Reflection (15 points)

Write 250–400 words in continuous prose (not bullet points) addressing all three topics below.

A. Findings and Limitations (6 points)

[What do the results suggest about the determinants of mental health among U.S. adults? Which predictors were most strongly associated? Which were not statistically significant? What are the key limitations of using cross-sectional survey data? Name at least two specific confounders that could bias the associations you estimated.]

The results suggest that mental health among U.S adults depends on the intersection of the determinants of health, and that one predictor alone can not explain nearly as much as when they are combined. Even then physically unhealthy days, BMI, sex, exercise, age, income, education, and smoking can only explain 18.53% of the variance in mentally unhealthy days, meaning there is still room for explaining other variables that were not currently included in the model or random noise. The most strongly associated predictors with partial assocaitions were physhlth_days with a f-value of 696.1198, sex with a f-value of 68.9012, and age_group with a f-value of 46.3986. All of the predictors, including physhlth_days, bmi, sex, exercise, age_group, income, education, and smoking, were significant at the α = 0.05 level. Cross-sectional data are imperative in epidemiological studies; however, in data such as the BRFSS, the information is sometimes collected in familial clusters, leading to possible caveats regarding independence as well as inability to determine causality. Two confounders that could lead to bias in the associations that I discovered could be income and health insurance status which are not presently included in this model and would be worth look into.

B. From Simple to Multiple Regression (4 points)

[What does Adjusted R-squared tell you that regular R-squared does not, and why is this distinction especially important when adding multiple categorical predictors? Why is it useful to test groups of predictors (income, education) collectively with chunk tests, rather than relying only on the individual t-test for each level?]

Regular R-squared can tell the proportion of total variance in Y that is explained by the model; however, this always increases or stays the same when we add predictors to the model, and should not be used to compare models with different numbers of predictors. Adjusted R-squared instead will penalize the number of predictors and only increase when a new predictor improves the model by more than chance alone. This is especially important when adding multiple categorical predictors because the dummy variables can impact the regular R-squared, whereas the adjusted R-squared will deduct for predictors that are not beneficial. It is useful to test groups of predictors with chunk tests because they can test all levels of a categorical variable collectively, instead of conducting individual t-tests, which can increase type I errors.

C. AI Transparency (5 points)

[Describe which parts of the assignment (if any) you sought AI assistance for, what specific prompts you used, and how you verified the AI-assisted output was correct. If you did not use AI, state that explicitly]

Most of my code was from Dr.Masum’s Lecture notes or from my own code from the lab. I did employ AI for two parts of this assignment that I could consider debugging. My first issue surrounded making table one (descriptive statistics), my code would not generate both levels of exercise (yes and no). To get help, I used copilot and used the prompt “I am currently working on an assignment for graduate level statistics two and have already created a descriptive table; however, I would like exercise to be separated into yes and no and the variable is already coded into a factor variable” and I added the current code I had. Copilot suggested to add type = list(exercise ~ “categorical”) to my code that I already had. To verify this was correct I googled what this line of code would add and it explicitly states that this will change the way the variable is received from continuous to factor allowing both levels to be included. This was listed on numerous Rblogs as a way to fit proper table output. The second part I employed copilot for was visualizing the interaction. Originally using code from the lecture, kept supplying me with an error that each variable only had one observation. I entered this prompt I am currently working on an assignment for graduate level statistics two and have already created a plot; however, I would like there to be lines drawn for each level of sex (male and female) depending on smoking status. I inserted the codeI had already written and Copilot suggested adding this line of code (group = group). I again googled and went to rblogs and they suggested that using group-group will allow the points to form a line even though the x was not a continuous variable. —

End of Homework 3