Submission: Knit this file to HTML, publish to RPubs with the title
epi553_hw03_lastname_firstname, and paste the RPubs URL in the Brightspace submission comments. Also upload this.Rmdfile to Brightspace.AI Policy: AI tools are permitted on this assignment for debugging not coding. See the assignment description for full details.
Section 6: Load required packages: tidyverse, haven, broom, kableExtra, car, summary, ggeffects
# Load required packages
library(kableExtra)
library(broom)
library(haven)
library(janitor)
library(plotly)
library(broom)
library(ggeffects)
library(ggstats)
library(gtsummary)
library(GGally)
library(car)
library(lmtest)
library(corrplot)
library(tidyverse)Section 7: Import using read_xpt. Report the total number of rows and columns in the raw dataset
# Import the dataset — update the path if needed
brfss_raw <- read_xpt("LLCP2023.XPT") |> clean_names()
# Quick check
glimpse(brfss_raw)## Rows: 433,323
## Columns: 350
## $ state <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ fmonth <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ idate <chr> "03012023", "01062023", "03082023", "03062023", "01062023", "…
## $ imonth <chr> "03", "01", "03", "03", "01", "01", "03", "01", "03", "01", "…
## $ iday <chr> "01", "06", "08", "06", "06", "09", "21", "06", "15", "08", "…
## $ iyear <chr> "2023", "2023", "2023", "2023", "2023", "2023", "2023", "2023…
## $ dispcode <dbl> 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1100, 1…
## $ seqno <chr> "2023000001", "2023000002", "2023000003", "2023000004", "2023…
## $ psu <dbl> 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.023e+09, 2.023e…
## $ ctelenm1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ pvtresd1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ colghous <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ statere1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ celphon1 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ ladult1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ numadult <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1, 1…
## $ respslc1 <dbl> 1, NA, NA, 1, NA, 1, NA, NA, NA, 2, 1, 1, 1, 1, 1, NA, NA, 1,…
## $ landsex2 <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2…
## $ lndsxbrt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ safetime <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ ctelnum1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cellfon5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cadult1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cellsex2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ celsxbrt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pvtresd3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cclghous <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cstate1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ landline <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hhadult <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sexvar <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2…
## $ genhlth <dbl> 2, 2, 4, 2, 4, 3, 4, 4, 3, 3, 3, 2, 2, 3, 3, 4, 5, 3, 3, 3, 3…
## $ physhlth <dbl> 88, 88, 6, 2, 88, 2, 8, 1, 5, 88, 88, 2, 88, 88, 88, 4, 30, 1…
## $ menthlth <dbl> 88, 88, 2, 88, 88, 3, 77, 88, 88, 88, 88, 88, 88, 88, 88, 88,…
## $ poorhlth <dbl> NA, NA, 1, 88, NA, 88, 88, 1, 88, NA, NA, 88, NA, NA, NA, 1, …
## $ primins1 <dbl> 3, 3, 3, 3, 3, 7, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2…
## $ persdoc3 <dbl> 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2…
## $ medcost1 <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ checkup1 <dbl> 2, 2, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1…
## $ exerany2 <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 7, 2, 2, 1, 1…
## $ exract12 <dbl> NA, 1, 1, 1, 1, 1, NA, NA, NA, 1, 1, NA, 1, NA, NA, 1, NA, NA…
## $ exeroft1 <dbl> NA, 106, 205, 103, 102, 212, NA, NA, NA, 230, 105, NA, 105, N…
## $ exerhmm1 <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA, 1…
## $ exract22 <dbl> NA, 88, 88, 88, 8, 88, NA, NA, NA, 88, 77, NA, 3, NA, NA, 88,…
## $ exeroft2 <dbl> NA, NA, NA, NA, 107, NA, NA, NA, NA, NA, NA, NA, 105, NA, NA,…
## $ exerhmm2 <dbl> NA, NA, NA, NA, 100, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA, …
## $ strength <dbl> 888, 888, 205, 888, 888, 203, 777, 888, 888, 888, 105, 888, 8…
## $ bphigh6 <dbl> 1, 1, 1, 3, 1, 1, 3, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 1, 1, 1, 1…
## $ bpmeds1 <dbl> 1, 2, 1, NA, 1, 1, NA, 1, 1, 1, 1, 1, NA, NA, 1, 1, 1, 1, 1, …
## $ cholchk3 <dbl> 3, 2, 2, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 2…
## $ toldhi3 <dbl> 2, 1, 1, 2, 2, 1, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2…
## $ cholmed3 <dbl> 2, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 1, 2, 1, 2…
## $ cvdinfr4 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ cvdcrhd4 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ cvdstrk3 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2…
## $ asthma3 <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2…
## $ asthnow <dbl> NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ chcscnc1 <dbl> 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ chcocnc1 <dbl> 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 1…
## $ chccopd3 <dbl> 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2…
## $ addepev3 <dbl> 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ chckdny2 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 7, 1, 2, 2, 2…
## $ havarth4 <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1…
## $ diabete4 <dbl> 1, 3, 3, 3, 1, 3, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1…
## $ diabage4 <dbl> 57, NA, NA, NA, 68, NA, NA, 98, NA, 60, NA, NA, NA, NA, NA, N…
## $ marital <dbl> 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 3, 3, 1, 3, 3, 2…
## $ educa <dbl> 5, 5, 4, 5, 5, 5, 4, 5, 5, 4, 5, 4, 6, 5, 5, 4, 6, 6, 5, 4, 4…
## $ renthom1 <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ numhhol4 <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2…
## $ numphon4 <dbl> NA, NA, NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ cpdemo1c <dbl> 1, 1, 8, 1, 3, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ veteran3 <dbl> 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 2, 2, 2…
## $ employ1 <dbl> 7, 7, 7, 7, 8, 7, 8, 7, 7, 7, 4, 1, 7, 7, 7, 5, 7, 7, 7, 7, 7…
## $ children <dbl> 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 88, 8…
## $ income3 <dbl> 99, 99, 2, 99, 7, 7, 6, 99, 6, 7, 7, 99, 9, 9, 9, 3, 77, 99, …
## $ pregnant <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ weight2 <dbl> 172, 132, 130, 170, 170, 165, 180, 135, 195, 160, 300, 145, 1…
## $ height3 <dbl> 503, 409, 504, 506, 508, 502, 600, 411, 504, 510, 511, 505, 5…
## $ deaf <dbl> 2, 1, 7, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2…
## $ blind <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ decide <dbl> 2, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 7, 2…
## $ diffwalk <dbl> 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 1, 1, 2, 1, 1…
## $ diffdres <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2, 2…
## $ diffalon <dbl> 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 1…
## $ fall12mn <dbl> 88, 88, 88, 88, 3, 88, 8, 88, 88, 88, 88, 88, 88, 1, 88, 1, 3…
## $ fallinj5 <dbl> NA, NA, NA, NA, 88, NA, 88, NA, NA, NA, NA, NA, NA, 1, NA, 88…
## $ smoke100 <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1…
## $ smokday2 <dbl> NA, NA, 3, NA, NA, NA, 3, NA, NA, 3, NA, NA, 3, 3, 3, NA, NA,…
## $ usenow3 <dbl> 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ ecignow2 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 4, 1, 1, 1, 1, 4, 1, 4…
## $ alcday4 <dbl> 888, 888, 888, 888, 202, 205, 888, 888, 888, 888, 888, 888, 1…
## $ avedrnk3 <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 2, NA, NA, N…
## $ drnk3ge5 <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA, N…
## $ maxdrnks <dbl> NA, NA, NA, NA, 1, 2, NA, NA, NA, NA, NA, NA, 1, 3, NA, NA, N…
## $ flushot7 <dbl> 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1…
## $ flshtmy3 <dbl> NA, 92023, 112022, 102022, NA, NA, 32023, 102023, 777777, 112…
## $ pneuvac4 <dbl> 2, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 1, 1, 2, 1, 1, 1…
## $ shingle2 <dbl> 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2, 1…
## $ hivtst7 <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 7, 2, 2, 2, 1…
## $ hivtstd3 <dbl> NA, NA, NA, 777777, NA, 777777, 92022, NA, NA, NA, NA, NA, NA…
## $ seatbelt <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ drnkdri2 <dbl> NA, NA, NA, NA, 88, 88, NA, NA, NA, NA, NA, NA, 88, 88, NA, N…
## $ covidpo1 <dbl> 2, 2, 2, 2, 2, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 2, 2, 2, 1, 2…
## $ covidsm1 <dbl> NA, NA, NA, NA, NA, 2, NA, 2, 2, NA, 1, 2, NA, 2, 2, NA, NA, …
## $ covidact <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 3, NA, NA, NA, NA, NA…
## $ pdiabts1 <dbl> NA, 2, 1, 1, NA, 1, 1, NA, 1, NA, 1, 1, 1, 1, 1, 1, 7, 1, 8, …
## $ prediab2 <dbl> NA, 3, 3, 3, NA, 3, 3, NA, 1, NA, 3, 3, 1, 3, 3, 3, 3, 3, 3, …
## $ diabtype <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ insulin1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ chkhemo3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ eyeexam1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ diabeye1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ diabedu1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ feetsore <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ arthexer <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ arthedu <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lmtjoin3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ arthdis2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ joinpai2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsfirst <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcslast <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsnumcg <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsctsc1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsscncr <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lcsctwhn <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hadmam <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ howlong <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cervscrn <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crvclcnc <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crvclpap <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crvclhpv <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hadhyst2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ psatest1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ psatime1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pcpsars2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ psasugs1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ pcstalk2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hadsigm4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ colnsigm <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ colntes1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sigmtes1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lastsig4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ colncncr <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ vircolo1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ vclntes2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ smalstol <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ stoltest <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ stooldn2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ bldstfit <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sdnates1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cncrdiff <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cncrage <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cncrtyp2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvtrt3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvdoc1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvsum <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvrtrn <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvinst <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvinsr <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvdein <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvclin <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvpain <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ csrvctl2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ indortan <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ numburn3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ sunprtct <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wkdayout <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ wkendout <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cimemlo1 <dbl> 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, NA, 2, 2, 2, 1, 2, 2, …
## $ cdworry <dbl> NA, NA, 2, NA, 1, 1, NA, NA, NA, NA, NA, NA, 7, NA, NA, NA, N…
## $ cddiscu1 <dbl> NA, NA, 1, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, N…
## $ cdhous1 <dbl> NA, NA, 2, NA, 2, 1, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, N…
## $ cdsocia1 <dbl> NA, NA, 2, NA, 2, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, N…
## $ caregiv1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvrel4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvlng1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvhrs1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvprb3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvalzd <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvper1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvhou1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ crgvexpt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lastsmk2 <dbl> NA, NA, 7, NA, NA, NA, 7, NA, NA, 7, NA, NA, 7, NA, 7, NA, NA…
## $ stopsmk2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ mentcigs <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ mentecig <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ heattbco <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ firearm5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ gunload <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ loadulk2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hasymp6 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp5 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ strsymp6 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ firstaid <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aspirin <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ birthsex <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ somale <dbl> NA, NA, NA, NA, NA, NA, 2, NA, NA, 2, 2, NA, NA, 2, NA, NA, N…
## $ sofemale <dbl> 2, 2, 2, 2, 2, 2, NA, 2, 2, NA, NA, 2, 2, NA, 2, 2, 2, NA, 9,…
## $ trnsgndr <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4…
## $ marijan1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjsmok <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjeat <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjvape <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjdab <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ marjothr <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ usemrjn4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acedeprs <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acedrink <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acedrugs <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aceprisn <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acedivrc <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acepunch <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acehurt1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aceswear <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acetouch <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acetthem <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ acehvsex <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aceadsaf <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ aceadned <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ imfvpla4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hpvadvc4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hpvadsht <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ tetanus1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ covidva1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ covacge1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ covidnu2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ lsatisfy <dbl> 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 2, 1, 1, NA, 1, 2, 3, 1, 4, 2, …
## $ emtsuprt <dbl> 1, 2, 4, 1, 2, 2, 1, 1, 1, 3, 1, 1, 2, NA, 1, 1, 2, 1, 4, 2, …
## $ sdlonely <dbl> 5, 3, 3, 3, 2, 3, 4, 5, 3, 3, 4, 5, 4, NA, 4, 3, 4, 5, 1, 2, …
## $ sdhemply <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ foodstmp <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ sdhfood1 <dbl> 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 4, 5, 5, NA, 5, 5, 5, 5, 5, 5, …
## $ sdhbills <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ sdhutils <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ sdhtrnsp <dbl> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, NA, 2, 2, 2, 2, 2, 2, …
## $ sdhstre1 <dbl> 5, 5, 3, 5, 2, 3, 5, 5, 4, 4, 4, 4, 4, NA, 4, 5, 4, 5, 4, 4, …
## $ rrclass3 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rrcognt2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rrtreat <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rratwrk2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rrhcare4 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rrphysm2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rcsgend1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rcsxbrth <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ rcsrltn2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ casthdx2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ casthno2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ qstver <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 1…
## $ qstlang <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ metstat <dbl> 1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 1…
## $ urbstat <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ mscode <dbl> 1, 1, 3, 2, 5, 1, 2, 5, 3, 1, 1, 1, 1, 5, 1, 2, 5, 5, 5, 1, 1…
## $ ststr <dbl> 11011, 11012, 11011, 11011, 11011, 11011, 11011, 11012, 11011…
## $ strwt <dbl> 42.79158, 170.42939, 42.79158, 42.79158, 42.79158, 42.79158, …
## $ rawrake <dbl> 2, 1, 1, 2, 1, 2, 1, 1, 1, 2, 2, 4, 2, 2, 2, 1, 1, 2, 1, 1, 1…
## $ wt2rake <dbl> 85.58316, 170.42939, 42.79158, 85.58316, 42.79158, 85.58316, …
## $ imprace <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ chispnc <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9…
## $ crace1 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cageg <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cllcpwt <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ dualuse <dbl> 1, 1, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ dualcor <dbl> 0.5026388, 0.5026388, NA, 0.5026388, 0.5026388, 0.5026388, 0.…
## $ llcpwt2 <dbl> 941.1640, 1874.2239, 1151.6032, 941.1640, 470.5820, 941.1640,…
## $ llcpwt <dbl> 605.4279, 1121.9927, 600.9633, 605.4279, 281.7110, 1060.3352,…
## $ rfhlth <dbl> 1, 1, 2, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1…
## $ phys14d <dbl> 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 3, 3, 1, 1, 2…
## $ ment14d <dbl> 1, 1, 2, 1, 1, 2, 9, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1…
## $ hlthpl1 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ hcvu653 <dbl> 9, 9, 9, 9, 9, 1, 9, 9, 9, 9, 1, 9, 9, 9, 9, 9, 1, 9, 9, 9, 9…
## $ totinda <dbl> 2, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1, 1…
## $ metvl12 <dbl> NA, 35, 35, 35, 35, 35, NA, NA, NA, 35, 35, NA, 35, NA, NA, 3…
## $ metvl22 <dbl> NA, NA, NA, NA, 33, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA, N…
## $ maxvo21 <dbl> 1840, 1803, 1322, 1914, 1988, 2506, 1545, 1914, 1655, 1875, 2…
## $ fc601 <dbl> 315, 309, 227, 328, 341, 430, 265, 328, 284, 321, 472, 328, 3…
## $ actin13 <dbl> NA, 2, 2, 2, 2, 1, NA, NA, NA, 2, 1, NA, 1, NA, NA, 2, NA, NA…
## $ actin23 <dbl> NA, NA, NA, NA, 1, NA, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA,…
## $ padur1 <dbl> NA, 30, 15, 30, 45, 45, NA, NA, NA, 30, 30, NA, 45, NA, NA, 6…
## $ padur2 <dbl> NA, NA, NA, NA, 60, NA, NA, NA, NA, NA, NA, NA, 45, NA, NA, N…
## $ pafreq1 <dbl> NA, 6000, 1167, 3000, 2000, 2800, NA, NA, NA, 7000, 5000, NA,…
## $ pafreq2 <dbl> NA, NA, NA, NA, 7000, NA, NA, NA, NA, NA, NA, NA, 5000, NA, N…
## $ minac12 <dbl> NA, 180, 18, 90, 90, 126, NA, NA, NA, 210, 150, NA, 225, NA, …
## $ minac22 <dbl> NA, 0, 0, 0, 420, 0, NA, NA, NA, 0, NA, NA, 225, NA, NA, 0, N…
## $ strfreq <dbl> 0, 0, 1167, 0, 0, 700, NA, 0, 0, 0, 5000, 0, 0, 2000, 2000, 0…
## $ pamiss3 <dbl> 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 9, 0, 0, 1, 1…
## $ pamin13 <dbl> NA, 360, 36, 180, 180, 126, NA, NA, NA, 420, 150, NA, 225, NA…
## $ pamin23 <dbl> NA, NA, NA, NA, 420, NA, NA, NA, NA, NA, NA, NA, 450, NA, NA,…
## $ pa3min <dbl> NA, 360, 36, 180, 600, 126, NA, NA, NA, 420, 150, NA, 675, NA…
## $ pavig13 <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 0, NA, NA, 12…
## $ pavig23 <dbl> NA, NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA, 225, NA, NA, N…
## $ pa3vigm <dbl> NA, 180, 18, 90, 90, 0, NA, NA, NA, 210, 0, NA, 225, NA, NA, …
## $ pacat3 <dbl> 4, 1, 9, 9, 1, 9, 4, 4, 4, 1, 9, 4, 1, 4, 4, 9, 9, 4, 4, 9, 9…
## $ paindx3 <dbl> 2, 1, 9, 1, 1, 9, 2, 2, 2, 1, 1, 2, 1, 2, 2, 1, 9, 2, 2, 1, 1…
## $ pa150r4 <dbl> 3, 1, 9, 1, 1, 9, 3, 3, 3, 1, 1, 3, 1, 3, 3, 1, 9, 3, 3, 1, 1…
## $ pa300r4 <dbl> 3, 1, 9, 9, 1, 9, 3, 3, 3, 1, 9, 3, 1, 3, 3, 9, 9, 3, 3, 9, 9…
## $ pa30023 <dbl> 2, 1, 9, 9, 1, 9, 2, 2, 2, 1, 9, 2, 1, 2, 2, 9, 9, 2, 2, 9, 9…
## $ pastrng <dbl> 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1…
## $ parec3 <dbl> 4, 2, 9, 2, 2, 9, 9, 4, 4, 2, 1, 4, 2, 3, 3, 2, 9, 4, 4, 2, 1…
## $ pastae3 <dbl> 2, 2, 9, 2, 2, 9, 9, 2, 2, 2, 1, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1…
## $ rfhype6 <dbl> 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2…
## $ cholch3 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ rfchol3 <dbl> 1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1…
## $ michd <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ ltasth1 <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1…
## $ casthm1 <dbl> 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 9, 1, 1, 1, 1…
## $ asthms1 <dbl> 3, 3, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 9, 3, 3, 3, 3…
## $ drdxar2 <dbl> 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1, 2, 1, 1…
## $ mrace1 <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ hispanc <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ race <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ raceg21 <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ racegr3 <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ raceprv <dbl> 1, 1, 2, 1, 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ sex <dbl> 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 1, 2, 1, 2…
## $ ageg5yr <dbl> 13, 13, 13, 12, 12, 9, 13, 12, 13, 12, 8, 12, 10, 12, 12, 13,…
## $ age65yr <dbl> 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2…
## $ age80 <dbl> 80, 80, 80, 78, 76, 62, 80, 78, 80, 75, 59, 78, 68, 79, 79, 8…
## $ age_g <dbl> 6, 6, 6, 6, 6, 5, 6, 6, 6, 6, 5, 6, 6, 6, 6, 6, 5, 6, 6, 6, 6…
## $ htin4 <dbl> 63, 57, 64, 66, 68, 62, 72, 59, 64, 70, 71, 65, 68, 73, 62, 6…
## $ htm4 <dbl> 160, 145, 163, 168, 173, 157, 183, 150, 163, 178, 180, 165, 1…
## $ wtkg3 <dbl> 7802, 5987, 5897, 7711, 7711, 7484, 8165, 6123, 8845, 7257, 1…
## $ bmi5 <dbl> 3047, 2856, 2231, 2744, 2585, 3018, 2441, 2727, 3347, 2296, 4…
## $ bmi5cat <dbl> 4, 3, 2, 3, 3, 4, 2, 3, 4, 2, 4, 2, 3, 2, 3, 3, 4, 3, 3, 3, 4…
## $ rfbmi5 <dbl> 2, 2, 1, 2, 2, 2, 1, 2, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2, 2, 2…
## $ chldcnt <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ educag <dbl> 3, 3, 2, 3, 3, 3, 2, 3, 3, 2, 3, 2, 4, 3, 3, 2, 4, 4, 3, 2, 2…
## $ incomg1 <dbl> 9, 9, 1, 9, 5, 5, 4, 9, 4, 5, 5, 9, 6, 6, 6, 2, 9, 9, 3, 4, 3…
## $ smoker3 <dbl> 4, 4, 3, 4, 4, 4, 3, 4, 4, 3, 4, 4, 3, 3, 3, 4, 4, 4, 1, 3, 3…
## $ rfsmok3 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1…
## $ cureci2 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ drnkany6 <dbl> 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 2, 1, 1, 2, 2, 2, 2, 1, 1, 2…
## $ drocdy4 <dbl> 0, 0, 0, 0, 7, 17, 0, 0, 0, 0, 0, 0, 29, 100, 0, 0, 0, 0, 33,…
## $ rfbing6 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ drnkwk2 <dbl> 0, 0, 0, 0, 47, 233, 0, 0, 0, 0, 0, 0, 200, 1400, 0, 0, 0, 0,…
## $ rfdrhv8 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ flshot7 <dbl> 2, 1, 1, 1, 2, NA, 1, 1, 1, 1, NA, 1, 1, 1, 1, 2, NA, 1, 2, 1…
## $ pneumo3 <dbl> 2, 1, 1, 1, 1, NA, 1, 1, 1, 1, NA, 2, 1, 1, 2, 1, NA, 2, 1, 1…
## $ aidtst4 <dbl> 2, 2, 2, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 9, 2, 2, 2, 1…
## $ rfseat2 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ rfseat3 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2…
## $ drnkdrv <dbl> 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 9, 9, 2, 2, 9, 9, 9, 9, 2, 2, 9…
tibble(Metric = c("Observations", "Variables"),
Value = c(nrow(brfss_raw), ncol(brfss_raw))) |>
kable(caption = "Full Dataset Dimensions") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Metric | Value |
|---|---|
| Observations | 433323 |
| Variables | 350 |
Section 8: Select the nine variables listed in the table
# Select only the variables we want to work with
brfss_select <- brfss_raw |>
select(menthlth, physhlth, bmi5, sexvar, exerany2, ageg5yr, incomg1, educa, smoker3)
tibble(Metric = c("Observations", "Variables"),
Value = c(nrow(brfss_select), ncol(brfss_select))) |>
kable(caption = "Only Necessary Dataset Dimensions") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Metric | Value |
|---|---|
| Observations | 433323 |
| Variables | 9 |
# Save necessary data and variables as an RDS file
saveRDS(brfss_select,
"/Users/zoya_hayes/Desktop/EPI553/EPI553_Coding/Assignments/HW03/brfss_2023_select.rds")Section 9: Recode all variables as described. Store all categorical variables as labeled factors
# Recode all the variables selected
brfss_recode <- brfss_select |>
mutate(
# Outcome: Mentally unheathy days in past 30
menthlth_days = case_when(
menthlth == 88 ~ 0,
menthlth >= 1 & menthlth <= 30 ~ as.numeric(menthlth),
menthlth %in% c(77, 99) ~ NA_real_,
TRUE ~ NA_real_
),
# Physical health days
physhlth_days = case_when(
physhlth == 88 ~ 0,
physhlth >= 1 & physhlth <= 30 ~ as.numeric(physhlth),
physhlth %in% c(77, 99) ~ NA_real_,
TRUE ~ NA_real_
),
# BMI
bmi = case_when(
bmi5 == 9999 ~ NA_real_,
TRUE ~ bmi5 / 100
),
# Sex
sex = factor(sexvar, levels = c(1, 2), labels = c("Male", "Female")),
# Exercise in past 30 days
exercise = factor(case_when(
exerany2 == 2 ~ "No",
exerany2 == 1 ~ "Yes",
TRUE ~ NA_character_
), levels = c("No", "Yes")),
# Age
age_group = factor(case_when(
ageg5yr == 1 ~ "18-24",
ageg5yr == 2 ~ "25-29",
ageg5yr == 3 ~ "30-34",
ageg5yr == 4 ~ "35-39",
ageg5yr == 5 ~ "40-44",
ageg5yr == 6 ~ "45-49",
ageg5yr == 7 ~ "50-54",
ageg5yr == 8 ~ "55-59",
ageg5yr == 9 ~ "60-64",
ageg5yr == 10 ~ "65-69",
ageg5yr == 11 ~ "70-74",
ageg5yr == 12 ~ "75-79",
ageg5yr == 13 ~ "80+",
ageg5yr == 14 ~ NA_character_,
TRUE ~ NA_character_
),
levels = c("18-24", "25-29","30-34","35-39","40-44", "45-49","50-54","55-59","60-64", "65-69", "70-74", "75-79", "80+")),
# Income
income = factor(case_when(
incomg1 == 1 ~ "Less than $15,000",
incomg1 == 2 ~ "$15,000 to < $25,000",
incomg1 == 3 ~ "$25,000 to < $35,000",
incomg1 == 4 ~ "$35,000 to < $50,000",
incomg1 == 5 ~ "$50,000 to < $100,000",
incomg1 == 6 ~ "$100,000 to < $200,000",
incomg1 == 7 ~ "More than $200,000",
incomg1 == 9 ~ NA_character_,
TRUE ~ NA_character_
),
levels = c("Less than $15,000", "$15,000 to < $25,000", "$25,000 to < $35,000", "$35,000 to < $50,000", "$50,000 to < $100,000", "$100,000 to < $200,000", "More than $200,000")),
# Education
education = factor(case_when(
educa %in% c(1, 2) ~ "Less than high school",
educa == 3 ~ "High school diploma or GED",
educa == 4 ~ "Some college or technical school",
educa == 5 ~ "College graduate",
educa == 6 ~ "Graduate or professional degree",
educa == 9 ~ NA_character_,
TRUE ~ NA_character_),
levels = c("Less than high school", "High school diploma or GED", "Some college or technical school", "College graduate", "Graduate or professional degree")),
# Smoker Status
smoking = factor(case_when(
smoker3 == 1 ~ "Current daily smoker",
smoker3 == 2 ~ "Current some-day smoker",
smoker3 == 3 ~ "Former smoker",
smoker3 == 4 ~ "Never smoker",
smoker3 == 9 ~ NA_character_,
TRUE ~ NA_character_),
levels = c("Current daily smoker", "Current some-day smoker", "Former smoker", "Never smoker")),
)
brfss_analytic <- brfss_recode |>
select (menthlth_days, physhlth_days, bmi, sex, exercise, age_group, income, education, smoking)Section 10: Report the number and percentage of missing observations for menthlth_days and at least two other key variables, before removing any missing value
# Report missing values for the key variables and print out total sample size
missing_summary <- data.frame(
Variable = names(brfss_analytic),
Missing_Count = colSums(is.na(brfss_analytic)),
Missing_Percent = round(colSums(is.na(brfss_analytic)) / nrow(brfss_analytic) * 100, 2)
)
# Show variables with the most missing data
print(missing_summary[order(-missing_summary$Missing_Count), ][1:9, ]) |>
kable(
caption = "Variables by Missingness"
) |>
kable_styling(
bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "center"
)## Variable Missing_Count Missing_Percent
## income income 86623 19.99
## bmi bmi 40535 9.35
## smoking smoking 23062 5.32
## physhlth_days physhlth_days 10785 2.49
## menthlth_days menthlth_days 8108 1.87
## age_group age_group 7779 1.80
## education education 2325 0.54
## exercise exercise 1251 0.29
## sex sex 0 0.00
| Variable | Missing_Count | Missing_Percent | |
|---|---|---|---|
| income | income | 86623 | 19.99 |
| bmi | bmi | 40535 | 9.35 |
| smoking | smoking | 23062 | 5.32 |
| physhlth_days | physhlth_days | 10785 | 2.49 |
| menthlth_days | menthlth_days | 8108 | 1.87 |
| age_group | age_group | 7779 | 1.80 |
| education | education | 2325 | 0.54 |
| exercise | exercise | 1251 | 0.29 |
| sex | sex | 0 | 0.00 |
The missing count for menthlth_days is 8108 or 1.87%. For BMI, the missing count is 40,535 with 9.35% missing. Income also has a relatively high missing count with 86,623 missing or 19.99%.
Section 11: Create the analytic dataset using drop_na() and slice_sample() with set.seed(1220). Report the final analytic n.
# Create analytic dataset
set.seed(1220)
brfss_analytic <- brfss_analytic |>
drop_na(menthlth_days, physhlth_days, bmi, sex, exercise, age_group, income, education, smoking) |>
slice_sample(n = 8000)
# Display subset dimensions
cat("Working subset dimensions:",
nrow(brfss_analytic), "observations,",
ncol(brfss_analytic), "variables\n")## Working subset dimensions: 8000 observations, 9 variables
# Save for later use
saveRDS(brfss_analytic,
"/Users/zoya_hayes/Desktop/EPI553/EPI553_Coding/Assignments/HW03/brfss_2023_analytic.rds")
# Print out total sample size
cat("Final Analytic N:", nrow(brfss_analytic), "\n")## Final Analytic N: 8000
tibble(Metric = c("Observations", "Variables"),
Value = c(nrow(brfss_analytic), ncol(brfss_analytic))) |>
kable(caption = "Analytic Dataset Dimensions") |>
kable_styling(bootstrap_options = "striped", full_width = FALSE)| Metric | Value |
|---|---|
| Observations | 8000 |
| Variables | 9 |
The final analytic n is 8,000.
Section 12: Produce a descriptive statistics table using gtsummary::tbl_summary(), stratified by sex.
# Create Table 1 Descriptive Statistics
brfss_analytic |>
select(menthlth_days, physhlth_days, bmi, sex, exercise, age_group, income, education, smoking) |>
tbl_summary(
by = sex,
type = list(exercise ~ "categorical"),
label = list(
menthlth_days ~ "Mentally Unhealthy Days (Last 30 Days)",
physhlth_days ~ "Physically Unhealthy Days (Last 30 Days)",
bmi ~ "Body Mass Index (kg/m2)",
exercise ~ "Exercise Status",
age_group ~ "Age Category",
income ~ "Income Category",
education ~ "Education Category",
smoking ~ "Smoking Status"
),
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
digits = all_continuous() ~ 1,
missing = "no"
) |>
add_n() |>
bold_labels() |>
italicize_levels() |>
modify_caption("**Table 1. Descriptive Statistics — BRFSS 2023 Analytic Sample (n = 8,000)**") |>
modify_footnote(
all_stat_cols() ~
"Mean (SD); n(%); Analytic sample includes 8,000 complete-case observations. Missing values were excluded listwise for all variables included in this table."
) |>
as_flex_table()Characteristic | N | Male | Female |
|---|---|---|---|
Mentally Unhealthy Days (Last 30 Days) | 8,000 | 3.5 (7.5) | 5.1 (8.6) |
Physically Unhealthy Days (Last 30 Days) | 8,000 | 4.0 (8.4) | 4.9 (8.9) |
Body Mass Index (kg/m2) | 8,000 | 28.7 (6.0) | 28.7 (7.0) |
Exercise Status | 8,000 | ||
No | 790 (20%) | 970 (24%) | |
Yes | 3,146 (80%) | 3,094 (76%) | |
Age Category | 8,000 | ||
18-24 | 235 (6.0%) | 171 (4.2%) | |
25-29 | 219 (5.6%) | 189 (4.7%) | |
30-34 | 253 (6.4%) | 210 (5.2%) | |
35-39 | 263 (6.7%) | 302 (7.4%) | |
40-44 | 290 (7.4%) | 292 (7.2%) | |
45-49 | 266 (6.8%) | 252 (6.2%) | |
50-54 | 305 (7.7%) | 303 (7.5%) | |
55-59 | 308 (7.8%) | 352 (8.7%) | |
60-64 | 408 (10%) | 379 (9.3%) | |
65-69 | 418 (11%) | 483 (12%) | |
70-74 | 382 (9.7%) | 426 (10%) | |
75-79 | 325 (8.3%) | 338 (8.3%) | |
80+ | 264 (6.7%) | 367 (9.0%) | |
Income Category | 8,000 | ||
Less than $15,000 | 160 (4.1%) | 247 (6.1%) | |
$15,000 to < $25,000 | 271 (6.9%) | 370 (9.1%) | |
$25,000 to < $35,000 | 376 (9.6%) | 495 (12%) | |
$35,000 to < $50,000 | 482 (12%) | 585 (14%) | |
$50,000 to < $100,000 | 1,251 (32%) | 1,260 (31%) | |
$100,000 to < $200,000 | 996 (25%) | 869 (21%) | |
More than $200,000 | 400 (10%) | 238 (5.9%) | |
Education Category | 8,000 | ||
Less than high school | 75 (1.9%) | 49 (1.2%) | |
High school diploma or GED | 130 (3.3%) | 122 (3.0%) | |
Some college or technical school | 950 (24%) | 877 (22%) | |
College graduate | 1,018 (26%) | 1,120 (28%) | |
Graduate or professional degree | 1,763 (45%) | 1,896 (47%) | |
Smoking Status | 8,000 | ||
Current daily smoker | 339 (8.6%) | 319 (7.8%) | |
Current some-day smoker | 151 (3.8%) | 117 (2.9%) | |
Former smoker | 1,207 (31%) | 1,055 (26%) | |
Never smoker | 2,239 (57%) | 2,573 (63%) | |
1Mean (SD); n(%); Analytic sample includes 8,000 complete-case observations. Missing values were excluded listwise for all variables included in this table. | |||
Research Question: What is the independent association of each predictor with the number of mentally unhealthy days in the past 30 days
# Fitting full multiple linear regression
model_full <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + income + education + smoking, data = brfss_analytic)
tidy(model_full, conf.int = TRUE) |>
mutate(
term = dplyr::recode(term,
"(Intercept)" = "Intercept",
"physhlth_days" = "Physical health days",
"bmi" = "Body Mass Index",
"sexFemale" = "Sex: Female (ref = Male)",
"exerciseYes" = "Exercise: Yes (ref = No)",
),
across(where(is.numeric), ~ round(., 3))
) |>
kable(
caption = "Table 3. Multiple Linear Regression: Mentally Unhealthy Days ~ Multiple Predictors (BRFSS 2023, n = 8,000)",
col.names = c("Term", "Estimate(β)", "Std. Error", "t-statistic",
"p-value", "95% CI Lower", "95% CI Upper")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE) | Term | Estimate(β) | Std. Error | t-statistic | p-value | 95% CI Lower | 95% CI Upper |
|---|---|---|---|---|---|---|
| Intercept | 9.651 | 0.954 | 10.115 | 0.000 | 7.780 | 11.521 |
| Physical health days | 0.266 | 0.010 | 26.384 | 0.000 | 0.246 | 0.285 |
| Body Mass Index | 0.033 | 0.013 | 2.527 | 0.012 | 0.007 | 0.059 |
| Sex: Female (ref = Male) | 1.390 | 0.168 | 8.301 | 0.000 | 1.062 | 1.719 |
| Exercise: Yes (ref = No) | -0.651 | 0.215 | -3.033 | 0.002 | -1.072 | -0.230 |
| age_group25-29 | -1.057 | 0.519 | -2.034 | 0.042 | -2.075 | -0.038 |
| age_group30-34 | -1.094 | 0.506 | -2.160 | 0.031 | -2.087 | -0.101 |
| age_group35-39 | -1.811 | 0.489 | -3.707 | 0.000 | -2.769 | -0.853 |
| age_group40-44 | -2.893 | 0.487 | -5.935 | 0.000 | -3.849 | -1.937 |
| age_group45-49 | -3.056 | 0.498 | -6.141 | 0.000 | -4.032 | -2.081 |
| age_group50-54 | -3.517 | 0.483 | -7.277 | 0.000 | -4.464 | -2.569 |
| age_group55-59 | -4.496 | 0.476 | -9.454 | 0.000 | -5.428 | -3.564 |
| age_group60-64 | -4.522 | 0.458 | -9.863 | 0.000 | -5.421 | -3.623 |
| age_group65-69 | -5.578 | 0.450 | -12.390 | 0.000 | -6.460 | -4.695 |
| age_group70-74 | -6.025 | 0.457 | -13.173 | 0.000 | -6.922 | -5.129 |
| age_group75-79 | -6.287 | 0.475 | -13.239 | 0.000 | -7.217 | -5.356 |
| age_group80+ | -6.820 | 0.477 | -14.302 | 0.000 | -7.754 | -5.885 |
| income$15,000 to < $25,000 | -1.637 | 0.469 | -3.491 | 0.000 | -2.556 | -0.718 |
| income$25,000 to < $35,000 | -2.076 | 0.448 | -4.635 | 0.000 | -2.955 | -1.198 |
| income$35,000 to < $50,000 | -2.547 | 0.438 | -5.812 | 0.000 | -3.406 | -1.688 |
| income$50,000 to < $100,000 | -3.050 | 0.411 | -7.428 | 0.000 | -3.856 | -2.245 |
| income$100,000 to < $200,000 | -3.500 | 0.429 | -8.154 | 0.000 | -4.341 | -2.658 |
| incomeMore than $200,000 | -3.784 | 0.500 | -7.563 | 0.000 | -4.765 | -2.803 |
| educationHigh school diploma or GED | 0.080 | 0.811 | 0.099 | 0.921 | -1.509 | 1.669 |
| educationSome college or technical school | 1.077 | 0.690 | 1.561 | 0.119 | -0.275 | 2.429 |
| educationCollege graduate | 1.791 | 0.691 | 2.591 | 0.010 | 0.436 | 3.146 |
| educationGraduate or professional degree | 1.738 | 0.693 | 2.509 | 0.012 | 0.380 | 3.095 |
| smokingCurrent some-day smoker | -1.587 | 0.535 | -2.965 | 0.003 | -2.636 | -0.538 |
| smokingFormer smoker | -1.990 | 0.337 | -5.902 | 0.000 | -2.651 | -1.329 |
| smokingNever smoker | -2.937 | 0.322 | -9.131 | 0.000 | -3.567 | -2.306 |
The model is:
Menthlth_days = 9.651 + 0.266(physhlth_days) + 0.033(bmi) + 1.390(sex: Female, ref = male) + -0.651(exercise: Yes, ref = no), + -1.057(age_group25-29, ref = 18-24) + -1.094(age_group30-34, ref = 18-24) + -1.811(age_group35-39, ref = 18-24) + -2.893(age_group40-44, ref = 18-24) + -3.056(age_group45-49, ref = 18-24) + -3.517(age_group50-54, ref = 18-24) + -4.496(age_group55-59, ref = 18-24) + -4.522(age_group60-64, ref = 18-24) + -5.578(age_group65-69, ref = 18-24) + -6.025(age_group70-74, ref = 18-24) + -6.287(age_group75-79, ref = 18-24) + -6.820(age_group80+, ref = 18-24) + - 1.637(income15,000 to < 25,000, ref less than 15,000) + - 2.076(income25,000 to < 35,000, ref less than 15,000) + - 2.547(income35,000 to < 50,000, ref less than 15,000) + - 3.050(income50,000 to < 100,000, ref less than 15,000) + - 3.500(income100,000 to < 200,000, ref less than 15,000) + - 3.784(incomeMore than 200,000, ref less than 15,000) + 0.080(educationHigh school diploma or GED, ref less than high school) + 1.077(educationSome college or technical school, ref less than high school) + 1.791(educationCollege graduate, ref less than high school) + 1.738(educationGraduate or professional degree, ref less than high school) + - 1.587(smokingCurrent some-day smoker, ref = current everyday smoker) + - 1.990(smokingFormer smoker, ref = current everyday smoker) + - 2.937(smokingNever smoker, ref = current everyday smoker)
physhlth_days (\(\hat{\beta}\) = 0.266): Each additional day of poor physical health is associated with an estimated 0.266 additional mentally unhealthy days on average, holding bmi, sex, exercise, age, income,education, and smoking constant.
bmi(\(\hat{\beta}\) = 0.033): Each one unit increase in bmi, is associated with an estimated 0.033 additional mentally unhealthy days on average, adjusting for all other covariates.
sex: Female vs. Male(reference) (\(\hat{\beta}\) = 1.39): Compared to males (the reference group), females report an estiamted 1.39 more mentally unhealthy days on average, holding all other varaibles constant.
exercise: Yes vs. No(reference) (\(\hat{\beta}\) = -0.651): People who engaged in any physical activity in the past 30 days report an estimated 0.651 fewer mentally unhealthy days compared to those who did not exercise, adjusting for all other covariates.
Age_group 80+ (\(\hat{\beta}\) = -6.820): Compared to 18-24 years olds (reference group), individuals aged 80 and older report 6.820 fewer mentally unhealthy days on average, holding all other variables constant.
Income 15,000 - < 25,000 (ref = less than 15,000) (\(\hat{\beta}\) = -1.637): Compared to the less than 15,000 (reference group), individuals who make 15,000 to 25,000 report 1.637 fewer mentally unhealthy days on average, holding all other variables constant.
Income More than 200,000 (ref = less than 15,000) (\(\hat{\beta}\) = -3.784): Compared to the less than 15,000 (reference group), individuals who make more than 200,000 report 3.784 fewer mentally unhealthy days on average, holding all other variables constant.
• R-squared: proportion of variance in mentally unhealthy days explained by all predictors • Adjusted R-squared: how it differs from R-squared and what it adds • Root MSE (residual standard error): average prediction error of the model • Overall F-test: state H0, F-statistic, degrees of freedom (numerator and denominator), p-value, and conclusion
glance(model_full) |>
select(r.squared, adj.r.squared, sigma, statistic, p.value, df, df.residual, nobs) |>
mutate(across(where(is.numeric), ~ round(., 4))) |>
pivot_longer(everything(), names_to = "Statistic", values_to = "Value") |>
mutate(Statistic = dplyr::recode(Statistic,
"r.squared" = "R²",
"adj.r.squared" = "Adjusted R²",
"sigma" = "Residual Std. Error (Root MSE)",
"statistic" = "F-statistic",
"p.value" = "p-value (overall F-test)",
"df" = "Model df (p)",
"df.residual" = "Residual df (n − p − 1)",
"nobs" = "n (observations)"
)) |>
kable(caption = "Overall Model Summary — Full Model") |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Statistic | Value |
|---|---|
| R² | 0.1853 |
| Adjusted R² | 0.1824 |
| Residual Std. Error (Root MSE) | 7.3516 |
| F-statistic | 62.5234 |
| p-value (overall F-test) | 0.0000 |
| Model df (p) | 29.0000 |
| Residual df (n − p − 1) | 7970.0000 |
| n (observations) | 8000.0000 |
R-squared = 0.1853 It appears that the full model with all predictors explains 18.53% of the variance in mentally unhealthy days. R-squared is not the best to use because it always increases when you add a predictor, even if it is random noise and should not be used to compare models with different numbers of predictors.
Adjusted R-squared = 0.1824 It appears that the full model with all predictors explains 18.24% of the variance in mentally unhealthy days when using adjusted r-squared which penalizes for the number of predictors and only increase when a new predictor improves the model by more than chance alone.
Root MSE (residual standard error) = 7.3516 On average, the full models prediction is off by 7.3156 mentally unhealthy days. Root MSE is the estimated standard deviation of errors and also goes by the residual standard error.
Overall F-test state H0, F-statistic, degrees of freedom (numerator and denominator), p-value, and conclusion
\[H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0 \quad \text{(no predictor is associated with mentally unhealthy days)}\]
\[H_1: \text{At least one } \beta_j \neq 0\] The f-statistic is 62.5234. The numerator degrees of freedom are 29 and the denominator degrees of freedom are 7970. The p-value is less than .001 for the overall F-test. In this case, we would reject the null hypothesis that no predictor is associated with mentally unhealthy days. We would instead consider the evidence that at least one predictor is significantly associated with mentally unhealthy days (alternative hypothesis). —
# Type III using car::Anova()
anova_type3 <- Anova(model_full, type = "III")
anova_type3 |>
as.data.frame() |>
rownames_to_column("Source") |>
mutate(across(where(is.numeric), ~ round(., 4))) |>
kable(
caption = "Type III (Partial) Sums of Squares — car::Anova()",
col.names = c("Source", "Sum of Sq", "df", "F value", "p-value")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Source | Sum of Sq | df | F value | p-value |
|---|---|---|---|---|
| (Intercept) | 5529.7722 | 1 | 102.3152 | 0.0000 |
| physhlth_days | 37622.7952 | 1 | 696.1198 | 0.0000 |
| bmi | 345.2408 | 1 | 6.3879 | 0.0115 |
| sex | 3723.8662 | 1 | 68.9012 | 0.0000 |
| exercise | 497.0434 | 1 | 9.1966 | 0.0024 |
| age_group | 30092.1774 | 12 | 46.3986 | 0.0000 |
| income | 4733.8943 | 6 | 14.5982 | 0.0000 |
| education | 1265.1504 | 4 | 5.8521 | 0.0001 |
| smoking | 5101.1076 | 3 | 31.4613 | 0.0000 |
| Residuals | 430750.0872 | 7970 | NA | NA |
All predictors have statistically significant partial associations at alpha = 0.05 this includes physhlth_days, bmi, sex, exercise, age_group, income, education, and smoking.
Section 13: Fit a reduced model that includes all predictors except income.
# Reduced model: Without Income
model_reduced <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + education + smoking, data = brfss_analytic)
# Full model: With Income
model_full <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + education + smoking + income, data = brfss_analytic)Section 14: Use anova(model_reduced, model_full) to compare the two models.
# Chunk Test Income
anova(model_reduced, model_full) |>
as.data.frame() |>
rownames_to_column("Model") |>
mutate(
Model = c("Reduced (without income)", "Full (+ income)"),
across(where(is.numeric), ~ round(., 4))
) |>
kable(
caption = "Chunk Test: Does income collectively add to the model?",
col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Model | Res. df | RSS | df | Sum of Sq | F | p-value |
|---|---|---|---|---|---|---|
| Reduced (without income) | 7976 | 435484.0 | NA | NA | NA | NA |
| Full (+ income) | 7970 | 430750.1 | 6 | 4733.894 | 14.5982 | 0 |
Section 15: State H0 and HA for this test clearly
\[H_0: \beta_{\text{income}} = 0\text{(income has no effect on the relationship, holding all other variables constant)}\] \[H_1: \beta_{\text{income}} \neq 0\text{(income has a signficant effect on the relationship)}\] Section 16: Report the F-statistic, degrees of freedom, and p-value.
The f-statistic is 14.5982, the numerator degrees of freedom was 6 and the denominator degrees of freedom was 7970. The p-value for this chunk test was less than 0.001.
Section 17: State your conclusion at α = 0.05.
In this case, we would reject the null hypothesis that income does not have a significant effect on the relationship holding all else constant. We would instead consider the evidence that income has a significant effect on the relationship with mentally unhealthy days holding all else constant (alternative hypothesis).
# Reduced model: Without Education
model_reduced <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + income + smoking, data = brfss_analytic)
# Full model: With Education
model_full <- lm(menthlth_days ~ physhlth_days + bmi + sex + exercise + age_group + income + education + smoking, data = brfss_analytic)# Chunk Test Education
anova(model_reduced, model_full) |>
as.data.frame() |>
rownames_to_column("Model") |>
mutate(
Model = c("Reduced (without education)", "Full (+ education)"),
across(where(is.numeric), ~ round(., 4))
) |>
kable(
caption = "Chunk Test: Does education collectively add to the model?",
col.names = c("Model", "Res. df", "RSS", "df", "Sum of Sq", "F", "p-value")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Model | Res. df | RSS | df | Sum of Sq | F | p-value |
|---|---|---|---|---|---|---|
| Reduced (without education) | 7974 | 432015.2 | NA | NA | NA | NA |
| Full (+ education) | 7970 | 430750.1 | 4 | 1265.15 | 5.8521 | 1e-04 |
Chunk Test Education State H0 and HA for this test clearly
\[H_0: \beta_{\text{education}} = 0\text{(education has no effect on the relationship, holding all other variables constant)}\] \[H_1: \beta_{\text{education}} \neq 0\text{(education has a signficant effect on the relationship)}\] Chunk Test Education Report the F-statistic, degrees of freedom, and p-value.
The f-statistic is 5.8521, the numerator degrees of freedom was 4 and the denominator degrees of freedom was 7970. The p-value for this chunk test was 1e-04 which is less than .001.
Chunk Test Education State your conclusion at α = 0.05.
In this case, we would reject the null hypothesis that education does not have a significant effect on the relationship holding all else constant. We would instead consider the evidence that education has a significant effect on the relationship with mentally unhealthy days holding all else constant (alternative hypothesis).
According to the Type III results, all of the predictors have partial assocaitions that are significant at the α = 0.05 level. The predictor that makes the strongest independent contribution is physhlth_days because it produces the larges Type III sum of square (37622.7952) and the largest f-value (696.1198) and sex also has a relatively large f-value of 68.9012. According to the chunk tests performed, income and education do have a significant effect on the relationship with mentally unhealthy days holding all else constant. Chunk tests are great to use because they can assess a group of variables impact together similar to the omnibus chi-squared test earlier in the semester as well as being able to test all levels of a categorical variable. If using individual t-tests, type I error increases while chunk tests can test everyone at one significance level.
• “Current smoker” includes current daily smokers and current some-day smokers. • “Non-smoker” includes former smokers and never smokers. Use “Non-smoker” as the reference category.
• Model A (main effects only): regress menthlth_days on physhlth_days, bmi, sex, smoker_current, exercise, age_group, income, and education. • Model B (with interaction): same as Model A, but use sex * smoker_current in the formula to include the interaction term.
# Model with Main Effects Only
model_A <- lm(menthlth_days ~ physhlth_days + bmi + exercise + age_group + income + education + sex + smoker_current, data = brfss_smoker)
# Model with interaction
model_B <- lm(menthlth_days ~ physhlth_days + bmi + exercise + age_group + income + education + sex * smoker_current, data = brfss_smoker)Section 18: Use anova(model_A, model_B) to compare the two models.
# Test for Interaction
anova(model_A, model_B) |>
tidy() |>
mutate(across(where(is.numeric), \(x) round(x, 4))) |>
kable(caption = "Partial F-test: Is the Sex x Smoker Interaction Significant?") |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| term | df.residual | rss | df | sumsq | statistic | p.value |
|---|---|---|---|---|---|---|
| menthlth_days ~ physhlth_days + bmi + exercise + age_group + income + education + sex + smoker_current | 7972 | 432508.9 | NA | NA | NA | NA |
| menthlth_days ~ physhlth_days + bmi + exercise + age_group + income + education + sex * smoker_current | 7971 | 432174.9 | 1 | 333.9749 | 6.1598 | 0.0131 |
Section 19: State H0 and HA for this test.
\[H_0:\beta_3 = 0\text{(slopes are equal, lines are parallel, no interaction between sex and smoker status)}\] \[H_1:\beta_3 \neq 0\text{(slopes differ, interaction is present between sex and smoker status)}\] Section 20: Report the F-statistic, degrees of freedom, and p-value.
The f-statistic is 6.1598, the numerator degrees of freedom are 1, the denominator degrees of freedom ar 7971, and the p-value is 0.0131.
Section 21: State your conclusion at α = 0.05.
Since the p-value is less than the conventional α = 0.05 threshold, we would reject the null hypothesis that the sex slope is equal across all smoker levels and instead cosnider the evidence that an interaction is present between sex and smoker status.
tidy(model_B, conf.int = TRUE) |>
mutate(across(where(is.numeric), \(x) round(x, 4))) |>
kable(
caption = "Interaction Model: Sex * Smoker_Current → Mentally UnHealthy Days",
col.names = c("Term", "Estimate", "SE", "t", "p-value", "CI Lower", "CI Upper")
) |>
kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)| Term | Estimate | SE | t | p-value | CI Lower | CI Upper |
|---|---|---|---|---|---|---|
| (Intercept) | 6.8994 | 0.9286 | 7.4301 | 0.0000 | 5.0791 | 8.7196 |
| physhlth_days | 0.2686 | 0.0101 | 26.6788 | 0.0000 | 0.2489 | 0.2883 |
| bmi | 0.0331 | 0.0132 | 2.5017 | 0.0124 | 0.0072 | 0.0590 |
| exerciseYes | -0.6728 | 0.2150 | -3.1301 | 0.0018 | -1.0942 | -0.2515 |
| age_group25-29 | -0.9202 | 0.5196 | -1.7709 | 0.0766 | -1.9388 | 0.0984 |
| age_group30-34 | -0.8924 | 0.5059 | -1.7640 | 0.0778 | -1.8841 | 0.0993 |
| age_group35-39 | -1.5929 | 0.4875 | -3.2673 | 0.0011 | -2.5485 | -0.6372 |
| age_group40-44 | -2.6286 | 0.4856 | -5.4127 | 0.0000 | -3.5806 | -1.6766 |
| age_group45-49 | -2.8425 | 0.4969 | -5.7209 | 0.0000 | -3.8165 | -1.8686 |
| age_group50-54 | -3.2778 | 0.4820 | -6.8012 | 0.0000 | -4.2226 | -2.3331 |
| age_group55-59 | -4.2499 | 0.4740 | -8.9652 | 0.0000 | -5.1791 | -3.3206 |
| age_group60-64 | -4.2640 | 0.4567 | -9.3364 | 0.0000 | -5.1593 | -3.3688 |
| age_group65-69 | -5.2506 | 0.4466 | -11.7563 | 0.0000 | -6.1261 | -4.3751 |
| age_group70-74 | -5.7111 | 0.4544 | -12.5686 | 0.0000 | -6.6018 | -4.8203 |
| age_group75-79 | -5.9076 | 0.4702 | -12.5646 | 0.0000 | -6.8292 | -4.9859 |
| age_group80+ | -6.4995 | 0.4736 | -13.7239 | 0.0000 | -7.4278 | -5.5711 |
| income$15,000 to < $25,000 | -1.6357 | 0.4700 | -3.4804 | 0.0005 | -2.5570 | -0.7144 |
| income$25,000 to < $35,000 | -2.0746 | 0.4486 | -4.6243 | 0.0000 | -2.9540 | -1.1952 |
| income$35,000 to < $50,000 | -2.5455 | 0.4392 | -5.7964 | 0.0000 | -3.4064 | -1.6847 |
| income$50,000 to < $100,000 | -3.0430 | 0.4116 | -7.3935 | 0.0000 | -3.8498 | -2.2362 |
| income$100,000 to < $200,000 | -3.5097 | 0.4300 | -8.1623 | 0.0000 | -4.3526 | -2.6668 |
| incomeMore than $200,000 | -3.8405 | 0.5010 | -7.6651 | 0.0000 | -4.8226 | -2.8583 |
| educationHigh school diploma or GED | 0.1256 | 0.8124 | 0.1546 | 0.8772 | -1.4669 | 1.7180 |
| educationSome college or technical school | 1.1179 | 0.6912 | 1.6172 | 0.1059 | -0.2371 | 2.4729 |
| educationCollege graduate | 1.8179 | 0.6928 | 2.6239 | 0.0087 | 0.4598 | 3.1760 |
| educationGraduate or professional degree | 1.6691 | 0.6943 | 2.4040 | 0.0162 | 0.3081 | 3.0300 |
| sexFemale | 1.1855 | 0.1775 | 6.6784 | 0.0000 | 0.8376 | 1.5335 |
| smoker_currentCurrent Smoker | 1.5208 | 0.3654 | 4.1621 | 0.0000 | 0.8045 | 2.2371 |
| sexFemale:smoker_currentCurrent Smoker | 1.2833 | 0.5171 | 2.4819 | 0.0131 | 0.2697 | 2.2968 |
Section 22: Report the estimated effect of current smoking on mentally unhealthy days among men (the coefficient for smoker_current). Interpret in plain language.
Being a current smoker is associated with 1.5208 more mentally unhealthy days among males holding all else constant.
Section 23: Compute and report the estimated effect among women (smoker_current coefficient plus the interaction coefficient). Interpret in plain language.
Effect for women = smoker_current + interaction term (sexFemale:smoker_current) = 1.5208 + 1.2833 = 2.8041
Being a current smoker is associated with 2.8041 more mentally unhealthy days among women holding all else constant.
Section 24: In 2 to 3 sentences, explain what these estimates mean together: is smoking more strongly associated with poor mental health in one sex than the other, and by how much?
Smoking is more strongly associated with poor mental health days in women, who demonstrate roughly 1.28 more mentally unhealthy days over the past thirty days holding all else constant. Although both sexes appear to have worsened mental health if they are current smokers, the effect is greater in women.
int_B <- ggpredict(model_B, terms = c("smoker_current", "sex"))
ggplot(int_B, aes (x = x, y = predicted, color = group, fill = group, group = group)) +
geom_line(linewidth = 1.2) +
geom_point(size = 2) +
geom_ribbon(aes(ymin = conf.low, ymax = conf.high), alpha = 0.12, color = NA) +
labs(
title = "Predicted Mentally Unhealthy Days by Smoking Status and Sex",
subtitle = "From the interaction model: sex * smoker_curernt",
x = "Current Soking Status",
y = "Predicted Mentally Unhealthy Days (past 30 days)",
color = "Sex",
fill = "Sex"
) +
theme_minimal(base_size = 13) +
scale_color_brewer(palette = "Set2")It appears that current smokers, whether male or female have increased levels of mentally unhealthy days when compared to nonsmokers of the same gender. It is important to note that there is an interaction because females who are current smokers seem to be affected more by smoking status than males are. This is apparent because female mentally health days increases more when going from non-smoker to current smoker than when compared to males going from non-smoker to current smoker. This is a substantive finding because it suggests who public health campaigns to limit smoking should be directed to and focused on (females). In addition, mental health screening for smoking women could be beneficial to understand what societal, environmental, or physical factors may be influencing the difference between men and women. —
Write 250–400 words in continuous prose (not bullet points) addressing all three topics below.
A. Findings and Limitations (6 points)
[What do the results suggest about the determinants of mental health among U.S. adults? Which predictors were most strongly associated? Which were not statistically significant? What are the key limitations of using cross-sectional survey data? Name at least two specific confounders that could bias the associations you estimated.]
The results suggest that mental health among U.S adults depends on the intersection of the determinants of health, and that one predictor alone can not explain nearly as much as when they are combined. Even then physically unhealthy days, BMI, sex, exercise, age, income, education, and smoking can only explain 18.53% of the variance in mentally unhealthy days, meaning there is still room for explaining other variables that were not currently included in the model or random noise. The most strongly associated predictors with partial assocaitions were physhlth_days with a f-value of 696.1198, sex with a f-value of 68.9012, and age_group with a f-value of 46.3986. All of the predictors, including physhlth_days, bmi, sex, exercise, age_group, income, education, and smoking, were significant at the α = 0.05 level. Cross-sectional data are imperative in epidemiological studies; however, in data such as the BRFSS, the information is sometimes collected in familial clusters, leading to possible caveats regarding independence as well as inability to determine causality. Two confounders that could lead to bias in the associations that I discovered could be income and health insurance status which are not presently included in this model and would be worth look into.
B. From Simple to Multiple Regression (4 points)
[What does Adjusted R-squared tell you that regular R-squared does not, and why is this distinction especially important when adding multiple categorical predictors? Why is it useful to test groups of predictors (income, education) collectively with chunk tests, rather than relying only on the individual t-test for each level?]
Regular R-squared can tell the proportion of total variance in Y that is explained by the model; however, this always increases or stays the same when we add predictors to the model, and should not be used to compare models with different numbers of predictors. Adjusted R-squared instead will penalize the number of predictors and only increase when a new predictor improves the model by more than chance alone. This is especially important when adding multiple categorical predictors because the dummy variables can impact the regular R-squared, whereas the adjusted R-squared will deduct for predictors that are not beneficial. It is useful to test groups of predictors with chunk tests because they can test all levels of a categorical variable collectively, instead of conducting individual t-tests, which can increase type I errors.
C. AI Transparency (5 points)
[Describe which parts of the assignment (if any) you sought AI assistance for, what specific prompts you used, and how you verified the AI-assisted output was correct. If you did not use AI, state that explicitly]
Most of my code was from Dr.Masum’s Lecture notes or from my own code from the lab. I did employ AI for two parts of this assignment that I could consider debugging. My first issue surrounded making table one (descriptive statistics), my code would not generate both levels of exercise (yes and no). To get help, I used copilot and used the prompt “I am currently working on an assignment for graduate level statistics two and have already created a descriptive table; however, I would like exercise to be separated into yes and no and the variable is already coded into a factor variable” and I added the current code I had. Copilot suggested to add type = list(exercise ~ “categorical”) to my code that I already had. To verify this was correct I googled what this line of code would add and it explicitly states that this will change the way the variable is received from continuous to factor allowing both levels to be included. This was listed on numerous Rblogs as a way to fit proper table output. The second part I employed copilot for was visualizing the interaction. Originally using code from the lecture, kept supplying me with an error that each variable only had one observation. I entered this prompt I am currently working on an assignment for graduate level statistics two and have already created a plot; however, I would like there to be lines drawn for each level of sex (male and female) depending on smoking status. I inserted the codeI had already written and Copilot suggested adding this line of code (group = group). I again googled and went to rblogs and they suggested that using group-group will allow the points to form a line even though the x was not a continuous variable. —
End of Homework 3