NHATS Financial Strain and Dementia Study

📄 Project Overview

This report details the process of constructing an analytic dataset from the National Health and Aging Trends Study (NHATS) to examine the relationship between financial strain and dementia classification. Data were drawn from Rounds 1, 5, 6, and 7 of NHATS and harmonized to build a clean, merged dataset suitable for analysis. We restricted the cohort to sample-person respondents living in the community, excluding those in nursing homes, assisted living facilities, or other institutions.

Round 6 served as the anchor wave because it contained the study’s primary exposure variable, Financial Strain, derived from four items about whether participants skipped meals or struggled to pay for rent, utilities, or medical care. Participants were coded as Any Strain if they answered “Yes” to any item, No Strain only if they answered “No” to all four items, and Missing if no item was answered “Yes” and at least one response was refused, unknown, or inapplicable.

Round 7 provided the study’s primary outcome, dementia classification, combining clinician diagnosis with cognitive testing. Participants were classified as Probable Dementia if a diagnosis was confirmed, Possible Dementia if ≥2 cognitive domains (memory, orientation, executive function) were impaired, No Dementia if dementia was ruled out and fewer than two domains were impaired, or Missing if classification was not possible.

To support covariate completeness, demographic and socioeconomic data were merged from Rounds 1 and 5, with Round 5 values prioritized and Round 1 used to backfill missing values. The resulting FullData dataset retained all Round 6 participants, preserving expected structural missingness from other waves. From this, an analysis dataset called CleanData was created by excluding anyone missing either the exposure or outcome, while retaining explicit “Missing” categories for other covariates to ensure transparency in descriptive analyses and reporting.

# --- Load Libraries ---
library(tidyverse)   # Data wrangling
library(haven)       # Read SAS files
library(janitor)     # Clean variable names
library(skimr)       # Data inspection
library(arsenal)    # For beautiful Table 1s
library(Hmisc)       # for label()

🛠 Helper Definitions

Before I started cleaning NHATS data, I defined a few helper objects and functions to standardize how I handled missing data codes and factors across the dataset.

First, I created vectors for NHATS missing data codes (e.g., -1, -7) and their labels.
Next, I set up a standard Yes/No structure that I reuse for multiple variables.
Finally, I wrote a helper function safe_collapse() that will safely collapse all the special missing codes into one single “Missing” level for any factor variable.

# Missing data codes used across NHATS
special_missing_levels <- c("-1", "-7", "-8", "-9")
special_missing_labels <- c("Inapplicable", "Refused", "Don’t know", "Missing")

# Common Yes/No structure across NHATS
yes_no_levels <- c("1", "2", special_missing_levels)
yes_no_labels <- c("Yes", "No", special_missing_labels)

# Convert special missing codes to numeric for case_when logic
special_missing_numeric <- as.numeric(special_missing_levels)

# 🛠 Helper: Collapse special missing levels into a single “Missing” factor level.
safe_collapse <- function(x, missing_levels = special_missing_labels) {
  if (!is.factor(x)) {
    return(x) # Do not attempt to collapse if it's not a factor
  }
  valid <- intersect(levels(x), missing_levels)
  if (length(valid) > 0) {
    x |> fct_collapse(Missing = valid) |> fct_explicit_na("Missing")
  } else {
    x |> fct_explicit_na("Missing")
  }
}
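As a quick illustration of what safe_collapse() is meant to do, here is a minimal sketch on a made-up factor. The names toy, present, and collapsed are assumptions for this example, not NHATS data. It collapses the special-missing labels with fct_collapse() and then folds true NA values into the same level with a base R assignment, rather than calling safe_collapse() itself.

```r
library(forcats)

# Same label set as special_missing_labels (straight apostrophe used here for simplicity)
missing_labels <- c("Inapplicable", "Refused", "Don't know", "Missing")

# Toy Yes/No item with two special codes and one true NA (hypothetical values)
toy <- factor(
  c("1", "2", "-7", "-8", NA),
  levels = c("1", "2", "-1", "-7", "-8", "-9"),
  labels = c("Yes", "No", missing_labels)
)

# Collapse every special-missing label into a single "Missing" level
present   <- intersect(levels(toy), missing_labels)
collapsed <- fct_collapse(toy, Missing = present)

# Fold true NA values into the existing "Missing" level (base R assignment)
collapsed[is.na(collapsed)] <- "Missing"

table(collapsed)
```

After this, the toy factor has three levels (Yes, No, Missing), which is the shape the downstream tables expect.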

🧹 Cleaning Round 1 Data (R1)

At this stage of the project, I was focused on cleaning up Round 1 (R1) and Round 5 (R5), which would ultimately provide the covariates for my analysis.
R1’s role here wasn’t to stand alone; it was meant to supplement R5 where R5 was incomplete.

Specifically, for R1, I:
- Imported the Round 1 SP file from NHATS.
- Selected only the key variables I knew I would need for covariates.
- Filtered down to sample-person respondents living at home, because those are the cases relevant for my analysis.
- Renamed variables to have clearer, more intuitive names for later merging.
- Converted household size and income to numeric values.
- Simplified occupation codes into 6 buckets and converted several fields into factors.
- Recoded -1 (inapplicable) values of the Section 8 housing variable as “No” because they arise from skip logic, which reduces missing data for this variable.

R1_SP <- read_sas("NHATS_Round_1_SP_File.sas7bdat")


R1_SP_clean <- R1_SP |>

# Selecting the variables of interest
  select(
    spid, is1resptype, r1dresid, r1d2intvrage, r1dgender, el1higstschl,
    lf1doccpctgy, lf1occupaton, hp1ownrentot, lf1workfpay, hp1sec8pubsn,
    ip1cmedicaid, ew1progneed1, ew1progneed2, ew1progneed3, el1hlthchild,
    ia1totinc, hh1dhshldnum, rl1dracehisp
  ) |>
  
# Filtering to sample-person responders living at home
  filter(is1resptype == 1 & r1dresid == 1) |>   
  
# Renaming variables 
  rename(
    Responder1 = is1resptype,
    Residential1 = r1dresid,
    Age1 = r1d2intvrage,
    Gender1 = r1dgender,
    Education1 = el1higstschl,
    Occupation1 = lf1doccpctgy,
    EverWorked1 = lf1occupaton,
    HomeOwnership1 = hp1ownrentot,
    HouseholdSize1 = hh1dhshldnum,
    HouseholdIncome1 = ia1totinc,
    RetirementStatus1 = lf1workfpay,
    Section81 = hp1sec8pubsn,
    Medicaid1 = ip1cmedicaid,
    FoodAssist11 = ew1progneed1,
    FoodAssist21 = ew1progneed2,
    FoodAssist31 = ew1progneed3,
    ChildhoodHealth1 = el1hlthchild,
    RaceEthnicity1 = rl1dracehisp
  ) |>
  mutate(
    
# changing household size and income to numeric variables
    HouseholdSize1 = case_when(
      as.character(HouseholdSize1) %in% special_missing_levels ~ NA_real_,
      TRUE ~ as.numeric(as.character(HouseholdSize1)) # Ensure conversion from potentially factor/character
    ),
    HouseholdIncome1 = case_when(
      as.character(HouseholdIncome1) %in% special_missing_levels ~ NA_real_,
      TRUE ~ as.numeric(as.character(HouseholdIncome1)) # Ensure conversion from potentially factor/character
    ),
  
    
# Simplifying occupation codes into 6 buckets 
    Occupation1 = case_when(
      Occupation1 %in% 1:11 ~ "1",       # Management / Professional
      Occupation1 %in% 12:15 ~ "2",      # Service
      Occupation1 %in% 16:17 ~ "3",      # Sales / Office
      Occupation1 %in% 18:20 ~ "4",      # Construction / Farming
      Occupation1 %in% 21:23 ~ "5",      # Production
      EverWorked1 %in% c(2,3) ~ "6",     # Never worked / Homemaker
      TRUE ~ NA_character_
    ),
    
# Converting variables to labeled factors 
    Responder1 = factor(Responder1, c("1","2"), c("Sample_Person","Proxy")),
    Residential1 = factor(Residential1, c("1","2","3","4","5", special_missing_levels),
                          c("Home/apartment","Retirement community","Assisted living","Nursing home","Other institution", special_missing_labels)),
    Age1 = factor(Age1, c("1","2","3","4","5","6", special_missing_levels),
                  c("65–69","70–74","75–79","80–84","85–89","90+", special_missing_labels)),
    Gender1 = factor(Gender1, c("1","2"), c("Male","Female")),
    Education1 = factor(Education1, c("1","2","3","4","5","6","7","8","9", special_missing_levels),
                        c("No schooling","1–8th grade","9–12 (no diploma)","HS grad",
                          "Vocational","Some college","Associate","Bachelor","Master/PhD", special_missing_labels)),
    Occupation1 = factor(Occupation1, c("1","2","3","4","5","6"),
                         c("Management/Professional","Service","Sales/Office",
                           "Construction/Farming","Production","Homemaker")),
    HomeOwnership1 = factor(HomeOwnership1, c("1","2","3", special_missing_levels),
                            c("Own","Rent","Other", special_missing_labels)),
    RetirementStatus1 = factor(RetirementStatus1, c("1","2","3", special_missing_levels),
                               c("Yes","No","Retired", special_missing_labels)),
    Section81 = factor(Section81, yes_no_levels, yes_no_labels),
    Medicaid1 = factor(Medicaid1, yes_no_levels, yes_no_labels),
    FoodAssist11 = factor(FoodAssist11, yes_no_levels, yes_no_labels),
    FoodAssist21 = factor(FoodAssist21, yes_no_levels, yes_no_labels),
    FoodAssist31 = factor(FoodAssist31, yes_no_levels, yes_no_labels),
    ChildhoodHealth1 = factor(ChildhoodHealth1, c("1","2","3","4","5", special_missing_levels),
                              c("Excellent","Very good","Good","Fair","Poor", special_missing_labels)),
    RaceEthnicity1 = factor(RaceEthnicity1)
  ) |>
  
  # Handling -1s "inapplicable" as "no" for Section 8 housing
  mutate(Section81 = fct_recode(Section81, "No" = "Inapplicable"))
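To make two of the recodes above concrete, here is a small sketch on made-up values (toy, toy_clean, and the column names income and sec8 are hypothetical). It shows a numeric field where special codes become NA, and a Yes/No field where "Inapplicable" is folded into "No" with fct_recode().

```r
library(dplyr)
library(forcats)

special_codes <- c("-1", "-7", "-8", "-9")

# Hypothetical raw values, not real NHATS records
toy <- tibble(
  income = c(25000, -7, 40000, -8),
  sec8   = c(1, 2, -1, -7)
)

toy_clean <- toy |>
  mutate(
    # Special codes in a numeric field become NA
    income = if_else(as.character(income) %in% special_codes, NA_real_, income),
    # Label the codes first...
    sec8 = factor(sec8,
                  levels = c("1", "2", special_codes),
                  labels = c("Yes", "No", "Inapplicable",
                             "Refused", "Don't know", "Missing")),
    # ...then fold "Inapplicable" into the existing "No" level
    sec8 = fct_recode(sec8, "No" = "Inapplicable")
  )

toy_clean
```

Only "Inapplicable" is merged into "No"; "Refused" and the other special labels stay as they are until the later collapse step.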

🧽 Cleaning Round 5 Data (R5)

After cleaning up Round 1 (R1), I shifted my focus to Round 5 (R5). This was the core dataset I relied on for the covariates used for analysis, since R5 represents a more current snapshot of the NHATS cohort of interest.

I used R1 mainly as a supplement — but R5 needed a full clean on its own first.

Here’s what I did for R5 (mirroring R1):
- Imported the Round 5 SP file.
- Selected the key variables that would ultimately serve as covariates in my analysis.
- Filtered the dataset to include only sample-person respondents living at home (just as I did for R1).
- Renamed variables for consistency and readability when merging across rounds.
- Converted household size and income into numeric format to handle NHATS’ special missing codes.
- Collapsed the occupation categories into simplified buckets and built out factors for categorical fields like education, marital status, and Section 8 status.
- Ensured everything was factorized and labeled consistently for later merging.
- Recoded -1 (inapplicable) values of the Section 8 housing variable as “No” because they arise from skip logic, which reduces missing data for this variable.

R5_SP <- read_sas("NHATS_Round_5_SP_File_v2.sas7bdat")


R5_SP_clean <- R5_SP |>

# Selecting the variables of interest
  select(
    spid, is5resptype, r5dresid, r5dcontnew, r5d2intvrage, r5dgender, el5higstschl,
    lf5doccpctgy, lf5occupaton, hp5ownrentot, lf5workfpay, hp5sec8pubsn,
    ip5cmedicaid, ew5progneed1, ew5progneed2, ew5progneed3, el5hlthchild,
    hh5dmarstat, ia5totinc, hh5dhshldnum, fl5newsample, rl5dracehisp
  ) |>
  
# Filtering to sample-person responders living at home
  filter(is5resptype == 1 & r5dresid == 1) |>
  
# Renaming variables 
  rename(
    Responder5 = is5resptype,
    Residential5 = r5dresid,
    NewPerson = r5dcontnew,
    Age = r5d2intvrage,
    Gender = r5dgender,
    Education = el5higstschl,
    Occupation = lf5doccpctgy,
    EverWorked = lf5occupaton,
    HomeOwnership = hp5ownrentot,
    HouseholdSize = hh5dhshldnum,
    HouseholdIncome = ia5totinc,
    RetirementStatus = lf5workfpay,
    Section8 = hp5sec8pubsn,
    Medicaid = ip5cmedicaid,
    FoodAssist1 = ew5progneed1,
    FoodAssist2 = ew5progneed2,
    FoodAssist3 = ew5progneed3,
    ChildhoodHealth = el5hlthchild,
    MaritalStatus = hh5dmarstat,
    Newsample = fl5newsample,
    RaceEthnicity = rl5dracehisp
  ) |>
  mutate(
    
# changing household size and income to numeric variables
    HouseholdSize = case_when(
      as.character(HouseholdSize) %in% special_missing_levels ~ NA_real_,
      TRUE ~ as.numeric(as.character(HouseholdSize))
    ),
    HouseholdIncome = case_when(
      as.character(HouseholdIncome) %in% special_missing_levels ~ NA_real_,
      TRUE ~ as.numeric(as.character(HouseholdIncome))
    ),
    
# Simplifying occupation codes into 7 buckets (code -1 becomes "Not working/retired")
    Occupation = case_when(
      Occupation %in% 1:11 ~ "1",
      Occupation %in% 12:15 ~ "2",
      Occupation %in% 16:17 ~ "3",
      Occupation %in% 18:20 ~ "4",
      Occupation %in% 21:23 ~ "5",
      EverWorked %in% c(2,3) ~ "6",
      Occupation == -1 ~ "7",
      TRUE ~ NA_character_
    ),
    
# Converting variables to labeled factors 
    Responder5 = factor(Responder5, c("1","2"), c("Sample_Person","Proxy")),
    Residential5 = factor(Residential5, c("1","2","3","4","5", special_missing_levels),
                          c("Home/apartment","Retirement community","Assisted living","Nursing home","Other institution", special_missing_labels)),
    NewPerson = factor(NewPerson, yes_no_levels, yes_no_labels),
    Age = factor(Age, c("1","2","3","4","5","6", special_missing_levels),
                 c("65–69","70–74","75–79","80–84","85–89","90+", special_missing_labels)),
    Gender = factor(Gender, c("1","2"), c("Male","Female")),
    Education = factor(Education, c("1","2","3","4","5","6","7","8","9", special_missing_levels),
                       c("No schooling","1–8th grade","9–12 (no diploma)","HS grad",
                         "Vocational","Some college","Associate","Bachelor","Master/PhD", special_missing_labels)),
    Occupation = factor(Occupation, c("1","2","3","4","5","6","7"),
                        c("Management/Professional","Service","Sales/Office","Construction/Farming","Production","Homemaker","Not working/retired")),
    HomeOwnership = factor(HomeOwnership, c("1","2","3", special_missing_levels),
                           c("Own","Rent","Other", special_missing_labels)),
    RetirementStatus = factor(RetirementStatus, c("1","2","3", special_missing_levels),
                              c("Yes","No","Retired", special_missing_labels)),
    Section8 = factor(Section8, yes_no_levels, yes_no_labels),
    Medicaid = factor(Medicaid, yes_no_levels, yes_no_labels),
    FoodAssist1 = factor(FoodAssist1, yes_no_levels, yes_no_labels),
    FoodAssist2 = factor(FoodAssist2, yes_no_levels, yes_no_labels),
    FoodAssist3 = factor(FoodAssist3, yes_no_levels, yes_no_labels),
    ChildhoodHealth = factor(ChildhoodHealth, c("1","2","3","4","5", special_missing_levels),
                             c("Excellent","Very good","Good","Fair","Poor", special_missing_labels)),
    MaritalStatus = factor(MaritalStatus, c("1","2","3","4","5","6", special_missing_levels),
                           c("Married","Living with Partner","Separated","Divorced","Widowed","Never married", special_missing_labels)),
    Newsample = factor(Newsample, yes_no_levels, yes_no_labels),
    RaceEthnicity = factor(RaceEthnicity)
  ) |>
  
  # Handling -1s "inapplicable" as "no" for Section 8 housing
  mutate(Section8 = fct_recode(Section8, "No" = "Inapplicable"))
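Because case_when() stops at the first matching rule, the order of the occupation rules matters. The sketch below replays the R5 bucketing on made-up codes (toy, occ_code, ever_worked, and bucket are hypothetical names) to show which bucket each combination lands in.

```r
library(dplyr)

# Hypothetical occupation codes and ever-worked flags
toy <- tibble(
  occ_code    = c(3, 13, 17, 19, 22, -1, -1, -7),
  ever_worked = c(1, 1, 1, 1, 1, 2, 1, 1)
)

toy_buckets <- toy |>
  mutate(
    bucket = case_when(
      occ_code %in% 1:11       ~ "Management/Professional",
      occ_code %in% 12:15      ~ "Service",
      occ_code %in% 16:17      ~ "Sales/Office",
      occ_code %in% 18:20      ~ "Construction/Farming",
      occ_code %in% 21:23      ~ "Production",
      ever_worked %in% c(2, 3) ~ "Homemaker",            # checked before the -1 rule
      occ_code == -1           ~ "Not working/retired",
      TRUE                     ~ NA_character_           # e.g. refused (-7)
    )
  )

toy_buckets
```

A row with occ_code -1 and ever_worked 2 is labeled "Homemaker" because that rule comes first; swapping the two rules would send it to "Not working/retired" instead.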

🔗 Merging Round 1 and Round 5 Data

Once I had cleaned both R1 and R5, the next step was to merge them together.
Round 5 serves as the main covariate dataset, but I wanted to pull in information from Round 1 to fill in any gaps where R5 data were missing or coded as one of NHATS’s special missing values (like -1, -7, -8, or -9).

Here’s what I did:

- I started by using the cleaned versions of both datasets: R5_SP_clean and R1_SP_clean.
- I used left_join() to merge R1 into R5, keeping all Round 5 participants (since R5 is my primary round for covariates).
- For each covariate, I told R to use the R1 value only if R5’s value was marked as missing.
- After filling from R1, I standardized the missingness across the merged dataset by collapsing all special missing codes into one "Missing" category for easier viewing.
- Finally, I **kept only the “_final” columns** so the dataset is clean and ready for the next step to integrate with R6 and R7 data.

# Merge Round 5 with Round 1
R5_merged <- R5_SP_clean |> 
  left_join(R1_SP_clean, by = "spid") |> 
  
# Fill from R1 if R5 is special missing (-1, -7, -8, -9)
  mutate(
    Responder_final       = if_else(Responder5 %in% special_missing_labels, Responder1, Responder5),
    Residential_final     = if_else(Residential5 %in% special_missing_labels, Residential1, Residential5),
    Age_final             = if_else(Age %in% special_missing_labels, Age1, Age),
    Gender_final          = if_else(Gender %in% special_missing_labels, Gender1, Gender),
    Education_final       = if_else(Education %in% special_missing_labels, Education1, Education),
    Occupation_final      = if_else(Occupation %in% special_missing_labels, Occupation1, Occupation),
    HomeOwnership_final   = if_else(HomeOwnership %in% special_missing_labels, HomeOwnership1, HomeOwnership),
    HouseholdSize_final   = if_else(HouseholdSize %in% special_missing_labels, HouseholdSize1, HouseholdSize),  
    HouseholdIncome_final = if_else(HouseholdIncome %in% special_missing_labels, HouseholdIncome1, HouseholdIncome),
    RetirementStatus_final = if_else(RetirementStatus %in% special_missing_labels, RetirementStatus1, RetirementStatus),
    Section8_final        = if_else(Section8 %in% special_missing_labels, Section81, Section8),
    Medicaid_final        = if_else(Medicaid %in% special_missing_labels, Medicaid1, Medicaid),
    FoodAssist1_final     = if_else(FoodAssist1 %in% special_missing_labels, FoodAssist11, FoodAssist1),
    FoodAssist2_final     = if_else(FoodAssist2 %in% special_missing_labels, FoodAssist21, FoodAssist2),
    FoodAssist3_final     = if_else(FoodAssist3 %in% special_missing_labels, FoodAssist31, FoodAssist3),
    ChildhoodHealth_final = if_else(ChildhoodHealth %in% special_missing_labels, ChildhoodHealth1, ChildhoodHealth),
    RaceEthnicity_final   = if_else(RaceEthnicity %in% special_missing_labels, RaceEthnicity1, RaceEthnicity),
    
# Include Round 5 only variables
    NewPerson_final       = NewPerson,
    MaritalStatus_final   = MaritalStatus,
    Newsample_final       = Newsample
  )

# Collapse missing levels consistently into one “Missing” level
R5_merged <- R5_merged |> 
  mutate(across(ends_with("_final"), safe_collapse, .names = "{.col}"))

# Keep only final columns for downstream merges
R5_merged <- R5_merged |> 
  select(spid, ends_with("_final"))
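One detail worth checking in the merge: after cleaning, HouseholdSize and HouseholdIncome are numeric with NA in place of the special codes, so a test against special_missing_labels (character labels) does not match for those two columns. The sketch below uses made-up rows (r5, r1, merged, label_test, and coalesced are hypothetical names) to contrast that test with dplyr's coalesce(). It is a possible alternative for the numeric columns under that assumption, not a change to the pipeline above.

```r
library(dplyr)

missing_labels <- c("Inapplicable", "Refused", "Don't know", "Missing")

# Hypothetical cleaned values: NA already stands in for the special codes
r5 <- tibble(spid = 1:4,
             HouseholdIncome = c(30000, NA, NA, 52000))
r1 <- tibble(spid = c(1L, 2L, 4L),
             HouseholdIncome1 = c(28000, 41000, 50000))

merged <- r5 |>
  left_join(r1, by = "spid") |>
  mutate(
    # Label-based test: never TRUE for a numeric column, so R5 is always kept
    label_test = if_else(HouseholdIncome %in% missing_labels,
                         HouseholdIncome1, HouseholdIncome),
    # coalesce(): take R5 when present, otherwise fall back to R1
    coalesced  = coalesce(HouseholdIncome, HouseholdIncome1)
  )

merged
```

In this toy example the label-based column still has two NAs, while the coalesced column fills spid 2 from Round 1 and leaves spid 3 missing because neither round has a value.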

🧰 Cleaning Round 6 Data (R6)

After merging R1 and R5, I moved on to Round 6 (R6).
This round provided the key exposure variable for my analysis: Financial Strain.

Here’s how I approached cleaning R6:

- Imported the Round 6 SP file.
- Selected only the 4 variables related to basic needs strain (e.g., skipping meals, struggling to pay rent, utilities, or medical bills).
- Filtered the dataset to include only sample-person respondents living at home.
- Renamed the variables for clarity (e.g., ew6mealskip1 became SkippedMeals).
- Converted these items into factors using the Yes/No levels and labels defined earlier in the helper section.
- Built a new variable called FinancialStrainFlag:
  - Marked participants as “Any Strain” if they answered Yes to any of the four items.
  - Marked them as “No Strain” if they explicitly answered No to all four items.
  - Classified the remaining participants as “Missing”: those with no Yes on any item and at least one missing, refused, or inapplicable response.

R6_SP <- read_sas("NHATS_Round_6_SP_File_V2.sas7bdat")

R6_SP_clean <- R6_SP |>
  
# Selecting the variables of interest
  select(spid, is6resptype, r6dresid,
         ew6mealskip1, ew6nopayhous, ew6nopayutil, ew6nopaymed) |>
  
#Filtering to sample-person responders living at home
  filter(is6resptype == 1 & r6dresid == 1) |>

# Renaming variables 
  rename(
    SkippedMeals = ew6mealskip1,
    UnableToPayRent = ew6nopayhous,
    UnableToPayUtilities = ew6nopayutil,
    UnableToPayMedical = ew6nopaymed
  ) |>
  
# Convert all 4 financial strain indicators into labeled factors at once
  mutate(
    across(c(SkippedMeals, UnableToPayRent, UnableToPayUtilities, UnableToPayMedical),
           ~ factor(.x, yes_no_levels, yes_no_labels)),
    
# Creating new Financial Strain Flag variable
    FinancialStrainFlag = case_when(
      
# If ANY of the four is YES, classify as Any Strain
      SkippedMeals == "Yes" | UnableToPayRent == "Yes" |
        UnableToPayUtilities == "Yes" | UnableToPayMedical == "Yes" ~ "Any Strain",
      
# If ALL FOUR are explicitly NO, classify as No Strain
      SkippedMeals == "No" & UnableToPayRent == "No" &
        UnableToPayUtilities == "No" & UnableToPayMedical == "No" ~ "No Strain",
      
# If we get here, at least one answer is missing/inapplicable/refused
      TRUE ~ "Missing"
    ) |> 
  
# Treat FinancialStrainFlag as a factor variable with three levels
  factor(c("No Strain", "Any Strain", "Missing"))
  )
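The three-way flag can be traced on a handful of made-up response patterns. This is a minimal sketch using plain character columns instead of the labeled factors (toy and toy_flag are hypothetical names); the rule order is the same as above.

```r
library(dplyr)

# Hypothetical response patterns for the four items
toy <- tibble(
  SkippedMeals         = c("No", "No",  "Refused", "Refused"),
  UnableToPayRent      = c("No", "Yes", "No",      "Yes"),
  UnableToPayUtilities = c("No", "No",  "No",      "No"),
  UnableToPayMedical   = c("No", "No",  "No",      "No")
)

toy_flag <- toy |>
  mutate(
    FinancialStrainFlag = case_when(
      # Any single Yes wins, even if another item is refused
      SkippedMeals == "Yes" | UnableToPayRent == "Yes" |
        UnableToPayUtilities == "Yes" | UnableToPayMedical == "Yes" ~ "Any Strain",
      # No Strain requires an explicit No on all four items
      SkippedMeals == "No" & UnableToPayRent == "No" &
        UnableToPayUtilities == "No" & UnableToPayMedical == "No" ~ "No Strain",
      # Everything else: no Yes, and at least one non-answer
      TRUE ~ "Missing"
    )
  )

toy_flag$FinancialStrainFlag
```

The fourth row shows the asymmetry in the definition: a refused item does not block an "Any Strain" classification, but it does block "No Strain".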

🧴 Cleaning Round 7 Data (R7)

Next, I turned to Round 7 (R7), used to create the dementia classification outcome in the analysis.

R7 contained both:
- A clinician‑verified dementia diagnosis field.
- Multiple cognitive test results across three domains: memory, orientation, and executive function.

Here’s what I did step by step:

- Imported the R7 SP file and selected the variables I needed for cognition and dementia classification.
- Filtered the data to keep only participants living at home, since those are my analytic population. However, I kept proxy respondents to account for individuals who may have developed dementia since R6 and were unable to answer for themselves.
- Renamed variables so their purpose was clear (e.g., cg7dwrdimmrc became MemoryImmediate).
- Calculated a memory score by combining immediate and delayed recall scores (assigning 0 if a test result was missing).
- Calculated an orientation score by recoding orientation items (like date, president, and vice president questions) into correct/incorrect and then summing the incorrect answers.
- Flagged impairment in executive function based on the clock‑draw task.
- Counted how many of the three domains (memory, orientation, executive) were impaired for each participant.
- Created the final dementia classification (dementia_class) using this logic:
  - Probable Dementia if the NHATS clinician diagnosis confirmed dementia.
  - Possible Dementia if there was no formal diagnosis but two or more domains were impaired.
  - No Dementia if the diagnosis explicitly ruled it out, fewer than two domains were impaired, and no domain flag was missing.
  - Missing if the diagnosis field was missing, refused, or inapplicable, or if the participant could not otherwise be classified.

R7_SP <- read_sas("NHATS_Round_7_SP_File.sas7bdat")

R7_SP_clean <- R7_SP |>
  
# Select variables of interest
  select(spid, is7resptype, r7dresid,
         hc7disescn9, cg7dwrdimmrc, cg7dwrddlyrc,
         cg7todaydat1, cg7todaydat2, cg7todaydat3, cg7todaydat4,
         cg7presidna1, cg7presidna3, cg7vpname1, cg7vpname3,
         cg7dclkdraw) |>
  
# Keep only community-dwelling respondents; proxy respondents are retained this time
  filter(r7dresid == 1) |>   
  
# Rename variables
  rename(
    DementiaDx = hc7disescn9,           # Clinician-verified dementia diagnosis
    MemoryImmediate = cg7dwrdimmrc,     # Immediate recall
    MemoryDelayed = cg7dwrddlyrc,       # Delayed recall
    ExecutiveDraw = cg7dclkdraw         # Clock draw
  ) |>

  mutate(
# Calculate 🧠 MEMORY SCORE: Immediate + Delayed Recall
# If either is missing, we assign 0 points for that part (as NHATS coding convention).
    memory_score = if_else(MemoryImmediate >= 0, MemoryImmediate, 0) +
      if_else(MemoryDelayed >= 0, MemoryDelayed, 0),
    memory_impaired = memory_score <= 3,  # Binary: <=3 points means impaired
    
# Calculate 🧭 ORIENTATION SCORE: Recode correct/incorrect items (1=correct, 2=incorrect)
    across(c(cg7todaydat1, cg7todaydat2, cg7todaydat3, cg7todaydat4,
             cg7presidna1, cg7presidna3, cg7vpname1, cg7vpname3),
           ~ case_when(.x == 2 ~ 1,     # Incorrect = 1 point (impaired)
                       .x == 1 ~ 0,     # Correct = 0 points
                       TRUE ~ NA_real_), 
           .names = "{.col}_rec"),
    
    orientation_score = rowSums(across(ends_with("_rec")), na.rm = TRUE),
    orientation_impaired = orientation_score >= 5,   # ≥5 wrong answers = impaired
    
# Calculate 🕰 EXECUTIVE FUNCTION: Clock draw task
    exec_impaired = ExecutiveDraw %in% c(0, 1),  # 0/1 scores indicate impairment
    
# 🏷 Count how many domains (memory/orientation/executive) are impaired
    impaired_domains = rowSums(across(c(memory_impaired, orientation_impaired, exec_impaired)), 
                               na.rm = TRUE),
    
#  Classify dementia 
    dementia_class = case_when(
      
  # 1️⃣ **Probable Dementia**: If the NHATS clinician diagnosis says so, trust it
      DementiaDx == 1 ~ "Probable Dementia",
      
  # 2️⃣ **Missing**: If DementiaDx is missing/refused/inapplicable (-1, -7, -8, -9)
      DementiaDx %in% special_missing_numeric ~ "Missing",
      
  # 3️⃣ **Possible Dementia**: No formal diagnosis, but ≥2 impaired domains
      DementiaDx != 1 & impaired_domains >= 2 ~ "Possible Dementia",
      
  # 4️⃣ **No Dementia**: Explicit diagnosis of “No dementia” and all three domains are fully observed (none missing) AND <2 impaired domains
      DementiaDx == 2 & impaired_domains < 2 &
        !is.na(memory_impaired) & !is.na(orientation_impaired) & !is.na(exec_impaired) ~ "No Dementia",
      
  # 5️⃣ **Missing**: Anything else (e.g., partial data, can’t confidently classify)
      TRUE ~ "Missing"
  
  # Treat dementia_class as a factor variable with four levels
    ) |> factor(c("No Dementia", "Possible Dementia", "Probable Dementia", "Missing"))
  )
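Since case_when() applies the first rule that matches, the order of the classification rules decides a few edge cases. The sketch below runs a simplified version of the ladder on made-up rows (toy is a hypothetical name, and the code 7 in the last row is only an example of a diagnosis value other than 1 or 2). The domain flags are supplied directly, so the no-NA checks are omitted.

```r
library(dplyr)

# Hypothetical diagnosis codes and pre-computed domain flags
toy <- tibble(
  DementiaDx           = c(1, 2, 2, -8, 7),
  memory_impaired      = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE),
  orientation_impaired = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE),
  exec_impaired        = c(FALSE, FALSE, FALSE, FALSE, FALSE)
) |>
  mutate(
    # Logical flags add up to the number of impaired domains
    impaired_domains = memory_impaired + orientation_impaired + exec_impaired,
    dementia_class = case_when(
      DementiaDx == 1                         ~ "Probable Dementia",
      DementiaDx %in% c(-1, -7, -8, -9)       ~ "Missing",
      DementiaDx != 1 & impaired_domains >= 2 ~ "Possible Dementia",
      DementiaDx == 2 & impaired_domains < 2  ~ "No Dementia",
      TRUE                                    ~ "Missing"
    )
  )

toy$dementia_class
```

Two things stand out in the toy rows: a missing diagnosis (-8) is classified as Missing even with two impaired domains, because that rule runs before the Possible Dementia rule, and a diagnosis code other than 1 or 2 with fewer than two impaired domains falls through to the final Missing rule.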

🏗 Final Merge: Combining All Rounds

With all four rounds cleaned, I moved on to the final merge step.
This was where I brought everything together into one unified dataset for analysis.

Here’s what I did:

- Started with Round 6 (R6) as the base since it contained the exposure variable (Financial Strain).
- Left‑joined the merged R1/R5 dataset so I could bring in all of the covariates.
- Left‑joined Round 7 (R7) to add the dementia classification outcome.
- Saved that combined dataset as FullData, which keeps everyone from R6, even participants with missing exposures or outcomes.

Then I created CleanData. This step was more than just removing missing values — it also filtered out people who couldn’t contribute to the analysis.

I filtered out anyone without usable exposure or outcome data:
- If FinancialStrainFlag was coded as "Missing" or set to true missing (NA), they were removed.
- If dementia_class was "Missing" or NA, they were removed too.

Importantly, this means that participants without Round 7 data (and therefore no dementia classification at all) were also excluded, since they wouldn’t meet the requirement of having a non-missing dementia_class.

This filtering step ensured that CleanData only included participants with both:
- A valid financial strain classification from R6, and
- A valid dementia classification from R7.

# FullData keeps everyone from R6, even if FinancialStrainFlag or dementia_class = "Missing"
FullData <- R6_SP_clean |>
  left_join(R5_merged, by = "spid") |>
  left_join(R7_SP_clean, by = "spid")

# CleanData removes people missing either the exposure (FinancialStrainFlag) or outcome (dementia_class)
CleanData <- FullData |>
  filter(
    !is.na(FinancialStrainFlag),       # drop rows that are NA (true missing)
    !is.na(dementia_class),            # drop rows that are NA (true missing)
    FinancialStrainFlag != "Missing",  # drop coded "Missing" exposures
    dementia_class != "Missing"        # drop coded "Missing" outcomes
  )
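The difference between the two datasets is easy to see on a toy example. This sketch uses made-up IDs and labels (r6, r7, full, clean, and no_r7 are hypothetical names) and adds an anti_join() as one way to list base-wave participants who have no outcome row at all.

```r
library(dplyr)

# Hypothetical base wave (exposure) and follow-up wave (outcome)
r6 <- tibble(
  spid = c(101L, 102L, 103L, 104L),
  FinancialStrainFlag = c("No Strain", "Any Strain", "Missing", "No Strain")
)
r7 <- tibble(
  spid = c(101L, 102L, 103L),
  dementia_class = c("No Dementia", "Missing", "Possible Dementia")
)

# Everyone from the base wave is kept; spid 104 gets NA for the outcome
full <- left_join(r6, r7, by = "spid")

# Drop true NA and coded "Missing" on either variable
clean <- full |>
  filter(
    !is.na(FinancialStrainFlag), !is.na(dementia_class),
    FinancialStrainFlag != "Missing", dementia_class != "Missing"
  )

# Base-wave participants with no follow-up row
no_r7 <- anti_join(r6, r7, by = "spid")
```

Here full keeps all four rows, clean keeps only spid 101, and no_r7 contains spid 104, the participant dropped purely because the follow-up row is absent.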

🔍 Inspecting Missingness in FullData and CleanData

Before doing any analysis, I skimmed both FullData and CleanData to understand the pattern of missing values and where they come from.

skim(FullData)

Data summary
Name	FullData
Number of rows	5628
Number of columns	57
_______________________
Column type frequency:
factor	24
logical	3
numeric	30
________________________
Group variables	None

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
SkippedMeals	0	1.00	FALSE	4	No: 5522, Yes: 58, Don: 37, Ref: 11
UnableToPayRent	0	1.00	FALSE	4	No: 5457, Yes: 111, Don: 41, Ref: 19
UnableToPayUtilities	0	1.00	FALSE	4	No: 5388, Yes: 182, Don: 42, Ref: 16
UnableToPayMedical	0	1.00	FALSE	4	No: 5389, Yes: 181, Don: 42, Ref: 16
FinancialStrainFlag	0	1.00	FALSE	3	No : 5219, Any: 343, Mis: 66
Responder_final	45	0.99	FALSE	1	Sam: 5583, Pro: 0
Residential_final	45	0.99	FALSE	1	Hom: 5583, Ret: 0, Ass: 0, Nur: 0
Age_final	45	0.99	FALSE	6	70–: 1468, 75–: 1271, 80–: 1009, 65–: 856
Gender_final	45	0.99	FALSE	2	Fem: 3190, Mal: 2393
Education_final	45	0.99	FALSE	10	HS : 1450, Som: 800, Bac: 743, Mas: 722
Occupation_final	45	0.99	FALSE	8	Not: 2736, Man: 1017, Sal: 541, Pro: 458
HomeOwnership_final	45	0.99	FALSE	4	Own: 4178, Ren: 872, Oth: 426, Mis: 107
RetirementStatus_final	45	0.99	FALSE	4	Ret: 2645, No: 2081, Yes: 754, Mis: 103
Section8_final	45	0.99	FALSE	3	No: 5307, Yes: 263, Mis: 13
Medicaid_final	45	0.99	FALSE	3	No: 4746, Yes: 699, Mis: 138
FoodAssist1_final	45	0.99	FALSE	3	No: 5011, Yes: 457, Mis: 115
FoodAssist2_final	45	0.99	FALSE	3	No: 5318, Yes: 149, Mis: 116
FoodAssist3_final	45	0.99	FALSE	3	No: 5123, Yes: 343, Mis: 117
ChildhoodHealth_final	45	0.99	FALSE	6	Exc: 2685, Ver: 1481, Goo: 920, Fai: 275
RaceEthnicity_final	45	0.99	FALSE	6	1: 3898, 2: 1136, 4: 314, 3: 132
NewPerson_final	45	0.99	FALSE	2	No: 2847, Yes: 2736, Mis: 0
MaritalStatus_final	45	0.99	FALSE	7	Mar: 2745, Wid: 1702, Div: 691, Nev: 209
Newsample_final	45	0.99	FALSE	2	Yes: 2847, Mis: 2736, No: 0
dementia_class	705	0.87	FALSE	4	No : 4506, Pos: 202, Mis: 130, Pro: 85

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
memory_impaired	705	0.87	0.11	FAL: 4384, TRU: 539
orientation_impaired	705	0.87	0.07	FAL: 4603, TRU: 320
exec_impaired	705	0.87	0.04	FAL: 4741, TRU: 182

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
spid	0	1.00	15095468.64	4998327.09	1e+07	10006283	20000129	20003671	20007119	▇▁▁▁▇
is6resptype	0	1.00	1.00	0.00	1e+00	1	1	1	1	▁▁▇▁▁
r6dresid	0	1.00	1.00	0.00	1e+00	1	1	1	1	▁▁▇▁▁
HouseholdSize_final	45	0.99	1.99	1.06	1e+00	1	2	2	11	▇▁▁▁▁
HouseholdIncome_final	2317	0.59	67394.31	477048.95	0e+00	19000	35000	67000	25000000	▇▁▁▁▁
is7resptype	705	0.87	1.01	0.12	1e+00	1	1	1	2	▇▁▁▁▁
r7dresid	705	0.87	1.00	0.00	1e+00	1	1	1	1	▁▁▇▁▁
DementiaDx	705	0.87	2.16	0.95	-8e+00	2	2	2	7	▁▁▁▇▁
MemoryImmediate	705	0.87	4.74	2.06	-7e+00	4	5	6	10	▁▁▂▇▂
MemoryDelayed	705	0.87	3.45	2.31	-7e+00	2	4	5	9	▁▁▅▇▂
cg7todaydat1	705	0.87	1.04	0.28	-1e+00	1	1	1	2	▁▁▁▇▁
cg7todaydat2	705	0.87	1.23	0.46	-1e+00	1	1	1	2	▁▁▁▇▂
cg7todaydat3	705	0.87	1.05	0.29	-1e+00	1	1	1	2	▁▁▁▇▁
cg7todaydat4	705	0.87	1.04	0.27	-1e+00	1	1	1	2	▁▁▁▇▁
cg7presidna1	705	0.87	1.04	0.41	-7e+00	1	1	1	2	▁▁▁▁▇
cg7presidna3	705	0.87	1.14	0.50	-7e+00	1	1	1	2	▁▁▁▁▇
cg7vpname1	705	0.87	1.38	0.57	-7e+00	1	1	2	2	▁▁▁▁▇
cg7vpname3	705	0.87	1.69	0.56	-7e+00	1	2	2	2	▁▁▁▁▇
ExecutiveDraw	705	0.87	3.65	1.61	-9e+00	3	4	5	5	▁▁▁▁▇
memory_score	705	0.87	8.32	3.67	0e+00	6	9	11	19	▂▆▇▃▁
cg7todaydat1_rec	734	0.87	0.06	0.23	0e+00	0	0	0	1	▇▁▁▁▁
cg7todaydat2_rec	734	0.87	0.24	0.43	0e+00	0	0	0	1	▇▁▁▁▂
cg7todaydat3_rec	734	0.87	0.06	0.24	0e+00	0	0	0	1	▇▁▁▁▁
cg7todaydat4_rec	734	0.87	0.05	0.23	0e+00	0	0	0	1	▇▁▁▁▁
cg7presidna1_rec	752	0.87	0.07	0.25	0e+00	0	0	0	1	▇▁▁▁▁
cg7presidna3_rec	752	0.87	0.17	0.37	0e+00	0	0	0	1	▇▁▁▁▂
cg7vpname1_rec	750	0.87	0.40	0.49	0e+00	0	0	1	1	▇▁▁▁▆
cg7vpname3_rec	750	0.87	0.72	0.45	0e+00	0	1	1	1	▃▁▁▁▇
orientation_score	705	0.87	1.76	1.63	0e+00	1	1	2	8	▇▆▁▁▁
impaired_domains	705	0.87	0.21	0.54	0e+00	0	0	0	3	▇▁▁▁▁

What I saw in FullData:

705 missing values for many Round 7 variables (e.g., dementia_class, MemoryImmediate, MemoryDelayed):
These are people from Round 6 who never had a Round 7 interview at all (not surveyed, not alive, or otherwise not present in R7).
Because R7_SP_clean was left-joined to R6, those rows remain, but every R7 variable is NA.
45 missing values for many R1/R5-derived covariates (e.g., Education_final, Occupation_final):
These are participants in R6 who weren’t present in R1 or R5.
The left_join() kept them, but all R1/R5 variables stayed blank (NA).
2,317 missing values for HouseholdIncome_final: *******************I NEED TO FIX THIS *************************** This isn’t from missing interviews — it’s from NHATS survey coding.
Some participants refused to answer, didn’t know, or weren’t asked, and those responses are collapsed to NA.
FullData is designed this way — it keeps everyone from Round 6 so nothing is lost prematurely, which means we expect a lot of structural missingness from other rounds.
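
The join behavior described above can be reproduced on a toy example. This is a minimal sketch, not the study code: the data frames r6 and r7 and their values are made up for illustration, and base R's merge(all.x = TRUE) stands in for left_join().

```r
# Toy stand-ins for the Round 6 anchor file and the Round 7 file (hypothetical values)
r6 <- data.frame(
  spid = c(1, 2, 3, 4),
  FinancialStrainFlag = c("No Strain", "Any Strain", "No Strain", "No Strain")
)
r7 <- data.frame(
  spid = c(1, 3),
  MemoryImmediate = c(5, 4),
  MemoryDelayed = c(4, 2)
)

# Keep every Round 6 row; Round 7 columns are NA where no Round 7 record exists
full_toy <- merge(r6, r7, by = "spid", all.x = TRUE)

# Missing values per column, the same quantity skim() reports as n_missing
n_missing <- colSums(is.na(full_toy))
print(n_missing)
```

Participants 2 and 4 have no Round 7 record, so every Round 7 column is NA for them while the Round 6 exposure stays complete.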

Now let’s have a look at the CleanData set

skim(CleanData)

Data summary
Name	CleanData
Number of rows	4756
Number of columns	57
_______________________
Column type frequency:
factor	24
logical	3
numeric	30
________________________
Group variables	None

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
SkippedMeals	0	1.00	FALSE	2	No: 4709, Yes: 47, Ina: 0, Ref: 0
UnableToPayRent	0	1.00	FALSE	2	No: 4663, Yes: 93, Ina: 0, Ref: 0
UnableToPayUtilities	0	1.00	FALSE	3	No: 4601, Yes: 154, Ref: 1, Ina: 0
UnableToPayMedical	0	1.00	FALSE	2	No: 4604, Yes: 152, Ina: 0, Ref: 0
FinancialStrainFlag	0	1.00	FALSE	2	No : 4469, Any: 287, Mis: 0
Responder_final	30	0.99	FALSE	1	Sam: 4726, Pro: 0
Residential_final	30	0.99	FALSE	1	Hom: 4726, Ret: 0, Ass: 0, Nur: 0
Age_final	30	0.99	FALSE	6	70–: 1305, 75–: 1102, 80–: 842, 65–: 754
Gender_final	30	0.99	FALSE	2	Fem: 2701, Mal: 2025
Education_final	30	0.99	FALSE	10	HS : 1208, Som: 697, Bac: 636, Mas: 636
Occupation_final	30	0.99	FALSE	8	Not: 2314, Man: 872, Sal: 463, Pro: 384
HomeOwnership_final	30	0.99	FALSE	4	Own: 3597, Ren: 712, Oth: 342, Mis: 75
RetirementStatus_final	30	0.99	FALSE	4	Ret: 2218, No: 1749, Yes: 686, Mis: 73
Section8_final	30	0.99	FALSE	3	No: 4503, Yes: 215, Mis: 8
Medicaid_final	30	0.99	FALSE	3	No: 4050, Yes: 576, Mis: 100
FoodAssist1_final	30	0.99	FALSE	3	No: 4254, Yes: 392, Mis: 80
FoodAssist2_final	30	0.99	FALSE	3	No: 4530, Yes: 115, Mis: 81
FoodAssist3_final	30	0.99	FALSE	3	No: 4347, Yes: 296, Mis: 83
ChildhoodHealth_final	30	0.99	FALSE	6	Exc: 2296, Ver: 1266, Goo: 768, Fai: 227
RaceEthnicity_final	30	0.99	FALSE	6	1: 3303, 2: 987, 4: 255, 3: 109
NewPerson_final	30	0.99	FALSE	2	No: 2412, Yes: 2314, Mis: 0
MaritalStatus_final	30	0.99	FALSE	6	Mar: 2364, Wid: 1387, Div: 588, Nev: 189
Newsample_final	30	0.99	FALSE	2	Yes: 2412, Mis: 2314, No: 0
dementia_class	0	1.00	FALSE	3	No : 4473, Pos: 199, Pro: 84, Mis: 0

Variable type: logical

skim_variable	complete_rate	mean	count
memory_impaired	1	0.10	FAL: 4261, TRU: 495
orientation_impaired	1	0.06	FAL: 4448, TRU: 308
exec_impaired	1	0.04	FAL: 4581, TRU: 175

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
spid	0	1.00	15103687.19	4998232.90	1e+07	10006363.2	20000129	20003683	20007119	▇▁▁▁▇
is6resptype	0	1.00	1.00	0.00	1e+00	1.0	1	1	1	▁▁▇▁▁
r6dresid	0	1.00	1.00	0.00	1e+00	1.0	1	1	1	▁▁▇▁▁
HouseholdSize_final	30	0.99	1.98	1.05	1e+00	1.0	2	2	11	▇▁▁▁▁
HouseholdIncome_final	1885	0.60	70141.54	511528.88	0e+00	19638.5	36000	70000	25000000	▇▁▁▁▁
is7resptype	0	1.00	1.01	0.10	1e+00	1.0	1	1	2	▇▁▁▁▁
r7dresid	0	1.00	1.00	0.00	1e+00	1.0	1	1	1	▁▁▇▁▁
DementiaDx	0	1.00	2.03	0.50	1e+00	2.0	2	2	7	▇▁▁▁▁
MemoryImmediate	0	1.00	4.79	2.01	-7e+00	4.0	5	6	10	▁▁▂▇▂
MemoryDelayed	0	1.00	3.50	2.29	-7e+00	2.0	4	5	9	▁▁▅▇▂
cg7todaydat1	0	1.00	1.04	0.26	-1e+00	1.0	1	1	2	▁▁▁▇▁
cg7todaydat2	0	1.00	1.23	0.45	-1e+00	1.0	1	1	2	▁▁▁▇▂
cg7todaydat3	0	1.00	1.05	0.28	-1e+00	1.0	1	1	2	▁▁▁▇▁
cg7todaydat4	0	1.00	1.04	0.26	-1e+00	1.0	1	1	2	▁▁▁▇▁
cg7presidna1	0	1.00	1.05	0.37	-7e+00	1.0	1	1	2	▁▁▁▁▇
cg7presidna3	0	1.00	1.15	0.47	-7e+00	1.0	1	1	2	▁▁▁▁▇
cg7vpname1	0	1.00	1.38	0.55	-7e+00	1.0	1	2	2	▁▁▁▁▇
cg7vpname3	0	1.00	1.69	0.52	-7e+00	1.0	2	2	2	▁▁▁▁▇
ExecutiveDraw	0	1.00	3.70	1.48	-9e+00	3.0	4	5	5	▁▁▁▁▇
memory_score	0	1.00	8.41	3.64	0e+00	6.0	9	11	19	▂▅▇▃▁
cg7todaydat1_rec	22	1.00	0.05	0.22	0e+00	0.0	0	0	1	▇▁▁▁▁
cg7todaydat2_rec	22	1.00	0.24	0.43	0e+00	0.0	0	0	1	▇▁▁▁▂
cg7todaydat3_rec	22	1.00	0.06	0.24	0e+00	0.0	0	0	1	▇▁▁▁▁
cg7todaydat4_rec	22	1.00	0.05	0.22	0e+00	0.0	0	0	1	▇▁▁▁▁
cg7presidna1_rec	29	0.99	0.07	0.25	0e+00	0.0	0	0	1	▇▁▁▁▁
cg7presidna3_rec	29	0.99	0.17	0.37	0e+00	0.0	0	0	1	▇▁▁▁▂
cg7vpname1_rec	27	0.99	0.40	0.49	0e+00	0.0	0	1	1	▇▁▁▁▅
cg7vpname3_rec	27	0.99	0.71	0.45	0e+00	0.0	1	1	1	▃▁▁▁▇
orientation_score	0	1.00	1.73	1.62	0e+00	1.0	1	2	8	▇▆▁▁▁
impaired_domains	0	1.00	0.21	0.54	0e+00	0.0	0	0	3	▇▁▁▁▁

What I saw in CleanData:

All participants missing the exposure (FinancialStrainFlag) or outcome (dementia_class) were removed.
That’s why there are 0 missing for those two key variables here.
Only 30 missing values remain for R1/R5-derived covariates:
These are likely the same participants who had no data in R1 or R5.
They stayed because they had R6 exposure and R7 outcome data, but we couldn’t backfill their covariates.
22–29 missing values on the _rec cognition items (e.g., cg7todaydat1_rec):
These are mostly due to item-level missingness in R7.
Some respondents skipped or refused certain test items, and NHATS coded those as special missing values (e.g., -7, -8), which I collapsed to NA.
1,885 missing for HouseholdIncome_final (flagged: I still need to fix this): Same story as in FullData; this is a survey artifact (refusals, “don’t know,” or skipped questions).
CleanData dramatically reduced missingness in the key analytic variables (exposure and outcome) but kept “real-world” missingness for some covariates and individual test items. This is expected in a complex survey merge and doesn’t need to be over-cleaned — it’s just important to explain.
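
To make the two kinds of missingness concrete, here is a minimal sketch on toy data (hypothetical values, not the study data). It drops rows whose exposure or outcome is "Missing" and recodes one cognition item. The recode rule (2 becomes 1, 1 becomes 0, negative codes become NA) is my assumption based on the skim output, not a confirmed copy of the original recode.

```r
# Toy data with the same level names that appear in the skim output (hypothetical rows)
toy <- data.frame(
  FinancialStrainFlag = factor(
    c("No Strain", "Any Strain", "Missing", "No Strain", "No Strain"),
    levels = c("No Strain", "Any Strain", "Missing")
  ),
  dementia_class = factor(
    c("No Dementia", "Possible Dementia", "No Dementia", "Missing", "No Dementia"),
    levels = c("No Dementia", "Possible Dementia", "Probable Dementia", "Missing")
  ),
  cg7todaydat1 = c(1, 2, 1, 1, -7)
)

# Keep only rows with a known exposure AND a known outcome
keep <- toy$FinancialStrainFlag != "Missing" & toy$dementia_class != "Missing"
clean_toy <- toy[keep, ]

# Assumed recode: negative special codes -> NA, 2 -> 1, 1 -> 0
clean_toy$cg7todaydat1_rec <- ifelse(
  clean_toy$cg7todaydat1 < 0, NA, as.integer(clean_toy$cg7todaydat1 == 2)
)

# The "Missing" level is still defined but now has a count of zero
print(table(clean_toy$FinancialStrainFlag))
print(sum(is.na(clean_toy$cg7todaydat1_rec)))
```

The last toy row survives the filter because its exposure and outcome are known, yet its recoded item is NA. That is the same pattern as the 22–29 item-level gaps in CleanData.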

📊 Building Table 1

To summarize the characteristics of my dataset, I created a Table 1 for CleanData using the arsenal package.

I wanted this table to compare participants by Financial Strain Flag across key demographic and health variables. Before building the table, I:

Labeled the variables in CleanData so the table would display readable column names instead of raw variable names.
Specified which covariates should appear as factors (e.g., Gender_final, Education_final) and which should remain numeric (e.g., HouseholdSize_final, HouseholdIncome_final).
Chose summary statistics for numeric variables: I displayed the median rather than the mean, alongside the standard deviation, since income and household size are often skewed.

Finally, I ran tableby() to generate the table, and then used summary() to print it as text.
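
A short illustration of why the choice of summary statistic matters for skewed income. This is a sketch on made-up values (the vector income is hypothetical, with one extreme value mimicking the 25,000,000 maximum seen in the skim output), using only base R.

```r
# Hypothetical incomes; the last value mimics a single extreme report
income <- c(12000, 15000, 20000, 36000, 40000, 70000, 25000000)

income_median <- median(income)
income_sd <- sd(income)
income_q1 <- unname(quantile(income, 0.25))
income_q3 <- unname(quantile(income, 0.75))

# The median and quartiles barely notice the extreme value; the SD is dominated by it
print(c(median = income_median, q1 = income_q1, q3 = income_q3, sd = income_sd))
```

The standard deviation responds to a single extreme value in a way the median and quartiles do not, so quartiles may pair more naturally with the median. If I read the arsenal documentation correctly, numeric.stats also accepts "q1q3", but I have not changed the table below.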

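## 📊 Building Table 1 with Arsenal (Hmisc Label Method)

# Load the packages used below: label() comes from Hmisc, tableby() from arsenal
library(Hmisc)
library(arsenal)

# Step 1: Assign pretty labels to the variables used in Table 1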
label(CleanData$FinancialStrainFlag)    <- "Financial Strain Flag"
label(CleanData$dementia_class)         <- "Dementia Classification"
label(CleanData$Age_final)              <- "Age Group"
label(CleanData$Gender_final)           <- "Gender"
label(CleanData$Education_final)        <- "Education Level"
label(CleanData$Occupation_final)       <- "Occupation Category"
label(CleanData$HomeOwnership_final)    <- "Home Ownership"
label(CleanData$HouseholdSize_final)    <- "Household Size"
label(CleanData$HouseholdIncome_final)  <- "Household Income"
label(CleanData$Section8_final)         <- "Receives Section 8"
label(CleanData$Medicaid_final)         <- "Receives Medicaid"
label(CleanData$ChildhoodHealth_final)  <- "Childhood Health Status"
label(CleanData$RaceEthnicity_final)    <- "Race and Ethnicity"
label(CleanData$MaritalStatus_final)    <- "Marital Status"

# Step 2: Build Table 1 comparing participants by Financial Strain Flag
tab1 <- tableby(
  FinancialStrainFlag ~
    dementia_class +
    Age_final +
    Gender_final +
    Education_final +
    Occupation_final +
    HomeOwnership_final +
    HouseholdSize_final +
    HouseholdIncome_final +
    Section8_final +
    Medicaid_final +
    ChildhoodHealth_final +
    RaceEthnicity_final +
    MaritalStatus_final,
  data = CleanData,
  numeric.stats = c("median", "sd")
)

# Step 3: Output summary (labels will now appear automatically)
summary(tab1, text = TRUE)

	No Strain (N=4469)	Any Strain (N=287)	Total (N=4756)	p value
Dementia Classification
- No Dementia	4218 (94.4%)	255 (88.9%)	4473 (94.0%)
- Possible Dementia	174 (3.9%)	25 (8.7%)	199 (4.2%)
- Probable Dementia	77 (1.7%)	7 (2.4%)	84 (1.8%)
- Missing	0 (0.0%)	0 (0.0%)	0 (0.0%)
Age Group
- N-Miss	27	3	30
- 65–69	691 (15.6%)	63 (22.2%)	754 (16.0%)
- 70–74	1209 (27.2%)	96 (33.8%)	1305 (27.6%)
- 75–79	1035 (23.3%)	67 (23.6%)	1102 (23.3%)
- 80–84	803 (18.1%)	39 (13.7%)	842 (17.8%)
- 85–89	495 (11.1%)	15 (5.3%)	510 (10.8%)
- 90+	209 (4.7%)	4 (1.4%)	213 (4.5%)
- Missing	0 (0.0%)	0 (0.0%)	0 (0.0%)
Gender				0.039
- N-Miss	27	3	30
- Male	1920 (43.2%)	105 (37.0%)	2025 (42.8%)
- Female	2522 (56.8%)	179 (63.0%)	2701 (57.2%)
Education Level				< 0.001
- N-Miss	27	3	30
- No schooling	16 (0.4%)	2 (0.7%)	18 (0.4%)
- 1–8th grade	304 (6.8%)	49 (17.3%)	353 (7.5%)
- 9–12 (no diploma)	455 (10.2%)	55 (19.4%)	510 (10.8%)
- HS grad	1139 (25.6%)	69 (24.3%)	1208 (25.6%)
- Vocational	320 (7.2%)	18 (6.3%)	338 (7.2%)
- Some college	665 (15.0%)	32 (11.3%)	697 (14.7%)
- Associate	224 (5.0%)	19 (6.7%)	243 (5.1%)
- Bachelor	613 (13.8%)	23 (8.1%)	636 (13.5%)
- Master/PhD	623 (14.0%)	13 (4.6%)	636 (13.5%)
- Missing	83 (1.9%)	4 (1.4%)	87 (1.8%)
Occupation Category				< 0.001
- N-Miss	27	3	30
- Management/Professional	832 (18.7%)	40 (14.1%)	872 (18.5%)
- Service	175 (3.9%)	27 (9.5%)	202 (4.3%)
- Sales/Office	433 (9.7%)	30 (10.6%)	463 (9.8%)
- Construction/Farming	210 (4.7%)	16 (5.6%)	226 (4.8%)
- Production	352 (7.9%)	32 (11.3%)	384 (8.1%)
- Homemaker	127 (2.9%)	7 (2.5%)	134 (2.8%)
- Not working/retired	2190 (49.3%)	124 (43.7%)	2314 (49.0%)
- Missing	123 (2.8%)	8 (2.8%)	131 (2.8%)
Home Ownership				< 0.001
- N-Miss	27	3	30
- Own	3448 (77.6%)	149 (52.5%)	3597 (76.1%)
- Rent	605 (13.6%)	107 (37.7%)	712 (15.1%)
- Other	319 (7.2%)	23 (8.1%)	342 (7.2%)
- Missing	70 (1.6%)	5 (1.8%)	75 (1.6%)
Household Size				0.077
- Median	2.000	2.000	2.000
- SD	1.030	1.365	1.053
Household Income				0.188
- Median	40000.000	15000.000	36000.000
- SD	528553.118	26876.641	511528.883
Receives Section 8				< 0.001
- N-Miss	27	3	30
- Yes	180 (4.1%)	35 (12.3%)	215 (4.5%)
- No	4256 (95.8%)	247 (87.0%)	4503 (95.3%)
- Missing	6 (0.1%)	2 (0.7%)	8 (0.2%)
Receives Medicaid				< 0.001
- N-Miss	27	3	30
- Yes	488 (11.0%)	88 (31.0%)	576 (12.2%)
- No	3864 (87.0%)	186 (65.5%)	4050 (85.7%)
- Missing	90 (2.0%)	10 (3.5%)	100 (2.1%)
Childhood Health Status				< 0.001
- N-Miss	27	3	30
- Excellent	2181 (49.1%)	115 (40.5%)	2296 (48.6%)
- Very good	1193 (26.9%)	73 (25.7%)	1266 (26.8%)
- Good	708 (15.9%)	60 (21.1%)	768 (16.3%)
- Fair	207 (4.7%)	20 (7.0%)	227 (4.8%)
- Poor	72 (1.6%)	14 (4.9%)	86 (1.8%)
- Missing	81 (1.8%)	2 (0.7%)	83 (1.8%)
Race and Ethnicity				< 0.001
- N-Miss	27	3	30
- 1	3209 (72.2%)	94 (33.1%)	3303 (69.9%)
- 2	851 (19.2%)	136 (47.9%)	987 (20.9%)
- 3	94 (2.1%)	15 (5.3%)	109 (2.3%)
- 4	223 (5.0%)	32 (11.3%)	255 (5.4%)
- 5	3 (0.1%)	1 (0.4%)	4 (0.1%)
- 6	62 (1.4%)	6 (2.1%)	68 (1.4%)
Marital Status
- N-Miss	27	3	30
- Married	2275 (51.2%)	89 (31.3%)	2364 (50.0%)
- Living with Partner	104 (2.3%)	9 (3.2%)	113 (2.4%)
- Separated	69 (1.6%)	16 (5.6%)	85 (1.8%)
- Divorced	529 (11.9%)	59 (20.8%)	588 (12.4%)
- Widowed	1297 (29.2%)	90 (31.7%)	1387 (29.3%)
- Never married	168 (3.8%)	21 (7.4%)	189 (4.0%)
- Missing	0 (0.0%)	0 (0.0%)	0 (0.0%)
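
In the table above, the three variables with a blank p value (Dementia Classification, Age Group, Marital Status) are also the ones with an all-zero "Missing" row. I have not confirmed that this is the cause inside arsenal. The sketch below uses base R's chisq.test() on the Dementia Classification counts copied from the table, and shows that an empty row alone is enough to make the chi-square p value undefined.

```r
# Counts copied from the Dementia Classification rows of Table 1
dementia_counts <- matrix(
  c(4218, 255,
     174,  25,
      77,   7,
       0,   0),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    dementia_class = c("No Dementia", "Possible Dementia", "Probable Dementia", "Missing"),
    strain = c("No Strain", "Any Strain")
  )
)

# With the empty "Missing" row, expected counts are 0 and the statistic is 0/0
p_with_empty <- suppressWarnings(chisq.test(dementia_counts)$p.value)

# Dropping the empty row (what droplevels() would do to the factor) gives a defined test
nonempty <- dementia_counts[rowSums(dementia_counts) > 0, ]
p_without_empty <- chisq.test(nonempty)$p.value

print(is.nan(p_with_empty))
print(is.finite(p_without_empty))
```

If this is indeed the reason, applying droplevels() to dementia_class, Age_final, and MaritalStatus_final before calling tableby() would be one way to get p values for those rows.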