This report details the process of constructing an analytic dataset from the National Health and Aging Trends Study (NHATS) to examine the relationship between financial strain and dementia classification. Data were drawn from Rounds 1, 5, 6, and 7 of NHATS and harmonized to build a clean, merged dataset suitable for analysis. We restricted the cohort to sample-person respondents living in the community, excluding those in nursing homes, assisted living facilities, or other institutions.
Round 6 served as the anchor wave because it contained the study's primary exposure variable, Financial Strain, derived from four items about whether participants skipped meals or struggled to pay for rent, utilities, or medical care. Participants were coded as Any Strain if they answered "Yes" to any item, No Strain only if they answered "No" to all four items, and Missing if any response was refused, unknown, or inapplicable and no item was answered "Yes."
Round 7 provided the study's primary outcome, dementia classification, combining clinician diagnosis with cognitive testing. Participants were classified as Probable Dementia if a diagnosis was confirmed, Possible Dementia if there was no diagnosis but ≥2 cognitive domains (memory, orientation, executive function) were impaired, No Dementia if dementia was ruled out and fewer than two domains were impaired, or Missing if classification was not possible.
To support covariate completeness, demographic and socioeconomic data were merged from Rounds 1 and 5, with Round 5 values prioritized and Round 1 used to backfill missing values. The resulting FullData dataset retained all Round 6 participants, preserving expected structural missingness from other waves. From this, an analysis dataset called CleanData was created by excluding anyone missing either the exposure or outcome, while retaining explicit "Missing" categories for other covariates to ensure transparency in descriptive analyses and reporting.
# --- Load Libraries ---
library(tidyverse) # Data wrangling
library(haven) # Read SAS files
library(janitor) # Clean variable names
library(skimr) # Data inspection
library(arsenal) # For beautiful Table 1s
library(Hmisc) # for label()
Before I started cleaning NHATS data, I defined a few helper objects and functions to standardize how I handled missing data codes and factors across the dataset.
- Defined the special missing codes used across NHATS (for example, -1 and -7) and their labels.
- Wrote a helper, safe_collapse(), that will safely collapse all the special missing codes into one single "Missing" level for any factor variable.
# Missing data codes used across NHATS
special_missing_levels <- c("-1", "-7", "-8", "-9")
special_missing_labels <- c("Inapplicable", "Refused", "Don't know", "Missing")
# Common Yes/No structure across NHATS
yes_no_levels <- c("1", "2", special_missing_levels)
yes_no_labels <- c("Yes", "No", special_missing_labels)
# Convert special missing codes to numeric for case_when logic
special_missing_numeric <- as.numeric(special_missing_levels)
# Helper: collapse special missing levels into a single "Missing" factor level.
safe_collapse <- function(x, missing_levels = special_missing_labels) {
  if (!is.factor(x)) {
    return(x) # Do not attempt to collapse if it's not a factor
  }
  valid <- intersect(levels(x), missing_levels)
  if (length(valid) > 0) {
    x |> fct_collapse(Missing = valid) |> fct_explicit_na("Missing")
  } else {
    x |> fct_explicit_na("Missing")
  }
}
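To see what the helper is meant to do, here is a small base R illustration on a toy factor. It is a sketch of the same idea, not the helper itself: collapse_missing_base() and the toy values are hypothetical and introduced only for this example. It relies on the base R behaviour that giving several levels the same name merges them.

```r
# NHATS special-missing labels that should end up in one "Missing" level
missing_labels <- c("Inapplicable", "Refused", "Don't know", "Missing")

# Hypothetical base R stand-in for safe_collapse()
collapse_missing_base <- function(x, missing_levels = missing_labels) {
  if (!is.factor(x)) {
    return(x)  # leave non-factors untouched
  }
  # Renaming several levels to the same name merges them
  levels(x)[levels(x) %in% missing_levels] <- "Missing"
  # Turn true NA values into the same explicit level
  if (anyNA(x)) {
    x <- addNA(x)
    levels(x)[is.na(levels(x))] <- "Missing"
  }
  x
}

# Toy factor: made-up responses, not NHATS records
toy <- factor(
  c("Yes", "No", "Refused", "Don't know", NA),
  levels = c("Yes", "No", "Inapplicable", "Refused", "Don't know", "Missing")
)
collapsed <- collapse_missing_base(toy)
table(collapsed)
```

The refused, don't-know, and NA responses all land in the single "Missing" level, while "Yes" and "No" are left alone.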
At this stage of the project, I was focused on cleaning up Round 1 (R1) and Round 5 (R5), which would ultimately provide the covariates for my analysis.
R1's role here wasn't to stand alone; it was meant to supplement R5 where R5 was incomplete.
Specifically, for R1, I:
- Imported the Round 1 SP file from NHATS.
- Selected only the key variables I knew I would need for covariates.
- Filtered down to sample-person respondents living at home, because those are the cases relevant for my analysis.
- Renamed variables to have clearer, more intuitive names for later merging.
- Converted household size and income to numeric values.
- Simplified occupation codes into 6 buckets and converted several fields into factors.
- Recoded -1 (inapplicable) values of the Section 8 housing variable as "No", since they arise from skip logic, to reduce missing data for this variable.
R1_SP <- read_sas("NHATS_Round_1_SP_File.sas7bdat")
R1_SP_clean <- R1_SP |>
# Selecting the variables of interest
select(
spid, is1resptype, r1dresid, r1d2intvrage, r1dgender, el1higstschl,
lf1doccpctgy, lf1occupaton, hp1ownrentot, lf1workfpay, hp1sec8pubsn,
ip1cmedicaid, ew1progneed1, ew1progneed2, ew1progneed3, el1hlthchild,
ia1totinc, hh1dhshldnum, rl1dracehisp
) |>
# Filtering to sample-person responders living at home
filter(is1resptype == 1 & r1dresid == 1) |>
# Renaming variables
rename(
Responder1 = is1resptype,
Residential1 = r1dresid,
Age1 = r1d2intvrage,
Gender1 = r1dgender,
Education1 = el1higstschl,
Occupation1 = lf1doccpctgy,
EverWorked1 = lf1occupaton,
HomeOwnership1 = hp1ownrentot,
HouseholdSize1 = hh1dhshldnum,
HouseholdIncome1 = ia1totinc,
RetirementStatus1 = lf1workfpay,
Section81 = hp1sec8pubsn,
Medicaid1 = ip1cmedicaid,
FoodAssist11 = ew1progneed1,
FoodAssist21 = ew1progneed2,
FoodAssist31 = ew1progneed3,
ChildhoodHealth1 = el1hlthchild,
RaceEthnicity1 = rl1dracehisp
) |>
mutate(
# changing household size and income to numeric variables
HouseholdSize1 = case_when(
as.character(HouseholdSize1) %in% special_missing_levels ~ NA_real_,
TRUE ~ as.numeric(as.character(HouseholdSize1)) # Ensure conversion from potentially factor/character
),
HouseholdIncome1 = case_when(
as.character(HouseholdIncome1) %in% special_missing_levels ~ NA_real_,
TRUE ~ as.numeric(as.character(HouseholdIncome1)) # Ensure conversion from potentially factor/character
),
# Simplifying occupation codes into 6 buckets
Occupation1 = case_when(
Occupation1 %in% 1:11 ~ "1", # Management / Professional
Occupation1 %in% 12:15 ~ "2", # Service
Occupation1 %in% 16:17 ~ "3", # Sales / Office
Occupation1 %in% 18:20 ~ "4", # Construction / Farming
Occupation1 %in% 21:23 ~ "5", # Production
EverWorked1 %in% c(2,3) ~ "6", # Never worked / Homemaker
TRUE ~ NA_character_
),
# Converting variables to labeled factors
Responder1 = factor(Responder1, c("1","2"), c("Sample_Person","Proxy")),
Residential1 = factor(Residential1, c("1","2","3","4","5", special_missing_levels),
c("Home/apartment","Retirement community","Assisted living","Nursing home","Other institution", special_missing_labels)),
Age1 = factor(Age1, c("1","2","3","4","5","6", special_missing_levels),
c("65–69","70–74","75–79","80–84","85–89","90+", special_missing_labels)),
Gender1 = factor(Gender1, c("1","2"), c("Male","Female")),
Education1 = factor(Education1, c("1","2","3","4","5","6","7","8","9", special_missing_levels),
c("No schooling","1–8th grade","9–12 (no diploma)","HS grad",
"Vocational","Some college","Associate","Bachelor","Master/PhD", special_missing_labels)),
Occupation1 = factor(Occupation1, c("1","2","3","4","5","6"),
c("Management/Professional","Service","Sales/Office",
"Construction/Farming","Production","Homemaker")),
HomeOwnership1 = factor(HomeOwnership1, c("1","2","3", special_missing_levels),
c("Own","Rent","Other", special_missing_labels)),
RetirementStatus1 = factor(RetirementStatus1, c("1","2","3", special_missing_levels),
c("Yes","No","Retired", special_missing_labels)),
Section81 = factor(Section81, yes_no_levels, yes_no_labels),
Medicaid1 = factor(Medicaid1, yes_no_levels, yes_no_labels),
FoodAssist11 = factor(FoodAssist11, yes_no_levels, yes_no_labels),
FoodAssist21 = factor(FoodAssist21, yes_no_levels, yes_no_labels),
FoodAssist31 = factor(FoodAssist31, yes_no_levels, yes_no_labels),
ChildhoodHealth1 = factor(ChildhoodHealth1, c("1","2","3","4","5", special_missing_levels),
c("Excellent","Very good","Good","Fair","Poor", special_missing_labels)),
RaceEthnicity1 = factor(RaceEthnicity1)
) |>
# Handling -1s "inapplicable" as "no" for Section 8 housing
mutate(Section81 = fct_recode(Section81, "No" = "Inapplicable"))
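The numeric conversion for household size and income repeats the same case_when() pattern for each column. One way to factor that out is a small helper; the sketch below is only an illustration, and recode_special_na() is a hypothetical name that does not appear in the original code.

```r
# Hypothetical helper: convert a column to numeric and set the NHATS
# special missing codes to NA
recode_special_na <- function(x, codes = c(-1, -7, -8, -9)) {
  x <- as.numeric(as.character(x))  # also handles factor or character input
  x[x %in% codes] <- NA_real_
  x
}

# Toy inputs: made-up values, not NHATS records
recode_special_na(c(3, -7, 45000, -1))
recode_special_na(factor(c("2", "-8")))
```

Inside the pipeline this could be applied with something like mutate(across(c(HouseholdSize1, HouseholdIncome1), recode_special_na)), assuming the columns behave like plain numeric, character, or factor vectors.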
After cleaning up Round 1 (R1), I shifted my focus to Round 5 (R5). This was the core dataset I relied on for the covariates used for analysis, since R5 represents a more current snapshot of the NHATS cohort of interest.
I used R1 mainly as a supplement, but R5 needed a full clean on its own first.
Here's what I did for R5 (mirroring R1):
- Imported the Round 5 SP file.
- Selected the key variables that would ultimately serve as covariates in my analysis.
- Filtered the dataset to include only sample-person respondents living at home (just as I did for R1).
- Renamed variables for consistency and readability when merging across rounds.
- Converted household size and income into numeric format to handle NHATS' special missing codes.
- Collapsed the occupation categories into simplified buckets and built out factors for categorical fields like education, marital status, and Section 8 status.
- Ensured everything was factorized and labeled consistently for later merging.
- Recoded -1 (inapplicable) values of the Section 8 housing variable as "No", since they arise from skip logic, to reduce missing data for this variable.
R5_SP <- read_sas("NHATS_Round_5_SP_File_v2.sas7bdat")
R5_SP_clean <- R5_SP |>
# Selecting the variables of interest
select(
spid, is5resptype, r5dresid, r5dcontnew, r5d2intvrage, r5dgender, el5higstschl,
lf5doccpctgy, lf5occupaton, hp5ownrentot, lf5workfpay, hp5sec8pubsn,
ip5cmedicaid, ew5progneed1, ew5progneed2, ew5progneed3, el5hlthchild,
hh5dmarstat, ia5totinc, hh5dhshldnum, fl5newsample, rl5dracehisp
) |>
# Filtering to sample-person responders living at home
filter(is5resptype == 1 & r5dresid == 1) |>
# Renaming variables
rename(
Responder5 = is5resptype,
Residential5 = r5dresid,
NewPerson = r5dcontnew,
Age = r5d2intvrage,
Gender = r5dgender,
Education = el5higstschl,
Occupation = lf5doccpctgy,
EverWorked = lf5occupaton,
HomeOwnership = hp5ownrentot,
HouseholdSize = hh5dhshldnum,
HouseholdIncome = ia5totinc,
RetirementStatus = lf5workfpay,
Section8 = hp5sec8pubsn,
Medicaid = ip5cmedicaid,
FoodAssist1 = ew5progneed1,
FoodAssist2 = ew5progneed2,
FoodAssist3 = ew5progneed3,
ChildhoodHealth = el5hlthchild,
MaritalStatus = hh5dmarstat,
Newsample = fl5newsample,
RaceEthnicity = rl5dracehisp
) |>
mutate(
# changing household size and income to numeric variables
HouseholdSize = case_when(
as.character(HouseholdSize) %in% special_missing_levels ~ NA_real_,
TRUE ~ as.numeric(as.character(HouseholdSize))
),
HouseholdIncome = case_when(
as.character(HouseholdIncome) %in% special_missing_levels ~ NA_real_,
TRUE ~ as.numeric(as.character(HouseholdIncome))
),
# Simplifying occupation codes into 6 buckets
Occupation = case_when(
Occupation %in% 1:11 ~ "1",
Occupation %in% 12:15 ~ "2",
Occupation %in% 16:17 ~ "3",
Occupation %in% 18:20 ~ "4",
Occupation %in% 21:23 ~ "5",
EverWorked %in% c(2,3) ~ "6",
Occupation == -1 ~ "7",
TRUE ~ NA_character_
),
# Converting variables to labeled factors
Responder5 = factor(Responder5, c("1","2"), c("Sample_Person","Proxy")),
Residential5 = factor(Residential5, c("1","2","3","4","5", special_missing_levels),
c("Home/apartment","Retirement community","Assisted living","Nursing home","Other institution", special_missing_labels)),
NewPerson = factor(NewPerson, yes_no_levels, yes_no_labels),
Age = factor(Age, c("1","2","3","4","5","6", special_missing_levels),
c("65–69","70–74","75–79","80–84","85–89","90+", special_missing_labels)),
Gender = factor(Gender, c("1","2"), c("Male","Female")),
Education = factor(Education, c("1","2","3","4","5","6","7","8","9", special_missing_levels),
c("No schooling","1–8th grade","9–12 (no diploma)","HS grad",
"Vocational","Some college","Associate","Bachelor","Master/PhD", special_missing_labels)),
Occupation = factor(Occupation, c("1","2","3","4","5","6","7"),
c("Management/Professional","Service","Sales/Office","Construction/Farming","Production","Homemaker","Not working/retired")),
HomeOwnership = factor(HomeOwnership, c("1","2","3", special_missing_levels),
c("Own","Rent","Other", special_missing_labels)),
RetirementStatus = factor(RetirementStatus, c("1","2","3", special_missing_levels),
c("Yes","No","Retired", special_missing_labels)),
Section8 = factor(Section8, yes_no_levels, yes_no_labels),
Medicaid = factor(Medicaid, yes_no_levels, yes_no_labels),
FoodAssist1 = factor(FoodAssist1, yes_no_levels, yes_no_labels),
FoodAssist2 = factor(FoodAssist2, yes_no_levels, yes_no_labels),
FoodAssist3 = factor(FoodAssist3, yes_no_levels, yes_no_labels),
ChildhoodHealth = factor(ChildhoodHealth, c("1","2","3","4","5", special_missing_levels),
c("Excellent","Very good","Good","Fair","Poor", special_missing_labels)),
MaritalStatus = factor(MaritalStatus, c("1","2","3","4","5","6", special_missing_levels),
c("Married","Living with Partner","Separated","Divorced","Widowed","Never married", special_missing_labels)),
Newsample = factor(Newsample, yes_no_levels, yes_no_labels),
RaceEthnicity = factor(RaceEthnicity)
) |>
# Handling -1s "inapplicable" as "no" for Section 8 housing
mutate(Section8 = fct_recode(Section8, "No" = "Inapplicable"))
Once I had cleaned both R1 and R5, the next step was to merge them together.
Round 5 serves as the main covariate dataset, but I wanted to pull in information from Round 1 to fill in any gaps where R5 data were missing or coded as one of NHATS's special missing values (like -1, -7, -8, or -9).
Here's what I did:
- Started from the two cleaned datasets, R5_SP_clean and R1_SP_clean.
- Used left_join() to merge R1 into R5, keeping all Round 5 participants (since R5 is my primary round for covariates).
- Collapsed the special missing codes into a single "Missing" category for easier viewing.
# Merge Round 5 with Round 1
R5_merged <- R5_SP_clean |>
left_join(R1_SP_clean, by = "spid") |>
# Fill from R1 if R5 is special missing (-1, -7, -8, -9)
mutate(
Responder_final = if_else(Responder5 %in% special_missing_labels, Responder1, Responder5),
Residential_final = if_else(Residential5 %in% special_missing_labels, Residential1, Residential5),
Age_final = if_else(Age %in% special_missing_labels, Age1, Age),
Gender_final = if_else(Gender %in% special_missing_labels, Gender1, Gender),
Education_final = if_else(Education %in% special_missing_labels, Education1, Education),
Occupation_final = if_else(Occupation %in% special_missing_labels, Occupation1, Occupation),
HomeOwnership_final = if_else(HomeOwnership %in% special_missing_labels, HomeOwnership1, HomeOwnership),
HouseholdSize_final = if_else(HouseholdSize %in% special_missing_labels, HouseholdSize1, HouseholdSize),
HouseholdIncome_final = if_else(HouseholdIncome %in% special_missing_labels, HouseholdIncome1, HouseholdIncome),
RetirementStatus_final = if_else(RetirementStatus %in% special_missing_labels, RetirementStatus1, RetirementStatus),
Section8_final = if_else(Section8 %in% special_missing_labels, Section81, Section8),
Medicaid_final = if_else(Medicaid %in% special_missing_labels, Medicaid1, Medicaid),
FoodAssist1_final = if_else(FoodAssist1 %in% special_missing_labels, FoodAssist11, FoodAssist1),
FoodAssist2_final = if_else(FoodAssist2 %in% special_missing_labels, FoodAssist21, FoodAssist2),
FoodAssist3_final = if_else(FoodAssist3 %in% special_missing_labels, FoodAssist31, FoodAssist3),
ChildhoodHealth_final = if_else(ChildhoodHealth %in% special_missing_labels, ChildhoodHealth1, ChildhoodHealth),
RaceEthnicity_final = if_else(RaceEthnicity %in% special_missing_labels, RaceEthnicity1, RaceEthnicity),
# Include Round 5 only variables
NewPerson_final = NewPerson,
MaritalStatus_final = MaritalStatus,
Newsample_final = Newsample
)
# Collapse missing levels consistently into one "Missing" level
R5_merged <- R5_merged |>
mutate(across(ends_with("_final"), safe_collapse, .names = "{.col}"))
# Keep only final columns for downstream merges
R5_merged <- R5_merged |>
select(spid, ends_with("_final"))
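One detail of the fill rule is worth spelling out. HouseholdSize and HouseholdIncome were already converted to numeric with special codes set to NA, so comparing them with special_missing_labels can never be TRUE, and an NA in R5 stays NA. If backfilling those NAs from R1 is the intent, dplyr's coalesce() is one way to express it. The sketch below uses made-up toy values rather than NHATS records, and the label_test column exists only to show the behaviour.

```r
library(dplyr)

# Toy data: made-up values for illustration only
r5 <- tibble(spid = 1:4, HouseholdIncome = c(35000, NA, NA, 12000))
r1 <- tibble(spid = c(1L, 2L, 4L), HouseholdIncome1 = c(30000, 28000, NA))

merged <- r5 |>
  left_join(r1, by = "spid") |>
  mutate(
    # The label test on a numeric column is always FALSE, so nothing is filled
    label_test = HouseholdIncome %in%
      c("Inapplicable", "Refused", "Don't know", "Missing"),
    # coalesce() takes the first non-missing value, so R1 fills R5 gaps
    HouseholdIncome_final = coalesce(HouseholdIncome, HouseholdIncome1)
  )

merged
```

Participant 2 picks up the R1 value, while participant 3 stays NA because there is no R1 record to draw on. Whether an older R1 income is an acceptable substitute is an analytic decision this sketch does not settle.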
After merging R1 and R5, I moved on to Round 6 (R6).
This round provided the key exposure variable for my analysis: Financial Strain.
Here's how I approached cleaning R6:
- Renamed the financial strain items to clearer names (e.g., ew6mealskip1 became SkippedMeals).
R6_SP <- read_sas("NHATS_Round_6_SP_File_V2.sas7bdat")
R6_SP_clean <- R6_SP |>
# Selecting the variables of interest
select(spid, is6resptype, r6dresid,
ew6mealskip1, ew6nopayhous, ew6nopayutil, ew6nopaymed) |>
# Filtering to sample-person responders living at home
filter(is6resptype == 1 & r6dresid == 1) |>
# Renaming variables
rename(
SkippedMeals = ew6mealskip1,
UnableToPayRent = ew6nopayhous,
UnableToPayUtilities = ew6nopayutil,
UnableToPayMedical = ew6nopaymed
) |>
# Convert all 4 financial strain indicators into labeled factors at once
mutate(
across(c(SkippedMeals, UnableToPayRent, UnableToPayUtilities, UnableToPayMedical),
~ factor(.x, yes_no_levels, yes_no_labels)),
# Creating new Financial Strain Flag variable
FinancialStrainFlag = case_when(
# If ANY of the four is YES, classify as Any Strain
SkippedMeals == "Yes" | UnableToPayRent == "Yes" |
UnableToPayUtilities == "Yes" | UnableToPayMedical == "Yes" ~ "Any Strain",
# If ALL FOUR are explicitly NO, classify as No Strain
SkippedMeals == "No" & UnableToPayRent == "No" &
UnableToPayUtilities == "No" & UnableToPayMedical == "No" ~ "No Strain",
# If we get here, at least one answer is missing/inapplicable/refused
TRUE ~ "Missing"
) |>
# Treat FinancialStrainFlag as a factor variable with three levels
factor(c("No Strain", "Any Strain", "Missing"))
)
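The order of the rules matters: a single "Yes" classifies someone as Any Strain even when another item was refused, which is why a refused item can survive into the analysis dataset. A toy illustration in base R follows; classify_strain() is a hypothetical helper that mirrors the case_when() order, and the rows are made up.

```r
# Hypothetical helper mirroring the case_when() order: any "Yes" wins,
# then all-four "No", and everything else is "Missing"
classify_strain <- function(items) {
  any_yes <- rowSums(items == "Yes", na.rm = TRUE) > 0
  all_no  <- rowSums(items == "No", na.rm = TRUE) == ncol(items)
  out <- ifelse(any_yes, "Any Strain", ifelse(all_no, "No Strain", "Missing"))
  factor(out, levels = c("No Strain", "Any Strain", "Missing"))
}

# Toy responses: made-up rows, not NHATS records
toy <- data.frame(
  SkippedMeals         = c("No", "No",      "Refused", "No"),
  UnableToPayRent      = c("No", "Yes",     "No",      "No"),
  UnableToPayUtilities = c("No", "Refused", "No",      "Don't know"),
  UnableToPayMedical   = c("No", "No",      "No",      "No")
)
toy$FinancialStrainFlag <- classify_strain(toy)
toy
```

Row 2 is Any Strain despite the refusal, while rows 3 and 4 are Missing because no item is "Yes" and not all four are "No".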
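Next, I turned to Round 7 (R7), used to create the dementia classification outcome in the analysis.
R7 contained both:
- A clinician-verified dementia diagnosis field.
- Multiple cognitive test results across three domains: memory, orientation, and executive function.
Here's what I did step by step:
- Renamed the variables to clearer names (e.g., cg7dwrdimmrc became MemoryImmediate).
- Built the outcome variable (dementia_class) using this logic:
  - Probable Dementia: the diagnosis field is "Yes" (DementiaDx == 1).
  - Missing: the diagnosis field holds a special missing code (-1, -7, -8, -9).
  - Possible Dementia: no diagnosis, but 2 or more impaired domains.
  - No Dementia: the diagnosis field is "No" (DementiaDx == 2) and fewer than 2 domains are impaired.
  - Missing: anything else.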
R7_SP <- read_sas("NHATS_Round_7_SP_File.sas7bdat")
R7_SP_clean <- R7_SP |>
# Select variables of interest
select(spid, is7resptype, r7dresid,
hc7disescn9, cg7dwrdimmrc, cg7dwrddlyrc,
cg7todaydat1, cg7todaydat2, cg7todaydat3, cg7todaydat4,
cg7presidna1, cg7presidna3, cg7vpname1, cg7vpname3,
cg7dclkdraw) |>
# Filter to community-dwelling respondents only; proxy respondents are not filtered out this time.
filter(r7dresid == 1) |>
# Rename variables
rename(
DementiaDx = hc7disescn9, # Clinician-verified dementia diagnosis
MemoryImmediate = cg7dwrdimmrc, # Immediate recall
MemoryDelayed = cg7dwrddlyrc, # Delayed recall
ExecutiveDraw = cg7dclkdraw # Clock draw
) |>
mutate(
# Calculate MEMORY SCORE: Immediate + Delayed Recall
# If either is missing, we assign 0 points for that part (as NHATS coding convention).
memory_score = if_else(MemoryImmediate >= 0, MemoryImmediate, 0) +
if_else(MemoryDelayed >= 0, MemoryDelayed, 0),
memory_impaired = memory_score <= 3, # Binary: <=3 points means impaired
# Calculate ORIENTATION SCORE: Recode correct/incorrect items (1=correct, 2=incorrect)
across(c(cg7todaydat1, cg7todaydat2, cg7todaydat3, cg7todaydat4,
cg7presidna1, cg7presidna3, cg7vpname1, cg7vpname3),
~ case_when(.x == 2 ~ 1, # Incorrect = 1 point (impaired)
.x == 1 ~ 0, # Correct = 0 points
TRUE ~ NA_real_),
.names = "{.col}_rec"),
orientation_score = rowSums(across(ends_with("_rec")), na.rm = TRUE),
orientation_impaired = orientation_score >= 5, # ≥5 wrong answers = impaired
# Calculate EXECUTIVE FUNCTION: Clock draw task
exec_impaired = ExecutiveDraw %in% c(0, 1), # 0/1 scores indicate impairment
# Count how many domains (memory/orientation/executive) are impaired
impaired_domains = rowSums(across(c(memory_impaired, orientation_impaired, exec_impaired)),
na.rm = TRUE),
# Classify dementia
dementia_class = case_when(
# 1. Probable Dementia: If the NHATS clinician diagnosis says so, trust it
DementiaDx == 1 ~ "Probable Dementia",
# 2. Missing: If DementiaDx is missing/refused/inapplicable (-1, -7, -8, -9)
DementiaDx %in% special_missing_numeric ~ "Missing",
# 3. Possible Dementia: No formal diagnosis, but ≥2 impaired domains
DementiaDx != 1 & impaired_domains >= 2 ~ "Possible Dementia",
# 4. No Dementia: Explicit diagnosis of "No dementia", all three domains fully observed (none missing), and <2 impaired domains
DementiaDx == 2 & impaired_domains < 2 &
!is.na(memory_impaired) & !is.na(orientation_impaired) & !is.na(exec_impaired) ~ "No Dementia",
# 5. Missing: Anything else (e.g., partial data, can't confidently classify)
TRUE ~ "Missing"
# Treat dementia_class as a factor variable with four levels
) |> factor(c("No Dementia", "Possible Dementia", "Probable Dementia", "Missing"))
)
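Because rowSums() is called with na.rm = TRUE, an orientation item recoded to NA adds nothing to the score, which has the same effect as counting it as correct. The toy example below makes that visible. The item values are made up, and n_answered is a hypothetical extra column (not in the original code) that shows how many items actually contributed.

```r
# Toy orientation items: 1 = correct, 2 = incorrect, -7 = special missing code
# (made-up values, eight items per respondent as in the R7 code)
items <- rbind(
  all_correct = c(1, 1, 1, 1, 1, 1, 1, 1),
  six_wrong   = c(2, 2, 2, 2, 2, 2, 1, 1),
  half_asked  = c(2, 2, 2, 2, -7, -7, -7, -7),
  all_missing = c(-7, -7, -7, -7, -7, -7, -7, -7)
)

# Same recode as above: incorrect = 1 point, correct = 0, anything else = NA
wrong <- ifelse(items == 2, 1, ifelse(items == 1, 0, NA))

score      <- rowSums(wrong, na.rm = TRUE)   # NA items contribute 0
impaired   <- score >= 5
n_answered <- rowSums(!is.na(wrong))         # hypothetical completeness check

data.frame(score, impaired, n_answered)
```

The half_asked respondent got every answered item wrong but scores 4 and is not flagged, and all_missing scores 0. Tracking something like n_answered is one possible way to spot such cases before classification.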
With all four rounds cleaned, I moved on to the final merge step.
This was where I brought everything together into one unified dataset for analysis.
Here's what I did:
- Built FullData by left-joining the merged R5/R1 covariates and the R7 outcome data onto R6 by spid, keeping everyone from R6.
Then I created CleanData. This step was more than just removing missing values; it also filtered out people who couldn't contribute to the analysis.
- If a participant's FinancialStrainFlag was "Missing" or set to true missing (NA), they were removed.
- If their dementia_class was "Missing" or NA, they were removed too.
Importantly, this means that participants without Round 7 data (and therefore no dementia classification at all) were also excluded, since they wouldn't meet the requirement of having a non-missing dementia_class.
This filtering step ensured that CleanData only included participants with both:
- A valid financial strain classification from R6, and
- A valid dementia classification from R7.
# FullData keeps everyone from R6, even if FinancialStrainFlag or dementia_class = "Missing"
FullData <- R6_SP_clean |>
left_join(R5_merged, by = "spid") |>
left_join(R7_SP_clean, by = "spid")
# CleanData removes people missing either the exposure (FinancialStrainFlag) or outcome (dementia_class)
CleanData <- FullData |>
filter(
!is.na(FinancialStrainFlag), # drop rows that are NA (true missing)
!is.na(dementia_class), # drop rows that are NA (true missing)
FinancialStrainFlag != "Missing", # drop coded "Missing" exposures
dementia_class != "Missing" # drop coded "Missing" outcomes
)
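The same join-then-filter pattern can be checked on a toy example, which also shows how anti_join() can count anchor-wave participants with no follow-up record. This is a sketch with made-up data: exposure and outcome are stand-ins for R6_SP_clean and R7_SP_clean, and the droplevels() step is an optional addition that is not in the original code.

```r
library(dplyr)

# Toy stand-ins for the R6 exposure and R7 outcome tables (made-up values)
exposure <- tibble(
  spid = 1:5,
  FinancialStrainFlag = factor(
    c("No Strain", "Any Strain", "Missing", "No Strain", "Any Strain"),
    levels = c("No Strain", "Any Strain", "Missing")
  )
)
outcome <- tibble(
  spid = 1:4,
  dementia_class = factor(
    c("No Dementia", "Missing", "Possible Dementia", "Probable Dementia"),
    levels = c("No Dementia", "Possible Dementia", "Probable Dementia", "Missing")
  )
)

# Left join keeps every anchor-wave row; unmatched outcomes become NA
full <- left_join(exposure, outcome, by = "spid")

# Anchor-wave rows with no follow-up record at all
no_followup <- anti_join(exposure, outcome, by = "spid")

clean <- full |>
  filter(
    !is.na(FinancialStrainFlag), !is.na(dementia_class),
    FinancialStrainFlag != "Missing", dementia_class != "Missing"
  ) |>
  # Optional: drop the now-empty "Missing" level from both factors
  mutate(across(c(FinancialStrainFlag, dementia_class), droplevels))

c(full = nrow(full), no_followup = nrow(no_followup), clean = nrow(clean))
```

Only participants 1 and 4 survive: participant 2 has a coded "Missing" outcome, participant 3 a coded "Missing" exposure, and participant 5 no outcome row at all.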
Before doing any analysis, I skimmed both FullData and CleanData to understand the pattern of missing values and where they come from.
skim(FullData)
Name | FullData |
Number of rows | 5628 |
Number of columns | 57 |
_______________________ | |
Column type frequency: | |
factor | 24 |
logical | 3 |
numeric | 30 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
SkippedMeals | 0 | 1.00 | FALSE | 4 | No: 5522, Yes: 58, Don: 37, Ref: 11 |
UnableToPayRent | 0 | 1.00 | FALSE | 4 | No: 5457, Yes: 111, Don: 41, Ref: 19 |
UnableToPayUtilities | 0 | 1.00 | FALSE | 4 | No: 5388, Yes: 182, Don: 42, Ref: 16 |
UnableToPayMedical | 0 | 1.00 | FALSE | 4 | No: 5389, Yes: 181, Don: 42, Ref: 16 |
FinancialStrainFlag | 0 | 1.00 | FALSE | 3 | No : 5219, Any: 343, Mis: 66 |
Responder_final | 45 | 0.99 | FALSE | 1 | Sam: 5583, Pro: 0 |
Residential_final | 45 | 0.99 | FALSE | 1 | Hom: 5583, Ret: 0, Ass: 0, Nur: 0 |
Age_final | 45 | 0.99 | FALSE | 6 | 70–: 1468, 75–: 1271, 80–: 1009, 65–: 856 |
Gender_final | 45 | 0.99 | FALSE | 2 | Fem: 3190, Mal: 2393 |
Education_final | 45 | 0.99 | FALSE | 10 | HS : 1450, Som: 800, Bac: 743, Mas: 722 |
Occupation_final | 45 | 0.99 | FALSE | 8 | Not: 2736, Man: 1017, Sal: 541, Pro: 458 |
HomeOwnership_final | 45 | 0.99 | FALSE | 4 | Own: 4178, Ren: 872, Oth: 426, Mis: 107 |
RetirementStatus_final | 45 | 0.99 | FALSE | 4 | Ret: 2645, No: 2081, Yes: 754, Mis: 103 |
Section8_final | 45 | 0.99 | FALSE | 3 | No: 5307, Yes: 263, Mis: 13 |
Medicaid_final | 45 | 0.99 | FALSE | 3 | No: 4746, Yes: 699, Mis: 138 |
FoodAssist1_final | 45 | 0.99 | FALSE | 3 | No: 5011, Yes: 457, Mis: 115 |
FoodAssist2_final | 45 | 0.99 | FALSE | 3 | No: 5318, Yes: 149, Mis: 116 |
FoodAssist3_final | 45 | 0.99 | FALSE | 3 | No: 5123, Yes: 343, Mis: 117 |
ChildhoodHealth_final | 45 | 0.99 | FALSE | 6 | Exc: 2685, Ver: 1481, Goo: 920, Fai: 275 |
RaceEthnicity_final | 45 | 0.99 | FALSE | 6 | 1: 3898, 2: 1136, 4: 314, 3: 132 |
NewPerson_final | 45 | 0.99 | FALSE | 2 | No: 2847, Yes: 2736, Mis: 0 |
MaritalStatus_final | 45 | 0.99 | FALSE | 7 | Mar: 2745, Wid: 1702, Div: 691, Nev: 209 |
Newsample_final | 45 | 0.99 | FALSE | 2 | Yes: 2847, Mis: 2736, No: 0 |
dementia_class | 705 | 0.87 | FALSE | 4 | No : 4506, Pos: 202, Mis: 130, Pro: 85 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
memory_impaired | 705 | 0.87 | 0.11 | FAL: 4384, TRU: 539 |
orientation_impaired | 705 | 0.87 | 0.07 | FAL: 4603, TRU: 320 |
exec_impaired | 705 | 0.87 | 0.04 | FAL: 4741, TRU: 182 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
spid | 0 | 1.00 | 15095468.64 | 4998327.09 | 1e+07 | 10006283 | 20000129 | 20003671 | 20007119 |
is6resptype | 0 | 1.00 | 1.00 | 0.00 | 1e+00 | 1 | 1 | 1 | 1 |
r6dresid | 0 | 1.00 | 1.00 | 0.00 | 1e+00 | 1 | 1 | 1 | 1 |
HouseholdSize_final | 45 | 0.99 | 1.99 | 1.06 | 1e+00 | 1 | 2 | 2 | 11 |
HouseholdIncome_final | 2317 | 0.59 | 67394.31 | 477048.95 | 0e+00 | 19000 | 35000 | 67000 | 25000000 |
is7resptype | 705 | 0.87 | 1.01 | 0.12 | 1e+00 | 1 | 1 | 1 | 2 |
r7dresid | 705 | 0.87 | 1.00 | 0.00 | 1e+00 | 1 | 1 | 1 | 1 |
DementiaDx | 705 | 0.87 | 2.16 | 0.95 | -8e+00 | 2 | 2 | 2 | 7 |
MemoryImmediate | 705 | 0.87 | 4.74 | 2.06 | -7e+00 | 4 | 5 | 6 | 10 |
MemoryDelayed | 705 | 0.87 | 3.45 | 2.31 | -7e+00 | 2 | 4 | 5 | 9 |
cg7todaydat1 | 705 | 0.87 | 1.04 | 0.28 | -1e+00 | 1 | 1 | 1 | 2 |
cg7todaydat2 | 705 | 0.87 | 1.23 | 0.46 | -1e+00 | 1 | 1 | 1 | 2 |
cg7todaydat3 | 705 | 0.87 | 1.05 | 0.29 | -1e+00 | 1 | 1 | 1 | 2 |
cg7todaydat4 | 705 | 0.87 | 1.04 | 0.27 | -1e+00 | 1 | 1 | 1 | 2 |
cg7presidna1 | 705 | 0.87 | 1.04 | 0.41 | -7e+00 | 1 | 1 | 1 | 2 |
cg7presidna3 | 705 | 0.87 | 1.14 | 0.50 | -7e+00 | 1 | 1 | 1 | 2 |
cg7vpname1 | 705 | 0.87 | 1.38 | 0.57 | -7e+00 | 1 | 1 | 2 | 2 |
cg7vpname3 | 705 | 0.87 | 1.69 | 0.56 | -7e+00 | 1 | 2 | 2 | 2 |
ExecutiveDraw | 705 | 0.87 | 3.65 | 1.61 | -9e+00 | 3 | 4 | 5 | 5 |
memory_score | 705 | 0.87 | 8.32 | 3.67 | 0e+00 | 6 | 9 | 11 | 19 |
cg7todaydat1_rec | 734 | 0.87 | 0.06 | 0.23 | 0e+00 | 0 | 0 | 0 | 1 |
cg7todaydat2_rec | 734 | 0.87 | 0.24 | 0.43 | 0e+00 | 0 | 0 | 0 | 1 |
cg7todaydat3_rec | 734 | 0.87 | 0.06 | 0.24 | 0e+00 | 0 | 0 | 0 | 1 |
cg7todaydat4_rec | 734 | 0.87 | 0.05 | 0.23 | 0e+00 | 0 | 0 | 0 | 1 |
cg7presidna1_rec | 752 | 0.87 | 0.07 | 0.25 | 0e+00 | 0 | 0 | 0 | 1 |
cg7presidna3_rec | 752 | 0.87 | 0.17 | 0.37 | 0e+00 | 0 | 0 | 0 | 1 |
cg7vpname1_rec | 750 | 0.87 | 0.40 | 0.49 | 0e+00 | 0 | 0 | 1 | 1 |
cg7vpname3_rec | 750 | 0.87 | 0.72 | 0.45 | 0e+00 | 0 | 1 | 1 | 1 |
orientation_score | 705 | 0.87 | 1.76 | 1.63 | 0e+00 | 1 | 1 | 2 | 8 |
impaired_domains | 705 | 0.87 | 0.21 | 0.54 | 0e+00 | 0 | 0 | 0 | 3 |
705 missing values for many Round 7 variables (e.g., dementia_class, MemoryImmediate, MemoryDelayed):
These are people from Round 6 who have no row in R7_SP_clean. Either they never had a Round 7 interview at all (not surveyed, not alive, or otherwise not present in R7), or they were no longer living in the community and were dropped by the r7dresid == 1 filter.
Because R7_SP_clean was left-joined to R6, those rows remain, but every R7 variable is NA.
45 missing values for many R1/R5-derived covariates (e.g., Education_final, Occupation_final):
These are participants in R6 who have no row in the cleaned R5 data, which is the base of the covariate merge.
The left_join() kept them, but all R1/R5 variables stayed blank (NA).
2,317 missing values for HouseholdIncome_final (this still needs to be fixed):
This isn't from missing interviews; it's from NHATS survey coding.
Some participants refused to answer, didn't know, or weren't asked, and those responses are collapsed to NA.
The R1 backfill also never fires for this variable: HouseholdIncome is numeric, so the test against special_missing_labels in the merge step is always FALSE, and an NA in R5 is not replaced by the R1 value.
FullData is designed this way: it keeps everyone from Round 6 so nothing is lost prematurely, which means we expect a lot of structural missingness from other rounds.
Now let's have a look at the CleanData set.
skim(CleanData)
Name | CleanData |
Number of rows | 4756 |
Number of columns | 57 |
_______________________ | |
Column type frequency: | |
factor | 24 |
logical | 3 |
numeric | 30 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
SkippedMeals | 0 | 1.00 | FALSE | 2 | No: 4709, Yes: 47, Ina: 0, Ref: 0 |
UnableToPayRent | 0 | 1.00 | FALSE | 2 | No: 4663, Yes: 93, Ina: 0, Ref: 0 |
UnableToPayUtilities | 0 | 1.00 | FALSE | 3 | No: 4601, Yes: 154, Ref: 1, Ina: 0 |
UnableToPayMedical | 0 | 1.00 | FALSE | 2 | No: 4604, Yes: 152, Ina: 0, Ref: 0 |
FinancialStrainFlag | 0 | 1.00 | FALSE | 2 | No : 4469, Any: 287, Mis: 0 |
Responder_final | 30 | 0.99 | FALSE | 1 | Sam: 4726, Pro: 0 |
Residential_final | 30 | 0.99 | FALSE | 1 | Hom: 4726, Ret: 0, Ass: 0, Nur: 0 |
Age_final | 30 | 0.99 | FALSE | 6 | 70–: 1305, 75–: 1102, 80–: 842, 65–: 754 |
Gender_final | 30 | 0.99 | FALSE | 2 | Fem: 2701, Mal: 2025 |
Education_final | 30 | 0.99 | FALSE | 10 | HS : 1208, Som: 697, Bac: 636, Mas: 636 |
Occupation_final | 30 | 0.99 | FALSE | 8 | Not: 2314, Man: 872, Sal: 463, Pro: 384 |
HomeOwnership_final | 30 | 0.99 | FALSE | 4 | Own: 3597, Ren: 712, Oth: 342, Mis: 75 |
RetirementStatus_final | 30 | 0.99 | FALSE | 4 | Ret: 2218, No: 1749, Yes: 686, Mis: 73 |
Section8_final | 30 | 0.99 | FALSE | 3 | No: 4503, Yes: 215, Mis: 8 |
Medicaid_final | 30 | 0.99 | FALSE | 3 | No: 4050, Yes: 576, Mis: 100 |
FoodAssist1_final | 30 | 0.99 | FALSE | 3 | No: 4254, Yes: 392, Mis: 80 |
FoodAssist2_final | 30 | 0.99 | FALSE | 3 | No: 4530, Yes: 115, Mis: 81 |
FoodAssist3_final | 30 | 0.99 | FALSE | 3 | No: 4347, Yes: 296, Mis: 83 |
ChildhoodHealth_final | 30 | 0.99 | FALSE | 6 | Exc: 2296, Ver: 1266, Goo: 768, Fai: 227 |
RaceEthnicity_final | 30 | 0.99 | FALSE | 6 | 1: 3303, 2: 987, 4: 255, 3: 109 |
NewPerson_final | 30 | 0.99 | FALSE | 2 | No: 2412, Yes: 2314, Mis: 0 |
MaritalStatus_final | 30 | 0.99 | FALSE | 6 | Mar: 2364, Wid: 1387, Div: 588, Nev: 189 |
Newsample_final | 30 | 0.99 | FALSE | 2 | Yes: 2412, Mis: 2314, No: 0 |
dementia_class | 0 | 1.00 | FALSE | 3 | No : 4473, Pos: 199, Pro: 84, Mis: 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
memory_impaired | 0 | 1 | 0.10 | FAL: 4261, TRU: 495 |
orientation_impaired | 0 | 1 | 0.06 | FAL: 4448, TRU: 308 |
exec_impaired | 0 | 1 | 0.04 | FAL: 4581, TRU: 175 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
spid | 0 | 1.00 | 15103687.19 | 4998232.90 | 1e+07 | 10006363.2 | 20000129 | 20003683 | 20007119 |
is6resptype | 0 | 1.00 | 1.00 | 0.00 | 1e+00 | 1.0 | 1 | 1 | 1 |
r6dresid | 0 | 1.00 | 1.00 | 0.00 | 1e+00 | 1.0 | 1 | 1 | 1 |
HouseholdSize_final | 30 | 0.99 | 1.98 | 1.05 | 1e+00 | 1.0 | 2 | 2 | 11 |
HouseholdIncome_final | 1885 | 0.60 | 70141.54 | 511528.88 | 0e+00 | 19638.5 | 36000 | 70000 | 25000000 |
is7resptype | 0 | 1.00 | 1.01 | 0.10 | 1e+00 | 1.0 | 1 | 1 | 2 |
r7dresid | 0 | 1.00 | 1.00 | 0.00 | 1e+00 | 1.0 | 1 | 1 | 1 |
DementiaDx | 0 | 1.00 | 2.03 | 0.50 | 1e+00 | 2.0 | 2 | 2 | 7 |
MemoryImmediate | 0 | 1.00 | 4.79 | 2.01 | -7e+00 | 4.0 | 5 | 6 | 10 |
MemoryDelayed | 0 | 1.00 | 3.50 | 2.29 | -7e+00 | 2.0 | 4 | 5 | 9 |
cg7todaydat1 | 0 | 1.00 | 1.04 | 0.26 | -1e+00 | 1.0 | 1 | 1 | 2 |
cg7todaydat2 | 0 | 1.00 | 1.23 | 0.45 | -1e+00 | 1.0 | 1 | 1 | 2 |
cg7todaydat3 | 0 | 1.00 | 1.05 | 0.28 | -1e+00 | 1.0 | 1 | 1 | 2 |
cg7todaydat4 | 0 | 1.00 | 1.04 | 0.26 | -1e+00 | 1.0 | 1 | 1 | 2 |
cg7presidna1 | 0 | 1.00 | 1.05 | 0.37 | -7e+00 | 1.0 | 1 | 1 | 2 |
cg7presidna3 | 0 | 1.00 | 1.15 | 0.47 | -7e+00 | 1.0 | 1 | 1 | 2 |
cg7vpname1 | 0 | 1.00 | 1.38 | 0.55 | -7e+00 | 1.0 | 1 | 2 | 2 |
cg7vpname3 | 0 | 1.00 | 1.69 | 0.52 | -7e+00 | 1.0 | 2 | 2 | 2 |
ExecutiveDraw | 0 | 1.00 | 3.70 | 1.48 | -9e+00 | 3.0 | 4 | 5 | 5 |
memory_score | 0 | 1.00 | 8.41 | 3.64 | 0e+00 | 6.0 | 9 | 11 | 19 |
cg7todaydat1_rec | 22 | 1.00 | 0.05 | 0.22 | 0e+00 | 0.0 | 0 | 0 | 1 |
cg7todaydat2_rec | 22 | 1.00 | 0.24 | 0.43 | 0e+00 | 0.0 | 0 | 0 | 1 |
cg7todaydat3_rec | 22 | 1.00 | 0.06 | 0.24 | 0e+00 | 0.0 | 0 | 0 | 1 |
cg7todaydat4_rec | 22 | 1.00 | 0.05 | 0.22 | 0e+00 | 0.0 | 0 | 0 | 1 |
cg7presidna1_rec | 29 | 0.99 | 0.07 | 0.25 | 0e+00 | 0.0 | 0 | 0 | 1 |
cg7presidna3_rec | 29 | 0.99 | 0.17 | 0.37 | 0e+00 | 0.0 | 0 | 0 | 1 |
cg7vpname1_rec | 27 | 0.99 | 0.40 | 0.49 | 0e+00 | 0.0 | 0 | 1 | 1 |
cg7vpname3_rec | 27 | 0.99 | 0.71 | 0.45 | 0e+00 | 0.0 | 1 | 1 | 1 |
orientation_score | 0 | 1.00 | 1.73 | 1.62 | 0e+00 | 1.0 | 1 | 2 | 8 |
impaired_domains | 0 | 1.00 | 0.21 | 0.54 | 0e+00 | 0.0 | 0 | 0 | 3 |
- All participants missing the exposure (FinancialStrainFlag) or the outcome (dementia_class) were removed, which is why both key variables show 0 missing here.
- Only ~30 missing values remain for the R1/R5-derived covariates. These are likely the same participants who had no data in R1 or R5: they stayed in the dataset because they had R6 exposure and R7 outcome data, but their covariates could not be backfilled.
- 22–29 missing values remain on the _rec cognition items (e.g., cg7todaydat1_rec). These mostly reflect item-level missingness in R7: some respondents skipped or refused certain test items, and NHATS coded those responses with special missing values (e.g., -7, -8), which were collapsed to NA.
- 1,885 values are missing for HouseholdIncome_final. (Note: I still need to fix this.) As in FullData, this is a survey artifact (refusals, "don't know," or skipped questions).
CleanData eliminates missingness in the key analytic variables (exposure and outcome) but keeps "real-world" missingness for some covariates and individual test items. This is expected in a complex survey merge and does not need to be over-cleaned; it just needs to be explained.
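The item-level recoding described above can be sketched in base R. This is a minimal illustration on invented values, not the report's actual cleaning code: the data frame `toy` and the helper `recode_item()` are hypothetical, and the mapping (1 becomes 0, 2 becomes 1, negative special codes become NA) is an assumption inferred from the summary statistics.

```r
# Toy data standing in for two R7 cognition items (values are invented).
toy <- data.frame(
  cg7todaydat1 = c(1, 2, 1, -7, 1),
  cg7presidna1 = c(1, 1, -8, 2, -1)
)

# Hypothetical helper: anything other than 1 or 2 (including the negative
# NHATS special codes) becomes NA; 1 -> 0 and 2 -> 1 is an assumed mapping.
recode_item <- function(x) {
  out <- rep(NA_real_, length(x))
  out[x %in% 1] <- 0
  out[x %in% 2] <- 1
  out
}

toy_rec <- as.data.frame(lapply(toy, recode_item))
names(toy_rec) <- paste0(names(toy), "_rec")

# Count item-level missingness per recoded variable.
n_missing <- colSums(is.na(toy_rec))
print(n_missing)
```

Counting `NA` values per column this way gives the same kind of per-variable missingness figure that the skim output reports in its n_missing column.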
To summarize the characteristics of my dataset, I created a Table 1 for CleanData using the arsenal package.
I wanted this table to compare participants by Financial Strain Flag across key demographic and health variables. Before building the table, I:
- Assigned labels to the variables in CleanData so the table would display readable names instead of raw variable names.
- Decided which variables should be treated as categorical (e.g., Gender_final, Education_final) and which should remain numeric (e.g., HouseholdSize_final, HouseholdIncome_final).
Finally, I ran tableby() to generate the table, and then used summary() to print it.
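The two preparation steps can be illustrated with a small base R sketch. It assumes a toy data frame (`toy`, with invented values) in place of CleanData and sets the "label" attribute directly rather than through a labelling package, so it only approximates the approach used in this report.

```r
# Toy stand-in for CleanData (values are invented for illustration).
toy <- data.frame(
  Gender_final = c("Male", "Female", "Female"),
  HouseholdSize_final = c(1, 2, 4),
  stringsAsFactors = FALSE
)

# Categorical variables become factors so they are tabulated as counts.
toy$Gender_final <- factor(toy$Gender_final, levels = c("Male", "Female"))

# A "label" attribute carries the display name for each variable.
attr(toy$Gender_final, "label") <- "Gender"
attr(toy$HouseholdSize_final, "label") <- "Household Size"

# Inspect which columns are categorical and which are numeric.
var_types <- vapply(toy, function(x) class(x)[1], character(1))
print(var_types)
```

Checking the column classes before building the table is a cheap way to confirm that each variable will be summarized as intended (counts and percentages for factors, summary statistics for numeric columns).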
## Building Table 1 with Arsenal (Hmisc Label Method)
# Step 1: Assign pretty labels to all variables in CleanData
label(CleanData$FinancialStrainFlag) <- "Financial Strain Flag"
label(CleanData$dementia_class) <- "Dementia Classification"
label(CleanData$Age_final) <- "Age Group"
label(CleanData$Gender_final) <- "Gender"
label(CleanData$Education_final) <- "Education Level"
label(CleanData$Occupation_final) <- "Occupation Category"
label(CleanData$HomeOwnership_final) <- "Home Ownership"
label(CleanData$HouseholdSize_final) <- "Household Size"
label(CleanData$HouseholdIncome_final) <- "Household Income"
label(CleanData$Section8_final) <- "Receives Section 8"
label(CleanData$Medicaid_final) <- "Receives Medicaid"
label(CleanData$ChildhoodHealth_final) <- "Childhood Health Status"
label(CleanData$RaceEthnicity_final) <- "Race and Ethnicity"
label(CleanData$MaritalStatus_final) <- "Marital Status"
# Step 2: Build Table 1 comparing participants by Financial Strain Flag
tab1 <- tableby(
  FinancialStrainFlag ~
    dementia_class +
    Age_final +
    Gender_final +
    Education_final +
    Occupation_final +
    HomeOwnership_final +
    HouseholdSize_final +
    HouseholdIncome_final +
    Section8_final +
    Medicaid_final +
    ChildhoodHealth_final +
    RaceEthnicity_final +
    MaritalStatus_final,
  data = CleanData,
  numeric.stats = c("median", "sd")
)
# Step 3: Output summary (labels will now appear automatically)
summary(tab1, text = TRUE)
 | No Strain (N=4469) | Any Strain (N=287) | Total (N=4756) | p value |
---|---|---|---|---|
Dementia Classification | | | | |
- No Dementia | 4218 (94.4%) | 255 (88.9%) | 4473 (94.0%) | |
- Possible Dementia | 174 (3.9%) | 25 (8.7%) | 199 (4.2%) | |
- Probable Dementia | 77 (1.7%) | 7 (2.4%) | 84 (1.8%) | |
- Missing | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
Age Group | | | | |
- N-Miss | 27 | 3 | 30 | |
- 65–69 | 691 (15.6%) | 63 (22.2%) | 754 (16.0%) | |
- 70–74 | 1209 (27.2%) | 96 (33.8%) | 1305 (27.6%) | |
- 75–79 | 1035 (23.3%) | 67 (23.6%) | 1102 (23.3%) | |
- 80–84 | 803 (18.1%) | 39 (13.7%) | 842 (17.8%) | |
- 85–89 | 495 (11.1%) | 15 (5.3%) | 510 (10.8%) | |
- 90+ | 209 (4.7%) | 4 (1.4%) | 213 (4.5%) | |
- Missing | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
Gender | | | | 0.039 |
- N-Miss | 27 | 3 | 30 | |
- Male | 1920 (43.2%) | 105 (37.0%) | 2025 (42.8%) | |
- Female | 2522 (56.8%) | 179 (63.0%) | 2701 (57.2%) | |
Education Level | | | | < 0.001 |
- N-Miss | 27 | 3 | 30 | |
- No schooling | 16 (0.4%) | 2 (0.7%) | 18 (0.4%) | |
- 1–8th grade | 304 (6.8%) | 49 (17.3%) | 353 (7.5%) | |
- 9–12 (no diploma) | 455 (10.2%) | 55 (19.4%) | 510 (10.8%) | |
- HS grad | 1139 (25.6%) | 69 (24.3%) | 1208 (25.6%) | |
- Vocational | 320 (7.2%) | 18 (6.3%) | 338 (7.2%) | |
- Some college | 665 (15.0%) | 32 (11.3%) | 697 (14.7%) | |
- Associate | 224 (5.0%) | 19 (6.7%) | 243 (5.1%) | |
- Bachelor | 613 (13.8%) | 23 (8.1%) | 636 (13.5%) | |
- Master/PhD | 623 (14.0%) | 13 (4.6%) | 636 (13.5%) | |
- Missing | 83 (1.9%) | 4 (1.4%) | 87 (1.8%) | |
Occupation Category | | | | < 0.001 |
- N-Miss | 27 | 3 | 30 | |
- Management/Professional | 832 (18.7%) | 40 (14.1%) | 872 (18.5%) | |
- Service | 175 (3.9%) | 27 (9.5%) | 202 (4.3%) | |
- Sales/Office | 433 (9.7%) | 30 (10.6%) | 463 (9.8%) | |
- Construction/Farming | 210 (4.7%) | 16 (5.6%) | 226 (4.8%) | |
- Production | 352 (7.9%) | 32 (11.3%) | 384 (8.1%) | |
- Homemaker | 127 (2.9%) | 7 (2.5%) | 134 (2.8%) | |
- Not working/retired | 2190 (49.3%) | 124 (43.7%) | 2314 (49.0%) | |
- Missing | 123 (2.8%) | 8 (2.8%) | 131 (2.8%) | |
Home Ownership | | | | < 0.001 |
- N-Miss | 27 | 3 | 30 | |
- Own | 3448 (77.6%) | 149 (52.5%) | 3597 (76.1%) | |
- Rent | 605 (13.6%) | 107 (37.7%) | 712 (15.1%) | |
- Other | 319 (7.2%) | 23 (8.1%) | 342 (7.2%) | |
- Missing | 70 (1.6%) | 5 (1.8%) | 75 (1.6%) | |
Household Size | | | | 0.077 |
- Median | 2.000 | 2.000 | 2.000 | |
- SD | 1.030 | 1.365 | 1.053 | |
Household Income | | | | 0.188 |
- Median | 40000.000 | 15000.000 | 36000.000 | |
- SD | 528553.118 | 26876.641 | 511528.883 | |
Receives Section 8 | | | | < 0.001 |
- N-Miss | 27 | 3 | 30 | |
- Yes | 180 (4.1%) | 35 (12.3%) | 215 (4.5%) | |
- No | 4256 (95.8%) | 247 (87.0%) | 4503 (95.3%) | |
- Missing | 6 (0.1%) | 2 (0.7%) | 8 (0.2%) | |
Receives Medicaid | | | | < 0.001 |
- N-Miss | 27 | 3 | 30 | |
- Yes | 488 (11.0%) | 88 (31.0%) | 576 (12.2%) | |
- No | 3864 (87.0%) | 186 (65.5%) | 4050 (85.7%) | |
- Missing | 90 (2.0%) | 10 (3.5%) | 100 (2.1%) | |
Childhood Health Status | | | | < 0.001 |
- N-Miss | 27 | 3 | 30 | |
- Excellent | 2181 (49.1%) | 115 (40.5%) | 2296 (48.6%) | |
- Very good | 1193 (26.9%) | 73 (25.7%) | 1266 (26.8%) | |
- Good | 708 (15.9%) | 60 (21.1%) | 768 (16.3%) | |
- Fair | 207 (4.7%) | 20 (7.0%) | 227 (4.8%) | |
- Poor | 72 (1.6%) | 14 (4.9%) | 86 (1.8%) | |
- Missing | 81 (1.8%) | 2 (0.7%) | 83 (1.8%) | |
Race and Ethnicity | | | | < 0.001 |
- N-Miss | 27 | 3 | 30 | |
- 1 | 3209 (72.2%) | 94 (33.1%) | 3303 (69.9%) | |
- 2 | 851 (19.2%) | 136 (47.9%) | 987 (20.9%) | |
- 3 | 94 (2.1%) | 15 (5.3%) | 109 (2.3%) | |
- 4 | 223 (5.0%) | 32 (11.3%) | 255 (5.4%) | |
- 5 | 3 (0.1%) | 1 (0.4%) | 4 (0.1%) | |
- 6 | 62 (1.4%) | 6 (2.1%) | 68 (1.4%) | |
Marital Status | | | | |
- N-Miss | 27 | 3 | 30 | |
- Married | 2275 (51.2%) | 89 (31.3%) | 2364 (50.0%) | |
- Living with Partner | 104 (2.3%) | 9 (3.2%) | 113 (2.4%) | |
- Separated | 69 (1.6%) | 16 (5.6%) | 85 (1.8%) | |
- Divorced | 529 (11.9%) | 59 (20.8%) | 588 (12.4%) | |
- Widowed | 1297 (29.2%) | 90 (31.7%) | 1387 (29.3%) | |
- Never married | 168 (3.8%) | 21 (7.4%) | 189 (4.0%) | |
- Missing | 0 (0.0%) | 0 (0.0%) | 0 (0.0%) | |
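As a sanity check on the table, a categorical p value can be recomputed from the printed counts. The sketch below does this for Gender using base R's chisq.test(). It assumes the table used a Pearson chi-square test without continuity correction, which is an assumption about the settings rather than something stated in this report; under that assumption the result should land close to the reported 0.039.

```r
# Gender counts copied from Table 1 (rows: Male, Female; columns: strain group).
gender_counts <- matrix(
  c(1920, 2522,   # No Strain: Male, Female
     105,  179),  # Any Strain: Male, Female
  nrow = 2,
  dimnames = list(Gender = c("Male", "Female"),
                  Strain = c("No Strain", "Any Strain"))
)

# Pearson chi-square test without continuity correction (assumed setting).
gender_test <- chisq.test(gender_counts, correct = FALSE)
gender_p <- gender_test$p.value
print(round(gender_p, 3))
```

The same pattern works for any other categorical row block, as long as the N-Miss row is left out of the count matrix.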