Full Dataset Imputation
In this analysis, I impute missing data at the census tract level so that all tracts can be included in the final composite index.
Public pre-K enrollment initially had suppressed values in 64 of the 249 census tracts (25.7%). Rather than excluding those tracts, I used a multiple imputation approach to estimate plausible values based on relationships with other socioeconomic indicators in the dataset (such as poverty, income, health outcomes, and housing conditions).
Because a small number of other index indicators also contained limited missing values (each affecting fewer than 5% of tracts), I implemented a full joint imputation model. This approach allows the variables to inform one another during estimation, preserving the overall structure of the data while filling in missing entries.
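As a rough illustration of how the share of missing values per indicator can be tabulated before choosing which variables to impute, here is a minimal sketch on made-up data. The data frame `toy` and its column names are hypothetical stand-ins, not the project's variables.

```r
# Toy data standing in for the index variables (all values are made up).
toy <- data.frame(
  preK   = c(10, NA, 55, NA, 80, 100, 0, 35, NA, 60),
  income = c(50, 62, NA, 71, 88, 93, 45, 77, 69, 81),
  pov    = c(22, 18, 15, 12, 9, 7, 30, 11, 14, 10)
)

# Percentage of rows missing for each variable.
pct_missing <- round(100 * colMeans(is.na(toy)), 1)
print(pct_missing)

# Variables with at least one missing value, i.e. candidates for imputation.
vars_with_na <- names(pct_missing[pct_missing > 0])
print(vars_with_na)
```

Running the same two lines on the real index variables would show which indicators fall under the 5% threshold mentioned above.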
Predictive Mean Matching (PMM) was used for imputation. This method replaces missing values with observed values from similar census tracts, ensuring that imputed estimates remain realistic and within the observed range of the data.
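To make the donor idea concrete, the following is a simplified sketch of a single PMM pass in base R on made-up data. It is not the mice implementation: it omits the parameter-uncertainty draw that a full PMM routine includes, and the function name `pmm_once`, the vectors `x` and `y`, and the donor count `k = 3` are assumptions chosen for illustration.

```r
set.seed(1)

# Toy data: y has missing values, x is fully observed (values are made up).
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(12, 19, NA, 41, 48, NA, 72, 79, NA, 101)

pmm_once <- function(y, x, k = 3) {
  d <- data.frame(x = x, y = y)
  obs <- !is.na(d$y)
  # Fit a linear model on the observed cases and predict for every case.
  fit <- lm(y ~ x, data = d[obs, ])
  pred <- predict(fit, newdata = d)
  out <- d$y
  for (i in which(!obs)) {
    # The k observed cases whose predicted values are closest are the donors.
    dist <- abs(pred[obs] - pred[i])
    donors <- d$y[obs][order(dist)[seq_len(k)]]
    # Borrow the observed value of one randomly chosen donor.
    out[i] <- donors[sample.int(length(donors), 1)]
  }
  out
}

y_imp <- pmm_once(y, x)
print(y_imp)
```

Because every filled-in value is copied from an observed case, imputed values cannot fall outside the observed range, which is the property described above.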
After imputation:
All index indicators were complete across 249 census tracts.
Observed values were unchanged.
The overall distribution and correlation structure of the dataset were preserved, based on the validation checks near the end of this .qmd file.
This process allows the composite index to be calculated using a complete and internally consistent dataset while minimizing bias from listwise deletion.
Note: The workflow in this code is not strictly linear. The initial chunks focused only on pre-K imputation; refer to the table of contents for the full dataset imputation. The superseded code is kept only for notes and process documentation.
I reviewed KW's code: https://rpubs.com/kaitlan/1396846
I asked her to use CS code as guide: https://rpubs.com/corey_sparks/883799
Other resources: https://stats.oarc.ucla.edu/r/faq/how-do-i-perform-multiple-imputation-using-predictive-mean-matching-in-r/
https://stefvanbuuren.name/fimd/sec-pmm.html
Import and Join
I joined KW's master dataframe with my LE-imputed dataset.
library (readr)
library (dplyr)
library (stringr)
data_dir <- "C:/Users/rayo-garza/OneDrive - Center for Public Policy Priorities/Documents/Data 2025"
master_path <- file.path (data_dir, "Compiled_Data_Austin_LTD_FULL_02.06 (1).csv" )
imputed_path <- file.path (data_dir, "Final_Life_Expectancy_Imputed_2020.csv" )
#force IDs to character so leading zeros are safe
master <- read_csv(
  master_path,
  col_types = cols(.default = col_guess(), GEOID_clean = col_character())
)
imputed <- read_csv(
  imputed_path,
  col_types = cols(
    .default = col_guess(),
    GEOID = col_character(),
    AFFGEOID = col_character(),
    TRACTCE = col_character(),
    COUNTYFP = col_character(),
    STATEFP = col_character()
  )
)
#clean/standardize GEOIDs to 11-digit tract GEOID (SSCCCTTTTTT) ---
clean_geoid11 <- function(x) {
  x %>%
    str_trim() %>%
    str_replace_all("[^0-9]", "") %>%     # drop non-digits, e.g. the "US" in a "1400000US" prefix
    str_sub(-11) %>%                      # keep the last 11 digits if longer (drops the leftover "1400000")
    str_pad(11, side = "left", pad = "0") # restore leading zeros if shorter
}
master <- master %>%
mutate (GEOID_clean = clean_geoid11 (GEOID_clean))
imputed <- imputed %>%
mutate (GEOID11 = clean_geoid11 (GEOID))
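To show what the cleaning steps do to typical inputs, here is a small sketch that mirrors `clean_geoid11` using base R only, so it runs without stringr. The name `clean_geoid11_base` and the example strings are hypothetical and exist only for this illustration.

```r
# Base-R equivalent of the cleaning steps (hypothetical helper for illustration).
clean_geoid11_base <- function(x) {
  x <- trimws(x)                          # remove surrounding whitespace
  x <- gsub("[^0-9]", "", x)              # drop every non-digit character
  n <- nchar(x)
  x <- substr(x, pmax(n - 10, 1), n)      # keep the last 11 digits if longer
  paste0(strrep("0", 11 - nchar(x)), x)   # left-pad with zeros if shorter
}

examples <- c(
  "1400000US48453980000", # AFFGEOID-style value with a prefix
  " 48453001100 ",        # stray whitespace
  "1001020100"            # leading zero lost when read as a number
)

cleaned <- clean_geoid11_base(examples)
print(cleaned)
# "48453980000" "48453001100" "01001020100"
```

The third example is the case the character column types are meant to prevent: an ID read as a number loses its leading zero and has to be padded back to 11 digits.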
#checks: duplicates in the imputed file
dups <- imputed %>%
count (GEOID11) %>%
filter (n > 1 )
if (nrow (dups) > 0 ) {
print (dups)
stop ("Duplicate GEOIDs found in imputed file. Resolve before joining." )
}
#Join
master2 <- master %>%
  left_join(
    imputed %>%
      select(GEOID11, le_2020_normalized, imputed_flag, data_coverage) %>%
      rename(
        LE_imputed_2020 = le_2020_normalized,
        LE_imputed_flag = imputed_flag,
        LE_data_coverage = data_coverage
      ),
    by = c("GEOID_clean" = "GEOID11")
  )
#match rate + how many got LE values
qa <- master2 %>%
summarise (
n_master = n (),
n_with_imputed_LE = sum (! is.na (LE_imputed_2020)),
n_missing_imputed_LE = sum (is.na (LE_imputed_2020)),
match_rate = n_with_imputed_LE / n_master
)
print (qa)
# A tibble: 1 × 4
n_master n_with_imputed_LE n_missing_imputed_LE match_rate
<int> <int> <int> <dbl>
1 250 249 1 0.996
#unmatched tracts
unmatched <- master2 %>%
  filter(is.na(LE_imputed_2020)) %>%
  select(GEOID_clean, `Tract Full Name`) %>%
  head(20)
print (unmatched)
# A tibble: 1 × 2
GEOID_clean `Tract Full Name`
<chr> <chr>
1 48453980000 Census Tract 9800; Travis County; Texas
#remove airport
master2 <- master2 %>%
filter (GEOID_clean != "48453980000" )
out_path <- file.path (data_dir, "master_with_imputed_LE.csv" )
write_csv (master2, out_path)
#are there other special use tracts?
master2 <- master2 %>%
  mutate(
    tract_code = str_sub(GEOID_clean, -6) # last 6 digits
  )
special_use <- master2 %>%
  filter(str_detect(tract_code, "^(98|99)")) %>%
  select(GEOID_clean, `Tract Full Name`, tract_code)
special_use
# A tibble: 0 × 3
# ℹ 3 variables: GEOID_clean <chr>, Tract Full Name <chr>, tract_code <chr>
(Obsolete)
The code below is from when I was imputing only pre-K.
# library(mice)
# library(dplyr)
#
#
# #create the MICE dataset with the outcome + predictors (different from KW in that I didn't use the whole indicator list for prediction, just select)
# mice_vars <- c(
# "public_preK_est",
# "est_belowpov_pct",
# "est_med_hh_inc",
# "pct_female_hh_kids",
# "pct_hh_with_kids",
# "Total Population",
# "est_uninsured_rate",
# "est_no_internet_perc"
# )
#
# mice_data <- master2 %>%
# select(all_of(mice_vars))
#
# # (Optional) Make sure blanks aren't hiding as empty strings (usually not in CSV, but safe)
# mice_data[mice_data == ""] <- NA
#
# pred <- make.predictorMatrix(mice_data)
# pred[,] <- 0
# pred["public_preK_est", setdiff(colnames(mice_data), "public_preK_est")] <- 1
#
# meth <- make.method(mice_data)
# meth[] <- ""
# meth["public_preK_est"] <- "pmm"
#
# set.seed(123)
# imp <- mice(
# data = mice_data,
# m = 5,
# method = meth,
# predictorMatrix = pred,
# maxit = 20,
# printFlag = FALSE
# )
#
# completed <- complete(imp, action = 1)
#
# #add to master
# master2 <- master2 %>%
# mutate(
# public_preK_imputed = completed$public_preK_est,
# preK_was_imputed = is.na(public_preK_est),
# #finalversion keeps observed values exactly and fills only missings
# public_preK_final = if_else(is.na(public_preK_est), public_preK_imputed, public_preK_est)
# )
#
# #observed values should be unchanged
# cor(master2$public_preK_est, master2$public_preK_final, use = "complete.obs")
# table(is.na(master2$public_preK_final))
#
# out_path <- file.path(data_dir, "master_with_imputed_LE_and_preK2.csv")
# write_csv(master2, out_path)
#
# out_path
# library(dplyr)
#
# master2 <- master2 %>%
# mutate(
# preK_z_observed = scale(public_preK_est)[,1],
# preK_z_imputed = scale(public_preK_imputed)[,1]
# )
#
# cor(master2$preK_z_observed,
# master2$preK_z_imputed,
# use = "complete.obs")
A correlation above .95 between the observed and completed values is consistent with PMM leaving the observed values unchanged and filling in only the missing ones. The imputation also did not distort the distribution, as the histogram shows (the polarization reflects reality).
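A correlation of 1 would also result if observed values had been shifted or rescaled linearly, so a direct equality check is a stricter test that observed values were left untouched. The sketch below uses made-up vectors; `original` and `completed` are hypothetical stand-ins for a variable before and after imputation.

```r
# Toy vectors: `original` has gaps, `completed` is the same variable after imputation.
original  <- c(0, 30.7, NA, 80.5, 100, NA, 42.0)
completed <- c(0, 30.7, 55.1, 80.5, 100, 12.3, 42.0)

was_observed <- !is.na(original)

# TRUE only if every originally observed value is identical after imputation.
observed_unchanged <- all(original[was_observed] == completed[was_observed])
print(observed_unchanged)

# Largest absolute change among observed values (0 means untouched).
max_abs_change <- max(abs(original[was_observed] - completed[was_observed]))
print(max_abs_change)
```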
Notes
Missing values for public Pre-K enrollment (25.7% of tracts) were imputed using Predictive Mean Matching (PMM) with multiple imputation (m=5), incorporating tract-level poverty, median household income, population size, household composition, uninsured rate, and internet access as predictors. PMM preserved the original variance (SD_observed = 40.64; SD_imputed = 41.34) and maintained tract ranking among observed cases (r = 1.00).
Full Dataset Imputation
library(mice)
library(dplyr)
library(readr)
#define the index variable list (use the ORIGINAL preK variable)
index_vars <- c (
"public_preK_est" ,
"est_uninsured_rate" ,
"LE_imputed_2020" ,
"est_has_disability_pct" ,
"est_65plus_amb_pct" ,
"LPA_CrudePrev" ,
"est_med_hh_inc" ,
"est_underemp_perc" ,
"est_belowpov_pct" ,
"Eviction filing rate" ,
"hh_support_risk_score" ,
"Energy Burden (% income)" ,
"est_lesh_perc" ,
"est_no_internet_perc" ,
"lths_pct" ,
"Percent Institutionalized GQ" ,
"Persistent Poverty" ,
"Temperature_Diff"
)
#build MICE dataset (only index variables)
mice_data <- master2 %>% select (all_of (index_vars))
#ensure numeric (important because spaces/special chars can be tricky)
mice_data <- mice_data %>% mutate (across (everything (), as.numeric))
#set methods: impute only variables that actually have NAs, using PMM
meth <- make.method (mice_data)
meth[] <- "" # default: do not impute (taken from CS and KW notes)
vars_with_na <- names (which (colSums (is.na (mice_data)) > 0 ))
meth[vars_with_na] <- "pmm"
#predictor matrix: by default allow all variables to predict each other (do I need to limit any one specific variable? Need to check)
pred <- make.predictorMatrix (mice_data)
pred[,] <- 1
diag (pred) <- 0
set.seed (123 )
imp <- mice (
data = mice_data,
m = 10 ,
method = meth,
predictorMatrix = pred,
maxit = 20 ,
printFlag = FALSE
)
#extract one completed dataset. Basically telling R to give me the first of the 10 completed datasets
completed <- complete (imp, action = 1 )
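Since only the first of the completed datasets is carried forward, it can be useful to see how much a summary statistic varies across all of them. The sketch below is one possible way to do that, using the small `nhanes` example dataset that ships with mice in place of `mice_data`; the object names `imp_demo`, `completed_list`, and `bmi_means` are hypothetical.

```r
library(mice)

# `nhanes` is a small example dataset included in mice; it stands in for mice_data.
imp_demo <- mice(nhanes, m = 5, method = "pmm", maxit = 5,
                 seed = 123, printFlag = FALSE)

# Extract every completed dataset rather than only the first.
completed_list <- lapply(seq_len(imp_demo$m),
                         function(i) complete(imp_demo, action = i))

# Mean of one imputed variable in each completed dataset.
bmi_means <- sapply(completed_list, function(d) mean(d$bmi))
print(round(bmi_means, 2))

# A narrow range suggests the choice of completed dataset matters little.
print(range(bmi_means))
```

If the range were wide for an index indicator, that would be a reason to consider averaging across completed datasets instead of relying on a single one.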
#merge back into master2 as *_imp and *_flag columns
#keeping original columns untouched for transparency.
for (v in index_vars) {
  imp_col <- paste0(v, "_imp")
  flag_col <- paste0(v, "_flag")
  master2[[imp_col]] <- completed[[v]]
  master2[[flag_col]] <- ifelse(is.na(mice_data[[v]]), "Imputed", "Original")
}
#any remaining NA in the imputed index matrix?
imputed_cols <- paste0 (index_vars, "_imp" )
colSums (is.na (master2[, imputed_cols]))
public_preK_est_imp est_uninsured_rate_imp
0 0
LE_imputed_2020_imp est_has_disability_pct_imp
0 0
est_65plus_amb_pct_imp LPA_CrudePrev_imp
0 0
est_med_hh_inc_imp est_underemp_perc_imp
0 0
est_belowpov_pct_imp Eviction filing rate_imp
0 0
hh_support_risk_score_imp Energy Burden (% income)_imp
0 0
est_lesh_perc_imp est_no_internet_perc_imp
0 0
lths_pct_imp Percent Institutionalized GQ_imp
0 0
Persistent Poverty_imp Temperature_Diff_imp
0 0
out_path <- file.path (data_dir, "master_FULL_joint_imputed_for_index.csv" )
write_csv (master2, out_path)
out_path
[1] "C:/Users/rayo-garza/OneDrive - Center for Public Policy Priorities/Documents/Data 2025/master_FULL_joint_imputed_for_index.csv"
#check that observed values were NOT changed (correlation among observed cases should be 1)
cor(master2$public_preK_est,
    master2$public_preK_est_imp,
    use = "complete.obs")
cor(master2$est_med_hh_inc,
    master2$est_med_hh_inc_imp,
    use = "complete.obs")
cor(master2$`Eviction filing rate`,
    master2$`Eviction filing rate_imp`,
    use = "complete.obs")
#distribution preservation
summary (master2$ public_preK_est)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.00 0.00 30.70 42.47 80.50 100.00 64
summary (master2$ public_preK_est_imp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 46.20 47.95 100.00 100.00
summary (master2$ est_med_hh_inc)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
18286 78290 95536 105625 123417 250001 2
summary (master2$ est_med_hh_inc_imp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
18286 78026 95536 105783 123798 250001
summary(master2$`Eviction filing rate`)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.000 1.625 3.175 5.310 6.570 61.720 7
summary(master2$`Eviction filing rate_imp`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.000 1.600 3.180 5.344 6.570 61.720
#variance
sd (master2$ public_preK_est, na.rm= TRUE )
sd (master2$ public_preK_est_imp)
sd (master2$ est_med_hh_inc, na.rm= TRUE )
sd (master2$ est_med_hh_inc_imp)
library (dplyr)
rank_before <- percent_rank (master2$ public_preK_est)
rank_after <- percent_rank (master2$ public_preK_est_imp)
cor (rank_before, rank_after, use= "complete.obs" )
master2 %>%
filter (public_preK_est_flag == "Imputed" ) %>%
select (public_preK_est_imp) %>%
summary ()
public_preK_est_imp
Min. : 0.00
1st Qu.: 25.90
Median : 77.20
Mean : 63.79
3rd Qu.:100.00
Max. :100.00
master2 %>%
filter (est_med_hh_inc_flag == "Imputed" ) %>%
select (est_med_hh_inc_imp) %>%
summary ()
est_med_hh_inc_imp
Min. : 73977
1st Qu.: 99590
Median :125203
Mean :125203
3rd Qu.:150816
Max. :176429
cor (master2[, index_vars], use= "complete.obs" )
public_preK_est est_uninsured_rate LE_imputed_2020
public_preK_est 1.00000000 0.44424579 -0.260946630
est_uninsured_rate 0.44424579 1.00000000 -0.380712468
LE_imputed_2020 -0.26094663 -0.38071247 1.000000000
est_has_disability_pct 0.25827067 0.40359270 -0.441964645
est_65plus_amb_pct 0.22977167 0.42897600 -0.400097965
LPA_CrudePrev 0.46421492 0.70654947 -0.405548159
est_med_hh_inc -0.34150893 -0.60248355 0.422620905
est_underemp_perc 0.05879988 -0.02763129 0.173286383
est_belowpov_pct 0.26831694 0.57825993 -0.340672337
Eviction filing rate 0.31622218 0.25788419 -0.289151917
hh_support_risk_score 0.09606129 0.10606816 -0.004763105
Energy Burden (% income) 0.41223181 0.57010456 -0.294115672
est_lesh_perc 0.37304380 0.70115373 -0.223539877
est_no_internet_perc 0.24611549 0.37481867 -0.308701232
lths_pct 0.48628646 0.72304126 -0.416267462
Percent Institutionalized GQ 0.10353749 -0.01687152 -0.064396911
Persistent Poverty 0.08390720 0.15832105 -0.233754838
Temperature_Diff 0.07050733 0.26165504 -0.213673773
est_has_disability_pct est_65plus_amb_pct
public_preK_est 0.25827067 0.2297716713
est_uninsured_rate 0.40359270 0.4289759988
LE_imputed_2020 -0.44196464 -0.4000979648
est_has_disability_pct 1.00000000 0.5013507524
est_65plus_amb_pct 0.50135075 1.0000000000
LPA_CrudePrev 0.53693813 0.5297073068
est_med_hh_inc -0.41211095 -0.3002824716
est_underemp_perc -0.05289979 -0.1133425311
est_belowpov_pct 0.38084466 0.4099607762
Eviction filing rate 0.30685306 0.2101222086
hh_support_risk_score 0.08896564 0.1155191053
Energy Burden (% income) 0.46141149 0.3353493163
est_lesh_perc 0.25000550 0.3207646305
est_no_internet_perc 0.36533163 0.3756851390
lths_pct 0.43692022 0.4386941572
Percent Institutionalized GQ 0.11070460 0.0002500681
Persistent Poverty 0.11422317 0.1646435633
Temperature_Diff 0.16839073 0.1948639636
LPA_CrudePrev est_med_hh_inc est_underemp_perc
public_preK_est 0.46421492 -0.34150893 0.058799885
est_uninsured_rate 0.70654947 -0.60248355 -0.027631289
LE_imputed_2020 -0.40554816 0.42262091 0.173286383
est_has_disability_pct 0.53693813 -0.41211095 -0.052899790
est_65plus_amb_pct 0.52970731 -0.30028247 -0.113342531
LPA_CrudePrev 1.00000000 -0.54502242 -0.120764423
est_med_hh_inc -0.54502242 1.00000000 0.163491021
est_underemp_perc -0.12076442 0.16349102 1.000000000
est_belowpov_pct 0.65173997 -0.59654508 -0.001474493
Eviction filing rate 0.47327100 -0.32040995 -0.034348132
hh_support_risk_score 0.23838664 -0.18033507 0.050315259
Energy Burden (% income) 0.76606051 -0.45356344 -0.079698732
est_lesh_perc 0.67058794 -0.46457492 -0.001084096
est_no_internet_perc 0.53286680 -0.30195000 -0.195163097
lths_pct 0.84666442 -0.45393078 -0.061155315
Percent Institutionalized GQ 0.08702208 -0.02056847 -0.009991584
Persistent Poverty 0.11121638 -0.08941666 0.036939876
Temperature_Diff 0.33438410 -0.43579453 -0.274660206
est_belowpov_pct Eviction filing rate
public_preK_est 0.268316944 0.316222184
est_uninsured_rate 0.578259932 0.257884187
LE_imputed_2020 -0.340672337 -0.289151917
est_has_disability_pct 0.380844664 0.306853064
est_65plus_amb_pct 0.409960776 0.210122209
LPA_CrudePrev 0.651739973 0.473270998
est_med_hh_inc -0.596545075 -0.320409953
est_underemp_perc -0.001474493 -0.034348132
est_belowpov_pct 1.000000000 0.217769082
Eviction filing rate 0.217769082 1.000000000
hh_support_risk_score 0.217935999 0.124982356
Energy Burden (% income) 0.508837463 0.423777529
est_lesh_perc 0.546749770 0.203220450
est_no_internet_perc 0.488904523 0.146081485
lths_pct 0.536173717 0.436625321
Percent Institutionalized GQ -0.020405988 0.038193004
Persistent Poverty 0.285568868 -0.003957288
Temperature_Diff 0.270229141 -0.027410753
hh_support_risk_score Energy Burden (% income)
public_preK_est 0.096061288 0.41223181
est_uninsured_rate 0.106068157 0.57010456
LE_imputed_2020 -0.004763105 -0.29411567
est_has_disability_pct 0.088965643 0.46141149
est_65plus_amb_pct 0.115519105 0.33534932
LPA_CrudePrev 0.238386638 0.76606051
est_med_hh_inc -0.180335072 -0.45356344
est_underemp_perc 0.050315259 -0.07969873
est_belowpov_pct 0.217935999 0.50883746
Eviction filing rate 0.124982356 0.42377753
hh_support_risk_score 1.000000000 0.22930946
Energy Burden (% income) 0.229309464 1.00000000
est_lesh_perc 0.181352974 0.51756307
est_no_internet_perc 0.162892228 0.34072914
lths_pct 0.156710568 0.69863313
Percent Institutionalized GQ 0.027206012 0.09087590
Persistent Poverty -0.023694422 -0.06923601
Temperature_Diff 0.100137564 0.21545279
est_lesh_perc est_no_internet_perc lths_pct
public_preK_est 0.373043797 0.2461155 0.48628646
est_uninsured_rate 0.701153728 0.3748187 0.72304126
LE_imputed_2020 -0.223539877 -0.3087012 -0.41626746
est_has_disability_pct 0.250005500 0.3653316 0.43692022
est_65plus_amb_pct 0.320764630 0.3756851 0.43869416
LPA_CrudePrev 0.670587937 0.5328668 0.84666442
est_med_hh_inc -0.464574922 -0.3019500 -0.45393078
est_underemp_perc -0.001084096 -0.1951631 -0.06115531
est_belowpov_pct 0.546749770 0.4889045 0.53617372
Eviction filing rate 0.203220450 0.1460815 0.43662532
hh_support_risk_score 0.181352974 0.1628922 0.15671057
Energy Burden (% income) 0.517563073 0.3407291 0.69863313
est_lesh_perc 1.000000000 0.3897666 0.62917178
est_no_internet_perc 0.389766637 1.0000000 0.46387996
lths_pct 0.629171782 0.4638800 1.00000000
Percent Institutionalized GQ -0.003118287 -0.0224041 0.12032952
Persistent Poverty -0.029656895 0.2196679 0.12584844
Temperature_Diff 0.336250849 0.1682932 0.15203082
Percent Institutionalized GQ Persistent Poverty
public_preK_est 0.1035374907 0.083907205
est_uninsured_rate -0.0168715171 0.158321049
LE_imputed_2020 -0.0643969112 -0.233754838
est_has_disability_pct 0.1107046018 0.114223175
est_65plus_amb_pct 0.0002500681 0.164643563
LPA_CrudePrev 0.0870220766 0.111216376
est_med_hh_inc -0.0205684698 -0.089416659
est_underemp_perc -0.0099915841 0.036939876
est_belowpov_pct -0.0204059877 0.285568868
Eviction filing rate 0.0381930040 -0.003957288
hh_support_risk_score 0.0272060123 -0.023694422
Energy Burden (% income) 0.0908758966 -0.069236006
est_lesh_perc -0.0031182870 -0.029656895
est_no_internet_perc -0.0224041033 0.219667880
lths_pct 0.1203295228 0.125848440
Percent Institutionalized GQ 1.0000000000 -0.038881310
Persistent Poverty -0.0388813097 1.000000000
Temperature_Diff -0.0510776133 -0.013265470
Temperature_Diff
public_preK_est 0.07050733
est_uninsured_rate 0.26165504
LE_imputed_2020 -0.21367377
est_has_disability_pct 0.16839073
est_65plus_amb_pct 0.19486396
LPA_CrudePrev 0.33438410
est_med_hh_inc -0.43579453
est_underemp_perc -0.27466021
est_belowpov_pct 0.27022914
Eviction filing rate -0.02741075
hh_support_risk_score 0.10013756
Energy Burden (% income) 0.21545279
est_lesh_perc 0.33625085
est_no_internet_perc 0.16829323
lths_pct 0.15203082
Percent Institutionalized GQ -0.05107761
Persistent Poverty -0.01326547
Temperature_Diff 1.00000000
cor (master2[, imputed_cols])
public_preK_est_imp est_uninsured_rate_imp
public_preK_est_imp 1.00000000 0.440237016
est_uninsured_rate_imp 0.44023702 1.000000000
LE_imputed_2020_imp -0.26716877 -0.390014909
est_has_disability_pct_imp 0.23712246 0.163370748
est_65plus_amb_pct_imp 0.23718262 0.299673214
LPA_CrudePrev_imp 0.50049324 0.619440177
est_med_hh_inc_imp -0.40214148 -0.594884445
est_underemp_perc_imp 0.18437855 0.002708014
est_belowpov_pct_imp 0.34755602 0.326362528
Eviction filing rate_imp 0.29152280 0.192143770
hh_support_risk_score_imp 0.03197354 0.094709424
Energy Burden (% income)_imp 0.44072292 0.496633106
est_lesh_perc_imp 0.33084906 0.698543650
est_no_internet_perc_imp 0.23589393 0.396225567
lths_pct_imp 0.44469722 0.642829337
Percent Institutionalized GQ_imp 0.05680903 -0.029294798
Persistent Poverty_imp -0.02698461 0.098398270
Temperature_Diff_imp 0.11902275 0.296228446
LE_imputed_2020_imp est_has_disability_pct_imp
public_preK_est_imp -0.26716877 0.2371225
est_uninsured_rate_imp -0.39001491 0.1633707
LE_imputed_2020_imp 1.00000000 -0.2157128
est_has_disability_pct_imp -0.21571275 1.0000000
est_65plus_amb_pct_imp -0.29708726 0.4052309
LPA_CrudePrev_imp -0.39291597 0.5231132
est_med_hh_inc_imp 0.37845982 -0.1224407
est_underemp_perc_imp 0.09196638 0.1776450
est_belowpov_pct_imp -0.18219021 0.4965270
Eviction filing rate_imp -0.23150542 0.2540476
hh_support_risk_score_imp -0.02062798 0.1022485
Energy Burden (% income)_imp -0.22860879 0.4283241
est_lesh_perc_imp -0.22457981 0.0810751
est_no_internet_perc_imp -0.25203379 0.4729545
lths_pct_imp -0.38433132 0.5087759
Percent Institutionalized GQ_imp -0.06748717 0.4031216
Persistent Poverty_imp -0.18902894 0.0760528
Temperature_Diff_imp -0.19619495 0.1297674
est_65plus_amb_pct_imp LPA_CrudePrev_imp
public_preK_est_imp 0.23718262 0.5004932
est_uninsured_rate_imp 0.29967321 0.6194402
LE_imputed_2020_imp -0.29708726 -0.3929160
est_has_disability_pct_imp 0.40523093 0.5231132
est_65plus_amb_pct_imp 1.00000000 0.4215015
LPA_CrudePrev_imp 0.42150150 1.0000000
est_med_hh_inc_imp -0.26623460 -0.4852699
est_underemp_perc_imp 0.07636548 0.2038324
est_belowpov_pct_imp 0.35499996 0.6536837
Eviction filing rate_imp 0.13708439 0.4052103
hh_support_risk_score_imp 0.04340267 0.1812116
Energy Burden (% income)_imp 0.29547165 0.7473551
est_lesh_perc_imp 0.15466648 0.5871815
est_no_internet_perc_imp 0.27529434 0.5478003
lths_pct_imp 0.32212030 0.8213187
Percent Institutionalized GQ_imp 0.15616287 0.2952144
Persistent Poverty_imp 0.14828495 0.1326349
Temperature_Diff_imp 0.12503563 0.3456429
est_med_hh_inc_imp est_underemp_perc_imp
public_preK_est_imp -0.40214148 0.184378545
est_uninsured_rate_imp -0.59488445 0.002708014
LE_imputed_2020_imp 0.37845982 0.091966382
est_has_disability_pct_imp -0.12244071 0.177645036
est_65plus_amb_pct_imp -0.26623460 0.076365480
LPA_CrudePrev_imp -0.48526991 0.203832391
est_med_hh_inc_imp 1.00000000 -0.045800776
est_underemp_perc_imp -0.04580078 1.000000000
est_belowpov_pct_imp -0.40891524 0.540089583
Eviction filing rate_imp -0.23808987 -0.041154943
hh_support_risk_score_imp -0.10239326 -0.123538226
Energy Burden (% income)_imp -0.45727395 0.304888099
est_lesh_perc_imp -0.42905086 -0.015562773
est_no_internet_perc_imp -0.24261485 -0.001622255
lths_pct_imp -0.35690916 0.063840955
Percent Institutionalized GQ_imp 0.08766797 0.131327325
Persistent Poverty_imp -0.14621087 0.189027380
Temperature_Diff_imp -0.45395696 -0.046003931
est_belowpov_pct_imp Eviction filing rate_imp
public_preK_est_imp 0.347556023 0.291522796
est_uninsured_rate_imp 0.326362528 0.192143770
LE_imputed_2020_imp -0.182190214 -0.231505419
est_has_disability_pct_imp 0.496527027 0.254047552
est_65plus_amb_pct_imp 0.354999960 0.137084390
LPA_CrudePrev_imp 0.653683718 0.405210336
est_med_hh_inc_imp -0.408915242 -0.238089869
est_underemp_perc_imp 0.540089583 -0.041154943
est_belowpov_pct_imp 1.000000000 0.168851800
Eviction filing rate_imp 0.168851800 1.000000000
hh_support_risk_score_imp -0.005163479 0.030961506
Energy Burden (% income)_imp 0.554572265 0.283828439
est_lesh_perc_imp 0.283949990 0.153948533
est_no_internet_perc_imp 0.363048861 0.134171163
lths_pct_imp 0.424880343 0.321402210
Percent Institutionalized GQ_imp 0.342353638 0.210216815
Persistent Poverty_imp 0.220944898 -0.026910648
Temperature_Diff_imp 0.283663259 -0.001515009
hh_support_risk_score_imp
public_preK_est_imp 0.031973538
est_uninsured_rate_imp 0.094709424
LE_imputed_2020_imp -0.020627979
est_has_disability_pct_imp 0.102248519
est_65plus_amb_pct_imp 0.043402665
LPA_CrudePrev_imp 0.181211633
est_med_hh_inc_imp -0.102393262
est_underemp_perc_imp -0.123538226
est_belowpov_pct_imp -0.005163479
Eviction filing rate_imp 0.030961506
hh_support_risk_score_imp 1.000000000
Energy Burden (% income)_imp 0.171380030
est_lesh_perc_imp 0.112438380
est_no_internet_perc_imp 0.227663233
lths_pct_imp 0.209858283
Percent Institutionalized GQ_imp -0.150195712
Persistent Poverty_imp 0.006254729
Temperature_Diff_imp 0.032006681
Energy Burden (% income)_imp est_lesh_perc_imp
public_preK_est_imp 0.44072292 0.330849063
est_uninsured_rate_imp 0.49663311 0.698543650
LE_imputed_2020_imp -0.22860879 -0.224579813
est_has_disability_pct_imp 0.42832411 0.081075097
est_65plus_amb_pct_imp 0.29547165 0.154666479
LPA_CrudePrev_imp 0.74735506 0.587181515
est_med_hh_inc_imp -0.45727395 -0.429050863
est_underemp_perc_imp 0.30488810 -0.015562773
est_belowpov_pct_imp 0.55457226 0.283949990
Eviction filing rate_imp 0.28382844 0.153948533
hh_support_risk_score_imp 0.17138003 0.112438380
Energy Burden (% income)_imp 1.00000000 0.394451313
est_lesh_perc_imp 0.39445131 1.000000000
est_no_internet_perc_imp 0.39851225 0.418367178
lths_pct_imp 0.63909027 0.605963762
Percent Institutionalized GQ_imp 0.07304903 0.097344488
Persistent Poverty_imp 0.08885868 -0.007464553
Temperature_Diff_imp 0.27871585 0.315620470
est_no_internet_perc_imp lths_pct_imp
public_preK_est_imp 0.235893930 0.44469722
est_uninsured_rate_imp 0.396225567 0.64282934
LE_imputed_2020_imp -0.252033791 -0.38433132
est_has_disability_pct_imp 0.472954524 0.50877593
est_65plus_amb_pct_imp 0.275294337 0.32212030
LPA_CrudePrev_imp 0.547800283 0.82131870
est_med_hh_inc_imp -0.242614850 -0.35690916
est_underemp_perc_imp -0.001622255 0.06384095
est_belowpov_pct_imp 0.363048861 0.42488034
Eviction filing rate_imp 0.134171163 0.32140221
hh_support_risk_score_imp 0.227663233 0.20985828
Energy Burden (% income)_imp 0.398512254 0.63909027
est_lesh_perc_imp 0.418367178 0.60596376
est_no_internet_perc_imp 1.000000000 0.61927417
lths_pct_imp 0.619274171 1.00000000
Percent Institutionalized GQ_imp 0.039786917 0.17400372
Persistent Poverty_imp 0.176312718 0.12425939
Temperature_Diff_imp 0.205239882 0.18166003
Percent Institutionalized GQ_imp
public_preK_est_imp 5.680903e-02
est_uninsured_rate_imp -2.929480e-02
LE_imputed_2020_imp -6.748717e-02
est_has_disability_pct_imp 4.031216e-01
est_65plus_amb_pct_imp 1.561629e-01
LPA_CrudePrev_imp 2.952144e-01
est_med_hh_inc_imp 8.766797e-02
est_underemp_perc_imp 1.313273e-01
est_belowpov_pct_imp 3.423536e-01
Eviction filing rate_imp 2.102168e-01
hh_support_risk_score_imp -1.501957e-01
Energy Burden (% income)_imp 7.304903e-02
est_lesh_perc_imp 9.734449e-02
est_no_internet_perc_imp 3.978692e-02
lths_pct_imp 1.740037e-01
Percent Institutionalized GQ_imp 1.000000e+00
Persistent Poverty_imp -6.482865e-05
Temperature_Diff_imp 3.227010e-02
Persistent Poverty_imp Temperature_Diff_imp
public_preK_est_imp -2.698461e-02 0.119022750
est_uninsured_rate_imp 9.839827e-02 0.296228446
LE_imputed_2020_imp -1.890289e-01 -0.196194948
est_has_disability_pct_imp 7.605280e-02 0.129767431
est_65plus_amb_pct_imp 1.482849e-01 0.125035632
LPA_CrudePrev_imp 1.326349e-01 0.345642866
est_med_hh_inc_imp -1.462109e-01 -0.453956962
est_underemp_perc_imp 1.890274e-01 -0.046003931
est_belowpov_pct_imp 2.209449e-01 0.283663259
Eviction filing rate_imp -2.691065e-02 -0.001515009
hh_support_risk_score_imp 6.254729e-03 0.032006681
Energy Burden (% income)_imp 8.885868e-02 0.278715849
est_lesh_perc_imp -7.464553e-03 0.315620470
est_no_internet_perc_imp 1.763127e-01 0.205239882
lths_pct_imp 1.242594e-01 0.181660032
Percent Institutionalized GQ_imp -6.482865e-05 0.032270105
Persistent Poverty_imp 1.000000e+00 0.101897111
Temperature_Diff_imp 1.018971e-01 1.000000000
Full joint imputation using PMM preserved the correlation structure of the dataset and did not materially alter rank ordering or distributional characteristics of key socioeconomic indicators.
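Comparing two large correlation matrices by eye is error-prone, so one option is to subtract them and look at the largest absolute difference. The sketch below does this on made-up data; the data frames `before` and `after` and the fill values are hypothetical. It uses pairwise-complete correlations for the pre-imputation matrix, which is an assumption of this sketch and differs from the `complete.obs` setting used in the check above.

```r
# Toy data: `before` has missing values; `after` is a hypothetical completed version.
before <- data.frame(
  a = c(1, 2, NA, 4, 5, 6, 7, 8),
  b = c(2, 1, 4, NA, 6, 5, 8, 7),
  c = c(8, 7, 6, 5, 4, 3, 2, 1)
)
after <- before
after$a[is.na(before$a)] <- 3 # made-up fill value
after$b[is.na(before$b)] <- 3 # made-up fill value
names(after) <- paste0(names(before), "_imp")

cor_before <- cor(before, use = "pairwise.complete.obs")
cor_after  <- cor(after)

# Drop dimnames so the matrices are subtracted element by element,
# then relabel the result with the original variable names.
diff_mat <- unname(cor_after) - unname(cor_before)
dimnames(diff_mat) <- dimnames(cor_before)

max_abs_diff <- max(abs(diff_mat))
print(round(diff_mat, 3))
print(round(max_abs_diff, 3))
```

Applied to the real index variables, the entries of `diff_mat` with the largest absolute values would point to the variable pairs whose relationship shifted most after imputation.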