Emanuel Ben-David
Joint work with Tom Mule and Joe Schafer
2026-03-23
| Hhindex | pernum | sex | race | hisp | age | relate | ownership | headsex | headrace | headhisp | headage | householdsize |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 86 | 2 | 2 | 2 | 1 | 13 | 7 | 3 | 2 | 2 | 1 | 61 | 2 |
| 86 | 3 | 2 | 2 | NA | NA | 9 | 3 | 2 | 2 | 1 | 61 | 2 |
| 354 | 2 | 1 | 3 | 1 | 30 | 2 | NA | NA | NA | 1 | 57 | 3 |
| 354 | 3 | 1 | 3 | 1 | 24 | 2 | NA | NA | NA | 1 | 57 | 3 |
| 354 | 4 | 2 | NA | 1 | NA | 10 | NA | NA | NA | 1 | 57 | 3 |
| 1191 | 2 | 1 | 1 | 1 | 42 | 1 | 1 | 2 | 1 | 1 | 41 | 3 |
| 1191 | 3 | 1 | 1 | 1 | 16 | 2 | 1 | 2 | 1 | 1 | 41 | 3 |
| 1191 | 4 | 1 | 1 | 1 | 10 | 2 | 1 | 2 | 1 | 1 | 41 | 3 |
| 1246 | 2 | 1 | 1 | 1 | 45 | 1 | 1 | 2 | 1 | 1 | 41 | 3 |
| 1246 | 3 | 1 | 1 | 1 | 11 | 2 | 1 | 2 | 1 | 1 | 41 | 3 |
15,361 individuals in 7,627 households • 13 variables • 23,265 total missing values • NA = missing at random (MAR)
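The summary counts above can be reproduced on any extract of the file. The sketch below does so for the ten sample rows shown in the table; the `summarize` helper and the whitespace-separated layout are illustrative assumptions, not the authors' tooling.

```python
# Column names as given in the sample table header.
COLUMNS = ["Hhindex", "pernum", "sex", "race", "hisp", "age", "relate",
           "ownership", "headsex", "headrace", "headhisp", "headage",
           "householdsize"]

# The ten sample rows from the table, one person per line; "NA" = missing.
SAMPLE = """\
86 2 2 2 1 13 7 3 2 2 1 61 2
86 3 2 2 NA NA 9 3 2 2 1 61 2
354 2 1 3 1 30 2 NA NA NA 1 57 3
354 3 1 3 1 24 2 NA NA NA 1 57 3
354 4 2 NA 1 NA 10 NA NA NA 1 57 3
1191 2 1 1 1 42 1 1 2 1 1 41 3
1191 3 1 1 1 16 2 1 2 1 1 41 3
1191 4 1 1 1 10 2 1 2 1 1 41 3
1246 2 1 1 1 45 1 1 2 1 1 41 3
1246 3 1 1 1 11 2 1 2 1 1 41 3
"""


def summarize(text):
    """Return (individuals, households, missing values) for a block of rows."""
    records = [dict(zip(COLUMNS, line.split())) for line in text.splitlines()]
    households = {r["Hhindex"] for r in records}
    missing = sum(value == "NA" for r in records for value in r.values())
    return len(records), len(households), missing


n_individuals, n_households, n_missing = summarize(SAMPLE)
print(n_individuals, n_households, n_missing)  # 10 4 13
```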
Objective: Compare four imputation methods applied to nested household survey data
Data Source: 2012 American Community Survey (ACS) Public Use Microdata Sample
Evaluation Scope: 33 distinct patterns across 5 categories
Metrics: RMSE, 95% CI coverage, and mean CI width
| Method | Approach |
|---|---|
| Hot Deck | Traditional donor-based imputation |
| By HHSIZE | Latent class model — flattened household structure |
| Nested | Latent class model — preserves household clustering |
| Two-Level LC | Hierarchical latent class — household + individual levels |
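To make "donor-based imputation" concrete, here is a minimal hot deck sketch: a missing value is copied from a randomly chosen donor in the same imputation class. It is a generic illustration, not the hot deck procedure evaluated in this study; the function name, the choice of class variables, and the toy records are all hypothetical.

```python
import random


def hot_deck_impute(records, target, class_vars, rng):
    """Fill missing `target` values from a random donor in the same class.

    Records whose class has no donor are left missing.
    """
    # Collect observed target values by imputation class.
    donors = {}
    for r in records:
        if r[target] is not None:
            key = tuple(r[v] for v in class_vars)
            donors.setdefault(key, []).append(r[target])

    completed = []
    for r in records:
        r = dict(r)  # copy so the input records are not modified
        if r[target] is None:
            pool = donors.get(tuple(r[v] for v in class_vars))
            if pool:
                r[target] = rng.choice(pool)
        completed.append(r)
    return completed


# Hypothetical toy records (None = missing age).
people = [
    {"sex": 1, "race": 1, "age": 42},
    {"sex": 1, "race": 1, "age": 16},
    {"sex": 2, "race": 2, "age": 13},
    {"sex": 2, "race": 2, "age": None},
    {"sex": 1, "race": 1, "age": None},
    {"sex": 2, "race": 3, "age": None},  # no donor in this class
]
completed = hot_deck_impute(people, "age", ["sex", "race"], random.Random(0))
print([p["age"] for p in completed])
```

A sketch like this treats each person independently, so it does nothing to keep imputed values consistent within a household.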
Imputed records must satisfy household-level structural constraints. Key edit rules include:
Household Head & Spouse
Parent–Child & Adoption
Siblings & Grandparents
Source: Ben-David, Mule & Schafer (2024), Part I & II.
| Method | Mean | Median | SD | Min | Max | Rank |
|---|---|---|---|---|---|---|
| Nested | 0.0369 | 0.0170 | 0.0490 | 0.0010 | 0.1660 | 1st |
| By HHSIZE | 0.0373 | 0.0160 | 0.0486 | 0.0010 | 0.1650 | 2nd |
| Two-Level LC | 0.0401 | 0.0180 | 0.0509 | 0.0010 | 0.1660 | 3rd |
| Hot Deck | 0.0481 | 0.0180 | 0.0617 | 0.0010 | 0.2260 | 4th |
RMSE computed across all 33 patterns. Lower RMSE indicates estimates closer to the population parameter.
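As a sketch of how these quantities fit together: RMSE compares a set of estimates with the population value, and the summary row aggregates the per-pattern RMSEs. The replication scheme behind each RMSE is not stated here, so the `rmse` helper and its example inputs are assumptions; the 33 Hot Deck values are copied from the detailed tables by category.

```python
import math
import statistics


def rmse(estimates, truth):
    """Root mean squared error of repeated estimates around the true value."""
    return math.sqrt(sum((e - truth) ** 2 for e in estimates) / len(estimates))


# Hypothetical estimates of one pattern's proportion (true value 0.6).
print(rmse([0.5, 0.7], 0.6))

# Per-pattern Hot Deck RMSEs from the five detailed tables (7 + 5 + 9 + 5 + 7).
HOT_DECK_RMSE = [
    0.0510, 0.1580, 0.2260, 0.0240, 0.0760, 0.0490, 0.0120,
    0.0180, 0.0100, 0.1340, 0.0070, 0.0490,
    0.1630, 0.0180, 0.0630, 0.1070, 0.0130, 0.0060, 0.0070, 0.0040, 0.0010,
    0.0060, 0.0190, 0.1620, 0.0020, 0.0160,
    0.0210, 0.1400, 0.0150, 0.0030, 0.0020, 0.0040, 0.0010,
]

summary = {
    "mean": round(statistics.mean(HOT_DECK_RMSE), 4),
    "median": statistics.median(HOT_DECK_RMSE),
    "min": min(HOT_DECK_RMSE),
    "max": max(HOT_DECK_RMSE),
}
print(summary)  # mean 0.0481, median 0.018, min 0.001, max 0.226
```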
| Method | Patterns Covered (of 33) | Coverage Rate | Mean CI Width |
|---|---|---|---|
| Nested | 9 | 27.3% | 0.0158 |
| Hot Deck | 8 | 24.2% | 0.0166 |
| By HHSIZE | 7 | 21.2% | 0.0158 |
| Two-Level LC | 6 | 18.2% | 0.0161 |
CI coverage = proportion of 33 patterns where the 95% CI contains the population parameter. A well-calibrated method should approach 95%.
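The coverage definition can be sketched as follows. The intervals in the example are hypothetical and only illustrate the calculation; the last lines recompute the reported rates from the covered-pattern counts in the table.

```python
def coverage(intervals, truths):
    """Return (patterns covered, coverage rate, mean CI width)."""
    hits = sum(lo <= t <= hi for (lo, hi), t in zip(intervals, truths))
    mean_width = sum(hi - lo for lo, hi in intervals) / len(intervals)
    return hits, hits / len(truths), mean_width


# Hypothetical 95% CIs for three patterns and their population values.
intervals = [(0.930, 0.950), (0.880, 0.900), (0.890, 0.905)]
truths = [0.941, 0.907, 0.900]
hits, rate, mean_width = coverage(intervals, truths)
print(hits, round(rate, 3), round(mean_width, 4))

# Reported counts of covered patterns (out of 33), from the table above.
COVERED = {"Nested": 9, "Hot Deck": 8, "By HHSIZE": 7, "Two-Level LC": 6}
rates = {method: f"{count / 33:.1%}" for method, count in COVERED.items()}
print(rates)
```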
| Method | Mean RMSE | RMSE Rank | CI Coverage | CI Rank |
|---|---|---|---|---|
| Nested | 0.0369 | 1st | 27.3% (9/33) | 1st |
| By HHSIZE | 0.0373 | 2nd | 21.2% (7/33) | 3rd |
| Two-Level LC | 0.0401 | 3rd | 18.2% (6/33) | 4th |
| Hot Deck | 0.0481 | 4th | 24.2% (8/33) | 2nd |
All four methods show CI coverage rates (18.2% to 27.3%) far below the 95% nominal level.
| Method | Highest RMSE | Lowest RMSE | Net |
|---|---|---|---|
| By HHSIZE | 4 | 10 | 6 |
| Nested | 6 | 6 | 0 |
| Two-Level LC | 4 | 3 | -1 |
| Hot Deck | 19 | 14 | -5 |
For each of the 33 patterns, the methods with the lowest and the highest RMSE are identified. Net = Lowest count − Highest count.
A positive net score means the method had the lowest RMSE more often than the highest; a negative score means the reverse.
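The tally can be sketched in a few lines, here applied only to the seven patterns of the first detailed table by category. The tie rule is an assumption: a tie goes to the first method in column order, so that each pattern contributes exactly one "lowest" and one "highest" count.

```python
METHODS = ["Hot Deck", "By HHSIZE", "Nested", "Two-Level LC"]

# RMSE rows from the first detailed table, in the same column order as METHODS.
RMSE_ROWS = [
    [0.0510, 0.0220, 0.0180, 0.0230],  # All same race HH size = 2
    [0.1580, 0.0670, 0.0690, 0.0880],  # All same race HH size = 3
    [0.2260, 0.0770, 0.0890, 0.1320],  # All same race HH size = 4
    [0.0240, 0.0090, 0.0080, 0.0080],  # White couple
    [0.0760, 0.0470, 0.0470, 0.0510],  # Same race couple
    [0.0490, 0.0160, 0.0190, 0.0200],  # White-nonwhite couple
    [0.0120, 0.0090, 0.0070, 0.0080],  # Non-White couple, homeowner
]


def net_scores(rows, methods):
    """Return {method: (highest count, lowest count, net)}."""
    lowest = dict.fromkeys(methods, 0)
    highest = dict.fromkeys(methods, 0)
    for row in rows:
        columns = range(len(row))
        # min/max return the first extreme column, which implements the tie rule.
        lowest[methods[min(columns, key=row.__getitem__)]] += 1
        highest[methods[max(columns, key=row.__getitem__)]] += 1
    return {m: (highest[m], lowest[m], lowest[m] - highest[m]) for m in methods}


scores = net_scores(RMSE_ROWS, METHODS)
for method, (n_high, n_low, net) in scores.items():
    print(f"{method:13s} highest={n_high} lowest={n_low} net={net:+d}")
```

Within this one category Hot Deck has the highest RMSE in all seven patterns, which accounts for a large share of its 19 "highest" counts overall.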
Detailed Comparison Tables by Category
| Pattern | Population value | Hot Deck | By HHSIZE | Nested | Two-Level LC |
|---|---|---|---|---|---|
| All same race HH size = 2 | 0.941 | 0.0510 | 0.0220 | 0.0180 | 0.0230 |
| All same race HH size = 3 | 0.907 | 0.1580 | 0.0670 | 0.0690 | 0.0880 |
| All same race HH size = 4 | 0.900 | 0.2260 | 0.0770 | 0.0890 | 0.1320 |
| White couple | 0.578 | 0.0240 | 0.0090 | 0.0080 | 0.0080 |
| Same race couple | 0.694 | 0.0760 | 0.0470 | 0.0470 | 0.0510 |
| White-nonwhite couple | 0.034 | 0.0490 | 0.0160 | 0.0190 | 0.0200 |
| Non-White couple, homeowner | 0.072 | 0.0120 | 0.0090 | 0.0070 | 0.0080 |
🟢 Green = Lowest RMSE in row   🔴 Red = Highest RMSE in row
| Pattern | Population value | Hot Deck | By HHSIZE | Nested | Two-Level LC |
|---|---|---|---|---|---|
| Spouse present | 0.694 | 0.0180 | 0.0140 | 0.0170 | 0.0180 |
| Spouse present, HH is White | 0.609 | 0.0100 | 0.0190 | 0.0190 | 0.0200 |
| Spouse present, HH is Black | 0.152 | 0.1340 | 0.1290 | 0.1250 | 0.1250 |
| HH older than Spouse, White HH | 0.327 | 0.0070 | 0.0060 | 0.0060 | 0.0070 |
| Couple with age difference less than five years | 0.486 | 0.0490 | 0.0220 | 0.0090 | 0.0330 |
| Pattern | Population value | Hot Deck | By HHSIZE | Nested | Two-Level LC |
|---|---|---|---|---|---|
| At least one biological child present | 0.438 | 0.1630 | 0.1650 | 0.1660 | 0.1660 |
| Only one parent | 0.171 | 0.0180 | 0.0180 | 0.0190 | 0.0190 |
| Adult female w/ at least one child under 5 | 0.327 | 0.0630 | 0.0620 | 0.0550 | 0.0540 |
| Adult Black female w/ at least one child under 18 | 0.149 | 0.1070 | 0.1110 | 0.1070 | 0.1080 |
| Adult Hisp male w/ at least one child under 10 | 0.027 | 0.0130 | 0.0160 | 0.0150 | 0.0170 |
| Hisp couple with at least one biological child | 0.025 | 0.0060 | 0.0160 | 0.0120 | 0.0170 |
| At least one stepchild | 0.019 | 0.0070 | 0.0080 | 0.0080 | 0.0070 |
| At least one adopted child, White couple | 0.008 | 0.0040 | 0.0050 | 0.0040 | 0.0050 |
| Black couple with at least two biological children | 0.006 | 0.0010 | 0.0030 | 0.0030 | 0.0030 |
| Pattern | Population value | Hot Deck | By HHSIZE | Nested | Two-Level LC |
|---|---|---|---|---|---|
| At least two generations present, Hisp couple | 0.026 | 0.0060 | 0.0160 | 0.0130 | 0.0170 |
| Two generations present, Black HH | 0.030 | 0.0190 | 0.0190 | 0.0200 | 0.0200 |
| At least three generations present | 0.183 | 0.1620 | 0.1600 | 0.1610 | 0.1610 |
| Three generations present, White couple | 0.005 | 0.0020 | 0.0020 | 0.0030 | 0.0030 |
| One grandchild present | 0.034 | 0.0160 | 0.0160 | 0.0170 | 0.0160 |
| Pattern | Population value | Hot Deck | By HHSIZE | Nested | Two-Level LC |
|---|---|---|---|---|---|
| Male HH, homeowner | 0.299 | 0.0210 | 0.0210 | 0.0260 | 0.0180 |
| HH over 35, no child present | 0.402 | 0.1400 | 0.1380 | 0.1390 | 0.1380 |
| White HH with Hisp origin | 0.066 | 0.0150 | 0.0120 | 0.0110 | 0.0140 |
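| Black HH, homeowner | 0.035 | 0.0030 | 0.0020 | 0.0020 | 0.0020 |
| Black HH under 40, homeowner | 0.006 | 0.0020 | 0.0010 | 0.0010 | 0.0020 |
| Hisp HH over 50, homeowner | 0.017 | 0.0040 | 0.0030 | 0.0030 | 0.0020 |
| White HH under 25, homeowner | 0.006 | 0.0010 | 0.0010 | 0.0010 | 0.0010 |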
| Black HH under 40, home owner | 0.006 | 0.0020 | 0.0010 | 0.0010 | 0.0020 |
| Hisp HH over 50, home owner | 0.017 | 0.0040 | 0.0030 | 0.0030 | 0.0020 |
| White HH under 25, home owner | 0.006 | 0.0010 | 0.0010 | 0.0010 | 0.0010 |