Developmental cognitive neuroscience has accumulated a large literature linking childhood poverty to brain structure and function. Studies use a wide array of operationalizations – household income, income-to-needs ratio (INR), Hollingshead four-factor composites, parental education, area deprivation indices, material hardship inventories, subjective SES, and multidimensional poverty indices – and then draw conclusions about “poverty effects on the developing brain.”
A core methodological concern motivating this work is theory-measurement alignment: do these measures identify the same underlying construct, the same families, and support the same substantive conclusions? This report uses simulated data with a known, multidimensional ground truth to show what happens when investigators apply different poverty measures to the same population.
Design principle. The simulation is constructed so that no measure is a privileged oracle. Each captures a partial, noisy window onto family circumstance, and correlations between measures are calibrated to approximate empirical benchmarks from the poverty measurement literature (checked against published ranges in Table 4). Any divergence across measures therefore reflects genuine construct disagreement, not an engineered result.
The three questions the report addresses:
set.seed(20260421)
N <- 4000
# --- Context: cost-of-living region ---
region <- sample(1:3, N, replace = TRUE, prob = c(0.35, 0.40, 0.25))
region_f <- factor(region, levels = 1:3,
labels = c("LowCOL", "MidCOL", "HighCOL"))
col_multiplier <- c(0.75, 1.00, 1.50)[region]
# --- Observable economic variables ---
income_mu <- c(42000, 58000, 75000)[region]
income <- pmax(8000, round(rlnorm(N, meanlog = log(income_mu) - 0.18,
sdlog = 0.50)))
hh_size <- sample(2:7, N, replace = TRUE,
prob = c(0.15, 0.28, 0.25, 0.18, 0.09, 0.05))
fpl <- 15000 + (hh_size - 1) * 5500
inr <- income / fpl
real_purchasing_power <- income / (col_multiplier * sqrt(hh_size))
# --- Unobserved family variation ---
unobserved_support <- rnorm(N, 0, 1)
health_shocks <- rnorm(N, 0, 1)
discrimination <- rnorm(N, 0, 1)
neighborhood_hist <- rnorm(N, 0, 1)
# --- Three latent constructs jointly driving outcomes ---
material_resources <- 0.55 * z(log(real_purchasing_power)) +
0.20 * unobserved_support +
rnorm(N, 0, 0.6)
chronic_stress <- -0.35 * material_resources +
0.30 * health_shocks +
0.25 * discrimination +
rnorm(N, 0, 0.7)
social_adversity <- -0.25 * material_resources +
0.35 * discrimination +
rnorm(N, 0, 0.8)
# --- Observed poverty / SES measures ---
parent_edu_years <- pmin(20, pmax(6,
round(12 + 2.0 * z(log(income)) + 2.5 * rnorm(N, 0, 1))))
occ_prestige <- pmin(9, pmax(1,
round(5 + 1.2 * z(log(income)) + 1.5 * z(parent_edu_years) +
rnorm(N, 0, 1.2))))
hollingshead <- 3 * (parent_edu_years - 6) + 5 * occ_prestige
wealth_latent <- 0.30 * z(log(income)) + 0.15 * unobserved_support +
rnorm(N, 0, 1.3)
wealth <- pmax(0, round(exp(10 + 1.4 * wealth_latent) -
exp(10) * runif(N, 0.3, 1.0)))
hardship_risk <- plogis(-0.5 - 1.1 * material_resources +
0.4 * chronic_stress + rnorm(N, 0, 0.5))
hardship_count <- rbinom(N, size = 6, prob = hardship_risk)
area_deprivation <- pmax(0, pmin(100,
50 - 5 * z(log(income)) + c(+8, 0, -5)[region] +
10 * neighborhood_hist + rnorm(N, 0, 8)))
consumption <- real_purchasing_power * runif(N, 0.55, 0.95) +
0.15 * wealth / 10 + rnorm(N, 0, 3500)
consumption <- pmax(5000, consumption)
region_median_income <- ave(income, region, FUN = median)
relative_income <- log(income) - log(region_median_income)
subjective_ses <- pmin(10, pmax(1,
round(5.5 + 1.5 * relative_income - 0.3 * z(hardship_count) +
rnorm(N, 0, 1.3))))
rooms <- pmax(1, round(hh_size / runif(N, 0.8, 2.2)))
crowding <- hh_size / rooms
mpi_indicators <- cbind(
edu_dep = as.integer(parent_edu_years < 12),
hardship_dep = as.integer(hardship_count >= 2),
crowd_dep = as.integer(crowding >= 1.5),
wealth_dep = as.integer(wealth < 5000),
area_dep = as.integer(area_deprivation > 65)
)
mpi_score <- rowMeans(mpi_indicators)
mpi_poor <- as.integer(mpi_score >= 0.4)
spm_threshold <- fpl * c(0.85, 1.00, 1.35)[region]
spm_poor <- as.integer(income < spm_threshold)
# --- True outcomes driven by all three latent constructs ---
true_wellbeing <- 0.50 * material_resources -
0.35 * chronic_stress -
0.20 * social_adversity +
0.15 * z(parent_edu_years)
true_wellbeing <- z(true_wellbeing)
hippocampal_volume <- 3800 + 120 * true_wellbeing + rnorm(N, 0, 200)
working_memory <- 100 + 8 * true_wellbeing + rnorm(N, 0, 13)
vocabulary <- 100 + 11 * true_wellbeing +
3 * z(parent_edu_years) + rnorm(N, 0, 12)
# --- Assemble dataset ---
dat <- data.frame(
id = 1:N, region = region_f,
hh_size, income, inr, fpl, real_purchasing_power,
parent_edu_years, occ_prestige, hollingshead,
wealth, hardship_count, consumption, area_deprivation, subjective_ses,
crowding, mpi_score, mpi_poor, spm_poor,
true_wellbeing, hippocampal_volume, working_memory, vocabulary
)
# --- Binary poverty classifications ---
dat$poor_income <- as.integer(dat$inr < 1.0)
dat$poor_lowedu <- as.integer(dat$parent_edu_years < 12)
dat$poor_holl <- as.integer(dat$hollingshead <
quantile(dat$hollingshead, 0.25))
dat$poor_hardship <- as.integer(dat$hardship_count >= 3)
dat$poor_area <- as.integer(dat$area_deprivation >= 70)
dat$poor_subj <- as.integer(dat$subjective_ses <= 3)
dat$poor_consume <- as.integer(dat$consumption <
quantile(dat$consumption, 0.25))
dat$poor_wealth <- as.integer(dat$wealth < 5000)
dat$poor_mpi <- dat$mpi_poor
dat$poor_spm <- dat$spm_poor
poor_vars <- c("poor_income","poor_lowedu","poor_holl","poor_hardship",
"poor_area","poor_subj","poor_consume","poor_wealth",
"poor_mpi","poor_spm")
The simulated population consists of N = 4000 families whose economic and social reality is shaped by three unobserved latent constructs: material resources, chronic stress, and social adversity. Brain and cognitive outcomes depend on all three, plus a direct contribution from parental education (cognitive stimulation).
The ten observed SES/poverty measures – income, INR, parental education, Hollingshead, wealth, material hardship, area deprivation, subjective SES, consumption, and an Alkire-Foster multidimensional poverty index – each capture a partial, noisy window onto these latent realities. Additional unobserved factors (health shocks, discrimination, kin support networks, neighborhood history) shape family circumstance but are not measured directly. Health shocks, discrimination, and kin support reach the outcomes through the latent constructs. Neighborhood history enters only the area deprivation index.
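Why do partial, noisy windows disagree even when they track the same construct? The sketch below uses a simplified one-factor model, not the report's three-construct simulation, and the loadings are hypothetical. Each measure is m_k = lambda_k * L + e_k, with unit variances. Under this model each measure correlates lambda_k with the latent construct, but only lambda_j * lambda_k with another measure.

```r
# Hypothetical one-factor illustration (not the report's data-generating process):
# each observed measure = loading * latent + independent noise, all with unit variance.
set.seed(1)
n <- 1e5
latent <- rnorm(n)

make_measure <- function(loading) {
  loading * latent + rnorm(n, sd = sqrt(1 - loading^2))
}

m1 <- make_measure(0.7)   # hypothetical loading for measure 1
m2 <- make_measure(0.5)   # hypothetical loading for measure 2

r_m1_latent <- cor(m1, latent)  # close to the loading, 0.7
r_m2_latent <- cor(m2, latent)  # close to the loading, 0.5
r_m1_m2     <- cor(m1, m2)      # close to the product 0.7 * 0.5 = 0.35

round(c(r_m1_latent = r_m1_latent, r_m2_latent = r_m2_latent,
        r_m1_m2 = r_m1_m2, implied = 0.7 * 0.5), 2)
```

Two reasonably good measures (loadings 0.7 and 0.5) correlate only about 0.35 with each other. That is comparable to several pairs in Table 3.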
demo_tbl <- data.frame(
Variable = c("Household income (USD)", "Income-to-needs ratio",
"Household size", "Parental education (years)",
"Occupational prestige (1-9)", "Hollingshead composite",
"Wealth (USD)", "Material hardship count (0-6)",
"Area deprivation index (0-100)",
"Subjective SES (1-10)",
"Consumption (USD)",
"Hippocampal volume (mm^3)",
"Working memory composite",
"Vocabulary composite"),
Mean = c(mean(dat$income), mean(dat$inr), mean(dat$hh_size),
mean(dat$parent_edu_years), mean(dat$occ_prestige),
mean(dat$hollingshead), mean(dat$wealth),
mean(dat$hardship_count), mean(dat$area_deprivation),
mean(dat$subjective_ses), mean(dat$consumption),
mean(dat$hippocampal_volume), mean(dat$working_memory),
mean(dat$vocabulary)),
SD = c(sd(dat$income), sd(dat$inr), sd(dat$hh_size),
sd(dat$parent_edu_years), sd(dat$occ_prestige),
sd(dat$hollingshead), sd(dat$wealth),
sd(dat$hardship_count), sd(dat$area_deprivation),
sd(dat$subjective_ses), sd(dat$consumption),
sd(dat$hippocampal_volume), sd(dat$working_memory),
sd(dat$vocabulary)),
Median = c(median(dat$income), median(dat$inr), median(dat$hh_size),
median(dat$parent_edu_years), median(dat$occ_prestige),
median(dat$hollingshead), median(dat$wealth),
median(dat$hardship_count), median(dat$area_deprivation),
median(dat$subjective_ses), median(dat$consumption),
median(dat$hippocampal_volume), median(dat$working_memory),
median(dat$vocabulary))
)
demo_tbl[,2:4] <- round(demo_tbl[,2:4], 2)
pretty_table(demo_tbl, caption = "Table 1. Sample descriptives (N = 4,000).")
| Variable | Mean | SD | Median |
|---|---|---|---|
| Household income (USD) | 52971.03 | 32112.16 | 45254.50 |
| Income-to-needs ratio | 1.80 | 1.20 | 1.50 |
| Household size | 3.94 | 1.34 | 4.00 |
| Parental education (years) | 12.07 | 3.10 | 12.00 |
| Occupational prestige (1-9) | 4.99 | 2.42 | 5.00 |
| Hollingshead composite | 43.14 | 20.40 | 43.00 |
| Wealth (USD) | 116541.80 | 605769.94 | 8723.00 |
| Material hardship count (0-6) | 2.41 | 1.74 | 2.00 |
| Area deprivation index (0-100) | 51.62 | 15.39 | 51.64 |
| Subjective SES (1-10) | 5.51 | 1.61 | 5.00 |
| Consumption (USD) | 22544.23 | 16364.45 | 18995.63 |
| Hippocampal volume (mm^3) | 3803.25 | 229.56 | 3804.03 |
| Working memory composite | 99.70 | 15.20 | 99.52 |
| Vocabulary composite | 100.25 | 17.45 | 100.29 |
by_region <- aggregate(cbind(income, inr, hardship_count, area_deprivation,
parent_edu_years, wealth) ~ region, data = dat,
FUN = function(x) c(mean = mean(x), sd = sd(x)))
# Flatten
reg_tbl <- data.frame(
Region = by_region$region,
`Income (M)` = round(by_region$income[,"mean"]),
`Income (SD)` = round(by_region$income[,"sd"]),
`INR (M)` = round(by_region$inr[,"mean"], 2),
`Hardship (M)` = round(by_region$hardship_count[,"mean"], 2),
`Area dep. (M)` = round(by_region$area_deprivation[,"mean"], 1),
`Parent edu (M)` = round(by_region$parent_edu_years[,"mean"], 1),
`Wealth (M)` = round(by_region$wealth[,"mean"]),
check.names = FALSE
)
pretty_table(reg_tbl,
caption = "Table 2. Descriptives by cost-of-living region.")
| Region | Income (M) | Income (SD) | INR (M) | Hardship (M) | Area dep. (M) | Parent edu (M) | Wealth (M) |
|---|---|---|---|---|---|---|---|
| LowCOL | 40020 | 21297 | 1.34 | 2.30 | 60.5 | 11.1 | 87612 |
| MidCOL | 53651 | 30117 | 1.84 | 2.41 | 49.8 | 12.3 | 130039 |
| HighCOL | 70356 | 39054 | 2.39 | 2.55 | 42.1 | 13.2 | 135249 |
if (have_gg) {
comma_fmt <- if (requireNamespace("scales", quietly = TRUE))
scales::comma else function(x) format(x, big.mark = ",")
ggplot(dat, aes(income, fill = region)) +
geom_density(alpha = 0.45) +
scale_x_continuous(labels = comma_fmt, limits = c(0, 200000)) +
labs(x = "Household income (USD)", y = "Density", fill = "Region") +
theme_minimal(base_size = 11)
}
Figure 1. Distribution of household income by cost-of-living region. Higher-COL regions have higher nominal incomes, but as subsequent sections show, this does not translate into proportionally better outcomes.
Interpretation. Table 1 shows the sample as a whole; Table 2 disaggregates by cost-of-living region. Several features matter for the analyses that follow.

High-COL regions have substantially higher nominal incomes (mean ≈ $70,400 vs. ≈ $40,000 in low-COL regions). The federal poverty line is the same everywhere, so INR differs across regions almost in proportion to nominal income (2.39 vs. 1.34). Yet a dollar in a high-COL region buys about half as much as a dollar in a low-COL region (cost multipliers of 1.50 vs. 0.75).

Material hardship is slightly higher in high-COL regions (2.55 vs. 2.30 items) despite incomes roughly 75% higher. This illustrates the central problem: dollar income does not mean the same thing across contexts.
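The INR mechanism can be checked by hand with the simulation's own formulas for the federal poverty line and real purchasing power. The two families below are hypothetical: same nominal income, same household size, different regions.

```r
# Two hypothetical families with identical nominal income and household size,
# living in the low- and high-COL regions of the simulation.
income   <- c(low_col = 60000, high_col = 60000)
hh_size  <- c(4, 4)
col_mult <- c(0.75, 1.50)             # cost-of-living multipliers used in the simulation

fpl <- 15000 + (hh_size - 1) * 5500   # nominal poverty line: identical in every region
inr <- income / fpl
real_purchasing_power <- income / (col_mult * sqrt(hh_size))

round(rbind(inr = inr, real_purchasing_power = real_purchasing_power), 2)
```

Both families have an INR of about 1.9. Yet the low-COL family has twice the real purchasing power of the high-COL family.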
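Before drawing conclusions, it is important to verify that the simulated data behave like real data. Two checks follow:

- whether pairwise correlations among the observed measures fall within published empirical ranges;
- whether any single measure explains a dominant share of outcome variance.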
audit_df <- data.frame(
log_income = log(dat$income),
INR = dat$inr,
edu = dat$parent_edu_years,
holl = dat$hollingshead,
hardship = dat$hardship_count,
wealth_log = log(dat$wealth + 1),
area = dat$area_deprivation,
subj = dat$subjective_ses,
consumption = log(dat$consumption)
)
corr_mat <- round(cor(audit_df), 2)
pretty_table(corr_mat,
caption = "Table 3. Pairwise correlations among observed poverty/SES measures.")
| | log_income | INR | edu | holl | hardship | wealth_log | area | subj | consumption |
|---|---|---|---|---|---|---|---|---|---|
| log_income | 1.00 | 0.84 | 0.62 | 0.74 | -0.35 | 0.19 | -0.46 | 0.50 | 0.71 |
| INR | 0.84 | 1.00 | 0.52 | 0.61 | -0.33 | 0.16 | -0.39 | 0.43 | 0.70 |
| edu | 0.62 | 0.52 | 1.00 | 0.94 | -0.22 | 0.11 | -0.28 | 0.31 | 0.44 |
| holl | 0.74 | 0.61 | 0.94 | 1.00 | -0.26 | 0.14 | -0.34 | 0.36 | 0.52 |
| hardship | -0.35 | -0.33 | -0.22 | -0.26 | 1.00 | -0.09 | 0.09 | -0.39 | -0.38 |
| wealth_log | 0.19 | 0.16 | 0.11 | 0.14 | -0.09 | 1.00 | -0.08 | 0.09 | 0.26 |
| area | -0.46 | -0.39 | -0.28 | -0.34 | 0.09 | -0.08 | 1.00 | -0.16 | -0.23 |
| subj | 0.50 | 0.43 | 0.31 | 0.36 | -0.39 | 0.09 | -0.16 | 1.00 | 0.45 |
| consumption | 0.71 | 0.70 | 0.44 | 0.52 | -0.38 | 0.26 | -0.23 | 0.45 | 1.00 |
if (have_gg) {
cm <- as.data.frame(as.table(corr_mat))
names(cm) <- c("Var1", "Var2", "r")
ggplot(cm, aes(Var1, Var2, fill = r)) +
geom_tile(color = "white") +
geom_text(aes(label = sprintf("%.2f", r)), size = 3) +
scale_fill_gradient2(low = "#2b8cbe", mid = "white", high = "#e34a33",
midpoint = 0, limits = c(-1, 1)) +
labs(x = NULL, y = NULL, fill = "r") +
theme_minimal(base_size = 10) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
Figure 2. Correlation heatmap among observed poverty and SES measures. Apart from near-redundant pairs (income with INR, parental education with Hollingshead), correlations are moderate rather than dominant. This is consistent with real-world data, where different measures capture partially overlapping but distinct facets of family circumstance.
benchmarks <- data.frame(
`Measure pair` = c("Income x Hardship", "Income x Wealth",
"Income x Subjective SES", "Income x Area deprivation",
"Income x Education", "Hardship x Wealth",
"Income x Consumption"),
`Observed r` = c(corr_mat["log_income","hardship"],
corr_mat["log_income","wealth_log"],
corr_mat["log_income","subj"],
corr_mat["log_income","area"],
corr_mat["log_income","edu"],
corr_mat["hardship","wealth_log"],
corr_mat["log_income","consumption"]),
`Empirical target range` = c("-0.30 to -0.50", "+0.20 to +0.40",
"+0.30 to +0.50", "-0.30 to -0.50",
"+0.40 to +0.60", "-0.20 to -0.40",
"+0.60 to +0.80"),
Source = c("Mayer & Jencks (1989); Iceland (2005)",
"Keister (2014); SCF data",
"Adler et al. (2000); MacArthur studies",
"Kind et al. (2014); ADI literature",
"NLSY, PSID benchmarks",
"Ouellette et al. (2004)",
"Meyer & Sullivan (2003)"),
check.names = FALSE
)
pretty_table(benchmarks,
caption = "Table 4. Benchmark check: observed correlations vs. published empirical ranges.")
| Measure pair | Observed r | Empirical target range | Source |
|---|---|---|---|
| Income x Hardship | -0.35 | -0.30 to -0.50 | Mayer & Jencks (1989); Iceland (2005) |
| Income x Wealth | 0.19 | +0.20 to +0.40 | Keister (2014); SCF data |
| Income x Subjective SES | 0.50 | +0.30 to +0.50 | Adler et al. (2000); MacArthur studies |
| Income x Area deprivation | -0.46 | -0.30 to -0.50 | Kind et al. (2014); ADI literature |
| Income x Education | 0.62 | +0.40 to +0.60 | NLSY, PSID benchmarks |
| Hardship x Wealth | -0.09 | -0.20 to -0.40 | Ouellette et al. (2004) |
| Income x Consumption | 0.71 | +0.60 to +0.80 | Meyer & Sullivan (2003) |
single_r2 <- data.frame(
Measure = names(audit_df),
`R^2 (hippocampal volume)` = round(
sapply(audit_df, function(x) cor(x, dat$hippocampal_volume)^2), 3),
`R^2 (working memory)` = round(
sapply(audit_df, function(x) cor(x, dat$working_memory)^2), 3),
`R^2 (vocabulary)` = round(
sapply(audit_df, function(x) cor(x, dat$vocabulary)^2), 3),
check.names = FALSE,
row.names = NULL  # drop sapply() names so they are not printed as a duplicate column
)
pretty_table(single_r2,
caption = "Table 5. Variance explained by each measure in isolation. No measure exceeds 13% for hippocampal volume or working memory, or 23% for vocabulary, confirming no measure is a privileged oracle for the outcome.")
| Measure | R^2 (hippocampal volume) | R^2 (working memory) | R^2 (vocabulary) |
|---|---|---|---|
| log_income | 0.085 | 0.079 | 0.200 |
| INR | 0.065 | 0.068 | 0.162 |
| edu | 0.059 | 0.063 | 0.224 |
| holl | 0.069 | 0.071 | 0.230 |
| hardship | 0.093 | 0.129 | 0.225 |
| wealth_log | 0.005 | 0.005 | 0.008 |
| area | 0.011 | 0.011 | 0.026 |
| subj | 0.038 | 0.035 | 0.087 |
| consumption | 0.075 | 0.074 | 0.176 |
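Interpretation. Four of the seven benchmark pairs fall within published empirical ranges. Income x Wealth (0.19) and Income x Education (0.62) fall just outside theirs. The Hardship x Wealth correlation (-0.09) is clearly weaker than its published range, so results involving wealth should be read with that caveat.

The single-measure R² values are modest. No measure explains more than 13% of the variance in hippocampal volume or working memory, or more than 23% in vocabulary. This is what we want: the simulation is not rigged to make any particular measure the “right answer.” Any divergence in substantive conclusions across measures therefore reflects genuine construct disagreement, not a methodological artifact baked into the data.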
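The range comparison can be made explicit rather than eyeballed. The sketch below uses values transcribed from Table 4. It treats each published range as a closed interval, which is an assumption about how the ranges are meant to be read.

```r
# Observed correlations and published target ranges, transcribed from Table 4.
bench <- data.frame(
  pair     = c("Income x Hardship", "Income x Wealth", "Income x Subjective SES",
               "Income x Area deprivation", "Income x Education",
               "Hardship x Wealth", "Income x Consumption"),
  observed = c(-0.35, 0.19, 0.50, -0.46, 0.62, -0.09, 0.71),
  lower    = c(-0.50, 0.20, 0.30, -0.50, 0.40, -0.40, 0.60),
  upper    = c(-0.30, 0.40, 0.50, -0.30, 0.60, -0.20, 0.80),
  stringsAsFactors = FALSE
)

# A pair passes if the observed r lies inside its range (endpoints inclusive).
bench$within <- bench$observed >= bench$lower & bench$observed <= bench$upper
# Distance outside the range (0 when inside), to show how far each miss is.
bench$miss_by <- pmax(bench$lower - bench$observed, bench$observed - bench$upper, 0)

bench[, c("pair", "observed", "within", "miss_by")]
```

Four pairs fall inside their ranges. The Hardship x Wealth correlation misses by the widest margin, about 0.11.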
The first substantive question is whether different poverty measures identify the same families as “poor.”
prev <- data.frame(
Measure = c("Income (INR < 1)", "Low education (<12 yr)",
"Hollingshead bottom quartile", "Material hardship (≥3 items)",
"Area deprivation (≥70)", "Subjective SES (≤3)",
"Consumption bottom quartile", "Low wealth (<$5k)",
"Multidimensional (MPI ≥ 0.4)", "Supplemental (COL-adjusted)"),
`% Classified poor` = round(sapply(dat[poor_vars], mean) * 100, 1),
check.names = FALSE,
row.names = NULL  # drop poor_* names so they are not printed as an extra column
)
pretty_table(prev,
caption = "Table 6. Percentage of sample classified as poor by each measure.")
| Measure | % Classified poor |
|---|---|
| Income (INR < 1) | 24.7 |
| Low education (<12 yr) | 42.8 |
| Hollingshead bottom quartile | 24.8 |
| Material hardship (≥3 items) | 44.5 |
| Area deprivation (≥70) | 11.4 |
| Subjective SES (≤3) | 10.6 |
| Consumption bottom quartile | 25.0 |
| Low wealth (<$5k) | 46.5 |
| Multidimensional (MPI ≥ 0.4) | 74.2 |
| Supplemental (COL-adjusted) | 24.2 |
kappa_fn <- function(a, b) {
po <- mean(a == b)
pe <- mean(a) * mean(b) + (1 - mean(a)) * (1 - mean(b))
(po - pe) / (1 - pe)
}
k_mat <- outer(poor_vars, poor_vars,
Vectorize(function(i, j) kappa_fn(dat[[i]], dat[[j]])))
k_mat <- round(k_mat, 2)
dimnames(k_mat) <- list(
c("Income","Low edu","Hollingshead","Hardship","Area","Subjective",
"Consumption","Wealth","MPI","SPM"),
c("Income","Low edu","Hollingshead","Hardship","Area","Subjective",
"Consumption","Wealth","MPI","SPM"))
pretty_table(k_mat,
caption = "Table 7. Pairwise agreement between poverty classifications (Cohen's kappa).")
| | Income | Low edu | Hollingshead | Hardship | Area | Subjective | Consumption | Wealth | MPI | SPM |
|---|---|---|---|---|---|---|---|---|---|---|
| Income | 1.00 | 0.34 | 0.43 | 0.22 | 0.20 | 0.20 | 0.50 | 0.09 | 0.15 | 0.82 |
| Low edu | 0.34 | 1.00 | 0.60 | 0.14 | 0.10 | 0.09 | 0.24 | 0.07 | 0.34 | 0.31 |
| Hollingshead | 0.43 | 0.60 | 1.00 | 0.14 | 0.16 | 0.14 | 0.30 | 0.08 | 0.18 | 0.40 |
| Hardship | 0.22 | 0.14 | 0.14 | 1.00 | 0.03 | 0.14 | 0.21 | 0.08 | 0.25 | 0.25 |
| Area | 0.20 | 0.10 | 0.16 | 0.03 | 1.00 | 0.07 | 0.10 | 0.03 | 0.08 | 0.15 |
| Subjective | 0.20 | 0.09 | 0.14 | 0.14 | 0.07 | 1.00 | 0.17 | 0.05 | 0.05 | 0.21 |
| Consumption | 0.50 | 0.24 | 0.30 | 0.21 | 0.10 | 0.17 | 1.00 | 0.13 | 0.13 | 0.58 |
| Wealth | 0.09 | 0.07 | 0.08 | 0.08 | 0.03 | 0.05 | 0.13 | 1.00 | 0.34 | 0.08 |
| MPI | 0.15 | 0.34 | 0.18 | 0.25 | 0.08 | 0.05 | 0.13 | 0.34 | 1.00 | 0.15 |
| SPM | 0.82 | 0.31 | 0.40 | 0.25 | 0.15 | 0.21 | 0.58 | 0.08 | 0.15 | 1.00 |
if (have_gg) {
km <- as.data.frame(as.table(k_mat))
names(km) <- c("Var1", "Var2", "kappa")
ggplot(km, aes(Var1, Var2, fill = kappa)) +
geom_tile(color = "white") +
geom_text(aes(label = sprintf("%.2f", kappa)), size = 3) +
scale_fill_gradient(low = "white", high = "#2b8cbe",
limits = c(0, 1)) +
labs(x = NULL, y = NULL, fill = "kappa") +
theme_minimal(base_size = 10) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
}
Figure 3. Agreement heatmap. Most pairs of poverty measures agree only modestly (kappa < 0.5), meaning they identify substantially non-overlapping subpopulations of families as ‘poor.’
xt <- table(`Income-poor` = dat$poor_income,
`Hardship-poor` = dat$poor_hardship)
xt_tbl <- as.data.frame.matrix(xt)
xt_tbl <- cbind(`Income-poor` = rownames(xt_tbl), xt_tbl)
rownames(xt_tbl) <- NULL  # labels now live in the first column; avoid printing them twice
names(xt_tbl) <- c("Income poverty", "Hardship = 0", "Hardship = 1")
pretty_table(xt_tbl,
caption = "Table 8. Cross-classification of income and hardship poverty.")
| Income poverty | Hardship = 0 | Hardship = 1 |
|---|---|---|
| 0 | 1881 | 1130 |
| 1 | 341 | 648 |
Interpretation. The prevalence table shows that different measures flag different shares of the sample – from about 11% (the most restrictive) to 74% (the most inclusive). More importantly, the kappa matrix shows that pairwise agreement between measures is generally modest (most kappa values below 0.5). The income-vs-hardship cross-classification illustrates the problem concretely: 1130 families are hardship-poor but not income-poor, while 341 are income-poor but not hardship-poor. These are not interchangeable labels; studies recruiting by income are not studying the same population as studies recruiting by hardship, even in principle.
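The income-vs-hardship kappa in Table 7 can be recovered directly from the Table 8 counts. This shows how modest agreement arises even when most labels match. The sketch uses the standard 2 x 2 form of Cohen's kappa, which is equivalent to the report's `kappa_fn` applied to the raw indicators.

```r
# Cohen's kappa from a 2 x 2 cross-classification, using the counts in Table 8.
# Rows: income poverty (0/1); columns: hardship poverty (0/1).
xt_counts <- matrix(c(1881, 1130,
                      341,  648),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(income = c("0", "1"), hardship = c("0", "1")))

kappa_2x2 <- function(tab) {
  n  <- sum(tab)
  po <- sum(diag(tab)) / n                      # observed agreement
  pe <- sum(rowSums(tab) * colSums(tab)) / n^2  # agreement expected by chance
  (po - pe) / (1 - pe)
}

k <- kappa_2x2(xt_counts)
round(k, 2)
```

The two classifications agree on about 63% of families. Chance alone would produce about 53% agreement given their prevalences, so kappa is only about 0.22.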
The analyses so far used the full simulated population (N = 4,000) to keep power high for audit purposes. Real neuroimaging studies in the developmental poverty literature typically have N ≈ 100-300 – a range reflected in most of the studies included in this project’s scoping review. At those realistic sample sizes, the question “does this measure detect a poverty effect?” is non-trivial and depends on both the construct the measure captures and its statistical power.
This section asks whether, at a realistic study size, researchers using different poverty measures would reach the same qualitative conclusion. We repeat the analysis on many random subsamples mirroring typical study N, focusing on the two properties that drive narrative conclusions in the literature:

- the sign of the estimated effect;
- whether that effect reaches statistical significance.
A third property – magnitude – is shown in the forest plots below but is secondary for the “same conclusion” question: studies typically report “we found a significant effect” rather than “our effect was within 15% of another study’s effect.” Sign and significance are what make it into abstracts.
# Realistic study size: matches Noble et al. 2007 (N=150), Lawson et al. 2013
# (N=283), and many other studies in the scoping review.
N_study <- 150
# To get a stable read on each measure's detection rate (not one lucky or
# unlucky draw), we run the analysis over many random subsamples of size
# N_study and record how often each measure detects a significant effect in
# the expected direction. The effect estimates shown in the forest plots are
# the median and the 2.5th/97.5th percentiles across subsamples.
set.seed(20260421)
n_reps <- 200
measure_list <- list(
list(name = "Low income (continuous)",
type = "continuous",
extract = function(d) -log(d$income),
captures = "Resources"),
list(name = "Low INR (continuous)",
type = "continuous",
extract = function(d) -d$inr,
captures = "Resources"),
list(name = "Income poverty (INR<1)",
type = "binary",
extract = function(d) d$poor_income,
captures = "Resources"),
list(name = "COL-adjusted poor (SPM)",
type = "binary",
extract = function(d) d$poor_spm,
captures = "Resources (COL-adjusted)"),
list(name = "Low consumption",
type = "binary",
extract = function(d) d$poor_consume,
captures = "Resources (spending)"),
list(name = "Low wealth",
type = "binary",
extract = function(d) d$poor_wealth,
captures = "Resources (buffer)"),
list(name = "Material hardship",
type = "binary",
extract = function(d) d$poor_hardship,
captures = "Resources + stress"),
list(name = "Low Hollingshead",
type = "binary",
extract = function(d) d$poor_holl,
captures = "Education + occupation"),
list(name = "Low parental education",
type = "binary",
extract = function(d) d$poor_lowedu,
captures = "Cognitive stimulation"),
list(name = "High area deprivation",
type = "binary",
extract = function(d) d$poor_area,
captures = "Neighborhood context"),
list(name = "Low subjective SES",
type = "binary",
extract = function(d) d$poor_subj,
captures = "Relative position / stress"),
list(name = "Multidimensional poor",
type = "binary",
extract = function(d) d$poor_mpi,
captures = "Mixed (5 dimensions)")
)
outcomes <- c("hippocampal_volume", "working_memory", "vocabulary")
outcome_labels <- c(hippocampal_volume = "Hippocampus (mm^3)",
working_memory = "Working memory",
vocabulary = "Vocabulary")
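fit_one <- function(sub, y, x) {
# Regress the outcome on the poverty measure, adjusting for household size
fd <- data.frame(outcome = sub[[y]], x = x, hh_size = sub$hh_size)
co <- coef(summary(lm(outcome ~ x + hh_size, data = fd)))
# An aliased x is dropped from the coefficient table, so index by name,
# not position, to avoid returning the hh_size row by mistake
if (!"x" %in% rownames(co)) return(c(NA_real_, NA_real_, NA_real_))
co["x", c("Estimate", "Std. Error", "Pr(>|t|)")]
}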
# Run subsample study
det_counts <- array(0, dim = c(length(measure_list), length(outcomes)))
rev_counts <- array(0, dim = c(length(measure_list), length(outcomes)))
all_ests <- vector("list", length(measure_list) * length(outcomes))
idx_matrix <- array(seq_len(length(measure_list) * length(outcomes)),
dim = c(length(measure_list), length(outcomes)))
for (rep in seq_len(n_reps)) {
sub_ids <- sample(nrow(dat), N_study)
sub <- dat[sub_ids, ]
for (i in seq_along(measure_list)) {
x <- measure_list[[i]]$extract(sub)
if (sd(x) == 0) next
for (j in seq_along(outcomes)) {
fit <- fit_one(sub, outcomes[j], x)
if (any(is.na(fit))) next
est <- fit[1]; p <- fit[3]
if (p < 0.05 && est < 0) det_counts[i, j] <- det_counts[i, j] + 1
if (p < 0.05 && est > 0) rev_counts[i, j] <- rev_counts[i, j] + 1
k <- idx_matrix[i, j]
all_ests[[k]] <- c(all_ests[[k]], est)
}
}
}
det_rate <- det_counts / n_reps
rev_rate <- rev_counts / n_reps
# Summary table
summary_df <- data.frame()
for (i in seq_along(measure_list)) {
for (j in seq_along(outcomes)) {
ests <- all_ests[[idx_matrix[i, j]]]
summary_df <- rbind(summary_df, data.frame(
Measure = measure_list[[i]]$name,
Captures = measure_list[[i]]$captures,
Outcome = outcomes[j],
median_est = median(ests, na.rm = TRUE),
q025 = quantile(ests, 0.025, na.rm = TRUE),
q975 = quantile(ests, 0.975, na.rm = TRUE),
det_rate = det_rate[i, j],
rev_rate = rev_rate[i, j]))
}
}
rownames(summary_df) <- NULL
The decision matrix below reports, for each measure × outcome combination, the proportion of subsamples (out of 200) in which that measure detected a significant effect in the expected direction. This is effectively the statistical power each measure has to recover the true effect at a typical developmental-study sample size.
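A rough analytic approximation links the variance-explained figures in Table 5 to these detection rates. The sketch assumes a simple bivariate correlation tested two-sided at alpha = .05 via the Fisher z-transformation. The report's simulation instead regresses each outcome on each measure while adjusting for household size, and mostly uses binary cut-offs, so treat this only as a guide.

```r
# Approximate probability of a significant correlation in the expected direction,
# using the Fisher z approximation: atanh(r_hat) ~ Normal(atanh(r), 1 / (n - 3)).
power_r <- function(r, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(atanh(abs(r)) * sqrt(n - 3) - z_crit)
}

# Correlations implied by a few single-measure R^2 values for hippocampal volume
# (Table 5): log income, area deprivation, log wealth.
r2 <- c(log_income = 0.085, area = 0.011, wealth_log = 0.005)
round(power_r(sqrt(r2), n = 150), 2)
```

At N = 150 these approximations come out at roughly 0.95, 0.25, and 0.14. They should not be expected to match Table 9 exactly. Covariate adjustment changes the picture somewhat, and binary cut-offs typically lose further power.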
A detection rate close to 100% means a researcher using this measure will almost always report a “significant poverty effect.” A rate near chance means the effect is essentially invisible through that measure’s lens at this N. Chance here is about 2.5%, because only significant results in the expected direction of a two-sided test are counted. Intermediate rates mean that whether the effect is detected depends on which families happen to end up in the study, an uncomfortable fact about underpowered research.
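The chance floor for a directional detection rate can be checked by simulation. The sketch below uses hypothetical, independent x and y at the report's study size of 150.

```r
# Monte Carlo check of the chance floor: with no true effect, how often does a
# two-sided test at p < .05 yield a significant estimate in the expected
# (negative) direction? Hypothetical data; N matches the report's N_study = 150.
set.seed(42)
n_study <- 150
n_sims  <- 2000

hits <- replicate(n_sims, {
  x <- rnorm(n_study)
  y <- rnorm(n_study)                      # y is independent of x: the null is true
  co <- coef(summary(lm(y ~ x)))
  co["x", "Pr(>|t|)"] < 0.05 && co["x", "Estimate"] < 0
})

mean(hits)  # expected to land near 0.025
```

Roughly 5% of null results are significant, and they split about evenly between directions. A measure with no real link to the outcome would therefore show a detection rate of about 2.5% in Table 9.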
# Wide table: one row per measure, one column per outcome
dm_table <- data.frame(
Measure = vapply(measure_list, `[[`, character(1), "name"),
Captures = vapply(measure_list, `[[`, character(1), "captures"),
Hippocampus = sprintf("%d%%", round(det_rate[, 1] * 100)),
`Working memory` = sprintf("%d%%", round(det_rate[, 2] * 100)),
Vocabulary = sprintf("%d%%", round(det_rate[, 3] * 100)),
check.names = FALSE
)
pretty_table(
dm_table,
caption = paste0("Table 9. Decision matrix: detection rate for each measure ",
"x outcome combination, across ", n_reps, " random subsamples ",
"of N = ", N_study, ". Values show the percentage of ",
"subsamples in which a researcher using that measure would ",
"detect a significant effect in the expected direction (p < .05)."),
full_width = TRUE)
| Measure | Captures | Hippocampus | Working memory | Vocabulary |
|---|---|---|---|---|
| Low income (continuous) | Resources | 98% | 94% | 100% |
| Low INR (continuous) | Resources | 94% | 86% | 100% |
| Income poverty (INR<1) | Resources | 78% | 76% | 100% |
| COL-adjusted poor (SPM) | Resources (COL-adjusted) | 78% | 80% | 100% |
| Low consumption | Resources (spending) | 68% | 64% | 96% |
| Low wealth | Resources (buffer) | 13% | 14% | 18% |
| Material hardship | Resources + stress | 90% | 96% | 100% |
| Low Hollingshead | Education + occupation | 68% | 64% | 100% |
| Low parental education | Cognitive stimulation | 70% | 69% | 100% |
| High area deprivation | Neighborhood context | 16% | 10% | 22% |
| Low subjective SES | Relative position / stress | 27% | 26% | 64% |
| Multidimensional poor | Mixed (5 dimensions) | 74% | 76% | 98% |
if (have_gg) {
measure_order <- vapply(measure_list, `[[`, character(1), "name")
cap_map <- setNames(vapply(measure_list, `[[`, character(1), "captures"),
measure_order)
# Combined row label: measure name plus what it captures
row_labels <- setNames(
sprintf("%s\n(captures: %s)", measure_order, cap_map[measure_order]),
measure_order)
dmp <- expand.grid(Measure = measure_order,
Outcome = outcomes, stringsAsFactors = FALSE)
dmp$rate <- as.vector(det_rate)
dmp$label <- sprintf("%d%%", round(dmp$rate * 100))
dmp$Outcome <- factor(dmp$Outcome, levels = outcomes,
labels = outcome_labels[outcomes])
dmp$MeasureLab <- factor(row_labels[dmp$Measure],
levels = rev(row_labels[measure_order]))
ggplot(dmp, aes(Outcome, MeasureLab, fill = rate)) +
geom_tile(color = "white", linewidth = 1) +
geom_text(aes(label = label), size = 4, color = "black", fontface = "bold") +
scale_fill_gradient2(low = "#f5f5f5", mid = "#fff3bf", high = "#2b8a3e",
midpoint = 0.5, limits = c(0, 1),
labels = scales::percent_format(accuracy = 1),
name = "Detection\nrate") +
labs(x = NULL, y = NULL,
title = "Which measures detect a poverty effect?",
subtitle = paste0("Detection rate across ", n_reps,
" random subsamples of N = ", N_study)) +
theme_minimal(base_size = 10) +
theme(panel.grid = element_blank(),
legend.position = "right",
axis.text.x = element_text(face = "bold", size = 11),
axis.text.y = element_text(size = 9, lineheight = 0.85))
}
Figure 4. Decision matrix visualized. Each cell shows the percentage of N=150 subsamples in which the measure (row) detects a significant effect on the outcome (column) in the expected direction. Darker green = near-universal detection; yellow = detection is unreliable at this N; pale = effect essentially invisible through that measure’s lens. Each measure’s row label includes the construct it primarily captures (in parentheses), which explains the pattern of agreements and disagreements.
The decision matrix makes agreement and disagreement visible at a glance. Three patterns stand out:
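Green rows: measures that consistently detect the effect. Material hardship, continuous income, and continuous INR detect the effect in at least 86% of subsamples for every outcome. A researcher using any of these would almost always report a significant poverty effect in a typical-sized study, and the measures agree. A second tier consists of the binary income, SPM, and consumption cut-offs, Hollingshead, parental education, and the multidimensional index. These detect the vocabulary effect almost universally (96-100%) but the hippocampal and working-memory effects in only 64-80% of subsamples.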
Pale/yellow rows: measures that detect weakly or inconsistently. Wealth (13-18%) and area deprivation (10-22%) seldom detect the effect for any outcome. Subjective SES detects it in only about a quarter of subsamples for hippocampal volume and working memory (64% for vocabulary). Wherever detection rates sit well below 100%, a researcher’s conclusion depends substantially on which families happen to be in the sample. Two studies using the same measure could reach contradictory conclusions (effect vs. no effect) purely by chance. This is one way measurement choice contributes directly to the replicability problem.
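Different patterns across outcomes. Look across rows rather than down columns: every measure detects vocabulary effects at least as reliably as hippocampal or working-memory effects. Part of the reason is that vocabulary is the outcome most tightly linked to family circumstance in this simulation. Its noise is smaller relative to its loading on true wellbeing, and it receives an additional direct parental-education term.

The gap for parental education and Hollingshead (roughly 70% vs. 100%) is consistent with the direct cognitive-stimulation pathway to language built into the simulation. The small vocabulary advantage for area deprivation (22% vs. 16% for hippocampus) should not be read as a neighborhood-specific language pathway. In the data-generating process, neighborhood history affects only the area index, not the outcomes.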
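Table 9 shows continuous income and continuous INR outperforming the INR < 1 cut-off for hippocampal volume and working memory. The sketch below illustrates the general mechanism under a hypothetical setup rather than the report's data: a continuous poverty-severity score with a true correlation of -0.3 with the outcome, at N = 150. It compares that score with the same score collapsed to a top-quartile “poor” indicator.

```r
# Hypothetical setup (not the report's data): x is a continuous poverty-severity
# score, and the outcome y falls as x rises (true correlation -0.3).
set.seed(7)
n_study <- 150
n_sims  <- 500
rho     <- -0.3

# TRUE if the predictor's slope is significant (p < .05) and negative,
# i.e. a detection "in the expected direction"
detect <- function(pred, y) {
  co <- coef(summary(lm(y ~ pred)))
  co["pred", "Pr(>|t|)"] < 0.05 && co["pred", "Estimate"] < 0
}

res <- replicate(n_sims, {
  x <- rnorm(n_study)
  y <- rho * x + rnorm(n_study, sd = sqrt(1 - rho^2))
  poor <- as.integer(x > quantile(x, 0.75))  # "poor" = top quartile of severity
  c(continuous = detect(x, y), binary = detect(poor, y))
})

rates <- rowMeans(res)  # detection rate for each version of the predictor
rates
```

The binary indicator discards variation within each group, which attenuates its correlation with the outcome. As a result, it detects the same underlying effect noticeably less often than the continuous score.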
The Captures column of the decision matrix (shown in parentheses in the Figure 4 row labels) tags each measure with what it primarily captures. This is the “why” behind the agree/disagree patterns. Table 10 unpacks each measure.
profile_tbl <- data.frame(
Measure = vapply(measure_list, `[[`, character(1), "name"),
`What it captures` = vapply(measure_list, `[[`, character(1), "captures"),
`Why it agrees or diverges` = c(
# Low income (continuous)
"Strong signal of material resources, with high statistical power because it uses the full continuous distribution. Misses chronic stress and neighborhood context. Context-blind (ignores cost of living).",
# Low INR (continuous)
"Inherits income's properties; adjusts for family size but not for local cost of living. Slightly lower power than log income, plausibly because INR enters the model on its raw, right-skewed scale rather than a log scale.",
# Income poverty (INR<1)
"Dichotomizes INR at a fixed threshold, losing information. Power drops noticeably at N = 150 relative to continuous INR.",
# SPM
"Adjusts the poverty threshold for local cost of living. Closer to real purchasing power; detects effects that raw income misses in high-COL regions.",
# Low consumption
"Direct welfare proxy (per Deaton, 2003). Closer to material lived experience than income. Captures what families actually have access to.",
# Low wealth
"Captures buffering capacity for shocks. Weakly correlated with income (r ~ 0.2-0.3), so it reflects variance other measures miss; the tradeoff is lower power in smaller samples.",
# Material hardship
"Direct measure of resource inadequacy (food, rent, medical care). Captures both low resources and high stress (a dual pathway), which is why it detects outcomes reliably.",
# Low Hollingshead
"Composite of education and occupation prestige, weighted heavily toward education. Reflects human capital / cognitive stimulation more than current material hardship.",
# Low parental education
"Primarily captures cognitive stimulation at home. Strong for language outcomes because vocabulary depends heavily on exposure; weaker for hippocampal outcomes where stress matters more.",
# Area deprivation
"Neighborhood-level index. Reflects context that income misses (segregation, disinvestment), but is measured at a spatial scale that is only a noisy proxy for any one family's circumstances, lowering power.",
# Low subjective SES
"Perceived social standing relative to local peers. Tracks psychological stress and reference-group comparison, not material resources per se. Weakest for structural outcomes.",
# Multidimensional poor
"Alkire-Foster composite across five dimensions. Captures overlapping deprivations any single measure would miss. Moderate detection despite construct breadth because dichotomization costs power."),
check.names = FALSE
)
pretty_table(profile_tbl,
caption = "Table 10. What each measure captures and why it agrees or disagrees with others.",
full_width = TRUE)
| Measure | What it captures | Why it agrees or diverges |
|---|---|---|
| Low income (continuous) | Resources | Strong signal of material resources, with high statistical power because it uses the full continuous distribution. Misses chronic stress and neighborhood context. Context-blind (ignores cost of living). |
| Low INR (continuous) | Resources | Inherits income’s properties; adjusts for family size but not for local cost of living. Slightly lower power than raw income because INR compresses the distribution. |
| Income poverty (INR<1) | Resources | Dichotomizes INR at a fixed threshold, losing information. Power drops noticeably at N = 150 relative to continuous INR. |
| COL-adjusted poor (SPM) | Resources (COL-adjusted) | Adjusts the poverty threshold for local cost of living. Closer to real purchasing power; detects effects that raw income misses in high-COL regions. |
| Low consumption | Resources (spending) | Direct welfare proxy (per Deaton, 2003). Closer to material lived experience than income. Captures what families actually have access to. |
| Low wealth | Resources (buffer) | Captures buffering capacity for shocks. Weakly correlated with income (r ~ 0.2-0.3), so it reflects variance other measures miss; the tradeoff is lower power in smaller samples. |
| Material hardship | Resources + stress | Direct measure of resource inadequacy (food, rent, medical care). Captures both low resources and high stress (a dual pathway), which is why it detects outcomes reliably. |
| Low Hollingshead | Education + occupation | Composite of education and occupation prestige, weighted heavily toward education. Reflects human capital / cognitive stimulation more than current material hardship. |
| Low parental education | Cognitive stimulation | Primarily captures cognitive stimulation at home. Strong for language outcomes because vocabulary depends heavily on exposure; weaker for hippocampal outcomes where stress matters more. |
| High area deprivation | Neighborhood context | Neighborhood-level index. Reflects context that income misses (segregation, disinvestment), but is measured at a spatial scale that is only a noisy proxy for any one family’s circumstances, lowering power. |
| Low subjective SES | Relative position / stress | Perceived social standing relative to local peers. Tracks psychological stress and reference-group comparison, not material resources per se. Weakest for structural outcomes. |
| Multidimensional poor | Mixed (5 dimensions) | Alkire-Foster composite across five dimensions. Captures overlapping deprivations any single measure would miss. Moderate detection despite construct breadth because dichotomization costs power. |
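Table 10 attributes part of the power loss for income poverty (INR<1) and the multidimensional index to dichotomization. A toy simulation can isolate that mechanism. This is a minimal sketch under assumed values: a single standard-normal disadvantage score, a true slope of -0.25, a cutoff that flags the most disadvantaged 20% as "poor", and N = 150. The function name `power_compare` is hypothetical, and none of these settings come from the report's simulation.

```r
# Toy comparison of statistical power: a continuous predictor vs. a
# dichotomized version of the same predictor. All settings are hypothetical.
power_compare <- function(n = 150, beta = -0.25, prop_poor = 0.20,
                          reps = 1000, alpha = 0.05) {
  # p-value of the slope in a simple linear regression
  slope_p <- function(y, x) summary(lm(y ~ x))$coefficients[2, 4]
  hits <- replicate(reps, {
    disadvantage <- rnorm(n)                  # continuous measure
    y <- beta * disadvantage + rnorm(n)       # outcome worsens with disadvantage
    # Dichotomize: flag the top prop_poor share as "poor" (1) vs. not (0)
    poor <- as.numeric(disadvantage > quantile(disadvantage, 1 - prop_poor))
    c(continuous   = slope_p(y, disadvantage) < alpha,
      dichotomized = slope_p(y, poor) < alpha)
  })
  rowMeans(hits)   # share of simulated studies with a significant slope
}

set.seed(3)
pw <- power_compare()
pw
```

In this toy setup the continuous version should detect the effect noticeably more often than the dichotomized one; the size of the gap depends on the assumed cutoff and effect size.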
The decision matrix shows whether each measure detects an effect. The annotated forest plot below shows how large the estimated effect is across subsamples, so readers can see when measures agree on direction but disagree on magnitude.
All effects are oriented the same way: negative values mean more disadvantage predicts worse outcomes. Continuous income and INR have been sign-flipped (to “low income” and “low INR”) so every measure’s coefficient reads the same way. Right-side annotations tag each measure with the construct it captures.
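Sign-flipping changes only the direction in which a coefficient is read, not the evidence it carries. A minimal sketch with hypothetical simulated incomes and outcome (none of these values come from the report) shows that regressing on `low_income <- -income` negates the slope while leaving its p-value unchanged.

```r
# Hypothetical data: sign-flipping a predictor negates its slope only
set.seed(4)
n <- 200
income <- rlnorm(n, meanlog = log(50000), sdlog = 0.5)   # hypothetical incomes
vocab  <- 1e-5 * income + rnorm(n)                        # hypothetical outcome
low_income <- -income                                     # "low income" orientation

fit_income <- summary(lm(vocab ~ income))$coefficients
fit_low    <- summary(lm(vocab ~ low_income))$coefficients

# The slope flips sign; the p-value is identical
c(income     = fit_income["income", "Estimate"],
  low_income = fit_low["low_income", "Estimate"])
all.equal(fit_low["low_income", "Estimate"], -fit_income["income", "Estimate"])  # TRUE
all.equal(fit_low["low_income", "Pr(>|t|)"], fit_income["income", "Pr(>|t|)"])  # TRUE
```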
if (have_gg) {
fp <- summary_df
fp$Outcome <- factor(fp$Outcome, levels = outcomes,
labels = outcome_labels[outcomes])
measure_order <- vapply(measure_list, `[[`, character(1), "name")
cap_map <- setNames(vapply(measure_list, `[[`, character(1), "captures"),
measure_order)
row_labels <- setNames(
sprintf("%s\n(captures: %s)", measure_order, cap_map[measure_order]),
measure_order)
fp$MeasureLab <- factor(row_labels[fp$Measure],
levels = rev(row_labels[measure_order]))
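# Bin detection rates to match the caption (<25%, 25-75%, >=75%);
# right = FALSE closes each interval on the left, so exactly 0.75 counts as "Reliably"
fp$status <- cut(fp$det_rate, c(-Inf, .25, .75, Inf), right = FALSE,
labels = c("Rarely detected","Sometimes detected","Reliably detected"))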
ggplot(fp, aes(median_est, MeasureLab, color = status)) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey40") +
geom_errorbarh(aes(xmin = q025, xmax = q975),
height = 0.3, linewidth = 0.7) +
geom_point(size = 2.8) +
scale_color_manual(values = c("Reliably detected" = "#2b8a3e",
"Sometimes detected" = "#d9a800",
"Rarely detected" = "#868e96"),
name = "Detection across subsamples") +
facet_wrap(~ Outcome, scales = "free_x") +
labs(x = "Estimated effect of MORE DISADVANTAGE (median across subsamples)",
y = NULL,
title = "Do measures reach the same conclusion?",
subtitle = paste0("Each row = one measure; error bars = 2.5th-97.5th percentiles across ",
n_reps, " subsamples of N = ", N_study,
". Negative effect = expected direction.")) +
theme_minimal(base_size = 9) +
theme(legend.position = "bottom",
strip.text = element_text(face = "bold"),
panel.spacing.x = unit(1.2, "lines"),
axis.text.y = element_text(size = 8, lineheight = 0.85))
}
Figure 5. Forest plot of estimated effects across all three outcomes. Each point is the median coefficient across 200 random subsamples of N = 150; error bars show the 2.5th-97.5th percentile range (i.e., how much a researcher’s estimate might vary by luck of sampling). All effects are oriented so that negative values mean ‘more disadvantage -> worse outcome.’ Green = high detection rate (>=75% of subsamples); yellow = mid detection rate (25-75%); grey = low detection rate (<25%). Row labels include the construct each measure primarily captures.
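The quantities plotted in Figure 5 (median estimate, 2.5th and 97.5th percentiles, detection rate) can be computed from repeated subsamples roughly as follows. This is a hedged sketch, not the report's own code: the helper name `summarise_subsamples`, the toy population, and the rule that a "detection" means p < 0.05 with a negative estimate are all assumptions; only the output column names mirror those used in `summary_df` above.

```r
# Hypothetical helper: draw n_reps study-sized subsamples from a population,
# fit outcome ~ measure in each, and summarise the slope estimates.
summarise_subsamples <- function(pop, measure, outcome,
                                 n_reps = 200, N_study = 150, alpha = 0.05) {
  fits <- t(replicate(n_reps, {
    idx <- sample(nrow(pop), N_study)                 # one "study" sample
    fit <- lm(pop[[outcome]][idx] ~ pop[[measure]][idx])
    summary(fit)$coefficients[2, c(1, 4)]             # estimate, p-value
  }))
  est <- fits[, 1]
  data.frame(
    median_est = median(est),
    q025       = unname(quantile(est, 0.025)),
    q975       = unname(quantile(est, 0.975)),
    # Assumed detection rule: significant AND in the expected (negative) direction
    det_rate   = mean(fits[, 2] < alpha & est < 0)
  )
}

# Toy population (hypothetical): hardship lowers vocabulary
set.seed(5)
pop <- data.frame(hardship = rnorm(4000))
pop$vocab <- -0.3 * pop$hardship + rnorm(4000)
res <- summarise_subsamples(pop, "hardship", "vocab")
res
```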
Putting the decision matrix, the forest plot, and the capture profile together, the answer to “do different measures reach the same conclusion?” is: sometimes.
Measures that capture overlapping constructs (income, INR, consumption, SPM, hardship) tend to agree on both direction and detection, because they are proxies for related pieces of family circumstance. Measures that capture narrower or tangential constructs (area deprivation, wealth, subjective SES) agree on direction but often fail to detect the effect at realistic sample sizes. Measures whose primary constructs differ (parental education for stimulation vs. hardship for stress) produce different effect-size rankings across outcomes, because the outcomes themselves depend on different underlying pathways.
The operational implication for reading the developmental poverty-neuroscience literature is that two studies both reporting “a significant poverty effect” may be detecting different things through different measurement windows – and two studies failing to find an effect may simply be using measures with insufficient power for typical sample sizes. Neither situation is visible from the summary-level claims that make it into abstracts and reviews.
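How large an association a typical-sized study can reliably detect follows from a standard power approximation. Using the Fisher z transformation, the sample size needed to detect a correlation r with a two-sided test at level alpha and power 1 - beta is approximately N = ((z[1 - alpha/2] + z[1 - beta]) / atanh(r))^2 + 3. A minimal sketch; the function name `n_required` and the example correlations are illustrative, not estimates from this report.

```r
# Approximate N needed to detect a correlation r (two-sided test at level
# alpha) with the given power, via the Fisher z approximation.
n_required <- function(r, alpha = 0.05, power = 0.80) {
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta  <- qnorm(power)
  ceiling(((z_alpha + z_beta) / atanh(r))^2 + 3)
}

# Illustrative correlations, not values estimated in this report
n_required(c(0.30, 0.20, 0.10))   # 85 194 783
```

By the same approximation, a study of N = 150 has 80% power only for correlations of roughly |r| >= 0.23, so a measure whose association with an outcome is weaker than that will often return a null result at that size.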