1 Overview and framing

Developmental cognitive neuroscience has accumulated a large literature linking childhood poverty to brain structure and function. Studies use a wide array of operationalizations – household income, income-to-needs ratio (INR), Hollingshead four-factor composites, parental education, area deprivation indices, material hardship inventories, subjective SES, and multidimensional poverty indices – and then draw conclusions about “poverty effects on the developing brain.”

A core methodological concern motivating this work is theory-measurement alignment: do these measures identify the same underlying construct, the same families, and support the same substantive conclusions? This report uses simulated data with a known, multidimensional ground truth to show what happens when investigators apply different poverty measures to the same population.

Design principle. The simulation is constructed so that no measure is a privileged oracle. Each captures a partial, noisy window onto family circumstance, and correlations between measures are calibrated against empirical benchmarks from the poverty measurement literature (Section 4 audits how closely they match). Divergence across measures is therefore intended to reflect genuine construct disagreement, not an engineered result.

The three questions the report addresses:

Do different poverty measures identify the same families as “poor”?
Do they yield the same estimated effect on brain and cognitive outcomes?
What can conventional income-based analyses miss that multidimensional measures recover?

2 Data-generating process

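# z() is called throughout this chunk but is not defined in this excerpt;
# a plain z-score is assumed here. pretty_table() and have_gg are likewise
# expected to come from a setup chunk that is not shown.
z <- function(x) as.numeric(scale(x))

set.seed(20260421)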
N <- 4000

# --- Context: cost-of-living region ---
region <- sample(1:3, N, replace = TRUE, prob = c(0.35, 0.40, 0.25))
region_f <- factor(region, levels = 1:3,
                   labels = c("LowCOL", "MidCOL", "HighCOL"))
col_multiplier <- c(0.75, 1.00, 1.50)[region]

# --- Observable economic variables ---
income_mu <- c(42000, 58000, 75000)[region]
income <- pmax(8000, round(rlnorm(N, meanlog = log(income_mu) - 0.18,
                                   sdlog = 0.50)))
hh_size <- sample(2:7, N, replace = TRUE,
                  prob = c(0.15, 0.28, 0.25, 0.18, 0.09, 0.05))
fpl <- 15000 + (hh_size - 1) * 5500
inr <- income / fpl
real_purchasing_power <- income / (col_multiplier * sqrt(hh_size))

# --- Unobserved family variation ---
unobserved_support <- rnorm(N, 0, 1)
health_shocks      <- rnorm(N, 0, 1)
discrimination     <- rnorm(N, 0, 1)
neighborhood_hist  <- rnorm(N, 0, 1)

# --- Three latent constructs jointly driving outcomes ---
material_resources <- 0.55 * z(log(real_purchasing_power)) +
                      0.20 * unobserved_support +
                      rnorm(N, 0, 0.6)
chronic_stress     <- -0.35 * material_resources +
                       0.30 * health_shocks +
                       0.25 * discrimination +
                       rnorm(N, 0, 0.7)
social_adversity   <- -0.25 * material_resources +
                       0.35 * discrimination +
                       rnorm(N, 0, 0.8)

# --- Observed poverty / SES measures ---
parent_edu_years <- pmin(20, pmax(6,
  round(12 + 2.0 * z(log(income)) + 2.5 * rnorm(N, 0, 1))))
occ_prestige <- pmin(9, pmax(1,
  round(5 + 1.2 * z(log(income)) + 1.5 * z(parent_edu_years) +
        rnorm(N, 0, 1.2))))
hollingshead <- 3 * (parent_edu_years - 6) + 5 * occ_prestige

wealth_latent <- 0.30 * z(log(income)) + 0.15 * unobserved_support +
                 rnorm(N, 0, 1.3)
wealth <- pmax(0, round(exp(10 + 1.4 * wealth_latent) -
                        exp(10) * runif(N, 0.3, 1.0)))

hardship_risk <- plogis(-0.5 - 1.1 * material_resources +
                         0.4 * chronic_stress + rnorm(N, 0, 0.5))
hardship_count <- rbinom(N, size = 6, prob = hardship_risk)

area_deprivation <- pmax(0, pmin(100,
  50 - 5 * z(log(income)) + c(+8, 0, -5)[region] +
  10 * neighborhood_hist + rnorm(N, 0, 8)))

consumption <- real_purchasing_power * runif(N, 0.55, 0.95) +
               0.15 * wealth / 10 + rnorm(N, 0, 3500)
consumption <- pmax(5000, consumption)

region_median_income <- ave(income, region, FUN = median)
relative_income <- log(income) - log(region_median_income)
subjective_ses <- pmin(10, pmax(1,
  round(5.5 + 1.5 * relative_income - 0.3 * z(hardship_count) +
        rnorm(N, 0, 1.3))))

rooms <- pmax(1, round(hh_size / runif(N, 0.8, 2.2)))
crowding <- hh_size / rooms
mpi_indicators <- cbind(
  edu_dep      = as.integer(parent_edu_years < 12),
  hardship_dep = as.integer(hardship_count >= 2),
  crowd_dep    = as.integer(crowding >= 1.5),
  wealth_dep   = as.integer(wealth < 5000),
  area_dep     = as.integer(area_deprivation > 65)
)
mpi_score <- rowMeans(mpi_indicators)
mpi_poor  <- as.integer(mpi_score >= 0.4)
spm_threshold <- fpl * c(0.85, 1.00, 1.35)[region]
spm_poor <- as.integer(income < spm_threshold)

# --- True outcomes driven by all three latent constructs ---
true_wellbeing <- 0.50 * material_resources -
                  0.35 * chronic_stress -
                  0.20 * social_adversity +
                  0.15 * z(parent_edu_years)
true_wellbeing <- z(true_wellbeing)

hippocampal_volume <- 3800 + 120 * true_wellbeing + rnorm(N, 0, 200)
working_memory     <- 100  + 8   * true_wellbeing + rnorm(N, 0, 13)
vocabulary         <- 100  + 11  * true_wellbeing +
                              3 * z(parent_edu_years) + rnorm(N, 0, 12)

# --- Assemble dataset ---
dat <- data.frame(
  id = 1:N, region = region_f,
  hh_size, income, inr, fpl, real_purchasing_power,
  parent_edu_years, occ_prestige, hollingshead,
  wealth, hardship_count, consumption, area_deprivation, subjective_ses,
  crowding, mpi_score, mpi_poor, spm_poor,
  true_wellbeing, hippocampal_volume, working_memory, vocabulary
)

# --- Binary poverty classifications ---
dat$poor_income   <- as.integer(dat$inr < 1.0)
dat$poor_lowedu   <- as.integer(dat$parent_edu_years < 12)
dat$poor_holl     <- as.integer(dat$hollingshead <
                                quantile(dat$hollingshead, 0.25))
dat$poor_hardship <- as.integer(dat$hardship_count >= 3)
dat$poor_area     <- as.integer(dat$area_deprivation >= 70)
dat$poor_subj     <- as.integer(dat$subjective_ses <= 3)
dat$poor_consume  <- as.integer(dat$consumption <
                                quantile(dat$consumption, 0.25))
dat$poor_wealth   <- as.integer(dat$wealth < 5000)
dat$poor_mpi      <- dat$mpi_poor
dat$poor_spm      <- dat$spm_poor

poor_vars <- c("poor_income","poor_lowedu","poor_holl","poor_hardship",
               "poor_area","poor_subj","poor_consume","poor_wealth",
               "poor_mpi","poor_spm")

The simulated population consists of N = 4000 families whose economic and social reality is shaped by three unobserved latent constructs: material resources, chronic stress, and social adversity. Brain and cognitive outcomes depend on all three, plus a direct contribution from parental education (cognitive stimulation).

The ten observed SES/poverty measures – income, INR, parental education, Hollingshead, wealth, material hardship, area deprivation, subjective SES, consumption, and an Alkire-Foster multidimensional poverty index – each capture a partial, noisy window onto these latent realities. Additional unobserved factors (health shocks, discrimination, kin support networks, neighborhood history) influence both family circumstance and outcomes but are not observed by any single measure.
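To make the counting logic behind the multidimensional index concrete, here is a minimal sketch for a single family. The family's values and the `example_` object names are hypothetical; the five cut-offs and the 0.4 poverty cutoff are copied from the `mpi_indicators` code above, with equal weights as in that code.

```r
# Hypothetical family; values are illustrative, not drawn from the simulation.
example_family <- list(parent_edu_years = 11, hardship_count = 1,
                       crowding = 1.2, wealth = 3000, area_deprivation = 50)

# Same five deprivation cut-offs as the mpi_indicators block.
example_indicators <- c(
  edu_dep      = example_family$parent_edu_years < 12,
  hardship_dep = example_family$hardship_count >= 2,
  crowd_dep    = example_family$crowding >= 1.5,
  wealth_dep   = example_family$wealth < 5000,
  area_dep     = example_family$area_deprivation > 65
)

# Equal-weight deprivation score: share of indicators on which the family is deprived.
example_mpi_score <- mean(example_indicators)

# With five indicators, a cutoff of 0.4 means "deprived on at least two of five".
example_mpi_poor <- example_mpi_score >= 0.4

print(example_indicators)
print(c(score = example_mpi_score, poor = example_mpi_poor))
```

This family is deprived on education and wealth only, which is already enough to cross the cutoff. That low bar is worth keeping in mind when reading the prevalence figures for the multidimensional measure.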

3 Sample demographics

demo_tbl <- data.frame(
  Variable = c("Household income (USD)", "Income-to-needs ratio",
               "Household size", "Parental education (years)",
               "Occupational prestige (1-9)", "Hollingshead composite",
               "Wealth (USD)", "Material hardship count (0-6)",
               "Area deprivation index (0-100)",
               "Subjective SES (1-10)",
               "Consumption (USD)",
               "Hippocampal volume (mm^3)",
               "Working memory composite",
               "Vocabulary composite"),
  Mean   = c(mean(dat$income), mean(dat$inr), mean(dat$hh_size),
             mean(dat$parent_edu_years), mean(dat$occ_prestige),
             mean(dat$hollingshead), mean(dat$wealth),
             mean(dat$hardship_count), mean(dat$area_deprivation),
             mean(dat$subjective_ses), mean(dat$consumption),
             mean(dat$hippocampal_volume), mean(dat$working_memory),
             mean(dat$vocabulary)),
  SD     = c(sd(dat$income), sd(dat$inr), sd(dat$hh_size),
             sd(dat$parent_edu_years), sd(dat$occ_prestige),
             sd(dat$hollingshead), sd(dat$wealth),
             sd(dat$hardship_count), sd(dat$area_deprivation),
             sd(dat$subjective_ses), sd(dat$consumption),
             sd(dat$hippocampal_volume), sd(dat$working_memory),
             sd(dat$vocabulary)),
  Median = c(median(dat$income), median(dat$inr), median(dat$hh_size),
             median(dat$parent_edu_years), median(dat$occ_prestige),
             median(dat$hollingshead), median(dat$wealth),
             median(dat$hardship_count), median(dat$area_deprivation),
             median(dat$subjective_ses), median(dat$consumption),
             median(dat$hippocampal_volume), median(dat$working_memory),
             median(dat$vocabulary))
)
demo_tbl[,2:4] <- round(demo_tbl[,2:4], 2)
pretty_table(demo_tbl, caption = "Table 1. Sample descriptives (N = 4,000).")

Table 1. Sample descriptives (N = 4,000).
Variable	Mean	SD	Median
Household income (USD)	52971.03	32112.16	45254.50
Income-to-needs ratio	1.80	1.20	1.50
Household size	3.94	1.34	4.00
Parental education (years)	12.07	3.10	12.00
Occupational prestige (1-9)	4.99	2.42	5.00
Hollingshead composite	43.14	20.40	43.00
Wealth (USD)	116541.80	605769.94	8723.00
Material hardship count (0-6)	2.41	1.74	2.00
Area deprivation index (0-100)	51.62	15.39	51.64
Subjective SES (1-10)	5.51	1.61	5.00
Consumption (USD)	22544.23	16364.45	18995.63
Hippocampal volume (mm^3)	3803.25	229.56	3804.03
Working memory composite	99.70	15.20	99.52
Vocabulary composite	100.25	17.45	100.29

by_region <- aggregate(cbind(income, inr, hardship_count, area_deprivation,
                             parent_edu_years, wealth) ~ region, data = dat,
                       FUN = function(x) c(mean = mean(x), sd = sd(x)))
# Flatten
reg_tbl <- data.frame(
  Region = by_region$region,
  `Income (M)`        = round(by_region$income[,"mean"]),
  `Income (SD)`       = round(by_region$income[,"sd"]),
  `INR (M)`           = round(by_region$inr[,"mean"], 2),
  `Hardship (M)`      = round(by_region$hardship_count[,"mean"], 2),
  `Area dep. (M)`     = round(by_region$area_deprivation[,"mean"], 1),
  `Parent edu (M)`    = round(by_region$parent_edu_years[,"mean"], 1),
  `Wealth (M)`        = round(by_region$wealth[,"mean"]),
  check.names = FALSE
)
pretty_table(reg_tbl,
  caption = "Table 2. Descriptives by cost-of-living region.")

Table 2. Descriptives by cost-of-living region.
Region	Income (M)	Income (SD)	INR (M)	Hardship (M)	Area dep. (M)	Parent edu (M)	Wealth (M)
LowCOL	40020	21297	1.34	2.30	60.5	11.1	87612
MidCOL	53651	30117	1.84	2.41	49.8	12.3	130039
HighCOL	70356	39054	2.39	2.55	42.1	13.2	135249

if (have_gg) {
  comma_fmt <- if (requireNamespace("scales", quietly = TRUE))
                 scales::comma else function(x) format(x, big.mark = ",")
  ggplot(dat, aes(income, fill = region)) +
    geom_density(alpha = 0.45) +
    scale_x_continuous(labels = comma_fmt, limits = c(0, 200000)) +
    labs(x = "Household income (USD)", y = "Density", fill = "Region") +
    theme_minimal(base_size = 11)
}

Figure 1. Distribution of household income by cost-of-living region. Higher-COL regions have higher nominal incomes, but as subsequent sections show, this does not translate into proportionally better outcomes.

Interpretation. Table 1 shows the sample as a whole; Table 2 disaggregates by cost-of-living region. Several features matter for the analyses that follow. High-COL regions have substantially higher nominal incomes (median ≈ $62,000 vs. ≈ $36,000 in low-COL regions) and, because the nominal federal poverty line ignores local cost of living, correspondingly higher INR (mean 2.39 vs. 1.34). Yet material hardship is slightly higher in high-COL regions (mean 2.55 vs. 2.30) despite those higher incomes and INRs, illustrating the central problem: dollar income does not mean the same thing across contexts.
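A small worked example may help show why the same dollar income means different things across regions. The sketch below places one hypothetical family (USD 40,000, four members) in each region and applies the poverty-line, purchasing-power, and COL-adjusted threshold formulas from Section 2. The `ex_` names are illustrative and chosen so they do not overwrite the simulation objects.

```r
# Hypothetical family: USD 40,000 income, household of four.
ex_income  <- 40000
ex_hh_size <- 4

# Formulas copied from the data-generating process in Section 2.
ex_fpl <- 15000 + (ex_hh_size - 1) * 5500   # nominal poverty line
ex_inr <- ex_income / ex_fpl                # identical in every region

ex_col <- c(LowCOL = 0.75, MidCOL = 1.00, HighCOL = 1.50)
ex_rpp <- ex_income / (ex_col * sqrt(ex_hh_size))   # real purchasing power
ex_spm <- ex_fpl * c(LowCOL = 0.85, MidCOL = 1.00, HighCOL = 1.35)

ex_compare <- data.frame(
  region                = names(ex_col),
  inr                   = round(ex_inr, 2),
  income_poor           = ex_inr < 1,
  real_purchasing_power = round(unname(ex_rpp)),
  spm_threshold         = round(unname(ex_spm)),
  spm_poor              = ex_income < unname(ex_spm)
)
print(ex_compare)
```

The INR column is constant across regions, so the family is never income-poor, while real purchasing power halves between the low-COL and high-COL rows and the family falls below the COL-adjusted threshold only in the high-COL region.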

4 Audit: is the simulation realistic?

Before drawing conclusions, it is important to verify that the simulated data behave like real data. Two checks:

Correlation structure: pairwise correlations among measures should fall within published empirical ranges.
No-oracle check: no single measure should explain a dominant share of variance in the outcome, which would mean we had engineered a privileged gold standard.

4.1 Correlation structure among measures

audit_df <- data.frame(
  log_income  = log(dat$income),
  INR         = dat$inr,
  edu         = dat$parent_edu_years,
  holl        = dat$hollingshead,
  hardship    = dat$hardship_count,
  wealth_log  = log(dat$wealth + 1),
  area        = dat$area_deprivation,
  subj        = dat$subjective_ses,
  consumption = log(dat$consumption)
)
corr_mat <- round(cor(audit_df), 2)
pretty_table(corr_mat,
  caption = "Table 3. Pairwise correlations among observed poverty/SES measures.")

Table 3. Pairwise correlations among observed poverty/SES measures.
	log_income	INR	edu	holl	hardship	wealth_log	area	subj	consumption
log_income	1.00	0.84	0.62	0.74	-0.35	0.19	-0.46	0.50	0.71
INR	0.84	1.00	0.52	0.61	-0.33	0.16	-0.39	0.43	0.70
edu	0.62	0.52	1.00	0.94	-0.22	0.11	-0.28	0.31	0.44
holl	0.74	0.61	0.94	1.00	-0.26	0.14	-0.34	0.36	0.52
hardship	-0.35	-0.33	-0.22	-0.26	1.00	-0.09	0.09	-0.39	-0.38
wealth_log	0.19	0.16	0.11	0.14	-0.09	1.00	-0.08	0.09	0.26
area	-0.46	-0.39	-0.28	-0.34	0.09	-0.08	1.00	-0.16	-0.23
subj	0.50	0.43	0.31	0.36	-0.39	0.09	-0.16	1.00	0.45
consumption	0.71	0.70	0.44	0.52	-0.38	0.26	-0.23	0.45	1.00

if (have_gg) {
  cm <- as.data.frame(as.table(corr_mat))
  names(cm) <- c("Var1", "Var2", "r")
  ggplot(cm, aes(Var1, Var2, fill = r)) +
    geom_tile(color = "white") +
    geom_text(aes(label = sprintf("%.2f", r)), size = 3) +
    scale_fill_gradient2(low = "#2b8cbe", mid = "white", high = "#e34a33",
                         midpoint = 0, limits = c(-1, 1)) +
    labs(x = NULL, y = NULL, fill = "r") +
    theme_minimal(base_size = 10) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

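Figure 2. Correlation heatmap among observed poverty and SES measures. Most correlations are moderate, consistent with real-world data where different measures capture partially overlapping but distinct facets of family circumstance. The exceptions are pairs linked by construction: INR is income divided by the poverty line (r = 0.84), and the Hollingshead composite is computed partly from parental education (r = 0.94).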

4.2 Benchmark comparison

benchmarks <- data.frame(
  `Measure pair` = c("Income x Hardship", "Income x Wealth",
                     "Income x Subjective SES", "Income x Area deprivation",
                     "Income x Education", "Hardship x Wealth",
                     "Income x Consumption"),
  `Observed r` = c(corr_mat["log_income","hardship"],
                   corr_mat["log_income","wealth_log"],
                   corr_mat["log_income","subj"],
                   corr_mat["log_income","area"],
                   corr_mat["log_income","edu"],
                   corr_mat["hardship","wealth_log"],
                   corr_mat["log_income","consumption"]),
  `Empirical target range` = c("-0.30 to -0.50", "+0.20 to +0.40",
                               "+0.30 to +0.50", "-0.30 to -0.50",
                               "+0.40 to +0.60", "-0.20 to -0.40",
                               "+0.60 to +0.80"),
  Source = c("Mayer & Jencks (1989); Iceland (2005)",
             "Keister (2014); SCF data",
             "Adler et al. (2000); MacArthur studies",
             "Kind et al. (2014); ADI literature",
             "NLSY, PSID benchmarks",
             "Ouellette et al. (2004)",
             "Meyer & Sullivan (2003)"),
  check.names = FALSE
)
pretty_table(benchmarks,
  caption = "Table 4. Benchmark check: observed correlations vs. published empirical ranges.")

Table 4. Benchmark check: observed correlations vs. published empirical ranges.
Measure pair	Observed r	Empirical target range	Source
Income x Hardship	-0.35	-0.30 to -0.50	Mayer & Jencks (1989); Iceland (2005)
Income x Wealth	0.19	+0.20 to +0.40	Keister (2014); SCF data
Income x Subjective SES	0.50	+0.30 to +0.50	Adler et al. (2000); MacArthur studies
Income x Area deprivation	-0.46	-0.30 to -0.50	Kind et al. (2014); ADI literature
Income x Education	0.62	+0.40 to +0.60	NLSY, PSID benchmarks
Hardship x Wealth	-0.09	-0.20 to -0.40	Ouellette et al. (2004)
Income x Consumption	0.71	+0.60 to +0.80	Meyer & Sullivan (2003)
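The comparison in Table 4 can also be made mechanically instead of by eye. The sketch below re-enters the observed correlations and target ranges from Table 4 by hand, splits each range into numeric bounds, and flags which pairs fall inside. The `bench_check` object and its column names are illustrative.

```r
# Values transcribed from Table 4; each range is split into numeric bounds.
bench_check <- data.frame(
  pair     = c("Income x Hardship", "Income x Wealth",
               "Income x Subjective SES", "Income x Area deprivation",
               "Income x Education", "Hardship x Wealth",
               "Income x Consumption"),
  observed = c(-0.35, 0.19, 0.50, -0.46, 0.62, -0.09, 0.71),
  lower    = c(-0.50, 0.20, 0.30, -0.50, 0.40, -0.40, 0.60),
  upper    = c(-0.30, 0.40, 0.50, -0.30, 0.60, -0.20, 0.80)
)

# TRUE when the observed correlation lies inside the target range (bounds included).
bench_check$in_range <- bench_check$observed >= bench_check$lower &
                        bench_check$observed <= bench_check$upper

# Distance to the nearest bound; zero for pairs inside the range.
bench_check$gap <- round(pmax(bench_check$lower - bench_check$observed,
                              bench_check$observed - bench_check$upper, 0), 2)

print(bench_check[, c("pair", "observed", "in_range", "gap")])
```

Run on the Table 4 values, this flags three pairs as outside their ranges: Income x Wealth and Income x Education miss narrowly, and Hardship x Wealth misses by a wider margin.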

4.3 No-oracle check

single_r2 <- data.frame(
  Measure = names(audit_df),
  `R^2 (hippocampal volume)` = round(
    sapply(audit_df, function(x) cor(x, dat$hippocampal_volume)^2), 3),
  `R^2 (working memory)` = round(
    sapply(audit_df, function(x) cor(x, dat$working_memory)^2), 3),
  `R^2 (vocabulary)` = round(
    sapply(audit_df, function(x) cor(x, dat$vocabulary)^2), 3),
  check.names = FALSE
)
pretty_table(single_r2,
  caption = "Table 5. Variance explained by each measure in isolation. No measure exceeds 15% for hippocampal volume or working memory; several reach 16-23% for vocabulary, but none stands apart as a privileged oracle.")

Table 5. Variance explained by each measure in isolation. No measure exceeds 15% for hippocampal volume or working memory; several reach 16-23% for vocabulary, but none stands apart as a privileged oracle.
Measure	R^2 (hippocampal volume)	R^2 (working memory)	R^2 (vocabulary)
log_income	0.085	0.079	0.200
INR	0.065	0.068	0.162
edu	0.059	0.063	0.224
holl	0.069	0.071	0.230
hardship	0.093	0.129	0.225
wealth_log	0.005	0.005	0.008
area	0.011	0.011	0.026
subj	0.038	0.035	0.087
consumption	0.075	0.074	0.176
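The no-oracle criterion can be checked directly against the values in Table 5. The sketch below re-enters the reported R² values by hand (it does not recompute them from dat) and counts, per outcome, how many measures exceed a 15% threshold. The `r2_check` and `oracle_threshold` names are illustrative.

```r
# R^2 values transcribed from Table 5 (rows = measures, columns = outcomes).
r2_check <- rbind(
  log_income  = c(0.085, 0.079, 0.200),
  INR         = c(0.065, 0.068, 0.162),
  edu         = c(0.059, 0.063, 0.224),
  holl        = c(0.069, 0.071, 0.230),
  hardship    = c(0.093, 0.129, 0.225),
  wealth_log  = c(0.005, 0.005, 0.008),
  area        = c(0.011, 0.011, 0.026),
  subj        = c(0.038, 0.035, 0.087),
  consumption = c(0.075, 0.074, 0.176)
)
colnames(r2_check) <- c("hippocampal_volume", "working_memory", "vocabulary")

oracle_threshold <- 0.15

# Number of measures above the threshold, per outcome.
n_above <- colSums(r2_check > oracle_threshold)
print(n_above)

# Which measures exceed the threshold for vocabulary, and the spread among them.
vocab_above <- r2_check[r2_check[, "vocabulary"] > oracle_threshold, "vocabulary"]
print(vocab_above)
print(range(vocab_above))
```

On these values the threshold is never crossed for hippocampal volume or working memory, while six measures cross it for vocabulary. Those six sit in a narrow band, so none behaves like a privileged gold standard.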

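Interpretation. Four of the seven benchmark pairs fall inside their published ranges. Income x Wealth (0.19) and Income x Education (0.62) miss by 0.01 and 0.02, and Hardship x Wealth (-0.09) is clearly weaker than its target of -0.20 to -0.40, so conclusions involving wealth deserve extra caution. Single-measure R² values stay below 15% for hippocampal volume and working memory (maximum 0.129) but reach 16-23% for vocabulary under income, INR, education, Hollingshead, hardship, and consumption. Even there, no single measure stands apart from the others, which is the property this check is meant to confirm: the simulation is not rigged to make any particular measure the “right answer.” Divergence in substantive conclusions across measures can therefore be read as construct disagreement, not a methodological artifact baked into the data.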

5 Who counts as poor?

The first substantive question is whether different poverty measures identify the same families as “poor.”

5.1 Prevalence by measure

prev <- data.frame(
  Measure = c("Income (INR < 1)", "Low education (<12 yr)",
              "Hollingshead bottom quartile", "Material hardship (≥3 items)",
              "Area deprivation (≥70)", "Subjective SES (≤3)",
              "Consumption bottom quartile", "Low wealth (<$5k)",
              "Multidimensional (MPI ≥ 0.4)", "Supplemental (COL-adjusted)"),
  `% Classified poor` = round(sapply(dat[poor_vars], mean) * 100, 1),
  check.names = FALSE
)
pretty_table(prev,
  caption = "Table 6. Percentage of sample classified as poor by each measure.")

Table 6. Percentage of sample classified as poor by each measure.
Measure	% Classified poor
Income (INR < 1)	24.7
Low education (<12 yr)	42.8
Hollingshead bottom quartile	24.8
Material hardship (≥3 items)	44.5
Area deprivation (≥70)	11.4
Subjective SES (≤3)	10.6
Consumption bottom quartile	25.0
Low wealth (<$5k)	46.5
Multidimensional (MPI ≥ 0.4)	74.2
Supplemental (COL-adjusted)	24.2

5.2 Pairwise agreement (Cohen’s kappa)

kappa_fn <- function(a, b) {
  po <- mean(a == b)
  pe <- mean(a) * mean(b) + (1 - mean(a)) * (1 - mean(b))
  (po - pe) / (1 - pe)
}
k_mat <- outer(poor_vars, poor_vars,
               Vectorize(function(i, j) kappa_fn(dat[[i]], dat[[j]])))
k_mat <- round(k_mat, 2)
dimnames(k_mat) <- list(
  c("Income","Low edu","Hollingshead","Hardship","Area","Subjective",
    "Consumption","Wealth","MPI","SPM"),
  c("Income","Low edu","Hollingshead","Hardship","Area","Subjective",
    "Consumption","Wealth","MPI","SPM"))
pretty_table(k_mat,
  caption = "Table 7. Pairwise agreement between poverty classifications (Cohen's kappa).")

Table 7. Pairwise agreement between poverty classifications (Cohen’s kappa).
	Income	Low edu	Hollingshead	Hardship	Area	Subjective	Consumption	Wealth	MPI	SPM
Income	1.00	0.34	0.43	0.22	0.20	0.20	0.50	0.09	0.15	0.82
Low edu	0.34	1.00	0.60	0.14	0.10	0.09	0.24	0.07	0.34	0.31
Hollingshead	0.43	0.60	1.00	0.14	0.16	0.14	0.30	0.08	0.18	0.40
Hardship	0.22	0.14	0.14	1.00	0.03	0.14	0.21	0.08	0.25	0.25
Area	0.20	0.10	0.16	0.03	1.00	0.07	0.10	0.03	0.08	0.15
Subjective	0.20	0.09	0.14	0.14	0.07	1.00	0.17	0.05	0.05	0.21
Consumption	0.50	0.24	0.30	0.21	0.10	0.17	1.00	0.13	0.13	0.58
Wealth	0.09	0.07	0.08	0.08	0.03	0.05	0.13	1.00	0.34	0.08
MPI	0.15	0.34	0.18	0.25	0.08	0.05	0.13	0.34	1.00	0.15
SPM	0.82	0.31	0.40	0.25	0.15	0.21	0.58	0.08	0.15	1.00

if (have_gg) {
  km <- as.data.frame(as.table(k_mat))
  names(km) <- c("Var1", "Var2", "kappa")
  ggplot(km, aes(Var1, Var2, fill = kappa)) +
    geom_tile(color = "white") +
    geom_text(aes(label = sprintf("%.2f", kappa)), size = 3) +
    scale_fill_gradient(low = "white", high = "#2b8cbe",
                        limits = c(0, 1)) +
    labs(x = NULL, y = NULL, fill = "kappa") +
    theme_minimal(base_size = 10) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

Figure 3. Agreement heatmap. Most pairs of poverty measures agree only modestly (kappa < 0.5), meaning they identify substantially non-overlapping subpopulations of families as ‘poor.’

5.3 Disagreement between income and hardship classifications

xt <- table(`Income-poor` = dat$poor_income,
            `Hardship-poor` = dat$poor_hardship)
xt_tbl <- as.data.frame.matrix(xt)
# Label rows and columns explicitly instead of exposing the 0/1 codes.
xt_tbl <- cbind(`Income poverty` = c("Not income-poor", "Income-poor"), xt_tbl)
names(xt_tbl)[2:3] <- c("Not hardship-poor", "Hardship-poor")
rownames(xt_tbl) <- NULL
pretty_table(xt_tbl,
  caption = "Table 8. Cross-classification of income and hardship poverty.")

Table 8. Cross-classification of income and hardship poverty.
Income poverty	Not hardship-poor	Hardship-poor
Not income-poor	1881	1130
Income-poor	341	648

Interpretation. The prevalence table shows that different measures flag different shares of the sample – from about 11% (the most restrictive) to 74% (the most inclusive). More importantly, the kappa matrix shows that pairwise agreement between measures is generally modest (most kappa values below 0.5). The income-vs-hardship cross-classification illustrates the problem concretely: 1130 families are hardship-poor but not income-poor, while 341 are income-poor but not hardship-poor. These are not interchangeable labels; studies recruiting by income are not studying the same population as studies recruiting by hardship, even in principle.
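As a worked check on how the kappa values relate to the cross-classification, the sketch below recomputes Cohen's kappa for the income and hardship classifications from the four cell counts in Table 8. The counts are re-entered by hand and the object names are illustrative; the formula is the same one used in kappa_fn.

```r
# Cell counts transcribed from Table 8
# (rows: income-poor 0/1, columns: hardship-poor 0/1).
xt_counts <- matrix(c(1881, 341, 1130, 648), nrow = 2,
                    dimnames = list(income_poor   = c("0", "1"),
                                    hardship_poor = c("0", "1")))

n_total <- sum(xt_counts)

# Observed agreement: share of families on the diagonal.
p_observed <- sum(diag(xt_counts)) / n_total

# Agreement expected by chance, from the two sets of marginal totals.
p_expected <- sum(rowSums(xt_counts) * colSums(xt_counts)) / n_total^2

kappa_income_hardship <- (p_observed - p_expected) / (1 - p_expected)

print(round(c(observed = p_observed, expected = p_expected,
              kappa = kappa_income_hardship), 3))
```

The two classifications agree on about 63% of families, but roughly 53% agreement would be expected from the marginals alone, which is why kappa comes out at about 0.22, matching the Income/Hardship cell of Table 7.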

6 Do different measures reach the same conclusion?

The analyses so far used the full simulated population (N = 4,000) to keep power high for audit purposes. Real neuroimaging studies in the developmental poverty literature typically have N ≈ 100-300 – a range reflected in most of the studies included in this project’s scoping review. At those realistic sample sizes, the question “does this measure detect a poverty effect?” is non-trivial and depends on both the construct the measure captures and its statistical power.
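To give a rough sense of how strongly power depends on effect size in this range of N, the sketch below uses the Fisher z approximation for a single correlation. It is only an approximation to the analyses that follow: it ignores the household-size covariate, treats the predictor as continuous, and counts only significant results in the expected direction. The function `approx_power` and its arguments are illustrative, and the two R² inputs are taken from Table 5.

```r
# Approximate probability of a significant result in the expected direction
# for a true correlation of magnitude r at sample size n
# (Fisher z approximation, two-sided alpha).
approx_power <- function(r, n, alpha = 0.05) {
  z_effect <- atanh(abs(r)) * sqrt(n - 3)
  z_crit   <- qnorm(1 - alpha / 2)
  pnorm(z_effect - z_crit)
}

# Full-sample R^2 with hippocampal volume, from Table 5.
r_log_income <- sqrt(0.085)
r_log_wealth <- sqrt(0.005)

# Power at several study sizes in the range typical of the literature.
study_sizes <- c(100, 150, 300)
power_grid <- data.frame(
  n          = study_sizes,
  log_income = round(approx_power(r_log_income, study_sizes), 2),
  log_wealth = round(approx_power(r_log_wealth, study_sizes), 2)
)
print(power_grid)
```

At n = 150 the approximation gives roughly 0.95 for log income and roughly 0.14 for log wealth, so two measures applied to the same families can differ widely in how often they would yield a publishable effect.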

This section asks whether, at a realistic study size, researchers using different poverty measures would reach the same qualitative conclusion. We repeat the analysis on repeated random subsamples of typical study size, focusing on the two properties that drive narrative conclusions in the literature:

Direction – does the measure show worse outcomes for the more-disadvantaged group?
Detection – is the effect statistically significant (p < .05)?

A third property – magnitude – is shown in the forest plots below but is secondary for the “same conclusion” question: studies typically report “we found a significant effect” rather than “our effect was within 15% of another study’s effect.” Sign and significance are what make it into abstracts.

# Realistic study size: matches Noble et al. 2007 (N=150), Lawson et al. 2013
# (N=283), and many other studies in the scoping review.
N_study <- 150

# To get a stable read on each measure's detection rate (not one lucky or
# unlucky draw), we run the analysis over many random subsamples of size
# N_study and record how often each measure detects a significant effect in
# the expected direction. The effect estimates summarized for the forest
# plots are the median (and 2.5%/97.5% quantiles) across subsamples.

set.seed(20260421)
n_reps <- 200

measure_list <- list(
  list(name = "Low income (continuous)",
       type = "continuous",
       extract = function(d) -log(d$income),
       captures = "Resources"),
  list(name = "Low INR (continuous)",
       type = "continuous",
       extract = function(d) -d$inr,
       captures = "Resources"),
  list(name = "Income poverty (INR<1)",
       type = "binary",
       extract = function(d) d$poor_income,
       captures = "Resources"),
  list(name = "COL-adjusted poor (SPM)",
       type = "binary",
       extract = function(d) d$poor_spm,
       captures = "Resources (COL-adjusted)"),
  list(name = "Low consumption",
       type = "binary",
       extract = function(d) d$poor_consume,
       captures = "Resources (spending)"),
  list(name = "Low wealth",
       type = "binary",
       extract = function(d) d$poor_wealth,
       captures = "Resources (buffer)"),
  list(name = "Material hardship",
       type = "binary",
       extract = function(d) d$poor_hardship,
       captures = "Resources + stress"),
  list(name = "Low Hollingshead",
       type = "binary",
       extract = function(d) d$poor_holl,
       captures = "Education + occupation"),
  list(name = "Low parental education",
       type = "binary",
       extract = function(d) d$poor_lowedu,
       captures = "Cognitive stimulation"),
  list(name = "High area deprivation",
       type = "binary",
       extract = function(d) d$poor_area,
       captures = "Neighborhood context"),
  list(name = "Low subjective SES",
       type = "binary",
       extract = function(d) d$poor_subj,
       captures = "Relative position / stress"),
  list(name = "Multidimensional poor",
       type = "binary",
       extract = function(d) d$poor_mpi,
       captures = "Mixed (5 dimensions)")
)

outcomes <- c("hippocampal_volume", "working_memory", "vocabulary")
outcome_labels <- c(hippocampal_volume = "Hippocampus (mm^3)",
                    working_memory     = "Working memory",
                    vocabulary         = "Vocabulary")

fit_one <- function(sub, y, x) {
  m <- lm(sub[[y]] ~ x + sub$hh_size)
  co <- coef(summary(m))
  if (nrow(co) < 2) return(c(NA_real_, NA_real_, NA_real_))
  co[2, c("Estimate", "Std. Error", "Pr(>|t|)")]
}

# Run subsample study
det_counts  <- array(0, dim = c(length(measure_list), length(outcomes)))
rev_counts  <- array(0, dim = c(length(measure_list), length(outcomes)))
all_ests    <- vector("list", length(measure_list) * length(outcomes))

idx_matrix  <- array(seq_len(length(measure_list) * length(outcomes)),
                     dim = c(length(measure_list), length(outcomes)))

for (rep in seq_len(n_reps)) {
  sub_ids <- sample(nrow(dat), N_study)
  sub <- dat[sub_ids, ]
  for (i in seq_along(measure_list)) {
    x <- measure_list[[i]]$extract(sub)
    if (sd(x) == 0) next
    for (j in seq_along(outcomes)) {
      fit <- fit_one(sub, outcomes[j], x)
      if (any(is.na(fit))) next
      est <- fit[1]; p <- fit[3]
      if (p < 0.05 && est < 0) det_counts[i, j] <- det_counts[i, j] + 1
      if (p < 0.05 && est > 0) rev_counts[i, j] <- rev_counts[i, j] + 1
      k <- idx_matrix[i, j]
      all_ests[[k]] <- c(all_ests[[k]], est)
    }
  }
}

det_rate <- det_counts / n_reps
rev_rate <- rev_counts / n_reps

# Summary table
summary_df <- data.frame()
for (i in seq_along(measure_list)) {
  for (j in seq_along(outcomes)) {
    ests <- all_ests[[idx_matrix[i, j]]]
    summary_df <- rbind(summary_df, data.frame(
      Measure  = measure_list[[i]]$name,
      Captures = measure_list[[i]]$captures,
      Outcome  = outcomes[j],
      median_est = median(ests, na.rm = TRUE),
      q025 = quantile(ests, 0.025, na.rm = TRUE),
      q975 = quantile(ests, 0.975, na.rm = TRUE),
      det_rate = det_rate[i, j],
      rev_rate = rev_rate[i, j]))
  }
}
rownames(summary_df) <- NULL

6.1 Decision matrix: would they detect it?

The decision matrix below reports, for each measure × outcome combination, the proportion of subsamples (out of 200) in which that measure detected a significant effect in the expected direction. This is effectively the statistical power each measure has to recover the true effect at a typical developmental-study sample size.

A detection rate close to 100% means a researcher using this measure will almost always report a “significant poverty effect.” A rate near the false-positive floor (about 2.5% here, since detection requires both two-sided p < .05 and the expected sign) means the effect is essentially invisible through that measure’s lens (at this N). Rates in between mean that whether the effect is detected depends on the particular families that happen to end up in the study, an uncomfortable fact about underpowered research.
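Because each detection rate is itself estimated from a finite number of subsamples, it carries Monte Carlo error. The sketch below computes the binomial standard error and a normal-approximation 95% interval for a few hypothetical rates at 200 replications, the value of n_reps used above. The `mc_` names and the example rates are illustrative, and the calculation treats subsamples as independent draws, which is only approximately true when they come from the same 4,000 families.

```r
# Number of replications, matching n_reps in the subsample study.
mc_reps <- 200

# Hypothetical detection rates spanning the range seen in practice.
mc_rate <- c(0.10, 0.20, 0.50, 0.90)

# Binomial standard error of an estimated proportion.
mc_se <- sqrt(mc_rate * (1 - mc_rate) / mc_reps)

mc_table <- data.frame(
  rate  = mc_rate,
  se    = round(mc_se, 3),
  lower = round(pmax(0, mc_rate - 1.96 * mc_se), 3),
  upper = round(pmin(1, mc_rate + 1.96 * mc_se), 3)
)
print(mc_table)
```

With 200 replications the standard error is between two and four percentage points, so differences of only a few points between cells of the decision matrix should not be over-interpreted.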

# Wide table: one row per measure, one column per outcome
dm_table <- data.frame(
  Measure  = vapply(measure_list, `[[`, character(1), "name"),
  Captures = vapply(measure_list, `[[`, character(1), "captures"),
  Hippocampus    = sprintf("%d%%", round(det_rate[, 1] * 100)),
  `Working memory` = sprintf("%d%%", round(det_rate[, 2] * 100)),
  Vocabulary     = sprintf("%d%%", round(det_rate[, 3] * 100)),
  check.names = FALSE
)
pretty_table(
  dm_table,
  caption = paste0("Table 9. Decision matrix: detection rate for each measure ",
                   "x outcome combination, across ", n_reps, " random subsamples ",
                   "of N = ", N_study, ". Values show the percentage of ",
                   "subsamples in which a researcher using that measure would ",
                   "detect a significant effect in the expected direction (p < .05)."),
  full_width = TRUE)

Table 9. Decision matrix: detection rate for each measure x outcome combination, across 200 random subsamples of N = 150. Values show the percentage of subsamples in which a researcher using that measure would detect a significant effect in the expected direction (p < .05).
Measure	Captures	Hippocampus	Working memory	Vocabulary
Low income (continuous)	Resources	98%	94%	100%
Low INR (continuous)	Resources	94%	86%	100%
Income poverty (INR<1)	Resources	78%	76%	100%
COL-adjusted poor (SPM)	Resources (COL-adjusted)	78%	80%	100%
Low consumption	Resources (spending)	68%	64%	96%
Low wealth	Resources (buffer)	13%	14%	18%
Material hardship	Resources + stress	90%	96%	100%
Low Hollingshead	Education + occupation	68%	64%	100%
Low parental education	Cognitive stimulation	70%	69%	100%
High area deprivation	Neighborhood context	16%	10%	22%
Low subjective SES	Relative position / stress	27%	26%	64%
Multidimensional poor	Mixed (5 dimensions)	74%	76%	98%

if (have_gg) {
  measure_order <- vapply(measure_list, `[[`, character(1), "name")
  cap_map <- setNames(vapply(measure_list, `[[`, character(1), "captures"),
                      measure_order)

  # Combined row label: measure name plus what it captures
  row_labels <- setNames(
    sprintf("%s\n(captures: %s)", measure_order, cap_map[measure_order]),
    measure_order)

  dmp <- expand.grid(Measure = measure_order,
                     Outcome = outcomes, stringsAsFactors = FALSE)
  dmp$rate <- as.vector(det_rate)
  dmp$label <- sprintf("%d%%", round(dmp$rate * 100))
  dmp$Outcome <- factor(dmp$Outcome, levels = outcomes,
                        labels = outcome_labels[outcomes])
  dmp$MeasureLab <- factor(row_labels[dmp$Measure],
                           levels = rev(row_labels[measure_order]))

  ggplot(dmp, aes(Outcome, MeasureLab, fill = rate)) +
    geom_tile(color = "white", linewidth = 1) +
    geom_text(aes(label = label), size = 4, color = "black", fontface = "bold") +
    scale_fill_gradient2(low = "#f5f5f5", mid = "#fff3bf", high = "#2b8a3e",
                         midpoint = 0.5, limits = c(0, 1),
                         labels = scales::percent_format(accuracy = 1),
                         name = "Detection\nrate") +
    labs(x = NULL, y = NULL,
         title = "Which measures detect a poverty effect?",
         subtitle = paste0("Detection rate across ", n_reps,
                           " random subsamples of N = ", N_study)) +
    theme_minimal(base_size = 10) +
    theme(panel.grid = element_blank(),
          legend.position = "right",
          axis.text.x = element_text(face = "bold", size = 11),
          axis.text.y = element_text(size = 9, lineheight = 0.85))
}

Figure 4. Decision matrix visualized. Each cell shows the percentage of N=150 subsamples in which the measure (row) detects a significant effect on the outcome (column) in the expected direction. Darker green = near-universal detection; yellow = detection is unreliable at this N; pale = effect essentially invisible through that measure’s lens. Each measure’s row label includes the construct it primarily captures (in parentheses), which explains the pattern of agreements and disagreements.

6.2 Reading the matrix

The decision matrix makes agreement and disagreement visible at a glance. Three patterns stand out:

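Green rows: measures that consistently detect the effect. Material hardship, continuous income, and continuous INR are high-detection measures across all three outcomes (86% or higher in every cell of Table 9). A researcher using any of these would almost always report a significant poverty effect in a typical-sized study, and the measures agree. Consumption and the multidimensional index detect vocabulary effects almost as reliably (96% and 98%), but they detect hippocampal and working-memory effects in only about two-thirds to three-quarters of subsamples.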

Pale/yellow rows: measures that detect inconsistently or rarely. Subjective SES has detection rates between 26% and 64%, so a researcher’s conclusion depends substantially on which families happen to be in the sample. Area deprivation and wealth detect an effect in fewer than a quarter of subsamples for every outcome (10% to 22%), so most studies using them would report a null. In either case, two studies using the same measure of the same construct could reach different conclusions by chance alone, one reporting a significant effect and the other a null. This is the replicability problem that measurement choice directly contributes to.
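The dependence on sampling luck can be made concrete with a small piece of arithmetic. If a measure detects the effect in a proportion p of subsamples, then two independent studies of the same size reach different verdicts (one significant, one null) with probability 2p(1 - p). The sketch below applies this to a few detection rates copied from Table 9; the independence of the two studies is an assumption, and the vector names are illustrative labels rather than objects from the report.

```r
# Detection rates copied from Table 9 (hippocampus column, plus subjective
# SES for vocabulary). The names are illustrative labels only.
p_detect <- c(wealth_hipp    = 0.13,
              area_dep_hipp  = 0.16,
              subj_ses_hipp  = 0.27,
              subj_ses_vocab = 0.64)

# Assumption: the two studies are independent draws with the same detection
# probability p. Exactly one of them detects the effect with probability
# 2 * p * (1 - p).
p_disagree <- 2 * p_detect * (1 - p_detect)

# Probability that both studies miss the effect
p_both_miss <- (1 - p_detect)^2

round(cbind(p_detect, p_disagree, p_both_miss), 3)
```

Disagreement is most likely when p is near 50%, which is why the yellow cells, rather than the palest ones, are where conflicting findings are most likely to arise.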

Different patterns across outcomes. Read along each row, comparing its three outcome cells: some measures detect effects on some outcomes but not others. Parental education and Hollingshead detect vocabulary effects more reliably than hippocampal effects (100% vs. about 70%), which makes sense because these measures primarily capture cognitive stimulation, a pathway more directly relevant to language outcomes. Area deprivation shows the same ordering (22% for vocabulary vs. 16% for hippocampus), but both rates are low and the gap is small, so it offers only weak support for the idea that neighborhood context shapes language exposure more than brain structure directly.
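Because each rate in Table 9 is itself estimated from 200 subsamples, gaps between outcomes carry Monte Carlo uncertainty. A rough way to gauge this is the binomial standard error sqrt(p(1 - p) / n). The sketch below assumes the subsamples behave like independent Bernoulli trials and that the two rates being compared are independent. Both are simplifications, since the subsamples come from one simulated population and the same subsample presumably feeds every outcome. The function names are hypothetical.

```r
n_reps_assumed <- 200  # number of subsamples stated in the Table 9 caption

# Binomial (Monte Carlo) standard error of an estimated detection rate
mc_se <- function(p, n) sqrt(p * (1 - p) / n)

# Approximate z statistic for the gap between two detection rates,
# treating the two rates as independent (an assumption)
gap_z <- function(p1, p2, n) {
  (p2 - p1) / sqrt(mc_se(p1, n)^2 + mc_se(p2, n)^2)
}

# Area deprivation: hippocampus 16% vs. vocabulary 22% (Table 9)
z_area <- gap_z(0.16, 0.22, n_reps_assumed)

# Parental education: hippocampus 70% vs. vocabulary 100% (Table 9)
z_educ <- gap_z(0.70, 1.00, n_reps_assumed)

round(c(area_deprivation = z_area, parental_education = z_educ), 2)
```

Under these assumptions the area deprivation gap is about 1.5 standard errors, so it should be read cautiously, whereas the parental education gap is far larger than Monte Carlo noise.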

6.3 Why measures agree or disagree

The second column of the decision matrix (“Captures” in Table 9, shown in parentheses in the row labels of Figure 4) tags each measure with what it primarily captures. This is the “why” behind the agree/disagree patterns. Table 10 unpacks each measure.

profile_tbl <- data.frame(
  Measure = vapply(measure_list, `[[`, character(1), "name"),
  `What it captures` = vapply(measure_list, `[[`, character(1), "captures"),
  `Why it agrees or diverges` = c(
    # Low income (continuous)
    "Strong signal of material resources, with high statistical power because it uses the full continuous distribution. Misses chronic stress and neighborhood context. Context-blind (ignores cost of living).",
    # Low INR (continuous)
    "Inherits income's properties; adjusts for family size but not for local cost of living. Slightly lower power than raw income because INR compresses the distribution.",
    # Income poverty (INR<1)
    "Dichotomizes INR at a fixed threshold, losing information. Power drops noticeably at N = 150 relative to continuous INR.",
    # SPM
    "Adjusts the poverty threshold for local cost of living. Closer to real purchasing power; detects effects that raw income misses in high-COL regions.",
    # Low consumption
    "Direct welfare proxy (per Deaton, 2003). Closer to material lived experience than income. Captures what families actually have access to.",
    # Low wealth
    "Captures buffering capacity for shocks. Weakly correlated with income (r ~ 0.2-0.3) so reflects variance other measures miss; the tradeoff is lower power in smaller samples.",
    # Material hardship
    "Direct measure of resource inadequacy (food, rent, medical care). Captures both low resources and high stress -- a dual pathway -- which is why it detects outcomes reliably.",
    # Low Hollingshead
    "Composite of education and occupation prestige, weighted heavily toward education. Reflects human capital / cognitive stimulation more than current material hardship.",
    # Low parental education
    "Primarily captures cognitive stimulation at home. Strong for language outcomes because vocabulary depends heavily on exposure; weaker for hippocampal outcomes where stress matters more.",
    # Area deprivation
    "Neighborhood-level index. Reflects context income misses (segregation, disinvestment) but at a spatial scale where individual-level outcomes are noisy, lowering power.",
    # Low subjective SES
    "Perceived social standing relative to local peers. Tracks psychological stress and reference-group comparison, not material resources per se. Weakest for structural outcomes.",
    # Multidimensional poor
    "Alkire-Foster composite across five dimensions. Captures overlapping deprivations any single measure would miss. Moderate detection despite construct breadth because dichotomization costs power."),
  check.names = FALSE
)
pretty_table(profile_tbl,
  caption = "Table 10. What each measure captures and why it agrees or disagrees with others.",
  full_width = TRUE)

Table 10. What each measure captures and why it agrees or disagrees with others.
Measure	What it captures	Why it agrees or diverges
Low income (continuous)	Resources	Strong signal of material resources, with high statistical power because it uses the full continuous distribution. Misses chronic stress and neighborhood context. Context-blind (ignores cost of living).
Low INR (continuous)	Resources	Inherits income’s properties; adjusts for family size but not for local cost of living. Slightly lower power than raw income because INR compresses the distribution.
Income poverty (INR<1)	Resources	Dichotomizes INR at a fixed threshold, losing information. Power drops noticeably at N = 150 relative to continuous INR.
COL-adjusted poor (SPM)	Resources (COL-adjusted)	Adjusts the poverty threshold for local cost of living. Closer to real purchasing power; detects effects that raw income misses in high-COL regions.
Low consumption	Resources (spending)	Direct welfare proxy (per Deaton, 2003). Closer to material lived experience than income. Captures what families actually have access to.
Low wealth	Resources (buffer)	Captures buffering capacity for shocks. Weakly correlated with income (r ~ 0.2-0.3) so reflects variance other measures miss; the tradeoff is lower power in smaller samples.
Material hardship	Resources + stress	Direct measure of resource inadequacy (food, rent, medical care). Captures both low resources and high stress – a dual pathway – which is why it detects outcomes reliably.
Low Hollingshead	Education + occupation	Composite of education and occupation prestige, weighted heavily toward education. Reflects human capital / cognitive stimulation more than current material hardship.
Low parental education	Cognitive stimulation	Primarily captures cognitive stimulation at home. Strong for language outcomes because vocabulary depends heavily on exposure; weaker for hippocampal outcomes where stress matters more.
High area deprivation	Neighborhood context	Neighborhood-level index. Reflects context income misses (segregation, disinvestment) but at a spatial scale where individual-level outcomes are noisy, lowering power.
Low subjective SES	Relative position / stress	Perceived social standing relative to local peers. Tracks psychological stress and reference-group comparison, not material resources per se. Weakest for structural outcomes.
Multidimensional poor	Mixed (5 dimensions)	Alkire-Foster composite across five dimensions. Captures overlapping deprivations any single measure would miss. Moderate detection despite construct breadth because dichotomization costs power.
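
Several rows of Table 10 attribute lower detection to dichotomization. The following is a minimal, self-contained sketch of that mechanism. It does not use the report's data-generating process: the effect size (`true_r`), the share classified as poor (`cut_quantile`), and all object names are hypothetical choices made only for illustration.

```r
set.seed(1)               # arbitrary seed for this illustration only
n_study_demo <- 150       # matches the subsample size used in the report
n_reps_demo  <- 500       # illustrative number of replications
true_r       <- 0.25      # hypothetical population correlation
cut_quantile <- 0.15      # hypothetical share classified as "poor"

# TRUE if the slope is significant at p < .05 and in the expected
# (negative) direction
detect <- function(x, y) {
  if (length(unique(x)) < 2) return(FALSE)   # guard: no variation in x
  fit <- summary(lm(y ~ x))$coefficients
  fit[2, "Pr(>|t|)"] < 0.05 && fit[2, "Estimate"] < 0
}

res <- replicate(n_reps_demo, {
  disadvantage <- rnorm(n_study_demo)
  outcome <- -true_r * disadvantage +
    sqrt(1 - true_r^2) * rnorm(n_study_demo)
  # Dichotomize: the most disadvantaged 15% (in expectation) are "poor"
  poor <- as.numeric(disadvantage > qnorm(1 - cut_quantile))
  c(continuous  = detect(disadvantage, outcome),
    dichotomous = detect(poor, outcome))
})

# Share of replications in which each version detects the effect
power_demo <- rowMeans(res)
power_demo
```

The same underlying variable is used in both analyses, so any gap in detection here comes only from discarding within-group variation at the threshold.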

6.4 Effect sizes across subsamples

The decision matrix shows whether each measure detects an effect. The annotated forest plot below shows how large the estimated effect is across subsamples, so readers can see when measures agree on direction but disagree on magnitude.

All effects are oriented the same way: negative values mean more disadvantage predicts worse outcomes. Continuous income and INR have been sign-flipped (to “low income” and “low INR”) so every measure’s coefficient reads the same way. Right-side annotations tag each measure with the construct it captures.

if (have_gg) {
  fp <- summary_df
  fp$Outcome <- factor(fp$Outcome, levels = outcomes,
                       labels = outcome_labels[outcomes])

  measure_order <- vapply(measure_list, `[[`, character(1), "name")
  cap_map <- setNames(vapply(measure_list, `[[`, character(1), "captures"),
                      measure_order)
  row_labels <- setNames(
    sprintf("%s\n(captures: %s)", measure_order, cap_map[measure_order]),
    measure_order)
  fp$MeasureLab <- factor(row_labels[fp$Measure],
                          levels = rev(row_labels[measure_order]))

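  # Left-closed bins so the categories match the Figure 5 caption:
  # <25% rarely, 25% to <75% sometimes, >=75% reliably detected
  fp$status <- cut(fp$det_rate, c(0, .25, .75, Inf), right = FALSE,
                   labels = c("Rarely detected","Sometimes detected","Reliably detected"))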

  ggplot(fp, aes(median_est, MeasureLab, color = status)) +
    geom_vline(xintercept = 0, linetype = "dashed", color = "grey40") +
    geom_errorbarh(aes(xmin = q025, xmax = q975),
                   height = 0.3, linewidth = 0.7) +
    geom_point(size = 2.8) +
    scale_color_manual(values = c("Reliably detected" = "#2b8a3e",
                                   "Sometimes detected" = "#d9a800",
                                   "Rarely detected"   = "#868e96"),
                       name = "Detection across subsamples") +
    facet_wrap(~ Outcome, scales = "free_x") +
    labs(x = "Estimated effect of MORE DISADVANTAGE (median across subsamples)",
         y = NULL,
         title = "Do measures reach the same conclusion?",
         subtitle = paste0("Each row = one measure; error bars = 2.5th-97.5th percentiles across ",
                           n_reps, " subsamples of N = ", N_study,
                           ". Negative effect = expected direction.")) +
    theme_minimal(base_size = 9) +
    theme(legend.position = "bottom",
          strip.text = element_text(face = "bold"),
          panel.spacing.x = unit(1.2, "lines"),
          axis.text.y = element_text(size = 8, lineheight = 0.85))
}

Figure 5. Forest plot of estimated effects across all three outcomes. Each point is the median coefficient across 200 random subsamples of N = 150; error bars show the 2.5th-97.5th percentile range (i.e., how much a researcher’s estimate might vary by luck of sampling). All effects are oriented so that negative values mean ‘more disadvantage -> worse outcome.’ Green = high detection rate (>=75% of subsamples); yellow = mid detection rate (25-75%); grey = low detection rate (<25%). Row labels include the construct each measure primarily captures.

6.5 What this means

Putting the decision matrix, the forest plot, and the capture profile together, the answer to “do different measures reach the same conclusion?” is sometimes.

Measures that capture overlapping constructs (income, INR, consumption, SPM, hardship) tend to agree both on direction and on detection, because they are proxies for related pieces of family circumstance. Measures that capture narrower or tangential constructs (area deprivation, wealth, subjective SES) agree on direction but often fail to detect at realistic N. Measures that capture different primary constructs (parental education for stimulation vs. hardship for stress) produce different effect-size rankings across outcomes, because the outcomes themselves depend on different underlying pathways.

The operational implication for reading the developmental poverty-neuroscience literature is that two studies both reporting “a significant poverty effect” may be detecting different things through different measurement windows – and two studies failing to find an effect may simply be using measures with insufficient power for typical sample sizes. Neither situation is visible from the summary-level claims that make it into abstracts and reviews.
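
To give a sense of what “insufficient power for typical sample sizes” means numerically, the sketch below uses the standard Fisher z approximation for the power to detect a correlation in the expected direction with a two-sided test at alpha = .05. The correlation values are hypothetical effect sizes, not estimates from this simulation, and `power_corr` is an illustrative helper rather than a function from the report.

```r
# Approximate power to detect a correlation r in the expected direction
# with a two-sided test at level alpha (Fisher z approximation).
power_corr <- function(r, n, alpha = 0.05) {
  pnorm(abs(atanh(r)) * sqrt(n - 3) - qnorm(1 - alpha / 2))
}

# Hypothetical true correlations between a poverty measure and an outcome
r_grid <- c(0.05, 0.10, 0.15, 0.20, 0.25, 0.30)

data.frame(r = r_grid,
           power_n150 = round(power_corr(r_grid, n = 150), 2))

# Smallest N on a grid that reaches 80% power for r = 0.10
n_grid <- 50:2000
n_needed <- min(n_grid[power_corr(0.10, n_grid) >= 0.80])
n_needed
```

Under this approximation a measure whose true correlation with the outcome is around 0.10 is detected in well under half of N = 150 studies, while one near 0.30 is detected almost always, so a null result from a weakly related measure says little about whether an effect exists.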

Poverty Measurement Misalignment in Developmental Cognitive Neuroscience

A simulation-based demonstration of how measure choice alters substantive conclusions

Gabriel Reyes

April 21, 2026