An Objective Machine Learning Redesign of India’s Multidimensional Poverty Index

Beyond the k=1/3 Cutoff, Subjective Weights, and Binary Classifications

Author: Dr. A. Bonnerjee

Affiliation: Policymetrika

Published: July 7, 2025

Abstract

India’s Multidimensional Poverty Index (MPI), despite its widespread use and policy relevance, rests on methodologically fragile assumptions that may significantly distort poverty measurement and overstate achievement in poverty reduction. This paper highlights critical flaws in the orthodox application of the Alkire-Foster methodology in India, particularly the arbitrary k = 1/3 cutoff threshold and predetermined indicator weights, through comprehensive sensitivity analysis and machine learning alternatives. Using data from India’s National Family Health Survey (2019–21), we show that the conventional threshold lies within a zone of maximum statistical instability, where small shifts lead to dramatic changes in poverty estimates. Our unsupervised machine learning approach, using K-means clustering, identifies optimal separation points between k = 0.22 and 0.24, far below the conventional threshold, yielding MPI estimates that are 69–127% higher than official figures. In other words, India’s progress on multidimensional poverty, though laudable, is being overstated. These data-driven classifications bypass the need for arbitrary cutoffs, offering a more empirically grounded basis for identifying deprivation. Multi-tier models based on this approach challenge the binary classification and reveal a nuanced spectrum of poverty, identifying a substantial “medium deprivation” population (32–38% of households) that remains invisible under binary poor/non-poor categorizations. Furthermore, conventional weighting schemes distort results in ways that undermine both accuracy and policy relevance. These findings point to a measurement crisis: millions of genuinely deprived households remain excluded from recognition and support due to a methodological orthodoxy that privileges theoretical consistency over empirical validity.
The evidence calls for a methodological transition toward objective, data-driven approaches and assumptions that better align with the realities of deprivation and the demands of evidence-based policymaking.

Keywords

Multidimensional Poverty Index, India, K-means clustering, Multiple Correspondence Analysis, machine learning, poverty measurement, sensitivity analysis, Alkire-Foster method, deprivation thresholds, policy targeting

1 Introduction

The multidimensional poverty index (MPI) has become a cornerstone of global poverty measurement, codified in the SDGs, and widely adopted for national monitoring. Built on a systematic axiomatic framework, it transforms how we understand and quantify deprivation (Alkire et al. 2014).

The architecture of India’s MPI is built upon 12 meticulously selected indicators spanning three critical dimensions: health, education, and standard of living. These indicators operate within an elegant weighting system where each dimension contributes precisely one-third to the overall index, creating a balanced yet purposeful measurement framework (Government of India 2023). Annex Table 3 defines the deprivations and their weights.¹

The weighting structure reveals strategic priorities: nutrition commands a substantial 1/6 weight, while child mortality and maternal health each receive 1/12, collectively ensuring health indicators capture their full one-third share. Education’s two indicators each carry 1/6 weight, demonstrating the framework’s commitment to human capital development. Remarkably, nutrition and education indicators together comprise 50% of the total weight, underscoring the methodology’s deliberate emphasis on human capital formation. In contrast, the seven standard of living indicators—encompassing sanitation, drinking water, housing quality, basic assets, electricity access, cooking fuel safety, and financial inclusion—each receive a more modest 1/21 weight. This distribution therefore creates a bias towards health and education which have fewer indicators.
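The weight arithmetic described above can be checked with exact fractions. This is a minimal sketch: the weights come from the text, but the indicator names (in particular the two education indicators, which the text does not name) are illustrative labels only.

```python
from fractions import Fraction

# Weights as stated in the text. Indicator names are illustrative labels,
# not official variable names.
weights = {
    "nutrition": Fraction(1, 6),
    "child_mortality": Fraction(1, 12),
    "maternal_health": Fraction(1, 12),
    "education_1": Fraction(1, 6),   # hypothetical label
    "education_2": Fraction(1, 6),   # hypothetical label
}
living_standard = [
    "sanitation", "drinking_water", "housing", "assets",
    "electricity", "cooking_fuel", "bank_account",
]
weights.update({name: Fraction(1, 21) for name in living_standard})

total = sum(weights.values())
health = weights["nutrition"] + weights["child_mortality"] + weights["maternal_health"]
human_capital = weights["nutrition"] + weights["education_1"] + weights["education_2"]
living = sum(weights[name] for name in living_standard)

print(total, health, human_capital, living)  # 1 1/3 1/2 1/3
```

Each standard-of-living indicator therefore carries 2/7 of the weight of nutrition (1/21 against 1/6), which is the asymmetry the paragraph describes.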

The methodology then employs a dual-cutoff technique that operates with mathematical precision. The first cutoff establishes binary deprivation classifications (0/1) for each indicator, while the second applies a critical threshold of k=0.33 (1/3) to the weighted deprivation score - calculated by applying indicator weights to the binary deprivation indicators. Households exceeding this threshold are classified as MPI deprived, purportedly creating a clear demarcation between the poor and non-poor. A schematic diagram of the process is shown in Figure 5 in the Annex.
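The dual-cutoff step can be sketched in a few lines. The toy weights and households below are invented for illustration and are not the NFHS indicators. The comparison uses `>=`, the usual Alkire-Foster convention, whereas the text says "exceeding", so how households exactly at the cutoff are treated is an assumption here.

```python
def deprivation_score(flags, weights):
    """Weighted sum of a household's 0/1 deprivation flags."""
    return sum(weights[name] * flags[name] for name in weights)


def alkire_foster(households, weights, k):
    """Return (H, A, MPI) at poverty cutoff k."""
    scores = [deprivation_score(h, weights) for h in households]
    # Assumption: a household exactly at the cutoff counts as poor (score >= k).
    poor = [s for s in scores if s >= k]
    if not poor:
        return 0.0, 0.0, 0.0
    headcount = len(poor) / len(scores)   # H: share of households identified as poor
    intensity = sum(poor) / len(poor)     # A: average score among the poor
    return headcount, intensity, headcount * intensity


# Toy weights and households, chosen only so the arithmetic is easy to follow.
toy_weights = {"nutrition": 0.5, "schooling": 0.25, "cooking_fuel": 0.25}
toy_households = [
    {"nutrition": 1, "schooling": 1, "cooking_fuel": 1},  # score 1.00
    {"nutrition": 1, "schooling": 0, "cooking_fuel": 0},  # score 0.50
    {"nutrition": 0, "schooling": 0, "cooking_fuel": 1},  # score 0.25
    {"nutrition": 0, "schooling": 0, "cooking_fuel": 0},  # score 0.00
]

H, A, MPI = alkire_foster(toy_households, toy_weights, k=1 / 3)
print(H, A, MPI)  # 0.5 0.75 0.375
```

The household with a score of 0.25 is deprived in one indicator but falls below the cutoff, so it contributes nothing to the index.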

Applying this framework to India’s National Family Health Survey 2019/21 (NFHS 2019/21)² yields an MPI of 6.2%, the product of a 14.4% headcount ratio and a 43.1% average deprivation score among the MPI poor. These results depict laudable improvements in the MPI compared to 2015/16. The striking variations across the urban-rural divide, geographical boundaries, and household characteristics (including religion, caste, disability status, female headship, and the presence of children under five) paint a vivid portrait of India’s heterogeneous poverty landscape.
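The relationship between the three reported figures can be written out explicitly. H (headcount ratio) and A (average intensity among the poor) follow the column labels of Table 2. The symbol M_0 for the index is a notational assumption borrowed from the usual Alkire-Foster convention.

```latex
\[
M_0 \;=\; H \times A \;=\; 0.144 \times 0.431 \;\approx\; 0.062
\]
```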

Yet, while conceptually rigorous, the method rests on normative assumptions—notably its fixed weighting and arbitrary deprivation threshold—that may no longer serve today’s richer databases and computational tools. In fact, given the raw data from NFHS (2019/21), these assumptions collectively distort the empirical outcomes observed in the data, as also observed in prior methodological critiques (Roche 2013; Whelan, Nolan, and Maitre 2014). Despite recommendations to examine the data carefully prior to the selection of the methodological parameters (Alkire et al. 2015), this is rarely done and without any defensible justification.

1.1 The Paradox of Assumptions and Solutions

The typical application of the orthodox approach suffers from being “methodologically locked-in” to predetermined weights, binary grouping, and the arbitrary k=1/3 cutoff threshold. This raises some pivotal questions:

Does this threshold create two genuinely distinct, comparable groups? Does it represent the data faithfully? How was it selected? For meaningful classification to occur, we need maximum between-group differences and minimum within-group differences. In other words, the two groups must be distinct, and within any group, households must be similar. Only then can meaningful comparisons be made between the groups. We demonstrate that the k=1/3 threshold may not correspond to the natural inflection point where the data organically separate into distinct clusters with optimal differentiation.
Furthermore, the predetermined weighting scheme creates systematic distortions in indicator contributions. Despite unsafe cooking fuel affecting a larger proportion of households, its minimal weight (1/21) significantly diminishes its contribution to the overall MPI. This pattern repeats across sanitation and other standard of living indicators, potentially masking critical deprivation patterns. Is there any empirical basis behind the weighting scheme being applied?
Finally, the conventional MPI framework presumes that poverty can be cleanly split into just two groups—“poor” and “non-poor”—based on a threshold. But why only two? Is this being driven by empirical data? This binary assumption may flatten a far more complex deprivation landscape, especially in large, diverse populations like India’s. By contrast, unsupervised machine learning algorithms such as K-means allow the data itself to determine the number of meaningful groups—maximizing separation and interpretability. This unlocks the possibility of identifying intermediate deprivation segments that binary models cannot capture, offering a more nuanced and actionable poverty typology.

To transcend these limitations, we propose an objective unsupervised machine learning framework that frees MPI computation from subjective weights and thresholds through four approaches (see Annex Figure 4). Similar methods have been applied in recent studies. For example, Dotter and Klasen (2017) suggested the use of latent deprivation indices to capture multidimensional poverty, while Muñetón-Santa and Manrique-Ruiz (2023) used spatial machine learning models to predict the MPI for Colombia. Machine learning clustering was also used in Kumar et al. (2023) and Rahman, Chen, and Li (2021) to derive efficient clusters for MPI calculations. Furthermore, unsupervised K-means techniques and latent multiple cluster analyses have already been applied in the Indian context to derive the under-five MPI with robust results (Bonnerjee 2025). These precedents support the methodological trajectory that our integrated K-means + MCA framework represents.

1.1.1 Model 1: Data-Driven Threshold Optimization

We relax the rigid k=1/3 assumption, employing unsupervised K-means clustering to identify natural separation points in deprivation scores. This approach seeks maximum between-cluster separation and minimum within-cluster variation, so the cutoff is determined by the data rather than imposed. The number of clusters is still fixed at 2, and the deprivation score provides the signal for the K-means algorithm to sort households into two groups: those that are multidimensionally deprived and those that are not. The pipeline for this model is shown in Figure 6 in the Annex.
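The idea of deriving a cutoff from a two-cluster solution can be sketched on toy data. The code below is a plain-Python Lloyd's algorithm for scalar values, used as a stand-in for whatever K-means implementation the paper relied on. The scores, the deterministic initialisation, and the function names are all assumptions made for illustration.

```python
def kmeans_1d(values, k=2, iterations=100):
    """Plain Lloyd's algorithm on scalar values; returns sorted centroids."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Recompute each centroid as the mean of its members.
        updated = [sum(g) / len(g) if g else centroids[j] for j, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def implied_cutoff(centroids):
    """In one dimension, two clusters meet at the midpoint of their centroids."""
    return (centroids[0] + centroids[1]) / 2


# Toy deprivation scores (illustrative only, not NFHS data).
scores = [0.0, 0.05, 0.1, 0.1, 0.15, 0.4, 0.45, 0.5, 0.55]
centroids = kmeans_1d(scores, k=2)
cutoff = implied_cutoff(centroids)
print(round(cutoff, 4))  # 0.2775
```

In one dimension the two-cluster boundary is a single number, which is why the result of Model 1 can be reported as a value of k and compared directly with 1/3.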

1.1.2 Model 2: Weight-Agnostic Latent Analysis

Moving beyond weighted linear combinations, we construct a continuous latent deprivation index using Multiple Correspondence Analysis (MCA) of all 12 binary indicators.³ MCA transforms binary indicators into continuous latent dimensions that capture underlying deprivation patterns. The predetermined indicator weights are not used in this model, nor is a linear combination used to construct the deprivation score. MCA works with 24 categories, one for each of the twelve indicators in its deprived and its non-deprived state. The result is a continuous latent field that captures complex co-deprivation patterns beyond simple additive relationships. K-means clustering is then applied to this latent space to classify households into two clusters denoting their multidimensional deprivation status. The pipeline for this model is illustrated in Figure 7 in the Annex.
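For readers unfamiliar with MCA, the standard indicator-matrix formulation is summarised below. The text does not state which MCA variant was used (indicator or Burt matrix, with or without inertia correction), so this is an assumption about the construction rather than a description of the author's implementation. Here n is the number of households and Z has one column per category, giving 24 columns for 12 binary indicators.

```latex
\begin{align*}
Z &\in \{0,1\}^{n \times 24}, \qquad \textstyle\sum_{j} Z_{ij} = 12 \ \text{for every household } i, \\
P &= \frac{1}{12\,n}\, Z, \qquad r = P\,\mathbf{1}, \qquad c = P^{\top}\mathbf{1}, \\
S &= D_r^{-1/2}\,\bigl(P - r\,c^{\top}\bigr)\,D_c^{-1/2} \;=\; U\,\Sigma\,V^{\top}, \\
F &= D_r^{-1/2}\,U\,\Sigma, \qquad \text{latent index for household } i:\ F_{i1}, \\
\mathrm{ctr}_{j} &= V_{j1}^{2}, \qquad \textstyle\sum_{j} \mathrm{ctr}_{j} = 1 .
\end{align*}
```

D_r and D_c are the diagonal matrices of row and column masses. Under this formulation no indicator weights enter the computation: the first column of F serves as the latent deprivation index, and the squared entries of the first right singular vector give each category's contribution to that dimension.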

1.1.3 Model 3: Multi-Tier Classification (Deprivation Score)

Recognizing that poverty exists on a continuum, we adopt a multi-cluster solution in which the optimal number of clusters is determined through a combination of elbow and silhouette methods applied to our K-means clustering algorithm. This yields three distinctly separable clusters based on their deprivation score: Low (L), Medium (M), and High (H) deprivation. The model extends the standard binary classification in Model 1 by using a data-driven approach to determine the number of deprivation groups but retains the use of the weighted deprivation score for clustering.
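The elbow and silhouette criteria can be illustrated on toy scalar scores. This sketch uses a plain-Python K-means and a direct silhouette computation. The data, the initialisation rule, and the helper names are assumptions for illustration, and the paper's actual selection procedure may differ in detail.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalars; returns (labels, centroids)."""
    ordered = sorted(values)
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def within_ss(values, labels, centroids):
    """Within-cluster sum of squares: the quantity plotted in an elbow chart."""
    return sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))


def mean_silhouette(values, labels):
    """Average silhouette width, using absolute distance between scores."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, (v, lab) in enumerate(zip(values, labels)):
        own = [abs(v - w) for j, (w, other) in enumerate(zip(values, labels))
               if other == lab and j != i]
        if not own:
            continue  # a singleton cluster contributes a silhouette of 0
        a = sum(own) / len(own)
        b = min(
            sum(abs(v - w) for w, l2 in zip(values, labels) if l2 == other)
            / labels.count(other)
            for other in clusters if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(values)


# Toy scores with three visible groups (illustrative only, not NFHS data).
scores = [0.0, 0.02, 0.05, 0.08, 0.1, 0.22, 0.25, 0.28, 0.3, 0.45, 0.5, 0.55]

wss = {k: within_ss(scores, *kmeans_1d(scores, k)) for k in range(1, 5)}
silhouette = {k: mean_silhouette(scores, kmeans_1d(scores, k)[0]) for k in range(2, 5)}
best_k = max(silhouette, key=silhouette.get)
print(best_k)  # 3 on this toy data
```

The within-cluster sum of squares always falls as clusters are added, so the elbow is read from where the decline slows. The silhouette gives a complementary criterion that penalises splitting a coherent group.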

1.1.4 Model 4: Multi-Tier Classification (Latent Index)

The most sophisticated approach combines latent deprivation analysis with multi-tier classification, applying the K-means multi-cluster framework to the MCA-derived latent deprivation index. The optimization process (using elbow and silhouette methods) reveals three distinct comparable clusters. This generates Low (L), Medium (M) and High (H) categories based on the latent deprivation index. As in model 2, this methodology is indicator weight-agnostic but extends it by relaxing the strict binary classification and determining the number of clusters through optimization.

Table 1: Comparative Framework: Traditional vs. Machine Learning Approaches to MPI Calculation

Model Comparison: MPI Variants
Model	Methodological Assumptions			Analytical Framework
Model	Threshold Assumption	Weighting Scheme	Number of Groups	Methodological Approach	Key Innovation
Base Model: Traditional MPI	Fixed k = 1/3	Predetermined weights	Two (Poor vs. Non-poor)	Theory-driven	Established benchmark
Model 1: K-means on Deprivation Score	Data-driven optimal cutoff	Predetermined weights	Two (High, Low: Optimally separated)	Data-driven clustering	Optimal binary separation
Model 2: K-means on Latent Index	Data-driven optimal cutoff	Weight-agnostic (MCA-derived)	Two (High, Low: Optimally separated)	Latent space + clustering	Complex co-deprivation patterns
Model 3: Three-Tier K-means on Deprivation Score	Data-driven optimal cutoffs	Predetermined weights	Three (High/Medium/Low: optimally determined)	Multi-tier + clustering	Poverty spectrum recognition
Model 4: Three-Tier K-means on Latent Index	Data-driven optimal cutoffs	Weight-agnostic (MCA-derived)	Three (High/Medium/Low: optimally determined)	Latent space + multi-tier	Comprehensive methodological freedom

This objective machine learning framework promises to unlock new insights into India’s poverty landscape, moving toward evidence-based, data-driven poverty measurement that truly reflects the complex realities of multidimensional deprivation.

2 Comparing results: Empirical evidence versus the Orthodox MPI

The empirical evidence when different MPI models are compared is nothing short of striking. Our machine learning approach doesn’t merely tweak the traditional MPI—it challenges its fundamental assumptions and exposes deep structural fault lines in measuring multidimensional deprivation in India, and potentially elsewhere.

Table 2: Comparison of results: Traditional versus Machine Learning Approaches

Comparison of Results: MPI Variants
Models Used	Cut-off thresholds	MPI Calculations			Clustering Efficiency
Models Used	Cut-off thresholds	MPI H*A	Intensity A	Headcount H	Total TSS	Within WSS	Between BSS	Explained BSS/TSS
Base Model: Traditional MPI	k = 0.33 (Fixed)	6.2	43.1	14.4	14336.16	6591.81	7744.35	54.02%
Model 1: K-means on Deprivation Score	k = 0.22	10.5	35.2	29.9	14336.16	4527.93	9808.23	68.42%
Model 2: K-means on Latent Index	k = 0.24	14.1	39.5	35.6	19569.09	5425.68	14143.41	72.27%
Model 3: Three-Tier K-means on Deprivation Score	k = 0.15, k = 0.37 (Low, High)	n.a	4.8/25.1/48.0 (Low/Med/High)	53.6/37.6/8.8 (Low/Med/High)	14336.16	2293.10	12043.06	84.00%
Model 4: Three-Tier K-means on Latent Index	k = 0.15, k = 0.37 (Low, High)	n.a	5.0/25.7/48.2 (Low/Med/High)	50.0/31.6/18.4 (Low/Med/High)	19569.09	2501.85	17067.24	87.22%
Source: Author's calculations.

Table 2 presents compelling evidence that raises serious questions about the assumptions underpinning the measurement of India’s poverty landscape. The findings challenge the status quo: they reveal a significant underestimation that could have far-reaching implications for millions of households.

2.1 The Great Threshold Uprising

When we liberate the cutoff threshold from its arbitrary k=0.33 prison, the data reveals its true voice. Our unsupervised K-means (2 cluster) algorithms, trained on actual household deprivation patterns, identify optimal separation points at k=0.22 and k=0.24—dramatically lower than the conventional threshold. This is empirical evidence screaming that the traditional approach may have been systematically excluding genuinely deprived households from poverty classification.

The orthodox threshold (k = 0.33) takes a toll on clustering efficiency, blunting its ability to meaningfully separate deprivation levels. To meaningfully distinguish deprivation groups, clustering must do more than simply assign households into bins—it must create groups that are substantively different from one another while maintaining internal coherence. This is where clustering efficiency becomes critical. A high ratio of between-group to total variance (BSS/TSS) ensures that the groups represent genuine structural divisions in deprivation patterns rather than arbitrary or superficial splits. Without this, we risk creating clusters that are not so different thereby masking the very disparities we aim to uncover. The goal, then, is not just classification—it’s separation with integrity. Efficient clustering is the statistical backbone that allows our multidimensional poverty classifications to be both accurate and actionable.

The efficiency gains delivered by these machine learning models are dramatic. While the traditional MPI grouping explains just over half the variation in deprivation scores (54%), the K-means approach on deprivation scores raises this to 68%, and the latent index model to 72%. By design, the K-means algorithm renders the groups distinct and hence comparable. Within each group, households are relatively homogeneous, but between groups there is a substantive difference.
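The efficiency measure is the share of total variance explained by the grouping, BSS/TSS with BSS = TSS - WSS. The sketch below defines the decomposition for any labelled set of scores and then recomputes the ratios from the sums of squares reported in Table 2. The function name and the four-value toy example are illustrative assumptions.

```python
def variance_decomposition(scores, labels):
    """Return (TSS, WSS, BSS) for scalar scores grouped by labels."""
    grand_mean = sum(scores) / len(scores)
    tss = sum((s - grand_mean) ** 2 for s in scores)
    wss = 0.0
    for lab in set(labels):
        members = [s for s, l in zip(scores, labels) if l == lab]
        centre = sum(members) / len(members)
        wss += sum((s - centre) ** 2 for s in members)
    return tss, wss, tss - wss


# Toy example: two tight groups give a ratio close to 1.
tss, wss, bss = variance_decomposition([0.0, 0.1, 0.4, 0.5], [0, 0, 1, 1])
toy_ratio = bss / tss

# (TSS, WSS) pairs copied from Table 2.
table2 = {
    "Base model (k = 0.33)": (14336.16, 6591.81),
    "Model 1": (14336.16, 4527.93),
    "Model 2": (19569.09, 5425.68),
    "Model 3": (14336.16, 2293.10),
    "Model 4": (19569.09, 2501.85),
}
explained = {name: round(100 * (t - w) / t, 2) for name, (t, w) in table2.items()}
for name, share in explained.items():
    print(name, share)
# Base model (k = 0.33) 54.02
# Model 1 68.42
# Model 2 72.27
# Model 3 84.0
# Model 4 87.22
```

Because the latent index lives on a different scale from the weighted score, its TSS differs. The ratio is what makes the models comparable.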

But the real breakthrough comes with the three-tier (optimal clustering) models. These models achieve 84–87% explained variance, more than 1.5 times the efficiency of the orthodox MPI. This is not a marginal improvement; it is a seismic leap in how well the three-tier classifications align with the underlying deprivation structure. The message is clear: when we let the data speak, we get sharper, smarter, and far more truthful representations of poverty.

2.2 The MPI Magnitude Shock

The consequences on the MPI are staggering. While the traditional MPI reports 6.2% multidimensional poverty, our data-driven approaches explode this estimate to 10.5% and 14.1%—increases of 69% and 127% respectively. These aren’t marginal adjustments; they represent millions of additional households whose deprivation has been rendered invisible by orthodox assumptions.

The headcount ratios tell an even more dramatic story. The traditional approach identifies just 14.4% of households as deprived, while our models capture 29.9% and 35.6%, more than doubling the recognized scope of multidimensional poverty. The K-means clustering on the deprivation score suggests that roughly one in six households (15.5%) escapes the orthodox MPI poor classification but is considered deprived when an objective data-driven cutoff is used. Similarly, the latent index clustering model shows that more than one in five households (21.2%) is deprived according to the latent index but not captured under the orthodox MPI poor category. This reveals the traditional threshold’s fundamental failure: it creates an artificial poverty ceiling that bears no relationship to empirical deprivation patterns.
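The percentages quoted in this section follow directly from the headcount and intensity columns of Table 2. The short check below reproduces them. The dictionary layout is only an illustrative way of holding the table values.

```python
# Headcount H and intensity A (in percent), copied from Table 2.
models = {
    "Base": {"H": 14.4, "A": 43.1},
    "Model 1": {"H": 29.9, "A": 35.2},
    "Model 2": {"H": 35.6, "A": 39.5},
}

# MPI = H x A, rounded to one decimal as in the table.
mpi = {name: round(m["H"] * m["A"] / 100, 1) for name, m in models.items()}

summary = {}
for name in ("Model 1", "Model 2"):
    relative_increase = round(100 * (mpi[name] / mpi["Base"] - 1))   # percent
    headcount_gap = round(models[name]["H"] - models["Base"]["H"], 1)  # points
    summary[name] = (mpi[name], relative_increase, headcount_gap)
    print(name, *summary[name])
# Model 1 10.5 69 15.5
# Model 2 14.1 127 21.2
```

The headcount gaps of 15.5 and 21.2 percentage points are the households classified as deprived by the data-driven cutoffs but not by the k = 1/3 rule.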

2.3 The Multi-Tier Solution: Beyond Binary Thinking

Our three-tier models disrupt the simplistic binary classification that has long constrained poverty analysis. Identifying a substantial “medium deprivation” population (32–38% of households) exposes the inadequacy of rigid poor/non-poor dichotomies. These households experience real deprivation yet are routinely overlooked—a blind spot that weakens precision in targeted interventions.

2.3.1 The Threshold Convergence Phenomenon

Remarkably, both three-tier models converge on identical cutoff points: k=0.15 for Low deprivation and k=0.37 for High deprivation. This convergence across fundamentally different measurement constructs—deprivation scores versus latent indices—provides compelling evidence that these thresholds reflect genuine structural breaks in the data rather than methodological artifacts.

2.3.2 The Hidden Poverty Spectrum

The three-tier results expose the inadequacy of binary poverty classification. The “Medium” deprivation group—comprising 31.6% to 37.6% of households—represents a massive population segment that traditional methodology either misclassifies or ignores entirely. These households face genuine deprivation (average intensity around 25-26%) but fall through the cracks of binary classification systems.

Most shocking of all: the “High” deprivation group, representing the most severely deprived households, constitutes 8.8% to 18.4% of the population with devastating average deprivation intensities approaching 48%. These households face deprivation levels that should trigger immediate policy intervention.

3 Examining the Root Faults

The empirical evidence reveals not one or two, but three critical faults at the heart of India’s MPI framework. Our systematic analysis uncovers a deeply constrained model — one that suffers from (1) extreme threshold sensitivity and clustering inefficiency, (2) weight-induced distortion, and (3) an unexamined assumption that deprivation can be cleanly split into just two categories: “poor” and “non-poor.”

This trifecta of rigid assumptions severely limits the ability of MPI to reflect the real-world complexity of deprivation. The consequence is a poverty measurement framework that is not just miscalibrated but systematically exclusionary — flattening nuanced realities into binary outcomes and blinding policy responses to entire segments of vulnerable households.

What follows is a detailed breakdown of each of these three fault lines — each one empirically demonstrated and collectively pointing to the urgent need for a data-driven, objective alternative.

3.1 The First Fault: Threshold Sensitivity Crisis

Figure 1 delivers a stunning revelation that fundamentally challenges the robustness of India’s MPI framework with respect to the selection of the cutoff threshold. Through systematic recalculation across varying deprivation cutoffs, we uncover that the methodological assumptions used in India’s MPI calculations do not account for extreme threshold sensitivity emerging from the data.

The visualization exposes a dramatic cliff-edge effect in MPI values as the deprivation cutoff (k) increases from 0 to 1. This isn’t merely statistical variation; it represents a fundamental methodological crisis where seemingly minor threshold adjustments trigger massive shifts in poverty estimates.

The graph reveals three distinct phases of MPI behavior, each with profound policy implications:

Phase 1: The Stable Plateau (k = 0.0 to 0.2). At low cutoff thresholds, the MPI declines slowly. As k increases from 0 to 0.2, the MPI declines from 0.17 to 0.12.

Phase 2: The Cliff Drop Zone (k = 0.20 to 0.36). This is the most volatile and sensitive region. A subtle increase in k beyond 0.20 triggers a dramatic inflection followed by a steep decline in MPI values, from ~0.12 down to ~0.04 by k = 0.36. The inflection point occurs just before 0.22, the ML-based threshold, and the curve plunges further past the conventional 0.33 threshold. The most alarming revelation centers on the conventional k = 0.33 threshold. Here, the curve exhibits maximum volatility, where very small changes in the cutoff value precipitate disproportionate swings in poverty estimates.

Phase 3: The Flattening Plateau (k > 0.36). Beyond k = 0.36, the curve flattens markedly. This plateau effect suggests that higher thresholds create artificial stability by excluding potentially deprived households.
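A sensitivity sweep of this kind recomputes the index over a grid of cutoffs. The sketch below does so on synthetic scores, so it will not reproduce the values in Figure 1. The data, the grid spacing, and the `>=` boundary rule are assumptions chosen to show the mechanics.

```python
def adjusted_headcount(scores, k):
    """MPI = H x A at cutoff k, i.e. the mean of scores censored below k."""
    return sum(s for s in scores if s >= k) / len(scores)


# Synthetic deprivation scores for 100 households (illustrative only).
scores = [0.0] * 30 + [0.1] * 25 + [0.25] * 20 + [0.3] * 10 + [0.4] * 10 + [0.6] * 5

grid = [i / 20 for i in range(21)]                 # k = 0.00, 0.05, ..., 1.00
curve = [adjusted_headcount(scores, k) for k in grid]

# Locate the grid step with the largest fall in the index.
steepest = max(range(1, len(grid)), key=lambda i: curve[i - 1] - curve[i])

for k, value in zip(grid, curve):
    print(f"k={k:.2f}  MPI={value:.3f}")
print("steepest drop between k =", grid[steepest - 1], "and k =", grid[steepest])
```

Because many households share the same weighted score, the curve falls in steps. A cutoff placed where a dense group of scores sits will move the index sharply for a small change in k, which is the instability the three phases describe.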

Figure 1: Sensitivity of MPI to Deprivation Cut-off

3.2 The Second Fault: Weight-Induced Distribution Distortion

But threshold sensitivity is only a part of the story. Figure 2 exposes an even more insidious methodological failure: how predetermined weights systematically distort the entire deprivation distribution, creating artificial scarcity in poverty measurement. This figure compares the cumulative distributions of the orthodox deprivation index which is derived by multiplying indicator deprivations by their weights, with a weight-agnostic, non-linear latent deprivation index derived by using Multiple Correspondence Analysis (MCA) on the 12 binary indicators.

The cumulative distribution comparison uncovers the second fault: the selection of weights for indicators. The blue line (latent index) consistently dominates the red line (weighted deprivation score) across the entire distribution, providing empirical evidence that the orthodox weighting scheme artificially suppresses deprivation recognition at every threshold level.

Figure 2: Sensitivity of Headcount to Deprivation Cut-off & Weights

The Distribution Divergence: The gap between these curves represents millions of households whose deprivation is rendered invisible by theoretical weight preferences. At every possible threshold choice, the weight-agnostic latent index identifies substantially more deprived households than the predetermined weighting system. This reveals that the methodological crisis runs deeper than threshold selection—the very foundation of weighted aggregation systematically undercounts multidimensional poverty.

The Systematic Undercounting Pattern: The horizontal reference lines (dark for orthodox, orange for K-means, and blue for MCA) expose the magnitude of this distortion. Where the traditional approach identifies about 14% of households as deprived at the given threshold of k=0.33 (red dot), the K-means and latent index models put the figure between 30% and 36% (orange and blue dots respectively). This gap of roughly 16 to 22 percentage points represents approximately 60 million households whose deprivation ‘disappears’ under orthodox methodology, not due to genuine improvements in their living conditions, but due to methodological artifacts.

The Weight Preference Distortion

Figure 3 displays how each binary indicator—both deprived (0) and non-deprived (1) states—contributes to the first dimension of the Multiple Correspondence Analysis (MCA), offering a data-driven picture of which deprivations dominate the latent structure of poverty which is not restricted by preassigned weights for indicators.

Figure 3: Contributions to Latent Deprivation Index (MCA)

The evidence is unambiguous. While the orthodox assumptions assign the lack of basic assets, inadequate housing, unsafe cooking fuel, and poor sanitation a weight of just 1/21 each, the data tell a different story: these emerge as the dominant contributors to the latent deprivation dimension, together accounting for over 40% of the total contribution. Meanwhile, nutrition, despite its privileged 1/6 theoretical weight, contributes proportionally less to the actual deprivation patterns households experience. This is not statistical noise; it is empirical data clashing with theoretical assumptions.

This visualization exposes a fundamental flaw at the heart of orthodox MPI methodological assumptions: the systematic suppression of standard-of-living deprivations. The seven standard-of-living indicators, collectively assigned just 1/3 of total weight, actually drive the primary axis of variation in India’s deprivation landscape.

The mirror-image structure of the contributions—where both presence and absence of each deprivation contribute to the latent dimension—reveals something profound: MCA identifies a genuine underlying deprivation spectrum that predetermined weights obliterate. This isn’t a matter of methodological preference—it’s empirical discovery of how deprivations actually cluster and co-occur in Indian households.

This issue takes on profound significance once we recognize the policy implications. Every time a household suffering from unsafe cooking fuel, poor housing, and inadequate sanitation is excluded from poverty classification because these deprivations carry artificially low weights, we witness a methodology that shows fealty to academic tradition rather than any attempt to describe the reality of millions of households.

3.3 The Third Fault: The Binary Classification Fallacy

While much of the MPI framework hinges on identifying a “poor” population based on a single deprivation cutoff, this rests on an untested assumption — that deprivation exists as a binary state. But our empirical evidence shows that this is not how deprivation clusters in real households.

Using unsupervised machine learning — specifically, K-means clustering on both the weighted deprivation score and latent MCA dimension — we identify not two, but three naturally occurring clusters of deprivation. These groupings do not align neatly with the orthodox poor/non-poor divide.

Most striking is the emergence of a substantial “medium deprivation” population, comprising 32–38% of all households. These households are genuinely deprived across several dimensions but fall short of the k = 1/3 threshold. As a result, they are excluded from poverty recognition, targeting, and policy intervention — not because their lives are better, but because the binary model has no place to put them.

This structural limitation is not just a theoretical inconvenience. It translates into millions of misclassified households — people who are deprived but invisible to the system. The MPI, in its current form, lacks the granularity to differentiate between deep, moderate, and marginal deprivation — thereby rendering large parts of the deprivation spectrum statistically irrelevant and politically invisible.

The three-cluster model — validated across both the weighted and latent deprivation measures — restores this lost nuance and provides a more realistic and actionable poverty typology.

3.4 The Compounding of Errors: When Three Faults Merge

The true problem emerges when we recognize that these are not separate problems: they compound each other catastrophically. The orthodox MPI, as applied in India, suffers from:

Threshold arbitrariness that places the cutoff at the point of maximum instability, resulting in groupings that are least comparable and analytically fragile.
Weight-induced distortion that systematically suppresses deprivation recognition across the entire distribution.
Binary grouping that makes millions of deprived households invisible.

Together, these failures create a perfect storm of inadequacy. Not only does the traditional approach choose the wrong threshold—it operates on a fundamentally distorted deprivation distribution that artificially deflates poverty estimates at every possible threshold choice.

The convergence of evidence is overwhelming: The sensitivity analysis reveals k=1/3 sits at the point of maximum instability. The CDF comparison shows systematic undercounting across all possible thresholds: the latent index consistently identifies 25-35% more deprived households than weighted approaches; the multi-cluster models suggest a substantial population of those who are not counted in the binary framework.

The triple fault lines expose why incremental MPI refinements cannot succeed. You cannot fix a methodology that suffers from both systematic distributional distortion and maximum threshold instability through minor adjustments—especially when institutional inertia actively resists necessary changes.

Our machine learning alternatives don’t just address one or the other of these failures; they address all of them simultaneously:

K-means clustering identifies natural threshold points that maximize between-group differences rather than imposing arbitrary cutoffs.
Multiple Correspondence Analysis constructs latent deprivation indices that reflect empirical patterns rather than theoretical weight preferences.
Cluster optimization models identify the number of comparable clusters that can be meaningfully drawn from the data.

The data-driven revolution becomes not just preferable but absolutely essential. The orthodox approach, as subjectively applied, doesn’t suffer from minor technical limitations; it exhibits fundamental fault lines that compound across multiple dimensions simultaneously. The alternative offers genuine international comparability: measurement frameworks that adapt to each country’s empirical deprivation patterns while maintaining consistent methodological principles, grounded in the original axiomatic principles of the MPI but freed from ‘convenient’ or ‘subjective’ assumptions. This approach achieves both validity and comparability; the orthodox framework delivers neither.

4 Conclusion: The Methodological Revolution Begins Now

The evidence presented in this paper is unambiguous and demands immediate action. India’s multidimensional poverty measurement—the foundation upon which billions of dollars in policy interventions and international comparisons rest—suffers from fundamental flaws in assumptions that systematically distort our understanding of deprivation. This is a moral imperative, not just a statistical one.

4.1 The Orthodox Assumptions Have Failed

Our sensitivity analysis delivers a telling blow to methodological complacency. The conventional k=1/3 threshold doesn’t represent careful calibration—it sits precariously at the point of maximum instability, where infinitesimal changes trigger seismic shifts in poverty estimates. This isn’t measurement; it’s methodological roulette with millions of lives hanging in the balance.

The predetermined weighting scheme compounds this failure by imposing theoretical preferences onto empirical realities. When unsafe cooking fuel affects more households than child mortality but receives barely more than half the weight (1/21 compared to 1/12), we witness methodology divorced from evidence: a system that serves academic tradition rather than poverty reduction.

4.2 The Data-Driven Methodology Succeeds

Our machine learning alternatives don’t merely offer incremental improvements; they represent a fundamental paradigm shift toward measurement frameworks that respect empirical reality. When unsupervised algorithms consistently identify optimal thresholds at k=0.22–0.24, we witness data speaking truth to methodological power.

The results are staggering: poverty estimates increase by 69–127%, revealing millions of households whose deprivation has been rendered invisible by orthodox methodology. This isn’t a statistical aberration; it is systematic undercounting with profound policy consequences.

Our three-tier models extend the simplistic binary classification that has constrained poverty analysis for decades. The identification of a substantial “medium deprivation” population, representing 32–38% of households, exposes the inadequacy of rigid poor/non-poor dichotomies. These households face genuine deprivation but fall through the cracks of conventional measurement, a methodological blind spot that undermines targeted intervention strategies.
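To make the three-tier idea concrete, the following Python sketch assigns households to tiers from two cluster boundaries and tabulates the share in each tier. Everything specific in it is an assumption made for illustration: the tier labels (`"low"`, `"medium"`, `"severe"`), the helper names, the two cutoffs and the toy scores are hypothetical and are not the paper's estimates.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical tier labels, ordered from least to most deprived.
TIERS = ("low", "medium", "severe")


def assign_tier(score, cutoffs):
    """Map a deprivation score to a tier using ascending boundaries.

    A score equal to a boundary falls into the higher tier, mirroring
    the 'score >= k' convention of the binary classification.
    """
    return TIERS[bisect_right(cutoffs, score)]


def tier_shares(scores, cutoffs):
    """Share of households in each tier."""
    counts = Counter(assign_tier(s, cutoffs) for s in scores)
    return {tier: counts[tier] / len(scores) for tier in TIERS}


# Toy inputs, invented for illustration only.
toy_scores = [0.0, 0.1, 0.2, 0.3, 0.4]
toy_cutoffs = [0.15, 0.35]  # two boundaries give three tiers

shares = tier_shares(toy_scores, toy_cutoffs)
print(shares)
```

A binary rule with a single cutoff at 0.35 would label the two middle households in this toy example as non-poor, whereas the three-tier rule keeps them visible as a separate group.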

4.3 The Policy Imperative Is Clear

The implications transcend academic methodology. If our findings are correct—and the convergent evidence strongly suggests they are—then India’s poverty reduction achievements may be significantly overstated, while millions of genuinely deprived households remain excluded from recognition and support.

Policymakers can no longer afford to base financial interventions on measurement frameworks that prioritize theoretical elegance over empirical accuracy. The three-tier classification offers revolutionary precision: distinct deprivation segments requiring differentiated policy responses, from preventive measures for medium-deprivation households to intensive support for the severely deprived.

4.4 Removing the Final Barrier: Institutional Inertia

If the evidence is this clear, why do flawed assumptions persist? In fact, the global MPI community has long recognized that fixed thresholds and weights should be tested for sensitivity. Leading practitioners, including Alkire and colleagues, have repeatedly recommended exploring alternative k values and weighting schemes (Alkire et al. 2015). Yet in practice, these recommendations are rarely implemented. The reason lies not in technical constraints, but in a deeper institutional inertia.

Two arguments are commonly invoked to preserve the status quo—yet both collapse under scrutiny.

The Computational Convenience Argument suggests that testing alternative assumptions is too technically demanding. But with today’s open-source tools and efficient pipelines, as demonstrated in this study using large-scale NFHS data, such analyses can be performed in under an hour. The real challenge is not computational capacity, but the willingness to depart from legacy systems (Bourguignon and Chakravarty 2003).

The International Comparability Myth claims that fixed thresholds are necessary to compare across countries. But comparability without validity is meaningless. Forcing diverse deprivation realities into identical methodological straitjackets ensures not consistency but consistent mismeasurement. True comparability requires that each country’s MPI reflects its own empirical deprivation patterns, not assumptions imported for convenience. Otherwise, what gets compared across countries is errors and mismeasurements (Ravallion 2011).

Until these institutional constraints are addressed, no amount of statistical refinement will resolve the deeper problem. This is not just a technical issue—it’s a question of political and organizational will.

4.5 The International Development Community Must Act

This ‘assumptions’ crisis extends far beyond India’s borders. If the world’s largest democracy suffers from systematic poverty measurement distortions, what confidence can we place in international comparisons, SDG monitoring, or global development finance allocation? The k=1/3 threshold infects poverty measurement worldwide, creating a global measurement scandal that demands immediate attention.

4.6 The Path Forward

The solution is neither complex nor distant—it requires embracing data-driven assumptions that already exist and perform demonstrably better than orthodox alternatives. Machine learning approaches offer objective, evidence-based measurement that adapts to empirical realities rather than forcing reality into predetermined theoretical assumptions.

We call upon:

National statistical offices to abandon arbitrary thresholds in favor of data-driven optimal cutoffs
International development organizations to reassess poverty measurement frameworks that may systematically undercount deprivation
Policy researchers to embrace machine learning approaches that reveal rather than obscure poverty patterns
Development practitioners to demand measurement systems that serve intervention effectiveness rather than methodological tradition

4.7 The Revolution Begins

The age of methodologically locked-in poverty measurement must end. The evidence for change is overwhelming, the alternatives are available, and the stakes could not be higher. Every day we persist with flawed measurement frameworks, we perpetuate a system that renders millions of deprived households invisible to policy intervention.

The data has spoken. The methodology must follow. The revolution in poverty measurement begins now—not with incremental refinement of failed approaches, but with fundamental transformation toward evidence-based, objective, data-driven frameworks that serve human dignity rather than academic convenience.

The choice is stark: continue with methodological orthodoxy that systematically distorts poverty measurement, or embrace empirical revolution that reveals poverty’s true face. For the millions of households whose deprivation remains hidden by arbitrary thresholds and predetermined weights, this choice will determine whether they remain invisible or finally receive the recognition and support they desperately need. The methodological revolution has begun. The only question is whether the policy community will lead it or be swept aside by it.

5 Annex

5.1 Models used

%%{init: {'theme':'base', 'themeVariables': { 'fontSize': '14px'}}}%%
flowchart TD
    A[MODELS<br/>USED]
    
    A --> B1[Binary Models<br/>2 Groups]
    A --> B2[Multi-cluster<br>Models]
    
    B1 --> C1[Base Model<br/>Orthodox MPI]
    
    C1 --> D1[Model 1<br/>K-Means]
    C1 --> D2[Model 2<br/>MCA + K-Means]
    
    B2 --> C2[Model 3<br/>K-Means]
    B2 --> C3[Model 4<br/>MCA + K-Means]
    
    classDef default fill:#e1f5fe,stroke:#01579b,stroke-width:2px,color:#000

Figure 4: Models used for MPI Classifications

5.2 Schematic representation of models

Figure 5: Pipeline for standard MPI calculations

5.3 Summary of deprivation indicators used

The data for the analyses are derived from the publicly available NFHS (2019–21) data files. An unruly caravan of indicators is used in the construction of India’s MPI, and together they create a web of deprivations for households. Notably, maternal health and financial inclusion were added to the indicator list for the most recent MPI calculations in India, enriching the indicator list but also necessitating a recalibration of indicator weights.⁴ The indicators used, their variable names, definitions and predetermined weights are shown in Table 3. The sample consisted of 636,699 households with valid and consistent data across these indicators.

Table 3: Indicators, definitions and weights

Dimension	Indicator	Definition	Indicator Weight
Health	hh_nut_d	Any child is stunted or underweight, or any adult (male or female) has a low BMI	1/6
Health	hh_cmort_d	Household has had the death of a child (under 18) in the 5 years preceding the survey	1/12
Health	hh_mh_d	Any woman in the household who gave birth in the last 5 years did not have at least 4 ANC visits or did not receive assistance from trained/skilled medical personnel during her most recent childbirth	1/12
Education	hh_yos_d	Not even one member of the household age 10 or older has completed 6 years of schooling	1/6
Education	hh_satt_d	Any school-aged child is not attending school up to the age at which he or she would complete class 8	1/6
Standard of Living	hh_san_d	Household has unimproved or no sanitation facilities, or has improved facilities that are shared	1/21
Standard of Living	hh_h20_d	Household has no access to improved drinking water sources or safe drinking water is at least 30 minutes walk (round-trip)	1/21
Standard of Living	hh_cf_d	Household uses dung, agricultural crops, shrubs, wood, charcoal, coal or kerosene for cooking	1/21
Standard of Living	hh_elec_d	Household has no electricity	1/21
Standard of Living	hh_house_d	Household has inadequate housing: floor is made of natural materials or the roof and wall are made from rudimentary materials	1/21
Standard of Living	hh_asset_d	The household does not own more than 1 of the following assets: radio, TV, telephone, computer, animal cart, bicycle, motorbike, or refrigerator, and does not own a car or truck	1/21
Standard of Living	hh_bacct_d	No household member has a bank or post office account	1/21
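The weights in Table 3 can be checked and applied with a few lines of Python. The sketch below is illustrative rather than the author's code: the variable names and weights are copied from Table 3, while the helper names (`deprivation_score`, `is_poor`) and the example household are hypothetical. It confirms that the twelve weights sum to one and shows how a single household can be classified differently under the conventional cutoff and under a cutoff of 0.24, the upper end of the data-driven range reported in this paper.

```python
from fractions import Fraction

# Indicator weights as listed in Table 3 (variable name -> weight).
WEIGHTS = {
    "hh_nut_d": Fraction(1, 6),
    "hh_cmort_d": Fraction(1, 12),
    "hh_mh_d": Fraction(1, 12),
    "hh_yos_d": Fraction(1, 6),
    "hh_satt_d": Fraction(1, 6),
    "hh_san_d": Fraction(1, 21),
    "hh_h20_d": Fraction(1, 21),
    "hh_cf_d": Fraction(1, 21),
    "hh_elec_d": Fraction(1, 21),
    "hh_house_d": Fraction(1, 21),
    "hh_asset_d": Fraction(1, 21),
    "hh_bacct_d": Fraction(1, 21),
}


def deprivation_score(household):
    """Weighted sum of the 0/1 deprivation flags for one household."""
    return sum(WEIGHTS[name] * int(flag) for name, flag in household.items())


def is_poor(household, k=Fraction(1, 3)):
    """Identify a household as poor when its score is at least k."""
    return deprivation_score(household) >= k


# Hypothetical household: deprived in nutrition, cooking fuel and
# sanitation, and in nothing else.
example = {name: 0 for name in WEIGHTS}
example.update({"hh_nut_d": 1, "hh_cf_d": 1, "hh_san_d": 1})

total_weight = sum(WEIGHTS.values())
example_score = deprivation_score(example)

print("sum of weights:", total_weight)
print("example score:", example_score, "=", round(float(example_score), 3))
print("poor at k=1/3: ", is_poor(example))
print("poor at k=0.24:", is_poor(example, Fraction(24, 100)))
```

Using `Fraction` keeps the arithmetic exact, so a score that lands precisely on a cutoff is not misclassified through floating-point rounding.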

6 Disclosure and Ethical Statement

This research was conducted without any financial support from public or private institutions. The author declares no conflicts of interest.

The data used in this study were publicly available and obtained with permission from the Demographic and Health Surveys (DHS) Program. All datasets are anonymized and ethically cleared by the data providers. No additional ethical approval was required.

All analysis was conducted using open-source and freely available software, including R, Python, and associated libraries (e.g., polars, ggplot2, plotly, FactoMineR). Reproducible code and workflows were developed using Quarto and are available upon request.

References

Alkire, Sabina, Mihika Chatterjee, Adriana Conconi, Suman Seth, and Ana Vaz. 2014. “Global Multidimensional Poverty Index 2014.” OPHI Report. Oxford Poverty and Human Development Initiative (OPHI). https://doi.org/10.35648/20.500.12413/11781/ii039.

Alkire, Sabina, James Foster, Suman Seth, Maria Emma Santos, Jose Manuel Roche, and Paola Ballon. 2015. Multidimensional Poverty Measurement and Analysis. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199689491.001.0001.

Bonnerjee, A. 2025. “Unfolding Under-Five Multidimensional Poverty in India: A Machine Learning Approach.” SSRN. https://doi.org/10.2139/ssrn.5149987.

Bourguignon, François, and Satya R Chakravarty. 2003. “The Measurement of Multidimensional Poverty.” Journal of Economic Inequality 1 (1): 25–49. https://doi.org/10.1023/A:1023913831342.

Dotter, Caroline, and Stephan Klasen. 2017. “The Multidimensional Poverty Index: Achievements, Conceptual and Empirical Issues.” OPHI Working Paper 112. Oxford Poverty and Human Development Initiative (OPHI). https://www.econstor.eu/bitstream/10419/162856/1/893991872.pdf.

Government of India. 2023. “National Multidimensional Poverty Index: A Progress Review 2023.” NITI Aayog. https://www.niti.gov.in/sites/default/files/2023-07/National_MPI_2023_Report_0.pdf.

Kumar, Sanjay et al. 2023. “Multidimensional Poverty: CMPI Development, Spatial Analysis and Clustering.” Social Indicators Research 167: 1–24. https://doi.org/10.1007/s11205-023-03181-y.

Muñetón-Santa, Guberney, and Luis Carlos Manrique-Ruiz. 2023. “Predicting Multidimensional Poverty with Machine Learning Algorithms: An Open Data Source Approach Using Spatial Data.” Social Sciences 12 (5): 296. https://www.mdpi.com/2076-0760/12/5/296.

Rahman, Abdul, Wei Chen, and Xia Li. 2021. “A Clustering Approach to Identify Multidimensional Poverty Indicators for the Bottom 40 Percent Group.” PLOS ONE 16: e0255312. https://doi.org/10.1371/journal.pone.0255312.

Ravallion, Martin. 2011. “On Multidimensional Indices of Poverty.” The Journal of Economic Inequality 9 (2): 235–48. https://doi.org/10.1007/s10888-011-9173-4.

Roche, J. M. 2013. “Monitoring Multidimensional Poverty in India: Insights from Recent Household Surveys.” Social Indicators Research 112 (2): 417–46.

Whelan, Christopher T., Brian Nolan, and Bertrand Maitre. 2014. “Multidimensional Poverty Measurement in Europe: An Application of the Adjusted Headcount Approach.” Journal of European Social Policy 24 (2): 183–97. https://journals.sagepub.com/doi/10.1177/0958928713517914.

Footnotes

1. Two new indicators were used in the latest MPI calculations for India (2023). Financial inclusion was proxied by any family member having a bank account (with a weight of 1/21), and maternal health was proxied by either of two lacks: not having at least 4 ANC visits, or not having a skilled birth attendant present during delivery.
2. The data were available for free upon registration at the DHS Program website.
3. The technique is analogous to Principal Components Analysis (PCA), which is appropriate for continuous data. For binary data, Multiple Correspondence Analysis (MCA) is the preferred option.
4. The data for the MPI can all be derived from the household recode file, with the exception of the indicator for maternal health deprivations, which can be picked up from the individual women’s recode file. The data extraction and analyses were done in Python and R using optimized libraries to handle large data frames.