Urbanisation and Assam Floods
Abstract
This dashboard presents the results of the MSc dissertation project submitted by Ms. Shruti Pareek in partial fulfillment of the requirements for the degree of M.Sc. Economics, TERI School of Advanced Studies, New Delhi, India.
The dashboard presents the visualizations essential for analyzing and understanding the flood risk dynamics of Assam and the impact of urbanization over the last 24 years. The study focuses on how proxy indicators such as electricity consumption, urban road construction, drainage expansion, and land use change influence flood frequency. The objective is to quantify these relationships and generate explainable, policy-relevant insights through a data-driven modelling framework supported by Explainable Artificial Intelligence (XAI).
The dissertation revolves around the core research question, “How does urbanization impact Assam floods?”, and addresses it in three segments:
- How do urbanization and its proxy variables impact floods in Assam?
- How can XAI techniques be effectively used to build an understanding of the decision-making process of complex statistical and AI models?
- What key insights can policymakers derive for data-driven policy enhancements?
The study begins with the development of a blend of statistical and machine learning models, including ARIMA, ARIMAX, Generalized Linear Models (GLMs), Generalized Additive Models (GAMs), and Random Forests, to model flood counts based on proxy indicators of urbanisation. Model performance is assessed using accuracy metrics, and interpretability is achieved through XAI tools such as SHAP summary plots, SHAP-over-time heatmaps, rule extraction, and waterfall additive breakdowns.
Findings reveal that increased electricity use and built-up land consistently elevate flood risk, while drainage infrastructure and GDP exhibit more complex or model-specific roles. A supplementary primary survey of 70 Guwahati residents further supports these conclusions, capturing local risk perception, preparedness behaviors, and social vulnerability indices. The triangulation of model predictions and public perception highlights systemic gaps in institutional preparedness and urban planning.
The research culminates in a set of data-informed policy recommendations, advocating for better drainage maintenance, community-level preparedness, and the establishment of a centralised, high-resolution data infrastructure. These findings aim to guide climate-resilient flood management strategies in Assam and similar urbanising regions.
Dashboard Structure
- 📋 Primary Data Analysis – maps, plots and indices from the field survey
- 📊 EDA – trend, density & correlation analysis
- 📈 Modelling – GLM, GAM, ARIMA/X & RF results
- 🧠 XAI – SHAP summaries, heat-maps, waterfalls
To collect primary data, a questionnaire was designed to capture detailed information across four key thematic areas: (i) flood risk awareness and threat perception, (ii) historical flood experience, (iii) preparedness behaviors and adaptive capacity, and (iv) trust in government and community responses.
Demographic Profile
Interactive Map showing spatial distribution of respondents
The geographical spread of respondents ensures that the data captures perspectives from both core urban areas and peripheral zones, thereby covering the spatial diversity in lived flood experience and infrastructure access.
Flood Risk Awareness
To evaluate how residents interpret and respond to the risks of urban flooding, a composite Flood Risk Awareness and Threat Perception Index (FRATPI) was constructed using six survey questions that captured self-reported exposure, likelihood, severity, and preparedness. Responses were numerically encoded and standardized using z-scores to ensure comparability across components. The preparedness score was reverse-coded to align directionally with risk perception—so that lower preparedness increases the final index value.
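The index construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the dissertation's actual code: the item names, the toy answers, the use of the population standard deviation, and the simple averaging of the standardized components are all assumptions, since the text does not state how the components were aggregated.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numeric answers to mean 0 and unit spread."""
    m = mean(values)
    s = pstdev(values)  # assumption: population standard deviation
    if s == 0:
        return [0.0 for _ in values]  # a constant item carries no information
    return [(v - m) / s for v in values]


def composite_index(responses, reverse_coded=()):
    """Average the z-scored items per respondent.

    responses: dict mapping item name -> list of answers (one per respondent).
    reverse_coded: item names whose sign is flipped, so that a low raw score
    (for example low preparedness) raises the index.
    """
    n_respondents = len(next(iter(responses.values())))
    totals = [0.0] * n_respondents
    for item, values in responses.items():
        sign = -1.0 if item in reverse_coded else 1.0
        for i, z in enumerate(zscores(values)):
            totals[i] += sign * z
    return [t / len(responses) for t in totals]


# Hypothetical answers from three respondents on three of the survey items.
toy_responses = {
    "exposure": [1, 2, 3],
    "severity": [2, 2, 5],
    "preparedness": [5, 3, 1],
}
fratpi = composite_index(toy_responses, reverse_coded=("preparedness",))
```

In this toy data the third respondent reports the highest exposure and severity and the lowest preparedness, so the reverse coding places that respondent at the top of the index.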
Interactive Map showing spatial distribution of FRATPI Scores
Spatial patterns of FRATPI reveal that respondents from areas historically affected by drainage congestion and water-logging report notably higher average index values. These neighbourhoods—often located in low-lying central parts of Guwahati—exhibit greater concern about future flood events, suggesting that local experience and lived exposure play a key role in shaping public risk awareness.
Historical Experience & Preparedness
Personal exposure to urban flooding is closely linked to higher perceived risk. Figure i shows that respondents who reported direct flood experience consistently recorded higher FRATPI scores than those who had not, highlighting how lived events intensify awareness and amplify perceived vulnerability. This reinforces the behavioural hypothesis that threat perception is shaped not just by abstract knowledge but also by the memory of actual disruption a
However, this heightened awareness does not consistently translate into comprehensive preparedness. As seen in Figure ii, while a majority of respondents report having basic emergency supplies such as flashlights and first aid kits, nearly one-third lack these essentials—pointing to substantial gaps in household-level readiness. Similarly, only 31 out of 70 respondents expressed willingness to purchase flood insurance (Figure iii), suggesting limited uptake of formal financial risk-transfer tools despite rising flood anxieties. Perhaps most concerning is the evident deficit in community-level planning: fewer than one-third of respondents knew the location of their nearest flood shelter (Figure iv), exposing critical blind spots in emergency awareness. Additionally, although direct flood experience strengthens individual threat perception, the adoption of protective behaviours—whether material, financial, or informational—remains inconsistent. This asymmetry suggests that awareness alone does not guarantee preparedness and underscores the need for coordinated risk communication and institutional outreach.
Interactive Map showing spatial distribution of Vulnerability Index
To complement perceptual data with a synthetic measure of household vulnerability, a Social Vulnerability Index (SoVI) was constructed following Cutter et al. (2003). This index combines multiple dimensions—flood exposure (e.g., past incidents, property damage), preparedness (e.g., supplies, shelter awareness), and socioeconomic status (e.g., income brackets)—each standardized using z-scores. The resulting composite index reveals spatial clusters of heightened vulnerability concentrated in Guwahati’s central and low-lying zones (Figure 27). These hotspots overlap with both areas of high FRATPI and secondary modelled risk, affirming the convergence of perceived, experienced, and structural vulnerability in the city.
Government Trust
Institutional trust is a key determinant of community resilience and response to urban flood risk. To assess this, respondents were asked to rate their trust in (i) government agencies, (ii) local community leaders, and (iii) existing flood infrastructure on a scale of 1 (low trust) to 5 (high trust).
The distribution of responses (Figure iii) indicates a general leaning toward moderate or low levels of trust, with relatively few respondents expressing high confidence in institutional capacity. This pattern of scepticism is further substantiated by Figure ii, which shows that 50 out of 70 respondents reported receiving no support from local leaders or organizations during past flood events, highlighting a perceived absence of localized assistance.
In open-ended responses regarding how authorities could improve flood management, most participants pointed to drainage enhancement, better solid waste management, and more robust infrastructure planning. These suggestions, shown in Figure i, reveal a clear public consensus on the need for basic service delivery reforms, particularly around water drainage and better environment management.
This section presents insights into the dataset’s structure and underlying patterns. EDA is instrumental in understanding underlying dependencies, identifying trends, and flagging potential anomalies that could influence model performance.
Based on the analysis below, the following insights pave the way for the modelling stage:
From the correlation figures, it is clear that Electricity, Drainage and GDP are likely the strongest predictors of flood counts.
The mean and variance of flood counts (mean = 171, variance = 36023) indicate that there is over-dispersion (variance > mean). This implies that the regression modelling approaches must consider a Negative Binomial Family to model the flood count variable.
From the sample density plots (in the Pairs plot), variables such as Flood, Electricity, GDP and Registered Companies must be normalized or log-transformed given their strong right skewness.
The problem of multicollinearity must be dealt with before any models are created, to avoid inflated standard errors and to ensure that coefficients are stable and interpretable.
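The over-dispersion check mentioned above is simple arithmetic, and a short Python sketch may make it concrete. The function names are illustrative, and the method-of-moments estimate of the Negative Binomial size parameter is only a crude marginal figure under the assumption Var = mean + mean²/θ; it is not a fitted value from any of the models.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson-like data."""
    return variance(counts) / mean(counts)


def moment_theta(mu, var):
    """Method-of-moments size parameter implied by Var = mu + mu**2 / theta."""
    if var <= mu:
        raise ValueError("no over-dispersion: the variance must exceed the mean")
    return mu ** 2 / (var - mu)


# Summary statistics reported for the annual flood counts.
reported_ratio = 36023 / 171            # about 210.7, far above 1
rough_theta = moment_theta(171, 36023)  # crude marginal estimate only
```

A ratio this far above 1 is what rules out a plain Poisson model, in which the variance would equal the mean.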
Correlation Matrix
The correlation visualization aims to evaluate the correlation between the variables and represent the matrix as a heatmap. Focusing on these correlation values helps in defining the pathway for the modelling stage.
Based on the above figures, urbanization proxies (like Electricity, GDP & non-agricultural land) show strong correlation (>0.80) with floods, supporting the hypothesis that urban expansion is linked with increased flood counts. The lower correlation with registered companies (0.60) suggests that formal business growth alone may not directly drive floods; its impact may take years to emerge.
Variables like ‘Domestic Electricity Consumption’, ‘State GDP’ and ‘Drainage Channels’ have a strong positive correlation (>0.8) with flood count. However, the correlation heatmap shows that these variables are also highly inter-correlated (>0.9). This highlights the problem of multicollinearity, which must be dealt with before the modelling stage.
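To give a sense of how severe an inter-correlation above 0.9 is, the sketch below computes the variance inflation factor (VIF) for the special case of two predictors, where VIF = 1 / (1 - r²). This is a simplification chosen for illustration: the general VIF uses the R² from regressing each predictor on all the others, and the toy series here are made up.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(r):
    """VIF when a predictor is explained by exactly one other predictor."""
    if abs(r) >= 1:
        raise ValueError("perfectly collinear predictors have an undefined VIF")
    return 1.0 / (1.0 - r * r)


# At the inter-correlation level reported in the heatmap:
vif_at_reported_level = vif_two_predictors(0.9)  # about 5.26

# Hypothetical toy series, only to show the two functions working together.
toy_r = pearson([1, 2, 3, 4], [1, 3, 2, 5])
toy_vif = vif_two_predictors(toy_r)
```

A VIF above roughly 5 is a common rule-of-thumb warning level, so even the two-predictor case at r = 0.9 already signals unstable coefficient estimates.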
Scatter Plots
The scatter plots show each variable against the flood counts, reflecting the correlation structure described above.
Time Series Plots
The time series plots graph each variable across time, highlighting underlying trends and seasonal patterns, and thus help us better understand how each variable has evolved over the years.
Sample Density Plots
The sample density plots visualize the distribution of each underlying variable.
Pairs Plots
A pairs plot combines scatter plots, density plots and correlation values for all the variables, providing an easy overview of the underlying dependencies:
Electricity, GDP and Floods: Urban expansion, meaning more housing, commerce and services, closely tracks flood counts, reinforcing the hypothesis that urbanisation is a key driver of urban floods.
Drainage Channels & Floods: The unexpected positive correlation with floods suggests that drainage construction may be reactive and is not yet strong enough to act as a preventive force.
Land Conversion & Food Production: The shift in land-use pattern and the increased agricultural footprint contribute to a reduction in permeable landscape, thus raising flood risk.
Company registrations: The lower correlation, visible as a weaker scatter, supports the idea that informal growth and infrastructure precede formal-sector growth and may have a stronger bearing on flood risk.
To build parsimonious models, one must select a subset of features which best explain the variation in the data. In this study, a random-forest based feature selection method is deployed to select the final variables.
Selected variables: State GDP, Domestic Electricity Consumption, Drainage Channels, Non-Agricultural Land & Urban Road Construction.
Modelling Philosophy
- Time Series - Given that the dataset spans the years 2000 to 2024, the first modelling approach uses time series forecasting models such as ARIMA and ARIMAX. These models are chosen for their ability to account for the temporal evolution of flood risk dynamics.
- GLM/GAM - Time series models, however, cannot account for the count nature of the response variable, which creates the space for GLMs and GAMs. These models can capture linear (GLM) and non-linear (GAM) relationships.
- Random Forest - The study concludes with building a Random Forest model to capture more complex underlying relationships, also providing a rule-based structure to guide policymakers.
Model Comparison
ARIMA(0,1,3)
The best model found, ARIMA(0,1,3), is governed by the following equation: \(\nabla X_t = e_t - 0.25\,e_{t-1} + 0.7\,e_{t-2} - 0.56\,e_{t-3}\), with \(e_t \sim N(0, 6484)\)
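As a reading aid for the equation above, the following sketch turns it into a one-step-ahead forecast: the expected change in the series is the weighted sum of the three most recent residuals, because the next innovation has mean zero. The starting level and the residual values are made-up numbers, and the function name is illustrative rather than taken from the project.

```python
from math import sqrt

# Moving-average coefficients and innovation variance from the fitted model.
MA_COEFFS = (-0.25, 0.7, -0.56)
INNOVATION_VARIANCE = 6484


def one_step_forecast(last_level, recent_residuals):
    """Forecast X_{t+1} from X_t and the residuals (e_t, e_{t-1}, e_{t-2}).

    Residuals are given most recent first. The unknown future innovation
    e_{t+1} has expectation zero, so it drops out of the point forecast.
    """
    if len(recent_residuals) != len(MA_COEFFS):
        raise ValueError("exactly three residuals are required")
    expected_change = sum(c * e for c, e in zip(MA_COEFFS, recent_residuals))
    return last_level + expected_change


# Hypothetical inputs: last observed count 100, residuals 10, -5 and 2.
forecast = one_step_forecast(100, (10, -5, 2))
forecast_sd = sqrt(INNOVATION_VARIANCE)  # one-step standard error, about 80.5
```

The one-step standard error of roughly 80 floods per year shows how wide the forecast band of the pure time series model is relative to a mean count of 171.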
ARIMAX with Urban Roads
The best ARIMAX model is found by combining every possible subset of exogenous variables with every candidate ARIMA configuration and comparing the fits by AIC. The best model found, ARIMA(1,1,0) with Urban Roads, is governed by the following equation: \(\nabla X_t = -0.51\,\nabla X_{t-1} + 0.0376\,\text{UrbanRoad} + e_t\), where \(e_t \sim N(0, 7864)\)
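Here, each unit increase in urban road construction is associated with an increase of about 0.038 in the differenced flood count, holding the AR structure constant. This can be read as a ~3.8% rise in flood counts only if the series was modelled on the log scale.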
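The exhaustive search used to pick the ARIMAX specification can be sketched as below. This is only an outline under stated assumptions: the grid bounds (`max_p`, `max_d`, `max_q`), the variable names, and especially `toy_aic` are hypothetical. In a real run the scoring function would fit each model with a time series library and return its AIC; the made-up scorer here exists only so the sketch runs on its own.

```python
from itertools import combinations, product


def candidate_specs(exog_names, max_p=2, max_d=1, max_q=2):
    """Yield every (exogenous subset, (p, d, q)) pair in the search grid."""
    orders = list(product(range(max_p + 1), range(max_d + 1), range(max_q + 1)))
    for size in range(1, len(exog_names) + 1):
        for subset in combinations(exog_names, size):
            for order in orders:
                yield subset, order


def best_by_aic(specs, aic_of):
    """Return the specification with the lowest AIC under the given scorer."""
    return min(specs, key=lambda spec: aic_of(*spec))


def toy_aic(exog, order):
    """Made-up stand-in for 'fit the model and return its AIC'."""
    p, _, q = order
    penalty = 2 * (p + q + len(exog))
    misfit = 0 if (exog == ("urban_road",) and order == (1, 1, 0)) else 10
    return 100 + penalty + misfit


exog_names = ["urban_road", "electricity", "gdp"]  # hypothetical column names
all_specs = list(candidate_specs(exog_names))
best_spec = best_by_aic(all_specs, toy_aic)
```

With three candidate regressors and this small grid there are already 126 specifications, which is why an information criterion such as AIC is used to rank them automatically.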
Generalized Linear Models (GLM)
The best model found (from the Negative Binomial family) is governed by the following equation: \(\log(\text{floods}) = 2.04 + 14.15\,\text{Electricity} - 1.74\,\text{GDP} - 8.65\,(\text{Electricity}:\text{GDP})\).
From the above equation, the beta coefficients indicate the direction and size of each variable’s effect. For example, ‘Domestic Electricity Consumption’ has a positive effect on ‘log(Floods)’ while ‘State GDP’ has a negative effect. The interaction term between ‘Domestic Electricity Consumption’ and ‘State GDP’ means that the effect of electricity consumption on flood counts depends on the level of GDP. Because this coefficient is negative, the positive electricity effect weakens as GDP rises.
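To see what the interaction term does numerically, the fitted GLM equation can be evaluated directly. The sketch below assumes that both predictors were rescaled to the range 0 to 1 before fitting; the text does not state the scaling, so the input values are purely illustrative, and the function names are not from the project.

```python
from math import exp

# Coefficients of the Negative Binomial GLM as reported in the equation.
INTERCEPT, B_ELEC, B_GDP, B_INTERACTION = 2.04, 14.15, -1.74, -8.65


def expected_floods(electricity, gdp):
    """Expected flood count from the log-link equation."""
    log_mean = (INTERCEPT + B_ELEC * electricity + B_GDP * gdp
                + B_INTERACTION * electricity * gdp)
    return exp(log_mean)


def electricity_slope(gdp):
    """Change in log(floods) per unit of electricity at a given GDP level."""
    return B_ELEC + B_INTERACTION * gdp


baseline_floods = expected_floods(0.0, 0.0)  # exp(2.04), about 7.7
slope_low_gdp = electricity_slope(0.0)       # 14.15
slope_high_gdp = electricity_slope(1.0)      # about 5.5
```

Under this scaling assumption the electricity effect stays positive over the whole GDP range but shrinks from 14.15 to about 5.5 on the log scale as GDP moves from its lowest to its highest value.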
Generalized Additive Models (GAM)
The best model with the Negative Binomial family is governed by the following equation: \(\log(\mu_i) = \alpha + s_1(\text{GDP}_i) + s_2(\text{Electricity}_i) + s_3(\text{Drainage}_i) + s_4(\text{NonAgriLand}_i) + s_5(\text{Roads}_i)\)
Here, \(\text{floods}_i \sim \text{NB}(\mu_i, \theta)\), characterized by \(\mathrm{Var}(\text{floods}_i) = \mu_i + \mu_i^2/\theta\), with \(\theta\) being the dispersion parameter, estimated as 1.57.
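The variance function above is easy to evaluate, and doing so shows how much extra spread the Negative Binomial family allows compared with a Poisson model. The sketch below plugs in the estimated dispersion parameter; using the overall mean flood count of 171 as the example value of the fitted mean is an illustrative choice, not a model output.

```python
THETA = 1.57  # dispersion parameter estimated by the GAM


def nb_variance(mu, theta=THETA):
    """Negative Binomial variance: mu + mu**2 / theta."""
    return mu + mu ** 2 / theta


variance_at_mean = nb_variance(171)                # roughly 18,796
poisson_like = nb_variance(171, theta=1e12)        # approaches 171 as theta grows
```

As θ grows the quadratic term vanishes and the variance collapses to the mean, which is the Poisson case; a value as small as 1.57 therefore indicates substantial over-dispersion even after conditioning on the predictors.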
Random Forest Fit
The random forest (RF) model, trained with 500 trees, achieves an explained variance of 85.2%, indicating a strong overall fit to the data.
GAM Smooth Curve Plots
In GAMs, each predictor is associated with a smooth spline function \(s(x_j)\), which captures the variable’s contribution to the response on the log scale. The GAM Smooth Curve plots graphically represent this relationship, with rug marks along the x-axis indicating the distribution of observed data.
From the above plot, the smooth terms for GDP and Drainage show a non-linear decreasing trend, suggesting a risk-mitigation effect as infrastructure and economic prosperity increase. The smooth for Domestic Electricity Consumption increases sharply, particularly at higher values, indicating the strong influence of urban expansion. The contribution of Non-Agricultural Land follows a hill-shaped curve: moderate levels correspond to heightened flood risk, while very low or high values exert a dampening effect. Urban Road Construction displays a non-linear upward trajectory with marked variation, consistent with episodic infrastructure expansion amplifying flood vulnerability.
RF-Rule Extraction
The model consistently associates low flood regimes with combinations reflecting limited urban activity. For instance, flood counts remain as low as 4 per year when non-agricultural land exceeds 1,000,000 hectares but urban road construction remains below 3,200 km—indicating that land conversion alone does not elevate flood risk unless accompanied by road-driven urban expansion. Similarly, low electricity consumption (under 2,400 MU) and sparse road networks produce predictions near 7, capturing low-density, pre-urban contexts. Even under moderate increases in road construction (above 3,800 km), if non-agricultural land remains relatively low (under 1,270,000 hectares), flood counts increase only modestly. In transitional drainage scenarios—specifically between 865 and 880 km—the model predicts moderate floods (~54), suggesting a nonlinear role of drainage as both a mitigating and compounding factor depending on urban intensity.
In contrast, high flood scenarios are learned when infrastructure and land-use variables surpass critical thresholds. When non-agricultural land use exceeds 427,000 hectares, flood predictions rise sharply to nearly 388 per year—capturing patterns present in over a third of historical observations. Road construction beyond 5,400 km also triggers flood counts above 420, reinforcing the model’s sensitivity to urban sprawl. Notably, even with high drainage coverage (e.g., above 877 km), flood risk remains elevated if urban land use remains intensive, highlighting that mitigation infrastructure alone is insufficient without constraints on expansion.
Collectively, these rules demonstrate that the Random Forest model internalises flood dynamics through the recognition of interaction effects and threshold behaviour. Rather than attributing risk to isolated predictors, the model identifies flood vulnerability as a function of interdependent infrastructural and spatial variables—particularly the simultaneous scaling of urban land use and road density.
SHAP Summary Plots
The SHAP summary plot provides a global explanation of how individual features influence model predictions across all observations. Each dot in the plot represents a SHAP value corresponding to a single observation for a given feature, with colour gradients indicating the magnitude of the feature’s actual value (from low in yellow to high in purple). The horizontal position shows the direction and magnitude of the feature’s impact on the prediction, and features are ranked by overall importance from top to bottom.
Both models exhibit strong agreement in ranking Domestic Electricity Consumption and State GDP as the most influential contributors to flood risk. High electricity consumption consistently raises flood predictions, reinforcing its role as a proxy for urban density and impervious surface expansion. Conversely, higher GDP is associated with reduced flood risk, likely reflecting improved infrastructure and institutional capacity. This alignment suggests a robust, model-independent understanding that urban growth heightens exposure, while economic development offers partial mitigation.
However, the models diverge in their treatment of secondary features. The GAM model captures more nuanced, non-linear behaviours—particularly in how it models Drainage Channels and Urban Road Construction. It learns that drainage effectiveness varies with urban context and that roads introduce sharp risk escalations beyond specific thresholds. Random Forest, by contrast, applies a threshold-based rule logic that flattens these relationships, prioritising frequent feature co-occurrences over marginal effects. This leads to a more stable but less temporally dynamic representation of risk factors. These distinctions underscore the complementary nature of the models: GAM excels in uncovering smooth, continuous risk patterns, while Random Forest identifies stable, interpretable thresholds driving flood dynamics.
SHAP Over Time
Both models exhibit strong alignment in identifying domestic electricity consumption and non-agricultural land as the most influential predictors of flood risk. SHAP values for these variables rise steadily from 2011 onward in both GAM and Random Forest, indicating a shared interpretation that rapid urbanisation—reflected in energy demand and land conversion—drives increasing flood exposure. This consistency suggests that both models learn a common underlying pattern linking spatial expansion to flood vulnerability.
However, the models diverge in their interpretation of risk-mitigating variables and temporal responsiveness. The GAM model attributes increasingly negative SHAP values to GDP and drainage infrastructure from 2016 onward, signalling recognition of their moderating effects as infrastructure matures. In contrast, Random Forest maintains neutral or positive SHAP values for these features, reflecting a lower sensitivity to mitigation dynamics—likely due to its reliance on threshold-based splits. Additionally, GAM captures urban road construction with more temporal precision, showing sharp SHAP fluctuations aligned with construction pulses, whereas Random Forest averages these effects over time, offering a more static representation.
Together, the models offer complementary perspectives—one capturing nuanced transitions, the other emphasising accumulated exposure.
Waterfall Additive Breakdown 2020
The Waterfall Additive Breakdown Plot provides a local explanation of how individual features contribute to a single model prediction. Starting from the model’s baseline output—typically the mean prediction across all observations—the plot sequentially adds or subtracts the contribution of each feature to arrive at the final forecast. Bars extending to the right indicate positive contributions that increase the prediction, while bars to the left represent negative contributions that reduce it. Feature values are annotated on each bar, allowing for a precise interpretation of the model’s reasoning. This visualisation is particularly well-suited to additive models such as GAMs, where the total prediction is decomposed into smooth additive components. For ensemble models like Random Forest, SHAP-based waterfall plots offer a similar additive interpretability by attributing the prediction to the marginal contribution of each input feature relative to the mean.
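The additive property behind a waterfall plot (baseline plus the sum of feature contributions equals the prediction) can be demonstrated with a small exact Shapley computation. This sketch is not the procedure used for the dashboard, which presumably relied on a SHAP library: it enumerates all feature orderings, which is only feasible for a handful of features, and the model, feature names and data rows are hypothetical.

```python
from itertools import permutations
from statistics import mean


def shapley_values(predict, instance, background):
    """Exact Shapley values for one instance.

    Features not yet 'revealed' take their values from the background rows,
    and the prediction is averaged over those rows.
    Returns (baseline, contributions).
    """
    names = list(instance)

    def value(present):
        return mean(
            predict({n: (instance[n] if n in present else row[n]) for n in names})
            for row in background
        )

    totals = {n: 0.0 for n in names}
    orderings = list(permutations(names))
    for order in orderings:
        present = set()
        previous = value(present)
        for n in order:
            present.add(n)
            current = value(present)
            totals[n] += current - previous  # marginal contribution of n
            previous = current
    baseline = value(set())
    return baseline, {n: t / len(orderings) for n, t in totals.items()}


def toy_model(x):
    """Hypothetical model: electricity raises the output, drainage lowers it."""
    return 2.0 * x["electricity"] - 1.0 * x["drainage"]


background_rows = [
    {"electricity": 0.0, "drainage": 0.0},
    {"electricity": 1.0, "drainage": 1.0},
]
year = {"electricity": 1.0, "drainage": 1.0}
baseline, contributions = shapley_values(toy_model, year, background_rows)
```

Here the baseline is 0.5, electricity contributes +1.0 and drainage contributes -0.5, which add up to the toy model's prediction of 1.0 for this row; a waterfall plot draws exactly this chain of additions and subtractions.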
In 2020, both the GAM and Random Forest models identify Domestic Electricity Consumption as the most influential driver of flood risk. Both also agree on the positive contribution of Urban Road Construction and Non-Agricultural Land, reflecting the role of urban expansion in amplifying runoff and reducing permeability. In both models, State GDP emerges as an important economic-scale factor, and the final predictions (GAM: 542, RF: 385) exceed their respective baselines, confirming 2020 as a high-flood year shaped by cumulative urban pressures.
Despite surface-level similarity, the models diverge in their treatment of drainage and GDP. The GAM model attributes a significant negative effect to GDP and Drainage Channels, interpreting these features as resilience-building mechanisms. In contrast, the Random Forest model assigns positive SHAP values to both, suggesting it interprets these as signals of urban intensity rather than flood mitigation. Additionally, the GAM model produces a more precise estimate closely aligned with the observed flood count, while RF underpredicts the outcome, implying that the latter may underrepresent latent variables such as climatic extremes.
Waterfall Additive Breakdown 2024
By 2024, both models continue to rank Domestic Electricity Consumption and Urban Road Construction among the top contributors to rising flood risk. Feature influence intensifies in this year across both models, especially for electricity use, confirming that both interpret 2024 as a period of heightened urban demand. Both models also register contributions from Non-Agricultural Land and GDP, maintaining structural consistency with prior years.
The models once again diverge in their treatment of mitigation. GAM learns a dominant negative effect from Drainage Channels (−1588), interpreting improved infrastructure as the key reason for the reduction in predicted flood count. RF, however, assigns a modest positive contribution (+25), indicating that it does not detect any substantial mitigation from drainage expansion. Additionally, the GAM model interprets GDP as a resilience signal, while RF continues to associate it with increased exposure. These contrasts result in differing reasoning paths, even as both models arrive at final predictions that are close to observed outcomes.
Drawing from the combined insights of our machine learning models, explainable AI (XAI) techniques, and a primary household-level survey in Guwahati, the following recommendations are proposed for policy action. These are further reinforced by alignment with recent government memorandums, NGO programs, and disaster management protocols in Assam.
Integrate Urban Infrastructure Planning with Flood Risk Assessments - Both the Generalized Additive Model (GAM) and Random Forest models consistently rank Domestic Electricity Consumption and Urban Road Construction as top contributors to rising flood counts. GAM spline curves show non-linear upward trends, especially after 2010, indicating a structural shift toward heightened urban flood exposure. Flood perception is significantly higher in neighbourhoods undergoing rapid infrastructure development, as reflected in elevated FRATPI scores in central zones of Guwahati. Urban planning authorities should integrate flood risk indices and infrastructure stress indicators—such as those derived from electricity and road construction proxies—into building approvals and zoning regulations.
Supporting Initiatives:
The Assam State Disaster Management Plan (ASDMA, 2022) calls for integrating risk sensitivity into urban master plans.
The National Disaster Management Authority (NDMA) flood guidelines promote floodplain zoning and urban risk audits.
Enhance and Maintain Drainage Infrastructure - GAM and SHAP-over-time analysis show that Drainage Channels, when improved post-2016, contributed negatively to flood counts, acting as effective risk mitigators. In GAM waterfall plots for 2024, drainage reduced expected flood counts by over 1,500—demonstrating its dominant role in mitigation. Drainage improvements were the most suggested intervention by surveyed respondents (Figure 26-i), and narratives from the field repeatedly referenced waterlogging due to blocked or absent drainage. Increase both the coverage and maintenance frequency of urban drainage networks, especially in flood-prone inner-city wards.
Supporting Initiatives:
The Assam State Disaster Management Plan (ASDMA, 2022) outlines the importance of developing and maintaining effective drainage systems to manage urban flooding.
Vision Assam 2020 and the AMRUT (Atal Mission for Rejuvenation and Urban Transformation) initiative identify drainage infrastructure as a critical need.
Promote Community-Based Disaster Risk Reduction (CBDRR) - While 70% of households possess basic emergency supplies (Figure 25-ii), only 44% are willing to purchase flood insurance, and just 31% know the location of the nearest flood shelter. This indicates high individual awareness but low community-level coordination and preparedness. Invest in ward-level CBDRR cells to organize drills, awareness sessions, and shelter mapping. Use tools like the FRATPI index to target high-risk localities.
Supporting Initiatives:
The Catholic Relief Services’ Assam Disaster Risk Reduction project successfully engaged communities in developing and implementing local resilience plans.
SEWA’s Disaster Risk Reduction Programme emphasizes community involvement in flood preparedness and response.
Strengthen Institutional Trust and Support Mechanisms - Only 3 out of 70 respondents reported receiving any form of institutional or community support during floods (Figure 26-ii), and trust ratings toward government agencies and infrastructure adequacy cluster around a score of 2 to 3 out of 5 (Figure 26-iii), reflecting institutional scepticism. GAM interprets State GDP as a mitigating factor (−588 in 2020), while Random Forest views it as a sign of exposure, not resilience—indicating the perception gap between institutional capacity and public trust. Rebuild institutional trust by demonstrating timely, visible, and localised interventions—including transparent communication of budget allocations, early warnings, and post-flood support.
Supporting Initiatives:
The GO-NGO Protocol for Emergency Management in Assam aims to improve coordination between government agencies and NGOs, enhancing community support during disasters.
The Assam State Disaster Management Plan advocates for transparent communication and community engagement to build trust.
Implement Early Warning Systems and Improve Risk Communication - Despite elevated FRATPI scores across several wards, community-level preparedness remains fragmented, highlighting a gap in anticipatory action. Moreover, experience with flooding strongly correlates with higher perceived risk, suggesting that early alerts could increase proactive behaviour. SHAP-over-time plots from both models show predictive signals for flood spikes 1–2 years in advance, based on urban expansion variables—indicating a powerful opportunity for predictive early warning systems. Use model-informed thresholds (e.g., electricity > 0.7, road expansion > 0.6) as triggers for issuing local flood warnings. Integrate these systems with ASDMA’s mobile alert platforms and community radio.
Supporting Initiatives:
Aaranyak, in collaboration with the International Centre for Integrated Mountain Development (ICIMOD), has implemented community-based flood early warning systems in Assam.
The ASDMA plan (2022) includes digital early warning communication protocols in its core flood response plan.
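The model-informed trigger proposed in this recommendation (electricity > 0.7, road expansion > 0.6) could be prototyped along the following lines. The sketch assumes that both indicators are normalised to the range 0 to 1 and that exceeding any single threshold is enough to raise a warning; neither detail is specified in the text, and the names are illustrative.

```python
# Example thresholds quoted in the recommendation (assumed 0-1 scale).
THRESHOLDS = {"electricity": 0.7, "road_expansion": 0.6}


def exceeded_indicators(indicators, thresholds=THRESHOLDS):
    """Return the names of indicators that are above their threshold.

    Indicators missing from the input are treated as not exceeded.
    """
    return [
        name for name, limit in thresholds.items()
        if indicators.get(name, 0.0) > limit
    ]


def should_warn(indicators, thresholds=THRESHOLDS):
    """Assumption: a single exceeded indicator is enough to issue a warning."""
    return len(exceeded_indicators(indicators, thresholds)) > 0


# Hypothetical ward-level readings.
ward_reading = {"electricity": 0.75, "road_expansion": 0.5}
ward_flags = exceeded_indicators(ward_reading)
ward_alert = should_warn(ward_reading)
```

Returning the list of exceeded indicators, rather than only a yes or no, would let an alert message state which urban pressure triggered it.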
Establish Centralized Flood Risk Data Infrastructure for AI-driven Governance - The modelling and explainability analyses in this report are based on annual data spanning just 25 years. While these models yield actionable insights, the temporal granularity and data gaps limit the reliability of early warnings, risk forecasting, and fine-tuned policy recommendations. For instance, the SHAP-over-time analysis highlights year-specific spikes, but monthly-level resolution could reveal seasonal dynamics, intra-annual surges, and lagged policy effects. The government should establish a dedicated flood and environmental data department under the Assam State Disaster Management Authority (ASDMA) or a state research bureau. This body should be tasked with the systematic collection, standardisation, and open dissemination of multi-dimensional flood-related datasets. Specifically:
Hydrological records: Daily/monthly rainfall, river levels, groundwater saturation and flash flood reports.
Land use and built-up area statistics: Annual satellite-derived updates on impervious surfaces, agricultural-to-urban transitions, and deforestation.
Infrastructure development: Quarterly data on roads, drainage construction, encroachments, and new settlements.
Socioeconomic indicators: Census updates, ward-level electricity consumption, income distribution, access to shelters.
Disaster incidence and impact data: Number of affected households, economic losses, aid distribution, and evacuation records.
This would significantly elevate the potential of machine learning and XAI-based policy evaluation in Assam. Without such data, even the most advanced models are constrained to extrapolate patterns on incomplete baselines. Better data will not only sharpen model performance, but also allow policy simulations, targeted interventions, and resource optimisation at sub-city scales.