Flooding is one of the most costly and catastrophic natural hazards,
especially in urban environments where impervious surfaces and
population density increase exposure. Inundation risks are also
projected to rise due to climate change, continued urban development,
and aging infrastructure. In light of these risks, predictive modelling
of flood inundation can inform strategic planning and identify
vulnerable areas in Portland in advance.
This project develops a spatial logistic regression model using the
2013 Calgary flood as a reference event. The model uses geographical
and environmental variables (elevation, slope, flow accumulation,
land cover, and distance to rivers) to estimate the probability of
flood inundation across a city. These variables were derived using
GIS-based pre-processing and extracted at the grid-cell level for
analysis in R. The model was trained on 70% of the Calgary data and
tested on the remaining 30%. It was then applied to a comparable city,
Portland, Oregon, to estimate flood inundation probability.
Key results indicate that all variables were statistically significant and that the model distinguished inundated from non-inundated areas reasonably well. It was especially good at identifying non-inundated zones but underestimated some flood zones. The confusion matrix and ROC analysis indicated that the model generalizes beyond the training data. When applied to Portland, the model identified areas with high inundation probability.
# Global chunk options: show code, hide messages and warnings, cache results
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  cache = TRUE
)
# Avoid scientific notation in printed output
options(scipen = 999)
library(knitr)
library(kableExtra)
library(terra)
library(sf)
library(dplyr)
library(ggplot2)
library(raster)
library(scales)
library(cowplot)
library(grid)
library(gridExtra)
library(caret)
library(pscl)
library(plotROC)
library(pROC)
library(tidyverse)
library(tigris)
# Define custom color gradient
my_colors <- c("#033E56", "#9AC3BB", "#D3D477", "#FAAE41", "#ED6328")
my_colors_f <- c("#ED6328", "#FAAE41", "#D3D477", "#9AC3BB", "#033E56")
To develop the model, spatial data sets are loaded for both Calgary and Portland, including Elevation, Slope, Flow Accumulation, Distance to River, and Land Cover (Developed, Forest, and Grassland).
The coordinate system used in this study is WGS_1984_Web_Mercator_Auxiliary_Sphere (EPSG:3857).
Elevation is rescaled to a range of 0–300 because the elevation ranges of Calgary (967–1,290 m) and Portland (3–370 m) each span roughly 300 m (about 323 m and 367 m, respectively). Rescaling places both cities on a common relative scale. Flow accumulation is rescaled (to 0–10,000) for a similar reason: hydrologic networks differ between regions, and normalizing these continuous variables keeps large-magnitude predictors from dominating when the model is transferred from one city to the other. This keeps the influence of each variable consistent between Calgary and Portland and makes the results easier to interpret.
fishnet_cal$Nor_Elevation <- rescale(fishnet_cal$Elevation, to = c(0, 300))
fishnet_cal$Nor_Flow_accu <- rescale(fishnet_cal$Flow_accu, to = c(0, 10000))
fishnet_por$Nor_Elevation <- rescale(fishnet_por$Elevation, to = c(0, 300))
fishnet_por$Nor_Flow_accu <- rescale(fishnet_por$Flow_accu, to = c(0, 10000))
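The rescaling step above is a min-max transformation. The sketch below reimplements it in base R to make the arithmetic explicit. `minmax_rescale` is a hypothetical helper written for illustration and is only intended to mirror the default behaviour of `scales::rescale()`; the elevation values are made-up examples within each city's reported range, not study data.

```r
# Hypothetical helper: min-max rescaling in base R, intended to mirror
# the default behaviour of scales::rescale(x, to = c(lo, hi)).
minmax_rescale <- function(x, to = c(0, 1)) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1]) * (to[2] - to[1]) + to[1]
}

# Illustrative elevations (m), not taken from the study data
calgary_elev  <- c(967, 1100, 1290)
portland_elev <- c(3, 150, 370)

cal_scaled <- minmax_rescale(calgary_elev, to = c(0, 300))
por_scaled <- minmax_rescale(portland_elev, to = c(0, 300))

# Each city's minimum maps to 0 and its maximum to 300,
# regardless of the absolute elevations involved.
print(round(cal_scaled, 1))
print(round(por_scaled, 1))
```

Because each city is rescaled separately, a value of 0 means "the lowest cell in this city" rather than the same absolute elevation in both places.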
To predict flood inundation risk, we selected five spatially derived variables: elevation, slope, flow accumulation, distance to river, and land cover (developed, forest, and grassland).
The maps below show these predictor variables across Calgary:
Elevation_cal_plot <- ggplot() +
geom_sf(data = fishnet_cal, aes(fill = Nor_Elevation), color = NA) +
scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Elevation (scaled)") +
theme_void()
Slope_cal_plot <- ggplot() +
geom_sf(data = fishnet_cal, aes(fill = Slope), color = NA) +
scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Slope (%)") +
theme_void()
Dis_River_cal_plot <- ggplot() +
geom_sf(data = fishnet_cal, aes(fill = Dis_River), color = NA) +
scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Distance (m)") +
theme_void()
plot_grid(
ggdraw() + draw_label("Elevation", x = 0.5, hjust = 0.5),
ggdraw() + draw_label("Slope", x = 0.5, hjust = 0.5),
ggdraw() + draw_label("Distance to River", x = 0.5, hjust = 0.5),
Elevation_cal_plot, Slope_cal_plot, Dis_River_cal_plot,
ncol = 3, rel_heights = c(0.1, 1),
align = "v", axis = "lr"
)
Flow_accu_cal_plot <- ggplot() +
geom_sf(data = fishnet_cal, aes(fill = Nor_Flow_accu), color = NA) +
scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Flow Accumulation") +
theme_void() +
theme(
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)
)
Landcover_cal_plot <- ggplot() +
geom_sf(data = city_boundary_cal, fill = "#e9e9e9", color = "grey80") +
geom_sf(data = developed_cal, aes(fill = "Developed"), color = NA) +
geom_sf(data = forest_cal, aes(fill = "Forest"), color = NA) +
geom_sf(data = grassland_cal, aes(fill = "Grassland"), color = NA) +
scale_fill_manual(
name = "Land Cover Type",
values = c("Developed" = "#033E56", "Forest" = "#9AC3BB", "Grassland" = "#D3D477")
) +
theme_void() +
theme(
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)
)
plot_grid(
ggdraw() + draw_label("Landcover", x = 0.5, hjust = 0.5, size = 11),
ggdraw() + draw_label("Flow Accumulation", x = 0.5, hjust = 0.5, size = 11),
Landcover_cal_plot, Flow_accu_cal_plot,
ncol = 2, rel_heights = c(0.1, 1),
align = "v", axis = "lr"
)
# Reshape fishnet data into long format for group mean calculation
floodEDA <- fishnet_cal %>%
as.data.frame() %>%
dplyr::select(inundated_binary, Elevation, Dis_River, Slope, Flow_accu, Developed, Forest, Grassland) %>%
pivot_longer(cols = -inundated_binary, names_to = "variable", values_to = "value")
# Plot grouped bar charts of mean values
floodEDA %>%
group_by(inundated_binary, variable) %>%
summarize(mean = mean(value, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(x = as.factor(inundated_binary), y = mean, fill = as.factor(inundated_binary))) +
geom_bar(stat = "identity") +
facet_wrap(~variable, scales = "free") +
scale_fill_manual(values = c("#033E56", "#9AC3BB"),
labels = c("Not Inundated", "Inundated"),
name = "") +
labs(title = "Mean Predictor Values by Inundation Status",
x = "Inundation Status",
y = "Mean Predictor Value") +
theme_minimal()
The grouped bar plot compares the mean of each predictor between inundated and non-inundated cells, which helps identify variables that may separate the two classes (0 vs. 1).
floodEDA_violin <- floodEDA
# Violin plot to show full variable distribution by binary outcome
ggplot(floodEDA_violin) +
geom_violin(aes(x = as.factor(inundated_binary), y = value, fill = as.factor(inundated_binary)), alpha = 0.7) +
facet_wrap(~variable, scales = "free") +
scale_fill_manual(values = c("#033E56", "#9AC3BB"),
labels = c("Not Inundated", "Inundated"),
name = "") +
labs(title = "Distribution of Predictor Variables by Inundation Status",
x = "Inundation Status",
y = "Value") +
theme_minimal()
The violin plots show the shape and spread of each variable's distribution within each class, which helps detect skewness and overlap between classes.
In our model, we used 70% of the Calgary fishnet grid cells as the training set to build the logistic regression model and the remaining 30% as the test set to evaluate its performance. This ensures that we can further assess how well our model generalizes to new, unseen data.
set.seed(3456)
trainIndex <- createDataPartition(fishnet_cal$inundated_binary, p = .70,
list = FALSE,
times = 1)
fishnetTrain <- fishnet_cal[trainIndex, ]
fishnetTest <- fishnet_cal[-trainIndex, ]
The logistic regression model estimates the probability of flood inundation in Calgary based on several variables: elevation (normalized), distance to river, slope, flow accumulation (normalized), and land cover types (developed, forest, grassland). The model was fitted using the glm() function with a binomial family and logit link.
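Written out, the fitted model takes the standard logistic form below. The symbols are notation introduced here for illustration: p_i is the probability that grid cell i is inundated, and the beta terms are the coefficients estimated by glm(). The predictor names match the model formula.

```latex
\log\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0
  + \beta_1\,\mathrm{Nor\_Elevation}_i
  + \beta_2\,\mathrm{Dis\_River}_i
  + \beta_3\,\mathrm{Slope}_i
  + \beta_4\,\mathrm{Nor\_Flow\_accu}_i
  + \beta_5\,\mathrm{Developed}_i
  + \beta_6\,\mathrm{Forest}_i
  + \beta_7\,\mathrm{Grassland}_i,
\qquad
p_i = \frac{1}{1 + e^{-\eta_i}}
```

Here \eta_i denotes the right-hand side of the first equation (the linear predictor), so a positive coefficient raises the log-odds of inundation and a negative one lowers them.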
# Logistic regression model using normalized values
floodModel <- glm(inundated_binary ~ Nor_Elevation + Dis_River + Slope + Nor_Flow_accu +
Developed + Forest + Grassland,
family = binomial(link = "logit"),
data = fishnetTrain %>%
as.data.frame() %>%
dplyr::select(-geometry))
summary(floodModel)
##
## Call:
## glm(formula = inundated_binary ~ Nor_Elevation + Dis_River +
## Slope + Nor_Flow_accu + Developed + Forest + Grassland, family = binomial(link = "logit"),
## data = fishnetTrain %>% as.data.frame() %>% dplyr::select(-geometry))
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.033682464 0.183742508 11.068 < 0.0000000000000002 ***
## Nor_Elevation -0.015811750 0.001006577 -15.708 < 0.0000000000000002 ***
## Dis_River -0.000440021 0.000047496 -9.264 < 0.0000000000000002 ***
## Slope -0.063419559 0.020538595 -3.088 0.00202 **
## Nor_Flow_accu 0.000462045 0.000050111 9.220 < 0.0000000000000002 ***
## Developed -0.000073550 0.000004461 -16.488 < 0.0000000000000002 ***
## Forest 0.000045883 0.000010964 4.185 0.0000285 ***
## Grassland -0.000061328 0.000004846 -12.656 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8477.4 on 14492 degrees of freedom
## Residual deviance: 6587.8 on 14485 degrees of freedom
## AIC: 6603.8
##
## Number of Fisher Scoring iterations: 6
# Extract coefficients from the model summary
model_summary <- coef(summary(floodModel))
# Convert to a data frame and round decimals for readability
model_table <- as.data.frame(model_summary) %>%
mutate(
Estimate = round(Estimate, 4),
`Std. Error` = round(`Std. Error`, 4),
`z value` = round(`z value`, 4),
`Pr(>|z|)` = round(`Pr(>|z|)`, 4)
)
# Display the table using kable
kable(model_table, caption = "Logistic Regression Model Summary",
      col.names = c("Variables", "Estimate", "Std. Error", "z value", "p-value")) %>%
  kable_styling("striped", full_width = TRUE)
| Variables | Estimate | Std. Error | z value | p-value |
|---|---|---|---|---|
| (Intercept) | 2.0337 | 0.1837 | 11.0681 | 0.000 |
| Nor_Elevation | -0.0158 | 0.0010 | -15.7084 | 0.000 |
| Dis_River | -0.0004 | 0.0000 | -9.2644 | 0.000 |
| Slope | -0.0634 | 0.0205 | -3.0878 | 0.002 |
| Nor_Flow_accu | 0.0005 | 0.0001 | 9.2204 | 0.000 |
| Developed | -0.0001 | 0.0000 | -16.4883 | 0.000 |
| Forest | 0.0000 | 0.0000 | 4.1849 | 0.000 |
| Grassland | -0.0001 | 0.0000 | -12.6559 | 0.000 |
The results indicate that all variables in the model are statistically significant, with p-values below 0.01, suggesting strong associations with flood inundation. Elevation, distance to river, slope, developed land, and grassland have negative coefficients, meaning that higher values are associated with lower odds of inundation when the other variables are held constant. In contrast, flow accumulation and forest area have positive coefficients and are associated with higher odds of inundation.
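Logit coefficients are easier to read as odds ratios. The sketch below exponentiates the estimates copied from the summary output above; it is an illustration of the interpretation rather than part of the original analysis. With the fitted object in memory, `exp(coef(floodModel))` would give the same quantities.

```r
# Coefficient estimates copied from the summary(floodModel) output
coefs <- c(
  Nor_Elevation = -0.015811750,
  Dis_River     = -0.000440021,
  Slope         = -0.063419559,
  Nor_Flow_accu =  0.000462045,
  Developed     = -0.000073550,
  Forest        =  0.000045883,
  Grassland     = -0.000061328
)

# Odds ratio for a one-unit increase in each predictor
odds_ratios <- exp(coefs)

# Percentage change in the odds of inundation per one-unit increase
pct_change <- (odds_ratios - 1) * 100

print(round(odds_ratios, 4))
print(round(pct_change, 2))
```

For example, the slope odds ratio is about 0.94, so each additional unit of slope is associated with roughly 6% lower odds of inundation, holding the other predictors constant. The very small land-cover coefficients reflect the units of those variables, which are not stated in this section.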
classProbs <- predict(floodModel, newdata = fishnetTest, type = "response")
hist(classProbs,
main = "Distribution of Predicted Flood Probabilities",
xlab = "Predicted Probability",
col = "#033E56", border = "white")
The histogram shows the overall distribution of predicted flood probabilities generated by the logistic regression model for the test set. Most of the predicted probabilities are clustered near zero, indicating that the model classifies the majority of cells as not inundated.
testProbs <- data.frame(obs = fishnetTest$inundated_binary, pred = classProbs)
# Density plot
ggplot(testProbs, aes(x = pred, fill = as.factor(obs))) +
geom_density(alpha=0.6) +
facet_grid(obs ~ .) +
xlab("Predicted Probability of Inundation") +
ylab("Density") +
geom_vline(xintercept = 0.5, linetype="dashed") +
scale_fill_manual(values = c("#033E56", "#9AC3BB"),
labels = c("Not Inundated", "Inundated"),
name = "Observed") +
ggtitle("Predicted Probability vs. Actual Flood Inundation") +
theme_minimal()
This density plot shows the distribution of predicted flood probabilities generated by the logistic regression model, separated by the actual observed inundation status (0 = Not Inundated, 1 = Inundated). It reveals that non-flooded cells cluster around low probabilities, while flooded cells have a broader spread, though many still fall below 0.5. Together, these plots suggest the model is good at identifying non-inundated areas but less confident for flooded zones, with some overlap between classes.
testProbs$predClass <- ifelse(testProbs$pred > 0.2, 1, 0)
confusionMatrix(reference = as.factor(testProbs$obs),
data = as.factor(testProbs$predClass),
positive = "1")
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 5438 305
## 1 225 243
##
## Accuracy : 0.9147
## 95% CI : (0.9074, 0.9215)
## No Information Rate : 0.9118
## P-Value [Acc > NIR] : 0.2175389
##
## Kappa : 0.4322
##
## Mcnemar's Test P-Value : 0.0006002
##
## Sensitivity : 0.44343
## Specificity : 0.96027
## Pos Pred Value : 0.51923
## Neg Pred Value : 0.94689
## Prevalence : 0.08823
## Detection Rate : 0.03912
## Detection Prevalence : 0.07535
## Balanced Accuracy : 0.70185
##
## 'Positive' Class : 1
##
For this model, a threshold of 0.2 was chosen to classify areas as flood-inundated. This lower threshold was selected to prioritize sensitivity. While this increases the chance of false positives (predicting flood where none occurs), it reduces the risk of missing true flood zones, which is important in disaster preparation and planning.
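The effect of lowering the threshold can be illustrated with a small sketch. The observations and probabilities below are toy values invented for this example, and `rates_at` is a hypothetical helper; neither comes from the Calgary test set.

```r
# Toy data for illustration only (not from the Calgary test set)
obs  <- c(0, 0, 0, 0, 0, 0, 1, 0, 1, 1)
pred <- c(0.02, 0.05, 0.08, 0.10, 0.15, 0.25, 0.18, 0.30, 0.45, 0.70)

# Hypothetical helper: sensitivity and specificity at a given cut-off
rates_at <- function(obs, pred, threshold) {
  pred_class <- as.integer(pred > threshold)
  tp <- sum(pred_class == 1 & obs == 1)
  fn <- sum(pred_class == 0 & obs == 1)
  tn <- sum(pred_class == 0 & obs == 0)
  fp <- sum(pred_class == 1 & obs == 0)
  c(threshold   = threshold,
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp))
}

# Compare the conventional 0.5 cut-off with the lower 0.2 cut-off
threshold_table <- as.data.frame(rbind(rates_at(obs, pred, 0.5),
                                       rates_at(obs, pred, 0.2)))
print(threshold_table)
```

In this toy case, moving the cut-off from 0.5 to 0.2 raises sensitivity from one third to two thirds while specificity falls from 1 to about 0.71, which is the same kind of trade-off described above.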
The model achieved an overall accuracy of 91.47%, only slightly above the no-information rate of 91.18% (the accuracy obtained by labelling every cell as not inundated). The sensitivity (true positive rate) is 44.34%, indicating that the model correctly identified approximately 44% of truly inundated locations. In contrast, the specificity is 96.03%, showing strong performance in correctly identifying non-inundated areas. The Kappa statistic of 0.43 suggests moderate agreement between predicted and actual classes beyond chance.
True Positive (TP): The model correctly predicted an area as inundated, and it was actually inundated.
False Positive (FP): The model predicted an area as inundated, but it was not actually inundated.
True Negative (TN): The model correctly predicted an area as not inundated, and it was truly not inundated.
False Negative (FN): The model predicted an area as not inundated, but it was actually inundated.
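The headline metrics follow directly from these four counts. The sketch below recomputes them from the counts printed in the confusion matrix output; the variable names are chosen here for illustration, and the Kappa formula is the standard Cohen's kappa.

```r
# Counts taken from the confusion matrix output at the 0.2 threshold
TP <- 243; FP <- 225; TN <- 5438; FN <- 305
n  <- TP + FP + TN + FN

accuracy    <- (TP + TN) / n
sensitivity <- TP / (TP + FN)   # share of inundated cells detected
specificity <- TN / (TN + FP)   # share of dry cells correctly classified
precision   <- TP / (TP + FP)   # positive predictive value

# Cohen's kappa: observed agreement corrected for chance agreement
p_observed <- accuracy
p_expected <- ((TP + FP) * (TP + FN) + (TN + FN) * (TN + FP)) / n^2
kappa      <- (p_observed - p_expected) / (1 - p_expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, precision = precision,
              kappa = kappa), 4))
```

These values agree with the accuracy, sensitivity, specificity, positive predictive value, and Kappa reported by confusionMatrix().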
# Compute the confusion matrix at the 0.2 threshold
conf_mat <- confusionMatrix(reference = as.factor(testProbs$obs),
data = as.factor(testProbs$predClass),
positive = "1")
# Table of confusion counts
cm_table <- conf_mat$table
TN <- cm_table[1,1]; FP <- cm_table[2,1]
FN <- cm_table[1,2]; TP <- cm_table[2,2]
confusion_counts <- data.frame(
Outcome = c("True Positive (TP)", "False Positive (FP)",
"True Negative (TN)", "False Negative (FN)"),
Count = c(TP, FP, TN, FN)
)
kable(confusion_counts, caption = "Confusion Matrix Breakdown") %>%
kable_styling("striped", full_width = T)
| Outcome | Count |
|---|---|
| True Positive (TP) | 243 |
| False Positive (FP) | 225 |
| True Negative (TN) | 5438 |
| False Negative (FN) | 305 |
The map visualizes the spatial distribution of model prediction outcomes for each fishnet cell in Calgary. This full-area map is based on predictions using a threshold of 0.2, providing insights into where the model is performing well and where it is making errors.
fishnet_cal$predicted_prob <- predict(floodModel, newdata = fishnet_cal, type = "response")
fishnet_cal$predicted_class <- ifelse(fishnet_cal$predicted_prob > 0.2, 1, 0)
fishnet_cal <- fishnet_cal %>%
mutate(confusion_type = case_when(
inundated_binary == 1 & predicted_class == 1 ~ "True Positive",
inundated_binary == 0 & predicted_class == 0 ~ "True Negative",
inundated_binary == 0 & predicted_class == 1 ~ "False Positive",
inundated_binary == 1 & predicted_class == 0 ~ "False Negative",
TRUE ~ NA_character_
))
confusion_colors <- c(
"True Positive" = "#9AC3BB",
"True Negative" = "#033E56",
"False Positive" = "#ED6328",
"False Negative" = "#FAAE41"
)
# Map
ggplot(data = fishnet_cal) +
geom_sf(aes(fill = confusion_type), color = NA, size = 0.01) +
scale_fill_manual(values = confusion_colors, name = "Results") +
labs(title = "Confusion Matrix Outcomes",
subtitle = "Spatial distribution of classification results at the 0.2 threshold") +
theme_void() +
theme(legend.position = "right",
plot.title = element_text(size = 14, face = "bold"))
Following the colors defined in the code, light teal marks cells correctly identified as inundated (true positives), while dark blue marks correct non-flood predictions (true negatives). Orange-red shows where the model predicted flooding but no inundation occurred (false positives), and amber shows missed flood areas that were incorrectly classified as dry (false negatives). The map shows that many true positives align with river-adjacent zones, but there are still notable clusters of false negatives where the model underestimates the risk, which is a key concern for flood inundation risk mitigation.
The ROC curve visualizes the model’s ability to distinguish between flooded (positive) and non-flooded (negative) areas across all classification thresholds.
ggplot(testProbs, aes(d = obs, m = pred)) +
geom_roc(n.cuts = 50, labels = FALSE, color = "#033E56") +
style_roc(theme = theme_grey) +
geom_abline(slope = 1, intercept = 0, size = 1.2, color = 'grey', linetype = "dashed") +
labs(title = "ROC Curve of Flood Inundation Model",
x = "False Positive Rate (1 - Specificity)",
y = "True Positive Rate (Sensitivity)") +
theme_minimal()
auc_value <- auc(testProbs$obs, testProbs$pred)
auc_value
## Area under the curve: 0.8059
The curve's bow away from the diagonal “random guess” line indicates good discriminatory power. The blue line shows relatively high sensitivity (true positive rate) at low false positive rates, suggesting that the model ranks inundated cells above non-inundated cells in most cases. The AUC of 0.8059 reflects good discrimination across all thresholds, even though sensitivity at the chosen 0.2 threshold is modest.
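One way to read the AUC is as the probability that a randomly chosen inundated cell receives a higher predicted probability than a randomly chosen non-inundated cell. The sketch below computes that pairwise quantity in base R on four toy scores invented for illustration; it is not how pROC computes the value internally, only an equivalent definition.

```r
# Toy scores for illustration only (not model output)
obs  <- c(0, 0, 1, 1)
pred <- c(0.10, 0.40, 0.35, 0.80)

pos <- pred[obs == 1]
neg <- pred[obs == 0]

# Compare every (positive, negative) pair; ties count as half
pairwise   <- outer(pos, neg, ">") + 0.5 * outer(pos, neg, "==")
auc_manual <- mean(pairwise)
print(auc_manual)
```

Three of the four positive-negative pairs are ordered correctly here, so the toy AUC is 0.75. By the same reading, an AUC of 0.8059 means the model ranks an inundated test cell above a non-inundated one about 81% of the time.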
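To assess the model’s generalizability and reduce overfitting risk, we applied 5-fold cross-validation. This method splits the training data set into five roughly equal parts (folds). The model is trained on four of the folds and validated on the fifth, and the process is repeated so that each fold serves once as the validation set. The reported metrics are averaged across the five runs.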
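The fold rotation can be sketched by hand in base R. The example below uses simulated data with a single predictor, so the variable names and numbers are assumptions for illustration only; the analysis itself relies on caret's trainControl() and train().

```r
set.seed(42)

# Simulated data for illustration only: one predictor, binary outcome
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-1 + 1.5 * x))
toy <- data.frame(x = x, y = y)

# Randomly assign each row to one of k folds of near-equal size
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))

# Train on k - 1 folds, validate on the held-out fold, and rotate
fold_accuracy <- sapply(seq_len(k), function(i) {
  train_rows <- toy[fold_id != i, ]
  valid_rows <- toy[fold_id == i, ]
  fit  <- glm(y ~ x, family = binomial(link = "logit"), data = train_rows)
  prob <- predict(fit, newdata = valid_rows, type = "response")
  mean(as.integer(prob > 0.5) == valid_rows$y)
})

# Cross-validated accuracy is the average over the k held-out folds
cv_accuracy <- mean(fold_accuracy)
print(round(fold_accuracy, 3))
print(round(cv_accuracy, 3))
```

Every row is used for validation exactly once, so the averaged accuracy is less dependent on any single train/test split.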
# Set up cross-validation control
train_control <- trainControl(method = "cv", number = 5) # 5-fold cross-validation
# Fit logistic regression model with normalized inputs
floodModel_cv <- train(as.factor(inundated_binary) ~ Nor_Elevation + Dis_River + Slope + Nor_Flow_accu + Developed + Forest + Grassland,
data = fishnetTrain %>% as.data.frame() %>% dplyr::select(-geometry), # disambiguate select()
method = "glm",
family = "binomial",
trControl = train_control)
print(floodModel_cv)
## Generalized Linear Model
##
## 14493 samples
## 7 predictor
## 2 classes: '0', '1'
##
## No pre-processing
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 11594, 11594, 11595, 11595, 11594
## Resampling results:
##
## Accuracy Kappa
## 0.9249984 0.283532
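In our model, cross-validation produced an accuracy of 92.5% and a Kappa statistic of 0.284. The high accuracy largely reflects the dominance of non-inundated cells in the data, whereas the low Kappa indicates only fair agreement beyond chance and leaves room for improvement in distinguishing inundated from non-inundated cells. Note that caret assigns classes here using its default 0.5 probability cut-off rather than the 0.2 threshold used for the confusion matrix, which likely contributes to the lower Kappa.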
The map displays the spatial distribution of predicted flood risk in Calgary, generated by applying the trained logistic regression model to the entire Calgary fishnet data set.
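For a single grid cell, predict() with type = "response" amounts to computing the linear predictor and applying the inverse logit. The sketch below does this by hand with the coefficients copied from the model summary. The cell values are hypothetical and chosen only to show the arithmetic; the land-cover units in particular are an assumption, since they are not stated in this section.

```r
# Coefficient estimates copied from the summary(floodModel) output
beta <- c(
  "(Intercept)" =  2.033682464,
  Nor_Elevation = -0.015811750,
  Dis_River     = -0.000440021,
  Slope         = -0.063419559,
  Nor_Flow_accu =  0.000462045,
  Developed     = -0.000073550,
  Forest        =  0.000045883,
  Grassland     = -0.000061328
)

# Hypothetical grid cell; these values are invented for illustration
cell <- c(
  "(Intercept)" = 1,
  Nor_Elevation = 100,
  Dis_River     = 500,
  Slope         = 2,
  Nor_Flow_accu = 50,
  Developed     = 10000,
  Forest        = 0,
  Grassland     = 0
)

# Linear predictor (log-odds), then inverse logit to get a probability
log_odds    <- sum(beta * cell[names(beta)])
probability <- plogis(log_odds)

# The report multiplies by 100 to express this as a percentage
print(round(probability * 100, 1))
```

The same calculation applies to Portland cells, which is what makes the Calgary-trained coefficients transferable as long as the predictors are prepared in the same way.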
fishnet_cal$calgaryPredictions <- predict(floodModel,
newdata = fishnet_cal %>%
as.data.frame() %>%
dplyr::select(Nor_Elevation, Dis_River, Slope, Nor_Flow_accu,
Developed, Forest, Grassland),
type = "response") * 100
fishnet_cal$risk_quantile <- ntile(fishnet_cal$calgaryPredictions, 5)
# Ordered factor so the legend runs from Very Low to Very High
risk_levels <- c("Very Low", "Low", "Moderate", "High", "Very High")
fishnet_cal <- fishnet_cal %>%
  mutate(risk_label = factor(risk_quantile, levels = 1:5, labels = risk_levels))
ggplot() +
  geom_sf(data = fishnet_cal, aes(fill = risk_label), colour = NA) +
  scale_fill_manual(
    # Named values tie each colour to its class (dark blue = lowest, orange-red = highest)
    values = setNames(my_colors, risk_levels),
    name = "Flood Risk (Quantile-Based)"
  ) +
  theme_void() +
  labs(title = "Predicted Flood Inundation Risk in Calgary") +
  theme(legend.position = "bottom")
# `col` and `color` are aliases for the same aesthetic, so only one is set
Elevation_por_plot <- ggplot() +
  geom_sf(data = fishnet_por, aes(fill = Nor_Elevation), color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9",
                       name = "Elevation (scaled)") +
  theme_void()
Slope_por_plot <- ggplot() +
  geom_sf(data = fishnet_por, aes(fill = Slope), color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9",
                       name = "Slope (%)") +
  theme_void()
Dis_River_por_plot <- ggplot() +
  geom_sf(data = fishnet_por, aes(fill = Dis_River), color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9",
                       name = "Distance (m)") +
  theme_void()
plot_grid(
ggdraw() + draw_label("Elevation", x = 0.5, hjust = 0.5),
ggdraw() + draw_label("Slope", x = 0.5, hjust = 0.5),
ggdraw() + draw_label("Distance to River", x = 0.5, hjust = 0.5),
Elevation_por_plot, Slope_por_plot, Dis_River_por_plot,
ncol = 3, rel_heights = c(0.1, 1),
align = "v", axis = "lr"
)
Flow_accu_por_plot <- ggplot() +
geom_sf(data = fishnet_por, aes(fill = Nor_Flow_accu), color = NA) +
scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Flow Accumulation") +
theme_void() +
theme(
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)
)
Landcover_por_plot <- ggplot() +
geom_sf(data = city_boundary_por, fill = "#e9e9e9", color = "grey80") +
geom_sf(data = developed_por, aes(fill = "Developed"), color = NA) +
geom_sf(data = forest_por, aes(fill = "Forest"), color = NA) +
geom_sf(data = grassland_por, aes(fill = "Grassland"), color = NA) +
scale_fill_manual(
name = "Land Cover Type",
values = c("Developed" = "#033E56", "Forest" = "#9AC3BB", "Grassland" = "#D3D477")
) +
theme_void() +
theme(
legend.title = element_text(size = 8),
legend.text = element_text(size = 8)
)
plot_grid(
ggdraw() + draw_label("Flow Accumulation", x = 0.5, hjust = 0.5, size = 11),
ggdraw() + draw_label("Landcover", x = 0.5, hjust = 0.5, size = 11),
Flow_accu_por_plot, Landcover_por_plot,
ncol = 2, rel_heights = c(0.1, 1),
align = "v", axis = "lr"
)
fishnet_por$portlandPredictions <- predict(floodModel,
newdata = fishnet_por %>%
as.data.frame() %>%
dplyr::select(Nor_Elevation, Dis_River, Slope, Nor_Flow_accu, Developed, Forest, Grassland),
type = "response") * 100
fishnet_por$risk_quantile <- ntile(fishnet_por$portlandPredictions, 5)
# Ordered factor so the legend runs from Very Low to Very High
risk_levels <- c("Very Low", "Low", "Moderate", "High", "Very High")
fishnet_por <- fishnet_por %>%
  mutate(risk_label = factor(risk_quantile, levels = 1:5, labels = risk_levels))
ggplot() +
  geom_sf(data = fishnet_por, aes(fill = risk_label), colour = NA) +
  scale_fill_manual(
    # Named values tie each colour to its class (dark blue = lowest, orange-red = highest)
    values = setNames(my_colors, risk_levels),
    name = "Flood Risk (Quantile-Based)"
  ) +
  theme_void() +
  labs(title = "Predicted Flood Inundation Risk in Portland") +
  theme(legend.position = "bottom")
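This map presents the predicted flood inundation risk in Portland, derived by applying the logistic regression model trained on Calgary’s 2013 flood data. The predicted probabilities were expressed as percentages and grouped into five quantile-based risk levels: Very Low, Low, Moderate, High, and Very High. Each level therefore contains roughly one fifth of the grid cells and describes risk relative to other cells in Portland rather than an absolute probability.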
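The relative nature of quantile classes can be seen in a small base R sketch. `equal_count_bins` is a hypothetical helper intended to match `dplyr::ntile()` when the number of cells divides evenly by the number of bins, and both sets of probabilities are invented for illustration.

```r
# Hypothetical helper: equal-count binning in base R. It is intended to
# match dplyr::ntile() when length(x) divides evenly by n_bins.
equal_count_bins <- function(x, n_bins = 5) {
  ceiling(rank(x, ties.method = "first") * n_bins / length(x))
}

risk_levels <- c("Very Low", "Low", "Moderate", "High", "Very High")

# Two invented sets of predicted probabilities (%): one low, one high
low_city  <- c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0)
high_city <- low_city + 60

low_labels  <- factor(equal_count_bins(low_city),
                      levels = 1:5, labels = risk_levels)
high_labels <- factor(equal_count_bins(high_city),
                      levels = 1:5, labels = risk_levels)

# Both sets receive identical labels despite very different probabilities
print(table(low_labels))
print(identical(low_labels, high_labels))
```

A cell labelled Very High is therefore in the top fifth of its own city, which is useful for prioritization but should be read alongside the underlying probabilities.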
The Calgary-Portland inundation prediction model uses logistic regression to estimate flood inundation risk based on key environmental and spatial predictors, including elevation, slope, distance to river, flow accumulation, and land cover. The model was trained on Calgary’s 2013 flood data, then validated and applied to both Calgary and Portland.
On the held-out test data the model reached an accuracy of 91.47%
and an AUC of 0.8059, indicating good discriminatory power, although
the accuracy is only slightly above the 91.18% no-information rate. The
confusion matrix revealed a trade-off between sensitivity (44.34%) and
specificity (96.03%): the model is much better at identifying
non-flooded areas than inundated ones. This weaker detection of flooded
zones is an important consideration when prioritizing areas for flood
inundation mitigation and preparation.
When applied to Portland, the model generated a flood risk map,
transferring the relationships learned in Calgary to a different city.
The prediction map highlights flood-prone areas that align with
low-lying regions and rivers in Portland. This illustrates the practical
value of spatial logistic regression in urban flood risk planning,
especially when inundation data are limited in the target region.
Overall, the model provides a useful, interpretable, and transferable tool for predicting flood inundation probability, especially in cities with similar environmental and urban characteristics. Future work could incorporate temporal rainfall data and more localized ground data to better detect inundation risk and support urban planning efforts.