Predict Inundation in Portland Using Inundation Model Trained on Calgary

1. Introduction

Flooding is one of the most costly and catastrophic natural hazards, especially in urban environments where impervious surfaces and population density increase exposure. Inundation risks are also projected to rise due to climate change, continued urban development, and aging infrastructure. In light of these risks, predictive modelling of flooding inundation can inform strategic planning and identify vulnerable area in Portland in advance.

This project develops a spatial logistic regression model using the 2013 Calgary flood as a reference event. The model includes geographical and environmental variables - including elevation, slope, flow accumulation, land cover, and distance to rivers - to estimate the probability of flood inundation across a city. These variables were derived using GIS-based pre-processing and extracted at the grid-cell level for analysis in R. The model was trained on 70% of Calgary data and tested on the remaining 30%. It was then applied to a comparable city - Portland, Oregon to evaluate flood inundation probability.

Key results indicate that all variables were statistically significant, and the model performed well in predicting inundated and non-inundated areas, especially good at identifying non-inundated zones, but with some underestimation of flood zones. The confusion matrix and ROC analysis confirmed the model’s ability to generalize beyond the training data. When applied to Portland, the model successfully identifies high inundation probability areas.

2. Setup

knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  cache = TRUE
)

knitr::opts_chunk$set(echo = TRUE)
options(scipen=999)
library(knitr)
library(kableExtra)
library(terra)
library(sf)
library(dplyr)
library(ggplot2)
library(raster)
library(scales)
library(cowplot)
library(grid)
library(gridExtra)
library(caret)
library(pscl)
library(plotROC)
library(pROC)
library(sf)
library(tidyverse)
library(kableExtra)
library(tigris)


# Define custom color gradient
my_colors <- c("#033E56", "#9AC3BB", "#D3D477",  "#FAAE41", "#ED6328")
my_colors_f <- c("#ED6328", "#FAAE41", "#D3D477",  "#9AC3BB", "#033E56")

2.1 Load spatial data, including city boundary and variables

To develop the model, spatial data sets are loaded for both Calgary and Portland, including Elevation, Slope, Flow Accumulation, Distance to River, and Land Cover (Developed, Forest, and Grassland).

The coordinate systems used in this study is:

GS_1984_Web_Mercator_Auxiliary_Sphere

2.2 Normalize elevation and flow accumulation data in two cities

When scaling elevation to a range of 0–300, the rationale comes from the elevation ranges of Calgary (967 – 1,290 m) and Portland (3 – 370 m), which have a difference of approximately 300 meters. This standardization ensures comparability across regions. Similarly, scaling flow accumulation is critical because, when applying the model across diverse regions with varying hydrologic networks, normalizing these continuous variables prevents large-magnitude predictors from dominating the model. This maintains consistent variable influence between Calgary and Portland, ensuring balanced and interpretable results.

fishnet_cal$Nor_Elevation <- rescale(fishnet_cal$Elevation, to = c(0, 300))
fishnet_cal$Nor_Flow_accu <- rescale(fishnet_cal$Flow_accu, to = c(0, 10000))

fishnet_por$Nor_Elevation <- rescale(fishnet_por$Elevation, to = c(0, 300))
fishnet_por$Nor_Flow_accu <- rescale(fishnet_por$Flow_accu, to = c(0, 10000))

3. Exploratory analysis

3.1 Selected Spatial Predictors: Feature Mapping in Calgary

To predict flood inundation risk, we selected five spatially derived variables:

Elevation: Lower elevations are more flood-prone due to more likely to accumulate water. In Calgary, areas inundated during the 2013 flood were closely aligned with low elevation zones.
Flow Accumulation: It estimates the amount of upstream area contributing to flow at any given cell. Higher values indicate potential overland flow convergence.
Distance to River: Calculated using Euclidean distance from the river network. Proximity to water bodies significantly influences inundation risk.
Slope: Steeper areas are less likely to retain water. Flat slopes may accumulate runoff.
Land Cover: Reclassified into three dominant categories — Developed, Forest, and Grassland — to capture infiltration capacity and runoff characteristics.

The maps below visualize these predictor variables over Calgary’s predicted model:

Elevation_cal_plot <- ggplot() + 
  geom_sf(data = fishnet_cal, aes(fill = Nor_Elevation), color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Elevation (m)") + 
  theme_void()

Slope_cal_plot <- ggplot() + 
  geom_sf(data = fishnet_cal, aes(fill = Slope), color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Slope (%)") +
  theme_void()

Dis_River_cal_plot <- ggplot() + 
  geom_sf(data = fishnet_cal, aes(fill = Dis_River), color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Distance (m)") + 
  theme_void()

plot_grid(
   ggdraw() + draw_label("Elevation", x = 0.5, hjust = 0.5),
   ggdraw() + draw_label("Slope", x = 0.5, hjust = 0.5),
   ggdraw() + draw_label("Distance to River", x = 0.5, hjust = 0.5),
   Elevation_cal_plot, Slope_cal_plot, Dis_River_cal_plot,
   ncol = 3, rel_heights = c(0.1, 1),
   align = "v", axis = "lr"
 )

Flow_accu_cal_plot <- ggplot() + 
  geom_sf(data = fishnet_cal, aes(fill = Nor_Flow_accu), color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Flow Accumulation") + 
  theme_void() +
  theme(
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 8)
  )

Landcover_cal_plot <- ggplot() +
  geom_sf(data = city_boundary_cal, fill = "#e9e9e9", color = "grey80") +
  geom_sf(data = developed_cal, aes(fill = "Developed"), color = NA) +
  geom_sf(data = forest_cal, aes(fill = "Forest"), color = NA) +
  geom_sf(data = grassland_cal, aes(fill = "Grassland"), color = NA) +
  scale_fill_manual(
    name = "Land Cover Type",
    values = c("Developed" = "#033E56", "Forest" = "#9AC3BB", "Grassland" = "#D3D477")
  ) +
  theme_void() +
  theme(
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 8)
  )

plot_grid(
   ggdraw() + draw_label("Landcover", x = 0.5, hjust = 0.5, size = 11),
   ggdraw() + draw_label("Flow Accumulation", x = 0.5, hjust = 0.5, size = 11),
   Landcover_cal_plot, Flow_accu_cal_plot, 
   ncol = 2, rel_heights = c(0.1, 1),
   align = "v", axis = "lr"
 )

4. Exploratory Data Analysis

4.1 Grouped Bar Plot

# Reshape fishnet data into long format for group mean calculation
floodEDA <- fishnet_cal %>%
  as.data.frame() %>%
  dplyr::select(inundated_binary, Elevation, Dis_River, Slope, Flow_accu, Developed, Forest, Grassland) %>%
  pivot_longer(cols = -inundated_binary, names_to = "variable", values_to = "value")

# Plot grouped bar charts of mean values
floodEDA %>%
  group_by(inundated_binary, variable) %>%
  summarize(mean = mean(value, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = as.factor(inundated_binary), y = mean, fill = as.factor(inundated_binary))) +
  geom_bar(stat = "identity") +
  facet_wrap(~variable, scales = "free") +
  scale_fill_manual(values = c("#033E56", "#9AC3BB"),
                    labels = c("Not Inundated", "Inundated"),
                    name = "") +
  labs(title = "Mean Predictor Values by Inundation Status",
       x = "Inundation Status", 
       y = "Mean Predictor Value") + 
    theme_minimal()

Group bar plot compares the mean of each predictor between flooded and non-flooded cells. Useful for identifying variables that may distinguish 0 vs. 1.

4.2 Violin Plot

floodEDA_violin <- floodEDA
# Violin plot to show full variable distribution by binary outcome
ggplot(floodEDA_violin) + 
  geom_violin(aes(x = as.factor(inundated_binary), y = value, fill = as.factor(inundated_binary)), alpha = 0.7) + 
  facet_wrap(~variable, scales = "free") +
  scale_fill_manual(values = c("#033E56", "#9AC3BB"),
                    labels = c("Not Inundated", "Inundated"),
                    name = "") +
  labs(title = "Distribution of Predictor Variables by Inundation Status",
       x = "Inundation Status", 
       y = "Value") + 
  theme_minimal()

Violin plot shows distribution shape and spread of each variable across classes, helping with detecting skewness or overlap.

5. Model Training

5.1 Splitting the Data into Training and Testing Sets

In our model, we used 70% of the Calgary fishnet grid cells as the training set to build the logistic regression model and the remaining 30% as the test set to evaluate its performance. This ensures that we can further assess how well our model generalizes to new, unseen data.

set.seed(3456)
trainIndex <- createDataPartition(fishnet_cal$inundated_binary, p = .70,
                                  list = FALSE,
                                  times = 1)
fishnetTrain <- fishnet_cal[trainIndex, ]
fishnetTest <- fishnet_cal[-trainIndex, ]

5.2 Building the Logistic Regression Model

The logistic regression model estimates the probability of flood inundation in Calgary based on several variables, including elevation (normalized), distance to river, slope, flow accumulation(normalized), and land cover types (developed, forest, grassland). The model was fitted using the glm() function with a binomial logit link.

# Logistic regression model using normalized values
floodModel <- glm(inundated_binary ~ Nor_Elevation + Dis_River + Slope + Nor_Flow_accu + 
                                  Developed + Forest + Grassland, 
                  family = binomial(link = "logit"), 
                  data = fishnetTrain %>%
                         as.data.frame() %>%
                    dplyr::select(-geometry))

summary(floodModel)

## 
## Call:
## glm(formula = inundated_binary ~ Nor_Elevation + Dis_River + 
##     Slope + Nor_Flow_accu + Developed + Forest + Grassland, family = binomial(link = "logit"), 
##     data = fishnetTrain %>% as.data.frame() %>% dplyr::select(-geometry))
## 
## Coefficients:
##                   Estimate   Std. Error z value             Pr(>|z|)    
## (Intercept)    2.033682464  0.183742508  11.068 < 0.0000000000000002 ***
## Nor_Elevation -0.015811750  0.001006577 -15.708 < 0.0000000000000002 ***
## Dis_River     -0.000440021  0.000047496  -9.264 < 0.0000000000000002 ***
## Slope         -0.063419559  0.020538595  -3.088              0.00202 ** 
## Nor_Flow_accu  0.000462045  0.000050111   9.220 < 0.0000000000000002 ***
## Developed     -0.000073550  0.000004461 -16.488 < 0.0000000000000002 ***
## Forest         0.000045883  0.000010964   4.185            0.0000285 ***
## Grassland     -0.000061328  0.000004846 -12.656 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8477.4  on 14492  degrees of freedom
## Residual deviance: 6587.8  on 14485  degrees of freedom
## AIC: 6603.8
## 
## Number of Fisher Scoring iterations: 6

# Extract coefficients from the model summary
model_summary <- coef(summary(floodModel))  

# Convert to a data frame and round decimals for readability
model_table <- as.data.frame(model_summary) %>%
  mutate(
    Estimate = round(Estimate, 4),
    `Std. Error` = round(`Std. Error`, 4),
    `z value` = round(`z value`, 4),
    `Pr(>|z|)` = round(`Pr(>|z|)`, 4)
  )
# Display the table using kable
kable(model_table, caption = "Logistic Regression Model Summary", col.name=c("Variables", 'Estimate','Standard Deviation Error','Z value', 'Pr')) %>%
  kable_styling("striped", full_width = T)

Logistic Regression Model Summary
Variables	Estimate	Standard Deviation Error	Z value	Pr
(Intercept)	2.0337	0.1837	11.0681	0.000
Nor_Elevation	-0.0158	0.0010	-15.7084	0.000
Dis_River	-0.0004	0.0000	-9.2644	0.000
Slope	-0.0634	0.0205	-3.0878	0.002
Nor_Flow_accu	0.0005	0.0001	9.2204	0.000
Developed	-0.0001	0.0000	-16.4883	0.000
Forest	0.0000	0.0000	4.1849	0.000
Grassland	-0.0001	0.0000	-12.6559	0.000

The result indicate that all variables included in the model are statistically significant, with p-values below 0.01, suggesting strong associations with flood risk. In addition, elevation, distance to river, slope, developed land, and grassland all have negative coefficients, meaning they are associated with decreased odds of inundation. In contrast, flow accumulation and forest area are positively associated with higher flood likelihood.

6. Model Validation

6.1 Density of Predicted Flood Risk by Inundation Outcome

classProbs <- predict(floodModel, newdata = fishnetTest, type = "response")

hist(classProbs,
     main = "Distribution of Predicted Flood Probabilities",
     xlab = "Predicted Probability",
     col = "#033E56", border = "white")

The overall distribution of predicted flood probabilities generated by the logistic regression model for the test set. Most of the predicted probabilities are clustered near zero, indicating that the model classifies the majority of cells as not inundated.

testProbs <- data.frame(obs = fishnetTest$inundated_binary, pred = classProbs)
# Density plot
ggplot(testProbs, aes(x = pred, fill = as.factor(obs))) + 
  geom_density(alpha=0.6) +
  facet_grid(obs ~ .) + 
  xlab("Predicted Probability of Inundation") +
  ylab("Density") +
  geom_vline(xintercept = 0.5, linetype="dashed") +
  scale_fill_manual(values = c("#033E56", "#9AC3BB"),
                    labels = c("Not Inundated", "Inundated"),
                    name = "Observed") +
  ggtitle("Predicted Probability vs. Actual Flood Inundation") +
  theme_minimal()

This density plot shows the distribution of predicted flood probabilities generated by the logistic regression model, separated by the actual observed inundation status (0 = Not Inundated, 1 = Inundated). It reveals that non-flooded cells cluster around low probabilities, while flooded cells have a broader spread, though many still fall below 0.5. Together, these plots suggest the model is good at identifying non-inundated areas but less confident for flooded zones, with some overlap between classes.

6.2 Confusion Metrics

testProbs$predClass <- ifelse(testProbs$pred > 0.2, 1, 0)
confusionMatrix(reference = as.factor(testProbs$obs),
                data = as.factor(testProbs$predClass),
                positive = "1")

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1
##          0 5438  305
##          1  225  243
##                                           
##                Accuracy : 0.9147          
##                  95% CI : (0.9074, 0.9215)
##     No Information Rate : 0.9118          
##     P-Value [Acc > NIR] : 0.2175389       
##                                           
##                   Kappa : 0.4322          
##                                           
##  Mcnemar's Test P-Value : 0.0006002       
##                                           
##             Sensitivity : 0.44343         
##             Specificity : 0.96027         
##          Pos Pred Value : 0.51923         
##          Neg Pred Value : 0.94689         
##              Prevalence : 0.08823         
##          Detection Rate : 0.03912         
##    Detection Prevalence : 0.07535         
##       Balanced Accuracy : 0.70185         
##                                           
##        'Positive' Class : 1               
##

For this model, a threshold of 0.2 was chosen to classify areas as flood-inundated. This lower threshold was selected to prioritize sensitivity. While this increases the chance of false positives (predicting flood where none occurs), it reduces the risk of missing true flood zones, which is important in disaster preparation and planning.

The model achieved an overall accuracy of 91.47%. The sensitivity (true positive) is 44.34%, indicating that the model correctly identified approximately 44% of truly inundated locations. In contrast, the specificity is 96.03%, showing its strong performance in correctly identifying non-inundated areas. The Kappa statistic of 0.43 suggests moderate agreement between predicted and actual classes beyond chance.

6.2.1 Interpretation of Confusion Matrix Outcomes

True Positive (TP): The model correctly predicted an area as inundated, and it was actually inundated.
False Positive (FP): The model predicted an area as inundated, but it was not actually inundated.
True Negative (TN): The model correctly predicted an area as not inundated, and it was truly not inundated.
False Negative (FN): The model predicted an area as not inundated, but it was actually inundated.

# Table of confusion counts
conf_mat <- confusionMatrix(reference = as.factor(testProbs$obs),
                            data = as.factor(testProbs$predClass),
                            positive = "1")
# Table of confusion counts
cm_table <- conf_mat$table
TN <- cm_table[1,1]; FP <- cm_table[2,1]
FN <- cm_table[1,2]; TP <- cm_table[2,2]

confusion_counts <- data.frame(
  Outcome = c("True Positive (TP)", "False Positive (FP)", 
              "True Negative (TN)", "False Negative (FN)"),
  Count = c(TP, FP, TN, FN)
)

kable(confusion_counts, caption = "Confusion Matrix Breakdown") %>%
  kable_styling("striped", full_width = T)

Confusion Matrix Breakdown
Outcome	Count
True Positive (TP)	243
False Positive (FP)	225
True Negative (TN)	5438
False Negative (FN)	305

6.2.2 Confusion Matrix Map: Spatial Distribution of Model Predictions

The map visualizes the spatial distribution of model prediction outcomes for each fishnet cell in Calgary. This full-area map is based on predictions using a threshold of 0.2, providing insights into where the model is performing well and where it is making errors.

fishnet_cal$predicted_prob <- predict(floodModel, newdata = fishnet_cal, type = "response")
fishnet_cal$predicted_class <- ifelse(fishnet_cal$predicted_prob > 0.2, 1, 0)
fishnet_cal <- fishnet_cal %>%
  mutate(confusion_type = case_when(
    inundated_binary == 1 & predicted_class == 1 ~ "True Positive",
    inundated_binary == 0 & predicted_class == 0 ~ "True Negative",
    inundated_binary == 0 & predicted_class == 1 ~ "False Positive",
    inundated_binary == 1 & predicted_class == 0 ~ "False Negative",
    TRUE ~ NA_character_
  ))
confusion_colors <- c(
  "True Positive" = "#9AC3BB",    
  "True Negative" = "#033E56",   
  "False Positive" = "#ED6328",   
  "False Negative" = "#FAAE41"  
)

# Map
ggplot(data = fishnet_cal) +
  geom_sf(aes(fill = confusion_type), color = NA, size = 0.01) +
  scale_fill_manual(values = confusion_colors, name = "Results") +
  labs(title = "Confusion Metrics",
       subtitle = "Spatial Classification Based") +
  theme_void() +
  theme(legend.position = "right",
        plot.title = element_text(size = 14, face = "bold"))

Purple is the area correctly identified as inundated, while light gray area indicates correct non-flood predictions. Blue shows where the model predicted flooding but no inundation occurred, and red is missed flood areas that were incorrectly classified as dry. The map shows that many True Positives align with river-adjacent zones, but there are still notable clusters of False Negatives, highlighting areas where the model underestimates the risk — a key concern for flood inundation risk mitigation.

6.3 Receiver Operating Characteristic (ROC) Curve & Area Under the Curve (AUC)

The ROC curve visualizes the model’s ability to distinguish between flooded (positive) and non-flooded (negative) areas across all classification thresholds.

ggplot(testProbs, aes(d = obs, m = pred)) + 
  geom_roc(n.cuts = 50, labels = FALSE, color = "#033E56") + 
  style_roc(theme = theme_grey) +
  geom_abline(slope = 1, intercept = 0, size = 1.2, color = 'grey', linetype = "dashed") +
  labs(title = "ROC Curve of Flood Inundation Model",
       x = "False Positive Rate (1 - Specificity)",
       y = "True Positive Rate (Sensitivity)") +
  theme_minimal()

auc_value <- auc(testProbs$obs, testProbs$pred)
auc_value

## Area under the curve: 0.8059

The ROC curvature and distance from the diagonal “random guess” line indicate strong discriminatory power. The blue line shows high sensitivity (true positive rate) at low false positive rates, suggesting that the model performs well in identifying flooded areas. The AUC is 0.8059, reflecting a good discriminatory ability.

6.4 Cross-Validation

To assess the model’s generalizability and reduce overfitting risk, we applied 5-fold cross-validation. This method involves splitting the training dataset into five equal parts (folds). The model is trained on four of the folds and validated on the fifth.

# Set up cross-validation control
train_control <- trainControl(method = "cv", number = 5)  # 5-fold cross-validation

# Fit logistic regression model with normalized inputs
floodModel_cv <- train(as.factor(inundated_binary) ~ Nor_Elevation + Dis_River + Slope + Nor_Flow_accu + Developed + Forest + Grassland,
                       data = fishnetTrain %>% as.data.frame() %>% dplyr::select(-geometry),  # disambiguate select()
                       method = "glm",
                       family = "binomial",
                       trControl = train_control)
print(floodModel_cv)

## Generalized Linear Model 
## 
## 14493 samples
##     7 predictor
##     2 classes: '0', '1' 
## 
## No pre-processing
## Resampling: Cross-Validated (5 fold) 
## Summary of sample sizes: 11594, 11594, 11595, 11595, 11594 
## Resampling results:
## 
##   Accuracy   Kappa   
##   0.9249984  0.283532

In our model, cross-validation generated an accuracy of 92.5% and a Kappa statistic of 0.284. The high accuracy confirms the model’s strong ability to correctly predict both inundated and non-inundated areas overall. However, the moderate Kappa indicates room for improvement in distinguishing the inundated area from the non-inundated class.

7. Predict Maps

7.1 Predictive Maps for Calgary

The map displays the spatial distribution of predicted flood risk in Calgary, generated by applying the trained logistic regression model to the entire Calgary fishnet data set.

fishnet_cal$calgaryPredictions <- predict(floodModel, 
                                          newdata = fishnet_cal %>% 
                                            as.data.frame() %>%
                                            dplyr::select(Nor_Elevation, Dis_River, Slope, Nor_Flow_accu,
                                                          Developed, Forest, Grassland),
                                          type = "response") * 100

fishnet_cal$risk_quantile <- ntile(fishnet_cal$calgaryPredictions, 5)
fishnet_cal <- fishnet_cal %>%
  mutate(risk_label = case_when(
    risk_quantile == 1 ~ "Very Low",
    risk_quantile == 2 ~ "Low",
    risk_quantile == 3 ~ "Moderate",
    risk_quantile == 4 ~ "High",
    risk_quantile == 5 ~ "Very High"
  ))
ggplot() + 
  geom_sf(data = fishnet_cal, aes(fill = risk_label), colour = NA) +
  scale_fill_manual(
    values = my_colors_f,
    name = "Flood Risk (Quantile-Based)"
  ) +
  theme_void() +
  labs(
    title = "Predicted Flood Inundation Risk in Calgary",
  ) +
  theme(legend.position = "bottom")

7.2 Predictive Maps for Portland

7.2.1 Input Spatial Variables in Portland

Elevation_por_plot <- ggplot() + 
  geom_sf(data = fishnet_por, aes(fill = Nor_Elevation), col = "transparent", color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9",
                    name = "Elevation (m)") + 
  theme_void()

Slope_por_plot <- ggplot() + 
  geom_sf(data = fishnet_por, aes(fill = Slope), col = "transparent", color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9",
                    name = "Slope (%)") +
  theme_void()

Dis_River_por_plot <- ggplot() + 
  geom_sf(data = fishnet_por, aes(fill = Dis_River), col = "transparent", color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9",
                    name = "Distance (m)") + 
  theme_void()

plot_grid(
   ggdraw() + draw_label("Elevation", x = 0.5, hjust = 0.5),
   ggdraw() + draw_label("Slope", x = 0.5, hjust = 0.5),
   ggdraw() + draw_label("Distance to River", x = 0.5, hjust = 0.5),
   Elevation_por_plot, Slope_por_plot, Dis_River_por_plot,
   ncol = 3, rel_heights = c(0.1, 1),
   align = "v", axis = "lr"
 )

Flow_accu_por_plot <- ggplot() + 
  geom_sf(data = fishnet_por, aes(fill = Nor_Flow_accu), color = NA) +
  scale_fill_gradientn(colors = my_colors, na.value = "#e9e9e9", name = "Flow Accumulation") + 
  theme_void() +
  theme(
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 8)
  )

Landcover_por_plot <- ggplot() +
  geom_sf(data = city_boundary_por, fill = "#e9e9e9", color = "grey80") +
  geom_sf(data = developed_por, aes(fill = "Developed"), color = NA) +
  geom_sf(data = forest_por, aes(fill = "Forest"), color = NA) +
  geom_sf(data = grassland_por, aes(fill = "Grassland"), color = NA) +
  scale_fill_manual(
    name = "Land Cover Type",
    values = c("Developed" = "#033E56", "Forest" = "#9AC3BB", "Grassland" = "#D3D477")
  ) +
  theme_void() +
  theme(
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 8)
  )

plot_grid(
   ggdraw() + draw_label("Flow Accumulation", x = 0.5, hjust = 0.5, size = 11),
   ggdraw() + draw_label("Landcover", x = 0.5, hjust = 0.5, size = 11),
   Flow_accu_por_plot, Landcover_por_plot, 
   ncol = 2, rel_heights = c(0.1, 1),
   align = "v", axis = "lr"
 )

7.2.2 Prediction of Flood Inundation Probability in Portland

fishnet_por$portlandPredictions <- predict(floodModel, 
                                           newdata = fishnet_por %>% 
                                             as.data.frame() %>% 
                                             dplyr::select(Nor_Elevation, Dis_River, Slope, Nor_Flow_accu, Developed, Forest, Grassland), 
                                           type = "response") * 100

fishnet_por$risk_quantile <- ntile(fishnet_por$portlandPredictions, 5)

fishnet_por <- fishnet_por %>%
  mutate(risk_label = case_when(
    risk_quantile == 1 ~ "Very Low",
    risk_quantile == 2 ~ "Low",
    risk_quantile == 3 ~ "Moderate",
    risk_quantile == 4 ~ "High",
    risk_quantile == 5 ~ "Very High"
  ))

ggplot() + 
  geom_sf(data = fishnet_por, aes(fill = risk_label), colour = NA) +
  scale_fill_manual(
    values = my_colors_f,
    name = "Flood Risk (Quantile-Based)"
  ) +
  theme_void() +
  labs(
    title = "Predicted Flood Inundation Risk in Portland",
  ) +
  theme(legend.position = "bottom")

This map presents the predicted flood inundation risk in Portland, derived by applying the logistic regression model trained on Calgary’s 2013 flood data. The predicted probabilities were scaled and categorized into five quantile-based risk levels: Very Low, Low, Moderate, High, and Very High.

8. Summary

The Calgary-Portland inundation prediction model uses logistic regression to estimate flood inundation risk based on key environmental and spatial predictors, including elevation, slope, distance to river, flow accumulation, and land cover. The model was trained on Calgary’s 2013 flood data, then validated and applied to both Calgary and Portland.

The model showed strong overall performance, with an accuracy of 91.47% on the test data and an AUC score of 0.8059, indicating good discriminatory power. The confusion matrix analysis revealed a trade-off between sensitivity (44.34%) and specificity (96.03%), reflecting the model’s greater strength in identifying non-flooded areas than inundated areas. The model is more uncertain in distinguishing flooded zones — an important consideration when prioritizing areas for flood inundation mitigation and preparation.

When applied to Portland, the model generated a flood risk map, successfully transferring Calgary-trained insights to a different city. The prediction map identified flood-prone areas aligned with low-lying regions and rivers in Portland. This demonstrates the practical value of spatial logistic regression in urban flood risk planning, especially when inundation data may be limited in the target region.

Overall, the model provides a useful, interpretative, and transferable tool for flood inundation probability prediction, especially in cities with similar environmental and urban characteristics. Future study may include incorporating temporal rainfall data and more localized ground data to better detect inundation risk and support urban planning efforts.