Final Project

Flood Risk Concerns in North Carolina Counties

Introduction

In this project, I study the flood risk scores of North Carolina counties. This topic is important because flooding can harm communities, cities, and homes in many ways, including polluting drinking water systems and causing environmental problems. Analyzing flood risk and the damage floods cause allows us to better prepare for them and understand why they happen. I measure how likely a North Carolina county is to experience flooding by analyzing the county’s risk score. A risk score is a standardized metric for the likelihood that an individual unit (here, a county) will experience a particular outcome.

Data Preparation

library(tidyverse)
library(arcpullr)
library(sf)
library(tigris)
library(tmap)
library(dplyr)
library(readr)
library(spdep)
library(stringr)
library(knitr)
library(kableExtra)

## Source the download script 
source("../PROJECT/Codes/download_project_data.R")

## Read in Files
dat<- read_csv("../raw_data/NRI_Table_Counties.csv")

Nc_Boundaries_shp <- read_sf("../PROJECT/raw_data/North_Carolina_State_and_County_Boundary_Polygons.shp")


## Select Columns from Each Data Set 
coastal_flood_data <- select(dat,
                             "OID_",
                             "STATE", 
                             "STATEABBRV", 
                             "COUNTY", 
                             "COUNTYFIPS", 
                             "POPULATION", 
                             "AREA", 
                             "RISK_VALUE", 
                             "RISK_SCORE", 
                             "RISK_RATNG", 
                             "CFLD_AFREQ", 
                             "CFLD_EXP_AREA", 
                             "CFLD_RISKV", 
                             "CFLD_RISKS", 
                             "CFLD_RISKR")

NC_Counties_dat <- select(Nc_Boundaries_shp, 
                          "OBJECTID",
                          "County",
                          "FIPS",
                          "Shape__Are",
                          "Shape__Len",
                          "GlobalID",
                          "geometry")

## Rename Each Data Sets Columns 
coastal_flood_data <- coastal_flood_data %>%
  rename(ANNUAL_FREQ = CFLD_AFREQ,
         RISK_INDEX_VALUE = CFLD_RISKV,
         RISK_INDEX_SCORE = CFLD_RISKS,
         RISK_INDEX_RATE = CFLD_RISKR)

NC_Counties_dat <- NC_Counties_dat %>%
  rename(COUNTY = County,
         AREA = Shape__Are,
         LENGTH = Shape__Len) 
    
## Mutate Columns of Coastal Flooding Data 
coastal_flood_data <- coastal_flood_data %>%
  mutate( 
    AnnualFrequency = case_when(
      RISK_RATNG %in% c("Relatively Moderate", "Relatively High", "High", "Very High") ~ ANNUAL_FREQ, TRUE ~ 0))
    

## Filter Data Set to Only Include North Carolina Counties 
coastal_flood_data <- coastal_flood_data %>% filter(STATE %in% c("North Carolina"))


## Join Data to Create a Single Table for Risk Score Map 1
NC_Counties_sp <- left_join(NC_Counties_dat,
                   coastal_flood_data,
                   by = c("COUNTY" = "COUNTY"))

The study area of my analysis is the counties of North Carolina, with a special interest in the counties that have a high risk of flooding. The National Risk Index provided the numerical data on flood risk for all U.S. states and counties, and NC OneMap provided the spatial data for North Carolina counties. The NC OneMap polygon data was downloaded as a zip file and read in as a shapefile. Once both data sets were downloaded, the next step was to select and rename the columns relevant to my research. I then filtered the National Risk Index data to include only counties within North Carolina and joined the two data sets on county name into a single table, which holds everything needed to complete the analysis.
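Because the join matches on county name, a difference in spelling or capitalization between the two sources would silently leave a county with NA values. A minimal sketch of a pre-join check follows, written in base R so it runs on its own. The data frames `boundaries` and `risk` and every value in them are made up for illustration and are not the project data.

```r
# Toy stand-ins for the boundary table and the risk table (illustrative values only).
boundaries <- data.frame(COUNTY = c("Clay", "New Hanover", "McDowell"),
                         stringsAsFactors = FALSE)
risk <- data.frame(COUNTY     = c("Clay", "New Hanover", "Mcdowell"),
                   RISK_SCORE = c(10, 90, 50),
                   stringsAsFactors = FALSE)

# County names in the boundary table with no exact match in the risk table.
unmatched <- setdiff(boundaries$COUNTY, risk$COUNTY)
print(unmatched)

# Left join: keep every boundary row and fill NA where no match exists.
joined <- merge(boundaries, risk, by = "COUNTY", all.x = TRUE)

# Number of counties that ended up without a risk score.
n_missing <- sum(is.na(joined$RISK_SCORE))
print(n_missing)
```

If `unmatched` is not empty, the names can be harmonized before joining, or the join can be done on a county FIPS code when both tables carry one in the same format.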

Exploratory Spatial Data Analysis

Data Description and Summary

## Number of Total Observations
num_obs <- nrow(NC_Counties_sp)

## Number of NA Observations for Risk Score Column
na_obs <- sum(is.na(NC_Counties_sp$RISK_SCORE))

## Calculate Mean Flood Risk Scores Through ALL NC Counties 
mean_floodrisk_nc <- mean(NC_Counties_sp$RISK_SCORE)

## Calculate Minimum Flood Risk Scores Through ALL NC Counties 
min_floodrisk_nc <- min(NC_Counties_sp$RISK_SCORE)
  
## Calculate Maximum Flood Risk Scores Through ALL NC Counties 
max_floodrisk_nc <- max(NC_Counties_sp$RISK_SCORE)

## County with the Minimum Flood Risk Score 
index_lowest_risk <- which.min(NC_Counties_sp$RISK_SCORE)
county_lowest_risk <- NC_Counties_sp$COUNTY[index_lowest_risk]

## County with the Maximum Flood Risk Score 
index_highest_risk <- which.max(NC_Counties_sp$RISK_SCORE)
county_highest_risk <- NC_Counties_sp$COUNTY[index_highest_risk]        

## Make Histogram That Shows the Number of Counties within each risk score
NC_Counties_sp |>
  ggplot(mapping = aes(x = RISK_SCORE)) + 
  geom_histogram(binwidth = 1,
                 na.rm = TRUE,
                 color = "black",
                 fill = "lightblue") +
  labs(title = "Flood Risk Scores of North Carolina Counties", 
       y = "Number of Counties",
       x = "Flood Risk Scores")

## Create Kable Table That Shows the Descriptive Statistics
des_table <- data.frame(
  Measure = c("Number of Total Observations", "Number of NA Observations", "Mean Flood Risk Score", "Min Flood Risk Score", "Max Flood Risk score"),
  Value = c(num_obs,na_obs, mean_floodrisk_nc, min_floodrisk_nc, max_floodrisk_nc),
  County = c("","","", county_lowest_risk, county_highest_risk)
)

kable(des_table, "html") |>
  kable_styling(full_width = FALSE,
                latex_options = "hold_position",
                position = "center") |>
  add_header_above(c("Descriptive Statistics of Flood Risk Scores" = 3), escape = FALSE)

Descriptive Statistics of Flood Risk Scores
Measure	Value	County
Number of Total Observations	100.000000
Number of NA Observations	0.000000
Mean Flood Risk Score	66.529431
Min Flood Risk Score	7.572383	Clay
Max Flood Risk score	98.504613	New Hanover

My variable of interest is the flood risk score of each North Carolina county. The data contains 100 observations, one for each county, and none of them has a missing (NA) risk score. The mean flood risk score across North Carolina counties is 66.53. The county with the lowest flood risk score is Clay, with a score of 7.57, and the county with the highest flood risk score is New Hanover, with a score of 98.50.

In the histogram, the x-axis represents the risk score, where a higher score means a greater risk of flooding, and the y-axis represents the number of counties. The histogram shows that more North Carolina counties have high risk scores than low ones, which is consistent with the mean flood risk score of 66.53.
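The summary statistics here work without extra arguments only because no risk score is missing. As a small sketch of how the same calculations behave when a value is NA, the example below uses made-up scores and placeholder county labels (`scores` and `counties` are illustrative, not the project data).

```r
# Toy risk scores with one missing value (made-up numbers).
scores   <- c(12.5, 48.0, NA, 91.3, 66.2)
counties <- c("A", "B", "C", "D", "E")

# Without na.rm = TRUE, a single NA makes the mean NA.
mean_naive <- mean(scores)
mean_clean <- mean(scores, na.rm = TRUE)

# which.min() and which.max() ignore NA values and return the
# position of the smallest and largest remaining value.
lowest  <- counties[which.min(scores)]
highest <- counties[which.max(scores)]

print(c(mean_naive = mean_naive, mean_clean = mean_clean))
print(c(lowest = lowest, highest = highest))
```

Passing `na.rm = TRUE` to `mean()`, `min()`, and `max()` would keep the descriptive table valid if a later data update introduced missing scores.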

Geographic Distribution and Clustering

## Put tmap in view/interactive mode 
tmap_mode("view")

## Make Map of All Counties Risk Ratings 
 map_1 <- tm_shape(NC_Counties_sp) + 
           tm_polygons("RISK_SCORE",
                 title = "Risk of Flooding Across North Carolina Counties",
                 style = "jenks", 
                 n = 5,
                 palette = "YlOrRd",
                 alpha = 0.9,
                 border.col = "black",
                 border.alpha = 0.3) +
          tm_layout(title = "North Carolina",
                legend.outside = TRUE,
                 frame = FALSE) +
   tm_scale_bar() +
   tm_view(view.legend.position = c("left", "bottom"))

## Create link to basemap background 
bg_basemap <- "https://{s}.basemaps.cartocdn.com/rastertiles/voyager/{z}/{x}/{y}{r}.png"

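## Analyze Moran's I and LISA spatial autocorrelation
## (reads the nc.shp county shapefile bundled with the sf package; SID74 is a column of that file)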
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

## Remove NA regions 
nc_c <- nc |>
  filter(!is.na(SID74))

# Create a spatial weights matrix
W <- poly2nb(nc)
W <- nb2listw(W, style = "W")

# Calculate Moran's I
moran.I <- moran.test(nc$SID74, 
                      W,
                      zero.policy = TRUE)
# Print summary to screen
print(moran.I)

## 
##  Moran I test under randomisation
## 
## data:  nc$SID74  
## weights: W    
## 
## Moran I statistic standard deviate = 2.5192, p-value = 0.00588
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance 
##       0.147740529      -0.010101010       0.003925567

## Moran's I value
moran_value <- moran.I$estimate[1]

## P-Value for Moran's I 
moran_p_value <- moran.I$p.value

# Calculate Local Indicators of Spatial Association (LISA)
lisa <- localmoran(nc$SID74,                        # Variable being analyzed
                   listw = W,                       # Spatial weights object
                   alternative = "two.sided",       # Test for clustering or dispersion
                   zero.policy = TRUE) |>           # Allow regions with no neighbors
  as_tibble() |>                                    # Convert to a tibble
  mutate(across(everything(), as.vector))           # Strip attributes from localmoran output

# Add required values for the LISA category
lisa <- lisa |>
  mutate(SCVAR =  scale(nc$SID74) |> as.vector(),       # Standardized data column
         LAGVAR = lag.listw(W, scale(nc$SID74)),        # Spatial lag of the standardized column
         LISACAT = case_when(SCVAR >= 0 & LAGVAR >= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 1,
                             SCVAR <= 0 & LAGVAR <= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 2,
                             SCVAR >= 0 & LAGVAR <= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 3,
                             SCVAR <= 0 & LAGVAR >= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 4,
                             `Pr(z != E(Ii))` > 0.05 ~ 5))


# Add labels based on the values 
lisa <- lisa |>
  mutate(CATNAME = case_when(LISACAT == 1 ~ "HH",
                             LISACAT == 2 ~ "LL",
                             LISACAT == 3 ~ "HL",
                             LISACAT == 4 ~ "LH",
                             LISACAT == 5 ~ "Not Significant"))

# Create a Table of Lisa Statistics 
table(lisa$CATNAME)

## 
##              HH              LH Not Significant 
##               5               1              94

# Add LISA category column to the spatial data 
nc <- nc |>
  mutate(LISACAT = lisa$LISACAT,
         CATNAME = lisa$CATNAME)
# Create a LISA map
lisa_tmap <- tm_shape(nc_c) +
  tm_polygons(col = "grey50") +
  tm_shape(nc) + 
  tm_polygons("LISACAT", 
              title = "LISA Category",
              breaks = c(1, 2, 3, 4, 5, 6),
              palette =  c("red", 
                           "blue", 
                           "lightpink", 
                           "skyblue", 
                           "grey90"),
              colorNA = "white",
              labels = c("High-High", 
                         "Low-Low",
                         "High-Low",
                         "Low-High", 
                         "Not significant"),
              border.col = "black", 
              border.alpha = 0.25) +
  tm_layout(frame = FALSE,
            legend.outside = TRUE)

# This command maps them together!
final_maps <- tmap_arrange(map_1,
             lisa_tmap,
             nrow = 1, 
             ncol = 2, 
             sync = TRUE)

# Display map
final_maps

The choropleth map makes it easy to see the clustering of counties that have a higher risk of flooding. The map is drawn from the flood risk score variable, so the color of each county shows its risk score. According to the legend, the darkest red represents the areas with the highest flood risk and the lightest yellow/tan represents the areas where flooding is least likely. The southeast of North Carolina has the highest risk of flooding, while the western counties do not face a major risk.

Moran's I measures the spatial autocorrelation within data. The Moran's I statistic computed above is 0.1477, compared with an expected value of -0.0101 under no spatial autocorrelation. Because the value is positive, it indicates a spatial pattern in which high values tend to be close to other high values and low values close to other low values, although a value this near zero points to modest rather than strong clustering. The p-value associated with the Moran's I standard deviate (2.5192) indicates the statistical significance of the spatial autocorrelation; here it is 0.0059. This is below the commonly used significance level of 0.05, so the null hypothesis of no spatial autocorrelation is rejected. Taken together, the statistic and its p-value indicate that similar values of the variable tend to cluster together in space.
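To make the statistic concrete, here is a minimal base R sketch that computes global Moran's I by hand with row-standardized weights, the same weighting style as `style = "W"`. The four-region layout, the values in `x`, and the names `A`, `W_mat`, and `moran_i` are assumptions chosen for illustration; this is not the `spdep` implementation and does not reproduce the test's variance or p-value.

```r
# Four regions in a row (1-2-3-4); each region neighbors the ones next to it.
x <- c(1, 2, 3, 4)                      # made-up values that rise steadily

A <- matrix(0, 4, 4)
A[cbind(c(1, 2, 3), c(2, 3, 4))] <- 1   # links 1-2, 2-3, 3-4
A <- A + t(A)                           # make the links symmetric

# Row-standardize so each region's neighbor weights sum to 1.
W_mat <- A / rowSums(A)

z  <- x - mean(x)                       # deviations from the mean
n  <- length(x)
S0 <- sum(W_mat)                        # sum of all weights

# Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2
moran_i  <- (n / S0) * sum(W_mat * outer(z, z)) / sum(z^2)

# Expected value under no spatial autocorrelation.
expected <- -1 / (n - 1)

print(c(moran_i = moran_i, expected = expected))
```

In this toy layout neighboring values are similar, so the statistic (0.4) sits above its expectation (about -0.33). With 100 counties the expectation is -1/99, which matches the -0.0101 printed by `moran.test()`.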

The LISA map provides information on local spatial patterns and helps identify clusters and spatial outliers within the data. The LISA analysis shows that 94 of the 100 counties have no significant local spatial pattern. Of the 6 that do, 5 are High-High (HH) counties, meaning that a high value is surrounded by neighbors with high values, which is local positive autocorrelation. The remaining county is Low-High (LH), meaning that a low value is surrounded by neighbors with high values, which is local negative autocorrelation. This LH county is the spatial outlier within the data.
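The category labels follow from three inputs per county: its standardized value, the spatial lag of that value (the average over its neighbors), and the local p-value. The sketch below shows that rule as a small base R function. The function name `classify_lisa`, its arguments, and the input numbers are hypothetical. Unlike the `>=` and `<=` conditions used in the analysis code, it uses strict boundaries so that no value can satisfy two categories at once.

```r
# Classify each region into a LISA category.
#   z     : standardized value of the region
#   lag_z : spatial lag (neighbor average) of the standardized values
#   p     : p-value of the local Moran statistic
#   alpha : significance level
classify_lisa <- function(z, lag_z, p, alpha = 0.05) {
  category <- ifelse(z >= 0 & lag_z >= 0, "HH",
              ifelse(z <  0 & lag_z <  0, "LL",
              ifelse(z >= 0 & lag_z <  0, "HL", "LH")))
  # Regions that fail the significance test get no cluster label.
  category[p > alpha] <- "Not Significant"
  category
}

# Made-up inputs covering every category.
z     <- c( 1, -1,  1, -1, 1)
lag_z <- c( 1, -1, -1,  1, 1)
p     <- c(0.01, 0.01, 0.01, 0.01, 0.50)

result <- classify_lisa(z, lag_z, p)
print(result)
```

The first letter of each label refers to the region itself and the second to its neighbors, so "LH" is a low value among high neighbors and "HL" is a high value among low neighbors.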

Conclusions

In summary, this research has allowed us to better understand which North Carolina counties should be concerned about flooding and which face little risk. The choropleth map, together with the Moran's I and LISA spatial autocorrelation analyses, shows that flood risk is clustered across North Carolina, with the southeastern region having a notably higher risk. Future work should use the identified high-risk clusters and spatial patterns to inform targeted flood mitigation strategies and to allocate resources to the most vulnerable counties. By combining spatial analytics and mapping techniques, this project improves our understanding of flood risk patterns in North Carolina and gives decision-makers insights to address vulnerabilities and guide interventions ahead of potential flooding events.
