Within this project, I will be studying the flood risk scores of North Carolina counties. This topic is important because flooding can be detrimental to many communities, cities, and homes in many ways; including polluting drinking water systems and causing several environmental problems. Being able to analyze the risk of floods and the amount of damage that is being done can allow us to better prepare for them and understand why they are happening. I will measure the likelihood of a North Carolina county to experience flooding by analyzing the county’s risk score. A risk score is the standardized metric for the likelihood that an individual (county) will experience a particular outcome.
library(tidyverse)
library(arcpullr)
library(sf)
library(tigris)
library(tmap)
library(dplyr)
library(readr)
library(spdep)
library(stringr)
library(knitr)
library(kableExtra)
## Source the download script
source("../PROJECT/Codes/download_project_data.R")
## Read in Files
dat<- read_csv("../raw_data/NRI_Table_Counties.csv")
Nc_Boundaries_shp <- read_sf("../PROJECT/raw_data/North_Carolina_State_and_County_Boundary_Polygons.shp")
## Select Columns from Each Data Set
coastal_flood_data <- select(dat,
"OID_",
"STATE",
"STATEABBRV",
"COUNTY",
"COUNTYFIPS",
"POPULATION",
"AREA",
"RISK_VALUE",
"RISK_SCORE",
"RISK_RATNG",
"CFLD_AFREQ",
"CFLD_EXP_AREA",
"CFLD_RISKV",
"CFLD_RISKS",
"CFLD_RISKR")
NC_Counties_dat <- select(Nc_Boundaries_shp,
"OBJECTID",
"County",
"FIPS",
"Shape__Are",
"Shape__Len",
"GlobalID",
"geometry")
## Rename Each Data Sets Columns
coastal_flood_data <- coastal_flood_data %>%
rename(ANNUAL_FREQ = CFLD_AFREQ,
RISK_INDEX_VALUE = CFLD_RISKV,
RISK_INDEX_SCORE = CFLD_RISKS,
RISK_INDEX_RATE = CFLD_RISKR)
NC_Counties_dat <- NC_Counties_dat %>%
rename(COUNTY = County,
AREA = Shape__Are,
LENGTH = Shape__Len)
## Mutate Columns of Coastal Flooding Data
coastal_flood_data <- coastal_flood_data %>%
mutate(
AnnualFrequency = case_when(
RISK_RATNG %in% c("Relatively Moderate", "Relatively High", "High", "Very High") ~ ANNUAL_FREQ, TRUE ~ 0))
## Filter Data Set to Only Include North Carolina Counties
coastal_flood_data <- coastal_flood_data %>% filter(STATE %in% c("North Carolina"))
## Join Data to Create a Single Table for Risk Score Map 1
NC_Counties_sp <- left_join(NC_Counties_dat,
coastal_flood_data,
by = c("COUNTY" = "COUNTY"))
The study area of my analysis is North Carolina counties with a special interest in analyzing the counties that have a high risk of flooding. The National Risk Index was sourced in order to receive the numerical data associated with flood risks across all U.S. states/counties and the NC One Map provided the spatial data of North Carolina counties. This NC One Map polygon data was sourced through a downloaded zip file and was read in. Once the data was downloaded from the proper sources, i.e. The National Risk Index and NC One Map, the next step in data processing was to select and rename the columns that would be beneficial to my research. I then filtered the data to only include counties within North Carolina. After that, I joined the two data sets into a singular table. Doing this allowed me to view all the data I needed to complete my analysis within one table.
## Number of Total Observations
num_obs <- nrow(NC_Counties_sp)
## Number of NA Observations for Risk Score Column
na_obs <- sum(is.na(NC_Counties_sp$RISK_SCORE))
## Calculate Mean Flood Risk Scores Through ALL NC Counties
mean_floodrisk_nc <- mean(NC_Counties_sp$RISK_SCORE)
## Calculate Minimum Flood Risk Scores Through ALL NC Counties
min_floodrisk_nc <- min(NC_Counties_sp$RISK_SCORE)
## Calculate Maximum Flood Risk Scores Through ALL NC Counties
max_floodrisk_nc <- max(NC_Counties_sp$RISK_SCORE)
## County with the Minimum Flood Risk Score
index_lowest_risk <- which.min(NC_Counties_sp$RISK_SCORE)
county_lowest_risk <- NC_Counties_sp$COUNTY[index_lowest_risk]
## County with the Maximum Flood Risk Score
index_highest_risk <- which.max(NC_Counties_sp$RISK_SCORE)
county_highest_risk <- NC_Counties_sp$COUNTY[index_highest_risk]
## Make Histogram That Shows the Number of Counties within each risk score
NC_Counties_sp |>
ggplot(mapping = aes(x = RISK_SCORE)) +
geom_histogram(binwidth = 1,
na.rm = TRUE,
color = "black",
fill = "lightblue") +
labs(title = "Flood Risk Scores of North Carolina Counites",
y = "Number of Counties",
x = "Flood Risk Scores")
## Create Kable Table That Shows the Descriptive Statistics
des_table <- data.frame(
Measure = c("Number of Total Observations", "Number of NA Observations", "Mean Flood Risk Score", "Min Flood Risk Score", "Max Flood Risk score"),
Value = c(num_obs,na_obs, mean_floodrisk_nc, min_floodrisk_nc, max_floodrisk_nc),
County = c("","","", county_lowest_risk, county_highest_risk)
)
kable(des_table, "html") |>
kable_styling(full_width = FALSE,
latex_options = "hold_position",
position = "center") |>
add_header_above(c("Descriptive Statistics of Flood Risk Scores" = 3), escape = FALSE)
| Measure | Value | County |
|---|---|---|
| Number of Total Observations | 100.000000 | |
| Number of NA Observations | 0.000000 | |
| Mean Flood Risk Score | 66.529431 | |
| Min Flood Risk Score | 7.572383 | Clay |
| Max Flood Risk score | 98.504613 | New Hanover |
Since my variable of interest is flood risk scores within each North Carolina county, this data demonstrates the observations of flood risks. Within this data there is a total number of 100 observations which represents each county. 0 of these observations are NA and do not provide information to my data.The mean number of coastal flood risks in throughout North Carolina counties is 66.5294305. The county with the lowest flood risk score is Clay with a score of 7.5723831, and the county with the highest flood risk score is New Hanover with a score of 98.5046134.
Within the produced histogram the x-axis represents the risk score level, with the higher the risk score the more likely the risk of flooding, and the y-axis represents the number of counties. We can see in this histogram that there are more counties in North Carolina that have higher risk scores rather than counties with lower risk scores; this is also shown when looking at the mean flood risk score of all counties in North Carolina, with it being 66.5294305.
## Put tmap in view/interactive mode
tmap_mode("view")
## Make Map of All Counties Risk Ratings
map_1 <- tm_shape(NC_Counties_sp) +
tm_polygons("RISK_SCORE",
title = "Risk of Flooding Across North Carolina Counties",
style = "jenks",
n = 5,
palette = "YlOrRd",
alpha = 0.9,
border.col = "black",
border.alpha = 0.3) +
tm_layout(title = "North Carolina",
legend.outside = TRUE,
frame = FALSE) +
tm_scale_bar() +
tm_view(view.legend.position = c("left", "bottom"))
## Create link to basemap background
bg_basemap <- "https://{s}.basemaps.cartocdn.com/rastertiles/voyager/{z}/{x}/{y}{r}.png"
## Analyze Moran I and LISA spatial autocorrelation
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
## Remove NA regions
nc_c <- nc |>
filter(!is.na(SID74))
# Create a spatial weights matrix
W <- poly2nb(nc)
W <- nb2listw(W, style = "W")
# Calculate Moran's I
moran.I <- moran.test(nc$SID74,
W,
zero.policy = TRUE)
# Print summary to screen
print(moran.I)
##
## Moran I test under randomisation
##
## data: nc$SID74
## weights: W
##
## Moran I statistic standard deviate = 2.5192, p-value = 0.00588
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic Expectation Variance
## 0.147740529 -0.010101010 0.003925567
## Moran's I value
moran_value <- moran.I$estimate[1]
## P-Value for Moran's I
moran_p_value <- moran.I$p.value
# Calculate Local Indicators of Spatial Association (LISA)
lisa <- localmoran(nc$SID74, # Variable that I am analyzing
listw = W, # Weights the object
alternative = "two.sided", # Clustering or Dispersion
zero.policy = TRUE) |> # Best to keep TRUE LISA
as_tibble() |> # Better Object Type
mutate(across(everything(), as.vector)) # Remove junk from localmoran output
# Add Required valyes for LISA category
lisa <- lisa |>
mutate(SCVAR = scale(nc$SID74) |> as.vector(), # Original data column
LAGVAR = lag.listw(W, scale(nc$SID74)),
# Lag of original data column
LISACAT = case_when(SCVAR >= 0 & LAGVAR >= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 1,
SCVAR <= 0 & LAGVAR <= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 2,
SCVAR >= 0 & LAGVAR <= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 3,
SCVAR <= 0 & LAGVAR >= 0 & `Pr(z != E(Ii))` <= 0.05 ~ 4,
`Pr(z != E(Ii))` > 0.05 ~ 5))
# Add lavels based on the values
lisa <- lisa |>
mutate(CATNAME = case_when(LISACAT == 1 ~ "HH",
LISACAT == 2 ~ "LL",
LISACAT == 3 ~ "HL",
LISACAT == 4 ~ "LH",
LISACAT == 5 ~ "Not Significant"))
# Create a Table of Lisa Statistics
table(lisa$CATNAME)
##
## HH LH Not Significant
## 5 1 94
# Add LISA category column to the spatial data
nc <- nc |>
mutate(LISACAT = lisa$LISACAT,
CATNAME = lisa$CATNAME)
# Create a LISA map
lisa_tmap <- tm_shape(nc_c) +
tm_polygons(col = "grey50") +
tm_shape(nc) +
tm_polygons("LISACAT",
title = "LISA Category",
breaks = c(1, 2, 3, 4, 5, 6),
palette = c("red",
"blue",
"lightpink",
"skyblue",
"grey90"),
colorNA = "white",
labels = c("High-High",
"Low-Low",
"High-Low",
"Low-High",
"Not significant"),
border.col = "black",
border.alpha = 0.25) +
tm_layout(frame = FALSE,
legend.outside = TRUE)
# This command maps them together!
final_maps <- tmap_arrange(map_1,
lisa_tmap,
nrow = 1,
ncol = 2,
sync = TRUE)
# Display map
final_maps