Call Data & Functions
# Load packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(maps)
# Read CSV File
disaster <- read.csv("C:/Users/user/Desktop/data/database.csv",
stringsAsFactors = FALSE)
# Check variable names and types
str(disaster)
## 'data.frame': 46185 obs. of 14 variables:
## $ Declaration.Number : chr "DR-1" "DR-2" "DR-3" "DR-4" ...
## $ Declaration.Type : chr "Disaster" "Disaster" "Disaster" "Disaster" ...
## $ Declaration.Date : chr "05/02/1953" "05/15/1953" "05/29/1953" "06/02/1953" ...
## $ State : chr "GA" "TX" "LA" "MI" ...
## $ County : chr "" "" "" "" ...
## $ Disaster.Type : chr "Tornado" "Tornado" "Flood" "Tornado" ...
## $ Disaster.Title : chr "Tornado" "Tornado and Heavy Rainfall" "Flood" "Tornado" ...
## $ Start.Date : chr "05/02/1953" "05/15/1953" "05/29/1953" "06/02/1953" ...
## $ End.Date : chr "05/02/1953" "05/15/1953" "05/29/1953" "06/02/1953" ...
## $ Close.Date : chr "06/01/1954" "01/01/1958" "02/01/1960" "02/01/1956" ...
## $ Individual.Assistance.Program : chr "Yes" "Yes" "Yes" "Yes" ...
## $ Individuals...Households.Program: chr "No" "No" "No" "No" ...
## $ Public.Assistance.Program : chr "Yes" "Yes" "Yes" "Yes" ...
## $ Hazard.Mitigation.Program : chr "Yes" "Yes" "Yes" "Yes" ...
# Extract date -> year
disaster$Declaration.Date <- as.Date(disaster$Declaration.Date, format = "%m/%d/%Y")
disaster$year <- year(disaster$Declaration.Date)
# Create a subset with only the essential variables for analysis (for readability)
disaster_clean <- disaster %>%
select(year,
State,
Disaster.Type,
Declaration.Date) %>%
filter(!is.na(year),
!is.na(State),
State != "")
# Descriptive statistics of key variables
summary(disaster_clean)
## year State Disaster.Type Declaration.Date
## Min. :1953 Length:46185 Length:46185 Min. :1953-05-02
## 1st Qu.:1993 Class :character Class :character 1st Qu.:1993-03-17
## Median :2003 Mode :character Mode :character Median :2003-03-20
## Mean :1998 Mean :1998-10-26
## 3rd Qu.:2008 3rd Qu.:2008-07-09
## Max. :2017 Max. :2017-02-14
State and Disaster Type Variables
- State: Two-letter state code (e.g., TX, CA, NY)
- Disaster.Type: Disaster category, such as Tornado, Flood, or Hurricane
- year: Year of the disaster declaration (1953–2017)
These three variables each play a key role:
- State → “How many regions are affected?” (spatial extent)
- Disaster.Type × State → “Which disasters occur in which regions?” (pattern diversity)
- year → “How do these patterns change over time?” (long-term trends)
4. Results
We examine the research questions step-by-step using three primary
plots.
4.1 Plot 1: Changes in the Spatial Extent of Disasters
4.1.1 Code for Plot 1
Plot 1 shows the number of states with at least one disaster
declaration in each year as a bar chart, with an overlaid LOESS
smooth line to show the general trend.
- X-axis: Year
- Y-axis: Number of unique states affected
- Bars: Actual yearly state counts
- Dark red LOESS curve: Long-term trend
# Number of States Affected Per Year
year_state_count <- disaster_clean %>%
group_by(year) %>%
summarise(
n_states = n_distinct(State),
.groups = "drop") %>%
filter(!is.na(year))
head(year_state_count, 100)
## # A tibble: 65 × 2
## year n_states
## <dbl> <int>
## 1 1953 11
## 2 1954 17
## 3 1955 17
## 4 1956 14
## 5 1957 16
## 6 1958 7
## 7 1959 6
## 8 1960 10
## 9 1961 12
## 10 1962 20
## # ℹ 55 more rows
# Plot 1: Bar + Smooth line
ggplot(year_state_count, aes(x = year, y = n_states)) +
geom_col(fill = "steelblue", alpha = 0.7) +
geom_smooth(method = "loess", se = FALSE, color = "darkred", linewidth = 1) +
labs(
title = "Number of States Affected by Disasters per Year",
x = "Year",
y = "Number of States with Disaster Declarations") +
theme_minimal(base_size = 12)
## `geom_smooth()` using formula = 'y ~ x'

4.1.2 Plot 1 Result
Key Observations
- General increase from the late 1950s to the early 2000s
- Peak in the early 2000s
- Slight decline afterward
Core Interpretation
- Disaster impact has expanded spatially over the long term
- Despite yearly fluctuations, the pattern is clear: 1950s < 1980s < 2000s
- Some years show more than 40 states affected → Indicates that U.S. disasters increasingly affect many regions simultaneously
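One way to check a decade ordering such as 1950s < 1980s < 2000s is to average the yearly state counts by decade. The sketch below is illustrative only: `decade_means` is a hypothetical helper, not part of the original analysis, and the data are just the first ten rows of year_state_count printed above. Applying the same helper to the full table would be needed to compare the 1980s and 2000s.

```r
# First ten rows of year_state_count, copied from the printed output above.
first_rows <- data.frame(
  year     = 1953:1962,
  n_states = c(11, 17, 17, 14, 16, 7, 6, 10, 12, 20)
)

# Hypothetical helper: mean number of affected states per decade.
decade_means <- function(df) {
  decade <- floor(df$year / 10) * 10   # e.g. 1957 -> 1950
  tapply(df$n_states, decade, mean)
}

res <- decade_means(first_rows)
print(round(res, 1))
```

Because only 1953–1962 are included here, the 1960s value covers three years and should not be read as a full-decade average.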
4.2 Plot 2: Complexity of Hazard Structure
Plot 2 counts the unique disaster-type × state combinations in each
year. This count serves as a simple measure of the complexity and
diversity of the national disaster “portfolio”.
4.2.1 Calculating the Plot 2 Measure
We use the simple measure: unique combinations per year.
If in one year the combinations were (Flood, TX), (Flood, CA),
(Hurricane, FL) → then the count = 3
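The counting rule can be sketched on a toy example. The data frame `toy` and its values are illustrative assumptions, not FEMA records, and base R is used so the snippet runs without loading any packages. A repeated (Flood, TX) row is included to show that duplicates within a year are counted once.

```r
# Toy declarations for a single year (illustrative values, not FEMA data).
toy <- data.frame(
  year          = c(2000, 2000, 2000, 2000),
  Disaster.Type = c("Flood", "Flood", "Hurricane", "Flood"),
  State         = c("TX", "CA", "FL", "TX"),  # (Flood, TX) appears twice
  stringsAsFactors = FALSE
)

# Build one key per (type, state) pair, then count distinct keys per year.
toy$combo <- paste(toy$Disaster.Type, toy$State, sep = "_")
n_unique_combos <- tapply(toy$combo, toy$year, function(x) length(unique(x)))

print(n_unique_combos)
```

The four rows collapse to three distinct combinations, matching the count in the example above.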
# Unique Type × State Combinations Over Time
combo_diversity <- disaster_clean %>%
mutate(
combo = paste(Disaster.Type, State, sep = "_")) %>%
group_by(year) %>%
summarise(
n_unique_combos = n_distinct(combo),
.groups = "drop") %>%
filter(!is.na(year))
head(combo_diversity)
## # A tibble: 6 × 2
## year n_unique_combos
## <dbl> <int>
## 1 1953 12
## 2 1954 17
## 3 1955 17
## 4 1956 15
## 5 1957 16
## 6 1958 7
4.2.2 Code for Plot 2
ggplot(combo_diversity, aes(x = year, y = n_unique_combos)) +
geom_line(color = "darkorange", linewidth = 1) +
geom_point(color = "darkorange", size = 1) +
labs(
title = "Yearly Diversity of (Disaster Type × State) Combinations",
x = "Year",
y = "Number of Unique Combinations") +
theme_minimal(base_size = 12)

4.2.3 Plot 2 Result
Key Interpretation
- Large increase from the 1950s (10–20 combos) to the 2000s (60–90 combos)
- Disaster types spread across more regions
- Trend shows long-term growth despite fluctuations
- Recent decreases may be temporary or reflect external factors
Conclusion
Not only has the range expanded, but the U.S. hazard environment has
become far more complex and multi-layered.
4.3 Plot 3: Spatial Movement of Disasters
Summary of how the spatial centers of each disaster type have moved
over time:
Annual Centroids by Disaster Type (Top 5): 1. Fire, 2. Flood, 3. Snow,
4. Storm, 5. Hurricane
4.3.1 Calculating Yearly Centroids
1. Use the state center coordinates (state.center) for the 50 U.S.
states to obtain an approximate latitude and longitude for each
state.
2. Map each FEMA disaster declaration to the corresponding state’s
centroid coordinates.
3. For each disaster type and year, compute the average latitude (lat)
and longitude (lon) of all disaster declarations to obtain the yearly
centroid.
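The steps above can be sketched on a few toy rows. The data frame `toy` is an illustrative assumption rather than FEMA data, and the snippet uses base R only. It highlights two properties of the approach: the centroid is an unweighted mean over declarations, so a state with more declarations pulls the centroid toward itself, and any State code that is not in state.abb (here "PR") has no coordinates and is dropped.

```r
# Lookup table from R's built-in datasets (50 states only).
state_lookup <- data.frame(
  State = state.abb,
  lon   = state.center$x,
  lat   = state.center$y,
  stringsAsFactors = FALSE
)

# Toy declarations (illustrative, not FEMA records). "PR" is not in state.abb.
toy <- data.frame(
  year  = c(2005, 2005, 2005, 2005),
  State = c("TX", "TX", "FL", "PR"),
  stringsAsFactors = FALSE
)

# Attach coordinates; unmatched codes get NA and are removed.
idx <- match(toy$State, state_lookup$State)
toy$lon <- state_lookup$lon[idx]
toy$lat <- state_lookup$lat[idx]
toy_geo <- toy[!is.na(toy$lon), ]

# Yearly centroid = mean over declarations, so TX counts twice here.
centroid <- aggregate(cbind(lon, lat) ~ year, data = toy_geo, FUN = mean)
print(centroid)
```

In this toy case the centroid lies two-thirds of the way from Florida's center toward Texas's center, because Texas contributes two of the three retained rows.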
# State centroid lookup table built from the built-in state.abb and state.center
state_coords <- data.frame(
State = state.abb,
lon = state.center$x,
lat = state.center$y)
# 3. Match FEMA data with state centroid coordinates (latitude/longitude)
dis_geo <- disaster %>%
left_join(state_coords, by = "State") %>%
filter(!is.na(lon), !is.na(lat), !is.na(Disaster.Type), !is.na(year))
# 4. Extract the top 5 most frequent disaster types
top5_types <- dis_geo %>%
count(Disaster.Type, sort = TRUE) %>%
slice_head(n = 5) %>%
pull(Disaster.Type)
# 5. Compute yearly centroids (mean latitude/longitude) for top 5 disaster types
centroids_top5 <- dis_geo %>%
filter(Disaster.Type %in% top5_types) %>%
group_by(Disaster.Type, year) %>%
summarise(
centroid_lon = mean(lon, na.rm = TRUE),
centroid_lat = mean(lat, na.rm = TRUE),
.groups = "drop") %>%
arrange(Disaster.Type, year)
4.3.2 Code for Plot 3
# 1. U.S. map data
us_map <- map_data("state")
# 2. Define a function to plot centroid movement for each disaster type
plot_centroid_type <- function(type_label) {
# Filter for the selected disaster type
df_type <- centroids_top5 %>%
filter(Disaster.Type == type_label)
ggplot() +
# Background U.S. map
geom_polygon(
data = us_map,
aes(x = long, y = lat, group = group),
fill = "white",
color = "gray70",
inherit.aes = FALSE) +
geom_path(
data = df_type,
aes(x = centroid_lon, y = centroid_lat, group = 1),
color = "gray",
linewidth = 0.5) +
# Centroid points with color mapped to year
geom_point(
data = df_type,
aes(x = centroid_lon, y = centroid_lat, color = year),
size = 2) +
scale_color_gradient(
low = "blue",
high = "orange",
name = "Year") +
labs(
title = paste0("Top 5 Disaster: ", type_label),
subtitle = "FEMA Disaster Declarations, 1953–2017",
x = "Longitude",
y = "Latitude") +
coord_fixed(1.3) +
theme_minimal(base_size = 11) +
theme(panel.grid = element_line(linewidth = 0.2, color = "gray85"))
}
# 3. Generate plots for the top 5 disaster types
plot_centroid_type("Fire")

plot_centroid_type("Flood")

plot_centroid_type("Snow")

plot_centroid_type("Storm")

plot_centroid_type("Hurricane")

4.3.3 Plot 3 Result
- Fire
  - Early (1960s): Center in the Northeast
  - Mid-period (1980s): Moved toward the central U.S.
  - Recent (2000s): Concentrated in the West/Southwest
  → Clear long-term shift: East → Central → West
- Flood
  - Highly concentrated in the Midwest
  - Early and recent centroids cluster in similar regions
  - No strong directional long-term movement
  → Flood risk appears anchored in specific inland regions
- Snow
  - Spread across a wide East–Central range
  - Early (blue) and recent (orange) points intermixed
  - No consistent directional shift
  → High yearly variability
- Storm
  - Clustered in the central inland region
  - No strong long-term directional pattern
  - Some recent eastward/southward dispersion
  → Essentially a stable hazard zone with minor variation
- Hurricane
  - Early (1960s): Southernmost parts of the U.S. coastline
  - Recent (2000s): Shifted northward and northeastward
  → Clear shift: Southeast → East/Northeast coastline
5. Discussion & Overall Conclusion
5.1 Integrated Interpretation
Combining all analyses, the long-term structural changes in U.S. disaster
patterns are as follows:
- Expansion of Spatial Extent (Plot 1) → More states are being affected over time
- Increase in Structural Complexity (Plot 2) → More diverse types of disasters are occurring across more regions
- Geographic Shifts in Disaster Centers (Plot 3)
  - Some types (Fire, Hurricane) show clear directional movement
  - Others (Flood, Storm) remain stable
  - Snow is highly variable
Together, these indicators show a structural reconfiguration of
national disaster patterns compared to the past.
5.2 Limitations & Further Work
Limitations
- The dataset includes only declared FEMA disasters → Non-declared events are excluded
- state.center provides approximate, not precise, geographic locations and covers only the 50 states, so declarations with any other State code are dropped from the centroid analysis (Plot 3)
Further Work
- Weighted centroid analysis incorporating cost or damage severity
- Clustering analysis based on disaster occurrence patterns
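As a rough illustration of the weighted centroid idea, the sketch below replaces the plain mean with `weighted.mean`. The `damage` column is a hypothetical severity weight: the FEMA dataset used here contains no cost or damage variable, so such a column would have to come from another source. All values are made up for illustration.

```r
# Toy declarations with a HYPOTHETICAL damage weight (not in the FEMA dataset).
toy <- data.frame(
  year   = c(2005, 2005, 2005),
  State  = c("TX", "FL", "LA"),
  damage = c(1, 1, 8),
  stringsAsFactors = FALSE
)

# Attach state center coordinates from R's built-in datasets.
idx <- match(toy$State, state.abb)
toy$lon <- state.center$x[idx]
toy$lat <- state.center$y[idx]

# Damage-weighted centroid for each year.
weighted_centroid <- do.call(rbind, lapply(split(toy, toy$year), function(d) {
  data.frame(
    year         = d$year[1],
    centroid_lon = weighted.mean(d$lon, d$damage),
    centroid_lat = weighted.mean(d$lat, d$damage)
  )
}))

print(weighted_centroid)
```

With these weights the centroid is pulled toward Louisiana, whereas the unweighted mean would treat the three declarations equally.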