This analysis evaluates the impact of severe weather events on population health and economic damage, using data from the NOAA Storm Database for 1990-2011. Tornadoes account for the most injuries (26,767) and the most combined casualties, while Excessive Heat accounts for the most fatalities (3,138, against 1,758 for tornadoes). Thunderstorm Winds dominate property damage with $3.74M in recorded losses, while Hail is the leading contributor to crop damage at $581K. Floods emerge as a dual threat, causing significant health impacts (8,683 injuries, 1,553 fatalities) and substantial economic damage ($2.46M in property and $367K in crop losses). These findings highlight the diverse impacts of different weather events and can inform resource prioritization.
First we read the dataset, and then identify which metrics we’ll be working with:
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
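The loading step itself is not shown in this report, so the following is only a sketch of how it might look. The helper name `read_storm_data` and the file name in the comment are assumptions; `read.csv()` can read a bzip2-compressed CSV directly. The demo writes a two-row stand-in file so the snippet runs without the real download.

```r
# Hypothetical helper: read the storm CSV (plain or .bz2), keeping strings as character
read_storm_data <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Stand-in file so the sketch is runnable; the real call would be something like
# storm_data <- read_storm_data("repdata_data_StormData.csv.bz2")  # assumed file name
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP",
  "TORNADO,0,15,25,K",
  "TSTM WIND,0,0,2.5,M"
), tmp)

demo <- read_storm_data(tmp)
names(demo)  # list the available columns, as in the output above
```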
To ensure accuracy in the analysis, the dataset requires preprocessing. The weather event type column (EVTYPE) has many inconsistencies in naming and formatting, as well as misspellings. For example, here’s what happens when we pull a list of all the unique values appearing at least once in the EVTYPE column:
## Length Class Mode
## 985 character character
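The output above is what `summary()` returns for a character vector, so `unique_evtype` was presumably built from `unique()` on the `EVTYPE` column. A small sketch with made-up labels (not taken from the dataset) shows how case and whitespace variants alone inflate the number of unique values:

```r
# Made-up labels for illustration; the real input would be storm_data$EVTYPE
evtype <- c("TSTM WIND", "tstm wind", " TSTM WIND", "Thunderstorm Winds", "HAIL", "Hail")

unique_raw  <- unique(evtype)                   # every spelling counts separately
unique_tidy <- unique(toupper(trimws(evtype)))  # trim and upper-case before comparing

length(unique_raw)
length(unique_tidy)
summary(unique_raw)  # Length / Class / Mode, the same shape as the output above
```

Trimming and upper-casing only removes the trivial variants; different wordings such as "Thunderstorm Winds" still need the keyword grouping used later in the report.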
Wow! Of the 902,297 rows in the dataset, there are nearly 1,000 unique names for weather event types. For a deeper look, let’s count the incidence of each word in unique_evtype:
library(dplyr)     # %>%, tibble(), count()
library(tidytext)  # unnest_tokens()

word_counts <- tibble(event_type = unique_evtype) %>%
  unnest_tokens(word, event_type) %>%
  count(word, sort = TRUE)
print(word_counts, n = 20)
## # A tibble: 444 × 2
## word n
## <chr> <int>
## 1 wind 138
## 2 snow 119
## 3 winds 93
## 4 heavy 83
## 5 flood 78
## 6 high 77
## 7 thunderstorm 74
## 8 summary 67
## 9 rain 65
## 10 and 56
## 11 hail 44
## 12 of 44
## 13 cold 41
## 14 record 41
## 15 flooding 37
## 16 ice 35
## 17 urban 33
## 18 storm 32
## 19 freezing 29
## 20 tstm 29
## # ℹ 424 more rows
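`unnest_tokens()` comes from the tidytext package; by default it lower-cases each label and splits it into words. For readers without tidytext, a rough base R equivalent is sketched here on made-up labels. It splits on anything that is not a letter or digit, which approximates tidytext's tokenizer rather than reproducing it exactly.

```r
# Made-up labels for illustration
labels <- c("TSTM WIND", "HIGH WINDS", "Heavy Snow", "FLASH FLOOD/HEAVY RAIN", "WIND")

# Lower-case, split on non-alphanumeric characters, and flatten to one vector of words
words <- unlist(strsplit(tolower(labels), "[^a-z0-9]+"))
words <- words[nzchar(words)]  # drop empty tokens

# Frequency of each word, most common first
word_counts_base <- sort(table(words), decreasing = TRUE)
head(word_counts_base, 3)
```

Note that "wind" and "winds" are counted as separate words, which matches the separate rows for them in the output above.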
Now we’re getting somewhere! As you can see, there are many similarities among the weather event types that we observed in unique_evtype. That means many of these unique values can be merged, so that when we calculate the impact of particular weather events on human health and the economy, we arrive at a much more accurate conclusion. Let’s begin to group similar weather events using the word_counts tibble that we just created:
library(stringr)  # str_detect(), regex()
wind_related <- unique_evtype[str_detect(unique_evtype, regex("wind|tstm|thunderstorm|winds|thunderstorms|lightning|thunderstrom", ignore_case = TRUE))]
flood_related <- unique_evtype[str_detect(unique_evtype, regex("flood|flash|flooding|stream", ignore_case = TRUE))]
winter_related <- unique_evtype[str_detect(unique_evtype, regex("snow|blizzard|ice|winter|cold|freezing|frost|freeze|snowfall", ignore_case = TRUE))]
heat_related <- unique_evtype[str_detect(unique_evtype, regex("hot|heat", ignore_case = TRUE))]
tornado_related <- unique_evtype[str_detect(unique_evtype, regex("tornado|tornadoes|funnel|funnels|waterspout", ignore_case = TRUE))]
hail_related <- unique_evtype[str_detect(unique_evtype, regex("hail", ignore_case = TRUE))]
hurricane_related <- unique_evtype[str_detect(unique_evtype, regex("hurricane|typhoon", ignore_case = TRUE))]
landslide_related <- unique_evtype[str_detect(unique_evtype, regex("landslide|landslides|landslump|mud|mudslide|mudslides", ignore_case = TRUE))]
rain_related <- unique_evtype[str_detect(unique_evtype, regex("rain|rainfall|rains|drizzle", ignore_case = TRUE))]
drought_related <- unique_evtype[str_detect(unique_evtype, regex("drought|dryness|dry", ignore_case = TRUE))]
# Assign each row to a group; case_when() uses the first branch that matches
storm_data <- storm_data %>%
  mutate(event_type_clean = case_when(
    EVTYPE %in% wind_related ~ "Thunderstorm Wind",
    EVTYPE %in% flood_related ~ "Flood",
    EVTYPE %in% winter_related ~ "Winter Storm",
    EVTYPE %in% heat_related ~ "Excessive Heat",
    EVTYPE %in% tornado_related ~ "Tornado",
    EVTYPE %in% hail_related ~ "Hail",
    EVTYPE %in% hurricane_related ~ "Hurricane/Typhoon",
    EVTYPE %in% landslide_related ~ "Landslide",
    EVTYPE %in% rain_related ~ "Heavy Rain",
    EVTYPE %in% drought_related ~ "Drought",
    TRUE ~ "Other"
  ))
storm_data %>%
  count(event_type_clean, sort = TRUE)
## event_type_clean n
## 1 Thunderstorm Wind 380765
## 2 Hail 289274
## 3 Flood 86122
## 4 Tornado 71526
## 5 Winter Storm 44877
## 6 Heavy Rain 11845
## 7 Other 11493
## 8 Drought 2785
## 9 Excessive Heat 2666
## 10 Landslide 646
## 11 Hurricane/Typhoon 298
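One detail worth keeping in mind: `case_when()` uses the first condition that matches, so a label that fits several patterns lands in whichever group is listed first. The base R sketch below mimics that first-match rule with a shortened, illustrative pattern list (not the full set above); `classify_event` is a hypothetical helper name.

```r
# Ordered patterns: earlier entries win, mirroring case_when()'s first-match rule
patterns <- c(
  "Thunderstorm Wind" = "wind|tstm|thunderstorm|lightning",
  "Flood"             = "flood|flash|stream",
  "Hail"              = "hail"
)

# Return the first group whose pattern matches, or "Other" if none does
classify_event <- function(evtype) {
  for (group in names(patterns)) {
    if (grepl(patterns[[group]], evtype, ignore.case = TRUE)) return(group)
  }
  "Other"
}

sapply(c("TSTM WIND/HAIL", "HAIL", "FLASH FLOOD", "DENSE FOG"), classify_event)
```

Here a combined label such as "TSTM WIND/HAIL" is counted as Thunderstorm Wind rather than Hail, so the order of the groups affects the totals.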
As you can see, we created 10 groups, each corresponding to a major weather event, by merging similar values in the EVTYPE column under a single name. Next, we create a new dataset, keeping only the columns that are most important to our analysis: event type, fatalities, injuries, crop damage, property damage, and begin date. We only include events from 1990-2011 because most weather event types were not recorded until the 1990s; event types that have been recorded since the 1950s, like tornadoes, would otherwise skew the analysis.
library(lubridate)  # mdy_hms(), year()
library(tidyr)      # drop_na()
library(scales)     # dollar()

storm_data_by_decade <- storm_data %>%
  mutate(
    BGN_DATE = mdy_hms(BGN_DATE),
    # Years before 1990 match neither branch and get NA
    decade = case_when(
      year(BGN_DATE) >= 1990 & year(BGN_DATE) < 2000 ~ "1990s",
      year(BGN_DATE) >= 2000 & year(BGN_DATE) <= 2011 ~ "2000s"
    )
  ) %>%
  group_by(event_type_clean, decade) %>%
  summarize(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE),
    CROPDMG = sum(CROPDMG, na.rm = TRUE),
    PROPDMG = sum(PROPDMG, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(event_type_clean != "Other") %>%
  arrange(desc(FATALITIES + INJURIES)) %>%
  mutate(
    CROPDMG = dollar(CROPDMG),
    PROPDMG = dollar(PROPDMG)
  )
# Drop the pre-1990 rows, whose decade is NA
storm_data_by_decade <- drop_na(storm_data_by_decade)
print(storm_data_by_decade)
## # A tibble: 20 × 6
## event_type_clean decade FATALITIES INJURIES CROPDMG PROPDMG
## <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 Tornado 2000s 1195 15214 $73,635 $912,500
## 2 Tornado 1990s 563 11553 $26,392 $689,482
## 3 Thunderstorm Wind 2000s 1219 7281 $137,369 $2,390,820
## 4 Thunderstorm Wind 1990s 873 7553 $90,225 $1,349,686
## 5 Flood 1990s 671 7526 $113,814 $773,990
## 6 Excessive Heat 1990s 1894 4294 $802 $1,805
## 7 Excessive Heat 2000s 1244 4930 $671 $1,428
## 8 Winter Storm 1990s 577 4899 $12,621 $161,968
## 9 Flood 2000s 882 1157 $253,323 $1,687,671
## 10 Winter Storm 2000s 289 1378 $9,830 $254,852
## 11 Hurricane/Typhoon 2000s 68 1291 $6,226 $11,926
## 12 Hail 1990s 10 604 $217,435 $236,703
## 13 Hail 2000s 5 545 $363,984 $452,607
## 14 Heavy Rain 2000s 37 167 $9,534 $39,576
## 15 Heavy Rain 1990s 63 113 $2,914 $13,932
## 16 Hurricane/Typhoon 1990s 65 42 $5,401 $13,260
## 17 Landslide 2000s 37 52 $37 $19,270
## 18 Drought 1990s 30 9 $7,048 $1,396
## 19 Drought 2000s 2 23 $26,866 $4,441
## 20 Landslide 1990s 7 3 $0 $1,343
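The year filter is implicit in the code above: rows before 1990 get `NA` for `decade` and are then removed by `drop_na()`. A minimal base R sketch of that bucketing follows, assuming `BGN_DATE` strings in month/day/year form with a time component (the layout `mdy_hms()` expects); `decade_of` is a hypothetical helper name.

```r
# Map a BGN_DATE string to "1990s", "2000s", or NA (outside 1990-2011)
decade_of <- function(bgn_date) {
  parsed <- as.POSIXct(bgn_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
  yr <- as.integer(format(parsed, "%Y"))
  ifelse(yr >= 1990 & yr < 2000, "1990s",
         ifelse(yr >= 2000 & yr <= 2011, "2000s", NA_character_))
}

# Illustrative dates, one per case
dates <- c("4/18/1950 0:00:00", "6/1/1995 0:00:00", "11/28/2011 0:00:00")
data.frame(BGN_DATE = dates, decade = decade_of(dates))
```

As in the original code, the "2000s" bucket covers 2000 through 2011, so it spans twelve years against ten for the "1990s".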
The goal of this section is to identify the weather events most harmful to population health. To answer this question, we refine the dataset by aggregating by event type, which removes the decade column. Then we filter out droughts and landslides because, while their economic impact is clear, their counts of injuries and fatalities are very low. We’re left with a cleaner dataset of the eight weather events most harmful to population health between 1990 and 2011:
storm_data_aggregated <- storm_data_by_decade %>%
  group_by(event_type_clean) %>%
  summarize(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE),
    # Strip "$" and "," added by dollar() so the values are numeric again
    CROPDMG = sum(as.numeric(gsub("[\\$,]", "", CROPDMG)), na.rm = TRUE),
    PROPDMG = sum(as.numeric(gsub("[\\$,]", "", PROPDMG)), na.rm = TRUE),
    .groups = "drop"
  ) %>%
  filter(!event_type_clean %in% c("Drought", "Landslide")) %>%
  select(-CROPDMG, -PROPDMG)
print(storm_data_aggregated)
## # A tibble: 8 × 3
## event_type_clean FATALITIES INJURIES
## <chr> <dbl> <dbl>
## 1 Excessive Heat 3138 9224
## 2 Flood 1553 8683
## 3 Hail 15 1149
## 4 Heavy Rain 100 280
## 5 Hurricane/Typhoon 133 1333
## 6 Thunderstorm Wind 2092 14834
## 7 Tornado 1758 26767
## 8 Winter Storm 866 6277
Next, let’s observe the impact of these eight weather events on human health:
library(tidyr)       # pivot_longer()
library(ggplot2)     # ggplot() and friends
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

# One row per event type and impact type
storm_data_long <- storm_data_aggregated %>%
  pivot_longer(cols = c(FATALITIES, INJURIES), names_to = "Impact_Type", values_to = "Count")

# Lollipop chart of fatalities and injuries by event type
# (linewidth replaces size for line geoms in ggplot2 >= 3.4.0)
ggplot(data = storm_data_long, aes(x = Count, y = reorder(event_type_clean, -Count), fill = Impact_Type)) +
  geom_segment(aes(xend = 0, yend = event_type_clean), color = "grey37", linewidth = 0.8) +
  geom_point(size = 5, shape = 21, color = "black") +
  scale_fill_manual(values = c("FATALITIES" = "wheat1", "INJURIES" = "slategray1")) +
  scale_x_continuous(
    breaks = seq(0, 27500, by = 2500),
    limits = c(0, 27500),
    expand = c(0, 0)
  ) +
  coord_cartesian(clip = "off") +
  labs(
    title = "Impacts of Weather Event Types on Human Health",
    subtitle = "Comparison of Fatalities and Injuries by Event Type, 1990-2011",
    x = "Count",
    y = "Weather Event Type",
    fill = "Impact Type"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    axis.text.y = element_text(size = 10, margin = margin(r = 11)),
    axis.title.x = element_text(size = 12),
    axis.title.y = element_text(size = 12),
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 11),
    legend.position = "top",
    plot.margin = margin(10, 20, 10, 20)
  )

kable(storm_data_aggregated, caption = "Top 8 Weather Events Impacting Population Health, 1990-2011") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))
event_type_clean | FATALITIES | INJURIES |
---|---|---|
Excessive Heat | 3138 | 9224 |
Flood | 1553 | 8683 |
Hail | 15 | 1149 |
Heavy Rain | 100 | 280 |
Hurricane/Typhoon | 133 | 1333 |
Thunderstorm Wind | 2092 | 14834 |
Tornado | 1758 | 26767 |
Winter Storm | 866 | 6277 |
As you can see from the chart and the corresponding table, tornadoes emerge as the most harmful event type to population health overall, causing far more injuries than any other event, although excessive heat and thunderstorm wind each cause more fatalities. It’s also worth restating the importance of having merged similar EVTYPE values earlier. Had we skipped preprocessing and run the calculations on the raw dataset, thunderstorm winds would have ranked significantly lower because of the naming inconsistencies. Likewise, had we included records from before 1990, tornadoes would have heavily skewed the final analysis, because most of the other weather event types were not recorded in the NOAA storm database prior to the 1990s.
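To put the counts in proportion, each event type's share of the total can be computed directly from the health table. The sketch below re-enters the table values by hand; the column names `fatal_share` and `injury_share` are illustrative.

```r
# Values copied from the health table above
health <- data.frame(
  event_type_clean = c("Excessive Heat", "Flood", "Hail", "Heavy Rain",
                       "Hurricane/Typhoon", "Thunderstorm Wind", "Tornado", "Winter Storm"),
  FATALITIES = c(3138, 1553, 15, 100, 133, 2092, 1758, 866),
  INJURIES   = c(9224, 8683, 1149, 280, 1333, 14834, 26767, 6277)
)

# Percentage share of each event type among the eight listed
health$fatal_share  <- round(100 * health$FATALITIES / sum(health$FATALITIES), 1)
health$injury_share <- round(100 * health$INJURIES / sum(health$INJURIES), 1)

health[order(-health$INJURIES), ]  # sorted by injuries, largest first
```

These shares are relative to the eight event types in the table only; the "Other", Drought, and Landslide groups are excluded from the totals.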
Let’s move on to the next question in our analysis: which weather events have the greatest economic consequences? To analyze the economic impacts of weather events, we return to storm_data_by_decade, which includes property and crop damage, so that we can highlight the types of losses (infrastructure vs. agriculture) associated with each event.
economic_data <- storm_data_by_decade %>%
  select(-FATALITIES, -INJURIES) %>%
  mutate(
    # Convert the dollar-formatted strings back to numbers
    CROPDMG = as.numeric(gsub("[\\$,]", "", CROPDMG)),
    PROPDMG = as.numeric(gsub("[\\$,]", "", PROPDMG))
  ) %>%
  group_by(event_type_clean) %>%
  summarize(
    CROPDMG = sum(CROPDMG, na.rm = TRUE),
    PROPDMG = sum(PROPDMG, na.rm = TRUE),
    .groups = "drop"
  )
print(economic_data)
## # A tibble: 10 × 3
## event_type_clean CROPDMG PROPDMG
## <chr> <dbl> <dbl>
## 1 Drought 33914 5837
## 2 Excessive Heat 1473 3233
## 3 Flood 367137 2461661
## 4 Hail 581419 689310
## 5 Heavy Rain 12448 53508
## 6 Hurricane/Typhoon 11627 25186
## 7 Landslide 37 20613
## 8 Thunderstorm Wind 227594 3740506
## 9 Tornado 100027 1601982
## 10 Winter Storm 22451 416820
After refining our data into a new dataset named economic_data, as seen above, we now have a table of weather event types and their corresponding property and crop damage between 1990 and 2011. Now let’s plot this:
economic_data_prop <- economic_data %>%
  mutate(Total_Damage = CROPDMG + PROPDMG) %>%
  pivot_longer(cols = c(CROPDMG, PROPDMG), names_to = "Damage_Type", values_to = "Amount")

# Stacked bars scaled to 100% to compare the crop/property split per event type
ggplot(economic_data_prop, aes(x = reorder(event_type_clean, -Total_Damage), y = Amount, fill = Damage_Type)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  scale_fill_manual(values = c("CROPDMG" = "wheat1", "PROPDMG" = "slategrey"), name = "Damage Type") +
  labs(
    title = "Proportion of Crop vs Property Damage by Weather Event Type",
    subtitle = "Percentage of Total Damage, 1990-2011",
    x = "Weather Event Type",
    y = "Proportion of Damage",
    fill = "Damage Type"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1, size = 10),
    legend.position = "top"
  )
As we can see from the chart above and the corresponding table below, thunderstorm wind causes the highest property damage, while hail contributes the most to crop damage. Floods also represent a very significant economic burden.
economic_data %>%
  mutate(
    CROPDMG = scales::comma(CROPDMG),
    PROPDMG = scales::comma(PROPDMG)
  ) %>%
  kable(
    caption = "Economic Impact by Event Type: Crop vs Property Damage",
    col.names = c("Event Type", "Crop Damage (USD)", "Property Damage (USD)"),
    align = c("l", "r", "r"),
    format = "html"
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center"
  ) %>%
  column_spec(1, bold = TRUE)
Event Type | Crop Damage (USD) | Property Damage (USD) |
---|---|---|
Drought | 33,914 | 5,837 |
Excessive Heat | 1,473 | 3,233 |
Flood | 367,137 | 2,461,661 |
Hail | 581,419 | 689,310 |
Heavy Rain | 12,448 | 53,508 |
Hurricane/Typhoon | 11,627 | 25,186 |
Landslide | 37 | 20,613 |
Thunderstorm Wind | 227,594 | 3,740,506 |
Tornado | 100,027 | 1,601,982 |
Winter Storm | 22,451 | 416,820 |
While this analysis identifies key weather events and their impacts, certain limitations must be considered. Economic damage estimates are approximate and may not reflect full losses. In particular, the totals here sum the raw PROPDMG and CROPDMG values without applying the magnitude codes stored in PROPDMGEXP and CROPDMGEXP, which likely explains why the dollar figures seem low. Data collection challenges, the lack of inflation adjustment, and reporting bias may also contribute. Additionally, fatalities and injuries are aggregated without distinguishing direct from indirect causes; separating them could provide further insights. Improved data collection and reporting standards would strengthen the reliability of these findings.
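The magnitude codes in `PROPDMGEXP` and `CROPDMGEXP` are not applied anywhere in the code above. A minimal sketch of how the damage figures could be scaled by those codes follows. The mapping K = thousands, M = millions, B = billions follows the usual reading of the NOAA codes; treating every other code as a multiplier of 1 is a simplifying assumption, and `exp_multiplier` is a hypothetical helper name.

```r
# Translate a magnitude code into a numeric multiplier
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]  # NA for codes not in the lookup
  mult[is.na(mult)] <- 1         # assumption: unknown or blank codes mean "as recorded"
  unname(mult)
}

# Made-up rows for illustration
damage <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", "")
)
damage$PROPDMG_USD <- damage$PROPDMG * exp_multiplier(damage$PROPDMGEXP)
damage
```

Applying the multiplier before summing would change the magnitudes considerably and could also change the ranking of event types, so any conclusion drawn from it would need to be re-checked.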
The findings show that tornadoes impose the greatest injury burden, accounting for about 39% of injuries and 18% of fatalities across the eight event types in the health table. Excessive Heat causes the most fatalities (3,138, roughly a third of the total) along with 9,224 injuries, emphasizing its impact on vulnerable populations. Thunderstorm Winds lead in property damage, contributing about 41% of the property losses tabulated above, while Hail is the top contributor to crop damage at about 43% of tabulated crop losses. Floods stand out as a concern for both public health and economic stability, ranking fourth in both injuries and fatalities and second in both crop and property damage. This analysis underscores the varied nature of weather-related threats and can aid in preparing for future events.
National Oceanic and Atmospheric Administration (NOAA). NOAA Storm Events Database. URL: https://www.ncdc.noaa.gov/stormevents/
R Core Team (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL: https://www.R-project.org/
Packages used in this analysis: