This analysis examines the NOAA Storm Database to identify which weather events are most harmful to population health and which generate the greatest economic losses in the United States. Population health impact was measured as the total number of fatalities and injuries aggregated by event type. Economic consequences were calculated as the total property and crop damage, converted into millions of U.S. dollars. The results show that tornadoes are by far the most harmful event type in terms of combined fatalities and injuries, substantially exceeding all other categories. Excessive heat and thunderstorm wind also contribute significantly to health impacts, though at much lower levels than tornadoes. In contrast, floods generate the greatest total economic losses, followed by hurricanes/typhoons and tornadoes. Storm surge and hail also rank among the top contributors to financial damage. Overall, the events that pose the greatest threat to human life are not always the same as those that cause the largest economic losses.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.5.2
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.5.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# read.csv() reads the bzip2-compressed file directly; no manual unzip is needed
storm_raw <- read.csv("repdata_data_StormData.csv.bz2",
                      sep = ",")
For population health impacts, the variables FATALITIES and INJURIES were used. For economic consequences, the variables PROPDMG and CROPDMG, along with their corresponding exponent variables PROPDMGEXP and CROPDMGEXP, were selected. The variable EVTYPE was retained to classify and group events by type. All other variables were excluded to focus the analysis strictly on measures relevant to the assignment questions.
storm_selected <- storm_raw %>%
  select(
    EVTYPE,
    FATALITIES,
    INJURIES,
    PROPDMG,
    PROPDMGEXP,
    CROPDMG,
    CROPDMGEXP
  )
# Check structure
str(storm_selected)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
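The damage figures were converted to U.S. dollars by multiplying PROPDMG and CROPDMG by the power of ten encoded in PROPDMGEXP and CROPDMGEXP, and the results were then expressed in millions of dollars.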
exp_to_multiplier <- function(x) {
  x <- toupper(x)
  case_when(
    x == "K" ~ 1e3,
    x == "M" ~ 1e6,
    x == "B" ~ 1e9,
    x == "H" ~ 1e2,
    # case_when() evaluates every right-hand side for all elements, so
    # as.numeric() would warn about coercion on the non-digit codes even
    # though this branch never selects them; the warning is suppressed here.
    grepl("^[0-9]$", x) ~ 10^suppressWarnings(as.numeric(x)),
    # Blank or unrecognised codes leave the damage value unscaled
    TRUE ~ 1
  )
}
storm_selected <- storm_selected %>%
  mutate(
    prop_multiplier = exp_to_multiplier(PROPDMGEXP),
    crop_multiplier = exp_to_multiplier(CROPDMGEXP)
  )
# Damage in U.S. dollars
storm_selected <- storm_selected %>%
  mutate(
    property_damage = PROPDMG * prop_multiplier,
    crop_damage = CROPDMG * crop_multiplier,
    total_damage = property_damage + crop_damage
  )
# Express the damage in millions of U.S. dollars
storm_selected <- storm_selected %>%
  mutate(
    property_damage_M = property_damage / 1000000,
    crop_damage_M = crop_damage / 1000000,
    total_damage_M = total_damage / 1000000
  )
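As a quick sanity check on the conversion, the same arithmetic can be traced by hand on a few rows. The sketch below is illustrative only: the data frame `toy` and the lookup vector `multipliers` are hypothetical names, the first two rows mirror the PROPDMG and PROPDMGEXP values visible in the `str()` output, and the last two rows are made up. It uses base R rather than the `case_when()` helper, and it covers only the letter codes.

```r
# Toy rows: the first two mirror values from the str() output,
# the last two are invented for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 3),
  PROPDMGEXP = c("K", "K", "M", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup vector for the letter codes only
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up the multiplier, scale to dollars, then to millions of dollars
toy$multiplier        <- unname(multipliers[toupper(toy$PROPDMGEXP)])
toy$property_damage   <- toy$PROPDMG * toy$multiplier
toy$property_damage_M <- toy$property_damage / 1e6

print(toy)
```

A record of 25 with exponent "K" therefore counts as 25,000 dollars, or 0.025 million, while 3 with exponent "B" counts as 3,000 million.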
Tornadoes are the most harmful weather event in terms of fatalities and injuries. Their combined total of 96,979 is more than eleven times that of the next event type, excessive heat (8,428).
health_by_event <- storm_selected %>%
  mutate(total_harm = FATALITIES + INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    total_harm = sum(total_harm, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_harm))
head(health_by_event, 10)
## # A tibble: 10 × 4
## EVTYPE total_fatalities total_injuries total_harm
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
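The table lists TSTM WIND and THUNDERSTORM WIND as separate rows because EVTYPE is grouped exactly as recorded. The analysis above does not merge such labels. The following is only a sketch of how a merge could be done, using the totals from the table; `recode_map` and `EVTYPE_clean` are hypothetical names, and treating the two labels as the same event is an assumption.

```r
# Totals copied from the top-10 table above
health <- data.frame(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WIND", "EXCESSIVE HEAT", "HEAT"),
  total_harm = c(7461, 1621, 8428, 3037),
  stringsAsFactors = FALSE
)

# Hypothetical recode map: old label -> label to merge into
recode_map <- c("TSTM WIND" = "THUNDERSTORM WIND")

# Replace only the labels that appear in the map
health$EVTYPE_clean <- health$EVTYPE
idx <- health$EVTYPE %in% names(recode_map)
health$EVTYPE_clean[idx] <- unname(recode_map[health$EVTYPE[idx]])

# Re-aggregate on the cleaned label and sort by total harm
combined <- aggregate(total_harm ~ EVTYPE_clean, data = health, FUN = sum)
combined <- combined[order(-combined$total_harm), ]
print(combined)
```

On these four rows the merged thunderstorm wind total is 7,461 + 1,621 = 9,082, which is higher than the excessive heat total of 8,428. The full dataset may contain further spelling variants that this sketch does not handle.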
# Plot the ten event types with the highest combined fatalities and injuries
top10_health <- health_by_event %>%
  slice_head(n = 10)
ggplot(top10_health,
       aes(x = reorder(EVTYPE, total_harm), y = total_harm)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Weather Events (Fatalities + Injuries)",
    x = "Event Type",
    y = "Total Harm (Fatalities + Injuries)"
  )
Floods generate the largest total economic losses (about 150 billion U.S. dollars), followed by hurricanes/typhoons (about 72 billion) and tornadoes (about 57 billion).
econ_by_event <- storm_selected %>%
  group_by(EVTYPE) %>%
  summarise(
    total_damage_M = sum(total_damage_M, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_damage_M))
head(econ_by_event, 10)
## # A tibble: 10 × 2
## EVTYPE total_damage_M
## <chr> <dbl>
## 1 FLOOD 150320.
## 2 HURRICANE/TYPHOON 71914.
## 3 TORNADO 57362.
## 4 STORM SURGE 43324.
## 5 HAIL 18761.
## 6 FLASH FLOOD 18244.
## 7 DROUGHT 15019.
## 8 HURRICANE 14610.
## 9 RIVER FLOOD 10148.
## 10 ICE STORM 8967.
# Plot the ten event types with the highest total economic damage
top10_econ <- econ_by_event %>%
  slice_head(n = 10)
ggplot(top10_econ, aes(x = reorder(EVTYPE, total_damage_M),
                       y = total_damage_M)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Weather Events by Total Economic Damage",
    x = "Event Type",
    y = "Total Damage (Millions of USD)"
  )
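The two rankings can also be compared directly to see how much they overlap. The sketch below copies the event labels from the two top-10 tables into plain character vectors; `health_top10`, `econ_top10`, and the derived names are hypothetical and are not objects created elsewhere in this analysis.

```r
# Event types copied from the two top-10 tables, in rank order
health_top10 <- c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD",
                  "LIGHTNING", "HEAT", "FLASH FLOOD", "ICE STORM",
                  "THUNDERSTORM WIND", "WINTER STORM")
econ_top10 <- c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE",
                "HAIL", "FLASH FLOOD", "DROUGHT", "HURRICANE",
                "RIVER FLOOD", "ICE STORM")

# Event types that appear in both rankings
in_both <- intersect(health_top10, econ_top10)

# Event types that appear in only one of the rankings
health_only <- setdiff(health_top10, econ_top10)
econ_only   <- setdiff(econ_top10, health_top10)

print(in_both)
print(health_only)
print(econ_only)
```

Only four event types (tornado, flood, flash flood, and ice storm) appear in both lists, which is consistent with the conclusion that the events most harmful to health are not always the costliest.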