This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States from 1950 to November 2011. The analysis addresses two key questions: (1) which types of weather events are most harmful to population health, and (2) which types of events have the greatest economic consequences.
After loading the raw data from the bzip2-compressed CSV file, we cleaned and processed the event type labels, decoded the property and crop damage exponents, and aggregated totals by event type. Tornadoes were found to be overwhelmingly the most harmful event type with respect to fatalities and injuries. For economic consequences, floods caused the most total economic damage, followed by hurricanes/typhoons and tornadoes. These findings provide actionable insight for government and municipal managers seeking to prioritize emergency preparedness resources.
The data is loaded directly from the raw compressed .csv.bz2 file. The read.csv() function in R can read bzip2-compressed files natively, so no manual decompression step is needed.
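# Packages used below: dplyr (pipes, mutate, case_when), tidyr (pivot_longer),
# ggplot2 (plots), scales (comma, dollar_format)
library(dplyr); library(tidyr); library(ggplot2); library(scales)
# Load data directly from the bz2 file
storm_data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),
                       header = TRUE,
                       stringsAsFactors = FALSE)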
dim(storm_data)
## [1] 902297 37
# View structure and key columns
str(storm_data[, c("EVTYPE", "FATALITIES", "INJURIES",
                   "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")])
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
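Event type labels in the raw data are inconsistent (mixed case, abbreviations, trailing spaces). We standardize them into an EVTYPE_CLEAN column for more accurate grouping. Some synonymous labels (for example, TSTM WIND and THUNDERSTORM WIND) still appear as separate categories in the results below.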
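The cleaning code itself is not shown in this report. The following is a minimal sketch of one way to build the EVTYPE_CLEAN column used by the later grouping code. It assumes the standardization amounts to upper-casing, trimming, and collapsing repeated spaces. The helper clean_evtype() and the toy storm_sample data frame are hypothetical names introduced for illustration only.

```r
# Hypothetical helper: normalize case and whitespace in event type labels.
# This is an assumed cleaning rule, not necessarily the one used in the report.
clean_evtype <- function(x) {
  x <- toupper(trimws(x))          # upper-case, strip leading/trailing spaces
  gsub("[[:space:]]+", " ", x)     # collapse runs of whitespace to one space
}

# Toy sample standing in for storm_data$EVTYPE (made-up labels)
storm_sample <- data.frame(
  EVTYPE = c("  tornado", "Tstm Wind ", "FLASH  FLOOD", "Flood"),
  stringsAsFactors = FALSE
)

storm_sample$EVTYPE_CLEAN <- clean_evtype(storm_sample$EVTYPE)
print(storm_sample$EVTYPE_CLEAN)
```

On the full data set the same helper could be applied as storm_data$EVTYPE_CLEAN <- clean_evtype(storm_data$EVTYPE). Note that this kind of cleaning does not merge synonyms, so labels such as TSTM WIND and THUNDERSTORM WIND stay distinct.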
The PROPDMGEXP and CROPDMGEXP columns use letter codes to represent multipliers (e.g., K = thousands, M = millions, B = billions). We convert these to numeric multipliers.
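To make the arithmetic concrete, here is a small sketch that uses a named lookup vector for the three codes mentioned above. The vector exp_multiplier and the toy data frame damage are illustrative names with made-up rows, not part of the analysis.

```r
# Multipliers for the letter codes described in the text
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy rows (made-up values) pairing a damage figure with its exponent code
damage <- data.frame(
  PROPDMG    = c(25, 2.5, 5),
  PROPDMGEXP = c("K", "M", "B"),
  stringsAsFactors = FALSE
)

# Damage in USD = reported figure * multiplier looked up by code
damage$PROP_DMG_USD <- damage$PROPDMG * unname(exp_multiplier[damage$PROPDMGEXP])
print(damage)
```

A plain lookup like this yields NA for any code outside the table, whereas the decoding function used in this report falls back to a multiplier of 1 for blank or unrecognized codes.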
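# Function to convert exponent codes to numeric multipliers
decode_exp <- function(exp) {
  exp <- toupper(trimws(exp))
  case_when(
    exp == "B" ~ 1e9,   # billions
    exp == "M" ~ 1e6,   # millions
    exp == "K" ~ 1e3,   # thousands
    exp == "H" ~ 1e2,   # hundreds
    # case_when() evaluates every right-hand side on the full vector, so
    # silence the coercion warnings raised for non-digit codes
    exp %in% as.character(0:9) ~ suppressWarnings(10^as.numeric(exp)),
    TRUE ~ 1            # blank or unrecognized codes: no scaling
  )
}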
# Apply multipliers
storm_data <- storm_data %>%
  mutate(
    PROP_MULT = decode_exp(PROPDMGEXP),
    CROP_MULT = decode_exp(CROPDMGEXP),
    PROP_DMG_USD = PROPDMG * PROP_MULT,
    CROP_DMG_USD = CROPDMG * CROP_MULT,
    TOTAL_DMG_USD = PROP_DMG_USD + CROP_DMG_USD
  )

# Sum fatalities and injuries by event type
health_impact <- storm_data %>%
  group_by(EVTYPE_CLEAN) %>%
  summarise(
    Fatalities = sum(FATALITIES, na.rm = TRUE),
    Injuries = sum(INJURIES, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(Total_Casualties = Fatalities + Injuries) %>%
  arrange(desc(Total_Casualties))
# Top 10 most harmful event types
top10_health <- head(health_impact, 10)
top10_health
## # A tibble: 10 × 4
## EVTYPE_CLEAN Fatalities Injuries Total_Casualties
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
# Sum total economic damage by event type
econ_impact <- storm_data %>%
  group_by(EVTYPE_CLEAN) %>%
  summarise(
    Property_Damage = sum(PROP_DMG_USD, na.rm = TRUE),
    Crop_Damage = sum(CROP_DMG_USD, na.rm = TRUE),
    Total_Damage = sum(TOTAL_DMG_USD, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(Total_Damage))
# Top 10 most economically damaging event types
top10_econ <- head(econ_impact, 10)
top10_econ
## # A tibble: 10 × 4
## EVTYPE_CLEAN Property_Damage Crop_Damage Total_Damage
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 144657709807 5661968450 150319678257
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56947380676. 414953270 57362333946.
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15735267513. 3025954473 18761221986.
## 6 FLASH FLOOD 16822723978. 1421317100 18244041078.
## 7 DROUGHT 1046106000 13972566000 15018672000
## 8 HURRICANE 11868319010 2741910000 14610229010
## 9 RIVER FLOOD 5118945500 5029459000 10148404500
## 10 ICE STORM 3944927860 5022113500 8967041360
The figure below shows the top 10 weather event types by total casualties (fatalities + injuries). The bars are stacked to show the proportion of fatalities versus injuries within each event type.
# Reshape for plotting
top10_health_long <- top10_health %>%
  select(EVTYPE_CLEAN, Fatalities, Injuries) %>%
  pivot_longer(cols = c(Fatalities, Injuries),
               names_to = "Type",
               values_to = "Count") %>%
  mutate(EVTYPE_CLEAN = factor(EVTYPE_CLEAN,
    levels = top10_health$EVTYPE_CLEAN[order(top10_health$Total_Casualties)]))
ggplot(top10_health_long, aes(x = EVTYPE_CLEAN, y = Count, fill = Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("Fatalities" = "#D7191C", "Injuries" = "#FDAE61"),
                    name = "Casualty Type") +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Top 10 Most Harmful Storm Events to Population Health (1950–2011)",
    subtitle = "Stacked total of fatalities and injuries per event type",
    x = "Event Type",
    y = "Number of Casualties"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    axis.text.y = element_text(size = 11)
  )
Figure 1: Top 10 Storm Event Types by Total Casualties (Fatalities + Injuries). Tornadoes dominate both fatalities and injuries by a wide margin, accounting for more casualties than all other event types combined.
Key Finding: Tornadoes are by far the most dangerous weather event for population health, with 5,633 fatalities and 91,346 injuries recorded between 1950 and 2011. Excessive heat ranks second with 8,428 total casualties, including 1,903 fatalities, followed in total casualties by thunderstorm winds (TSTM WIND) and floods.
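Because TSTM WIND and THUNDERSTORM WIND appear as separate rows in the table, the ranking below tornadoes is sensitive to how labels are grouped. The sketch below re-aggregates only the ten rows printed above after merging those two labels. The merge rule is an assumption made for illustration, and the full data set may contain further variants that this top-10 subset cannot capture.

```r
# Top-10 health table, copied from the output above
top10 <- data.frame(
  EVTYPE_CLEAN = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD",
                   "LIGHTNING", "HEAT", "FLASH FLOOD", "ICE STORM",
                   "THUNDERSTORM WIND", "WINTER STORM"),
  Fatalities = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  Injuries   = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)

# Assumed merge rule: treat TSTM WIND as THUNDERSTORM WIND
top10$EVTYPE_MERGED <- ifelse(top10$EVTYPE_CLEAN == "TSTM WIND",
                              "THUNDERSTORM WIND", top10$EVTYPE_CLEAN)

# Re-aggregate and re-rank by total casualties
merged <- aggregate(cbind(Fatalities, Injuries) ~ EVTYPE_MERGED,
                    data = top10, FUN = sum)
merged$Total_Casualties <- merged$Fatalities + merged$Injuries
merged <- merged[order(-merged$Total_Casualties), ]
print(merged, row.names = FALSE)
```

Within these ten rows, the merged thunderstorm wind category totals 7,461 + 1,621 = 9,082 casualties, which is above the 8,428 recorded for excessive heat. Tornadoes remain first by a wide margin either way.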
The figure below shows the top 10 weather event types by total economic damage (property damage + crop damage), with bars stacked to show the split between property and crop loss.
# Reshape for plotting
top10_econ_long <- top10_econ %>%
  select(EVTYPE_CLEAN, Property_Damage, Crop_Damage) %>%
  pivot_longer(cols = c(Property_Damage, Crop_Damage),
               names_to = "Type",
               values_to = "Damage_USD") %>%
  mutate(
    EVTYPE_CLEAN = factor(EVTYPE_CLEAN,
      levels = top10_econ$EVTYPE_CLEAN[order(top10_econ$Total_Damage)]),
    Type = recode(Type,
                  "Property_Damage" = "Property Damage",
                  "Crop_Damage" = "Crop Damage")
  )
ggplot(top10_econ_long, aes(x = EVTYPE_CLEAN, y = Damage_USD / 1e9, fill = Type)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_manual(values = c("Property Damage" = "#2C7BB6", "Crop Damage" = "#ABD9E9"),
                    name = "Damage Type") +
  scale_y_continuous(labels = dollar_format(suffix = "B", prefix = "$")) +
  labs(
    title = "Top 10 Storm Events by Total Economic Damage (1950–2011)",
    subtitle = "Combined property and crop damage in billions of USD",
    x = "Event Type",
    y = "Total Economic Damage (Billions USD)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 14),
    legend.position = "bottom",
    axis.text.y = element_text(size = 11)
  )
Figure 2: Top 10 Storm Event Types by Total Economic Damage (USD). Floods cause the greatest total economic damage, followed by hurricanes/typhoons and tornado events. Drought has a proportionally larger share of crop damage.
Key Finding: Floods cause the greatest total economic damage, with approximately $150.3 billion in combined property and crop damage. Hurricanes/Typhoons and Tornadoes follow as the second and third most costly events, respectively. Notably, Drought stands out for its disproportionately high share of crop damage relative to property damage.
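The crop-damage share behind the drought observation can be checked directly from the printed table. The sketch below copies the property and crop totals as they appear in the output above (so values are rounded to whole dollars) and computes crop damage as a fraction of total damage. The data frame econ and the column Crop_Share are illustrative names.

```r
# Top-10 economic table, copied from the printed output above
econ <- data.frame(
  EVTYPE_CLEAN = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE",
                   "HAIL", "FLASH FLOOD", "DROUGHT", "HURRICANE",
                   "RIVER FLOOD", "ICE STORM"),
  Property_Damage = c(144657709807, 69305840000, 56947380676, 43323536000,
                      15735267513, 16822723978, 1046106000, 11868319010,
                      5118945500, 3944927860),
  Crop_Damage = c(5661968450, 2607872800, 414953270, 5000, 3025954473,
                  1421317100, 13972566000, 2741910000, 5029459000,
                  5022113500),
  stringsAsFactors = FALSE
)

# Crop damage as a fraction of total (property + crop) damage
econ$Crop_Share <- econ$Crop_Damage / (econ$Property_Damage + econ$Crop_Damage)

# Rank event types by crop share, highest first
crop_rank <- econ[order(-econ$Crop_Share), c("EVTYPE_CLEAN", "Crop_Share")]
print(crop_rank, row.names = FALSE)
```

Drought comes out highest, with roughly 93% of its recorded damage falling on crops, compared with under 4% for floods.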
## === TOP 10 EVENTS: POPULATION HEALTH IMPACT ===
print(top10_health[, c("EVTYPE_CLEAN", "Fatalities", "Injuries", "Total_Casualties")],
      row.names = FALSE)
## # A tibble: 10 × 4
## EVTYPE_CLEAN Fatalities Injuries Total_Casualties
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
##
## === TOP 10 EVENTS: ECONOMIC IMPACT ===
top10_econ_display <- top10_econ %>%
  mutate(
    Property_Damage_B = sprintf("$%.2fB", Property_Damage / 1e9),
    Crop_Damage_B = sprintf("$%.2fB", Crop_Damage / 1e9),
    Total_Damage_B = sprintf("$%.2fB", Total_Damage / 1e9)
  ) %>%
  select(EVTYPE_CLEAN, Property_Damage_B, Crop_Damage_B, Total_Damage_B)
print(top10_econ_display, row.names = FALSE)
## # A tibble: 10 × 4
## EVTYPE_CLEAN Property_Damage_B Crop_Damage_B Total_Damage_B
## <chr> <chr> <chr> <chr>
## 1 FLOOD $144.66B $5.66B $150.32B
## 2 HURRICANE/TYPHOON $69.31B $2.61B $71.91B
## 3 TORNADO $56.95B $0.41B $57.36B
## 4 STORM SURGE $43.32B $0.00B $43.32B
## 5 HAIL $15.74B $3.03B $18.76B
## 6 FLASH FLOOD $16.82B $1.42B $18.24B
## 7 DROUGHT $1.05B $13.97B $15.02B
## 8 HURRICANE $11.87B $2.74B $14.61B
## 9 RIVER FLOOD $5.12B $5.03B $10.15B
## 10 ICE STORM $3.94B $5.02B $8.97B
This analysis of the NOAA Storm Database (1950–2011) reveals two clear priorities for emergency preparedness:
(1) Tornadoes pose the greatest threat to human life and safety, accounting for the most fatalities and injuries of any event type. Governments should invest in tornado detection, warning systems, and shelter infrastructure.
(2) Floods cause the largest economic losses in terms of combined property and crop damage. Investment in flood management, levees, and crop insurance programs would target the largest source of recorded losses.
These findings are based on the complete 61-year record of storm events and provide a robust basis for resource allocation decisions by emergency management agencies.