This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks major weather events across the United States from 1950 to November 2011. The primary objective is to determine which types of severe weather events cause the greatest harm to population health and which cause the greatest economic damage. Fatalities and injuries are used as measures of public health impact, while property damage and crop damage estimates are used as measures of economic impact. Event type labels are used as recorded in the raw data, so inconsistently named variants (for example TSTM WIND and THUNDERSTORM WIND) are counted as separate event types. Tornadoes were found to be the single most dangerous event type for population health, with the highest fatalities, the highest injuries, and therefore the highest combined total. Floods caused the greatest total economic damage when property and crop losses are combined. These findings highlight the need to prioritize tornado preparedness and flood mitigation in disaster planning.
# Load required libraries
library(dplyr)
library(ggplot2)
library(tidyr)
The raw data is provided as a comma-separated values file compressed using the bzip2 algorithm. It is loaded directly into R without any external preprocessing. The dataset contains 902,297 observations and 37 variables. We extract only the columns needed for this analysis: event type, fatalities, injuries, property damage, property damage exponent, crop damage, and crop damage exponent.
# Load data directly from the raw compressed CSV file - no external preprocessing
storm <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE)
# Show dimensions and relevant columns
dim(storm)
## [1] 902297 37
storm_sub <- storm[, c("EVTYPE", "FATALITIES", "INJURIES",
                       "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
head(storm_sub)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0
## 2 TORNADO          0        0     2.5          K       0
## 3 TORNADO          0        2    25.0          K       0
## 4 TORNADO          0        2     2.5          K       0
## 5 TORNADO          0        2     2.5          K       0
## 6 TORNADO          0        6     2.5          K       0
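As a small, self-contained illustration of the loading step, the sketch below writes a toy bzip2-compressed CSV to a temporary file and reads it back with `read.csv()`, which decompresses it while reading. The rows and the `EXTRA` column are made up for the example; only `EVTYPE` and `FATALITIES` mirror column names from the real file.

```r
# Toy data (hypothetical values), standing in for the real NOAA file
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(1, 0),
  EXTRA      = c("not needed", "not needed"),  # hypothetical unused column
  stringsAsFactors = FALSE
)

# Write it as a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and decompresses while reading
toy_read <- read.csv(tmp, stringsAsFactors = FALSE)

# Keep only the columns needed for the analysis
toy_sub <- toy_read[, c("EVTYPE", "FATALITIES")]
dim(toy_sub)
```

Subsetting by column name rather than by position keeps the selection readable and robust to column order.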
We group the data by event type and sum total fatalities and injuries. A combined total column is created to rank events by overall health impact. This allows us to identify which events cause the most harm across both measures simultaneously.
# Aggregate fatalities and injuries by event type
health <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(
    FATALITIES = sum(FATALITIES, na.rm = TRUE),
    INJURIES = sum(INJURIES, na.rm = TRUE),
    TOTAL = FATALITIES + INJURIES
  ) %>%
  arrange(desc(TOTAL))
# Select top 10 most harmful event types
top_health <- head(health, 10)
top_health
## # A tibble: 10 × 4
##    EVTYPE            FATALITIES INJURIES TOTAL
##    <chr>                  <dbl>    <dbl> <dbl>
##  1 TORNADO                 5633    91346 96979
##  2 EXCESSIVE HEAT          1903     6525  8428
##  3 TSTM WIND                504     6957  7461
##  4 FLOOD                    470     6789  7259
##  5 LIGHTNING                816     5230  6046
##  6 HEAT                     937     2100  3037
##  7 FLASH FLOOD              978     1777  2755
##  8 ICE STORM                 89     1975  2064
##  9 THUNDERSTORM WIND        133     1488  1621
## 10 WINTER STORM             206     1321  1527
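The table lists TSTM WIND and THUNDERSTORM WIND as separate rows even though they describe the same hazard. The sketch below shows one possible way to merge such labels, using only the ten rows printed above and base R. The helper name `normalize_evtype` and its single abbreviation rule are assumptions made for illustration, not part of the original analysis.

```r
# Rows copied from the top-10 health table (FATALITIES and INJURIES only)
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  FATALITIES = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  INJURIES   = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)

# Hypothetical helper: tidy case and whitespace, then expand "TSTM"
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("\\s+", " ", x)  # collapse repeated whitespace
  gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)
}

top10$EVTYPE_CLEAN <- normalize_evtype(top10$EVTYPE)

# Re-aggregate on the cleaned label and rank by combined total
merged <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_CLEAN,
                    data = top10, FUN = sum)
merged$TOTAL <- merged$FATALITIES + merged$INJURIES
merged <- merged[order(-merged$TOTAL), ]
merged
```

On these ten rows the merged THUNDERSTORM WIND category totals 9,082 and moves ahead of EXCESSIVE HEAT (8,428). The full dataset may contain further label variants that this one rule does not handle, so the result is illustrative only.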
The raw economic damage values are stored in two columns each for property and crop damage: a numeric value (PROPDMG / CROPDMG) and an exponent letter (PROPDMGEXP / CROPDMGEXP). The exponent letters must be converted to numeric multipliers to calculate actual dollar amounts. Without this conversion, damage values are not comparable across records: a PROPDMG of 25 means $25,000 with exponent K but $25,000,000 with exponent M. The letters H, K, M, and B represent hundreds, thousands, millions, and billions respectively, and are matched case-insensitively. All other values, including blank entries, are treated as a multiplier of 1.
# Function to convert exponent letters to numeric multipliers
# Justification: raw data stores damage in split format (value + exponent letter)
# K=thousands, M=millions, B=billions, H=hundreds, else=1
exp_convert <- function(exp) {
  exp <- toupper(trimws(exp))
  dplyr::case_when(
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    exp == "H" ~ 1e2,
    TRUE ~ 1
  )
}
# Calculate actual dollar damage values
storm_sub <- storm_sub %>%
  mutate(
    PROP_DAMAGE = PROPDMG * exp_convert(PROPDMGEXP),
    CROP_DAMAGE = CROPDMG * exp_convert(CROPDMGEXP),
    TOTAL_DAMAGE = PROP_DAMAGE + CROP_DAMAGE
  )
# Aggregate total economic damage by event type
economic <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_DAMAGE = sum(TOTAL_DAMAGE, na.rm = TRUE)) %>%
  arrange(desc(TOTAL_DAMAGE))
# Select top 10 events by economic damage
top_economic <- head(economic, 10)
top_economic
## # A tibble: 10 × 2
##    EVTYPE            TOTAL_DAMAGE
##    <chr>                    <dbl>
##  1 FLOOD             150319678257
##  2 HURRICANE/TYPHOON  71913712800
##  3 TORNADO            57352114049.
##  4 STORM SURGE        43323541000
##  5 HAIL               18758222016.
##  6 FLASH FLOOD        17562129167.
##  7 DROUGHT            15018672000
##  8 HURRICANE          14610229010
##  9 RIVER FLOOD        10148404500
## 10 ICE STORM           8967041360
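The exponent conversion used for these totals can be checked by hand on a few values. The sketch below is a base R equivalent of the same mapping, written with a named lookup vector instead of `case_when`. The function name `exp_to_multiplier` and the sample inputs are assumptions chosen for illustration.

```r
# Named lookup vector with the same mapping: H, K, M, B
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical base R equivalent of the exponent conversion
exp_to_multiplier <- function(exp) {
  key <- toupper(trimws(exp))
  out <- unname(multipliers[key])  # NA for anything not in the lookup
  out[is.na(out)] <- 1             # unrecognized or blank: multiplier of 1
  out
}

# Sample inputs (illustrative): upper case, lower case, blank, unknown symbol
value <- c(25.0, 2.5, 1.5, 3, 10)
expo  <- c("K", "k", "M", "", "?")

dollars <- value * exp_to_multiplier(expo)
data.frame(value, expo, dollars)
```

The first row reproduces the first record in the raw data: a PROPDMG of 25.0 with exponent K is $25,000. Blank and unrecognized exponents leave the value unchanged.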
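Conclusion: Tornadoes are by far the most harmful weather event to population health, accounting for 5,633 fatalities and 91,346 injuries. Their combined total of 96,979 is more than double that of the other nine top-10 event types together (40,198), although those nine together account for slightly more fatalities (6,036).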
# Reshape data to long format for grouped bar chart
top_health_long <- top_health %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  pivot_longer(cols = c(FATALITIES, INJURIES),
               names_to = "Harm_Type",
               values_to = "Count")
ggplot(top_health_long,
       aes(x = reorder(EVTYPE, -Count), y = Count, fill = Harm_Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("FATALITIES" = "#d73027", "INJURIES" = "#4575b4")) +
  labs(
    title = "Figure 1: Top 10 Weather Event Types Most Harmful to Population Health",
    subtitle = "Based on total fatalities and injuries recorded in NOAA Storm Database (1950-2011)",
    x = "Weather Event Type",
    y = "Number of People Affected",
    fill = "Type of Harm",
    caption = "Figure 1: Tornadoes dominate both fatality and injury counts, making them the
most dangerous weather event type for public health in the United States."
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The figure above shows that tornadoes cause far more harm to population health than any other event type. Excessive Heat ranks second in fatalities (1,903), while TSTM Wind ranks second in injuries (6,957). Emergency managers should prioritize tornado warning systems and shelters above all other weather hazards.
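How far tornadoes outweigh the other top event types can be checked directly against the top-10 health table. The short sketch below compares the tornado row with the sum of the other nine rows for each measure; the values are copied from that table and the variable names are chosen for illustration.

```r
# Values copied from the top-10 health table; the first entry is TORNADO
fatalities <- c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206)
injuries   <- c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321)

compare <- data.frame(
  measure    = c("FATALITIES", "INJURIES", "TOTAL"),
  tornado    = c(fatalities[1], injuries[1], fatalities[1] + injuries[1]),
  other_nine = c(sum(fatalities[-1]), sum(injuries[-1]),
                 sum(fatalities[-1]) + sum(injuries[-1]))
)

# TRUE where tornadoes alone exceed the other nine event types together
compare$tornado_exceeds <- compare$tornado > compare$other_nine
compare
```

Tornadoes exceed the other nine event types together on injuries and on the combined total, but not on fatalities alone, where the other nine sum to 6,036 against 5,633.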
Conclusion: Floods cause the greatest total economic damage, with over $150 billion in combined property and crop losses, more than double that of the second-ranked event (Hurricane/Typhoon at approximately $72 billion).
ggplot(top_economic,
       aes(x = reorder(EVTYPE, -TOTAL_DAMAGE), y = TOTAL_DAMAGE / 1e9)) +
  geom_bar(stat = "identity", fill = "#2166ac") +
  labs(
    title = "Figure 2: Top 10 Weather Event Types with Greatest Economic Consequences",
    subtitle = "Combined property and crop damage in billions USD (NOAA Storm Database, 1950-2011)",
    x = "Weather Event Type",
    y = "Total Economic Damage (Billions USD)",
    caption = "Figure 2: Floods are the costliest weather event type, causing over $150 billion
in combined property and crop damage. Hurricane/Typhoon and Tornado follow in second and third place."
  ) +
  theme_minimal(base_size = 12) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
The figure above shows that Floods cause the greatest economic damage overall. This is largely driven by massive property damage from flooding events. Municipal planners should prioritize flood infrastructure investment and insurance programs to reduce economic losses from future flood events.
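The size of the gap between floods and the other event types can be worked out from the top-10 economic table. The sketch below uses the totals as printed there (values shown with a trailing period are rounded in the display) and computes the ratio of flood damage to the second-ranked event, plus the flood share of the top-10 total. Variable names are chosen for illustration.

```r
# Totals in USD, copied from the top-10 economic table
damage <- c(
  "FLOOD"             = 150319678257,
  "HURRICANE/TYPHOON" = 71913712800,
  "TORNADO"           = 57352114049,
  "STORM SURGE"       = 43323541000,
  "HAIL"              = 18758222016,
  "FLASH FLOOD"       = 17562129167,
  "DROUGHT"           = 15018672000,
  "HURRICANE"         = 14610229010,
  "RIVER FLOOD"       = 10148404500,
  "ICE STORM"         = 8967041360
)

# Flood damage relative to the second-ranked event type
ratio <- unname(damage["FLOOD"] / damage["HURRICANE/TYPHOON"])

# Flood damage as a share of the combined top-10 total
share <- unname(damage["FLOOD"] / sum(damage))

round(c(ratio = ratio, share = share), 3)
```

Flood damage comes to roughly 2.09 times the Hurricane/Typhoon total and about 37% of the combined top-10 total.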