This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks major weather events across the United States from 1950 to November 2011. The goal is to identify which types of severe weather events are most harmful to population health and which have the greatest economic consequences. The analysis shows that tornadoes are by far the most harmful event type for population health: they cause both the most fatalities (5,633) and the most injuries (91,346). Excessive heat ranks second for fatalities. For economic damage, floods cause the greatest property damage overall, while droughts cause the most crop damage. These findings can help government agencies prioritize resources and preparedness efforts for the most damaging weather event types.
The data comes as a comma-separated-value file compressed via the
bzip2 algorithm. We load it directly using read.csv(),
which handles .bz2 files natively in R.
# Load the data directly from bz2 file
storm <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = FALSE)
# Basic exploration
dim(storm)
## [1] 902297 37
names(storm)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
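As a small illustration of the claim that read.csv() reads bzip2-compressed files directly, the sketch below writes a two-row toy table to a temporary .bz2 file and reads it back. The file and its contents are invented for the example and are unrelated to the NOAA data.

```r
# Write a tiny CSV through a bzip2 connection (toy data, not NOAA records)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
  con,
  row.names = FALSE
)
close(con)

# read.csv() detects the compression when given the file path
toy <- read.csv(tmp, stringsAsFactors = FALSE)
dim(toy)
```

No explicit decompression step is needed because the underlying file connection recognises the compression format when opened for reading.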
For this analysis, we only need the event type, fatalities, injuries, property damage, crop damage, and their exponent multiplier columns.
library(dplyr)
storm_sub <- storm %>%
select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
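The PROPDMGEXP and CROPDMGEXP columns contain codes that serve as
multipliers for the damage figures: letters (H = hundreds, K = thousands,
M = millions, B = billions) and, in some records, single digits, which we
interpret as powers of ten. Any other value, including a blank, is treated
as a multiplier of 1. We convert the codes to numeric values.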
# Function to convert exponent characters to numeric multipliers
convert_exp <- function(exp) {
  exp <- toupper(trimws(exp))
  # case_when() evaluates every right-hand side on the full vector, so
  # coerce once up front and silence the expected warning for letter codes
  digit_power <- suppressWarnings(as.numeric(exp))
  dplyr::case_when(
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    exp == "H" ~ 1e2,
    exp %in% as.character(0:9) ~ 10 ^ digit_power,
    TRUE ~ 1
  )
}
storm_sub <- storm_sub %>%
  mutate(
    prop_mult = convert_exp(PROPDMGEXP),
    crop_mult = convert_exp(CROPDMGEXP),
    prop_damage = PROPDMG * prop_mult,
    crop_damage = CROPDMG * crop_mult,
    total_damage = prop_damage + crop_damage,
    total_casualties = FATALITIES + INJURIES
  )
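To make the mapping concrete, here is a minimal base R sketch of the same rules applied to a handful of example codes. It uses a named lookup vector instead of case_when(), and the input vector is invented for illustration; it is a cross-check of the intended behaviour, not the function used in the analysis.

```r
# Example exponent codes, including mixed case, a digit, a blank and symbols
exp_codes <- c("K", "m", "B", "h", "5", "", "+", "?")

letter_lookup <- c(K = 1e3, M = 1e6, B = 1e9, H = 1e2)
codes <- toupper(trimws(exp_codes))

# Default multiplier is 1 for anything unrecognised
mult <- rep(1, length(codes))

# Letter codes map through the lookup table
is_letter <- codes %in% names(letter_lookup)
mult[is_letter] <- unname(letter_lookup[codes[is_letter]])

# Digit codes are treated as powers of ten
is_digit <- codes %in% as.character(0:9)
mult[is_digit] <- 10 ^ as.numeric(codes[is_digit])

data.frame(code = exp_codes, multiplier = mult)
```

Because as.numeric() is only applied to the entries already known to be digits, this version raises no coercion warning.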
There are many inconsistently labelled event types. We convert them to uppercase and trim whitespace to reduce redundancy.
storm_sub$EVTYPE <- toupper(trimws(storm_sub$EVTYPE))
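The effect of this normalization, and its limits, can be seen on a few example labels. The sketch below uses an invented vector; the final recoding step is hypothetical and is not applied anywhere in this analysis.

```r
# Example labels with case and whitespace variations (invented for illustration)
raw_labels <- c("Tornado", " TORNADO", "tornado ", "TSTM WIND", "Thunderstorm Wind")

# Same cleaning as in the analysis: trim whitespace, then convert to uppercase
clean_labels <- toupper(trimws(raw_labels))
unique_clean <- unique(clean_labels)
unique_clean

# Hypothetical extra step (NOT part of this analysis): merge one known synonym
merged_labels <- ifelse(clean_labels == "TSTM WIND", "THUNDERSTORM WIND", clean_labels)
unique_merged <- unique(merged_labels)
unique_merged
```

Case and whitespace variants collapse into one label, but synonyms such as TSTM WIND and THUNDERSTORM WIND remain separate unless they are recoded explicitly.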
We look at both fatalities and injuries separately, taking the top 10 event types for each.
# Top 10 by fatalities
top_fatalities <- storm_sub %>%
group_by(EVTYPE) %>%
summarise(Fatalities = sum(FATALITIES, na.rm = TRUE)) %>%
arrange(desc(Fatalities)) %>%
slice(1:10)
# Top 10 by injuries
top_injuries <- storm_sub %>%
group_by(EVTYPE) %>%
summarise(Injuries = sum(INJURIES, na.rm = TRUE)) %>%
arrange(desc(Injuries)) %>%
slice(1:10)
top_fatalities
## # A tibble: 10 × 2
## EVTYPE Fatalities
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
top_injuries
## # A tibble: 10 × 2
## EVTYPE Injuries
## <chr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
library(ggplot2)
library(gridExtra)
p1 <- ggplot(top_fatalities, aes(x = reorder(EVTYPE, Fatalities), y = Fatalities)) +
geom_bar(stat = "identity", fill = "firebrick") +
coord_flip() +
labs(title = "Top 10 Events by Fatalities",
x = "Event Type", y = "Total Fatalities") +
theme_bw(base_size = 9)
p2 <- ggplot(top_injuries, aes(x = reorder(EVTYPE, Injuries), y = Injuries)) +
geom_bar(stat = "identity", fill = "steelblue") +
coord_flip() +
labs(title = "Top 10 Events by Injuries",
x = "Event Type", y = "Total Injuries") +
theme_bw(base_size = 9)
grid.arrange(p1, p2, ncol = 2)
Figure 1: Top 10 weather event types by fatalities (left) and injuries (right) across the United States, 1950-2011.
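Finding: Tornadoes are the single most harmful event type for both fatalities and injuries. Excessive heat ranks second for fatalities, while thunderstorm winds (recorded as TSTM WIND) rank second for injuries. Note that some closely related labels are still counted separately, for example TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT, so the totals for those categories are split across rows.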
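The total_casualties column created during data processing could be ranked in the same way to obtain a combined view. The following is a minimal base R sketch of that group, sum and sort pattern on made-up rows; the values are toy numbers, not NOAA figures, and aggregate() stands in for the dplyr pipeline used above.

```r
# Toy records (invented values, for illustration only)
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES = c(10, 5, 1, 0, 2)
)
toy$total_casualties <- toy$FATALITIES + toy$INJURIES

# Sum casualties per event type, then sort from largest to smallest
combined <- aggregate(total_casualties ~ EVTYPE, data = toy, FUN = sum)
combined <- combined[order(-combined$total_casualties), ]

# Keep the top two event types
head(combined, 2)
```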
We examine property damage and crop damage separately, and also look at total combined damage.
# Top 10 by property damage
top_prop <- storm_sub %>%
group_by(EVTYPE) %>%
summarise(PropertyDamage = sum(prop_damage, na.rm = TRUE)) %>%
arrange(desc(PropertyDamage)) %>%
slice(1:10)
# Top 10 by crop damage
top_crop <- storm_sub %>%
group_by(EVTYPE) %>%
summarise(CropDamage = sum(crop_damage, na.rm = TRUE)) %>%
arrange(desc(CropDamage)) %>%
slice(1:10)
top_prop
## # A tibble: 10 × 2
## EVTYPE PropertyDamage
## <chr> <dbl>
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56947380676.
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16822723978.
## 6 HAIL 15735267513.
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046295
top_crop
## # A tibble: 10 × 2
## EVTYPE CropDamage
## <chr> <dbl>
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954473
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1312973000
## 10 FROST/FREEZE 1094186000
p3 <- ggplot(top_prop, aes(x = reorder(EVTYPE, PropertyDamage), y = PropertyDamage / 1e9)) +
geom_bar(stat = "identity", fill = "darkorange") +
coord_flip() +
labs(title = "Top 10 Events by Property Damage",
x = "Event Type", y = "Property Damage (Billions USD)") +
theme_bw(base_size = 9)
p4 <- ggplot(top_crop, aes(x = reorder(EVTYPE, CropDamage), y = CropDamage / 1e9)) +
geom_bar(stat = "identity", fill = "darkgreen") +
coord_flip() +
labs(title = "Top 10 Events by Crop Damage",
x = "Event Type", y = "Crop Damage (Billions USD)") +
theme_bw(base_size = 9)
grid.arrange(p3, p4, ncol = 2)
Figure 2: Top 10 weather event types by property damage (left) and crop damage (right) in USD, across the United States, 1950-2011.
Finding: Floods cause the greatest property damage (over $140 billion), followed by hurricanes/typhoons. Droughts are the leading cause of crop damage, followed by floods and river floods.
# Combined top 5 by total economic damage
top_total <- storm_sub %>%
group_by(EVTYPE) %>%
summarise(
Fatalities = sum(FATALITIES, na.rm = TRUE),
Injuries = sum(INJURIES, na.rm = TRUE),
Property_Damage_B = round(sum(prop_damage, na.rm = TRUE) / 1e9, 2),
Crop_Damage_B = round(sum(crop_damage, na.rm = TRUE) / 1e9, 2)
) %>%
mutate(Total_Econ_B = Property_Damage_B + Crop_Damage_B) %>%
arrange(desc(Total_Econ_B)) %>%
slice(1:5)
knitr::kable(top_total,
caption = "Top 5 Event Types by Total Economic Damage (Billions USD)",
col.names = c("Event Type", "Fatalities", "Injuries",
"Property Dmg (B$)", "Crop Dmg (B$)", "Total Econ (B$)"))
| Event Type | Fatalities | Injuries | Property Dmg (B$) | Crop Dmg (B$) | Total Econ (B$) |
|---|---|---|---|---|---|
| FLOOD | 470 | 6789 | 144.66 | 5.66 | 150.32 |
| HURRICANE/TYPHOON | 64 | 1275 | 69.31 | 2.61 | 71.92 |
| TORNADO | 5633 | 91346 | 56.95 | 0.41 | 57.36 |
| STORM SURGE | 13 | 38 | 43.32 | 0.00 | 43.32 |
| HAIL | 15 | 1361 | 15.74 | 3.03 | 18.77 |
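One detail worth noting: the total column is computed by adding two components that have already been rounded to two decimals, so it can differ by 0.01 from the rounded sum of the raw values. The sketch below checks this with the HAIL figures printed in the property and crop damage tables above (the property figure is shown there without its fractional part, which does not affect the result at this precision).

```r
# HAIL damage in USD, as printed in the top-10 tables above
hail_prop <- 15735267513
hail_crop <- 3025954473

# Approach used in the table: round each component, then add
rounded_first <- round(hail_prop / 1e9, 2) + round(hail_crop / 1e9, 2)

# Alternative: add the raw values, then round once
summed_first <- round((hail_prop + hail_crop) / 1e9, 2)

c(rounded_first = rounded_first, summed_first = summed_first)
```

The first approach reproduces the 18.77 shown in the table, while summing before rounding gives 18.76. The difference is small and does not change the ranking here, but rounding only at the final step avoids it.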