This analysis explores which NOAA-recorded weather event types have historically caused the most fatalities and injuries, and which have produced the greatest property and crop damage costs. The data come from the NOAA Storm Database (Storm Data) and cover 1950 to November 2011. After loading the raw Storm Data file, I clean the damage exponent codes, aggregate fatalities and injuries by event type, and convert property and crop damage into dollar estimates to find which event types had the highest economic impact. In summary, tornadoes are by far the most harmful to population health, with heat and flash floods causing the next-highest death tolls among the leading event types, while floods, hurricanes/typhoons, and tornadoes generate the highest direct economic damage.
library(dplyr) # for data manipulation
library(readr) # for fast CSV reading
library(lubridate) # for working with dates (if needed)
library(ggplot2) # for plots
# Read the raw Storm Data CSV (read_csv() also decompresses a .csv.bz2 file directly)
raw_data <- read_csv("repdata_data_StormData.csv")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl (1): COUNTYENDN
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Check basic structure
glimpse(raw_data)
## Rows: 902,297
## Columns: 37
## $ STATE__ <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ BGN_DATE <chr> "4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/1951 0:00:0…
## $ BGN_TIME <chr> "0130", "0145", "1600", "0900", "1500", "2000", "0100", "09…
## $ TIME_ZONE <chr> "CST", "CST", "CST", "CST", "CST", "CST", "CST", "CST", "CS…
## $ COUNTY <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73, 49, 107,…
## $ COUNTYNAME <chr> "MOBILE", "BALDWIN", "FAYETTE", "MADISON", "CULLMAN", "LAUD…
## $ STATE <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL",…
## $ EVTYPE <chr> "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TOR…
## $ BGN_RANGE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ BGN_AZI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BGN_LOCATI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_DATE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_TIME <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COUNTY_END <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ COUNTYENDN <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_RANGE <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ END_AZI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_LOCATI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LENGTH <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3, 1.3, 4.7…
## $ WIDTH <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 400, 400, 2…
## $ F <dbl> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1, 1, 1, 1,…
## $ MAG <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ FATALITIES <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0, 0, 0, 0,…
## $ INJURIES <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 50, 2, 0, …
## $ PROPDMG <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25.0, 2.5, …
## $ PROPDMGEXP <chr> "K", "K", "K", "K", "K", "K", "K", "K", "K", "K", "M", "M",…
## $ CROPDMG <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ CROPDMGEXP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WFO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STATEOFFIC <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ZONENAMES <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LATITUDE <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 3334, 3336,…
## $ LONGITUDE <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 8740, 8738,…
## $ LATITUDE_E <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404, 0, 3432,…
## $ LONGITUDE_ <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640, 0, 8540,…
## $ REMARKS <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ REFNUM <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …
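BGN_DATE is read as text in the form "4/18/1950 0:00:00". If the period covered by the data needs checking, or year-level summaries are wanted, the year can be extracted first. A minimal sketch in base R; the helper name bgn_year() is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: extract the year from Storm Data BGN_DATE strings
# such as "4/18/1950 0:00:00". The underlying strptime() ignores any
# characters left over once the "%m/%d/%Y" format is matched, so the
# time part is simply skipped.
bgn_year <- function(bgn_date) {
  parsed <- as.Date(bgn_date, format = "%m/%d/%Y")
  as.integer(format(parsed, "%Y"))
}

# Usage with the raw data loaded above:
# range(bgn_year(raw_data$BGN_DATE))
```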
Key variables used in this analysis:
- EVTYPE: event type (character).
- BGN_DATE: beginning date of the event (character).
- FATALITIES, INJURIES: counts of deaths and injuries (read in as doubles).
- PROPDMG, PROPDMGEXP: numeric magnitude of property damage (e.g., 1, 2.5) and its exponent code (e.g., “K” for thousands, “M” for millions).
- CROPDMG, CROPDMGEXP: the same encoding for crop damage.

Before proceeding, check whether PROPDMGEXP and CROPDMGEXP contain only the expected exponent codes (K, M, B, or blank):
table(raw_data$PROPDMGEXP, useNA = "always")
##
## - ? + 0 1 2 3 4 5 6 7
## 1 8 5 216 25 13 4 4 28 4 5
## 8 B h H K m M <NA>
## 1 40 1 6 424665 7 11330 465934
table(raw_data$CROPDMGEXP, useNA = "always")
##
## ? 0 2 B k K m M <NA>
## 7 19 1 9 21 281832 1 1994 618413
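The tables show that, besides the expected K, M, and B (and blank values, read in as NA), the exponent columns also contain H and lowercase h, k, and m, the digits 0–8, and the symbols -, ?, and +. Create a helper function or mapping table that converts each character exponent (“K”, “M”, “B”, “H”, etc.) into a numeric multiplier (e.g., K→1e3, M→1e6, B→1e9). Every other code needs an explicit rule: treat it as 1 (no scaling) or as 0 (damage discarded). A simple approach maps only K, M, and B in either case; all other codes, including blanks, then produce NA, which the later sum(..., na.rm = TRUE) calls effectively count as zero damage: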
# Define a named vector for exponent → multiplier
exp_map <- c(
"K" = 1e3,
"k" = 1e3,
"M" = 1e6,
"m" = 1e6,
"B" = 1e9,
"b" = 1e9
)
# The tables above also show digits (0–8), H/h, and the symbols -, ?, +.
# Digits could be mapped with 10^as.numeric(code); for simplicity, only K/M/B are mapped here.
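If the unmapped codes should be kept rather than dropped, a fuller conversion is possible. The sketch below encodes one possible convention, not a rule documented by NOAA: it assumes H/h means hundreds, a single digit d means a multiplier of 10^d, blank or NA means no scaling (1), and the symbols -, ?, and + mean no usable exponent (0). The function name exp_to_multiplier() is hypothetical.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers.
# Assumed convention (not NOAA's documented rule):
#   H/K/M/B (any case) -> 1e2/1e3/1e6/1e9
#   single digit d     -> 10^d
#   blank or NA        -> 1 (no scaling)
#   anything else      -> 0 (e.g. "-", "?", "+": no usable exponent)
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(code))
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

  mult <- rep(0, length(code))                 # default for unusable codes
  blank <- is.na(code) | code == ""
  mult[blank] <- 1                             # no exponent: keep magnitude as is

  is_letter <- !blank & code %in% names(letter_map)
  mult[is_letter] <- letter_map[code[is_letter]]

  is_digit <- !blank & grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  unname(mult)
}
```

It could stand in for the exp_map lookup in the mutate() call below, for example PropDamage = PROPDMG * exp_to_multiplier(PROPDMGEXP); the resulting totals would then differ from the tables shown in this report.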
Use mutate() to compute actual property and crop damage
in dollars:
storm_clean <- raw_data %>%
mutate(
# Codes not in exp_map (blank/NA, digits, H/h, -, ?, +) are set to "", which has
# no entry in exp_map, so the lookup below returns NA for those rows
PROPDMGEXP_clean = if_else(PROPDMGEXP %in% names(exp_map),
PROPDMGEXP,
""),
CROPDMGEXP_clean = if_else(CROPDMGEXP %in% names(exp_map),
CROPDMGEXP,
""),
PropDamage = PROPDMG * exp_map[PROPDMGEXP_clean],
CropDamage = CROPDMG * exp_map[CROPDMGEXP_clean]
) %>%
# Select only columns needed for analysis to save memory:
select(
BGN_DATE, EVTYPE, FATALITIES, INJURIES,
PropDamage, CropDamage
)
# Check: NA's count rows whose exponent code has no entry in exp_map
summary(storm_clean$PropDamage)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00e+00 0.00e+00 1.00e+03 9.80e+05 1.00e+04 1.15e+11 466255
summary(storm_clean$CropDamage)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00e+00 0.00e+00 0.00e+00 1.73e+05 0.00e+00 5.00e+09 618440
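The maximum of 1.15e11 (about $115 billion) for a single property-damage record is large enough to dominate an event type's total on its own, so it may be worth inspecting the largest individual records before trusting the rankings. A minimal dplyr sketch; largest_records() is a hypothetical helper, not part of the original analysis:

```r
library(dplyr)

# Hypothetical helper: return the n records with the largest value in a
# damage column (PropDamage by default), dropping NA values first.
largest_records <- function(storm_df, damage_col = "PropDamage", n = 5) {
  storm_df %>%
    filter(!is.na(.data[[damage_col]])) %>%
    arrange(desc(.data[[damage_col]])) %>%
    slice_head(n = n) %>%
    select(BGN_DATE, EVTYPE, all_of(damage_col))
}

# Usage with the storm_clean data frame built above:
# largest_records(storm_clean)
# largest_records(storm_clean, damage_col = "CropDamage")
```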
health_summary <- storm_clean |>
group_by(EVTYPE) |>
summarize(
TotalFatalities = sum(FATALITIES, na.rm = TRUE),
TotalInjuries = sum(INJURIES, na.rm = TRUE)
) |>
mutate(TotalCasualties = TotalFatalities + TotalInjuries) |>
arrange(desc(TotalCasualties))
head(health_summary, 10)
## # A tibble: 10 × 4
## EVTYPE TotalFatalities TotalInjuries TotalCasualties
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
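Because EVTYPE is free text, the same phenomenon can appear under several labels; the table above lists TSTM WIND and THUNDERSTORM WIND separately. A minimal sketch of normalizing labels before grouping, assuming a few illustrative regex rules (these are not NOAA's official event-type list, and normalize_evtype() is a hypothetical helper):

```r
# Hypothetical helper: collapse a few common spelling variants of EVTYPE.
# The rules are illustrative assumptions; a real cleanup would need many more.
normalize_evtype <- function(evtype) {
  ev <- toupper(trimws(evtype))
  ev <- gsub("\\s+", " ", ev)                                  # collapse repeated spaces
  ev <- gsub("^TSTM\\b", "THUNDERSTORM", ev, perl = TRUE)      # TSTM -> THUNDERSTORM
  ev <- gsub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", ev)  # plural -> singular
  ev
}

# Usage: group on the normalized label instead of the raw one, e.g.
# storm_clean %>% group_by(EVTYPE = normalize_evtype(EVTYPE)) %>% ...
```

Deliberately, distinct official categories such as EXCESSIVE HEAT and HEAT are left separate; only obvious spelling variants are merged.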
top10_health <- health_summary %>%
slice_max(order_by = TotalCasualties, n = 10)
ggplot(top10_health, aes(
x = reorder(EVTYPE, TotalCasualties),
y = TotalCasualties
)) +
geom_col(fill = "steelblue") +
coord_flip() +
labs(
x = "Event Type",
y = "Total Fatalities + Injuries",
title = "Top 10 Weather Event Types by Human Casualties"
) +
theme_minimal()
Figure 1: Top 10 NOAA Storm Event Types (1950–2011) ranked by combined fatalities and injuries.
econ_summary <- storm_clean %>%
group_by(EVTYPE) %>%
summarize(
TotalPropDamage = sum(PropDamage, na.rm = TRUE),
TotalCropDamage = sum(CropDamage, na.rm = TRUE)
) %>%
mutate(TotalEconomicDamage = TotalPropDamage + TotalCropDamage) %>%
arrange(desc(TotalEconomicDamage))
head(econ_summary, 10)
## # A tibble: 10 × 4
## EVTYPE TotalPropDamage TotalCropDamage TotalEconomicDamage
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937160480 414953110 57352113590
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15732266720 3025954450 18758221170
## 6 FLASH FLOOD 16140861510 1421317100 17562178610
## 7 DROUGHT 1046106000 13972566000 15018672000
## 8 HURRICANE 11868319010 2741910000 14610229010
## 9 RIVER FLOOD 5118945500 5029459000 10148404500
## 10 ICE STORM 3944927810 5022113500 8967041310
top10_econ <- econ_summary %>%
slice_max(order_by = TotalEconomicDamage, n = 10)
ggplot(top10_econ, aes(
x = reorder(EVTYPE, TotalEconomicDamage),
y = TotalEconomicDamage / 1e9
)) +
geom_col(fill = "coral") +
coord_flip() +
labs(
x = "Event Type",
y = "Total Damage (Billion USD)",
title = "Top 10 Weather Event Types by Economic Damage"
) +
theme_minimal()
Figure 2: Top 10 NOAA Storm Event Types (1950–2011) by total property + crop damage (in billions of USD).
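Combining property and crop losses can hide which event types hurt agriculture most. A small sketch that adds each event type's crop share of total damage to a summary shaped like econ_summary; add_crop_share() is a hypothetical helper, not part of the original analysis:

```r
library(dplyr)

# Hypothetical helper: add the fraction of total economic damage that
# comes from crop losses (NA when total damage is zero), then sort by it.
add_crop_share <- function(econ_df) {
  econ_df %>%
    mutate(CropShare = if_else(TotalEconomicDamage > 0,
                               TotalCropDamage / TotalEconomicDamage,
                               NA_real_)) %>%
    arrange(desc(CropShare))
}

# Usage with the top-10 economic table built above:
# add_crop_share(top10_econ)
```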
The table and Figure 1 show that tornadoes top the list by a wide margin, with 5,633 deaths and 91,346 injuries recorded between 1950 and 2011, more than ten times the combined casualties of the next event type. Heat-related event types (EXCESSIVE HEAT and HEAT) together account for 2,840 deaths, second only to tornadoes, with the burden falling particularly on elderly and chronically ill people. Flash floods rank only seventh by combined casualties but have the third-highest fatality count in the table (978). These results suggest that emergency planning and public awareness campaigns should emphasize tornado sheltering, heat-wave warning systems, and flood preparedness.
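Because the combined casualty count weights an injury the same as a death, statements about mortality are easier to check with a ranking on fatalities alone. A minimal sketch; top_by_outcome() is a hypothetical helper that works on a table shaped like health_summary:

```r
library(dplyr)

# Hypothetical helper: rank event types by a single outcome column
# (TotalFatalities by default) and keep the top n.
top_by_outcome <- function(health_df, outcome = "TotalFatalities", n = 5) {
  health_df %>%
    arrange(desc(.data[[outcome]])) %>%
    slice_head(n = n) %>%
    select(EVTYPE, all_of(outcome))
}

# Usage with the health_summary table built above:
# top_by_outcome(health_summary)
# top_by_outcome(health_summary, outcome = "TotalInjuries", n = 10)
```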
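Figure 2 shows that floods cause the greatest economic damage, about $150.3 billion in combined property and crop losses over the study period, followed by HURRICANE/TYPHOON (about $71.9 billion), tornadoes (about $57.4 billion), and storm surge (about $43.3 billion). Because EVTYPE labels are not standardized, hurricane losses are split across several labels; HURRICANE/TYPHOON and HURRICANE together total about $86.5 billion. Drought is the main exception to the property-dominated pattern: about $14.0 billion of its $15.0 billion total comes from crop losses. Mitigation resources should therefore go first to flood control and hurricane resilience, especially in coastal regions: reinforcing infrastructure, updating building codes, and improving early-warning systems.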