This report uses the NOAA Storm Database to analyze the effects of severe weather events in the United States. The data cover 1950 to 2011 and record each event's type along with its impact on population health and the economic damage it caused. The objective is to identify which event types are most harmful in terms of fatalities and injuries, and which have the greatest economic consequences. The dataset was processed by selecting the relevant variables, grouping event types into broader categories, and converting the damage figures to consistent units (US dollars). The analysis shows that tornadoes are the most harmful to population health, while floods and hurricanes (including tropical storms) cause the greatest economic losses. These results suggest prioritizing resources for the event types with the most severe consequences.
The analysis uses the NOAA Storm Database, provided as a compressed CSV file. The data were loaded directly into R from the raw file without external preprocessing. For the purpose of this report, we selected only the variables relevant to population health (fatalities, injuries) and economic consequences (property damage, crop damage, and their exponents).
library(tidyverse) # provides dplyr, stringr, and ggplot2, which the code below relies on
storm_data <- read.csv("repdata_data_StormData.csv.bz2")
dim(storm_data)
## [1] 902297 37
head(storm_data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL TORNADO
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL TORNADO
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL TORNADO
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL TORNADO
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL TORNADO
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL TORNADO
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1 0 0 NA
## 2 0 0 NA
## 3 0 0 NA
## 4 0 0 NA
## 5 0 0 NA
## 6 0 0 NA
## END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1 0 14.0 100 3 0 0 15 25.0
## 2 0 2.0 150 2 0 0 0 2.5
## 3 0 0.1 123 2 0 0 2 25.0
## 4 0 0.0 100 2 0 0 2 2.5
## 5 0 0.0 150 2 0 0 2 2.5
## 6 0 1.5 177 2 0 0 6 2.5
## PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1 K 0 3040 8812
## 2 K 0 3042 8755
## 3 K 0 3340 8742
## 4 K 0 3458 8626
## 5 K 0 3412 8642
## 6 K 0 3450 8748
## LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3051 8806 1
## 2 0 0 2
## 3 0 0 3
## 4 0 0 4
## 5 0 0 5
## 6 0 0 6
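The direct load works because `read.csv()` decompresses bzip2 files transparently. The following is a minimal sketch of that behavior on a tiny temporary file; the data frame `sample_rows` and the temporary path are made up for illustration and are not part of the analysis.

```r
# Hypothetical two-row sample, used only to demonstrate the round trip
sample_rows <- data.frame(
  EVTYPE = c("TORNADO", "HAIL"),
  FATALITIES = c(0, 1),
  stringsAsFactors = FALSE
)

# Write the sample as a bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(sample_rows, con, row.names = FALSE)
close(con)

# read.csv() detects the compression and reads the file directly
round_trip <- read.csv(tmp, stringsAsFactors = FALSE)
print(dim(round_trip))

unlink(tmp) # remove the temporary file
```

No separate decompression step is needed, which is why the report can work from the raw download.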
The NOAA Storm Database contains information on severe weather events
in the United States between 1950 and 2011.
After loading, the dataset has 902,297 rows and 37 columns.
For this analysis, we kept only the variables for event type, begin date,
state, population health (fatalities and injuries), and economic
consequences (property and crop damage with their exponents).
This reduces the dataset to the variables relevant to the research
questions.
storm_data <- storm_data %>%
select(EVTYPE, BGN_DATE, FATALITIES, INJURIES,
PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP,
STATE)
head(storm_data)
## EVTYPE BGN_DATE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 TORNADO 4/18/1950 0:00:00 0 15 25.0 K 0
## 2 TORNADO 4/18/1950 0:00:00 0 0 2.5 K 0
## 3 TORNADO 2/20/1951 0:00:00 0 2 25.0 K 0
## 4 TORNADO 6/8/1951 0:00:00 0 2 2.5 K 0
## 5 TORNADO 11/15/1951 0:00:00 0 2 2.5 K 0
## 6 TORNADO 11/15/1951 0:00:00 0 6 2.5 K 0
## CROPDMGEXP STATE
## 1 AL
## 2 AL
## 3 AL
## 4 AL
## 5 AL
## 6 AL
ev_freq <- storm_data %>%
count(EVTYPE, sort = TRUE)
# Show top 50 event types
head(ev_freq, 50)
## EVTYPE n
## 1 HAIL 288661
## 2 TSTM WIND 219940
## 3 THUNDERSTORM WIND 82563
## 4 TORNADO 60652
## 5 FLASH FLOOD 54277
## 6 FLOOD 25326
## 7 THUNDERSTORM WINDS 20843
## 8 HIGH WIND 20212
## 9 LIGHTNING 15754
## 10 HEAVY SNOW 15708
## 11 HEAVY RAIN 11723
## 12 WINTER STORM 11433
## 13 WINTER WEATHER 7026
## 14 FUNNEL CLOUD 6839
## 15 MARINE TSTM WIND 6175
## 16 MARINE THUNDERSTORM WIND 5812
## 17 WATERSPOUT 3796
## 18 STRONG WIND 3566
## 19 URBAN/SML STREAM FLD 3392
## 20 WILDFIRE 2761
## 21 BLIZZARD 2719
## 22 DROUGHT 2488
## 23 ICE STORM 2006
## 24 EXCESSIVE HEAT 1678
## 25 HIGH WINDS 1533
## 26 WILD/FOREST FIRE 1457
## 27 FROST/FREEZE 1342
## 28 DENSE FOG 1293
## 29 WINTER WEATHER/MIX 1104
## 30 TSTM WIND/HAIL 1028
## 31 EXTREME COLD/WIND CHILL 1002
## 32 HEAT 767
## 33 HIGH SURF 725
## 34 TROPICAL STORM 690
## 35 FLASH FLOODING 682
## 36 EXTREME COLD 655
## 37 COASTAL FLOOD 650
## 38 LAKE-EFFECT SNOW 636
## 39 FLOOD/FLASH FLOOD 624
## 40 LANDSLIDE 600
## 41 SNOW 587
## 42 COLD/WIND CHILL 539
## 43 FOG 538
## 44 RIP CURRENT 470
## 45 MARINE HAIL 442
## 46 DUST STORM 427
## 47 AVALANCHE 386
## 48 WIND 340
## 49 RIP CURRENTS 304
## 50 STORM SURGE 261
storm_data <- storm_data %>%
mutate(
EVTYPE_clean = str_to_lower(EVTYPE),
EVTYPE_clean = str_replace_all(EVTYPE_clean, "[[:punct:]]", " "),
EVTYPE_clean = str_squish(EVTYPE_clean) # remove extra spaces
)
# Check first rows
head(storm_data$EVTYPE_clean)
## [1] "tornado" "tornado" "tornado" "tornado" "tornado" "tornado"
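To make the effect of the cleaning step concrete, here is a small sketch that applies the same three operations (lowercase, punctuation to spaces, whitespace squeezed) to a few raw labels. It uses base R equivalents of the stringr calls so that it runs without extra packages; the helper name `clean_evtype` is introduced here for illustration only.

```r
# Base R equivalent of str_to_lower + str_replace_all + str_squish
clean_evtype <- function(x) {
  x <- tolower(x)                   # lowercase
  x <- gsub("[[:punct:]]", " ", x)  # punctuation becomes a space
  x <- gsub("\\s+", " ", x)         # collapse runs of whitespace
  trimws(x)                         # drop leading/trailing spaces
}

raw_labels <- c("TSTM WIND/HAIL", "  Flash Flood ", "COLD/WIND CHILL", "LAKE-EFFECT SNOW")
cleaned <- clean_evtype(raw_labels)
print(cleaned)
```

Note that slashes and hyphens disappear from the cleaned names, so any later pattern that still contains punctuation cannot match them.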
pattern_map <- list(
tornado = c("tornado"),
thunderstorm_wind = c("thunderstorm wind", "thunderstorm winds", "tstm wind", "tstm"),
flood = c("flood", "flash flood", "flooding"),
hail = c("hail"),
heat = c("heat wave", "extreme heat", "\\bheat\\b"),
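# Note: punctuation was replaced by spaces in EVTYPE_clean, so the literal
# "cold/wind chill" cannot match; such events fall through to the wind group
cold = c("extreme cold", "cold wave", "cold/wind chill"),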
hurricane = c("hurricane", "tropical storm", "tropical depression"),
wind = c("\\bwind\\b", "high wind"),
lightning = c("lightning"),
snow = c("snow", "blizzard"),
ice = c("ice", "sleet"),
coastal_flood = c("coastal flood", "storm surge", "surge"),
rip_current = c("rip current")
)
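# Assign a cleaned event name to the first group in pattern_map whose
# patterns match. Order matters: "coastal flood", for example, is caught by
# the earlier flood group before coastal_flood is tried.
map_evtype <- function(ev) {
  for (name in names(pattern_map)) {
    pats <- pattern_map[[name]]
    pattern <- paste0("(", paste(pats, collapse = "|"), ")")
    if (str_detect(ev, pattern)) return(name)
  }
  return("other") # group all unmatched events
}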
storm_data <- storm_data %>%
mutate(EVTYPE_simple = vapply(EVTYPE_clean, map_evtype, FUN.VALUE = character(1)))
# Check top 20 simplified event types
top_ev <- storm_data %>%
count(EVTYPE_simple, sort = TRUE)
head(top_ev, 20)
## EVTYPE_simple n
## 1 thunderstorm_wind 336688
## 2 hail 289283
## 3 flood 82726
## 4 other 61390
## 5 tornado 60700
## 6 wind 26607
## 7 snow 20404
## 8 lightning 15765
## 9 heat 2647
## 10 ice 2192
## 11 cold 1662
## 12 hurricane 1045
## 13 rip_current 777
## 14 coastal_flood 411
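Because the mapping returns the first group that matches, the order of the pattern list affects the counts above. The sketch below illustrates this with a reduced copy of the pattern map and a base R matcher (`grepl` in place of `str_detect`); `demo_map`, `first_match`, and the sample event names are assumptions made for this illustration.

```r
# Reduced copy of the pattern map, in the same relative order (illustration only)
demo_map <- list(
  flood         = c("flood", "flash flood", "flooding"),
  cold          = c("extreme cold", "cold wave", "cold/wind chill"),
  wind          = c("\\bwind\\b", "high wind"),
  coastal_flood = c("coastal flood", "storm surge", "surge")
)

# Return the first group whose combined pattern matches, else "other"
first_match <- function(ev, pattern_map) {
  for (name in names(pattern_map)) {
    pattern <- paste0("(", paste(pattern_map[[name]], collapse = "|"), ")")
    if (grepl(pattern, ev, perl = TRUE)) return(name)
  }
  "other"
}

demo_events <- c("coastal flood", "storm surge", "cold wind chill", "dense fog")
demo_result <- vapply(demo_events, first_match, character(1), pattern_map = demo_map)
print(demo_result)
```

In this sketch "coastal flood" lands in `flood` and "cold wind chill" lands in `wind`, so the `coastal_flood` and `cold` totals should be read as lower bounds for those categories. Listing more specific groups before more general ones would change that assignment.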
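# Convert a damage exponent code to a numeric multiplier.
# Only H, K, M, and B (either case) are interpreted; any other value,
# including blanks, falls through to a multiplier of 1.
exp_to_num <- function(exp) {
  case_when(
    exp %in% c("H","h") ~ 100,
    exp %in% c("K","k") ~ 1e3,
    exp %in% c("M","m") ~ 1e6,
    exp %in% c("B","b") ~ 1e9,
    TRUE ~ 1
  )
}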
storm_data <- storm_data %>%
mutate(
PROPDMG_num = PROPDMG * exp_to_num(PROPDMGEXP),
CROPDMG_num = CROPDMG * exp_to_num(CROPDMGEXP),
TOTAL_ECONOMIC = PROPDMG_num + CROPDMG_num
)
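As a worked example of the unit conversion, the sketch below applies the same multiplier rule to a handful of values, including the 25.0 K and 2.5 K property damage entries visible in the first rows of the data. It is a base R variant using a lookup vector rather than `case_when`; the function name `exp_to_multiplier` and the sample vectors are hypothetical.

```r
# Lookup-based variant of the exponent rule: H, K, M, B in either case,
# everything else (blank, digits, symbols) treated as a multiplier of 1
exp_to_multiplier <- function(exp) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(exp)])
  mult[is.na(mult)] <- 1
  mult
}

damage <- c(25.0, 2.5, 10, 3)    # reported damage figures
codes  <- c("K", "k", "M", "")   # matching exponent codes
damage_usd <- damage * exp_to_multiplier(codes)
print(damage_usd)
```

So a record of 25.0 with code K becomes 25,000 dollars, and a value with a blank code is kept as reported.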
health_summary <- storm_data %>%
group_by(EVTYPE_simple) %>%
summarize(
total_fatalities = sum(FATALITIES, na.rm = TRUE),
total_injuries = sum(INJURIES, na.rm = TRUE)
) %>%
arrange(desc(total_fatalities + total_injuries))
# Show top 10 events affecting population health
head(health_summary, 10)
## # A tibble: 10 × 3
## EVTYPE_simple total_fatalities total_injuries
## <chr> <dbl> <dbl>
## 1 tornado 5661 91407
## 2 heat 3138 9224
## 3 thunderstorm_wind 728 9503
## 4 flood 1525 8604
## 5 other 1267 6638
## 6 lightning 817 5232
## 7 wind 538 1931
## 8 ice 99 2152
## 9 snow 265 1928
## 10 hurricane 201 1711
economic_summary <- storm_data %>%
group_by(EVTYPE_simple) %>%
summarize(
total_economic = sum(TOTAL_ECONOMIC, na.rm = TRUE)
) %>%
arrange(desc(total_economic))
# Show top 10 events causing economic damage
head(economic_summary, 10)
## # A tibble: 10 × 2
## EVTYPE_simple total_economic
## <chr> <dbl>
## 1 flood 179909795032.
## 2 hurricane 98682496360
## 3 tornado 59010559549.
## 4 coastal_flood 47966079000
## 5 other 39624569341
## 6 hail 19021507166.
## 7 thunderstorm_wind 11020123044.
## 8 ice 8984168660
## 9 wind 7005879233
## 10 snow 1932486802.
The table below lists the 10 event types with the highest combined fatalities and injuries. The bar plot shows the same totals, making it easy to see which events are most harmful to the population:
head(health_summary, 10)
## # A tibble: 10 × 3
## EVTYPE_simple total_fatalities total_injuries
## <chr> <dbl> <dbl>
## 1 tornado 5661 91407
## 2 heat 3138 9224
## 3 thunderstorm_wind 728 9503
## 4 flood 1525 8604
## 5 other 1267 6638
## 6 lightning 817 5232
## 7 wind 538 1931
## 8 ice 99 2152
## 9 snow 265 1928
## 10 hurricane 201 1711
top10_health <- health_summary %>%
slice_max(total_fatalities + total_injuries, n = 10)
ggplot(top10_health, aes(x = reorder(EVTYPE_simple, total_fatalities + total_injuries),
y = total_fatalities + total_injuries)) +
geom_col(fill = "tomato") +
coord_flip() +
labs(
title = "Top 10 Event Types by Health Impact (Fatalities + Injuries)",
x = "Event Type",
y = "Total Health Impact"
)
Figure 1. Bar plot showing the top 10 weather event types by health impact (fatalities + injuries).
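The combined measure weights a fatality and an injury equally, so it can be useful to check whether the ranking holds for each measure separately. The following sketch re-ranks the six leading rows of the table above by fatalities and by injuries; the data frame `health` simply copies those printed values, and the variable names are introduced here for illustration.

```r
# Values copied from the first six rows of the health summary table
health <- data.frame(
  event      = c("tornado", "heat", "thunderstorm_wind", "flood", "other", "lightning"),
  fatalities = c(5661, 3138, 728, 1525, 1267, 817),
  injuries   = c(91407, 9224, 9503, 8604, 6638, 5232),
  stringsAsFactors = FALSE
)

# Rank the same events by each measure separately
by_fatalities <- health$event[order(-health$fatalities)]
by_injuries   <- health$event[order(-health$injuries)]

print(by_fatalities)
print(by_injuries)
```

Tornadoes lead on both measures, while heat ranks second by fatalities and thunderstorm wind ranks second by injuries.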
The table below lists the 10 event types with the highest total economic damage (property + crop). The bar plot shows the same totals, making it easy to see which events have the greatest economic consequences:
head(economic_summary, 10)
## # A tibble: 10 × 2
## EVTYPE_simple total_economic
## <chr> <dbl>
## 1 flood 179909795032.
## 2 hurricane 98682496360
## 3 tornado 59010559549.
## 4 coastal_flood 47966079000
## 5 other 39624569341
## 6 hail 19021507166.
## 7 thunderstorm_wind 11020123044.
## 8 ice 8984168660
## 9 wind 7005879233
## 10 snow 1932486802.
top10_econ <- economic_summary %>%
slice_max(total_economic, n = 10)
ggplot(top10_econ, aes(x = reorder(EVTYPE_simple, total_economic),
y = total_economic)) +
geom_col(fill = "steelblue") +
coord_flip() +
scale_y_continuous(labels = scales::dollar_format()) +
labs(
title = "Top 10 Event Types by Economic Damage",
x = "Event Type",
y = "Total Economic Damage (USD)"
)
Figure 2. Bar plot showing the top 10 weather event types by total economic damage (property + crops).
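The raw dollar totals are hard to read at a glance, so a short sketch of converting them to billions may help. The data frame `econ` copies the four largest totals as printed in the table above (fractional parts are not shown there), and the column name `billions_usd` is an assumption for this illustration.

```r
# Totals copied from the first four rows of the economic summary table
econ <- data.frame(
  event = c("flood", "hurricane", "tornado", "coastal_flood"),
  total_economic = c(179909795032, 98682496360, 59010559549, 47966079000),
  stringsAsFactors = FALSE
)

# Express each total in billions of dollars, rounded to one decimal place
econ$billions_usd <- round(econ$total_economic / 1e9, 1)
print(econ[, c("event", "billions_usd")])
```

On this scale floods account for roughly 180 billion dollars and hurricanes for roughly 99 billion.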
The analysis shows that tornadoes are by far the most harmful weather
events to population health, with 5,661 fatalities and 91,407 injuries
over the period; heat is a distant second with 3,138 fatalities.
In terms of economic consequences, floods (about $180 billion) and
hurricanes/tropical storms (about $99 billion) are responsible for the
greatest financial losses.
These findings suggest that preventive resources and emergency planning
should prioritize these high-impact event types.