Impact of severe weather events in the US: analysis of the NOAA storm data

Synopsis

This report analyzes the impact of severe weather events in the United States using the NOAA Storm Database from 1993 onward. The analysis focuses on two key questions: which types of events are most harmful to population health, and which have the greatest economic consequences. For consistency, the data were cleaned, restricted to the years with complete event type coverage, and the damage estimates were converted to U.S. dollars. Results show that tornadoes are the leading cause of fatalities and injuries nationwide, but the most harmful event types vary by state. For example, floods are the most harmful in Texas, while excessive heat is a major concern in Missouri. In terms of economic impact, floods have caused the highest total damage, followed by hurricanes/typhoons and storm surges. Hail, drought, and tornadoes also contribute significantly to economic losses. The analysis highlights the importance of considering both health and economic outcomes when preparing for severe weather events. Regional patterns reveal that risk priorities differ across states, underlining the need for tailored preparedness strategies.

Data processing

Loading data

The dataset was downloaded from the website as a bz2-compressed CSV file and uploaded to the current working directory. We read the bz2 file directly:

storm_data <- read.csv("repdata_data_StormData.csv.bz2")
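As a small self-contained check of why no separate decompression step is needed, the sketch below writes a toy bzip2-compressed CSV and reads it back with read.csv(). The columns and values are made up for illustration and are not taken from the NOAA data.

```r
# Toy data standing in for the storm file (values are illustrative only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write a bzip2-compressed CSV to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the compressed file transparently
toy_back <- read.csv(tmp)
print(toy_back)
```

The same call works on the full storm file, although reading it takes noticeably longer than reading an uncompressed CSV.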

Cleaning data

For our analysis, we focus on the following variables: EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG and STATE. We first replace missing values (NA) with zero:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

storm_data$FATALITIES[is.na(storm_data$FATALITIES)] <- 0
storm_data$INJURIES[is.na(storm_data$INJURIES)] <- 0
storm_data$PROPDMG[is.na(storm_data$PROPDMG)] <- 0
storm_data$CROPDMG[is.na(storm_data$CROPDMG)] <- 0
storm_data$STATE[is.na(storm_data$STATE)] <- 0
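The same replacement can be written once for several columns. This is a minimal base R sketch on a toy data frame; the column list num_cols is an assumption chosen for illustration.

```r
# Toy data frame with missing values (illustrative, not NOAA data)
toy <- data.frame(
  FATALITIES = c(1, NA, 0),
  INJURIES   = c(NA, 2, 5),
  PROPDMG    = c(10, 25, NA)
)

# Columns to clean (hypothetical selection for this example)
num_cols <- c("FATALITIES", "INJURIES", "PROPDMG")

# Replace NA with 0 in each listed column
toy[num_cols] <- lapply(toy[num_cols], function(x) replace(x, is.na(x), 0))

# Confirm that no missing values remain
colSums(is.na(toy))
```

Treating a missing count as zero is itself an assumption: it lowers totals if the true value was simply not recorded.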

The data may be less complete for older years. Let's check by examining the number of unique event types reported in each year of the dataset.

First we convert BGN_DATE to a date-time and extract the year:

storm_data$BGN_DATE <- as.POSIXct(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
storm_data$year <- as.numeric(format(storm_data$BGN_DATE, "%Y"))
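To see what the conversion does, here is a small sketch on a few strings in the same layout as BGN_DATE. The explicit tz = "UTC" is an assumption added here so that the result does not depend on the local time zone setting.

```r
# Example strings in the BGN_DATE layout: month/day/year hour:minute:second
dates_chr <- c("4/18/1950 0:00:00", "11/15/1993 0:00:00", "8/29/2005 0:00:00")

# Parse with an explicit time zone (an assumption, not in the original call)
dates <- as.POSIXct(dates_chr, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the four-digit year as a number
years <- as.numeric(format(dates, "%Y"))
years  # 1950 1993 2005
```

Strings that do not match the format are parsed as NA, so a quick sum(is.na(years)) after the conversion is a cheap sanity check.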

Count the number of unique event types per year:

event_types_by_year <- aggregate(EVTYPE ~ year, data = storm_data, function(x) length(unique(x)))

event_types_by_year

##    year EVTYPE
## 1  1950      1
## 2  1951      1
## 3  1952      1
## 4  1953      1
## 5  1954      1
## 6  1955      3
## 7  1956      3
## 8  1957      3
## 9  1958      3
## 10 1959      3
## 11 1960      3
## 12 1961      3
## 13 1962      3
## 14 1963      3
## 15 1964      3
## 16 1965      3
## 17 1966      3
## 18 1967      3
## 19 1968      3
## 20 1969      3
## 21 1970      3
## 22 1971      3
## 23 1972      3
## 24 1973      3
## 25 1974      3
## 26 1975      3
## 27 1976      3
## 28 1977      3
## 29 1978      3
## 30 1979      3
## 31 1980      3
## 32 1981      3
## 33 1982      3
## 34 1983      3
## 35 1984      3
## 36 1985      3
## 37 1986      3
## 38 1987      3
## 39 1988      3
## 40 1989      3
## 41 1990      3
## 42 1991      3
## 43 1992      3
## 44 1993    160
## 45 1994    267
## 46 1995    387
## 47 1996    228
## 48 1997    170
## 49 1998    126
## 50 1999    121
## 51 2000    112
## 52 2001    122
## 53 2002     99
## 54 2003     51
## 55 2004     38
## 56 2005     46
## 57 2006     50
## 58 2007     46
## 59 2008     46
## 60 2009     46
## 61 2010     46
## 62 2011     46

The table shows that from 1950 to 1992 only one or three event types were recorded each year, and that the number jumps sharply in 1993. Event type coverage before 1993 is therefore incomplete, so we use only the data from 1993 onward:

storm_data_filtered <- storm_data %>% filter(year >= 1993)
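The two steps above, counting distinct event types per year and then keeping only the years from the cutoff onward, can be sketched on a toy data set in base R. The records below are invented for illustration.

```r
# Toy records: early years carry a single event type, later years several
toy <- data.frame(
  year   = c(1990, 1990, 1991, 1993, 1993, 1993, 1994, 1994),
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO",
             "TORNADO", "FLOOD", "HAIL", "FLOOD", "LIGHTNING")
)

# Number of distinct event types recorded in each year
types_per_year <- aggregate(EVTYPE ~ year, data = toy,
                            FUN = function(x) length(unique(x)))

# Keep only the years from the cutoff onward (1993 in the report)
cutoff <- 1993
toy_recent <- toy[toy$year >= cutoff, ]

types_per_year
nrow(toy_recent)  # 5
```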

Now we select only the relevant columns:

storm_data2 <- storm_data_filtered[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "BGN_DATE", "STATE")]

Next, we create a new variable, total_harmed, by adding fatalities and injuries:

storm_data2$total_harmed <- storm_data2$FATALITIES + storm_data2$INJURIES

To prepare for the economic analysis, we convert the damage exponent variables (PROPDMGEXP and CROPDMGEXP) to numeric multipliers and calculate property, crop, and total economic damage in U.S. dollars for each event.

Define exponent mapping:

exp_map <- c('h' = 1e2, 'H' = 1e2,
             'k' = 1e3, 'K' = 1e3,
             'm' = 1e6, 'M' = 1e6,
             'b' = 1e9, 'B' = 1e9,
             '0' = 1, '1' = 10, '2' = 100, '3' = 1000, 
             '4' = 10000, '5' = 1e5, '6' = 1e6, '7' = 1e7, 
             '8' = 1e8, '9' = 1e9, 
             '+' = 1, '-' = 0, '?' = 0)

When mapping, set everything not matched to 1.

Property damage:

storm_data2$prop_dmg_num <- storm_data2$PROPDMG * 
  ifelse(is.na(exp_map[as.character(storm_data2$PROPDMGEXP)]), 1, exp_map[as.character(storm_data2$PROPDMGEXP)])

Crop damage:

storm_data2$crop_dmg_num <- storm_data2$CROPDMG * 
  ifelse(is.na(exp_map[as.character(storm_data2$CROPDMGEXP)]), 1, exp_map[as.character(storm_data2$CROPDMGEXP)])

Total:

storm_data2$total_econ_dmg <- storm_data2$prop_dmg_num + storm_data2$crop_dmg_num
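A short worked example may make the conversion easier to follow. The sketch below uses a reduced lookup table and a hypothetical helper, damage_multiplier(), which is not part of the analysis above; unlike the original mapping it uppercases the code first instead of listing both cases.

```r
# Reduced version of the exponent lookup (illustrative subset)
exp_map_demo <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9, "?" = 0)

# Hypothetical helper: look up the multiplier, falling back to 1 when unmatched
damage_multiplier <- function(exp_code) {
  mult <- unname(exp_map_demo[toupper(as.character(exp_code))])
  ifelse(is.na(mult), 1, mult)
}

# Toy values: 25 K, 2.5 m, 10 with a blank exponent, 5 with "?"
dmg      <- c(25, 2.5, 10, 5)
exp_code <- c("K", "m", "", "?")

dmg * damage_multiplier(exp_code)  # 25000 2500000 10 0
```

A blank exponent has no entry in the lookup, so it falls through to the default multiplier of 1 and the recorded number is taken as plain dollars.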

Results

Across the United States, which types of events are most harmful with respect to population health?

To determine which event types are most harmful to population health, we summed the total number of fatalities and injuries for each event type. The table and plot below show the 10 event types with the highest combined number of fatalities and injuries. Note that due to inconsistencies in event type naming, some similar events may appear under different names. The results indicate that tornadoes are by far the most harmful event type, followed by excessive heat, floods, lightning, and TSTM wind.

evtype_health <- storm_data2 %>%
  group_by(EVTYPE) %>%
  summarise(total_harmed = sum(total_harmed, na.rm = TRUE)) %>%
  arrange(desc(total_harmed))

## `summarise()` ungrouping output (override with `.groups` argument)

Let’s look at the top 10:

head(evtype_health, 10)

## # A tibble: 10 x 2
##    EVTYPE            total_harmed
##    <chr>                    <dbl>
##  1 TORNADO                  24931
##  2 EXCESSIVE HEAT            8428
##  3 FLOOD                     7259
##  4 LIGHTNING                 6046
##  5 TSTM WIND                 3872
##  6 HEAT                      3037
##  7 FLASH FLOOD               2755
##  8 ICE STORM                 2064
##  9 THUNDERSTORM WIND         1621
## 10 WINTER STORM              1527

Now, let's visualize the top 10:

top10_health <- evtype_health[1:10, ]
library(ggplot2)
ggplot(top10_health, aes(x = reorder(EVTYPE, total_harmed), y = total_harmed)) +
  geom_bar(stat = "identity", fill = "red") +
  coord_flip() +
  labs(
    title = "Top 10 Most Harmful Event Types (Population Health, 1993+)",
    x = "Event Type",
    y = "Total Harm (Fatalities + Injuries)"
  )

The table and plot above show that tornadoes are by far the most harmful weather event in terms of combined fatalities and injuries. Other significant event types include excessive heat, flood, lightning, and TSTM wind. Note that some event types are named inconsistently in the dataset (for example, TSTM WIND and THUNDERSTORM WIND), so similar events can be listed separately.
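One way to reduce the effect of inconsistent naming is to normalize the labels before grouping. The sketch below is not part of this analysis; the two recoding rules are assumptions chosen for illustration, and the labels are toy values.

```r
# Toy event labels showing the kind of variation found in EVTYPE
evtype <- c("TSTM WIND", "Thunderstorm Wind", " THUNDERSTORM WINDS",
            "HEAT", "EXCESSIVE HEAT")

# Step 1: uppercase and trim surrounding whitespace
clean <- toupper(trimws(evtype))

# Step 2: hypothetical recoding rules (an assumption, not the report's method)
clean <- sub("^TSTM", "THUNDERSTORM", clean)  # expand the abbreviation
clean <- sub("WINDS$", "WIND", clean)         # drop the plural

table(clean)
```

Whether labels such as HEAT and EXCESSIVE HEAT should also be merged is a judgment call, so any such rule would need to be stated alongside the results.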

Next, we look at the most harmful event types by state:

evtype_health_by_state <- storm_data2 %>%
  group_by(STATE, EVTYPE) %>%
  summarise(total_harmed = sum(total_harmed, na.rm = TRUE)) %>%
  arrange(STATE, desc(total_harmed))

## `summarise()` regrouping output by 'STATE' (override with `.groups` argument)

We are going to create a table with the top 10 by state

top10_by_state <- evtype_health_by_state %>%
  group_by(STATE) %>%
  slice_max(order_by = total_harmed, n = 10)
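One detail worth knowing about slice_max(): it keeps ties by default, so a group can return more than n rows. The toy example below (invented values) shows the difference when with_ties = FALSE is set.

```r
library(dplyr)

# Toy totals per state and event type (illustrative values only)
toy <- data.frame(
  STATE        = c("AA", "AA", "AA", "BB", "BB", "BB"),
  EVTYPE       = c("TORNADO", "FLOOD", "HAIL", "HEAT", "FLOOD", "HAIL"),
  total_harmed = c(50, 20, 20, 30, 10, 5)
)

# Top 2 per state; ties are kept by default, so "AA" returns 3 rows
top2_ties <- toy %>%
  group_by(STATE) %>%
  slice_max(order_by = total_harmed, n = 2)

# with_ties = FALSE returns at most n rows per group
top2_strict <- toy %>%
  group_by(STATE) %>%
  slice_max(order_by = total_harmed, n = 2, with_ties = FALSE)

nrow(top2_ties)    # 5
nrow(top2_strict)  # 4
```

In the storm data this could matter for states where many event types share the same total, for example a total of zero.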

Now, let's plot the top 10 event types for the 5 states with the most total harm:

top_states <- evtype_health_by_state %>%
  group_by(STATE) %>%
  summarise(state_total = sum(total_harmed)) %>%
  arrange(desc(state_total)) %>%
  slice_head(n = 5) %>%
  pull(STATE)

## `summarise()` ungrouping output (override with `.groups` argument)

top10_events_top_states <- evtype_health_by_state %>%
  filter(STATE %in% top_states) %>%
  group_by(STATE) %>%
  slice_max(order_by = total_harmed, n = 10)

ggplot(top10_events_top_states, aes(x = reorder(EVTYPE, total_harmed), y = total_harmed, fill = STATE)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_wrap(~STATE, scales = "free_y") +
  labs(
    title = "Top 10 Most Harmful Event Types in the 5 Most Impacted States",
    x = "Event Type",
    y = "Total Harm (Fatalities + Injuries)"
  ) +
  theme(legend.position = "none")

The plot above shows the 10 most harmful weather event types in each of the 5 states with the highest total harm. The results demonstrate that the most impactful event types can vary considerably by state. For example, tornadoes are the leading cause of harm in Alabama, while floods have the greatest impact in Texas. This highlights the importance of considering regional variation when preparing for severe weather events.

Across the United States, which types of events have the greatest economic consequences?

We summarized the total economic damage by event type:

evtype_econ <- storm_data2 %>%
  group_by(EVTYPE) %>%
  summarise(total_econ_dmg = sum(total_econ_dmg, na.rm = TRUE)) %>%
  arrange(desc(total_econ_dmg))

## `summarise()` ungrouping output (override with `.groups` argument)

Let's view the top 10:

top10_econ <- head(evtype_econ, 10)
top10_econ

## # A tibble: 10 x 2
##    EVTYPE            total_econ_dmg
##    <chr>                      <dbl>
##  1 FLOOD              150319678257 
##  2 HURRICANE/TYPHOON   71913712800 
##  3 STORM SURGE         43323541000 
##  4 TORNADO             26764135376.
##  5 HAIL                18761221986.
##  6 FLASH FLOOD         18243991078.
##  7 DROUGHT             15018672000 
##  8 HURRICANE           14610229010 
##  9 RIVER FLOOD         10148404500 
## 10 ICE STORM            8967041360

And visualize:

ggplot(top10_econ, aes(x = reorder(EVTYPE, total_econ_dmg), y = total_econ_dmg / 1e9)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Economic Damage (1993+)",
    x = "Event Type",
    y = "Total Economic Damage (Billion USD)"
  )

To determine which event types have the greatest economic consequences, we calculated the total property and crop damage for each event type, converting all damage estimates to U.S. dollars. The analysis shows that floods have caused the highest total economic damage in the United States since 1993, followed by hurricanes/typhoons and storm surges. Other event types with significant economic impact include tornadoes, hail, flash floods, and droughts. These findings highlight the substantial financial risk associated with various severe weather events.