This report analyzes the impact of severe weather events in the United States using the NOAA Storm Database from 1993 onward. The analysis focuses on two key questions: which types of events are most harmful to population health, and which have the greatest economic consequences. To ensure accuracy and consistency, the data were cleaned and filtered for completeness, and damage estimates were standardized. Results show that tornadoes are the leading cause of fatalities and injuries nationwide, but the most harmful event types vary by state. For example, floods are the most harmful in Texas, while excessive heat is a major concern in Missouri. In terms of economic impact, floods have caused the highest total damage, followed by hurricanes/typhoons and storm surges. Hail, drought, and tornadoes also contribute significantly to economic losses. The analysis highlights the importance of considering both health and economic outcomes when preparing for severe weather events. Regional patterns reveal that risk priorities differ across states, underlining the need for tailored preparedness strategies.
The dataset was downloaded from the website as a bz2 file and uploaded into the current working directory. We now read the bz2 file:
storm_data <- read.csv("repdata_data_StormData.csv.bz2")
For our analysis, we focus on the following variables: EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG and STATE, together with the exponent columns PROPDMGEXP and CROPDMGEXP and the date column BGN_DATE. Let’s set missing values (NA) to zero.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
storm_data$FATALITIES[is.na(storm_data$FATALITIES)] <- 0
storm_data$INJURIES[is.na(storm_data$INJURIES)] <- 0
storm_data$PROPDMG[is.na(storm_data$PROPDMG)] <- 0
storm_data$CROPDMG[is.na(storm_data$CROPDMG)] <- 0
storm_data$STATE[is.na(storm_data$STATE)] <- 0
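The assignments above all follow one pattern: find the NA entries of a column and overwrite them. As a minimal sketch of that pattern, the loop below applies it to a small made-up data frame (the `toy` data and the `num_cols` vector are illustrative and not part of the analysis).

```r
# Toy data with missing values (hypothetical, for illustration only)
toy <- data.frame(
  FATALITIES = c(1, NA, 3),
  INJURIES   = c(NA, 2, NA)
)

# Numeric columns in which NA should become 0
num_cols <- c("FATALITIES", "INJURIES")

for (col in num_cols) {
  # Logical index of the missing entries, overwritten with 0
  toy[[col]][is.na(toy[[col]])] <- 0
}

toy
```

Zero is a natural placeholder for counts and amounts; a text column such as STATE would normally take a text placeholder instead.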
The data may be less complete for older years. Let’s check by examining the number of unique event types reported in each year of the dataset.
First we convert BGN_DATE to a date-time and extract the year:
storm_data$BGN_DATE <- as.POSIXct(storm_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
storm_data$year <- as.numeric(format(storm_data$BGN_DATE, "%Y"))
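To make the conversion concrete, here is a small sketch on two made-up date strings written in the same month/day/year hour:minute:second layout that the format string above expects. The sample values and the explicit `tz = "UTC"` are assumptions added for reproducibility.

```r
# Hypothetical date strings in the "%m/%d/%Y %H:%M:%S" layout
dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse to date-time; a fixed time zone keeps the result machine-independent
parsed <- as.POSIXct(dates, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the four-digit year and convert it to a number
years <- as.numeric(format(parsed, "%Y"))
years
```

Strings that do not match the format are parsed as NA, so a quick `sum(is.na(parsed))` is a cheap sanity check after converting a real column.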
Count the number of unique event types per year:
event_types_by_year <- aggregate(EVTYPE ~ year, data = storm_data, function(x) length(unique(x)))
event_types_by_year
## year EVTYPE
## 1 1950 1
## 2 1951 1
## 3 1952 1
## 4 1953 1
## 5 1954 1
## 6 1955 3
## 7 1956 3
## 8 1957 3
## 9 1958 3
## 10 1959 3
## 11 1960 3
## 12 1961 3
## 13 1962 3
## 14 1963 3
## 15 1964 3
## 16 1965 3
## 17 1966 3
## 18 1967 3
## 19 1968 3
## 20 1969 3
## 21 1970 3
## 22 1971 3
## 23 1972 3
## 24 1973 3
## 25 1974 3
## 26 1975 3
## 27 1976 3
## 28 1977 3
## 29 1978 3
## 30 1979 3
## 31 1980 3
## 32 1981 3
## 33 1982 3
## 34 1983 3
## 35 1984 3
## 36 1985 3
## 37 1986 3
## 38 1987 3
## 39 1988 3
## 40 1989 3
## 41 1990 3
## 42 1991 3
## 43 1992 3
## 44 1993 160
## 45 1994 267
## 46 1995 387
## 47 1996 228
## 48 1997 170
## 49 1998 126
## 50 1999 121
## 51 2000 112
## 52 2001 122
## 53 2002 99
## 54 2003 51
## 55 2004 38
## 56 2005 46
## 57 2006 50
## 58 2007 46
## 59 2008 46
## 60 2009 46
## 61 2010 46
## 62 2011 46
The table shows that from 1950 to 1992, only 1 or 3 event types were recorded each year, and then the number jumps dramatically starting in 1993. This demonstrates that the data before 1993 are incomplete in terms of event type coverage. Therefore, we only use the more recent data (1993 onwards):
storm_data_filtered <- storm_data %>% filter(year >= 1993)
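The per-year count and the year cut-off can be checked on a tiny example. The sketch below uses a made-up data frame (`toy`) and base R subsetting in place of `filter()`; it illustrates the two steps and is not the code used for the results.

```r
# Hypothetical records: one sparse year and one richer year
toy <- data.frame(
  year   = c(1992, 1992, 1992, 1993, 1993, 1993),
  EVTYPE = c("TORNADO", "TORNADO", "HAIL", "FLOOD", "HEAT", "HAIL"),
  stringsAsFactors = FALSE
)

# Number of distinct event types recorded in each year
counts <- aggregate(EVTYPE ~ year, data = toy,
                    FUN = function(x) length(unique(x)))
counts

# Keep only the records from 1993 onwards
toy_recent <- toy[toy$year >= 1993, ]
nrow(toy_recent)
```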
Now we select only the relevant columns:
storm_data2 <- storm_data_filtered[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "BGN_DATE", "STATE")]
Let’s create a new variable, total_harmed, by adding fatalities and injuries:
storm_data2$total_harmed <- storm_data2$FATALITIES + storm_data2$INJURIES
To prepare for economic analysis, we convert the damage exponent variables to numeric multipliers and calculate total property, crop, and overall economic damage in U.S. dollars for each event.
Convert damage exponents and calculate total economic damage.
Define exponent mapping:
exp_map <- c('h' = 1e2, 'H' = 1e2,
'k' = 1e3, 'K' = 1e3,
'm' = 1e6, 'M' = 1e6,
'b' = 1e9, 'B' = 1e9,
'0' = 1, '1' = 10, '2' = 100, '3' = 1000,
'4' = 10000, '5' = 1e5, '6' = 1e6, '7' = 1e7,
'8' = 1e8, '9' = 1e9,
'+' = 1, '-' = 0, '?' = 0)
When mapping, set everything not matched to 1.
Property damage:
storm_data2$prop_dmg_num <- storm_data2$PROPDMG *
ifelse(is.na(exp_map[as.character(storm_data2$PROPDMGEXP)]), 1, exp_map[as.character(storm_data2$PROPDMGEXP)])
Crop damage:
storm_data2$crop_dmg_num <- storm_data2$CROPDMG *
ifelse(is.na(exp_map[as.character(storm_data2$CROPDMGEXP)]), 1, exp_map[as.character(storm_data2$CROPDMGEXP)])
Total:
storm_data2$total_econ_dmg <- storm_data2$prop_dmg_num + storm_data2$crop_dmg_num
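As a worked example of the conversion, the sketch below applies a subset of the exponent mapping to four made-up damage records. The variable names `dmg` and `dmg_exp` and all values are hypothetical. An empty exponent is not in the mapping, so the lookup returns NA and the multiplier falls back to 1, as in the code above.

```r
# Subset of the exponent mapping, enough for this illustration
exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)

# Hypothetical damage amounts and their exponent codes
dmg     <- c(25, 2.5, 1, 10)
dmg_exp <- c("K", "M", "B", "")   # "" has no entry in exp_map

# Look up the multiplier; unmatched codes come back as NA
multiplier <- unname(exp_map[dmg_exp])

# Unmatched codes get a multiplier of 1
multiplier[is.na(multiplier)] <- 1

# Damage in U.S. dollars: 25 K = 25,000; 2.5 M = 2,500,000; 1 B = 1e9; 10 stays 10
dollars <- dmg * multiplier
dollars
```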
To determine which event types are most harmful to population health, we summed the total number of fatalities and injuries for each event type. The table and plot below show the top 10 event types that caused the highest combined number of fatalities and injuries. Note that due to inconsistencies in event type naming, some similar events may appear under different names. The results indicate that tornadoes are by far the most harmful event type, followed by excessive heat, floods, lightning, and TSTM wind.
evtype_health <- storm_data2 %>%
group_by(EVTYPE) %>%
summarise(total_harmed = sum(total_harmed, na.rm = TRUE)) %>%
arrange(desc(total_harmed))
## `summarise()` ungrouping output (override with `.groups` argument)
Let’s look at the top 10:
head(evtype_health, 10)
## # A tibble: 10 x 2
## EVTYPE total_harmed
## <chr> <dbl>
## 1 TORNADO 24931
## 2 EXCESSIVE HEAT 8428
## 3 FLOOD 7259
## 4 LIGHTNING 6046
## 5 TSTM WIND 3872
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
Now, let’s visualize the top 10:
top10_health <- evtype_health[1:10, ]
library(ggplot2)
ggplot(top10_health, aes(x = reorder(EVTYPE, total_harmed), y = total_harmed)) +
geom_bar(stat = "identity", fill = "red") +
coord_flip() +
labs(
title = "Top 10 Most Harmful Event Types (Population Health, 1993+)",
x = "Event Type",
y = "Total Harm (Fatalities + Injuries)"
)
The table and plot above show that tornadoes are by far the most harmful weather event in terms of combined fatalities and injuries. Other significant event types include excessive heat, flood, lightning, and TSTM wind. Note that event type naming is inconsistent in the dataset, so similar events are listed separately: for example, TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT, appear as separate rows in the table.
Now we look at the most harmful events by state:
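The naming inconsistency could be reduced by normalizing the labels before grouping. This step was not performed in the analysis. The sketch below is one possible approach, using a hypothetical helper `normalise_evtype()` that standardizes case and whitespace and expands only the abbreviation TSTM WIND. Whether other labels (such as HEAT and EXCESSIVE HEAT) should be merged is a judgment call that it deliberately leaves alone.

```r
# Hypothetical helper: tidy event-type labels before aggregation
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))      # drop outer whitespace, unify case
  x <- gsub("\\s+", " ", x)    # collapse repeated inner whitespace
  # Expand the abbreviated label to its full form (exact match only)
  x <- sub("^TSTM WIND$", "THUNDERSTORM WIND", x)
  x
}

labels <- c("TSTM WIND", " thunderstorm  wind ", "HEAT")
cleaned <- normalise_evtype(labels)
cleaned
```

Applied to EVTYPE before the `group_by()` step, a helper like this would fold the matching rows together. The totals reported in this document are based on the raw labels.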
evtype_health_by_state <- storm_data2 %>%
group_by(STATE, EVTYPE) %>%
summarise(total_harmed = sum(total_harmed, na.rm = TRUE)) %>%
arrange(STATE, desc(total_harmed))
## `summarise()` regrouping output by 'STATE' (override with `.groups` argument)
We create a table with the top 10 event types for each state:
top10_by_state <- evtype_health_by_state %>%
group_by(STATE) %>%
slice_max(order_by = total_harmed, n = 10)
Now, let’s plot the top 10 event types for the 5 states with the most total harm:
top_states <- evtype_health_by_state %>%
group_by(STATE) %>%
summarise(state_total = sum(total_harmed)) %>%
arrange(desc(state_total)) %>%
slice_head(n = 5) %>%
pull(STATE)
## `summarise()` ungrouping output (override with `.groups` argument)
top10_events_top_states <- evtype_health_by_state %>%
filter(STATE %in% top_states) %>%
group_by(STATE) %>%
slice_max(order_by = total_harmed, n = 10)
ggplot(top10_events_top_states, aes(x = reorder(EVTYPE, total_harmed), y = total_harmed, fill = STATE)) +
geom_bar(stat = "identity") +
coord_flip() +
facet_wrap(~STATE, scales = "free_y") +
labs(
title = "Top 10 Most Harmful Event Types in the 5 Most Impacted States",
x = "Event Type",
y = "Total Harm (Fatalities + Injuries)"
) +
theme(legend.position = "none")
The plot above shows the top 10 most harmful weather event types in each of the five states with the highest total harm. The results demonstrate that the most impactful event types can vary considerably by state. For example, tornadoes are the leading cause of harm in Alabama, while floods have the greatest impact in Texas. This highlights the importance of considering regional variation when preparing for severe weather events.
We summarized the total economic damage by event type:
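The idea of picking the leading event type within each state can be sketched in base R on a small made-up table. The numbers below are invented for illustration and only mimic the qualitative pattern described above; they are not taken from the dataset.

```r
# Hypothetical per-state totals (values are illustrative only)
toy <- data.frame(
  STATE        = c("AL", "AL", "TX", "TX", "TX"),
  EVTYPE       = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HEAT"),
  total_harmed = c(50, 5, 10, 40, 20),
  stringsAsFactors = FALSE
)

# Sort by state, then by harm in descending order
ordered <- toy[order(toy$STATE, -toy$total_harmed), ]

# The first row of each state is now its most harmful event type
top1 <- ordered[!duplicated(ordered$STATE), ]
top1
```

Unlike `slice_max()`, which keeps tied rows by default, this version returns exactly one row per state.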
evtype_econ <- storm_data2 %>%
group_by(EVTYPE) %>%
summarise(total_econ_dmg = sum(total_econ_dmg, na.rm = TRUE)) %>%
arrange(desc(total_econ_dmg))
## `summarise()` ungrouping output (override with `.groups` argument)
Let’s view the top 10:
top10_econ <- head(evtype_econ, 10)
top10_econ
## # A tibble: 10 x 2
## EVTYPE total_econ_dmg
## <chr> <dbl>
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 STORM SURGE 43323541000
## 4 TORNADO 26764135376.
## 5 HAIL 18761221986.
## 6 FLASH FLOOD 18243991078.
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
And visualize:
ggplot(top10_econ, aes(x = reorder(EVTYPE, total_econ_dmg), y = total_econ_dmg / 1e9)) +
geom_bar(stat = "identity", fill = "darkgreen") +
coord_flip() +
labs(
title = "Top 10 Event Types by Economic Damage (1993+)",
x = "Event Type",
y = "Total Economic Damage (Billion USD)"
)
To determine which event types have the greatest economic consequences, we calculated the total property and crop damage for each event type, converting all damage estimates to U.S. dollars. The analysis shows that floods have caused the highest total economic damage in the United States since 1993, followed by hurricanes/typhoons and storm surges. Other event types with significant economic impact include tornadoes, hail, flash floods, and droughts. These findings highlight the substantial financial risk associated with various severe weather events.