This document examines the health and economic consequences of severe weather events in the United States, using the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA). The objective is to identify which types of events are most harmful with respect to population health (measured by fatalities and injuries) and which have the greatest economic consequences (measured by damage to property and crops). The analysis below shows that, in terms of health impact, tornadoes are by far the most harmful events, causing the highest numbers of both fatalities and injuries. Excessive heat is the only other event type that caused more than 1,000 fatalities, and all other event types caused far fewer injuries than tornadoes. The economic impact tells a different story. For property damage, floods are the most damaging events, followed by hurricanes/typhoons, tornadoes, and storm surges, each of which caused more than USD 40 billion in damage over the period evaluated. For crop damage, droughts are the most destructive, followed by floods, river floods, and ice storms. Considering total damage (the sum of property and crop damage), floods remain the most destructive event type, followed by hurricanes/typhoons and tornadoes.
The data for the analysis can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
The related documentation is available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.
To perform the analysis, we load the following R packages.
library(knitr)
library(ggplot2)
library(dplyr)
First, we download the data from the online source and import it into R.
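## download the compressed file containing the data (only if not already present)
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("stormdata.csv.bz2")) {
  download.file(fileurl, "stormdata.csv.bz2", method = "curl")
}
## import the data into R (read.csv reads .bz2-compressed files directly)
stormdata <- read.csv("stormdata.csv.bz2")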
Since the dataset contains many variables that are irrelevant to our purposes, we keep only the following ones:

- EVTYPE: type of severe weather event
- FATALITIES: number of fatalities caused by the event
- INJURIES: number of injuries caused by the event
- PROPDMG: base value of the property damage caused by the event
- PROPDMGEXP: exponent code (multiplier) for the property damage
- CROPDMG: base value of the crop damage caused by the event
- CROPDMGEXP: exponent code (multiplier) for the crop damage
## keep only the variables needed for the analysis
relevant_var <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
                  "CROPDMG", "CROPDMGEXP")
stormdata <- stormdata[, relevant_var]
Each economic damage variable is stored as a base value (PROPDMG, CROPDMG) plus an exponent code (PROPDMGEXP, CROPDMGEXP). Since the codebook gives only limited detail about the exponent codes, we first tabulate the levels that actually occur in the data.
tabprop <- as.data.frame(table(stormdata$PROPDMGEXP))
colnames(tabprop) <- c("Value", "Frequency")
knitr::kable(tabprop, caption = "Frequencies of PROPDMGEXP", format ="markdown")
| Value | Frequency |
|---|---|
| (empty) | 465934 |
| - | 1 |
| ? | 8 |
| + | 5 |
| 0 | 216 |
| 1 | 25 |
| 2 | 13 |
| 3 | 4 |
| 4 | 4 |
| 5 | 28 |
| 6 | 4 |
| 7 | 5 |
| 8 | 1 |
| B | 40 |
| h | 1 |
| H | 6 |
| K | 424665 |
| m | 7 |
| M | 11330 |
tabcrop <- as.data.frame(table(stormdata$CROPDMGEXP))
colnames(tabcrop) <- c("Value", "Frequency")
knitr::kable(tabcrop, caption = "Frequencies of CROPDMGEXP", format ="markdown")
| Value | Frequency |
|---|---|
| (empty) | 618413 |
| ? | 7 |
| 0 | 19 |
| 2 | 1 |
| B | 9 |
| k | 21 |
| K | 281832 |
| m | 1 |
| M | 1994 |
The meaning of some codes is indicated in the codebook:
- H or h: hundred
- K or k: thousand (kilo)
- M or m: million
- B or b: billion
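Numeric codes (0 to 8) are assumed to denote powers of ten (for example, "5" multiplies the base value by 10^5), while the empty string and the symbols "+", "-", and "?" have no documented meaning and are assumed to denote an exponent of zero (a multiplier of 1).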
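Written out as a formula, the conversion applied to each record is as follows. This is only a restatement of the assumptions above, where $e$ is the exponent implied by the code; CROPDMG_USD is obtained in the same way from CROPDMG and CROPDMGEXP.

$$
\mathtt{PROPDMG\_USD} = \mathtt{PROPDMG} \times 10^{e},
\qquad
e = \begin{cases}
0 & \text{if the code is empty, "-", "?" or "+"} \\
d & \text{if the code is the digit } d \in \{0, 1, \dots, 8\} \\
2 & \text{if the code is H or h} \\
3 & \text{if the code is K or k} \\
6 & \text{if the code is M or m} \\
9 & \text{if the code is B or b}
\end{cases}
$$

For example, a record with PROPDMG = 25 and PROPDMGEXP = "K" contributes 25 × 10^3 = 25,000 USD.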
Under these assumptions, we create two new variables, PROPDMG_USD and CROPDMG_USD, holding the property and crop damage in USD; each is obtained by multiplying the base value by the multiplier implied by its exponent code.
stormdata <- stormdata %>%
  ## property damage in USD: base value times the multiplier implied by PROPDMGEXP
  mutate(PROPDMG_USD = case_when(
    PROPDMGEXP == "" ~ PROPDMG * 1e0,
    PROPDMGEXP == "-" ~ PROPDMG * 1e0,
    PROPDMGEXP == "?" ~ PROPDMG * 1e0,
    PROPDMGEXP == "+" ~ PROPDMG * 1e0,
    PROPDMGEXP == "0" ~ PROPDMG * 1e0,
    PROPDMGEXP == "1" ~ PROPDMG * 1e1,
    PROPDMGEXP == "2" ~ PROPDMG * 1e2,
    PROPDMGEXP == "h" ~ PROPDMG * 1e2,
    PROPDMGEXP == "H" ~ PROPDMG * 1e2,
    PROPDMGEXP == "3" ~ PROPDMG * 1e3,
    PROPDMGEXP == "k" ~ PROPDMG * 1e3,
    PROPDMGEXP == "K" ~ PROPDMG * 1e3,
    PROPDMGEXP == "4" ~ PROPDMG * 1e4,
    PROPDMGEXP == "5" ~ PROPDMG * 1e5,
    PROPDMGEXP == "6" ~ PROPDMG * 1e6,
    PROPDMGEXP == "m" ~ PROPDMG * 1e6,
    PROPDMGEXP == "M" ~ PROPDMG * 1e6,
    PROPDMGEXP == "7" ~ PROPDMG * 1e7,
    PROPDMGEXP == "8" ~ PROPDMG * 1e8,
    PROPDMGEXP == "B" ~ PROPDMG * 1e9,
    TRUE ~ PROPDMG
  )) %>%
  ## crop damage in USD: base value times the multiplier implied by CROPDMGEXP
  mutate(CROPDMG_USD = case_when(
    CROPDMGEXP == "" ~ CROPDMG * 1e0,
    CROPDMGEXP == "?" ~ CROPDMG * 1e0,
    CROPDMGEXP == "0" ~ CROPDMG * 1e0,
    CROPDMGEXP == "2" ~ CROPDMG * 1e2,
    CROPDMGEXP == "k" ~ CROPDMG * 1e3,
    CROPDMGEXP == "K" ~ CROPDMG * 1e3,
    CROPDMGEXP == "m" ~ CROPDMG * 1e6,
    CROPDMGEXP == "M" ~ CROPDMG * 1e6,
    CROPDMGEXP == "B" ~ CROPDMG * 1e9,
    TRUE ~ CROPDMG
  ))
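The long `case_when()` chain can also be written as a lookup table. The sketch below is an alternative under the same assumptions, not the code used to produce the results; `exp_to_multiplier()` is a hypothetical helper name, and codes that are not listed (the empty string, "-", "?", "+") fall back to a multiplier of 1, mirroring the `TRUE` branch of the chain.

```r
# Hypothetical helper (not part of the original analysis): converts exponent
# codes to USD multipliers under the assumptions stated above.
exp_to_multiplier <- function(code) {
  # Letter codes documented in the codebook
  letters_map <- c(h = 1e2, H = 1e2, k = 1e3, K = 1e3,
                   m = 1e6, M = 1e6, b = 1e9, B = 1e9)
  # Digit codes "0" to "8" are read as powers of ten
  digits_map <- setNames(10^(0:8), as.character(0:8))
  lookup <- c(letters_map, digits_map)

  # as.character() guards against factor input, which would otherwise be
  # indexed by its integer codes instead of its labels
  mult <- unname(lookup[as.character(code)])
  # "", "-", "?", "+" and any other unmatched code: multiplier of 1
  mult[is.na(mult)] <- 1
  mult
}

exp_to_multiplier(c("K", "", "5", "?", "B", "h"))
# multipliers: 1e3, 1, 1e5, 1, 1e9, 1e2
```

With such a helper, the conversion would read `mutate(PROPDMG_USD = PROPDMG * exp_to_multiplier(PROPDMGEXP))`, and likewise for crops. Keeping every assumption in a single table makes the mapping easier to audit or change.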
Next, we aggregate the impact variables by event type to obtain the overall health and economic impact of each type. Since property and crop damage are both expressed in USD, they can be summed; we therefore create a new variable, TOT_DMG, holding the total (property plus crop) damage per event type.
stormdata_tot <- stormdata %>%
  group_by(EVTYPE) %>%
  summarise(
    TOT_FATALITIES = sum(FATALITIES),
    TOT_INJURIES = sum(INJURIES),
    TOT_PROPDMG = sum(PROPDMG_USD),
    TOT_CROPDMG = sum(CROPDMG_USD)
  ) %>%
  mutate(TOT_DMG = TOT_PROPDMG + TOT_CROPDMG)
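To make the aggregation step concrete, the sketch below applies the same `group_by()`/`summarise()` pattern to a tiny invented data set. The values are made up for illustration and are not taken from the NOAA data.

```r
library(dplyr)

# Invented toy data for illustration only
toy <- data.frame(
  EVTYPE      = c("FLOOD", "FLOOD", "HAIL"),
  FATALITIES  = c(1, 2, 0),
  INJURIES    = c(0, 5, 1),
  PROPDMG_USD = c(1e6, 2e6, 5e3),
  CROPDMG_USD = c(0, 1e5, 2e3),
  stringsAsFactors = FALSE
)

# One row per event type, with totals summed over all records of that type
toy_tot <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    TOT_FATALITIES = sum(FATALITIES),
    TOT_INJURIES   = sum(INJURIES),
    TOT_PROPDMG    = sum(PROPDMG_USD),
    TOT_CROPDMG    = sum(CROPDMG_USD)
  ) %>%
  mutate(TOT_DMG = TOT_PROPDMG + TOT_CROPDMG)

toy_tot
# FLOOD: 3 fatalities, 5 injuries, TOT_DMG = 3.1e6; HAIL: TOT_DMG = 7e3
```

Note that `sum()` returns `NA` for a group if any of its values is missing; `na.rm = TRUE` would be needed in that case.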
The EVTYPE variable contains many unique values.
n_distinct(stormdata$EVTYPE)
## [1] 985
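Some of this variety may be purely cosmetic, for instance differences in capitalization or stray whitespace. The sketch below uses a short invented vector rather than the real EVTYPE column and is not applied in this analysis. It shows how such variants can be collapsed with `toupper()` and `trimws()`, and also that genuinely different labels for the same phenomenon, such as TSTM WIND and THUNDERSTORM WIND, survive this step and would need an explicit mapping.

```r
# Invented example labels; a real check would use stormdata$EVTYPE
ev <- c("TSTM WIND", "tstm wind", " TSTM WIND", "THUNDERSTORM WIND", "HAIL")

length(unique(ev))                    # raw distinct labels -> 5
length(unique(toupper(trimws(ev))))   # after upper-casing and trimming -> 3
```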
To keep the results readable, we show only the ten event types with the largest totals for each impact measure.
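The same select/arrange/slice_head pattern is repeated for each impact measure, so it could be factored into a small helper such as the hypothetical `top_n_by()` sketched here. It is not used in the analysis, and the toy data frame in the example is invented.

```r
library(dplyr)

# Hypothetical helper: returns the n event types with the largest values of
# the column whose name is given as a string in `col`
top_n_by <- function(df, col, n = 10) {
  df %>%
    select(EVTYPE, all_of(col)) %>%
    arrange(desc(.data[[col]])) %>%
    slice_head(n = n)
}

# Invented toy data for illustration only
toy <- data.frame(EVTYPE = c("A", "B", "C"),
                  TOT_FATALITIES = c(5, 20, 1),
                  stringsAsFactors = FALSE)
top_n_by(toy, "TOT_FATALITIES", n = 2)   # rows B (20) then A (5)
```

Applied to the real data, `top_n_by(stormdata_tot, "TOT_FATALITIES")` would give the same rows as `fatalities_top10` before its columns are renamed.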
First, we examine the impact of severe weather events in terms of health.
fatalities_top10 <- stormdata_tot %>%
select(EVTYPE, TOT_FATALITIES) %>%
arrange(desc(TOT_FATALITIES)) %>%
slice_head(n = 10)
colnames(fatalities_top10) <- c("Event type", "Total fatalities")
knitr::kable(fatalities_top10, caption = "Top ten event types by fatalities")
| Event type | Total fatalities |
|---|---|
| TORNADO | 5633 |
| EXCESSIVE HEAT | 1903 |
| FLASH FLOOD | 978 |
| HEAT | 937 |
| LIGHTNING | 816 |
| TSTM WIND | 504 |
| FLOOD | 470 |
| RIP CURRENT | 368 |
| HIGH WIND | 248 |
| AVALANCHE | 224 |
injuries_top10 <- stormdata_tot %>%
select(EVTYPE, TOT_INJURIES) %>%
arrange(desc(TOT_INJURIES)) %>%
slice_head(n = 10)
colnames(injuries_top10) <- c("Event type", "Total injuries")
knitr::kable(injuries_top10, caption = "Top ten event types by injuries")
| Event type | Total injuries |
|---|---|
| TORNADO | 91346 |
| TSTM WIND | 6957 |
| FLOOD | 6789 |
| EXCESSIVE HEAT | 6525 |
| LIGHTNING | 5230 |
| HEAT | 2100 |
| ICE STORM | 1975 |
| FLASH FLOOD | 1777 |
| THUNDERSTORM WIND | 1488 |
| HAIL | 1361 |
par(mfrow = c(1,2),
mar = c(15, 6, 5, 4),
mgp=c(3, 1, 0),
cex = 0.8
)
yvals1 <- pretty(range(fatalities_top10$`Total fatalities`))
barplot(fatalities_top10$`Total fatalities`,
las = 3,
names.arg = fatalities_top10$`Event type`,
main = "Top 10 event types\nby fatalities",
ylab = "Number of fatalities (thousand)",
col = "red",
axes = FALSE
)
axis(2, at = yvals1,
labels = paste0(format(yvals1 / 1000, big.mark = ","))
)
yvals2 <- pretty(range(injuries_top10$`Total injuries`))
barplot(injuries_top10$`Total injuries`,
las = 3,
names.arg = injuries_top10$`Event type`,
main = "Top 10 event types\nby injuries",
ylab = "Number of injuries (thousand)",
col = "orange",
axes = FALSE
)
axis(2, at = yvals2,
labels = paste0(format(yvals2 / 1000, big.mark = ","))
)
These results show that tornadoes are by far the most harmful events in terms of both fatalities and injuries. Excessive heat is the only other event type with more than 1,000 fatalities, and every other event type caused less than a tenth as many injuries as tornadoes.
Next, we show the top 10 event types by property damage, crop damage, and total damage.
propdmg_top10 <- stormdata_tot %>%
select(EVTYPE, TOT_PROPDMG) %>%
arrange(desc(TOT_PROPDMG)) %>%
slice_head(n = 10)
colnames(propdmg_top10) <- c("Event type", "Total property damage")
knitr::kable(propdmg_top10, caption = "Top ten event types by cost of property damage (USD)")
| Event type | Total property damage |
|---|---|
| FLOOD | 144657709807 |
| HURRICANE/TYPHOON | 69305840000 |
| TORNADO | 56947380676 |
| STORM SURGE | 43323536000 |
| FLASH FLOOD | 16822673978 |
| HAIL | 15735267513 |
| HURRICANE | 11868319010 |
| TROPICAL STORM | 7703890550 |
| WINTER STORM | 6688497251 |
| HIGH WIND | 5270046295 |
cropdmg_top10 <- stormdata_tot %>%
select(EVTYPE, TOT_CROPDMG) %>%
arrange(desc(TOT_CROPDMG)) %>%
slice_head(n = 10)
colnames(cropdmg_top10) <- c("Event type", "Total crop damage")
knitr::kable(cropdmg_top10, caption = "Top ten event types by cost of crop damage (USD)")
| Event type | Total crop damage |
|---|---|
| DROUGHT | 13972566000 |
| FLOOD | 5661968450 |
| RIVER FLOOD | 5029459000 |
| ICE STORM | 5022113500 |
| HAIL | 3025954473 |
| HURRICANE | 2741910000 |
| HURRICANE/TYPHOON | 2607872800 |
| FLASH FLOOD | 1421317100 |
| EXTREME COLD | 1292973000 |
| FROST/FREEZE | 1094086000 |
totdmg_top10 <- stormdata_tot %>%
select(EVTYPE, TOT_DMG) %>%
arrange(desc(TOT_DMG)) %>%
slice_head(n = 10)
colnames(totdmg_top10) <- c("Event type", "Total damage")
knitr::kable(totdmg_top10, caption = "Top ten event types by cost of total damage (USD)")
| Event type | Total damage |
|---|---|
| FLOOD | 150319678257 |
| HURRICANE/TYPHOON | 71913712800 |
| TORNADO | 57362333946 |
| STORM SURGE | 43323541000 |
| HAIL | 18761221986 |
| FLASH FLOOD | 18243991078 |
| DROUGHT | 15018672000 |
| HURRICANE | 14610229010 |
| RIVER FLOOD | 10148404500 |
| ICE STORM | 8967041360 |
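Because TOT_DMG is defined as the sum of TOT_PROPDMG and TOT_CROPDMG, the three tables can be cross-checked for event types that appear in all of them. The short sanity check below uses figures copied from the tables for FLOOD and HURRICANE/TYPHOON; event types missing from the crop table cannot be checked this way.

```r
# Figures copied from the property, crop and total damage tables above (USD)
flood <- c(prop = 144657709807, crop = 5661968450, total = 150319678257)
hurricane_typhoon <- c(prop = 69305840000, crop = 2607872800, total = 71913712800)

# TOT_DMG = TOT_PROPDMG + TOT_CROPDMG, so both comparisons should be TRUE
flood[["prop"]] + flood[["crop"]] == flood[["total"]]
hurricane_typhoon[["prop"]] + hurricane_typhoon[["crop"]] == hurricane_typhoon[["total"]]
```

The amounts are whole numbers well below 2^53, so double-precision addition is exact here and `==` is a safe comparison.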
## three panels side by side: property, crop and total damage
par(mfrow = c(1, 3),
    mar = c(15, 6, 5, 4),
    mgp = c(3, 1, 0),
    cex = 0.8
)
yvals3 <- pretty(range(propdmg_top10$`Total property damage`))
barplot(propdmg_top10$`Total property damage`,
las = 3,
names.arg = propdmg_top10$`Event type`,
main = "Top 10 event types\nby property damage",
ylab = "Cost of property damage (billion USD)",
col = "dark blue",
axes = FALSE
)
axis(2, at = yvals3,
labels = paste0(format(yvals3 / 1000000000, big.mark = ","))
)
yvals4 <- pretty(range(cropdmg_top10$`Total crop damage`))
barplot(cropdmg_top10$`Total crop damage`,
las = 3,
names.arg = cropdmg_top10$`Event type`,
main = "Top 10 event types\nby crop damage",
ylab = "Cost of crop damage (billion USD)",
col = "dark green",
axes = FALSE
)
axis(2, at = yvals4,
labels = paste0(format(yvals4 / 1000000000, big.mark = ","))
)
yvals5 <- pretty(range(totdmg_top10$`Total damage`))
barplot(totdmg_top10$`Total damage`,
las = 3,
names.arg = totdmg_top10$`Event type`,
main = "Top 10 event types\nby total damage",
ylab = "Cost of total damage (billion USD)",
col = "black",
axes = FALSE
)
axis(2, at = yvals5,
labels = paste0(format(yvals5 / 1000000000, big.mark = ","))
)
The data show that floods are the most damaging events with respect to property damage, followed by hurricanes/typhoons, tornadoes, and storm surges, each of which caused more than USD 40 billion in damage during the period considered. For crop damage, droughts were the most destructive, followed by floods, river floods, and ice storms. The total damage figures confirm that floods remain the most damaging event type overall, followed by hurricanes/typhoons and tornadoes.