Storms exact a massive toll on the United States every year. This
report aims to quantify those effects to inform policy decisions.
We find that flooding stands out as the most damaging, with tornadoes,
cold weather and fires as top contributors as well.
We begin with the Storm Database from the National Oceanic and
Atmospheric Administration, which has tabulated over 250,000
storms between 1950 and 2011. We break effects from storms down into
three areas: fatalities, injuries and cost of damage. Some
subjectivity was required when sorting storms by type as the NOAA data
included many similar storm type names. The three deadliest
storm types were tornadoes, heat events and flooding. The three most
injurious were tornadoes, heat events and wind. Finally, among the
storm types listed in the cost table, flooding had by far the highest
total cost, followed by cold events and drought.
The data is available from NOAA and needs to be downloaded and read into an object called ‘storms’:
library(tidyverse)
library(R.utils)
knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE, echo=TRUE, cache.lazy=FALSE, fig.path='figures/')
setwd("C:/Users/m304650/Documents/RStudio Projects/Reproducible-Research-Activity-Project")
if (!file.exists("repdata_data_StormData.csv.bz2"))
{
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "repdata_data_StormData.csv.bz2")
}
if (!file.exists("repdata_data_StormData.csv"))
{
bunzip2("./repdata_data_StormData.csv.bz2", remove=FALSE)
}
storms <- read_csv("repdata_data_StormData.csv", show_col_types=FALSE)
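As an aside, `read_csv()` from readr decompresses files ending in `.bz2` on its own, so the separate `bunzip2()` step is optional. The sketch below illustrates this on a tiny made-up file written to a temporary path; the file contents and the name `toy` are assumptions for illustration only, not part of the NOAA data.

```r
library(readr)

# Write a tiny bzip2-compressed CSV to a temporary file (made-up rows).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read_csv() detects the .bz2 extension and decompresses while reading.
toy <- read_csv(tmp, show_col_types = FALSE)
print(toy)
```

The same call pattern should apply to `repdata_data_StormData.csv.bz2`, at the cost of decompressing on every read.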
Now that the data has been read in, it needs to be tidied.
First we need to clean up the property and crop damage values. Each
is stored across two columns: the first holds a number from 1-999, and a
separate column holds B, M or K for billion, million or thousand. We will
create new columns called PROPERTYDAMAGE and CROPDAMAGE
by multiplying each number by the power of 10 that its letter denotes. Any other value in the letter column leaves the number unscaled.
storms <- storms %>%
mutate(PROPERTYDAMAGE = case_when(
PROPDMGEXP == "B" ~ PROPDMG * 1000000000, # B -> Billion
PROPDMGEXP == "M" ~ PROPDMG * 1000000, # M -> Million
PROPDMGEXP == "K" ~ PROPDMG * 1000, # K -> Thousand
TRUE ~ PROPDMG # Default case in case of any other value
))
storms <- storms %>%
mutate(CROPDAMAGE = case_when(
CROPDMGEXP == "B" ~ CROPDMG * 1000000000, # B -> Billion
CROPDMGEXP == "M" ~ CROPDMG * 1000000, # M -> Million
CROPDMGEXP == "K" ~ CROPDMG * 1000, # K -> Thousand
TRUE ~ CROPDMG # Default case in case of any other value
))
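The same conversion can be sketched with a named lookup vector instead of two `case_when()` calls. This is a minimal illustration on made-up rows, not the code used for the results in this report. The helper name `exp_multiplier` is hypothetical, and unlike the code above it also accepts lowercase letters, which is an assumption about how such values should be treated.

```r
# Hypothetical helper: map a magnitude letter (B/M/K) to its multiplier.
exp_multiplier <- function(code) {
  lookup <- c(B = 1e9, M = 1e6, K = 1e3)
  # Assumption: lowercase letters mean the same as uppercase ones.
  mult <- unname(lookup[toupper(code)])
  # Anything else (missing or unrecognised) leaves the value unscaled.
  mult[is.na(mult)] <- 1
  mult
}

# Made-up rows for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 5),
  PROPDMGEXP = c("K", "M", NA, "b"),
  stringsAsFactors = FALSE
)
toy$PROPERTYDAMAGE <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Because one helper serves both the property and crop columns, the two copies of the rule cannot drift apart.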
We can now remove all unnecessary columns. We only need a handful.
storms <- storms %>%
select(BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPERTYDAMAGE, CROPDAMAGE)
storms <- subset(storms, FATALITIES > 0 | INJURIES > 0 | PROPERTYDAMAGE > 0 | CROPDAMAGE > 0)
We still need to clean up the event types. There are 977 unique EVTYPE values in this dataset, so we will group them into broader categories:
storms$EVTYPE <- toupper(storms$EVTYPE)
#Heat
storms$EVTYPE <- gsub('.*HEAT.*', 'HEAT', storms$EVTYPE)
storms$EVTYPE <- gsub('.*WARM.*', 'HEAT', storms$EVTYPE)
#Cold
storms$EVTYPE <- gsub('.*FREEZ.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*COLD.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*LO.*TEMP.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*BITTER.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*ICE.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*WINT.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*FROST.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*HYPOTH.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*ICY.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*GLAZE.*', 'COLD', storms$EVTYPE)
#Drought
storms$EVTYPE <- gsub('.*DRY.*', 'DROUGHT', storms$EVTYPE)
storms$EVTYPE <- gsub('.*DROUGHT.*', 'DROUGHT', storms$EVTYPE)
#Fire
storms$EVTYPE <- gsub('.*FIRE.*', 'FIRE', storms$EVTYPE)
#Flood
storms$EVTYPE <- gsub('.*FLOOD.*', 'FLOOD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TIDE.*', 'FLOOD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*DAM.*', 'FLOOD', storms$EVTYPE)
#Landslide
storms$EVTYPE <- gsub('.*LANDSLIDE.*', 'LANDSLIDE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*LANDSLUMP.*', 'LANDSLIDE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*SLIDE.*', 'LANDSLIDE', storms$EVTYPE)
#Avalanche
storms$EVTYPE <- gsub('.*AVALANCHE.*', 'AVALANCHE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*AVALANCE.*', 'AVALANCHE', storms$EVTYPE)
#Ocean
storms$EVTYPE <- gsub('.*SURF.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*MARINE.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*SEA.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*BEACH.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*COASTAL.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*WAVE.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*RIP.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TSUNAMI.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*SURGE.*', 'OCEAN', storms$EVTYPE)
#Snow
storms$EVTYPE <- gsub('.*SNOW.*', 'SNOW', storms$EVTYPE)
storms$EVTYPE <- gsub('.*BLIZZARD.*', 'SNOW', storms$EVTYPE)
#Tornado
storms$EVTYPE <- gsub('.*NADO.*', 'TORNADO', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TORNDAO.*', 'TORNADO', storms$EVTYPE)
storms$EVTYPE <- gsub('.*LANDSPOUT.*', 'TORNADO', storms$EVTYPE)
storms$EVTYPE <- gsub('.*DUST DEVIL.*', 'TORNADO', storms$EVTYPE)
#Hail
storms$EVTYPE <- gsub('.*HAIL.*', 'HAIL', storms$EVTYPE)
#Hurricane
storms$EVTYPE <- gsub('.*HURRICANE.*', 'HURRICANE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TROPICAL.*', 'HURRICANE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TYPHOON.*', 'HURRICANE', storms$EVTYPE)
#Lightning
storms$EVTYPE <- gsub('.*LIGHTNING.*', 'LIGHTNING', storms$EVTYPE)
storms$EVTYPE <- gsub('.*THUNDER.*', 'LIGHTNING', storms$EVTYPE)
#Volcanic
storms$EVTYPE <- gsub('.*VOLCA.*', 'VOLCANIC', storms$EVTYPE)
#Wind
storms$EVTYPE <- gsub('.*WIND.*', 'WIND', storms$EVTYPE)
#Rain
storms$EVTYPE <- gsub('.*RAIN.*', 'RAIN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*BURST.*', 'RAIN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*SPOUT.*', 'RAIN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*WET.*', 'RAIN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*PRECIP.*', 'RAIN', storms$EVTYPE)
# Note: this also re-labels the HAIL category created above as RAIN
storms$EVTYPE <- gsub('.*HAIL.*', 'RAIN', storms$EVTYPE)
##Remove any one-off storm types left over
evtype_counts <- table(storms$EVTYPE)
unique_evtypes <- names(evtype_counts[evtype_counts == 1])
storms <- storms[!storms$EVTYPE %in% unique_evtypes, ]
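The chain of `gsub()` calls above can also be expressed as an ordered table of rules, which makes the precedence between patterns easier to see. The sketch below is a minimal illustration with only a few of the patterns. The names `rules` and `recode_evtype` are hypothetical, and it uses a "first match wins" policy, which differs from the chain above where a later pattern can re-label an earlier result.

```r
# Ordered pattern -> category rules (illustrative subset of the patterns above).
rules <- c(
  "HEAT|WARM"            = "HEAT",
  "FREEZ|COLD|ICE|FROST" = "COLD",
  "FLOOD|TIDE"           = "FLOOD",
  "NADO|TORNDAO"         = "TORNADO"
)

# Hypothetical helper: assign each event type to the first rule it matches.
recode_evtype <- function(evtype, rules) {
  evtype <- toupper(evtype)
  out <- evtype
  matched <- rep(FALSE, length(evtype))
  for (i in seq_along(rules)) {
    hit <- !matched & grepl(names(rules)[i], evtype)
    out[hit] <- rules[[i]]
    matched <- matched | hit
  }
  out  # unmatched values keep their upper-cased original name
}

result <- recode_evtype(
  c("Excessive Heat", "FLASH FLOOD", "Tornado F2", "DENSE FOG"),
  rules
)
print(result)
```

Keeping the rules in one vector also makes it simple to review which raw names fall into each category before committing to a grouping.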
We want to compare storm types by their fatalities,
injuries and economic impact. A separate data frame is built for
each of these three impacts:
stormFatalities <- storms %>%
group_by(EVTYPE) %>%
summarize(AVGFATAL = mean(FATALITIES), TOTALFATAL = sum(FATALITIES)) %>%
arrange(desc(TOTALFATAL))
stormInjuries <- storms %>%
group_by(EVTYPE) %>%
summarize(AVGINJURIES = mean(INJURIES), TOTALINJURIES = sum(INJURIES)) %>%
arrange(desc(TOTALINJURIES))
stormCost <- storms %>%
group_by(EVTYPE) %>%
summarize(AVGCOST = mean(PROPERTYDAMAGE + CROPDAMAGE), TOTALCOST = sum(PROPERTYDAMAGE + CROPDAMAGE))
Now that we have our tidy data, we want to visualize the 10 worst storm types in each category. We will begin with the deadliest.
knitr::kable(stormFatalities[1:10,], caption = "10 Deadliest Storm Types")
| EVTYPE | AVGFATAL | TOTALFATAL |
|---|---|---|
| TORNADO | 0.1413171 | 5663 |
| HEAT | 3.2133468 | 3178 |
| FLOOD | 0.0471918 | 1536 |
| LIGHTNING | 0.0147193 | 1018 |
| WIND | 0.0129691 | 948 |
| COLD | 0.2417521 | 872 |
| OCEAN | 0.6574288 | 854 |
| SNOW | 0.1177862 | 249 |
| AVALANCHE | 0.8333333 | 225 |
| HURRICANE | 0.2917271 | 201 |
ggplot(stormFatalities[1:10,], aes(x = reorder(EVTYPE, TOTALFATAL), y = TOTALFATAL, fill = EVTYPE)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "10 Deadliest Storm Types",
x = "Storm",
y = "Total Fatalities") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
As the table and chart show, tornadoes kill the most Americans, followed by heat and flooding events. Four more types (lightning, wind, cold and oceanic events) account for most of the remaining fatalities.
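The table above reports both an average and a total for each storm type, and the two tell different stories: heat events are far deadlier per event than tornadoes, but there are far fewer of them. Since the average is the total divided by the number of events, the event count can be recovered from the table. The sketch below does this for the first three rows, using values copied from the table; the column name `EVENTS` is introduced here for illustration.

```r
# Values copied from the "10 Deadliest Storm Types" table above.
fatal <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "FLOOD"),
  AVGFATAL   = c(0.1413171, 3.2133468, 0.0471918),
  TOTALFATAL = c(5663, 3178, 1536)
)

# mean = total / count, so count = total / mean (rounded to a whole number).
fatal$EVENTS <- round(fatal$TOTALFATAL / fatal$AVGFATAL)
print(fatal)
```

This gives roughly 40,000 tornado events against roughly 1,000 heat events, which is why tornadoes lead on total fatalities despite a much lower average.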
knitr::kable(stormInjuries[1:10,], caption = "10 Most Injurious Storms")
| EVTYPE | AVGINJURIES | TOTALINJURIES |
|---|---|---|
| TORNADO | 2.2820852 | 91450 |
| HEAT | 9.3458038 | 9243 |
| WIND | 0.1204837 | 8807 |
| FLOOD | 0.2646553 | 8614 |
| LIGHTNING | 0.1110886 | 7683 |
| COLD | 1.3146659 | 4742 |
| SNOW | 0.9262062 | 1958 |
| RAIN | 0.0645971 | 1802 |
| HURRICANE | 2.4905660 | 1716 |
| FIRE | 1.2772041 | 1608 |
ggplot(stormInjuries[1:10,], aes(x = reorder(EVTYPE, TOTALINJURIES), y = TOTALINJURIES, fill = EVTYPE)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "10 Most Injurious Storms",
x = "Storm",
y = "Total Injuries") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The results for injuries are similar to those for fatalities, with
tornadoes causing by far the most injuries, followed by heat and wind
events.
knitr::kable(stormCost[1:10,], caption = "10 Costliest Storms")
| EVTYPE | AVGCOST | TOTALCOST |
|---|---|---|
| AVALANCHE | 32302.96 | 8721800 |
| COLD | 5413232.37 | 19525529161 |
| DENSE FOG | 130729.73 | 9674000 |
| DROUGHT | 43551940.87 | 15025419600 |
| DUST STORM | 83970.87 | 8649000 |
| FIRE | 7073002.49 | 8904910130 |
| FLOOD | 5670187.33 | 184553257197 |
| FOG | 122948.60 | 13155500 |
| FUNNEL CLOUD | 14969.23 | 194600 |
| HEAT | 935091.03 | 924805030 |
costliestStorms <- stormCost[1:10, ]
ggplot(costliestStorms, aes(x = reorder(EVTYPE, TOTALCOST), y = TOTALCOST, fill = EVTYPE)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "10 Costliest Storms",
x = "Storm",
y = "Total Cost ($)") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
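Of the storm types listed, flooding is by far the costliest, almost to the extent that the others do not matter. Note that stormCost is not sorted by TOTALCOST, so the table and chart show the first ten storm types in alphabetical order rather than the ten costliest overall.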
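Unlike stormFatalities and stormInjuries, stormCost is built without an `arrange()` step, so taking its first ten rows does not select the ten largest totals. A minimal sketch of ranking by total cost before taking the top rows is shown below. The data frame `toyCost` and its values are made up for illustration and are not results from the NOAA data.

```r
library(dplyr)

# Toy stand-in for stormCost; the values are made up for illustration.
toyCost <- tibble(
  EVTYPE    = c("AVALANCHE", "COLD", "FLOOD", "WIND"),
  TOTALCOST = c(8e6, 2e10, 1.8e11, 5e9)
)

# Sort by total cost, largest first, then keep the top rows.
topCost <- toyCost %>%
  arrange(desc(TOTALCOST)) %>%
  head(3)

print(topCost$EVTYPE)
```

Applied to stormCost with `head(10)`, the same pattern would put the cost table and chart on the same footing as the fatality and injury rankings.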