Reproducible Research Project 2: NOAA Storm Data

Synopsis

Storms exact massive tolls on the United States every year. This report aims to quantify those effects to inform policy decisions.
We find that flooding stands out as the most costly, while tornadoes cause the most fatalities and injuries; heat and cold weather are top contributors as well.

We begin with the Storm Database from the National Oceanic and Atmospheric Administration, which has tabulated over 250,000
storms between 1950 and 2011. We break effects from storms down into three areas: fatalities, injuries and cost of damage. Some
subjectivity was required when sorting storms by type, as the NOAA data include many similar storm type names. The three deadliest
storm types were tornadoes, heat events and flooding. The three most injurious were tornadoes, heat events and wind. By total cost, flooding far exceeded the other storm types tabulated, followed by cold events and drought.

Data Processing

The data is available from NOAA and needs to be downloaded and read into an object called ‘storms’:

library(tidyverse)
library(R.utils)
knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE, echo=TRUE, cache.lazy=FALSE, fig.path='figures/')

# Machine-specific path: change this to your own project directory before running
setwd("C:/Users/m304650/Documents/RStudio Projects/Reproducible-Research-Activity-Project")

if (!file.exists("repdata_data_StormData.csv.bz2"))
{
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "repdata_data_StormData.csv.bz2")
}

if (!file.exists("repdata_data_StormData.csv"))
{
        bunzip2("./repdata_data_StormData.csv.bz2", remove=FALSE)
}


storms <- read_csv("repdata_data_StormData.csv", show_col_types=FALSE)
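
The separate bunzip2 step is optional, since R can read bzip2-compressed text directly (readr documents the same for files ending in .bz2). A minimal sketch using base R on a throwaway file; the temporary path and the two toy rows are assumptions for illustration and are not part of the NOAA data:

```r
# Write a tiny CSV into a bzip2-compressed temporary file (toy data).
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,0"), con)
close(con)

# Read it back through a bzfile() connection, without decompressing to disk first.
toy <- read.csv(bzfile(path), stringsAsFactors = FALSE)
print(toy)
```

Reading the compressed file directly avoids keeping a second, much larger copy of the data on disk.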

Now that the data has been read in, it needs to be tidied.

First we need to clean up the property and crop damage. Each value is stored in two columns: a number from 1-999 (PROPDMG or CROPDMG), and a separate code (PROPDMGEXP or CROPDMGEXP) of B, M or K for billion, million or thousand. We will create new columns called PROPERTYDAMAGE and CROPDAMAGE
by multiplying each number by the power of 10 that its code represents. Any other code leaves the number unchanged.

storms <- storms %>%
        mutate(PROPERTYDAMAGE = case_when(
                PROPDMGEXP == "B" ~ PROPDMG * 1000000000,  # B -> Billion
                PROPDMGEXP == "M" ~ PROPDMG * 1000000,     # M -> Million
                PROPDMGEXP == "K" ~ PROPDMG * 1000,        # K -> Thousand
                TRUE ~ PROPDMG                             # Any other code: leave the value unscaled
        ))

storms <- storms %>%
        mutate(CROPDAMAGE = case_when(
                CROPDMGEXP == "B" ~ CROPDMG * 1000000000,  # B -> Billion
                CROPDMGEXP == "M" ~ CROPDMG * 1000000,     # M -> Million
                CROPDMGEXP == "K" ~ CROPDMG * 1000,        # K -> Thousand
                TRUE ~ CROPDMG                             # Any other code: leave the value unscaled
        ))
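
To see the conversion on a few rows, here is a minimal sketch using a named lookup vector instead of case_when. The helper name scale_damage and the toy rows are hypothetical. The toupper() call is an added assumption: the code above matches only uppercase B, M and K, so a lowercase code would be left unscaled there.

```r
# Hypothetical helper: scale a damage number by its exponent code.
scale_damage <- function(value, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look up each code; unknown or missing codes come back as NA.
  mult <- unname(multipliers[toupper(exp_code)])
  # Assumption: any unrecognised code leaves the value unscaled.
  mult[is.na(mult)] <- 1
  value * mult
}

# Toy rows, not taken from the NOAA data.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", NA),
  stringsAsFactors = FALSE
)
toy$PROPERTYDAMAGE <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

The same helper could serve both the property and the crop columns, which avoids repeating the case_when block.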

We can now remove all unnecessary columns, since we only need a handful, and keep only the events that caused at least one fatality, one injury or some damage.

storms <- storms %>%
        select(BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPERTYDAMAGE, CROPDAMAGE)

storms <- subset(storms, FATALITIES > 0 | INJURIES > 0 | PROPERTYDAMAGE > 0 | CROPDAMAGE > 0)

We still need to clean up the event types. There are 977 unique EVTYPE values in this dataset. We will group them into broader categories. The substitutions run in order, so their sequence matters.

storms$EVTYPE <- toupper(storms$EVTYPE)

#Heat
storms$EVTYPE <- gsub('.*HEAT.*', 'HEAT', storms$EVTYPE)
storms$EVTYPE <- gsub('.*WARM.*', 'HEAT', storms$EVTYPE)

#Cold
storms$EVTYPE <- gsub('.*FREEZ.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*COLD.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*LO.*TEMP.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*BITTER.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*ICE.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*WINT.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*FROST.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*HYPOTH.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*ICY.*', 'COLD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*GLAZE.*', 'COLD', storms$EVTYPE)


#Drought
storms$EVTYPE <- gsub('.*DRY.*', 'DROUGHT', storms$EVTYPE)
storms$EVTYPE <- gsub('.*DROUGHT.*', 'DROUGHT', storms$EVTYPE)

#Fire
storms$EVTYPE <- gsub('.*FIRE.*', 'FIRE', storms$EVTYPE)


#Flood
storms$EVTYPE <- gsub('.*FLOOD.*', 'FLOOD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TIDE.*', 'FLOOD', storms$EVTYPE)
storms$EVTYPE <- gsub('.*DAM.*', 'FLOOD', storms$EVTYPE)

#Landslide
storms$EVTYPE <- gsub('.*LANDSLIDE.*', 'LANDSLIDE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*LANDSLUMP.*', 'LANDSLIDE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*SLIDE.*', 'LANDSLIDE', storms$EVTYPE)

#Avalanche
storms$EVTYPE <- gsub('.*AVALANCHE.*', 'AVALANCHE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*AVALANCE.*', 'AVALANCHE', storms$EVTYPE)

#Ocean
storms$EVTYPE <- gsub('.*SURF.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*MARINE.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*SEA.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*BEACH.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*COASTAL.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*WAVE.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*RIP.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TSUNAMI.*', 'OCEAN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*SURGE.*', 'OCEAN', storms$EVTYPE)

#Snow
storms$EVTYPE <- gsub('.*SNOW.*', 'SNOW', storms$EVTYPE)
storms$EVTYPE <- gsub('.*BLIZZARD.*', 'SNOW', storms$EVTYPE)

#Tornado
storms$EVTYPE <- gsub('.*NADO.*', 'TORNADO', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TORNDAO.*', 'TORNADO', storms$EVTYPE)
storms$EVTYPE <- gsub('.*LANDSPOUT.*', 'TORNADO', storms$EVTYPE)
storms$EVTYPE <- gsub('.*DUST DEVIL.*', 'TORNADO', storms$EVTYPE)

#Hail
storms$EVTYPE <- gsub('.*HAIL.*', 'HAIL', storms$EVTYPE)

#Hurricane
storms$EVTYPE <- gsub('.*HURRICANE.*', 'HURRICANE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TROPICAL.*', 'HURRICANE', storms$EVTYPE)
storms$EVTYPE <- gsub('.*TYPHOON.*', 'HURRICANE', storms$EVTYPE)

#Lightning
storms$EVTYPE <- gsub('.*LIGHTNING.*', 'LIGHTNING', storms$EVTYPE)
storms$EVTYPE <- gsub('.*THUNDER.*', 'LIGHTNING', storms$EVTYPE)


#Volcanic
storms$EVTYPE <- gsub('.*VOLCA.*', 'VOLCANIC', storms$EVTYPE)

#Wind
storms$EVTYPE <- gsub('.*WIND.*', 'WIND', storms$EVTYPE)

#Rain
storms$EVTYPE <- gsub('.*RAIN.*', 'RAIN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*BURST.*', 'RAIN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*SPOUT.*', 'RAIN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*WET.*', 'RAIN', storms$EVTYPE)
storms$EVTYPE <- gsub('.*PRECIP.*', 'RAIN', storms$EVTYPE)
# Note: this also folds the HAIL category created above into RAIN
storms$EVTYPE <- gsub('.*HAIL.*', 'RAIN', storms$EVTYPE)

##Remove any one-off storm types left over
evtype_counts <- table(storms$EVTYPE)
unique_evtypes <- names(evtype_counts[evtype_counts == 1])
storms <- storms[!storms$EVTYPE %in% unique_evtypes, ]
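
Because each gsub call rewrites the whole column, the order of the rules decides where an ambiguous name ends up. A minimal sketch of the same idea with the rules held in one ordered vector; the helper group_evtype is hypothetical, only five of the rules above are included, and the event names are toy inputs:

```r
# Ordered rules: pattern (name) -> category (value). A small subset only.
rules <- c(
  "HEAT"    = "HEAT",
  "FLOOD"   = "FLOOD",
  "NADO"    = "TORNADO",
  "THUNDER" = "LIGHTNING",
  "WIND"    = "WIND"
)

# Hypothetical helper: apply each rule in turn, as the gsub calls above do.
group_evtype <- function(evtype, rules) {
  evtype <- toupper(evtype)
  for (pattern in names(rules)) {
    evtype <- gsub(paste0(".*", pattern, ".*"), rules[[pattern]], evtype)
  }
  evtype
}

toy <- c("Excessive Heat", "FLASH FLOOD", "WATERSPOUT/TORNADO",
         "THUNDERSTORM WINDS", "HIGH WIND")
grouped <- group_evtype(toy, rules)
print(data.frame(original = toy, grouped = grouped))
```

Here "THUNDERSTORM WINDS" lands in LIGHTNING rather than WIND because the THUNDER rule runs first; swapping the two rules would change the result.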

Data Analysis

We want to look at different storm types and their fatalities, injuries and economic impact. A separate data frame is built for
each of these three impacts:

stormFatalities <- storms %>%
        group_by(EVTYPE) %>%
        summarize(AVGFATAL = mean(FATALITIES), TOTALFATAL = sum(FATALITIES)) %>%
        arrange(desc(TOTALFATAL))

stormInjuries <- storms %>%
        group_by(EVTYPE) %>%
        summarize(AVGINJURIES = mean(INJURIES), TOTALINJURIES = sum(INJURIES)) %>%
        arrange(desc(TOTALINJURIES))


stormCost <- storms %>%
        group_by(EVTYPE) %>%
        summarize(AVGCOST = mean(PROPERTYDAMAGE + CROPDAMAGE), TOTALCOST = sum(PROPERTYDAMAGE + CROPDAMAGE)) 
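
Unlike stormFatalities and stormInjuries, stormCost is not passed through arrange(), so its rows stay in the alphabetical order that group_by produces. A minimal sketch of ranking a cost summary the same way as the other two, assuming dplyr is installed; the toy events and their dollar figures are invented for illustration only:

```r
library(dplyr)

# Toy events; the dollar figures are invented, not NOAA values.
toy <- data.frame(
  EVTYPE         = c("FLOOD", "FLOOD", "FIRE", "HEAT", "HEAT"),
  PROPERTYDAMAGE = c(5e6, 1e6, 2e6, 1e5, 0),
  CROPDAMAGE     = c(1e6, 0, 0, 4e5, 1e5),
  stringsAsFactors = FALSE
)

toyCost <- toy %>%
  group_by(EVTYPE) %>%
  summarize(
    AVGCOST   = mean(PROPERTYDAMAGE + CROPDAMAGE),
    TOTALCOST = sum(PROPERTYDAMAGE + CROPDAMAGE)
  ) %>%
  arrange(desc(TOTALCOST))   # rank by total cost, largest first

# Keep the top two rows of the ranked summary.
topCost <- slice_head(toyCost, n = 2)
print(topCost)
```

Without the arrange() step, taking the first rows of the summary returns the alphabetically first storm types rather than the costliest ones.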

Results

Now that we have our tidy data, we want to visualize the 10 worst storm types in each category. We will begin with the deadliest storm types.

Storms by Fatalities

knitr::kable(stormFatalities[1:10, ], caption = "10 Deadliest Storm Types")

10 Deadliest Storm Types

EVTYPE      AVGFATAL  TOTALFATAL
TORNADO    0.1413171        5663
HEAT       3.2133468        3178
FLOOD      0.0471918        1536
LIGHTNING  0.0147193        1018
WIND       0.0129691         948
COLD       0.2417521         872
OCEAN      0.6574288         854
SNOW       0.1177862         249
AVALANCHE  0.8333333         225
HURRICANE  0.2917271         201

ggplot(stormFatalities[1:10, ], aes(x = reorder(EVTYPE, TOTALFATAL), y = TOTALFATAL, fill = EVTYPE)) +
        geom_bar(stat = "identity") +
        coord_flip() +  
        labs(title = "10 Deadliest Storm Types",
                x = "Storm",
                y = "Total Fatalities") +
        theme_minimal() +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

As you can see, tornadoes kill the most Americans, followed by heat and flooding events. Four more types (lightning, wind, cold and oceanic events) make up most of the remaining fatalities.
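
To put numbers on that, here is a short sketch that computes each type's share using the totals from the table above. The denominator is the sum of the ten listed types only, not all fatalities in the dataset, so the shares are an approximation.

```r
# Totals copied from the "10 Deadliest Storm Types" table.
fatalities <- c(TORNADO = 5663, HEAT = 3178, FLOOD = 1536, LIGHTNING = 1018,
                WIND = 948, COLD = 872, OCEAN = 854, SNOW = 249,
                AVALANCHE = 225, HURRICANE = 201)

# Share of each type among the ten listed types.
share <- fatalities / sum(fatalities)
print(round(share, 3))

# Cumulative share of the first seven types (tornado through ocean).
top7 <- cumsum(share)[7]
print(round(top7, 3))
```

Among the ten listed types, tornadoes account for a little over a third of the fatalities, and the first seven types together for roughly 95%.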

Storms by Injuries

knitr::kable(stormInjuries[1:10, ], caption = "10 Most Injurious Storms")

10 Most Injurious Storms

EVTYPE     AVGINJURIES  TOTALINJURIES
TORNADO      2.2820852          91450
HEAT         9.3458038           9243
WIND         0.1204837           8807
FLOOD        0.2646553           8614
LIGHTNING    0.1110886           7683
COLD         1.3146659           4742
SNOW         0.9262062           1958
RAIN         0.0645971           1802
HURRICANE    2.4905660           1716
FIRE         1.2772041           1608

ggplot(stormInjuries[1:10, ], aes(x = reorder(EVTYPE, TOTALINJURIES), y = TOTALINJURIES, fill = EVTYPE)) +
        geom_bar(stat = "identity") +
        coord_flip() +  
        labs(title = "10 Most Injurious Storms",
                x = "Storm",
                y = "Total Injuries") +
        theme_minimal() +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

The injury results are similar to those for fatalities: tornadoes cause by far the most injuries, followed by heat, wind and flooding.
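
The average and total columns can rank storm types very differently. Since AVGINJURIES is TOTALINJURIES divided by the number of recorded events, dividing the total by the average recovers an implied event count. A minimal sketch using the first three rows of the table above; the EVENTS column is derived here and does not appear in the original output:

```r
# First three rows of the "10 Most Injurious Storms" table.
injuries <- data.frame(
  EVTYPE        = c("TORNADO", "HEAT", "WIND"),
  AVGINJURIES   = c(2.2820852, 9.3458038, 0.1204837),
  TOTALINJURIES = c(91450, 9243, 8807),
  stringsAsFactors = FALSE
)

# mean = total / n, so n = total / mean (rounded, as the means are truncated).
injuries$EVENTS <- round(injuries$TOTALINJURIES / injuries$AVGINJURIES)
print(injuries)
```

Heat has the highest average because its injuries come from only about a thousand recorded events, while tornado injuries are spread over roughly forty thousand.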

Costliest Storms

knitr::kable(stormCost[1:10, ], caption = "10 Costliest Storms")

10 Costliest Storms

EVTYPE            AVGCOST     TOTALCOST
AVALANCHE        32302.96       8721800
COLD           5413232.37   19525529161
DENSE FOG       130729.73       9674000
DROUGHT       43551940.87   15025419600
DUST STORM       83970.87       8649000
FIRE           7073002.49    8904910130
FLOOD          5670187.33  184553257197
FOG             122948.60      13155500
FUNNEL CLOUD     14969.23        194600
HEAT            935091.03     924805030

costliestStorms <- stormCost[1:10, ]
ggplot(costliestStorms, aes(x = reorder(EVTYPE, TOTALCOST), y = TOTALCOST, fill = EVTYPE)) +
        geom_bar(stat = "identity") +
        coord_flip() +  
        labs(title = "10 Costliest Storms",
                x = "Storm",
                y = "Total Cost ($)") +
        theme_minimal() +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

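We see that flooding is by far the costliest storm type shown, almost to the extent that the others do not matter. Note that stormCost was never sorted by TOTALCOST, so the table and plot cover the first ten storm types in alphabetical order rather than the ten costliest.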