Floods and Tornadoes: An Analysis of NOAA Storm Data

Charles McGuinness

Synopsis

We analyzed NOAA's storm database of over 900K observations to determine the leading weather-related causes of injury, death, and destruction.

The analysis shows that tornadoes are by far the leading cause of deaths and injuries, while floods cause the most economic damage (property and crop damage combined).

Data Processing

The raw data is in a compressed CSV file located at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. The file was downloaded to the local working directory prior to running the analysis.
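The download step can also be scripted so that the analysis is reproducible from an empty directory. The following is a minimal sketch, not part of the original analysis: the helper name `fetchIfMissing` is hypothetical, and the local file name is assumed to match the one used in the loading step.

```r
## Hypothetical helper: download a file only when it is not already present
fetchIfMissing <- function(url, destfile) {
    if (!file.exists(destfile)) {
        ## mode = "wb" keeps the compressed file intact on Windows
        download.file(url, destfile, mode = "wb")
    }
    invisible(destfile)
}

stormUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
stormFile <- "repdata-data-StormData.csv.bz2"
## Uncomment to fetch the file:
## fetchIfMissing(stormUrl, stormFile)
```

Guarding on `file.exists` avoids repeating the download each time the report is rebuilt.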

We begin by loading the CSV data from the .csv.bz2 file provided by NOAA:

# This can take a while
stormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
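To see how `read.csv` and `bzfile` work together without waiting for the full data set, the same call can be tried on a tiny compressed file. This is an illustrative sketch only: the file contents and the name `toy` are made up.

```r
## Toy round trip: write a small bzip2-compressed CSV, then read it back
toyPath <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toyPath, "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,2,10",
             "FLOOD,1,0"), con)
close(con)

## read.csv opens the bzfile connection, reads it, and closes it again
toy <- read.csv(bzfile(toyPath), stringsAsFactors = FALSE)
str(toy)
```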

Since we are interested in deaths, injuries, and costs by event type, we preprocess the data by summing the deaths, injuries, and costs for each event type:

## Calculate the injuries and deaths by event type -- The aggregate function does precisely that
injuriesByType <- aggregate(stormData$INJURIES,by=list(stormData$EVTYPE),FUN=sum)
deathsByType <- aggregate(stormData$FATALITIES,by=list(stormData$EVTYPE),FUN=sum)

## We add up the two vectors for crop damage and property damage to get a total economic cost for storm damage
## The damage estimates also carry a "K", "M", or "B" code to indicate a multiplier, so we need to do some manipulation
## on them... it's not pretty, but we have to multiply the values by 1, 1000, 1000000, or 1000000000
## (any other code, including an empty one, leaves the value unscaled)
propMult <- 1+(stormData$PROPDMGEXP=="K")*999+(stormData$PROPDMGEXP=="k")*999+
        (stormData$PROPDMGEXP=="M")*999999+(stormData$PROPDMGEXP=="m")*999999+
        (stormData$PROPDMGEXP=="B")*999999999+(stormData$PROPDMGEXP=="b")*999999999
cropMult <- 1+(stormData$CROPDMGEXP=="K")*999+(stormData$CROPDMGEXP=="k")*999+
        (stormData$CROPDMGEXP=="M")*999999+(stormData$CROPDMGEXP=="m")*999999+
        (stormData$CROPDMGEXP=="B")*999999999+(stormData$CROPDMGEXP=="b")*999999999
totalDamage <- stormData$CROPDMG*cropMult + stormData$PROPDMG*propMult
damagesByType <- aggregate(totalDamage,by=list(stormData$EVTYPE),FUN=sum)
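The multiplier arithmetic above can also be written as a lookup table, which may be easier to read and check. The sketch below runs on a small made-up data frame rather than on `stormData`; the names `toy`, `expMult`, and `toMultiplier` are hypothetical, and the formula interface of `aggregate` is used in place of the `by=` form.

```r
## Toy data standing in for stormData (values are made up for illustration)
toy <- data.frame(
    EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
    PROPDMG    = c(2.5, 10, 1, 5),
    PROPDMGEXP = c("M", "k", "B", ""),
    CROPDMG    = c(0, 3, 0, 1),
    CROPDMGEXP = c("", "K", "", "?"),
    stringsAsFactors = FALSE
)

## Named lookup vector: exponent code -> multiplier
expMult <- c(K = 1e3, M = 1e6, B = 1e9)

## Hypothetical helper: unknown or empty codes fall back to a multiplier of 1,
## matching the behaviour of the arithmetic above
toMultiplier <- function(codes) {
    mult <- expMult[toupper(as.character(codes))]
    mult[is.na(mult)] <- 1
    unname(mult)
}

toy$totalDamage <- toy$PROPDMG * toMultiplier(toy$PROPDMGEXP) +
    toy$CROPDMG * toMultiplier(toy$CROPDMGEXP)

## Sum the damage per event type
toyDamages <- aggregate(totalDamage ~ EVTYPE, data = toy, FUN = sum)
toyDamages
```

In this toy data the two FLOOD rows add up to 2.5 million plus 1 billion, while the HAIL row, whose codes are empty or unrecognized, is left unscaled.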

Next, we want to determine which storm types are the worst. We do that by ordering the aggregated data. We can then look at the first entry in each category to find the absolute worst:

## The order function gives us the indices of the data in sorted order
## We want to look at the largest damages, so we sort decreasing:
injuriesSortedIndex <- order(injuriesByType$x, decreasing=TRUE)
deathsSortedIndex <- order(deathsByType$x, decreasing=TRUE)
damagesSortedIndex <- order(damagesByType$x, decreasing=TRUE)

## For our summary, we want to get the very worst of each of the 3 types:

worstInjuryType <- as.character(injuriesByType[[1]][head(injuriesSortedIndex,1)])
worstInjuryValue <- as.numeric(injuriesByType[[2]][head(injuriesSortedIndex,1)])
worstDeathType <- as.character(deathsByType[[1]][head(deathsSortedIndex,1)])
worstDeathValue <- as.numeric(deathsByType[[2]][head(deathsSortedIndex,1)])
worstDamageType <- as.character(damagesByType[[1]][head(damagesSortedIndex,1)])
worstDamageValue <- as.numeric(damagesByType[[2]][head(damagesSortedIndex,1)])
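The ordering step is easiest to follow on a small example. The sketch below uses a made-up table in the same shape that `aggregate` returns (columns `Group.1` and `x`); the name `toyDeaths` and its numbers are illustrative only.

```r
## Toy aggregate in the same shape aggregate() returns (made-up numbers)
toyDeaths <- data.frame(Group.1 = c("FLOOD", "HAIL", "HEAT", "TORNADO"),
                        x = c(30, 2, 45, 80),
                        stringsAsFactors = FALSE)

## order() returns row indices, largest value first
idx <- order(toyDeaths$x, decreasing = TRUE)
idx                        # 4 3 1 2

## The first index points at the worst event type
worstType <- toyDeaths$Group.1[idx[1]]

## head() on the reordered frame gives a top-N table directly
top3 <- head(toyDeaths[idx, ], 3)
top3$Group.1               # "TORNADO" "HEAT" "FLOOD"

## which.max() is a shortcut when only the single largest is needed
sameType <- toyDeaths$Group.1[which.max(toyDeaths$x)]
```

Keeping the index vector, as the analysis does, lets the same ordering be reused for both the single worst value and the top-5 plots.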

Results

Summary data

The event type TORNADO was the most injurious, with 91346 injuries.
The event type TORNADO was the most fatal, with 5633 deaths.
The event type FLOOD was the most expensive, with $1.5032 × 10^11 in damages.
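Large totals like these can be easier to read with thousands separators or scaled units. A small sketch using base R formatting follows; the two input values are the figures as printed in this report (the damage total is the rounded value shown above, not the exact sum), and the variable names are hypothetical.

```r
## Values as printed in this report (damage total rounded to 5 significant digits)
injuries <- 91346
damages <- 1.5032e11

injuriesText <- format(injuries, big.mark = ",")                      # "91,346"
damagesText <- format(damages, big.mark = ",", scientific = FALSE)    # "150,320,000,000"
damagesShort <- sprintf("$%.1f billion", damages / 1e9)               # "$150.3 billion"
```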

Top 5 causes of death from weather events

require(ggplot2)
## Loading required package: ggplot2
## In order to get the factors to plot in order of the values, I have to redo the factors...
deathFactors <- factor(as.character(deathsByType[head(deathsSortedIndex,5),1]), levels = as.character(deathsByType[head(deathsSortedIndex,5),1]))
qplot(deathFactors,deathsByType[head(deathsSortedIndex,5),2],xlab="Type",ylab="Deaths",main="Leading causes of death due to weather")

[Figure deathsPlot: deaths for the five deadliest event types]
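The reason the factors are rebuilt before plotting is that factor levels default to alphabetical order, which would scramble the ranking on the x axis. The sketch below shows the effect on made-up data and then draws a bar chart with `ggplot()` and `geom_col()`; this is a swapped-in alternative to the `qplot` calls used in this report (`qplot` is deprecated in recent ggplot2 releases), and the data frame `top` is hypothetical.

```r
## Toy top-3 table, already sorted by deaths (made-up numbers)
top <- data.frame(type = c("TORNADO", "HEAT", "FLOOD"),
                  deaths = c(80, 45, 30),
                  stringsAsFactors = FALSE)

## Default factor levels are alphabetical, which would reorder the x axis
defaultLevels <- levels(factor(top$type))     # "FLOOD" "HEAT" "TORNADO"

## Passing the sorted values as levels keeps the plot in ranked order
top$type <- factor(top$type, levels = top$type)
levels(top$type)                              # "TORNADO" "HEAT" "FLOOD"

## Plot only when ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
    p <- ggplot2::ggplot(top, ggplot2::aes(x = type, y = deaths)) +
        ggplot2::geom_col() +
        ggplot2::labs(x = "Type", y = "Deaths",
                      title = "Leading causes of death due to weather")
    print(p)
}
```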

Top 5 causes of injury from weather events

## In order to get the factors to plot in order of the values, I have to redo the factors...
injuryFactors <- factor(as.character(injuriesByType[head(injuriesSortedIndex,5),1]), levels = as.character(injuriesByType[head(injuriesSortedIndex,5),1]))
qplot(injuryFactors,injuriesByType[head(injuriesSortedIndex,5),2],xlab="Type",ylab="Injuries",main="Leading causes of injuries due to weather")

[Figure injuriesPlot: injuries for the five most injurious event types]

Top 5 causes of economic losses from weather events

## In order to get the factors to plot in order of the values, I have to redo the factors...
damageFactors <- factor(as.character(damagesByType[head(damagesSortedIndex,5),1]), levels = as.character(damagesByType[head(damagesSortedIndex,5),1]))
qplot(damageFactors,damagesByType[head(damagesSortedIndex,5),2],xlab="Type",ylab="$ Damages",main="Leading causes of economic damage due to weather")

[Figure damagesPlot: total damages in dollars for the five costliest event types]