Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Configure the environment

library(ggplot2)

1: Code for reading in the dataset and processing the data.

1.a: Load the Data

NSD <- read.csv(bzfile("data/StormData.csv.bz2"))
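As a quick sanity check on the loading step, the sketch below round-trips a small data frame through a bzip2-compressed CSV. The temporary file and the toy values are assumptions made for illustration; only `read.csv` and `bzfile` are taken from the report.

```r
# Toy data frame standing in for the storm data (values are made up).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 0))

# Write it to a temporary bzip2-compressed CSV file.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report reads StormData.csv.bz2.
roundtrip <- read.csv(bzfile(tmp))
unlink(tmp)
```

If the round trip works, `roundtrip` should hold the same two rows as `toy`.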

1.b: Select only the fields needed for the analysis.

NSDR <- data.frame(NSD$EVTYPE, NSD$FATALITIES, NSD$INJURIES, NSD$PROPDMG, NSD$PROPDMGEXP, NSD$CROPDMG, NSD$CROPDMGEXP)
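The `NSD.` prefix on the column names used in the rest of this report comes from how `data.frame()` names columns built from `$` expressions. The sketch below illustrates this on a toy data frame; `NSD_toy` and its values are hypothetical, and selecting by name is shown only as an alternative, not as the method used here.

```r
# Toy stand-in for NSD (values are made up for illustration).
NSD_toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                      FATALITIES = c(2, 0),
                      STATE = c("TX", "IA"))

# Columns built from `$` expressions take their names from those expressions.
by_dollar <- data.frame(NSD_toy$EVTYPE, NSD_toy$FATALITIES)
names(by_dollar)   # "NSD_toy.EVTYPE" "NSD_toy.FATALITIES"

# Selecting by column name keeps the original names instead.
by_name <- NSD_toy[, c("EVTYPE", "FATALITIES")]
names(by_name)     # "EVTYPE" "FATALITIES"
```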

1.c: Clean the data by applying the following transformations.

  • Empty values will be replaced with zeros
  • Damage values in USD will be normalised
  • Data will be converted to numeric data types

# Replace missing values in the numeric fields with 0.
# read.csv() reads empty numeric fields as NA, so test with is.na().
NSDR$NSD.FATALITIES[is.na(NSDR$NSD.FATALITIES)] <- 0
NSDR$NSD.INJURIES[is.na(NSDR$NSD.INJURIES)] <- 0
NSDR$NSD.PROPDMG[is.na(NSDR$NSD.PROPDMG)] <- 0
NSDR$NSD.CROPDMG[is.na(NSDR$NSD.CROPDMG)] <- 0

# Convert to characters
NSDR$NSD.PROPDMGEXP <- as.character(NSDR$NSD.PROPDMGEXP)
NSDR$NSD.CROPDMGEXP <- as.character(NSDR$NSD.CROPDMGEXP)

# Map the exponent codes to powers of ten so that all
# damage amounts can be expressed in plain USD.

# Field definition: magnitude of property damage (hundreds, thousands, millions, billions of USD)
# Clean and standardize
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "")] <- 0
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "+")] <- 1
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "-")] <- 1
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "?")] <- 1
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "h")] <- 2
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "H")] <- 2
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "k")] <- 3
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "K")] <- 3
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "m")] <- 6
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "M")] <- 6
NSDR$NSD.PROPDMGEXP[(NSDR$NSD.PROPDMGEXP == "B")] <- 9

# Field definition: magnitude of crop damage (hundreds, thousands, millions, billions of USD)
# Clean and standardize
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "")] <- 0
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "B")] <- 9
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "+")] <- 1
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "-")] <- 1
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "?")] <- 1
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "h")] <- 2
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "H")] <- 2
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "k")] <- 3
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "K")] <- 3
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "m")] <- 6
NSDR$NSD.CROPDMGEXP[(NSDR$NSD.CROPDMGEXP == "M")] <- 6

# Convert the exponents to integers so that they can be
# used as powers of ten.
NSDR$NSD.PROPDMGEXP <- as.integer(NSDR$NSD.PROPDMGEXP)
NSDR$NSD.CROPDMGEXP <- as.integer(NSDR$NSD.CROPDMGEXP)
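The exponent recoding could also be written more compactly as a single named lookup vector. This is a minimal sketch under the same mapping as this report (the symbols "+", "-" and "?" count as 10^1, and digit codes pass through unchanged); `exp_lookup` and `decode_exp` are hypothetical names introduced for illustration.

```r
# Exponent codes and the power of ten each one stands for, following the
# recoding in this report (symbols "+", "-", "?" are treated as 10^1).
exp_lookup <- c("+" = 1, "-" = 1, "?" = 1,
                "h" = 2, "H" = 2, "k" = 3, "K" = 3,
                "m" = 6, "M" = 6, "B" = 9)

# Hypothetical helper: convert a vector of codes to integer exponents.
decode_exp <- function(codes) {
  codes <- as.character(codes)
  out <- suppressWarnings(as.integer(codes))  # digit codes such as "5" pass through
  out[codes == ""] <- 0L                      # an empty code means no multiplier
  known <- codes %in% names(exp_lookup)
  out[known] <- as.integer(exp_lookup[codes[known]])
  out
}

decode_exp(c("", "K", "m", "B", "5", "?"))   # 0 3 6 9 5 1
```

Keeping the mapping in one vector means property and crop exponents can share the same helper instead of two parallel lists of assignments.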

2: Aggregation Section

## Calculate the total damage of each event in USD from the amounts and their exponents
totalDamageUSD <- NSDR$NSD.PROPDMG * 10^NSDR$NSD.PROPDMGEXP + NSDR$NSD.CROPDMG * 10^NSDR$NSD.CROPDMGEXP
NSDR <- cbind(NSDR, totalDamageUSD)
NSDR <- NSDR[,c(1,2,3,8)]
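As a worked example of the damage formula, the sketch below applies it to three made-up records. The column names are simplified and the values are assumptions for illustration only.

```r
# Made-up records: 25 with exponent 3 is 25,000 USD,
# and 1.5 with exponent 6 is 1,500,000 USD.
toy <- data.frame(PROPDMG    = c(25, 1.5, 0),
                  PROPDMGEXP = c(3L, 6L, 0L),
                  CROPDMG    = c(0, 2, 10),
                  CROPDMGEXP = c(0L, 3L, 3L))

# Total damage = property amount * 10^exponent + crop amount * 10^exponent.
toy$totalDamageUSD <- toy$PROPDMG * 10^toy$PROPDMGEXP +
  toy$CROPDMG * 10^toy$CROPDMGEXP

toy$totalDamageUSD   # 25000 1502000 10000
```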

## Summarize Values
NSDRAggData <- aggregate(. ~ NSD.EVTYPE, data = NSDR, FUN = sum)

## Get the top 10 items of Fatalities, Injuries, and Damages
stormTopFatalities <- head(NSDRAggData[order(NSDRAggData$NSD.FATALITIES, decreasing = TRUE), ], 10)
stormTopInjuries   <- head(NSDRAggData[order(NSDRAggData$NSD.INJURIES, decreasing = TRUE), ], 10)
stormTopDamages    <- head(NSDRAggData[order(NSDRAggData$totalDamageUSD, decreasing = TRUE), ], 10)
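The aggregate-then-rank pattern used in this section can be checked on a toy data set. The sketch below uses made-up values and simplified column names, and keeps the top 2 rather than the top 10.

```r
# Made-up events; TORNADO and FLOOD each appear twice.
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
                  FATALITIES = c(3, 1, 2, 4, 0),
                  INJURIES   = c(10, 0, 5, 1, 2))

# Sum every numeric column within each event type.
agg <- aggregate(. ~ EVTYPE, data = toy, FUN = sum)

# Keep the two event types with the most fatalities.
top2 <- head(agg[order(agg$FATALITIES, decreasing = TRUE), ], 2)
top2$EVTYPE   # "TORNADO" "HEAT"
```

Here TORNADO sums to 5 fatalities and HEAT to 4, so they rank ahead of FLOOD with 1.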

3 Results of Analysis

3.1 Top 10 events that are most harmful to population health, measured by fatalities.

ggplot(data = stormTopFatalities, aes(x = NSD.EVTYPE, y = NSD.FATALITIES)) + geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
  ylab("# of Fatalities") + ggtitle("NOAA Top 10: Highest Fatality Counts, 1950-2011")

3.2 Top 10 events that cause the highest number of injuries.

ggplot(data = stormTopInjuries, aes(x = NSD.EVTYPE, y = NSD.INJURIES)) + geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
  ylab("# of Injuries") + ggtitle("NOAA Top 10: Highest Injury Counts, 1950-2011")

3.3 Tornadoes appear to be the most harmful. What are their counts for fatalities and injuries?

stormTopFatalities[stormTopFatalities$NSD.EVTYPE=="TORNADO",c(1,2,3)]
##     NSD.EVTYPE NSD.FATALITIES NSD.INJURIES
## 834    TORNADO           5633        91346

3.4 Show the cost of damages of the Top 10 most expensive types of events.

ggplot(data = stormTopDamages, aes(x = NSD.EVTYPE, y = totalDamageUSD)) + geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
  ylab("Damages in USD") + ggtitle("NOAA Top 10: Most Expensive Types of Events, 1950-2011")
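The bar charts in this section place event types on the x axis in alphabetical order, which is how ggplot2 orders a discrete variable by default. One possible refinement, sketched below on made-up data, is to sort the bars by value with `reorder()`. The toy data frame is an assumption for illustration, and the plot is drawn only if ggplot2 is installed.

```r
# Made-up totals for three event types.
toy <- data.frame(EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
                  FATALITIES = c(1, 4, 5))

# reorder() returns a factor whose levels follow descending FATALITIES.
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$FATALITIES)
levels(toy$EVTYPE)   # "TORNADO" "HEAT" "FLOOD"

# Draw the sorted bar chart, referring to columns by bare name inside aes().
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = EVTYPE, y = FATALITIES)) +
    ggplot2::geom_col() +
    ggplot2::xlab("Event Type") +
    ggplot2::ylab("# of Fatalities")
  print(p)
}
```

Sorting the bars makes the ranking readable at a glance without changing the underlying totals.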

4 Total Flood Damage

4.1 Top Flood Damage

stormTopDamages[stormTopDamages$NSD.EVTYPE=="FLOOD",c(1,4)]
##     NSD.EVTYPE totalDamageUSD
## 170      FLOOD   150319678257