Project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database

Synopsis

This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

In this report we analyze the NOAA Storm Events data. The raw data run from 1950 to November 2011 and contain 902297 records. Each record has 37 variables, of which we selected the 7 that seemed relevant to our analysis. By aggregating the data by storm event type, we produced bar charts showing the top 10 most significant event types in terms of health loss (fatalities plus injuries) and economic damage. Overall, tornadoes are the most hazardous to human health, with 5633 reported fatalities and 91346 reported injuries, and floods have been responsible for the most economic damage ($150+ billion).

Data Processing

Load the required R libraries and suppress warnings globally:

library(reshape2)
suppressMessages(library(dplyr))
library(ggplot2)
options(warn=-1)

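Download the NOAA storm database and read it as a CSV file:

# Download the compressed CSV only if it is not already present; mode = "wb" keeps the binary archive intact
if (!file.exists("NOAA.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "NOAA.csv.bz2", method = "libcurl", mode = "wb")
}
# read.csv() reads the bz2-compressed file directly
storm <- read.csv("NOAA.csv.bz2", header = TRUE, comment.char = "", fileEncoding = "ISO-8859-15")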

Total population health loss is the total number of fatalities and injuries caused by each event type, so we sum “FATALITIES” and “INJURIES” for each “EVTYPE”:

healthData <- storm[,c("EVTYPE","FATALITIES","INJURIES")]

sumData <- cbind(healthData$EVTYPE,as.data.frame(apply(healthData[,c(2,3)], 1 ,sum)))

colnames(sumData) <- c("EVTYPE","SUM")

melted <- melt(sumData, id.vars="EVTYPE", measure.var = "SUM" )
totalLoss <- dcast(melted, EVTYPE ~ variable, sum)

totalLoss <- arrange(totalLoss, desc(SUM))
topTenEvent <- head(totalLoss, 10)
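
The same per-event aggregation can be sketched in base R without reshape2. This is only an illustration on a small made-up data frame (`toy` and its values are hypothetical, not NOAA records). It also keeps fatalities and injuries as separate columns, which the single `SUM` column above merges.

```r
# Hypothetical toy records standing in for the storm data frame
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 3, 0),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries separately for each event type
byType <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Total health loss per event type, sorted from largest to smallest
byType$SUM <- byType$FATALITIES + byType$INJURIES
byType <- byType[order(byType$SUM, decreasing = TRUE), ]
print(byType)
```

Keeping the two columns separate makes it possible to report fatalities and injuries individually as well as their total.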

Total economic loss is the sum of property damage and crop damage caused by each event type, so we sum “PROPDMG” and “CROPDMG”, scaled by the units given in “PROPDMGEXP” and “CROPDMGEXP” respectively, for each “EVTYPE”:

propDMGData <- storm[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]


propDMGData$PROPDMGEXP <- as.character(propDMGData$PROPDMGEXP)
propDMGData$CROPDMGEXP <- as.character(propDMGData$CROPDMGEXP)

# Convert the units of "PROPDMGEXP" and "CROPDMGEXP" into numerical form
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "")] <- 0
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "+") | (propDMGData$PROPDMGEXP == "-") | (propDMGData$PROPDMGEXP == "?")] <- 1
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "h") | (propDMGData$PROPDMGEXP == "H")] <- 2
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "k") | (propDMGData$PROPDMGEXP == "K")] <- 3
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "m") | (propDMGData$PROPDMGEXP == "M")] <- 6
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "b") | (propDMGData$PROPDMGEXP == "B")] <- 9

propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "")] <- 0
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "+") | (propDMGData$CROPDMGEXP == "-") | (propDMGData$CROPDMGEXP == "?")] <- 1
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "h") | (propDMGData$CROPDMGEXP == "H")] <- 2
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "k") | (propDMGData$CROPDMGEXP == "K")] <- 3
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "m") | (propDMGData$CROPDMGEXP ==  "M")] <- 6
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "b") | (propDMGData$CROPDMGEXP == "B")] <- 9

# Convert the "PROPDMGEXP" and "CROPDMGEXP" values to integers
propDMGData$PROPDMGEXP <- as.integer(propDMGData$PROPDMGEXP)
propDMGData$CROPDMGEXP <- as.integer(propDMGData$CROPDMGEXP)

propDMGData$PROPDMG <- propDMGData$PROPDMG * 10^propDMGData$PROPDMGEXP
propDMGData$CROPDMG <- propDMGData$CROPDMG * 10^propDMGData$CROPDMGEXP

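# Sum property and crop damage for each record (columns are selected by name so that PROPDMGEXP is not summed by mistake)
sumpropDMGData <- cbind(propDMGData$EVTYPE,as.data.frame(apply(propDMGData[,c("PROPDMG","CROPDMG")], 1 ,sum)))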
colnames(sumpropDMGData) <- c("EVTYPE","SUM")

meltedProp <- melt(sumpropDMGData, id.vars="EVTYPE", measure.var = "SUM" )
totalPropLoss <- dcast(meltedProp, EVTYPE ~ variable, sum)

totalPropLoss <- arrange(totalPropLoss, desc(SUM))
topTenEventPropDMG <- head(totalPropLoss, 10)
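
The unit conversion in the processing code above can also be written as a single lookup. The following is a minimal sketch, assuming the same mapping that code uses (blank to 0, `+`, `-` and `?` to 1, H to 2, K to 3, M to 6, B to 9, digit codes kept as they are). The helper name `expToPower` and the sample values are hypothetical.

```r
# Hypothetical helper: map an exponent code to a power of ten
expToPower <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c("+" = 1, "-" = 1, "?" = 1, "H" = 2, "K" = 3, "M" = 6, "B" = 9)
  power <- unname(lookup[code])                 # NA for codes not in the lookup
  isDigit <- grepl("^[0-9]$", code)
  power[isDigit] <- as.numeric(code[isDigit])   # digit codes are used directly
  power[code == ""] <- 0                        # blank means no multiplier
  power
}

# Worked example: a damage value of 25 with code "K" is 25 * 10^3 dollars
damage  <- c(25, 1.5, 3, 7)
codes   <- c("K", "m", "", "B")
dollars <- damage * 10^expToPower(codes)
print(dollars)
```

A lookup of this kind keeps the mapping in one place, so the property and crop columns cannot drift apart the way two hand-copied blocks can.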

Results

Health Loss

The table and bar chart below show the total population health loss for the top 10 event types, as computed in the data processing step above.

topTenEvent
##               EVTYPE   SUM
## 1            TORNADO 96979
## 2     EXCESSIVE HEAT  8428
## 3          TSTM WIND  7461
## 4              FLOOD  7259
## 5          LIGHTNING  6046
## 6               HEAT  3037
## 7        FLASH FLOOD  2755
## 8          ICE STORM  2064
## 9  THUNDERSTORM WIND  1621
## 10      WINTER STORM  1527
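# Bar chart of total health loss by event type (stat = "identity" plots the SUM values as given)
ggplot(topTenEvent, aes(x = EVTYPE, y = SUM, color = EVTYPE, fill = EVTYPE)) + geom_bar(stat = "identity") + labs(x = "Event Type", y = "Health Loss") + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))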

[Figure: bar chart of total health loss by event type, top 10 event types]
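
Because ggplot2 orders a discrete axis alphabetically by default, the bars do not appear in ranked order. One way to rank them is sketched below, using `reorder()` from base R on the first three rows of the table above. The data frame name `top3` is hypothetical, and the plotting step runs only if ggplot2 is installed.

```r
# First three rows of the health-loss table shown above
top3 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"),
  SUM    = c(96979, 8428, 7461),
  stringsAsFactors = FALSE
)

# Turn EVTYPE into a factor whose levels follow descending SUM
top3$EVTYPE <- reorder(top3$EVTYPE, -top3$SUM)
print(levels(top3$EVTYPE))

# Draw the ranked bar chart when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top3, aes(x = EVTYPE, y = SUM, fill = EVTYPE)) +
    geom_bar(stat = "identity") +
    labs(x = "Event Type", y = "Health Loss") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
  print(p)
}
```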

Economic Loss

The table and bar chart below show the total economic loss for the top 10 event types, as computed in the data processing step above.

options(scipen=999)
topTenEventPropDMG
##               EVTYPE          SUM
## 1              FLOOD 144657766597
## 2  HURRICANE/TYPHOON  69305840420
## 3            TORNADO  56947550175
## 4        STORM SURGE  43323536663
## 5        FLASH FLOOD  16822777370
## 6               HAIL  15735546089
## 7          HURRICANE  11868319637
## 8     TROPICAL STORM   7703892572
## 9       WINTER STORM   6688520210
## 10         HIGH WIND   5270089890
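# Bar chart of total economic loss by event type (stat = "identity" plots the SUM values as given)
ggplot(topTenEventPropDMG, aes(x = EVTYPE, y = SUM, color = EVTYPE, fill = EVTYPE)) + geom_bar(stat = "identity") + labs(x = "Event Type", y = "Economic Loss (USD)") + theme(axis.text.x=element_text(angle=90,hjust=1,vjust=0.5))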

[Figure: bar chart of total economic loss by event type, top 10 event types]
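
Totals of this size are easier to compare when scaled. The small sketch below takes the first three totals from the table above and expresses them in billions of dollars. The vector names `loss`, `lossBillions` and `share` are hypothetical.

```r
# First three totals from the economic-loss table shown above (US dollars)
loss <- c(FLOOD = 144657766597,
          "HURRICANE/TYPHOON" = 69305840420,
          TORNADO = 56947550175)

# Scale to billions of dollars and round to two decimals
lossBillions <- round(loss / 1e9, 2)
print(lossBillions)

# Percentage of the three-event total attributable to each event type
share <- round(100 * loss / sum(loss), 1)
print(share)
```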

Summary

As the plots above show, tornadoes are the most harmful with respect to population health, and floods have the greatest economic consequences.