This project involves exploring the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
In this report we analyze the NOAA Storm Events data. The raw data run from 1950 to November 2011 and comprise 902297 records. Each record contains 37 variables, from which we picked the 7 that seemed relevant to our analysis. By aggregating the data by storm event type, we produced bar charts showing the top 10 most significant events in terms of health impact (fatalities plus injuries) and economic damage. Overall, tornadoes are the most hazardous to human health, with 5633 reported fatalities and 91346 reported injuries, and floods have been responsible for the most economic damage ($150+ billion).
Load the required R libraries and suppress warnings globally:
library(reshape2)
suppressMessages(library(dplyr))
library(ggplot2)
options(warn=-1)
Download the NOAA storm data and read it as a CSV file:
# Download only once; mode = "wb" keeps the bz2 archive intact on Windows
if (!file.exists("NOAA.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "NOAA.csv.bz2", mode = "wb")
}
storm <- read.csv("NOAA.csv.bz2", header = TRUE, comment.char="", fileEncoding = "ISO-8859-15")
Total population health loss is the total number of fatalities and injuries caused by each event type, so we sum “FATALITIES” and “INJURIES” for each event type (“EVTYPE”).
healthData <- storm[,c("EVTYPE","FATALITIES","INJURIES")]
sumData <- cbind(healthData$EVTYPE,as.data.frame(apply(healthData[,c(2,3)], 1 ,sum)))
colnames(sumData) <- c("EVTYPE","SUM")
melted <- melt(sumData, id.vars="EVTYPE", measure.var = "SUM" )
totalLoss <- dcast(melted, EVTYPE ~ variable, sum)
totalLoss <- arrange(totalLoss, desc(SUM))
topTenEvent <- head(totalLoss, 10)
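The per-event total can also be cross-checked with base R's `aggregate()`. The following is a minimal sketch on a small invented data frame (`toy` and `toyLoss` are illustrative names, and the numbers are not NOAA data); it is not the method used above, only an equivalent way to reach the same kind of table.

```r
# Toy data, invented for illustration; column names mirror the NOAA file
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# Per-record health loss, then the total per event type
toy$SUM <- toy$FATALITIES + toy$INJURIES
toyLoss <- aggregate(SUM ~ EVTYPE, data = toy, FUN = sum)

# Largest total first
toyLoss <- toyLoss[order(-toyLoss$SUM), ]
toyLoss
```

Running the same three steps on `healthData` should give the same ranking as `totalLoss`, which makes it a cheap sanity check on the melt/dcast pipeline.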
Total economic loss is the combined property and crop damage caused by each event type. We scale “PROPDMG” and “CROPDMG” by the units given in “PROPDMGEXP” and “CROPDMGEXP” respectively, then sum the two for each event type (“EVTYPE”).
propDMGData <- storm[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
propDMGData$PROPDMGEXP <- as.character(propDMGData$PROPDMGEXP)
propDMGData$CROPDMGEXP <- as.character(propDMGData$CROPDMGEXP)
# Convert the units of "PROPDMGEXP" and "CROPDMGEXP" into numerical form
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "")] <- 0
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "+") | (propDMGData$PROPDMGEXP == "-") | (propDMGData$PROPDMGEXP == "?")] <- 1
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "h") | (propDMGData$PROPDMGEXP == "H")] <- 2
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "k") | (propDMGData$PROPDMGEXP == "K")] <- 3
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "m") | (propDMGData$PROPDMGEXP == "M")] <- 6
propDMGData$PROPDMGEXP[(propDMGData$PROPDMGEXP == "b") | (propDMGData$PROPDMGEXP == "B")] <- 9
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "")] <- 0
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "+") | (propDMGData$CROPDMGEXP == "-") | (propDMGData$CROPDMGEXP == "?")] <- 1
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "h") | (propDMGData$CROPDMGEXP == "H")] <- 2
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "k") | (propDMGData$CROPDMGEXP == "K")] <- 3
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "m") | (propDMGData$CROPDMGEXP == "M")] <- 6
propDMGData$CROPDMGEXP[(propDMGData$CROPDMGEXP == "b") | (propDMGData$CROPDMGEXP == "B")] <- 9
# Convert the "PROPDMGEXP" and "CROPDMGEXP" values to integers
propDMGData$PROPDMGEXP <- as.integer(propDMGData$PROPDMGEXP)
propDMGData$CROPDMGEXP <- as.integer(propDMGData$CROPDMGEXP)
propDMGData$PROPDMG <- propDMGData$PROPDMG * 10^propDMGData$PROPDMGEXP
propDMGData$CROPDMG <- propDMGData$CROPDMG * 10^propDMGData$CROPDMGEXP
# Add the scaled property and crop damage for each record
sumpropDMGData <- cbind(propDMGData$EVTYPE,as.data.frame(apply(propDMGData[,c("PROPDMG","CROPDMG")], 1 ,sum)))
colnames(sumpropDMGData) <- c("EVTYPE","SUM")
meltedProp <- melt(sumpropDMGData, id.vars="EVTYPE", measure.var = "SUM" )
totalPropLoss <- dcast(meltedProp, EVTYPE ~ variable, sum)
totalPropLoss <- arrange(totalPropLoss, desc(SUM))
topTenEventPropDMG <- head(totalPropLoss, 10)
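The unit conversion above can be expressed more compactly as a lookup table. The sketch below reuses the mapping chosen in this report (blank = 0, "+", "-", "?" = 1, H = 2, K = 3, M = 6, B = 9, digits kept as-is); `expMap` and `toExponent()` are hypothetical names introduced here for illustration, not part of any package.

```r
# Exponents for the unit codes, following the mapping used in this report
expMap <- c("+" = 1, "-" = 1, "?" = 1, H = 2, K = 3, M = 6, B = 9)

# toExponent() is a hypothetical helper: unit code -> power of ten
toExponent <- function(code) {
  code <- toupper(as.character(code))
  out <- unname(expMap[code])              # NA for codes not in the table
  isDigit <- grepl("^[0-9]$", code)
  out[isDigit] <- as.numeric(code[isDigit]) # digit codes are used directly
  out[code == ""] <- 0                      # blank means no scaling
  out
}

# A damage figure of 2.5 with unit "K" becomes 2500
2.5 * 10^toExponent("K")
```

Applied to the real columns, this would read `propDMGData$PROPDMG * 10^toExponent(propDMGData$PROPDMGEXP)`, and likewise for the crop columns, so one function covers both.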
The table and bar chart below show the total population health loss for the top 10 event types.
topTenEvent
## EVTYPE SUM
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
# Bar chart of total health loss (fatalities + injuries) per event type
ggplot(topTenEvent, aes(x = EVTYPE, y = SUM, color = EVTYPE, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  labs(x = "Event Type", y = "Health Loss") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
The table and bar chart below show the total economic loss for the top 10 event types.
options(scipen=999)
topTenEventPropDMG
## EVTYPE SUM
## 1 FLOOD 144657766597
## 2 HURRICANE/TYPHOON 69305840420
## 3 TORNADO 56947550175
## 4 STORM SURGE 43323536663
## 5 FLASH FLOOD 16822777370
## 6 HAIL 15735546089
## 7 HURRICANE 11868319637
## 8 TROPICAL STORM 7703892572
## 9 WINTER STORM 6688520210
## 10 HIGH WIND 5270089890
# Bar chart of total economic loss (property + crop damage) per event type
ggplot(topTenEventPropDMG, aes(x = EVTYPE, y = SUM, color = EVTYPE, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  labs(x = "Event Type", y = "Economic Loss") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
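Both bar charts place the event types on the x-axis in factor-level order, which is alphabetical by default, not by magnitude. If bars sorted from largest to smallest are preferred, one option is to reorder the factor levels before plotting. The sketch below shows the idea on invented totals (`toy` is an illustrative name, not NOAA data).

```r
# Invented totals for illustration only
toy <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "TORNADO"),
  SUM    = c(5, 50, 20)
)

# Sort the factor levels by descending SUM instead of alphabetically
toy$EVTYPE <- reorder(factor(toy$EVTYPE), -toy$SUM)
levels(toy$EVTYPE)
```

The same call on `topTenEvent$EVTYPE` or `topTenEventPropDMG$EVTYPE` before plotting should make the bar order match the ranking in the printed tables.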
As the plots above show, tornadoes are the most harmful with respect to population health, and floods have the greatest economic consequences.