Based on the NOAA storm database, the analysis of data from 1993 to 2011 shows that:
- Excessive Heat and Tornado are the major contributors to health impact
- Flash Flood and Tornado are the major contributors to economic impact
Libraries
library(ggplot2)
library(plyr)
library(dplyr)
library(lubridate)
library(reshape2)
Loading the data (StormData) from the course website into the “data” variable.
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","./storm.bz2")
data <- read.csv("./storm.bz2")
# Keep only the date part of BGN_DATE (drop the time component)
data$date <- sub(" .*$", "", as.character(data$BGN_DATE))
data$YEAR <- year(mdy(data$date))
data$EVTYPE <- toupper(data$EVTYPE)
# Strip leading and trailing whitespace from event type names
data$EVTYPE <- gsub("^[[:space:]]+", "", data$EVTYPE)
data$EVTYPE <- gsub("[[:space:]]+$", "", data$EVTYPE)
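As a small illustration of the event type clean-up, the sketch below applies the same idea (upper-casing, then trimming whitespace on both ends) to a few invented labels. The sample strings are hypothetical, and base R's `trimws` is used here as a shorthand for the two `gsub` calls.

```r
# Hypothetical sample of raw event labels (not taken from the dataset)
raw <- c("  Tornado", "FLASH FLOOD  ", " excessive heat ")

# Upper-case, then remove leading and trailing whitespace
clean <- trimws(toupper(raw))
print(clean)
```

Without this step, labels that differ only by case or stray spaces would be counted as separate event types in the later aggregation.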
data.peryear <- aggregate(x = data[c("INJURIES","FATALITIES", "PROPDMG", "CROPDMG")],
by = data["YEAR"],
FUN = sum, na.rm=TRUE)
data.peryear.allyear <- melt(data.peryear, id=c("YEAR"))
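To make the per-year aggregation and the reshaping step easier to follow, here is a minimal sketch on a toy data frame. The values are invented for illustration; only the column names mirror the storm data.

```r
library(reshape2)

# Hypothetical toy records (values invented for illustration)
toy <- data.frame(
  YEAR       = c(1990, 1990, 1995, 1995),
  INJURIES   = c(1, 2, 10, NA),
  FATALITIES = c(0, 1, 3, 2)
)

# Sum each measure per year, ignoring missing values
toy.peryear <- aggregate(x = toy[c("INJURIES", "FATALITIES")],
                         by = toy["YEAR"],
                         FUN = sum, na.rm = TRUE)

# Long format: one row per (YEAR, variable) pair
toy.long <- melt(toy.peryear, id = c("YEAR"))
print(toy.long)
```

The long format is what allows a single ggplot call to draw one facet per measure.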
A quick look at the evolution of the health and economic data (see the following graph) shows two very distinct periods: before and after 1993.
ggplot(data = data.peryear.allyear, aes(x=YEAR,y=value)) + geom_bar(stat="identity") + facet_wrap(~variable, scales = "free") + labs(title = "Health and economic impact between 1950 and 2011")
Our hypothesis is that this is due to a change in data collection methods. Therefore, in what follows, only data from 1993 onward are considered.
data.post1992 <- data[data$YEAR > 1992, ]
data.perevtype <- aggregate(x = data.post1992[c("INJURIES","FATALITIES", "PROPDMG", "CROPDMG")],
by = data.post1992["EVTYPE"],
FUN = sum, na.rm=TRUE)
# Ten event types with the most fatalities
health.top <- data.perevtype[order(-data.perevtype$FATALITIES), ][1:10, ]
ggplot(data = health.top, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) + geom_bar(stat="identity") + coord_flip() + labs(y = "Number of fatalities", x = "Event Type", title = "Total Health impact in USA by weather events in 1993-2011")
data.perevtype$DMG <-data.perevtype$PROPDMG + data.perevtype$CROPDMG
# Ten event types with the largest combined property and crop damage
economic.top <- data.perevtype[order(-data.perevtype$DMG), ][1:10, ]
ggplot(data = economic.top, aes(x = reorder(EVTYPE, DMG), y = DMG)) + geom_bar(stat="identity") + coord_flip() + labs(y = "Cost USD", x = "Event Type", title = "Total Economic impact in USA by weather events in 1993-2011")
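The damage totals above add the raw PROPDMG and CROPDMG columns. The storm data also carries PROPDMGEXP and CROPDMGEXP columns holding magnitude codes (K for thousands, M for millions, B for billions). If dollar amounts were wanted, a scaling step along the following lines could be applied before aggregating. This is only a sketch on invented values, not the method used in this analysis: the helper name `exp_multiplier` is hypothetical, and treating any other code as a multiplier of 1 is an assumption.

```r
# Hypothetical helper: convert a magnitude code to a numeric multiplier.
# Only the letter codes K, M and B are handled; anything else maps to 1
# (an assumption made for this sketch).
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Invented example rows
toy <- data.frame(PROPDMG = c(25, 2.5, 10),
                  PROPDMGEXP = c("K", "M", ""))
toy$PROPDMG.USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Applying such a scaling to the full dataset could change the ranking of event types, so the results would need to be re-examined.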