Synopsis

Based on the NOAA storm database, an analysis of events recorded from 1993 to 2011 shows that:

  • Excessive Heat and Tornado are the major contributors to health impact
  • Flash Flood and Tornado are the major contributors to economic impact

Data Processing

Libraries

library(ggplot2)
library(plyr)
library(dplyr)
library(lubridate)
library(reshape2)

Loading the StormData file from the course website into the “data” variable:

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","./storm.bz2")
data <- read.csv("./storm.bz2")
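The download above runs every time the report is rebuilt. A minimal sketch of one way to skip it when the file is already on disk; `fetch_if_missing` is a hypothetical helper name, not part of the original analysis, and the calls that touch the network are left commented out:

```r
# Hypothetical helper (not in the original report): download only when the
# destination file is missing, so re-running the analysis stays cheap.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed archive byte-for-byte intact
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# fetch_if_missing(storm_url, "./storm.bz2")
# data <- read.csv("./storm.bz2")
```

The helper returns the destination path either way, so it can be passed directly to read.csv.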

Preprocessing data:

  • Date formatting and YEAR extraction
  • Cleaning EVTYPE names (upper case, with no whitespace at the start or end of names)
  • Grouping by YEAR

# Keep only the date part of BGN_DATE (drop the time), then extract the year
data$date <- sub(" .*$", "", data$BGN_DATE)
data$YEAR <- year(mdy(data$date))
# Upper-case the event names, then trim leading and trailing whitespace
data$EVTYPE <- toupper(data$EVTYPE)
data$EVTYPE <- gsub("^[[:space:]]+", "", data$EVTYPE)
data$EVTYPE <- gsub("[[:space:]]+$", "", data$EVTYPE)

data.peryear <- aggregate(x = data[c("INJURIES","FATALITIES", "PROPDMG", "CROPDMG")],
          by = data["YEAR"],
          FUN = sum, na.rm=TRUE)
data.peryear.allyear <- melt(data.peryear, id=c("YEAR"))
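The date and EVTYPE cleaning steps above can be checked on a couple of toy values before running them on the full data set. This is only a sketch in base R: the sample strings are made up, and the date layout (month/day/year followed by a time) is an assumption about how BGN_DATE looks in the file.

```r
# Toy values standing in for data$BGN_DATE and data$EVTYPE (assumed shapes).
bgn_date <- c("4/18/1950 0:00:00", "11/15/1994 0:00:00")
evtype   <- c("  Tornado", "flash flood  ")

# Drop the time part, parse month/day/year, and pull out the year.
date_only <- sub(" .*$", "", bgn_date)
year_val  <- as.integer(format(as.Date(date_only, format = "%m/%d/%Y"), "%Y"))

# Upper-case and strip all leading/trailing whitespace in one vectorized call.
evtype_clean <- trimws(toupper(evtype))

print(year_val)
print(evtype_clean)
```

trimws removes every leading and trailing whitespace character, so a separate pattern for each end of the string is not needed.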

A quick look at how the health and economic figures evolve over time (see the following graph) shows two very distinct periods: before and after 1993.

ggplot(data = data.peryear.allyear, aes(x = YEAR, y = value)) + geom_bar(stat = "identity") + facet_wrap(~ variable, scales = "free") + labs(title = "Health and economic impact between 1950 and 2011")

Our hypothesis is that this is due to a change in data collection methods. Therefore, only data from 1993 onward are considered in the following.
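One way to probe this hypothesis is to count how many distinct event types were recorded in each year: a sharp rise around 1993 would be consistent with a change in what was collected. A minimal sketch on made-up data; the `toy` data frame only mimics the YEAR and EVTYPE columns and its values are not taken from the NOAA file.

```r
# Toy data frame standing in for the cleaned storm data (values are made up).
toy <- data.frame(
  YEAR   = c(1990, 1990, 1991, 1993, 1993, 1993, 1994, 1994),
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD", "FLOOD")
)

# Number of distinct event types recorded in each year.
types_per_year <- tapply(toy$EVTYPE, toy$YEAR, function(x) length(unique(x)))
print(types_per_year)
```

Applied to the real data, the same tapply call would take data$EVTYPE and data$YEAR as its first two arguments.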

Results

Population health impact is judged by fatalities.

The main contributors to health impact, as shown in the following graph, are:

  • Excessive Heat
  • Tornado
  • Flash Flood

# Keep events from 1993 onward and total each measure per event type
data.post1992 <- data[data$YEAR > 1992, ]
data.perevtype <- aggregate(x = data.post1992[c("INJURIES","FATALITIES", "PROPDMG", "CROPDMG")],
          by = data.post1992["EVTYPE"],
          FUN = sum, na.rm=TRUE)
# Ten event types with the most fatalities
health.top <- data.perevtype[order(-data.perevtype$FATALITIES), ][1:10, ]
ggplot(data = health.top, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) + geom_bar(stat = "identity") + coord_flip() + labs(y = "Number of fatalities", x = "Event Type", title = "Total health impact in USA by weather events in 1993-2011")
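The aggregate-then-rank step above can be sanity-checked on a small example. A minimal sketch with made-up numbers; the `toy` data frame is hypothetical and only mimics the EVTYPE and FATALITIES columns.

```r
# Toy data standing in for data.post1992 (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 10, 2, 1, 4)
)

# Total fatalities per event type, then keep the two largest totals.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
top    <- head(totals[order(-totals$FATALITIES), ], 2)
print(top)
```

Using head rather than `[1:10, ]` avoids padding the result with NA rows when fewer event types are present than the requested count.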

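Economic impact is judged by property and crop damages. The raw PROPDMG and CROPDMG values are summed as they appear in the file; the PROPDMGEXP and CROPDMGEXP magnitude columns are not applied.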

The main contributors to economic impact, as shown in the following graph, are:

  • Flash Flood
  • Tornado
  • TSTM Wind

# Combined property and crop damage (raw values, exponent columns not applied)
data.perevtype$DMG <- data.perevtype$PROPDMG + data.perevtype$CROPDMG

# Ten event types with the largest combined damage
economic.top <- data.perevtype[order(-data.perevtype$DMG), ][1:10, ]
ggplot(data = economic.top, aes(x = reorder(EVTYPE, DMG), y = DMG)) + geom_bar(stat = "identity") + coord_flip() + labs(y = "PROPDMG + CROPDMG (raw values)", x = "Event Type", title = "Total economic impact in USA by weather events in 1993-2011")
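The sums above use the raw PROPDMG and CROPDMG values. In the NOAA file these columns are paired with PROPDMGEXP and CROPDMGEXP, whose letter codes give the magnitude of each amount. A minimal sketch of converting to dollars, assuming only the codes K, M and B (thousands, millions, billions) and treating any other code as a multiplier of 1, which is a simplification; `exp_multiplier` and the `toy` values are hypothetical.

```r
# Assumed multiplier table: only K, M and B are mapped; any other code
# (blank, digits, symbols) falls back to 1, which is a simplification.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy rows mimicking the PROPDMG / PROPDMGEXP pair (values are made up).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", "")
)
toy$PROPDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Applying the same conversion to CROPDMG before summing would put both damage columns in dollars, and it may change which event types rank highest.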