Reproducible Assignment 2 - NOAA Storm Data Analysis

Zhizheng Wang

Synopsis

This is the R Markdown document for the Reproducible Research Assignment 2. The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. Questions are basically, 1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? 2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

Before we go into the data analysis, we need to download the data from the url and unzip it. We do it outside the Rstudio and we go directly with our csv file. After loading the file, refering to the documents, since we only analyze the health and economical impact of the disasters, we only need 7 variables which are related to these questions:

opts_chunk$set(echo=TRUE, results = "asis")
#load the data
#url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
setwd("C:/Users/User/Documents/R/rep")
storm_full <- read.csv("repdata-data-StormData.csv",  header = TRUE)

my_data <- data.frame(storm_full$EVTYPE, storm_full$FATALITIES, storm_full$INJURIES, 
                      storm_full$PROPDMG, storm_full$PROPDMGEXP, storm_full$CROPDMG, 
                      storm_full$CROPDMGEXP)

We first analyze the public health aspect, among which two import variables are analyzed, population killed and population injured. We group all the events based on the event type and sort them from mostly deadly to least.

#top 10 fatality/injury event(classed by different type)
library(plyr)
cdata <- ddply(my_data, c("storm_full.EVTYPE"), summarise,
               sum_fatal  = sum(storm_full.FATALITIES, na.rm=TRUE),
               sum_injury = sum(storm_full.INJURIES, na.rm=TRUE))
colnames(cdata) = c("event.type", "fatality.total", "injury.total")
cdata.sorted1 <- cdata[order(-cdata$fatality.total),] 
topcdata.fatality <- cdata.sorted1[1:10,]
cdata.sorted2 <- cdata[order(-cdata$injury.total),] 
topcdata.injury <- cdata.sorted2[1:10,]

Then, we continue to analyze the natural disaster with greatest economic consequences. We combine the property damage as well as the crop damage and the trick part here is they are measured in different scale or magnitude. So we need unit them into the same scale. There we carry out the following prepocessing. For those with missing value, we give it 0 and for those with - or +, we give them 1 in scale. For the other, we give them according to their short form. For example, k to 10 to power 3, m to 10 to power 6 and B to 10 to power 9 etc.

#top disasters with high economic loss
#table(my_data$storm_full.PROPDMGEXP)
#table(my_data$storm_full.CROPDMGEXP)
my_data$storm_full.PROPDMGEXP <- as.character(my_data$storm_full.PROPDMGEXP)
my_data$storm_full.CROPDMGEXP <- as.character(my_data$storm_full.CROPDMGEXP)

my_data$storm_full.PROPDMGEXP[(my_data$storm_full.PROPDMGEXP == "")] <- 0
my_data$storm_full.PROPDMGEXP[(my_data$storm_full.PROPDMGEXP == "+") | (my_data$storm_full.PROPDMGEXP == 
                                                          "-") | (my_data$storm_full.PROPDMGEXP == "?")] <- 1
my_data$storm_full.PROPDMGEXP[(my_data$storm_full.PROPDMGEXP == "h") | (my_data$storm_full.PROPDMGEXP == 
                                                          "H")] <- 2
my_data$storm_full.PROPDMGEXP[(my_data$storm_full.PROPDMGEXP == "k") | (my_data$storm_full.PROPDMGEXP == 
                                                          "K")] <- 3
my_data$storm_full.PROPDMGEXP[(my_data$storm_full.PROPDMGEXP == "m") | (my_data$storm_full.PROPDMGEXP == 
                                                          "M")] <- 6
my_data$storm_full.PROPDMGEXP[(my_data$storm_full.PROPDMGEXP == "B")] <- 9

my_data$storm_full.CROPDMGEXP[(my_data$storm_full.CROPDMGEXP == "")] <- 0
my_data$storm_full.CROPDMGEXP[(my_data$storm_full.CROPDMGEXP == "+") | (my_data$storm_full.CROPDMGEXP == 
                                                          "-") | (my_data$storm_full.CROPDMGEXP == "?")] <- 1
my_data$storm_full.CROPDMGEXP[(my_data$storm_full.CROPDMGEXP == "h") | (my_data$storm_full.CROPDMGEXP == 
                                                          "H")] <- 2
my_data$storm_full.CROPDMGEXP[(my_data$storm_full.CROPDMGEXP == "k") | (my_data$storm_full.CROPDMGEXP == 
                                                          "K")] <- 3
my_data$storm_full.CROPDMGEXP[(my_data$storm_full.CROPDMGEXP == "m") | (my_data$storm_full.CROPDMGEXP == 
                                                          "M")] <- 6
my_data$storm_full.CROPDMGEXP[(my_data$storm_full.CROPDMGEXP == "B")] <- 9

# convert to integer for computation
my_data$storm_full.PROPDMGEXP <- as.integer(my_data$storm_full.PROPDMGEXP)
my_data$storm_full.CROPDMGEXP <- as.integer(my_data$storm_full.CROPDMGEXP)

The same analyze goes here where we group by event type and sort the loss from greatest to least.

my_data$damage_total <- my_data$storm_full.PROPDMG * 10 ^ my_data$storm_full.PROPDMGEXP +
  my_data$storm_full.CROPDMG * 10 ^ my_data$storm_full.CROPDMGEXP
ddata <- ddply(my_data, c("storm_full.EVTYPE"), summarise,
               sum_damage_total  = sum(damage_total, na.rm=TRUE))
colnames(ddata) = c("event.type", "damage_total")
ddata.sorted <- ddata[order(-ddata$damage_total),] 
topddata.damage <- ddata.sorted[1:10,]

Results

Regarding the most deadly and injurying disasters, we plot two plots together. As we can see, tornado, heat, and flood, lightning are among the top 10.

par(mfrow=c(2,1))
barinfo1<-barplot(topcdata.fatality$fatality.total, names.arg = topcdata.fatality$event.type,
                  space = 0.4,  xlab="Event Type", ylab="Numbers Killed", 
                  main="Most Deadly Natural Disasters", col="lightgreen")
barinfo2<-barplot(topcdata.injury$injury.total, names.arg = topcdata.injury$event.type, 
                  space = 0.4,  xlab="Event Type", ylab="Numbers Injured", 
                  main="Most Deadly Natural Disasters II", col="lightgreen")

plot of chunk summary1

Regarding the disasters with most economic damage, we plot one plot. As we can see, flood, hurricane and typhoon and then tornado sitting the top 3.

barinfo3<-barplot(height = topddata.damage$damage_total, names.arg = topddata.damage$event.type, 
                  space = 0.4,  xlab="Event Type", ylab="Economic Loss (Dollar)", 
                  main="Most Economically Damaged Disasters", col="lightgreen")

plot of chunk summary2