Reproducible Research Peer Assignment 2

Title: Prioritizing resources for harmful weather events: tornado, thunderstorm wind, flood, and excessive heat are the most harmful to human health, while flood, hurricane, tornado, storm surge, and hail have the greatest economic consequences.

1. Synopsis

In this assignment, we analyzed natural-event data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. We first read the data and cleaned up some event types by consulting the cookbook.

We then aggregated fatalities, injuries, property damage, and crop damage by event type, using ddply from the plyr package. After processing and analyzing the data, we summarized in tables and figures the events most harmful to human health and the events causing the greatest damage to property and crops.

The results show that tornado, thunderstorm wind, flood, and excessive heat are the events most harmful to human health, while flood, hurricane, tornado, storm surge, and hail have the greatest economic consequences.

2. Data processing

First, we download the data, read it, and convert the event types to upper case.

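# Download the data only if it is not already cached locally
if (!file.exists("../ProjectData2/FStormData.csv.bz2")) {
    dir.create("../ProjectData2", showWarnings = FALSE)
    # destfile must match the path checked above and read below
    download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
        destfile = "../ProjectData2/FStormData.csv.bz2", method = "auto")
}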
data <- read.csv("../ProjectData2/FStormData.csv.bz2")
data$EVTYPE = toupper(data$EVTYPE)
str(data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

On page 6 of the cookbook, we found that several event types are recorded under more than one name, such as “TSTM WIND” and “THUNDERSTORM WIND”. We therefore rename these events as follows.

data$EVTYPE[data$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
data$EVTYPE[data$EVTYPE == "THUNDERSTORM WINDS"] <- "THUNDERSTORM WIND"
data$EVTYPE[data$EVTYPE == "RIVER FLOOD"] <- "FLOOD"
data$EVTYPE[data$EVTYPE == "HURRICANE/TYPHOON"] <- "HURRICANE-TYPHOON"
data$EVTYPE[data$EVTYPE == "HURRICANE"] <- "HURRICANE-TYPHOON"
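The same recoding can also be written as a lookup table, which keeps all synonym pairs in one place. The following is a minimal sketch on a toy vector rather than the full data set; the names `synonyms` and `normalize_evtype` are hypothetical and not part of the original analysis.

```r
# Hypothetical lookup: variant event name -> canonical event name.
synonyms <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "RIVER FLOOD"        = "FLOOD",
  "HURRICANE/TYPHOON"  = "HURRICANE-TYPHOON",
  "HURRICANE"          = "HURRICANE-TYPHOON"
)

# Hypothetical helper: upper-case the names, then replace known variants.
normalize_evtype <- function(evtype) {
  evtype <- toupper(as.character(evtype))  # as.character guards against factor input
  hit <- evtype %in% names(synonyms)       # entries that have a known synonym
  evtype[hit] <- unname(synonyms[evtype[hit]])
  evtype
}

# Toy input, made up for illustration
toy <- c("tstm wind", "River Flood", "TORNADO", "Hurricane")
normalize_evtype(toy)
```

Names without an entry in the table, such as "TORNADO" here, pass through unchanged, so further synonyms can be added without touching the function.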

3. Results

3.1 The events most harmful to human health

First, let's revisit the structure of the data set shown in Section 2.

The data set records four types of damage: fatalities (FATALITIES), injuries (INJURIES), property damage (PROPDMG), and crop damage (CROPDMG). The latter two must be scaled by their magnitude codes, PROPDMGEXP and CROPDMGEXP. Since only the first two relate directly to human health, we summarize those two here.

library(plyr)
library(ggplot2)
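# Total fatalities per event type, selecting the column by name rather than position
fatal <- ddply(data, "EVTYPE", summarise, FATALITIES = sum(FATALITIES))
# Rank in decreasing order and keep the top 10
fatal <- fatal[order(fatal$FATALITIES, decreasing = TRUE), ]
fatal <- fatal[1:10, ]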
qplot(EVTYPE, data = fatal, geom = "bar", weight = FATALITIES, ylab = "FATALITIES", 
    fill = EVTYPE)

[Figure: bar chart of total fatalities for the ten event types with the most fatalities]

The code above aggregates fatalities by event type and ranks the event types in decreasing order. Tornado and excessive heat are among the events that have caused the most fatalities since 1950. Next, we summarize the injury data.

library(plyr)
library(ggplot2)
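# Total injuries per event type, selecting the column by name rather than position
injury <- ddply(data, "EVTYPE", summarise, INJURIES = sum(INJURIES))
# Rank in decreasing order and keep the top 10
injury <- injury[order(injury$INJURIES, decreasing = TRUE), ]
injury <- injury[1:10, ]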
qplot(EVTYPE, data = injury, geom = "bar", weight = INJURIES, ylab = "INJURIES", 
    fill = EVTYPE)

[Figure: bar chart of total injuries for the ten event types with the most injuries]

We can also find which events rank in the top 10 for both fatalities and injuries.

intersect(fatal[1:10, 1], injury[1:10, 1])
## [1] "TORNADO"           "EXCESSIVE HEAT"    "FLASH FLOOD"      
## [4] "HEAT"              "LIGHTNING"         "THUNDERSTORM WIND"
## [7] "FLOOD"

Seven event types appear in the top 10 for both fatalities and injuries. Tornado is the most harmful event to human health; others include excessive heat, flash flood, and thunderstorm wind.
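The aggregate, rank, and intersect steps of this section can also be sketched with base R alone. This is an illustrative version on a toy data frame, not the code used for the results above; the helper name `top_events` and all toy values are made up.

```r
# Toy data, made up for illustration (not values from the storm database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "HEAT", "HAIL"),
  FATALITIES = c(3, 2, 1, 4, 0, 0),
  INJURIES   = c(10, 5, 2, 1, 0, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total one measure per event type and keep the n largest.
top_events <- function(df, measure, n = 10) {
  totals <- aggregate(df[[measure]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- measure  # aggregate() names the summed column "x"
  totals <- totals[order(totals[[measure]], decreasing = TRUE), ]
  head(totals, n)
}

top_fatal  <- top_events(toy, "FATALITIES", n = 2)
top_injury <- top_events(toy, "INJURIES", n = 2)

# Event types that appear in both rankings
intersect(top_fatal$EVTYPE, top_injury$EVTYPE)
```

Passing the measure as a column name means one helper covers both rankings, and the code does not depend on the position of a column in the data frame.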

3.2 The events most harmful to property and crops

In this section, we summarize the property damage and crop damage caused by these natural events.

unique(data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

According to page 12 of the cookbook, the letter “K” stands for thousands, “M” for millions, and “B” for billions. However, these letters appear in both upper and lower case. The first step is to convert the coded amounts back into actual values.

data[data$PROPDMGEXP == "K", ]$PROPDMG <- data[data$PROPDMGEXP == "K", ]$PROPDMG * 
    1000
data[data$PROPDMGEXP == "M", ]$PROPDMG <- data[data$PROPDMGEXP == "M", ]$PROPDMG * 
    1e+06
data[data$PROPDMGEXP == "m", ]$PROPDMG <- data[data$PROPDMGEXP == "m", ]$PROPDMG * 
    1e+06
data[data$PROPDMGEXP == "B", ]$PROPDMG <- data[data$PROPDMGEXP == "B", ]$PROPDMG * 
    1e+09
data[data$CROPDMGEXP == "K", ]$CROPDMG <- data[data$CROPDMGEXP == "K", ]$CROPDMG * 
    1000
data[data$CROPDMGEXP == "k", ]$CROPDMG <- data[data$CROPDMGEXP == "k", ]$CROPDMG * 
    1000
data[data$CROPDMGEXP == "M", ]$CROPDMG <- data[data$CROPDMGEXP == "M", ]$CROPDMG * 
    1e+06
data[data$CROPDMGEXP == "m", ]$CROPDMG <- data[data$CROPDMGEXP == "m", ]$CROPDMG * 
    1e+06
data[data$CROPDMGEXP == "B", ]$CROPDMG <- data[data$CROPDMGEXP == "B", ]$CROPDMG * 
    1e+09
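The scaling above can be condensed into a single lookup of multipliers. The sketch below runs on toy vectors and treats upper and lower case alike; the names `multipliers` and `scale_damage` are hypothetical, and leaving every other code (blank, digits, "?", "+", "-", "h") unscaled is an assumption that mirrors the code above.

```r
# Hypothetical lookup: exponent code -> multiplier (only K, M and B are scaled)
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale a damage figure by its exponent code.
scale_damage <- function(value, exp_code) {
  code <- toupper(as.character(exp_code))  # as.character avoids indexing by factor codes
  mult <- unname(multipliers[code])        # NA where the code is not K, M or B
  mult[is.na(mult)] <- 1                   # assumption: unknown codes leave the value as is
  value * mult
}

# Toy input, made up for illustration
scale_damage(c(25, 2.5, 1, 7), c("K", "m", "B", ""))
```

Because the function returns a new vector instead of overwriting PROPDMG or CROPDMG in place, running it twice does not scale the values twice.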

We can now aggregate property damage and crop damage by event type and rank the event types in decreasing order.

library(plyr)
library(ggplot2)
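# Total property damage per event type, selecting the column by name rather than position
damage <- ddply(data, "EVTYPE", summarise, PROPDMG = sum(PROPDMG))
# Rank in decreasing order and keep the top 10
damage <- damage[order(damage$PROPDMG, decreasing = TRUE), ]
damage <- damage[1:10, ]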
qplot(EVTYPE, data = damage, geom = "bar", weight = PROPDMG, ylab = "PROPDMG", 
    fill = EVTYPE)

[Figure: bar chart of total property damage for the ten event types with the most property damage]

Flood is the most harmful event in terms of property damage, followed by hurricane (typhoon).

library(plyr)
library(ggplot2)
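# Total crop damage per event type, selecting the column by name rather than position
cropdmg <- ddply(data, "EVTYPE", summarise, CROPDMG = sum(CROPDMG))
# Rank in decreasing order and keep the top 10
cropdmg <- cropdmg[order(cropdmg$CROPDMG, decreasing = TRUE), ]
cropdmg <- cropdmg[1:10, ]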
head(cropdmg)
##                EVTYPE   CROPDMG
## 84            DROUGHT 1.397e+10
## 154             FLOOD 1.069e+10
## 364 HURRICANE-TYPHOON 5.350e+09
## 386         ICE STORM 5.022e+09
## 212              HAIL 3.026e+09
## 138       FLASH FLOOD 1.421e+09

The two types of damage rank the events differently, so we add them together to see the total. Note that merge joins the two top-10 tables here, so only event types that appear in both lists are kept in the result.

totaldmg <- merge(damage, cropdmg, by = "EVTYPE")
totaldmg$total = totaldmg$PROPDMG + totaldmg$CROPDMG
totaldmgorder <- totaldmg[order(totaldmg$total, decreasing = TRUE), ]
totaldmgorder[1:5, ]
##              EVTYPE   PROPDMG   CROPDMG     total
## 2             FLOOD 1.498e+11 1.069e+10 1.605e+11
## 4 HURRICANE-TYPHOON 8.117e+10 5.350e+09 8.652e+10
## 3              HAIL 1.573e+10 3.026e+09 1.876e+10
## 1       FLASH FLOOD 1.614e+10 1.421e+09 1.756e+10
## 5 THUNDERSTORM WIND 9.704e+09 1.160e+09 1.086e+10
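Because the tables merged above hold only the top 10 of each category, an event with large property damage but little crop damage drops out of the combined ranking. One possible alternative, sketched here on toy totals, is to merge the full per-event tables with all = TRUE and rank afterwards. The data frames `prop` and `crop` and their values are made up, and treating a missing entry as zero damage is an assumption.

```r
# Toy per-event totals, made up for illustration (not values from the storm database)
prop <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "HAIL"),
                   PROPDMG = c(150, 57, 16), stringsAsFactors = FALSE)
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT", "HAIL"),
                   CROPDMG = c(11, 14, 3), stringsAsFactors = FALSE)

# all = TRUE keeps event types that appear in only one of the two tables
total <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# Assumption: an event missing from one table has zero damage of that kind
total$PROPDMG[is.na(total$PROPDMG)] <- 0
total$CROPDMG[is.na(total$CROPDMG)] <- 0

# Rank by combined damage only after both kinds are summed
total$total <- total$PROPDMG + total$CROPDMG
total <- total[order(total$total, decreasing = TRUE), ]
total
```

In this toy example TORNADO and DROUGHT each appear in only one input table, yet both remain in the combined ranking.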