In this assignment, we analyze data on natural events from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. We first read the data and clean up some event types by consulting the cookbook.
We then aggregate fatalities, injuries, property damage, and crop damage by event type using ddply from the plyr package. After processing and analyzing the data, we summarize in tables and figures the events most harmful to human health and the events that cause the greatest damage to property and crops.
The results show that tornado, thunderstorm wind, flood, and excessive heat are the most harmful events to human health, while flood, hurricane, tornado, storm surge, and hail have the greatest economic consequences.
First, we process the data.
# Download the raw data only if it is not already present.
if (!file.exists("../ProjectData2/FStormData.csv.bz2")) {
    dir.create("../ProjectData2")
    download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
        destfile = "../ProjectData2/FStormData.csv.bz2", method = "auto")
}
data <- read.csv("../ProjectData2/FStormData.csv.bz2")
data$EVTYPE = toupper(data$EVTYPE)
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
On page 6 of the cookbook, we found that several event types are recorded under more than one name, such as “TSTM WIND” and “THUNDERSTORM WIND”. We therefore rename these events as follows.
data$EVTYPE[data$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
data$EVTYPE[data$EVTYPE == "THUNDERSTORM WINDS"] <- "THUNDERSTORM WIND"
data$EVTYPE[data$EVTYPE == "RIVER FLOOD"] <- "FLOOD"
data$EVTYPE[data$EVTYPE == "HURRICANE/TYPHOON"] <- "HURRICANE-TYPHOON"
data$EVTYPE[data$EVTYPE == "HURRICANE"] <- "HURRICANE-TYPHOON"
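The same consolidation can be sketched with a named lookup vector, which keeps all the synonym pairs in one place. This is only a minimal illustration on a toy vector rather than the full data set; the names `synonyms` and `normalize_evtype` are hypothetical and not part of the original analysis.

```r
# Map each variant spelling (name) to its canonical event type (value).
synonyms <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "RIVER FLOOD"        = "FLOOD",
  "HURRICANE/TYPHOON"  = "HURRICANE-TYPHOON",
  "HURRICANE"          = "HURRICANE-TYPHOON"
)

# Upper-case the input, replace known variants, leave other values untouched.
normalize_evtype <- function(x) {
  x <- toupper(x)
  hit <- x %in% names(synonyms)
  x[hit] <- unname(synonyms[x[hit]])
  x
}

# Toy input standing in for data$EVTYPE.
toy_events <- c("Tstm Wind", "TORNADO", "River Flood", "HURRICANE")
normalized <- normalize_evtype(toy_events)
print(normalized)
```

Adding a further synonym then only requires one more entry in the vector.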
First, let's revisit the structure of the data set printed above.
We find that this data set records four types of damage: fatalities (FATALITIES), injuries (INJURIES), property damage (PROPDMG), and crop damage (CROPDMG); the latter two must be scaled by their magnitude columns, PROPDMGEXP and CROPDMGEXP. Since only the first two relate directly to human health, we summarize these two here.
library(plyr)
library(ggplot2)
fatal <- ddply(data, c("EVTYPE"), function(x) apply(x[23], 2, sum))
fatal <- fatal[order(fatal$FATALITIES, decreasing = T), ]
fatal <- fatal[1:10, ]
qplot(EVTYPE, data = fatal, geom = "bar", weight = FATALITIES, ylab = "FATALITIES",
fill = EVTYPE)
The code above aggregates the fatality data by event type and ranks the totals in decreasing order. Tornado and excessive heat are among the deadliest event types in the years since 1950. Next, we summarize the injury data.
library(plyr)
library(ggplot2)
injury <- ddply(data, c("EVTYPE"), function(x) apply(x[24], 2, sum))
injury <- injury[order(injury$INJURIES, decreasing = T), ]
injury <- injury[1:10, ]
qplot(EVTYPE, data = injury, geom = "bar", weight = INJURIES, ylab = "INJURIES",
fill = EVTYPE)
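The two aggregations above select columns by position (x[23] and x[24]), which breaks silently if the column order ever changes. As a hedged alternative, the sketch below uses base R's `aggregate` with column names on a small made-up data frame. `toy` and `top_n_by` are illustrative names, and the numbers are invented, not taken from the NOAA data.

```r
# Invented toy data with the same column names as the storm data set.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 0),
  stringsAsFactors = FALSE
)

# Sum one measure per event type and return the n largest totals.
top_n_by <- function(df, measure, n = 10) {
  totals <- aggregate(df[[measure]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- measure
  totals <- totals[order(totals[[measure]], decreasing = TRUE), ]
  head(totals, n)
}

top_fatal  <- top_n_by(toy, "FATALITIES", n = 2)
top_injury <- top_n_by(toy, "INJURIES", n = 2)
print(top_fatal)
print(top_injury)
```

Referring to columns by name also makes the intent of each call readable without consulting the `str` output.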
We can also find what events cause both major fatality and body injury.
intersect(fatal[1:10, 1], injury[1:10, 1])
## [1] "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD"
## [4] "HEAT" "LIGHTNING" "THUNDERSTORM WIND"
## [7] "FLOOD"
Seven event types appear in the top 10 for both fatalities and injuries. Tornado is clearly the most harmful event to human health; others include excessive heat, flash flood, and thunderstorm wind.
In this part, we summarize the property damage and crop damage caused by these natural events.
unique(data$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(data$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
According to page 12 of the cookbook, the letter “K” stands for thousands, “M” for millions, and “B” for billions. However, these letters appear in both upper and lower case in the data. The first thing we need to do is convert the damage figures and their magnitude codes into actual dollar values.
data[data$PROPDMGEXP == "K", ]$PROPDMG <- data[data$PROPDMGEXP == "K", ]$PROPDMG *
1000
data[data$PROPDMGEXP == "M", ]$PROPDMG <- data[data$PROPDMGEXP == "M", ]$PROPDMG *
1e+06
data[data$PROPDMGEXP == "m", ]$PROPDMG <- data[data$PROPDMGEXP == "m", ]$PROPDMG *
1e+06
data[data$PROPDMGEXP == "B", ]$PROPDMG <- data[data$PROPDMGEXP == "B", ]$PROPDMG *
1e+09
data[data$CROPDMGEXP == "K", ]$CROPDMG <- data[data$CROPDMGEXP == "K", ]$CROPDMG *
1000
data[data$CROPDMGEXP == "k", ]$CROPDMG <- data[data$CROPDMGEXP == "k", ]$CROPDMG *
1000
data[data$CROPDMGEXP == "M", ]$CROPDMG <- data[data$CROPDMGEXP == "M", ]$CROPDMG *
1e+06
data[data$CROPDMGEXP == "m", ]$CROPDMG <- data[data$CROPDMGEXP == "m", ]$CROPDMG *
1e+06
data[data$CROPDMGEXP == "B", ]$CROPDMG <- data[data$CROPDMGEXP == "B", ]$CROPDMG *
1e+09
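The repeated multiplications above can be condensed with a named lookup vector. The sketch below is one possible approach on invented values. It treats the letters case-insensitively and, like the code above, leaves every other code (digits, "+", "-", "?", "h", or empty) unscaled with a multiplier of 1. `multipliers`, `exp_multiplier`, `dmg`, and `dmg_exp` are hypothetical names introduced for illustration.

```r
# Multipliers for the magnitude letters documented in the cookbook.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Return the numeric multiplier for each magnitude code;
# codes other than K, M, B (in either case) map to 1.
exp_multiplier <- function(code) {
  m <- multipliers[toupper(as.character(code))]
  m[is.na(m)] <- 1
  unname(m)
}

# Invented toy values standing in for PROPDMG and PROPDMGEXP.
dmg     <- c(25, 2.5, 1, 3, 7)
dmg_exp <- factor(c("K", "m", "B", "", "?"))
dollars <- dmg * exp_multiplier(dmg_exp)
print(dollars)
```

Because the result is stored in a new vector, the original damage column stays intact and the conversion can be rerun safely.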
Now we can aggregate the property damage and crop damage by event type and rank the totals in decreasing order.
library(plyr)
library(ggplot2)
damage <- ddply(data, c("EVTYPE"), function(x) apply(x[25], 2, sum))
damage <- damage[order(damage$PROPDMG, decreasing = T), ]
damage <- damage[1:10, ]
qplot(EVTYPE, data = damage, geom = "bar", weight = PROPDMG, ylab = "PROPDMG",
fill = EVTYPE)
Flood is the most harmful event with regard to property damage, followed by hurricane (typhoon).
library(plyr)
library(ggplot2)
cropdmg <- ddply(data, c("EVTYPE"), function(x) apply(x[27], 2, sum))
cropdmg <- cropdmg[order(cropdmg$CROPDMG, decreasing = T), ]
cropdmg <- cropdmg[1:10, ]
head(cropdmg)
## EVTYPE CROPDMG
## 84 DROUGHT 1.397e+10
## 154 FLOOD 1.069e+10
## 364 HURRICANE-TYPHOON 5.350e+09
## 386 ICE STORM 5.022e+09
## 212 HAIL 3.026e+09
## 138 FLASH FLOOD 1.421e+09
The rankings for the two types of damage differ, so we add them together to see the total. Note that merging the two top-10 tables keeps only the event types that appear in both.
totaldmg <- merge(damage, cropdmg, by = "EVTYPE")
totaldmg$total = totaldmg$PROPDMG + totaldmg$CROPDMG
totaldmgorder <- totaldmg[order(totaldmg$total, decreasing = TRUE), ]
totaldmgorder[1:5, ]
## EVTYPE PROPDMG CROPDMG total
## 2 FLOOD 1.498e+11 1.069e+10 1.605e+11
## 4 HURRICANE-TYPHOON 8.117e+10 5.350e+09 8.652e+10
## 3 HAIL 1.573e+10 3.026e+09 1.876e+10
## 1 FLASH FLOOD 1.614e+10 1.421e+09 1.756e+10
## 5 THUNDERSTORM WIND 9.704e+09 1.160e+09 1.086e+10
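Because `merge` performs an inner join by default, combining the two top-10 tables drops any event type that is missing from either list. A hedged alternative is to add the two damage columns over all event types first and rank afterwards. The sketch below shows the difference on invented numbers; `toy`, `total_by_event`, `top_prop`, `top_crop`, and `joined` are illustrative names, not part of the original analysis.

```r
# Invented per-event totals; these are not NOAA figures.
toy <- data.frame(
  EVTYPE  = c("FLOOD", "TORNADO", "DROUGHT", "HAIL"),
  PROPDMG = c(100, 80, 1, 20),
  CROPDMG = c(10, 0.5, 50, 5),
  stringsAsFactors = FALSE
)

# Rank by combined damage across all event types.
toy$total <- toy$PROPDMG + toy$CROPDMG
total_by_event <- toy[order(toy$total, decreasing = TRUE), ]

# For comparison: inner join of the top-2 lists of each damage type.
top_prop <- head(toy[order(toy$PROPDMG, decreasing = TRUE), c("EVTYPE", "PROPDMG")], 2)
top_crop <- head(toy[order(toy$CROPDMG, decreasing = TRUE), c("EVTYPE", "CROPDMG")], 2)
joined <- merge(top_prop, top_crop, by = "EVTYPE")

print(total_by_event)
print(joined)
```

In this toy example TORNADO ranks second by combined damage but is absent from `joined`, because it is not among the top two for crop damage.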