In this report we explore the NOAA Storm Database to find the weather events that are most harmful to population health and those with the greatest economic consequences. The database covers events across the United States from 1950 to November 2011. Across all years and the whole country, tornadoes killed and injured the most people, and floods had the greatest economic consequences.
sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.10.5 (Yosemite)
##
## locale:
## [1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.2.1 tools_3.2.2 htmltools_0.2.6
## [5] yaml_2.1.13 stringi_1.0-1 rmarkdown_0.8.1 knitr_1.11
## [9] stringr_1.0.0 digest_0.6.8 evaluate_0.8
options(scipen=10)
# first setting working directory
setwd("~/Desktop/courseraR/reproducible/assignment2")
# reading the dataset into R
zz <- bzfile("repdata-data-StormData.csv.bz2", "r")
mydata <- read.csv(zz)
close(zz)
Here we define the effect on population health as the “number directly killed and injured”. Two variables are relevant: FATALITIES (number directly killed) and INJURIES (number directly injured). The effect is the sum of these two variables.
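As a minimal sketch of this definition, the toy data frame below sums the two columns for each record and then totals the result by event type. The values are made up for illustration and are not taken from the NOAA data; the names `toy`, `harm` and `harm_by_type` are hypothetical.

```r
# Toy records; the values are made up for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Effect on population health = fatalities + injuries, per record
harm <- toy$FATALITIES + toy$INJURIES

# Total per event type, largest first
harm_by_type <- sort(tapply(harm, toy$EVTYPE, sum), decreasing = TRUE)
harm_by_type
```

The first element of the sorted result is the event type with the largest combined count, which is the quantity the analysis reports for the full dataset.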
population <- mydata$FATALITIES + mydata$INJURIES
dangerp <- tapply(population, mydata$EVTYPE, sum)
dangerp[which(dangerp == max(dangerp))]
## TORNADO
## 96979
mostdanger <- sort(dangerp, decreasing = TRUE)[1:5]
mostdanger <- data.frame(effect = mostdanger, eventtype = names(mostdanger))
library(ggplot2)
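# qplot()'s stat argument is deprecated in ggplot2 >= 2.0.0, so build the bar chart with ggplot()
ggplot(mostdanger, aes(x = eventtype, y = effect)) +
  geom_bar(stat = "identity") +
  labs(x = "Event type", y = "Number directly killed and injured")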
The figure above shows that tornadoes are the most harmful event with respect to population health: they killed and injured 96979 people in total. The other events in the top five are excessive heat, TSTM wind, flood and lightning, each of which caused far fewer deaths and injuries.
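Four variables, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP, can be used to calculate the numeric values of property damage and crop damage. PROPDMGEXP and CROPDMGEXP hold magnitude codes (for example K for thousands, M for millions and B for billions) that scale PROPDMG and CROPDMG. We take the sum of the two resulting values as the economic effect.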
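The conversion described above can be sketched with a named lookup vector. This is an illustrative alternative on made-up records, not the code used in this report: the names `multiplier`, `toy`, `mult` and `value` are hypothetical, and the sketch assumes that a record with an unrecognised code contributes no damage.

```r
# Multipliers for the letter codes handled in this report
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Toy records; the values are made up for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", "?"),
  stringsAsFactors = FALSE
)

# Look up each code case-insensitively; unknown codes give NA
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])

# Assumption: a record with an unrecognised code counts as zero damage
value <- toy$PROPDMG * mult
value[is.na(value)] <- 0
value
```

A lookup vector keeps the code-to-multiplier mapping in one place, so supporting another code means adding one element rather than another assignment line.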
mydata1 <- mydata
# processing value for propdmg
mydata1$PROPDMGEXP <- as.character(mydata1$PROPDMGEXP)
mydata1$PROPDMGEXP[which(mydata1$PROPDMGEXP == "K")] <- "1000"
mydata1$PROPDMGEXP[which(mydata1$PROPDMGEXP == "H"| mydata1$PROPDMGEXP == "h")] <- "100"
mydata1$PROPDMGEXP[which(mydata1$PROPDMGEXP == "M"| mydata1$PROPDMGEXP == "m")] <- "1000000"
mydata1$PROPDMGEXP[which(mydata1$PROPDMGEXP == "B")] <- "1000000000"
mydata1$PROPDMGEXP <- as.numeric(mydata1$PROPDMGEXP)
## Warning: NAs introduced by coercion
valuepro <- mydata1$PROPDMG*mydata1$PROPDMGEXP
# processing value for cropdmg
mydata1$CROPDMGEXP <- as.character(mydata1$CROPDMGEXP)
mydata1$CROPDMGEXP[which(mydata1$CROPDMGEXP == "K" |mydata1$CROPDMGEXP == "k") ] <- "1000"
mydata1$CROPDMGEXP[which(mydata1$CROPDMGEXP == "M"| mydata1$CROPDMGEXP == "m")] <- "1000000"
mydata1$CROPDMGEXP[which(mydata1$CROPDMGEXP == "B")] <- "1000000000"
mydata1$CROPDMGEXP <- as.numeric(mydata1$CROPDMGEXP)
## Warning: NAs introduced by coercion
valuecro <- mydata1$CROPDMG*mydata1$CROPDMGEXP
# total value
valuecro[which(is.na(valuecro))] <- 0
valuepro[which(is.na(valuepro))] <- 0
sumvalue <- valuecro+valuepro
# seeking the most harmful event
dangere <- tapply(sumvalue, mydata$EVTYPE, sum)
dangere[which(dangere == max(dangere))]
## FLOOD
## 150319678250
mostdangere <- sort(dangere, decreasing = TRUE)[1:5]
mostdangere <- data.frame(effect = mostdangere, eventtype = names(mostdangere))
library(ggplot2)
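# qplot()'s stat argument is deprecated in ggplot2 >= 2.0.0, so build the bar chart with ggplot()
ggplot(mostdangere, aes(x = eventtype, y = effect)) +
  geom_bar(stat = "identity") +
  labs(x = "Event type", y = "Economic damage value")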
The figure above shows that floods are the most harmful event with respect to economic consequences: they caused damage worth 150319678250 US dollars in total. The other events in the top five are hurricane/typhoon, tornado, storm surge and hail, each of which caused much less damage than floods.
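A total of this size is easier to read with thousands separators or expressed in billions. The snippet below is a small formatting sketch applied to the Flood total reported above; the variable names `flood_total`, `with_commas` and `in_billions` are illustrative.

```r
# Flood total reported above
flood_total <- 150319678250

# With thousands separators
with_commas <- formatC(flood_total, format = "f", digits = 0, big.mark = ",")
with_commas

# In billions, rounded to one decimal place
in_billions <- round(flood_total / 1e9, 1)
in_billions
```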