Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This project aims at exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We found that tornado is the most detrimental in terms of both population health and economic consequences.
First, we download the data from Storm Data. We load the data into R as follows:
setwd('/Users/chengnie/Dropbox/code/R/coursera/05_ReproducibleResearch')
storm <- read.csv('./data/repdata-data-StormData.csv.bz2', header = T)
We are interested in the following 2 questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
We found in Storm Data Documentation keep the detriment to human population in terms of fatalties and injuries, we would use a total of the fatalties and injuries as the measure to evaluate the impact of different event types.
library(dplyr)
storm %>%
mutate(fat_inj = FATALITIES + INJURIES) %>%
group_by(EVTYPE) %>%
summarize(total_fat_inj= sum(fat_inj)) %>%
arrange(desc(total_fat_inj))
## Source: local data frame [985 x 2]
##
## EVTYPE total_fat_inj
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
## .. ... ...
As we can see from the results, tornado is the most detrimental to population health measured by the total number of fatalties and injuries.
Since the dmage are kept in 4 columns: PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP. Of these 4 columns, the first two are for property damage and the exponent of such property damage. The last two are similar for croporation damage.
We need to translate the PROPDMGEXP and CROPDMGEXP accordingly so that we can calculate the total damage to compare different types of events.
# property damage
#################################################################################
# treat 1,2,3,4,5,6,7,8 as the power of ten
storm$PROPDMGEXP_DIGIT = 0
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% 1:8] = as.numeric(as.character(storm$PROPDMGEXP[storm$PROPDMGEXP %in% 1:8]))
# H or h into 2 (hundreds), K into 3 (thousands), M or m into 6 (millions), B into 9 (billions)
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% c('H','h')] = 2
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% c('K','k')] = 3
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% c('M','m')] = 6
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% c('B','b')] = 9
# Get the total property damage value
storm$PROPDMG_NUM = storm$PROPDMG * 10^storm$PROPDMGEXP_DIGIT
# croporation damage
#################################################################################
# consider multiply PROPDMG with H/K/M/B hundread/thousand/million/billion
storm$CROPDMGEXP_DIGIT = 0
storm$CROPDMGEXP_DIGIT[storm$CROPDMGEXP %in% c('2')] = 2
storm$CROPDMGEXP_DIGIT[storm$CROPDMGEXP %in% c('K','k')] = 3
storm$CROPDMGEXP_DIGIT[storm$CROPDMGEXP %in% c('M','m')] = 6
storm$CROPDMGEXP_DIGIT[storm$CROPDMGEXP %in% c('B')] = 9
# get total crop damage number
storm$CROPDMG_NUM = storm$CROPDMG * 10^storm$CROPDMGEXP_DIGIT
# use sume of prop and crop damage as the measure
# One grader picked up the error here. I am really grateful for his(her) effort of looking at my code in such great detail.
# storm$DMG_NUM = storm$PROPDMG + storm$CROPDMG
storm$DMG_NUM = storm$PROPDMG_NUM + storm$CROPDMG_NUM
toPlot <- storm %>%
group_by(EVTYPE) %>%
summarize(total_dmg= sum(DMG_NUM)) %>%
arrange(desc(total_dmg))
head(toPlot, 10)
## Source: local data frame [10 x 2]
##
## EVTYPE total_dmg
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57362333946
## 4 STORM SURGE 43323541000
## 5 HAIL 18761221986
## 6 FLASH FLOOD 18243991078
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
As we can see, tornado is still flood is number one in terms of total damage. We can also show it through plot as follows. We plot only the top 5 event types.
toPlotSub <- toPlot[1:5,]
# with(toPlotSub, barplot(total_dmg, main = "Top 5 event types ranked by total dmage", ylab = "Total damage in dollars", xlab = "Types of events"))
library(ggplot2)
qplot(EVTYPE, data=toPlotSub, geom="bar", weight = total_dmg, binwidth = 1)+
scale_y_continuous("Number of Fatalities") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Severe Weather Type") +
ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.")
# http://rpubs.com/pjmorenoa/peerassessment2
options(scipen = 1) # Turn off scientific notations for numbers
library(ggplot2)
qplot(EVTYPE, data=toPlotSub, geom="bar", weight = total_dmg, binwidth = 1)+
scale_y_continuous("Number of Fatalities") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("Severe Weather Type") +
ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.")
# put the URL in the download
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
datafile <- "StormData.csv.bz2"
download.file(fileUrl, datafile)
stormdata <- read.csv(datafile)
# http://rpubs.com/pavelanni/121498 Great submission
# get the top records
stormdata.health.top <- stormdata %>%
group_by(EVTYPE) %>%
summarise(Injuries = sum(INJURIES), Fatalities = sum(FATALITIES)) %>%
filter(Injuries != 0 & Fatalities != 0) %>%
arrange(desc(Fatalities+Injuries)) %>%
head(10)
# According to this
# https://github.com/flyingdisc/RepData_PeerAssessment2/blob/master/how-to-handle-PROPDMGEXP.md
# let's make a conversion table for PRODDMGEXP and CROPDMGEXP
exptonum <- function(x) {
switch(as.character(x),
"h" = 100,
"H" = 100,
"k" = 1000,
"K" = 1000,
"m" = 1000000,
"M" = 1000000,
"B" = 1000000000,
"b" = 1000000000,
"+" = 1,
"-" = 0,
"?" = 0,
"0" = 10,
"1" = 10,
"2" = 10,
"3" = 10,
"4" = 10,
"5" = 10,
"6" = 10,
"7" = 10,
"8" = 10,
" " = 0,
0)
}
stormdata.cash <- stormdata %>%
rowwise() %>%
mutate(PROPDMGCASH = PROPDMG * exptonum(PROPDMGEXP),
CROPDMGCASH = CROPDMG * exptonum(CROPDMGEXP))
# clever use of transformation
stormData <- stormData %>% mutate(totalCost = PROPDMG*( (PROPDMGEXP == "K")*1000 +
(PROPDMGEXP == "M")*1000000 +
(PROPDMGEXP == "m")*1000000 +
(PROPDMGEXP == "B")*1000000000) +
CROPDMG*( (CROPDMGEXP == "K")*1000 +
(CROPDMGEXP == "k")*1000 +
(CROPDMGEXP == "M")*1000000 +
(CROPDMGEXP == "m")*1000000 +
(CROPDMGEXP == "B")*1000000000))