Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This project aims at exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. We found that tornado is the most detrimental in terms of both population health and economic consequences.

Data Processing

First, we download the data from Storm Data. We load the data into R as follows:

setwd('/Users/chengnie/Dropbox/code/R/coursera/05_ReproducibleResearch')
storm <- read.csv('./data/repdata-data-StormData.csv.bz2', header = T)

Research Questions

We are interested in the following 2 questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Results

Research Question 1

We found in Storm Data Documentation keep the detriment to human population in terms of fatalties and injuries, we would use a total of the fatalties and injuries as the measure to evaluate the impact of different event types.

library(dplyr)
storm %>% 
    mutate(fat_inj = FATALITIES + INJURIES) %>%
    group_by(EVTYPE) %>% 
    summarize(total_fat_inj= sum(fat_inj)) %>%
    arrange(desc(total_fat_inj)) 
## Source: local data frame [985 x 2]
## 
##               EVTYPE total_fat_inj
## 1            TORNADO         96979
## 2     EXCESSIVE HEAT          8428
## 3          TSTM WIND          7461
## 4              FLOOD          7259
## 5          LIGHTNING          6046
## 6               HEAT          3037
## 7        FLASH FLOOD          2755
## 8          ICE STORM          2064
## 9  THUNDERSTORM WIND          1621
## 10      WINTER STORM          1527
## ..               ...           ...

As we can see from the results, tornado is the most detrimental to population health measured by the total number of fatalties and injuries.

Research Question 2

Since the dmage are kept in 4 columns: PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP. Of these 4 columns, the first two are for property damage and the exponent of such property damage. The last two are similar for croporation damage.

We need to translate the PROPDMGEXP and CROPDMGEXP accordingly so that we can calculate the total damage to compare different types of events.

# property damage 
#################################################################################

# treat 1,2,3,4,5,6,7,8 as the power of ten

storm$PROPDMGEXP_DIGIT = 0
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% 1:8] = as.numeric(as.character(storm$PROPDMGEXP[storm$PROPDMGEXP %in% 1:8]))

# H or h into 2 (hundreds), K into 3 (thousands), M or m into 6 (millions), B into 9 (billions)

storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% c('H','h')] = 2
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% c('K','k')] = 3
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% c('M','m')] = 6
storm$PROPDMGEXP_DIGIT[storm$PROPDMGEXP %in% c('B','b')] = 9

# Get the total property damage value

storm$PROPDMG_NUM = storm$PROPDMG * 10^storm$PROPDMGEXP_DIGIT

# croporation damage
#################################################################################

# consider multiply PROPDMG with H/K/M/B hundread/thousand/million/billion

storm$CROPDMGEXP_DIGIT = 0
storm$CROPDMGEXP_DIGIT[storm$CROPDMGEXP %in% c('2')] = 2
storm$CROPDMGEXP_DIGIT[storm$CROPDMGEXP %in% c('K','k')] = 3
storm$CROPDMGEXP_DIGIT[storm$CROPDMGEXP %in% c('M','m')] = 6
storm$CROPDMGEXP_DIGIT[storm$CROPDMGEXP %in% c('B')] = 9

# get total crop damage number
storm$CROPDMG_NUM = storm$CROPDMG * 10^storm$CROPDMGEXP_DIGIT

# use sume of prop and crop damage as the measure

# One grader picked up the error here. I am really grateful for his(her) effort of looking at my code in such great detail. 
# storm$DMG_NUM = storm$PROPDMG + storm$CROPDMG
storm$DMG_NUM = storm$PROPDMG_NUM + storm$CROPDMG_NUM

toPlot <- storm %>% 
    group_by(EVTYPE) %>% 
    summarize(total_dmg= sum(DMG_NUM)) %>%
    arrange(desc(total_dmg))  

head(toPlot, 10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE    total_dmg
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57362333946
## 4        STORM SURGE  43323541000
## 5               HAIL  18761221986
## 6        FLASH FLOOD  18243991078
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041360

As we can see, tornado is still flood is number one in terms of total damage. We can also show it through plot as follows. We plot only the top 5 event types.

toPlotSub <- toPlot[1:5,]
# with(toPlotSub, barplot(total_dmg, main = "Top 5 event types ranked by total dmage", ylab = "Total damage in dollars", xlab = "Types of events"))

library(ggplot2)
qplot(EVTYPE, data=toPlotSub, geom="bar", weight = total_dmg, binwidth = 1)+
        scale_y_continuous("Number of Fatalities") + 
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
        xlab("Severe Weather Type") + 
        ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.")

Learn from others’ submission

# http://rpubs.com/pjmorenoa/peerassessment2
options(scipen = 1)  # Turn off scientific notations for numbers
library(ggplot2)
qplot(EVTYPE, data=toPlotSub, geom="bar", weight = total_dmg, binwidth = 1)+
        scale_y_continuous("Number of Fatalities") + 
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
        xlab("Severe Weather Type") + 
        ggtitle("Total Fatalities by Severe Weather\n Events in the U.S.")


# put the URL in the download
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
datafile <- "StormData.csv.bz2"
download.file(fileUrl, datafile)
stormdata <- read.csv(datafile)

# http://rpubs.com/pavelanni/121498 Great submission
# get the top records
stormdata.health.top <- stormdata %>% 
        group_by(EVTYPE) %>% 
        summarise(Injuries = sum(INJURIES), Fatalities = sum(FATALITIES)) %>%
        filter(Injuries != 0 & Fatalities != 0) %>%
        arrange(desc(Fatalities+Injuries)) %>%
        head(10)


# According to this 
# https://github.com/flyingdisc/RepData_PeerAssessment2/blob/master/how-to-handle-PROPDMGEXP.md
# let's make a conversion table for PRODDMGEXP and CROPDMGEXP

exptonum <- function(x) {
        switch(as.character(x), 
               "h" = 100,
               "H" = 100,
               "k" = 1000,
               "K" = 1000,
               "m" = 1000000,
               "M" = 1000000,
               "B" = 1000000000,
               "b" = 1000000000,
               "+" = 1,
               "-" = 0,
               "?" = 0,
               "0" = 10,
               "1" = 10,
               "2" = 10,
               "3" = 10,
               "4" = 10,
               "5" = 10,
               "6" = 10,
               "7" = 10,
               "8" = 10,
               " " = 0,
               0)
}

stormdata.cash <- stormdata %>% 
        rowwise() %>%
        mutate(PROPDMGCASH = PROPDMG * exptonum(PROPDMGEXP), 
               CROPDMGCASH = CROPDMG * exptonum(CROPDMGEXP))

# clever use of transformation
stormData <- stormData %>% mutate(totalCost = PROPDMG*( (PROPDMGEXP ==  "K")*1000 +
                                                        (PROPDMGEXP ==  "M")*1000000 +
                                                        (PROPDMGEXP ==  "m")*1000000 +
                                                        (PROPDMGEXP ==  "B")*1000000000) +
                                              CROPDMG*( (CROPDMGEXP ==  "K")*1000 +
                                                        (CROPDMGEXP ==  "k")*1000 +
                                                        (CROPDMGEXP ==  "M")*1000000 +
                                                        (CROPDMGEXP ==  "m")*1000000 +
                                                        (CROPDMGEXP ==  "B")*1000000000))