Synopsis

This report identifies which types of severe weather events in the United States are most harmful. It addresses two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

To investigate these questions, we obtained storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA). The events in the database start in 1950 and end in November 2011. From these data, we found that tornadoes account for the largest total numbers of both fatalities and injuries, and that floods cause the greatest economic damage.

Data Processing

From the Reproducible Research course web site we obtained data on characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

We first read the data from a comma-separated-value (CSV) file compressed with the bzip2 algorithm.

con <- bzfile("repdata-data-StormData.csv.bz2")
open(con, "r")
data <- read.csv(con)
close(con)
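
As an aside, read.csv() can also be given the path of a bzip2-compressed file directly, because R detects the compression when it opens a file for reading. The following is a minimal sketch on a toy file; the file and its contents are made up for illustration and are not the NOAA data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data only).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# Pass the path straight to read.csv(); no explicit connection is needed.
toy <- read.csv(tmp)
toy
```

Either approach should give the same data frame; the explicit connection used in this report simply makes the decompression step visible.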

After reading the data, we convert it to a tbl so that we can use it with the dplyr package.

library(dplyr)
data <- tbl_df(data)

Results

Across the United States, which types of events are most harmful with respect to population health?

Fatalities

First we create a fatalities.data tbl containing the 10 event types with the largest total numbers of fatalities.

fatalities.data <- group_by(data, EVTYPE) %>%
        summarize(total.fatalities = sum(FATALITIES)) %>%
        arrange(desc(total.fatalities)) %>%
        top_n(10, total.fatalities)

Then we print fatalities.data.

fatalities.data
## Source: local data frame [10 x 2]
## 
##            EVTYPE total.fatalities
##            (fctr)            (dbl)
## 1         TORNADO             5633
## 2  EXCESSIVE HEAT             1903
## 3     FLASH FLOOD              978
## 4            HEAT              937
## 5       LIGHTNING              816
## 6       TSTM WIND              504
## 7           FLOOD              470
## 8     RIP CURRENT              368
## 9       HIGH WIND              248
## 10      AVALANCHE              224

As we can see, tornadoes have the largest total number of fatalities (5633), followed by excessive heat and flash floods with 1903 and 978 fatalities respectively. Avalanches rank 10th with 224 fatalities.
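
The same group, sum, and rank logic can be cross-checked without dplyr. The following is a minimal base R sketch on a toy data frame; the values are invented for illustration, and only the column names EVTYPE and FATALITIES come from the storm data.

```r
# Toy stand-in for the storm data (values are made up for illustration).
toy <- data.frame(
        EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
        FATALITIES = c(5, 1, 2, 4, 0)
)

# Sum fatalities per event type, sort in decreasing order, keep at most 10.
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
top <- head(sort(totals, decreasing = TRUE), 10)
top
```

Applied to the full data set, this should reproduce the ranking printed above, which makes it a cheap sanity check on the pipeline.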

We can construct a barplot to visualize our data.

library(ggplot2)
qplot(EVTYPE, data = fatalities.data, geom = "bar", weight = total.fatalities,
      xlab = "Type of events", ylab = "Total number of fatalities",
      main = "Top 10 events that have the biggest total numbers of fatalities") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
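
A bar plot built this way orders the bars by the levels of EVTYPE, which are alphabetical, rather than by total. One possible refinement, sketched here under the assumption that a ranked plot is wanted, is to reorder the levels by the total before plotting. The three rows are copied from the fatalities table above; the ggplot2 call is an alternative to qplot and is skipped if the package is not installed.

```r
# Three rows from the fatalities table, deliberately out of rank order.
toy <- data.frame(
        EVTYPE = c("AVALANCHE", "TORNADO", "FLOOD"),
        total.fatalities = c(224, 5633, 470)
)

# reorder() sorts the levels by a score; negating it puts the largest first.
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$total.fatalities)
levels(toy$EVTYPE)

# Optional: build the ranked bar plot only if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
        p <- ggplot2::ggplot(toy, ggplot2::aes(x = EVTYPE, y = total.fatalities)) +
                ggplot2::geom_col() +
                ggplot2::labs(x = "Type of events", y = "Total number of fatalities")
        # print(p) would draw the plot
}
```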

Injuries

We repeat the analysis for injuries by creating an injuries.data tbl containing the 10 event types with the largest total numbers of injuries.

injuries.data <- group_by(data, EVTYPE) %>%
        summarize(total.injuries = sum(INJURIES)) %>%
        arrange(desc(total.injuries)) %>%
        top_n(10, total.injuries)

Then we print injuries.data.

injuries.data
## Source: local data frame [10 x 2]
## 
##               EVTYPE total.injuries
##               (fctr)          (dbl)
## 1            TORNADO          91346
## 2          TSTM WIND           6957
## 3              FLOOD           6789
## 4     EXCESSIVE HEAT           6525
## 5          LIGHTNING           5230
## 6               HEAT           2100
## 7          ICE STORM           1975
## 8        FLASH FLOOD           1777
## 9  THUNDERSTORM WIND           1488
## 10              HAIL           1361

As we can see, tornadoes have the largest total number of injuries (91346), followed by TSTM WIND and floods with 6957 and 6789 injuries respectively. Hail ranks 10th with 1361 injuries.
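
The table lists TSTM WIND and THUNDERSTORM WIND as separate rows, although TSTM is an abbreviation of thunderstorm. The analysis in this report keeps the labels as recorded. A possible refinement, sketched here as an assumption rather than as part of the method used above, is to expand the abbreviation before summing. The three rows are copied from the injuries table.

```r
# Three rows from the injuries table above.
toy <- data.frame(
        EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "FLOOD"),
        total.injuries = c(6957, 1488, 6789)
)

# Expand the abbreviation so both labels collapse into one event type.
toy$EVTYPE <- sub("^TSTM", "THUNDERSTORM", toy$EVTYPE)

# Re-sum by the cleaned label: 6957 + 1488 = 8445 for thunderstorm wind.
merged <- tapply(toy$total.injuries, toy$EVTYPE, sum)
merged[["THUNDERSTORM WIND"]]
```

Merging the two labels would not change which event type ranks first, but it would widen the gap between thunderstorm wind and flood.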

We can construct a barplot to visualize our data.

qplot(EVTYPE, data = injuries.data, geom = "bar", weight = total.injuries,
      xlab = "Type of events", ylab = "Total number of injuries",
      main = "Top 10 events that have the biggest total numbers of injuries") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

Across the United States, which types of events have the greatest economic consequences?

The database contains estimates of both property damage and crop damage. The variables PROPDMG and CROPDMG hold the numeric part of each estimate, while PROPDMGEXP and CROPDMGEXP hold a character code for its magnitude, e.g., “K” for thousands, “M” for millions, and “B” for billions. To convert PROPDMGEXP and CROPDMGEXP into a convenient form, we create an extract.magnitude function that takes a magnitude code and returns the corresponding power of ten. Empty values and the symbols “-”, “?”, and “+” are treated as a power of zero, and digit codes are used as the power directly.

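extract.magnitude <- function(x) {
        # Map a single magnitude code to its power of ten.
        if (x %in% c("", "-", "?", "+")) {
                0       # missing or unclear code: multiply by 10^0 = 1
        } else if (x == "B") {
                9       # billions
        } else if (x %in% c("h", "H")) {
                2       # hundreds
        } else if (x %in% c("k", "K")) {
                3       # thousands
        } else if (x %in% c("m", "M")) {
                6       # millions
        } else {
                as.numeric(as.character(x))     # digit codes are the power itself
        }
}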
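
Because extract.magnitude handles one value at a time, it has to be applied with sapply over every row. A vectorized alternative is sketched here using a named lookup vector. The names magnitude.lookup and extract.magnitude.vec are hypothetical, and the sketch makes one assumption that differs from the function above: any unrecognised code maps to 0 instead of NA.

```r
# Lookup table from letter code to power of ten (hypothetical helper).
magnitude.lookup <- c(h = 2, H = 2, k = 3, K = 3, m = 6, M = 6, B = 9)

extract.magnitude.vec <- function(x) {
        x <- as.character(x)                    # works for factors and strings
        mag <- unname(magnitude.lookup[x])      # NA where the code is not a letter above
        digits <- grepl("^[0-9]$", x)           # single-digit codes
        mag[digits] <- as.numeric(x[digits])    # a digit is used as the power directly
        mag[is.na(mag)] <- 0                    # assumption: everything else becomes 0
        mag
}

extract.magnitude.vec(c("K", "m", "B", "", "5", "?", "h"))
```

On a column with several hundred thousand rows, a lookup of this kind avoids one function call per row, which is the main reason to consider it.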

Then we create a damage.data tbl containing the 10 event types with the greatest total damage, where the damage for each record is the sum of its property damage and crop damage.

damage.data <- mutate(data, PROPDMGEXP = sapply(PROPDMGEXP, extract.magnitude),
                      CROPDMGEXP = sapply(CROPDMGEXP, extract.magnitude),
                      damage = PROPDMG * 10^PROPDMGEXP + CROPDMG * 10^CROPDMGEXP) %>%
        group_by(EVTYPE) %>%
        summarize(total.damage = sum(damage)) %>%
        arrange(desc(total.damage)) %>%
        top_n(10, total.damage)
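
To make the damage formula concrete, here is the arithmetic for a single hypothetical record. The numbers are invented for illustration and are not taken from the database.

```r
# One hypothetical record: 25K in property damage and 1.5M in crop damage.
propdmg <- 25
propdmg.mag <- 3        # "K" means 10^3
cropdmg <- 1.5
cropdmg.mag <- 6        # "M" means 10^6

# Same formula as in the mutate() call above.
damage <- propdmg * 10^propdmg.mag + cropdmg * 10^cropdmg.mag
damage                  # 25,000 + 1,500,000 = 1,525,000 dollars
```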

We take a look at damage.data.

damage.data
## Source: local data frame [10 x 2]
## 
##               EVTYPE total.damage
##               (fctr)        (dbl)
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57362333946
## 4        STORM SURGE  43323541000
## 5               HAIL  18761221986
## 6        FLASH FLOOD  18243991078
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041360

As we can see, floods cause the greatest total damage at $150,319,678,257, followed by hurricanes/typhoons and tornadoes with $71,913,712,800 and $57,362,333,946 respectively. Ice storms rank 10th with $8,967,041,360.

We can construct a barplot to visualize our data.

qplot(EVTYPE, data = damage.data, geom = "bar", weight = total.damage,
      xlab = "Type of events", ylab = "Total damage amounts",
      main = "Top 10 events that have the greatest total damage amounts") +
        theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))