Peer Assessment 2 for Reproducible Research

Synopsis

Using data from the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database, we attempt to determine which types of storms are most dangerous in terms of fatalities and injuries, and which types of storms are most destructive in terms of damage to property and crops.

Based on the data, it is clear that tornadoes are by far the most dangerous storms in terms of fatalities and injuries. Hurricanes, storm surges, droughts, and floods cause the most property and crop damage.

Data Processing

This analysis uses the following R packages.

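# Load plyr before dplyr so that dplyr's versions of shared functions
# (such as arrange() and summarize()) take precedence
library(plyr)
library(dplyr)
library(ggplot2)
library(reshape2)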

We download and read the NOAA storm data as follows:

# Create a local data directory if needed
if (!file.exists("./data")) {
    dir.create("./data")
}

# Download the compressed CSV only if it is not already cached locally
if (!file.exists("./data/repdata-data-StormData.csv.bz2")) {
    dataUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(dataUrl, destfile="./data/repdata-data-StormData.csv.bz2", method="libcurl")
}

# Read the CSV directly through a bzip2 connection
storm <- read.csv(bzfile("./data/repdata-data-StormData.csv.bz2"))

There is a lot of data in this dataset: over 900,000 rows (one per recorded weather event) and 37 columns.

dim(storm)
## [1] 902297     37

We are interested in which types of weather events are most destructive in terms of (1) injuries and fatalities, and (2) property and crop damage. Let’s get rid of the columns we don’t need.

storm <- select(storm, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(storm)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

Since we want to calculate various totals by type of weather event, we’ll group the data accordingly.

storm_by_evtype <- group_by(storm, EVTYPE)

Fatalities and Injuries

To analyze fatalities and injuries caused by storms, we’ll create a new dataframe with a “long” structure that will facilitate summing and plotting injuries and fatalities by storm type.

storm_fatalities_and_injuries <- select(storm_by_evtype, EVTYPE, FATALITIES, INJURIES)
storm_fatalities_and_injuries <- melt(storm_fatalities_and_injuries, id = "EVTYPE")
head(storm_fatalities_and_injuries)
##    EVTYPE   variable value
## 1 TORNADO FATALITIES     0
## 2 TORNADO FATALITIES     0
## 3 TORNADO FATALITIES     0
## 4 TORNADO FATALITIES     0
## 5 TORNADO FATALITIES     0
## 6 TORNADO FATALITIES     0
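To make the reshaping step concrete, here is a minimal sketch of what melt() does, run on a hypothetical two-row data frame with made-up values. Each non-id column becomes a block of rows labelled by variable, which is why a single sum over value later yields combined casualties.

```r
library(reshape2)

# Hypothetical toy data in the same "wide" shape as the casualty columns
wide <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD"),
    FATALITIES = c(1, 0),
    INJURIES = c(5, 2),
    stringsAsFactors = FALSE
)

# Stack FATALITIES and INJURIES into variable/value pairs keyed by EVTYPE;
# the result has 2 event types x 2 measures = 4 rows
long <- melt(wide, id.vars = "EVTYPE")
print(long)
```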

Let’s identify the 10 most dangerous types of weather events based on fatalities and injuries.

top_ten <- storm_fatalities_and_injuries %>%
    ddply(.(EVTYPE), summarize, totalCasualties = sum(value)) %>%
    arrange(desc(totalCasualties)) %>%
    slice(1:10)
top_ten
##               EVTYPE totalCasualties
## 1            TORNADO           96979
## 2     EXCESSIVE HEAT            8428
## 3          TSTM WIND            7461
## 4              FLOOD            7259
## 5          LIGHTNING            6046
## 6               HEAT            3037
## 7        FLASH FLOOD            2755
## 8          ICE STORM            2064
## 9  THUNDERSTORM WIND            1621
## 10      WINTER STORM            1527
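The pipeline above mixes plyr's ddply() with dplyr verbs. As a sketch of a swapped-in alternative (not the method used in this analysis), the same ranking can be written with dplyr alone via group_by() and summarise(). The toy data and the name top_two are hypothetical.

```r
library(dplyr)

# Hypothetical long-format toy data shaped like storm_fatalities_and_injuries
casualties <- data.frame(
    EVTYPE = c("TORNADO", "TORNADO", "HEAT", "HEAT", "FLOOD", "FLOOD"),
    variable = rep(c("FATALITIES", "INJURIES"), times = 3),
    value = c(5, 40, 3, 10, 1, 20),
    stringsAsFactors = FALSE
)

# Total casualties per event type, largest first; the analysis keeps 10 rows,
# this toy example keeps 2
top_two <- casualties %>%
    group_by(EVTYPE) %>%
    summarise(totalCasualties = sum(value)) %>%
    arrange(desc(totalCasualties)) %>%
    slice(1:2)
print(top_two)
```

Using a single package avoids the masking issues that arise when plyr and dplyr are loaded together.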

A picture is worth a thousand words.

ss <- filter(storm_fatalities_and_injuries, EVTYPE %in% top_ten$EVTYPE)
ss$EVTYPE <- factor(ss$EVTYPE, levels = rev(top_ten$EVTYPE), ordered = TRUE)
p <- ggplot(ss, aes(EVTYPE, value, fill=variable)) + 
    geom_bar(stat = 'identity') + 
    ggtitle("Most Dangerous Storm Types") +
    xlab("Storm Type") +
    ylab("Number of Injuries and Fatalities") +
    labs(fill = "") +
    coord_flip()
p

The plot shows the 10 storm types with the most combined fatalities and injuries, with each bar split into fatalities and injuries.

Damage to Property and Crops

Property damage values are stored in two columns, one containing a number and one containing a letter like “H” for hundreds, “K” for thousands, “M” for millions, or “B” for billions. So the combination of, say, 25 and “M” would mean $25,000,000. Crop damage values are stored similarly. We need to convert the property and crop damage data to dollars before we can sum them.

The rest of the processing follows the same steps we used for fatalities and injuries.

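# Convert a damage amount and its exponent code to dollars.
# Codes other than h/H, k/K, m/M and b/B return NA unless the amount is zero.
toDollars <- function (num, alpha) {
    # Zero damage is zero dollars, whatever the exponent code
    if (num == 0) {
        return(num)
    }

    if (alpha %in% c("h", "H")) {
        return(num * 100)  # hundreds
    }

    if (alpha %in% c("k", "K")) {
        return(num * 1e3)  # thousands
    }

    if (alpha %in% c("m", "M")) {
        return(num * 1e6)  # millions
    }

    if (alpha %in% c("b", "B")) {
        return(num * 1e9)  # billions
    }

    # Unrecognized exponent code (e.g. "", "+", "?" or a digit)
    return(NA)
}

# Vectorize() wraps toDollars() in mapply() so it can take whole columns
vToDollars <- Vectorize(toDollars)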
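Because toDollars() handles one value at a time, vToDollars() calls it once per row, which may be slow on a dataset this size. A minimal sketch of a fully vectorized alternative uses a named lookup vector instead; multipliers and damage_to_dollars are hypothetical names, not part of the original analysis. It is meant to mirror toDollars(): zero stays zero, unrecognized codes give NA, and 25 with “M” becomes 25,000,000 as in the example above.

```r
# Multiplier for each recognized exponent code
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical alternative to vToDollars(): converts whole columns at once.
# As with toDollars(), zero damage stays zero and unrecognized codes give NA.
damage_to_dollars <- function(num, exp_code) {
    # Look up multipliers by upper-cased code; unmatched names (including "") give NA
    mult <- unname(multipliers[toupper(as.character(exp_code))])
    ifelse(num == 0, 0, num * mult)
}

damage_to_dollars(c(25, 2.5, 0, 7), c("M", "k", "", "?"))
```

In principle it could stand in for the vToDollars() calls below, e.g. damage_to_dollars(storm_damage$PROPDMG, storm_damage$PROPDMGEXP).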

storm_damage <- select(storm_by_evtype, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
storm_damage$PropertyDamage <- vToDollars(storm_damage$PROPDMG, storm_damage$PROPDMGEXP)
storm_damage$CropDamage <- vToDollars(storm_damage$CROPDMG, storm_damage$CROPDMGEXP)

storm_damage <- select(storm_damage, EVTYPE, PropertyDamage, CropDamage)
storm_damage <- melt(storm_damage, id = "EVTYPE")
head(storm_damage)
##    EVTYPE       variable value
## 1 TORNADO PropertyDamage 25000
## 2 TORNADO PropertyDamage  2500
## 3 TORNADO PropertyDamage 25000
## 4 TORNADO PropertyDamage  2500
## 5 TORNADO PropertyDamage  2500
## 6 TORNADO PropertyDamage  2500

Now we’ll identify the top 10 most destructive types of weather events based on property damage and crop damage.

top_ten <- storm_damage %>%
    ddply(.(EVTYPE), summarize, totalDamage = sum(value)) %>%
    arrange(desc(totalDamage)) %>%
    slice(1:10)
top_ten
##               EVTYPE totalDamage
## 1  HURRICANE/TYPHOON 71913712800
## 2        STORM SURGE 43323541000
## 3            DROUGHT 15018672000
## 4          HURRICANE 14610229010
## 5        RIVER FLOOD 10148404500
## 6     TROPICAL STORM  8382236550
## 7           WILDFIRE  5060586800
## 8   STORM SURGE/TIDE  4642038000
## 9     HURRICANE OPAL  3191846000
## 10  WILD/FOREST FIRE  3108626330
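These totals depend on how missing values are handled. toDollars() returns NA for a nonzero amount whose exponent code is not h, k, m or b, and sum() returns NA if any of its inputs is NA. Since arrange() sorts NA totals to the end, an event type with even one such record would drop out of this table regardless of its other damage values. A minimal sketch on made-up data (using dplyr's group_by()/summarise() in place of ddply()) shows the effect and one possible remedy, sum(..., na.rm = TRUE), which ignores unparseable amounts instead of discarding the whole event type.

```r
library(dplyr)

# Hypothetical toy data: one FLOOD amount is NA, as toDollars() would return
# for an unrecognized exponent code such as "+" or "?"
damage <- data.frame(
    EVTYPE = c("FLOOD", "FLOOD", "HAIL", "DROUGHT"),
    value = c(5e9, NA, 1e6, 2e9),
    stringsAsFactors = FALSE
)

# Default na.rm = FALSE: FLOOD's total is NA and arrange() sorts it last
default_rank <- damage %>%
    group_by(EVTYPE) %>%
    summarise(totalDamage = sum(value)) %>%
    arrange(desc(totalDamage))

# Dropping NA inside sum() keeps FLOOD's known damage in the ranking
na_rm_rank <- damage %>%
    group_by(EVTYPE) %>%
    summarise(totalDamage = sum(value, na.rm = TRUE)) %>%
    arrange(desc(totalDamage))
```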

Let’s plot the results.

ss <- filter(storm_damage, EVTYPE %in% top_ten$EVTYPE)
ss$EVTYPE <- factor(ss$EVTYPE, levels = rev(top_ten$EVTYPE), ordered = TRUE)
p <- ggplot(ss, aes(EVTYPE, value, fill=variable)) + 
    geom_bar(stat = 'identity') + 
    ggtitle("Most Destructive Storm Types") +
    xlab("Storm Type") +
    ylab("Total Damage (US dollars)") +
    labs(fill = "") +
    coord_flip()
p

The plot shows the 10 storm types with the most combined property and crop damage, with each bar split into property and crop damage.

Results

A couple of caveats are in order before we draw conclusions from this data.

First, the data is messy. In particular, the EVTYPE variable, which is central to our analysis, is inconsistent and contains some suspect values. A more thorough analysis would require cleaning this data. However, because we’re aggregating data and looking at only the top 10 most dangerous or destructive types of storms, my judgement is that the inconsistencies in EVTYPE won’t affect the overall results. This is strictly a judgement call, not one based on rigorous analysis.
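As an illustration of what cleaning might involve, the top 10 tables above contain near-duplicate labels such as TSTM WIND and THUNDERSTORM WIND, WILDFIRE and WILD/FOREST FIRE, and STORM SURGE and STORM SURGE/TIDE. A minimal sketch, assuming each of those pairs describes the same kind of event: normalize case and whitespace, then map known variants to one label. evtype_map and normalize_evtype are hypothetical, and a real mapping would need many more entries.

```r
# Hypothetical mapping from variant labels to a single canonical label
evtype_map <- c(
    "TSTM WIND" = "THUNDERSTORM WIND",
    "WILD/FOREST FIRE" = "WILDFIRE",
    "STORM SURGE/TIDE" = "STORM SURGE"
)

# Upper-case and trim each label, then replace known variants; labels not in
# the map are returned unchanged
normalize_evtype <- function(evtype) {
    x <- toupper(trimws(as.character(evtype)))
    mapped <- unname(evtype_map[x])
    ifelse(is.na(mapped), x, mapped)
}

normalize_evtype(c(" tstm wind", "WILD/FOREST FIRE", "STORM SURGE/TIDE", "TORNADO"))
```

Applied before grouping, e.g. storm$EVTYPE <- normalize_evtype(storm$EVTYPE), this would merge the variants' totals into one row each.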

Second, the property and crop damage values are dollar amounts reported between 1950 and 2011. It might be appropriate to adjust for inflation, especially if we wanted to analyze changes over time. But since we’re aggregating and just trying to identify the most destructive types of storms, I’ll argue that not adjusting for inflation is unlikely to affect the overall results. Again, this is a judgement call that I can’t fully back up.
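If we did want to adjust for inflation, a common approach is to scale each amount by the ratio of a price index in the target year to the index in the event's year. A minimal sketch, assuming a consumer price index series supplied by the caller and an event year taken from the raw file's BGN_DATE column (which the select() step above drops). adjust_for_inflation is a hypothetical helper, and the index values below are made up, not real CPI data.

```r
# Hypothetical helper: express nominal dollar amounts in target-year dollars.
# cpi is a numeric vector of price-index values named by year.
adjust_for_inflation <- function(amount, year, cpi, target_year = 2011) {
    amount * cpi[[as.character(target_year)]] / unname(cpi[as.character(year)])
}

# Made-up index values, for illustration only (not real CPI data)
toy_cpi <- c("1950" = 25, "2011" = 225)

# $1,000 in 1950 scales by 225 / 25; $1,000 in 2011 is unchanged
adjust_for_inflation(c(1000, 1000), c(1950, 2011), toy_cpi)
```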

With those caveats in mind, what conclusions can we draw?

The types of weather events that kill and injure people are not the same as those that cause the most property and crop damage. Tornadoes are far and away the most dangerous storms, causing roughly an order of magnitude more deaths and injuries than any other type.
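The order-of-magnitude gap can be checked with quick arithmetic on the two largest totals from the casualty table (tornado and excessive heat). Because top_ten was later reassigned to the damage ranking, the values are typed in by hand here.

```r
# Totals copied from the casualty top 10 table
tornado_total <- 96979
excessive_heat_total <- 8428

# Ratio of the largest to the second-largest total
casualty_ratio <- tornado_total / excessive_heat_total
round(casualty_ratio, 1)  # 11.5, i.e. more than ten times
```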

Hurricanes, typhoons, and storm surges cause the most property damage, while droughts and floods cause the most crop damage. Of these, only floods appear among the top 10 storm types that injure and kill people.

Overall, the storms most dangerous to people tend to be violent but relatively small in geographic extent. The storms that are most destructive economically tend to be less violent but spread over larger geographic areas, and often involve too much or too little water.