Summary

In this report we analyze the impact of specific types of weather events on public health (injuries and fatalities) and on property and crops, based on the storm database collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from 1950 to 2011. We use the recorded estimates of fatalities, injuries, property damage, and crop damage to determine which event types are the most significant in each of these four areas.

Data Processing

#options to show all code in the report and turn off scientific notation
knitr::opts_chunk$set(echo = TRUE)
options(scipen = 1)
library(ggplot2)

Now we download and read the file (if it's not already available).

#download the file containing the data (if necessary)
if (!"StormData.csv.bz2" %in% dir()) {
    download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "StormData.csv.bz2")
}
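#read the file (if necessary); read.csv decompresses the .bz2 file transparently
#the file name must match the downloaded file exactly on case-sensitive file systems
if (!"stormData" %in% ls()) {
    stormData <- read.csv("StormData.csv.bz2")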
}

#Read the begin date of each record to get the year, creating a new column in the data frame
stormData$year <- as.numeric(format(as.Date(stormData$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
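
To illustrate how the format string is applied, here is a small sketch on two made-up date strings in the month/day/year layout that the format string expects. The name `sample_dates` and its values are hypothetical and are not read from the file.

```r
# Hypothetical date strings in the "%m/%d/%Y %H:%M:%S" layout (illustration only)
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse to Date (the time part is read and discarded), then keep the four-digit year
years <- as.numeric(format(as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S"), "%Y"))
years
```

A string that does not match the layout is parsed as `NA`, so counting `NA` values in the new column is a quick way to check that every record was read as intended.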

Results

Now we look at the top event types sorted by injuries and fatalities.

#we want to see the events with the most injuries and fatalities, so we add them up (by event type) and show the top 10 of each
topInjuryEvents <- as.matrix(head(sort(tapply(stormData$INJURIES, stormData$EVTYPE, sum), decreasing=TRUE),10))
topFatalityEvents <- as.matrix(head(sort(tapply(stormData$FATALITIES, stormData$EVTYPE, sum), decreasing=TRUE),10))

topInjuryEvents <- as.data.frame(topInjuryEvents)
names(topInjuryEvents) <- "Injuries"

topFatalityEvents <- as.data.frame(topFatalityEvents)
names(topFatalityEvents) <- "Fatalities"

topInjuryEvents
##                   Injuries
## TORNADO              91346
## TSTM WIND             6957
## FLOOD                 6789
## EXCESSIVE HEAT        6525
## LIGHTNING             5230
## HEAT                  2100
## ICE STORM             1975
## FLASH FLOOD           1777
## THUNDERSTORM WIND     1488
## HAIL                  1361
topFatalityEvents
##                Fatalities
## TORNADO              5633
## EXCESSIVE HEAT       1903
## FLASH FLOOD           978
## HEAT                  937
## LIGHTNING             816
## TSTM WIND             504
## FLOOD                 470
## RIP CURRENT           368
## HIGH WIND             248
## AVALANCHE             224

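Tornadoes cause by far the most injuries: 91,346, about 13 times the total for the next event type (TSTM WIND, 6,957). Tornadoes also cause the most fatalities (5,633), followed by excessive heat (1,903) and flash floods (978). Note that some hazards are split across several labels (for example TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT), so the totals per hazard are somewhat higher than the individual rows suggest.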

ggplot(topFatalityEvents, aes(x = rownames(topFatalityEvents), y = Fatalities)) + geom_bar(stat = "identity") + xlab("Event Type") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(topInjuryEvents, aes(x = rownames(topInjuryEvents), y = Injuries)) + geom_bar(stat = "identity") + xlab("Event Type") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
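
Mapping `rownames(...)` to the x axis makes ggplot2 arrange the bars alphabetically rather than by size. A minimal sketch of one way to order them by value follows, using the five largest fatality counts from the table above. The data frame `top_fatalities` is a hypothetical stand-in for `topFatalityEvents`, and the plot is only drawn if ggplot2 is installed.

```r
# Stand-in data: the five largest fatality totals from the table above
top_fatalities <- data.frame(
  Event = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  Fatalities = c(5633, 1903, 978, 937, 816)
)

# reorder() sorts the factor levels by the supplied values;
# negating the counts puts the largest bar first
top_fatalities$Event <- reorder(top_fatalities$Event, -top_fatalities$Fatalities)
levels(top_fatalities$Event)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top_fatalities, aes(x = Event, y = Fatalities)) +
    geom_col() +
    xlab("Event Type") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  print(p)
}
```

Ordering by value lets the reader compare neighbouring bars directly, which is usually what a top-10 chart is for.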

Economic impact (property and crop damage)

Property and crop damage estimates are each stored in two separate fields: one containing a number (e.g. 2.5) and the other containing a multiplier (e.g. hundreds, thousands, millions, or billions of dollars). This is explained, albeit inadequately, in the codebook, and it is open to some interpretation because certain values are ambiguous or strange. The codebook can be found at http://ire.org/nicar/database-library/databases/storm-events/ (click on “Record Layout” and read the entry for PROPDMGEXP).

unique(stormData$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

We assume that “M” or “m” indicates millions, “K” or “k” thousands, “H” or “h” hundreds, and “B” billions. A blank multiplier is taken to mean plain dollars. Symbols such as “?” and “-” and digits such as “4” also appear, and it is not clear whether, for instance, “8” indicates 10^8 (100 million) or something else, so we treat all other multipliers as 1. Fortunately, the vast majority of records use the hundreds, thousands and millions codes, which we believe we have interpreted correctly. The crop damage multipliers are similarly inconsistent, and we handle them in the same way.

#We have to deal with both M and m, K and k, etc.; forcing uppercase saves a few lines of code
stormData$PROPDMGEXP <- toupper(stormData$PROPDMGEXP)
stormData$CROPDMGEXP <- toupper(stormData$CROPDMGEXP)

#function we use to convert "B" to "1000000000" and "K" to "1000" as needed
mult <- function(t) {
    if (t == "B") 1e9
    else if (t == "M") 1e6
    else if (t == "K") 1e3
    else if (t == "H") 100
    else 1
}
#we apply the mult function to the entire data set and store the result in a new column
stormData$PropDmgMult <- sapply(stormData$PROPDMGEXP, mult)
stormData$CropDmgMult <- sapply(stormData$CROPDMGEXP, mult)

#now we multiply the storm damage by the multiplier to get the actual dollar amounts
stormData$PropDmgAmount <- stormData$PropDmgMult * stormData$PROPDMG
stormData$CropDmgAmount <- stormData$CropDmgMult * stormData$CROPDMG
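
Because `sapply` calls `mult` once per row, this step can be slow on a data set of this size. A minimal sketch of a vectorized alternative follows, assuming the same interpretation of the codes as above. The names `exp_lookup` and `to_multiplier` and the sample values are hypothetical and used only for illustration.

```r
# Named lookup vector with the same (assumed) interpretation as mult()
exp_lookup <- c(B = 1e9, M = 1e6, K = 1e3, H = 100)

to_multiplier <- function(exp_code) {
  # as.character() guards against factor input; toupper() folds "m" into "M"
  m <- unname(exp_lookup[toupper(as.character(exp_code))])
  # Codes missing from the lookup (blank, "?", "-", digits) index to NA; treat them as 1
  m[is.na(m)] <- 1
  m
}

# Illustrative values only, not taken from the data set
dmg  <- c(2.5, 25, 10, 3, 7)
code <- c("M", "K", "", "h", "?")
amounts <- dmg * to_multiplier(code)
amounts
```

Indexing a named vector handles the whole column in one step, and keeping the codes in a single lookup makes it easy to revisit the interpretation of the ambiguous values later.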

#we add up the dollar amounts by event type and look at the top 10
topPropDmg <- as.matrix(head(sort(tapply(stormData$PropDmgAmount, stormData$EVTYPE, sum), decreasing = TRUE), 10))
topCropDmg <- as.matrix(head(sort(tapply(stormData$CropDmgAmount, stormData$EVTYPE, sum), decreasing = TRUE), 10))

topPropDmg
##                           [,1]
## FLOOD             144657709807
## HURRICANE/TYPHOON  69305840000
## TORNADO            56937160779
## STORM SURGE        43323536000
## FLASH FLOOD        16140812067
## HAIL               15732267543
## HURRICANE          11868319010
## TROPICAL STORM      7703890550
## WINTER STORM        6688497251
## HIGH WIND           5270046295
topCropDmg
##                          [,1]
## DROUGHT           13972566000
## FLOOD              5661968450
## RIVER FLOOD        5029459000
## ICE STORM          5022113500
## HAIL               3025954473
## HURRICANE          2741910000
## HURRICANE/TYPHOON  2607872800
## FLASH FLOOD        1421317100
## EXTREME COLD       1292973000
## FROST/FREEZE       1094086000

Floods, hurricanes/typhoons, and tornadoes caused the most property damage. Several of the other top-10 entries are variations of these event types (Flash Flood, a separate Hurricane category, and Tropical Storm), so the totals for flooding and hurricanes are even higher than the individual rows suggest.
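
Because near-duplicate labels split one hazard across several rows, it can help to merge them before summing. The following is a rough sketch of one way to do that, applied to five of the property damage totals listed above. The grouping rules and the name `normalize_evtype` are assumptions made for illustration and are not part of the analysis above.

```r
normalize_evtype <- function(evtype) {
  # Standardize case and surrounding whitespace first
  x <- toupper(trimws(as.character(evtype)))
  # Assumed groupings: any label mentioning these words is merged into one category
  x[grepl("HURRICANE|TYPHOON", x)] <- "HURRICANE/TYPHOON"
  x[grepl("FLOOD", x)] <- "FLOOD"
  x[grepl("^(TSTM|THUNDERSTORM) WIND", x)] <- "THUNDERSTORM WIND"
  x
}

# Five of the property damage totals from the table above
prop <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "FLASH FLOOD", "HURRICANE"),
  PropDmgAmount = c(144657709807, 69305840000, 56937160779, 16140812067, 11868319010)
)

prop$group <- normalize_evtype(prop$EVTYPE)
grouped <- sort(tapply(prop$PropDmgAmount, prop$group, sum), decreasing = TRUE)
grouped
```

Whether flash floods and river floods belong in the same category as other floods is a judgment call, so any such grouping should be stated alongside the results.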

Drought caused the most crop damage (about $14.0 billion), more than twice the total for the next event type, flood (about $5.7 billion).