Purpose

The purpose of this report is to answer two questions using the NOAA Storm Database using data from the earliest recordings in 1950 to November 2011. Those questions are:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Synopsis of Findings

The analysis of this data shows a pareto-like (“80/20” rule) distribution of human impacts (fatalities and injuries) and economic impact by storm event type where the vast majority of economic and human inpact is concentrated in a small subset of the event types.

The most significant impacts on humans from these events comes from the following sources in this order and magnitude:

Event         Injuries and Fatalities   Percent of Total
  1. TORNADO 96,997 62.3%
  2. WIND 12,555 8.1%
  3. HEAT 12,362 7.9%
  4. FLOOD 10,125 6.5%
  5. LIGHTNING 6,049 3.9%

The most significant economic impacts come from the following storm events in this order and magnitude:

Event         Economic Impact           Percent of Total
  1. FLOOD $179,734,099,570 37.7%
  2. HURRICANE $90,271,397,810 18.9%
  3. TORNADO $57,357,351,690 12.0%
  4. STORM SURGE $43,323,541,000 9.1%
  5. HAIL $20,734,573,770 4.4%

It is important to remember that this analysis covers a very long period (1950-2011) and both the coverage in reporting and the quality of the reporting has changed dramatically over time (improved).

Recommended further studies:

Data Processing

Data Sources:

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data (47Mb)

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Download source data from NOAA…

#  cache is NOT set.  Ran into strange errors during testing.
#
#
sourceFile <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destFile <- "NOAAStormData.csv.bz2"
# next line commented out to avoid duplication
# download.file(sourceFile, destFile, method="curl")

dirList <- dir()
ifelse("NOAAStormData.csv.bz2" %in% dirList, print("Successful download"), print("Download problem"))
## [1] "Successful download"
## [1] "Successful download"
stormData <- read.csv(destFile)

require(data.table) # this will speed processing of this large data frame
## Loading required package: data.table
require(plyr)  # this will make it easier to query this large file and aggregate results
## Loading required package: plyr
stormDataDT <- data.table(stormData)  # converting to a data.table will speed queries

EVTypes <- sort(unique(stormDataDT$EVTYPE))

Clean up Weather/Storm Event Types…

#  the event type classification system raw data includes typos and levels of granularity too low for this report
#  clean up event classfications and rename EVTYPE to EVTYPECLEAN

stormDataDT$EVTYPECLEAN <- stormDataDT$EVTYPE

stormDataDT[grep("THUN", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "THUNDERSTORM"
stormDataDT[grep("THUD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "THUNDERSTORM"
stormDataDT[grep("TSTM", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "THUNDERSTORM"
stormDataDT[grep("TROPICAL", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "TROPICAL STORM"
stormDataDT[grep("TORNADO", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "TORNADO"
stormDataDT[grep("FLOOD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FLOOD"
stormDataDT[grep("WATER", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WATERSPOUT"
stormDataDT[grep("WINTER", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WINTER STORM"
stormDataDT[grep("WINTERY", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WINTERY MIX"
stormDataDT[grep("WINTRY", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WINTERY MIX"
stormDataDT[grep("SNOW", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "SNOW"
stormDataDT[grep("RECORD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "RECORD SETTING EVENT"
stormDataDT[grep("ICE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "ICE"
stormDataDT[grep("WIND", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WIND"
stormDataDT[grep("RAIN", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "RAIN"
stormDataDT[grep("PRECIPITATION", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "RAIN"
stormDataDT[grep("SHOWER", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "RAIN"
stormDataDT[grep("HAIL", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "HAIL"
stormDataDT[grep("HEAT", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "HEAT"
stormDataDT[grep("HOT", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "HEAT"
stormDataDT[grep("COOL", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "COOL"
stormDataDT[grep("WARM", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WARM"
stormDataDT[grep("COLD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "COLD"
stormDataDT[grep("DUST", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "DUST"
stormDataDT[grep("WET", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WET"
stormDataDT[grep("DRY", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "DRY"
stormDataDT[grep("BLIZZARD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "BLIZZARD"
stormDataDT[grep("FREEZ", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FREEZE"
stormDataDT[grep("FROST", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FREEZE"
stormDataDT[grep("SURF", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "SURF"
stormDataDT[grep("HURRICANE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "HURRICANE"
stormDataDT[grep("FIRE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FIRE"
stormDataDT[grep("FUNNEL", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FUNNEL"
stormDataDT[grep("TIDE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "TIDE"
stormDataDT[grep("LIGHTNING", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "LIGHTNING"
stormDataDT[grep("SUMMARY", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "IGNORE"
stormDataDT[grep("MUD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "MUDSLIDE"
stormDataDT[grep("VOLCANIC", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "VOLCANIC"
stormDataDT[grep("UNSEASON", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "UNSEASONABLE"
stormDataDT[grep("LANDSLIDE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "LANDSLIDE"
stormDataDT[grep("SWELLS", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WAVES"
stormDataDT[grep("URBAN", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "URBAN EVENT"

Start Data Analysis and summarize by Human Impact and Economic Impact…

# Begin analysis of the data
# Analysis on Human Impacts
stormDataHuman <- stormDataDT[,list(numEVTYPE = sum(EVTYPECLEAN),Fatalmean=mean(FATALITIES, na.rm=TRUE), Fatalsd=sd(FATALITIES,na.rm=TRUE), Fatalsum=sum(FATALITIES, na.rm=TRUE), InjuryMean=mean(INJURIES, na.rm=TRUE), Injurysd=sd(INJURIES, na.rm=TRUE), Injurysum=sum(INJURIES, na.rm=TRUE)), by=EVTYPECLEAN] # use EVTYPECLEAN to get aggregated Types

stormDataHuman$FatalandInjury <- stormDataHuman$Fatalsum + stormDataHuman$Injurysum
stormDataHumanSortFatal <- stormDataHuman[order(-Fatalsum)]  # sort by fatalities in descending order
stormDataHumanSortInjury <- stormDataHuman[order(-Injurysum)] # sort by injuries in descending order
stormDataHumanSortFatalandInjury <- stormDataHuman[order(-FatalandInjury)]

# Analysis on Economic Impacts
#   The economic data is captured in two columns.  They are: PROPDMG "Property Damage" & CROPDMG "Crop Damage"
#   The economic data needs to be converted into a common numeric format.  Field PROPDMGEXP and CROPDMGEXP use the notation of K for thousands of $'s, M ofr millions, and B for Billions - need to convert these values via  function

# not necessarily the fastest approach but it works.  Missing how to handle numeric chars, +, - , and ? symbols per NOAA code book.   
convert2dollars <- function(amount, unit) {
  unit <- tolower(unit)
  damageCost <- amount * 1000 # (to be confirmed) assumes the figures are in k unless otherwise noted. +, - symbols are unclear
  if(unit =="h") {return(amount * 1e2)}
  if(unit =="k") {return(amount * 1e3)}
  if(unit =="m") {return(amount * 1e6)}
  if(unit =="b") {return(amount * 1e9)}
  return(damageCost)
}

# this routine is slow relative to other approaches but it works.  Replace in a future upgrade.
# Process the Property Damage numbers to be in whole dollars
stormDataDT$PROPDMGDOLLARS <- mapply(convert2dollars, as.numeric(stormDataDT$PROPDMG), as.character(stormDataDT$PROPDMGEXP))
# Process the Crop Damage numbers to be in whole dollars
stormDataDT$CROPDMGDOLLARS <- mapply(convert2dollars, as.numeric(stormDataDT$CROPDMG), as.character(stormDataDT$CROPDMGEXP))

# order Economic Data
stormDataDT$YEAR <- format(as.Date(stormDataDT$BGN_DATE, format="%m/%d/%Y"),"%Y")  # this is to simplify later processing on year of event

stormDataEconomicExtract <- data.frame(YEAR=stormDataDT$YEAR, EVTYPECLEAN= stormDataDT$EVTYPECLEAN, PROPERTYDMGDOLLARS=stormDataDT$PROPDMGDOLLARS, CROPDMGDOLLARS=stormDataDT$CROPDMGDOLLARS )

stormDataEconomicExtractDT <- data.table(stormDataEconomicExtract)

stormDataEconomicExtractDT$TOTALECONOMIC <- stormDataEconomicExtractDT$PROPERTYDMGDOLLARS + stormDataEconomicExtractDT$CROPDMGDOLLARS

stormDataEconomicProperty <- stormDataEconomicExtractDT[order(-PROPERTYDMGDOLLARS)]
stormDataEconomicCrop <- stormDataEconomicExtractDT[order(-CROPDMGDOLLARS)]
stormDataEconomicTotal <- stormDataEconomicExtractDT[order(-TOTALECONOMIC)]

setkey(stormDataEconomicTotal, EVTYPECLEAN)  # note use of Event Type Clean version 

# note use of EVTYPECLEAN to group by hte cleaned up Event Types
stormDataEconomicSummary <- as.data.frame(stormDataEconomicTotal[, sum(TOTALECONOMIC, na.rm=TRUE), by = EVTYPECLEAN])
colnames(stormDataEconomicSummary)[2] <- "TOTALECONOMIC"
# sort the data by total economic damage by event type
stormDataEconomicSummarySorted <- stormDataEconomicSummary[order(-stormDataEconomicSummary$TOTALECONOMIC),]

Results

Construct Plots…

par(las=2) # make label text perpendicular to axis
par(mar=c(5,8,4,2)) # increase y-axis margin.
barplot(stormDataHumanSortFatalandInjury[1:10]$FatalandInjury/1e3, names.arg=stormDataHumanSortFatalandInjury[1:10,]$EVTYPECLEAN, main="Total Human Impact (Fatalities & Injuries)\n in thousands by Storm Event Type\n U.S.1950 to Nov 2011 Cumulative", horiz=TRUE, cex.names=0.8, axis.lty = 1, xlim=c(0,100))
grid(col="blue", lwd=2)

plot of chunk readData

# Economic Im
par(las=2) # make label text perpendicular to axis
par(mar=c(5,8,4,2)) # increase y-axis margin.
barplot(stormDataEconomicSummarySorted[1:10,]$TOTALECONOMIC/1e9, names.arg=stormDataEconomicSummarySorted[1:10,]$EVTYPECLEAN, main="Total Economic Impact in $B's Cumulative\n by Storm Event Type, U.S.\n 1950 to Nov 2011", horiz=TRUE, cex.names=0.8, axis.lty = 1, xlim=c(0,200))
grid(col="blue", lwd=2)

plot of chunk readData

totalHumanImpact <- sum(stormDataHumanSortFatalandInjury$FatalandInjury)
stormDataHumanSortFatalandInjury$PERCENTOFHUMAN <- 100*stormDataHumanSortFatalandInjury$FatalandInjury/totalHumanImpact

totalEconomicImpact <- sum(stormDataEconomicSummarySorted$TOTALECONOMIC)
stormDataEconomicSummarySorted$PERCENTOFECONOMIC <- 100*stormDataEconomicSummarySorted$TOTALECONOMIC/totalEconomicImpact

stormDataHumanSortFatalandInjury[1:10,]
##      EVTYPECLEAN numEVTYPE Fatalmean  Fatalsd Fatalsum InjuryMean Injurysd
##  1:      TORNADO       834  0.092825  1.41167     5633   1.505570  17.1747
##  2:         WIND       960  0.003297  0.07626     1194   0.031370   0.5754
##  3:         HEAT       275  1.183258 12.06219     3138   3.478130  23.9464
##  4:        FLOOD       170  0.018515  0.23004     1523   0.104576   6.1848
##  5:    LIGHTNING       464  0.051788  0.24053      817   0.331643   1.2746
##  6:          ICE       417  0.046512  0.32340      102   0.987688  33.5846
##  7: WINTER STORM       972  0.014137  0.18479      277   0.095744   2.0990
##  8:         FIRE       992  0.021231  0.35031       90   0.379335   3.4559
##  9:         HAIL       244  0.000155  0.04764       45   0.005052   0.3872
## 10:    HURRICANE       402  0.473684  1.63269      135   4.652632  50.1499
##     Injurysum FatalandInjury PERCENTOFHUMAN
##  1:     91364          96997        62.3082
##  2:     11361          12555         8.0650
##  3:      9224          12362         7.9410
##  4:      8602          10125         6.5040
##  5:      5232           6049         3.8857
##  6:      2166           2268         1.4569
##  7:      1876           2153         1.3830
##  8:      1608           1698         1.0907
##  9:      1467           1512         0.9713
## 10:      1326           1461         0.9385
stormDataEconomicSummarySorted[1:10,]
##        EVTYPECLEAN TOTALECONOMIC PERCENTOFECONOMIC
## 26           FLOOD     1.797e+11            37.725
## 42       HURRICANE     9.027e+10            18.947
## 95         TORNADO     5.736e+10            12.039
## 93     STORM SURGE     4.332e+10             9.093
## 34            HAIL     2.073e+10             4.352
## 104           WIND     1.784e+10             3.744
## 21         DROUGHT     1.502e+10             3.152
## 47             ICE     8.998e+09             1.889
## 114           FIRE     8.900e+09             1.868
## 97  TROPICAL STORM     8.411e+09             1.765
# unified table via merge function
#mergeddf <- merge(stormDataEconomicSummarySorted, stormDataHumanSortFatalandInjury, by="EVTYPECLEAN")
#mergeddf[order(-mergeddf$PERCENTOFECONOMIC),][1:10]