The purpose of this report is to answer two questions using the NOAA Storm Database, covering data from the earliest recordings in 1950 through November 2011. Those questions are:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The analysis of this data shows a Pareto-like ("80/20" rule) distribution of both human impact (fatalities and injuries) and economic impact by storm event type: the vast majority of each is concentrated in a small subset of the event types.
The most significant impacts on humans from these events come from the following sources, in this order and magnitude (top five shown; the full top-ten rankings appear later in this report):
Event        Injuries and Fatalities   Percent of Total
TORNADO      96,997                    62.3%
WIND         12,555                     8.1%
HEAT         12,362                     7.9%
FLOOD        10,125                     6.5%
LIGHTNING     6,049                     3.9%
The most significant economic impacts come from the following storm events, in this order and magnitude (top five shown; the full top-ten rankings appear later in this report):
Event          Economic Impact   Percent of Total
FLOOD          $179.7B           37.7%
HURRICANE       $90.3B           18.9%
TORNADO         $57.4B           12.0%
STORM SURGE     $43.3B            9.1%
HAIL            $20.7B            4.4%
It is important to remember that this analysis covers a very long period (1950-2011), and both the coverage and the quality of reporting have improved dramatically over that time.
Recommended further studies:
Review of findings over the last 5, 10, 20, and 30 years to assess any differences in the trends (a starting sketch follows this list). Does more recent data show changes that should affect government policy decisions and emergency management training and budgeting?
Review of how the volume of injuries, fatalities, property damage, and crop damage is distributed across individual storm events, to look at differences between persistent yearly threats and long-period catastrophes. This analysis would be valuable to government and business disaster planners in identifying potential peak demand for emergency response over time.
Analysis of fatalities as a percentage of total human impact, and of the relative cost of property damage vs. crop damage. This deeper analysis may be useful for government policy formulation.
Analysis of events by year and by season of the year.
Regional and locational analysis using the additional location attributes in the data.
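As a starting point for the first of these recommendations, the sketch below shows one way to take recent-period subsets. It assumes the cleaned stormDataDT table and its YEAR column, both of which are constructed later in this report; the function name recentYears is purely illustrative.
# sketch: restrict the cleaned data to the most recent n years before re-running the summaries
recentYears <- function(dt, n, lastYear = 2011) {
    dt[as.numeric(dt$YEAR) > (lastYear - n), ]
}
# e.g. stormRecent10 <- recentYears(stormDataDT, 10) # events from 2002 through Nov 2011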
Data Sources:
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file can be downloaded from the course web site; the URL used is shown in the download code below.
There is also some documentation of the database available, which describes how some of the variables are constructed and defined:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
Download source data from NOAA…
# cache is NOT set for this chunk; caching caused strange errors during testing.
sourceFile <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destFile <- "NOAAStormData.csv.bz2"
# the download call is commented out to avoid re-downloading the file on every run
# download.file(sourceFile, destFile, method="curl")
dirList <- dir()
ifelse("NOAAStormData.csv.bz2" %in% dirList, print("Successful download"), print("Download problem"))
## [1] "Successful download"
## [1] "Successful download"
stormData <- read.csv(destFile)
require(data.table) # this will speed processing of this large data frame
## Loading required package: data.table
require(plyr) # this will make it easier to query this large file and aggregate results
## Loading required package: plyr
stormDataDT <- data.table(stormData) # converting to a data.table will speed queries
EVTypes <- sort(unique(stormDataDT$EVTYPE))
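Before cleaning, it is worth a quick look at the raw event labels to see why consolidation is needed (a sketch; output not shown here):
length(EVTypes) # number of distinct raw event labels in the file
head(EVTypes, 20) # inspect the raw labels for typos and near-duplicate categories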
Clean up Weather/Storm Event Types…
# the raw EVTYPE classifications include typos and distinctions too fine-grained for this report
# consolidate the event classifications into a new column, EVTYPECLEAN
# note: pattern order matters below -- patterns applied later overwrite labels set by earlier ones
stormDataDT$EVTYPECLEAN <- as.character(stormDataDT$EVTYPE) # character copy so new category labels can be assigned
stormDataDT[grep("THUN", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "THUNDERSTORM"
stormDataDT[grep("THUD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "THUNDERSTORM"
stormDataDT[grep("TSTM", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "THUNDERSTORM"
stormDataDT[grep("TROPICAL", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "TROPICAL STORM"
stormDataDT[grep("TORNADO", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "TORNADO"
stormDataDT[grep("FLOOD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FLOOD"
stormDataDT[grep("WATER", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WATERSPOUT"
stormDataDT[grep("WINTER", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WINTER STORM"
stormDataDT[grep("WINTERY", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WINTERY MIX"
stormDataDT[grep("WINTRY", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WINTERY MIX"
stormDataDT[grep("SNOW", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "SNOW"
stormDataDT[grep("RECORD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "RECORD SETTING EVENT"
stormDataDT[grep("ICE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "ICE"
stormDataDT[grep("WIND", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WIND"
stormDataDT[grep("RAIN", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "RAIN"
stormDataDT[grep("PRECIPITATION", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "RAIN"
stormDataDT[grep("SHOWER", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "RAIN"
stormDataDT[grep("HAIL", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "HAIL"
stormDataDT[grep("HEAT", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "HEAT"
stormDataDT[grep("HOT", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "HEAT"
stormDataDT[grep("COOL", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "COOL"
stormDataDT[grep("WARM", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WARM"
stormDataDT[grep("COLD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "COLD"
stormDataDT[grep("DUST", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "DUST"
stormDataDT[grep("WET", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WET"
stormDataDT[grep("DRY", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "DRY"
stormDataDT[grep("BLIZZARD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "BLIZZARD"
stormDataDT[grep("FREEZ", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FREEZE"
stormDataDT[grep("FROST", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FREEZE"
stormDataDT[grep("SURF", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "SURF"
stormDataDT[grep("HURRICANE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "HURRICANE"
stormDataDT[grep("FIRE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FIRE"
stormDataDT[grep("FUNNEL", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "FUNNEL"
stormDataDT[grep("TIDE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "TIDE"
stormDataDT[grep("LIGHTNING", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "LIGHTNING"
stormDataDT[grep("SUMMARY", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "IGNORE"
stormDataDT[grep("MUD", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "MUDSLIDE"
stormDataDT[grep("VOLCANIC", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "VOLCANIC"
stormDataDT[grep("UNSEASON", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "UNSEASONABLE"
stormDataDT[grep("LANDSLIDE", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "LANDSLIDE"
stormDataDT[grep("SWELLS", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "WAVES"
stormDataDT[grep("URBAN", stormDataDT$EVTYPE, ignore.case = TRUE), ]$EVTYPECLEAN <- "URBAN EVENT"
Start Data Analysis and summarize by Human Impact and Economic Impact…
# Begin analysis of the data
# Analysis on Human Impacts
# use EVTYPECLEAN so results aggregate over the cleaned-up event types
stormDataHuman <- stormDataDT[, list(numEVTYPE = length(EVTYPECLEAN), # number of records for each cleaned event type
                                     Fatalmean = mean(FATALITIES, na.rm=TRUE),
                                     Fatalsd = sd(FATALITIES, na.rm=TRUE),
                                     Fatalsum = sum(FATALITIES, na.rm=TRUE),
                                     InjuryMean = mean(INJURIES, na.rm=TRUE),
                                     Injurysd = sd(INJURIES, na.rm=TRUE),
                                     Injurysum = sum(INJURIES, na.rm=TRUE)),
                              by=EVTYPECLEAN]
stormDataHuman$FatalandInjury <- stormDataHuman$Fatalsum + stormDataHuman$Injurysum
stormDataHumanSortFatal <- stormDataHuman[order(-Fatalsum)] # sort by fatalities in descending order
stormDataHumanSortInjury <- stormDataHuman[order(-Injurysum)] # sort by injuries in descending order
stormDataHumanSortFatalandInjury <- stormDataHuman[order(-FatalandInjury)]
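# quick check (sketch) of the Pareto-like concentration described in the synopsis:
# cumulative share of total fatalities + injuries captured by the top-ranked event types
# (humanCumShare is an illustrative name; the first few values climb toward 100% very quickly)
humanCumShare <- 100 * cumsum(stormDataHumanSortFatalandInjury$FatalandInjury) /
    sum(stormDataHumanSortFatalandInjury$FatalandInjury)
# head(humanCumShare, 5) # uncomment to display the cumulative percentages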
# Analysis on Economic Impacts
# The economic data is captured in two pairs of columns: PROPDMG/PROPDMGEXP ("Property Damage") and CROPDMG/CROPDMGEXP ("Crop Damage")
# The damage amounts need to be converted into a common numeric (dollar) format. The PROPDMGEXP and CROPDMGEXP fields use the notation K for thousands of dollars, M for millions, and B for billions; these values are converted via the function below
# not necessarily the fastest approach, but it works. It does not yet handle the numeric characters or the '+', '-', and '?' symbols described in the NOAA code book.
convert2dollars <- function(amount, unit) {
unit <- tolower(unit)
damageCost <- amount * 1000 # default assumption (to be confirmed): figures are taken to be in thousands of dollars unless otherwise noted; the '+' and '-' codes are unclear
if(unit =="h") {return(amount * 1e2)}
if(unit =="k") {return(amount * 1e3)}
if(unit =="m") {return(amount * 1e6)}
if(unit =="b") {return(amount * 1e9)}
return(damageCost)
}
# this routine is slow relative to vectorized approaches, but it works; replace in a future upgrade (see the sketch below)
# Process the Property Damage numbers to be in whole dollars
stormDataDT$PROPDMGDOLLARS <- mapply(convert2dollars, as.numeric(stormDataDT$PROPDMG), as.character(stormDataDT$PROPDMGEXP))
# Process the Crop Damage numbers to be in whole dollars
stormDataDT$CROPDMGDOLLARS <- mapply(convert2dollars, as.numeric(stormDataDT$CROPDMG), as.character(stormDataDT$CROPDMGEXP))
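# NOTE (sketch, not run): a vectorized lookup would be much faster than mapply() here and
# could also cover the exponent codes convert2dollars() ignores. The meanings assumed
# below for the digit codes (and the default of 1 for blank, '+', '-' and '?') are an
# interpretation of the NOAA code book, not confirmed values, and differ from the
# x1000 default used above.
# expMultiplier <- function(unit) {
#     unit <- tolower(trimws(unit))
#     mult <- rep(1, length(unit))                  # default multiplier
#     mult[unit == "h"] <- 1e2
#     mult[unit == "k"] <- 1e3
#     mult[unit == "m"] <- 1e6
#     mult[unit == "b"] <- 1e9
#     digits <- unit %in% as.character(0:9)         # e.g. "5" taken as 10^5
#     mult[digits] <- 10^as.numeric(unit[digits])
#     mult
# }
# stormDataDT$PROPDMGDOLLARS <- stormDataDT$PROPDMG * expMultiplier(stormDataDT$PROPDMGEXP)
# stormDataDT$CROPDMGDOLLARS <- stormDataDT$CROPDMG * expMultiplier(stormDataDT$CROPDMGEXP)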
# order Economic Data
stormDataDT$YEAR <- format(as.Date(stormDataDT$BGN_DATE, format="%m/%d/%Y"),"%Y") # this is to simplify later processing on year of event
stormDataEconomicExtract <- data.frame(YEAR=stormDataDT$YEAR, EVTYPECLEAN= stormDataDT$EVTYPECLEAN, PROPERTYDMGDOLLARS=stormDataDT$PROPDMGDOLLARS, CROPDMGDOLLARS=stormDataDT$CROPDMGDOLLARS )
stormDataEconomicExtractDT <- data.table(stormDataEconomicExtract)
stormDataEconomicExtractDT$TOTALECONOMIC <- stormDataEconomicExtractDT$PROPERTYDMGDOLLARS + stormDataEconomicExtractDT$CROPDMGDOLLARS
stormDataEconomicProperty <- stormDataEconomicExtractDT[order(-PROPERTYDMGDOLLARS)]
stormDataEconomicCrop <- stormDataEconomicExtractDT[order(-CROPDMGDOLLARS)]
stormDataEconomicTotal <- stormDataEconomicExtractDT[order(-TOTALECONOMIC)]
setkey(stormDataEconomicTotal, EVTYPECLEAN) # key on the cleaned event type
# note use of EVTYPECLEAN to group by the cleaned-up event types
stormDataEconomicSummary <- as.data.frame(stormDataEconomicTotal[, sum(TOTALECONOMIC, na.rm=TRUE), by = EVTYPECLEAN])
colnames(stormDataEconomicSummary)[2] <- "TOTALECONOMIC"
# sort the data by total economic damage by event type
stormDataEconomicSummarySorted <- stormDataEconomicSummary[order(-stormDataEconomicSummary$TOTALECONOMIC),]
Construct Plots…
par(las=2) # make label text perpendicular to axis
par(mar=c(5,8,4,2)) # increase y-axis margin.
barplot(stormDataHumanSortFatalandInjury[1:10]$FatalandInjury/1e3,
        names.arg = stormDataHumanSortFatalandInjury[1:10,]$EVTYPECLEAN,
        main = "Total Human Impact (Fatalities & Injuries)\n in thousands by Storm Event Type\n U.S. 1950 to Nov 2011 Cumulative",
        horiz = TRUE, cex.names = 0.8, axis.lty = 1, xlim = c(0,100))
grid(col="blue", lwd=2)
# Economic Impact plot
par(las=2) # make label text perpendicular to axis
par(mar=c(5,8,4,2)) # increase y-axis margin.
barplot(stormDataEconomicSummarySorted[1:10,]$TOTALECONOMIC/1e9,
        names.arg = stormDataEconomicSummarySorted[1:10,]$EVTYPECLEAN,
        main = "Total Economic Impact in $B's Cumulative\n by Storm Event Type, U.S.\n 1950 to Nov 2011",
        horiz = TRUE, cex.names = 0.8, axis.lty = 1, xlim = c(0,200))
grid(col="blue", lwd=2)
totalHumanImpact <- sum(stormDataHumanSortFatalandInjury$FatalandInjury)
stormDataHumanSortFatalandInjury$PERCENTOFHUMAN <- 100*stormDataHumanSortFatalandInjury$FatalandInjury/totalHumanImpact
totalEconomicImpact <- sum(stormDataEconomicSummarySorted$TOTALECONOMIC)
stormDataEconomicSummarySorted$PERCENTOFECONOMIC <- 100*stormDataEconomicSummarySorted$TOTALECONOMIC/totalEconomicImpact
stormDataHumanSortFatalandInjury[1:10,]
## EVTYPECLEAN numEVTYPE Fatalmean Fatalsd Fatalsum InjuryMean Injurysd
## 1: TORNADO 834 0.092825 1.41167 5633 1.505570 17.1747
## 2: WIND 960 0.003297 0.07626 1194 0.031370 0.5754
## 3: HEAT 275 1.183258 12.06219 3138 3.478130 23.9464
## 4: FLOOD 170 0.018515 0.23004 1523 0.104576 6.1848
## 5: LIGHTNING 464 0.051788 0.24053 817 0.331643 1.2746
## 6: ICE 417 0.046512 0.32340 102 0.987688 33.5846
## 7: WINTER STORM 972 0.014137 0.18479 277 0.095744 2.0990
## 8: FIRE 992 0.021231 0.35031 90 0.379335 3.4559
## 9: HAIL 244 0.000155 0.04764 45 0.005052 0.3872
## 10: HURRICANE 402 0.473684 1.63269 135 4.652632 50.1499
## Injurysum FatalandInjury PERCENTOFHUMAN
## 1: 91364 96997 62.3082
## 2: 11361 12555 8.0650
## 3: 9224 12362 7.9410
## 4: 8602 10125 6.5040
## 5: 5232 6049 3.8857
## 6: 2166 2268 1.4569
## 7: 1876 2153 1.3830
## 8: 1608 1698 1.0907
## 9: 1467 1512 0.9713
## 10: 1326 1461 0.9385
stormDataEconomicSummarySorted[1:10,]
## EVTYPECLEAN TOTALECONOMIC PERCENTOFECONOMIC
## 26 FLOOD 1.797e+11 37.725
## 42 HURRICANE 9.027e+10 18.947
## 95 TORNADO 5.736e+10 12.039
## 93 STORM SURGE 4.332e+10 9.093
## 34 HAIL 2.073e+10 4.352
## 104 WIND 1.784e+10 3.744
## 21 DROUGHT 1.502e+10 3.152
## 47 ICE 8.998e+09 1.889
## 114 FIRE 8.900e+09 1.868
## 97 TROPICAL STORM 8.411e+09 1.765
# a unified table could be built via the merge function (not run)
#mergeddf <- merge(stormDataEconomicSummarySorted, stormDataHumanSortFatalandInjury, by="EVTYPECLEAN")
#mergeddf[order(-mergeddf$PERCENTOFECONOMIC),][1:10,]