This report analyses the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to understand which types of events are most harmful to population health and the economy.
Tornadoes cause the most harm to population health. Although they represent only 4% of total events since records began, they account for more than 97,000 direct injuries and fatalities.
“Floods” are responsible for the greatest economic damage (crops and property), nearing US$180 billion. Flood events make up only 4% of total recorded events yet account for 38% of the total damages inflicted.
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries and property damage. Preventing such outcomes to the extent possible is a key concern.
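This report investigates the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to answer two questions: which types of events are most harmful to population health, and which have the greatest economic consequences? The following definitions apply: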
Economic consequence is defined as the value of damage to property and crops.
Population health is defined as fatalities and injuries arising directly from a natural cause.
The storm data is loaded directly from the bz2 file (listed in the references section), which contains a header; missing values are coded as zero-length fields.
dataSD <- read.csv("repdata_data_StormData.csv.bz2", header = TRUE, quote = "\"",
sep = ",", na.strings = "", as.is = TRUE)
totalRowCount <- nrow(dataSD) # 902297
The dataset is reduced to events that have fatalities, injuries, crop damage or property damage (CROPDMG, PROPDMG). This cuts the data from 902,297 rows to 254,633, which greatly reduces the information needed for onward calculations (and the memory and processing requirements).
dataSD <- subset(x = dataSD, subset = INJURIES > 0 | FATALITIES > 0 | PROPDMG >
0 | CROPDMG > 0, select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
"PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
filteredRowCount <- nrow(dataSD) #254633
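The filtering rule can be illustrated on a small made-up data frame. This is only a sketch: the `toy` values below are invented for illustration and are not taken from the storm database.

```r
# Toy data frame standing in for the storm data (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(1, 0, 0, 0),
  INJURIES   = c(3, 0, 0, 2),
  PROPDMG    = c(25, 0, 10, 0),
  CROPDMG    = c(0, 0, 5, 0),
  stringsAsFactors = FALSE
)

# Keep a row if at least one of the four harm measures is positive
harmful <- subset(toy, INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0)

harmful$EVTYPE  # the HAIL row, which has no recorded harm, is dropped
```

Because the conditions are joined with `|`, an event with damage but no casualties (the toy FLOOD row) is kept, as is one with casualties but no damage (the toy HEAT row).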
The EVTYPE field has 488 different values (985 before filtering), including misspellings of the same category, sub-categories (“Flood” and “Flash Flood”) and mixed case. I will re-code these to a new event type by searching for partial matches and re-tagging them from the list below.
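# For each new event type below, locate any partial matches and set their
# new category. Order matters: a later entry overwrites an earlier one when
# both match, so "STORM" (last) also absorbs anything tagged "THUNDERSTORM".
new_evtypes <- c("SNOW", "ICE", "WIND", "TSUNAMI", "SMOKE", "RAIN", "TORNADO",
"VOLCANIC", "HEAT", "FLOOD", "THUNDERSTORM", "LIGHTNING", "HURRICANE",
"STORM")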
dataSD$EVTYPE <- toupper(dataSD$EVTYPE)
for (i in new_evtypes) {
dataSD$EVTYPE[grep(as.character(i), dataSD$EVTYPE)] <- as.character(i)
}
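To see how the partial-match loop behaves, the sketch below applies the same idea to a handful of hypothetical labels. The helper name `recode_events` and the sample labels are assumptions made for illustration; only the looping technique comes from the report.

```r
# Hypothetical raw labels, already upper-cased as in the report
raw <- c("THUNDERSTORM WIND", "FLASH FLOOD", "WINTER STORM",
         "TSTM WIND", "HEAT WAVE", "THUNDERSTORM")

# Illustrative helper: re-tag every label containing a category name.
# Categories are applied in order, so later ones can overwrite earlier ones.
recode_events <- function(x, categories) {
  for (category in categories) {
    x[grep(category, x, fixed = TRUE)] <- category
  }
  x
}

recoded <- recode_events(raw, c("WIND", "FLOOD", "THUNDERSTORM", "STORM"))
recoded
```

Two things are worth noticing. Labels that match no category ("HEAT WAVE" here, since "HEAT" is not in this shortened list) are left untouched. And because "STORM" is a substring of "THUNDERSTORM", a label first tagged "THUNDERSTORM" is re-tagged "STORM" on the final pass, so the position of each entry in the list affects the resulting categories.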
Population health will be defined as INJURIES + FATALITIES, and the sum will be calculated for each distinct event type (EVTYPE) with the results sorted in descending order.
aggFatals <- as.data.frame(tapply(dataSD$FATALITIES + dataSD$INJURIES, list(dataSD$EVTYPE),
sum))
names(aggFatals) <- "total"
aggFatals <- sort(aggFatals$total, decreasing = T)
head(aggFatals, 5) # Top 5
## TORNADO WIND HEAT FLOOD LIGHTNING
## 97043 12901 12362 10126 6048
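The grouping step can be checked by hand on a few rows. The following is a minimal sketch with invented numbers, assuming the same `tapply` plus `sort` approach used above.

```r
# Made-up casualty records for three event types
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 1, 0, 3),
  INJURIES   = c(10, 5, 0, 1, 4),
  stringsAsFactors = FALSE
)

# Sum fatalities + injuries within each event type, then rank
harm <- tapply(toy$FATALITIES + toy$INJURIES, toy$EVTYPE, sum)
harm <- sort(harm, decreasing = TRUE)
harm
```

Here TORNADO totals 17 (12 + 5), HEAT totals 8 (1 + 7) and FLOOD totals 1, so the sorted result lists them in that order.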
Now calculating percentage statistics to be used in Synopsis and Results.
pctEventsWithInjuriesFatalty <- round(sum(dataSD$INJURIES > 0 | dataSD$FATALITIES > 0)/totalRowCount,
2) * 100 # % Events with Injuries or Fatalities
pctEventsTornado <- round(nrow(dataSD[dataSD$EVTYPE == "TORNADO", ])/totalRowCount,
2) * 100 # % Tornado events in dataset
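The `round(x, 2) * 100` pattern rounds the proportion to two decimals before scaling, so every statistic comes out as a whole percentage. As a worked example, the sketch below applies it to the two row counts noted in the code comments earlier; the helper name `pct_of_total` is an assumption introduced only for this illustration.

```r
# Hypothetical helper mirroring the report's round(x, 2) * 100 pattern
pct_of_total <- function(part, total) round(part / total, 2) * 100

total_rows    <- 902297  # totalRowCount noted in the report
filtered_rows <- 254633  # filteredRowCount noted in the report

# Share of all recorded events that caused any harm or damage
pct_harmful <- pct_of_total(filtered_rows, total_rows)
pct_harmful
```

The proportion is roughly 0.2822, which rounds to 0.28, so about 28% of recorded events survive the filter. Note that all percentages in this report use the unfiltered row count as the denominator.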
Calculating the damage per event at a property and crop level and producing a total damage value to find the number one cause of economic damage from natural weather events.
The dataset contains a damage value for crops and for property, each with a multiplier field (e.g. CROPDMGEXP) in which K, M and B represent thousands, millions and billions of US dollars; any other (corrupt or miscoded) value is ignored and the damage is counted as zero.
money <- c(K = 10^3, M = 10^6, B = 10^9) # Create lookup
dataSD$PROPDMGVAL <- dataSD$PROPDMG * money[as.character(dataSD$PROPDMGEXP)]
dataSD$PROPDMGVAL[is.na(dataSD$PROPDMGVAL)] <- 0
dataSD$CROPDMGVAL <- dataSD$CROPDMG * money[as.character(dataSD$CROPDMGEXP)]
dataSD$CROPDMGVAL[is.na(dataSD$CROPDMGVAL)] <- 0
dataSD$TOTALDMG <- dataSD$PROPDMGVAL + dataSD$CROPDMGVAL
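The named-vector lookup is compact, so a small example may help show what happens to codes that are missing or not in the table. The input values below are invented for illustration.

```r
money <- c(K = 10^3, M = 10^6, B = 10^9)  # same lookup as in the report

# Made-up damage figures with their multiplier codes
dmg      <- c(2.5, 10, 1, 7, 3)
exp_code <- c("K", "M", "B", NA, "k")  # NA = blank field, "k" = miscoded

# Indexing by a code that is NA or absent from the lookup yields NA
value <- dmg * money[exp_code]
value[is.na(value)] <- 0  # unrecognised codes contribute zero damage

unname(value)
```

The first three entries become 2,500, 10,000,000 and 1,000,000,000 dollars. The last two become zero: the lookup is case-sensitive, so a lowercase "k" is treated as miscoded rather than as thousands. Whether that is acceptable depends on how many such codes the real data contains, which this sketch does not assess.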
Now calculate which event type (EVTYPE) causes the most economic impact (property + crop damage) and sort the results in descending order.
aggDmg <- NULL
aggDmg <- as.data.frame(tapply(dataSD$TOTALDMG, list(dataSD$EVTYPE), sum))
names(aggDmg) <- "total"
aggDmg <- sort(aggDmg$total, decreasing = T)
head(aggDmg, 5)
## FLOOD HURRICANE STORM TORNADO WIND
## 1.798e+11 9.013e+10 6.457e+10 5.740e+10 1.989e+10
Derive statistics relating to economic damages: total damages in billions, the top category (most damaging event type) and the percentage of events in the dataset relating to flooding.
totalDamagesBn <- sum(aggDmg)/10^9
topCategoryDamagesBn <- aggDmg[1]/10^9
topCategoryPctDamages <- round(topCategoryDamagesBn/totalDamagesBn, 2) * 100
pctEventsFlood <- round(nrow(dataSD[dataSD$EVTYPE == "FLOOD", ])/totalRowCount,
2) * 100
With a combined fatality and injury total of more than 97,000, “Tornadoes” are clearly the leading cause of population health issues from natural storm sources. Tornado events in general represented 4% of the recorded observations.
aggFatals <- aggFatals/1000
barplot(head(aggFatals, 5), col = rainbow(5), ylim = c(0, 100), ylab = "Injuries and fatalities (thousands)",
xlab = "Sources of events", main = "Top 5 harmful causes to Population Health",
cex.names = 0.6, cex.axis = 0.6)
This figure clearly shows that Tornadoes are the number one cause of fatalities and injuries, with second place closely contested between Wind (12,901) and Heat (12,362).
The greatest economic impact is caused by Flooding: combined crop and property damage comes to nearly US$180 billion across all years. Flooding represented 38% of total damages from only 4% of the total recorded events.
aggDmgPlot <- head(aggDmg, 5)/10^9
barplot(aggDmgPlot, col = rainbow(5), ylim = c(0, 200), ylab = "Economic Damage (Billions)",
xlab = "Sources of events", main = "Top 5 causes of Economic Consequence",
cex.names = 0.6)
This figure clearly shows that “Floods” are responsible for the greatest economic damage (crops and property), nearing US$180 billion and representing 38% of total damages recorded from only 4% of recorded events.
Operations and Services Performance
Storm Data FAQ
NWS Counties
NWS does not guarantee the accuracy or validity for Storm Data Information.
sessionInfo()
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.5
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.4 evaluate_0.5.5 formatR_0.10 stringr_0.6.2
## [5] tools_3.1.0