Impact of Storms on Population Health and the Economy (PA2)

Synopsis

This report analyses the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to understand which types of events are most harmful to population health and the economy.

Tornadoes cause the most harm to population health. Although they represent only 4% of total events since records began, they account for more than 97,000 direct injuries and fatalities.

Floods are responsible for the greatest economic damage (crops and property combined), nearing US$180 billion. Flood events make up only 4% of total recorded events, yet they account for 38% of the total damages inflicted.

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries and property damage. Preventing such outcomes to the extent possible is a key concern.

This report investigates the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database to answer two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Economic consequence is defined as the combined value of property damage and crop damage.
Population health impact is defined as the fatalities and injuries arising directly from a weather event.

Data Processing

The storm data are loaded directly from the bz2 file (listed in the references section). The file contains a header row, and missing values are coded as zero-length fields.

dataSD <- read.csv("repdata_data_StormData.csv.bz2", header = TRUE, quote = "\"", 
    sep = ",", na.strings = "", as.is = TRUE)
totalRowCount <- nrow(dataSD)  # 902297

Data Filtering

The dataset is reduced to events that have fatalities, injuries, crop damage or property damage (FATALITIES, INJURIES, CROPDMG, PROPDMG), keeping only the columns needed later. This greatly reduces the amount of data carried into the onward calculations, and with it the memory and processing requirements.

dataSD <- subset(x = dataSD, subset = INJURIES > 0 | FATALITIES > 0 | PROPDMG > 
    0 | CROPDMG > 0, select = c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", 
    "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
filteredRowCount <- nrow(dataSD)  #254633

Data Cleansing

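The EVTYPE field has 488 different values (985 before filtering), including misspellings of the same category, sub-categories (“Flood” and “Flash Flood”) and mixed case. I therefore re-code these to a new event type by converting to upper case, searching for partial matches and re-tagging each match with a category from the list below. The categories are applied in list order, so a label that matches several categories ends up in the first one that matches.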

# For each new event type below, locate any partial matches and set their
# new category. Order matters: once a label has been collapsed into an
# earlier category it can no longer match a later one
# (e.g. "THUNDERSTORM WIND" becomes "WIND", not "THUNDERSTORM").
new_evtypes <- c("SNOW", "ICE", "WIND", "TSUNAMI", "SMOKE", "RAIN", "TORNADO", 
    "VOLCANIC", "HEAT", "FLOOD", "THUNDERSTORM", "LIGHTNING", "HURRICANE", 
    "STORM")
dataSD$EVTYPE <- toupper(dataSD$EVTYPE)
for (i in new_evtypes) {
    dataSD$EVTYPE[grep(i, dataSD$EVTYPE)] <- i
}
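
To illustrate how the order of the category list affects the outcome, here is a minimal sketch on a handful of made-up labels. The function name `recode_events` and the toy labels are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical helper: collapse labels into categories by partial match,
# applying the categories in the order given.
recode_events <- function(labels, categories) {
    labels <- toupper(labels)
    for (category in categories) {
        # fixed = TRUE treats the category as a literal string, not a regex
        labels[grep(category, labels, fixed = TRUE)] <- category
    }
    labels
}

# Made-up labels, for illustration only
toy_labels <- c("Thunderstorm Wind", "Flash Flood", "Winter Storm",
    "Heavy Snow", "Tornado F1")
toy_categories <- c("SNOW", "WIND", "TORNADO", "FLOOD", "THUNDERSTORM", "STORM")

recoded <- recode_events(toy_labels, toy_categories)
print(recoded)
```

In this sketch "Thunderstorm Wind" lands in WIND because WIND is applied before THUNDERSTORM, while "Winter Storm" is only picked up by the final STORM category.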

Fatality and Injury calculations

Population health impact is measured as INJURIES + FATALITIES. The sum is calculated for each distinct event type (EVTYPE) and the results are sorted in descending order.

aggFatals <- as.data.frame(tapply(dataSD$FATALITIES + dataSD$INJURIES, list(dataSD$EVTYPE), 
    sum))
names(aggFatals) <- "total"
aggFatals <- sort(aggFatals$total, decreasing = TRUE)
head(aggFatals, 5)  # Top 5 
##   TORNADO      WIND      HEAT     FLOOD LIGHTNING 
##     97043     12901     12362     10126      6048
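
The grouping step can be seen on a small made-up data frame. This is only a sketch: the `toy` data and the variable names `harm` and `harm_sorted` are assumptions for illustration, and it calls `sort()` on the `tapply()` result directly rather than going through a data frame.

```r
# Made-up records, for illustration only
toy <- data.frame(
    EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
    FATALITIES = c(1, 2, 0, 1),
    INJURIES = c(10, 0, 5, 3),
    stringsAsFactors = FALSE
)

# Sum fatalities + injuries within each event type
harm <- tapply(toy$FATALITIES + toy$INJURIES, toy$EVTYPE, sum)

# Largest total first; the event type names are kept as element names
harm_sorted <- sort(harm, decreasing = TRUE)
print(harm_sorted)
```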

Next, calculate the percentage statistics used in the Synopsis and Results.

pctEventsWithInjuriesFatalty <- round(sum(dataSD$INJURIES > 0 | dataSD$FATALITIES > 0)/totalRowCount, 
    2) * 100  # % Events with Injuries or Fatalities
pctEventsTornado <- round(nrow(dataSD[dataSD$EVTYPE == "TORNADO", ])/totalRowCount, 
    2) * 100  # % Tornado events in dataset

Calculate the damage

The damage per event is calculated separately for property and crops, then combined into a total damage value in order to find the leading cause of economic damage from natural weather events.
The dataset stores each damage value (crop and property) alongside a multiplier field (e.g. CROPDMGEXP), where K, M and B represent thousands, millions and billions of US dollars. Any corrupt or miscoded multiplier values are ignored, i.e. the damage for that record is treated as zero.

money <- c(K = 10^3, M = 10^6, B = 10^9)  # Create lookup
dataSD$PROPDMGVAL <- dataSD$PROPDMG * money[as.character(dataSD$PROPDMGEXP)]
dataSD$PROPDMGVAL[is.na(dataSD$PROPDMGVAL)] <- 0
dataSD$CROPDMGVAL <- dataSD$CROPDMG * money[as.character(dataSD$CROPDMGEXP)]
dataSD$CROPDMGVAL[is.na(dataSD$CROPDMGVAL)] <- 0
dataSD$TOTALDMG <- dataSD$PROPDMGVAL + dataSD$CROPDMGVAL
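
A short worked example may make the multiplier lookup concrete. The values below are made up, and the variable names `dmg`, `dmg_exp` and `value` are assumptions for illustration. Note that a lookup of this form is case-sensitive, so a lower-case code such as "k" falls into the "ignored" group along with missing codes.

```r
money <- c(K = 10^3, M = 10^6, B = 10^9)  # multiplier lookup

# Made-up damage figures and exponent codes, for illustration only
dmg <- c(25, 2.5, 1, 10, 5)
dmg_exp <- c("K", "M", "B", "k", NA)

# Indexing by name returns NA for codes that are not in the lookup
value <- dmg * money[dmg_exp]

# Unrecognised or missing codes contribute zero damage
value[is.na(value)] <- 0
value <- unname(value)
print(value)
```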

Now calculate which event type (EVTYPE) causes the greatest economic impact (property + crop damage) and sort the results in descending order.

aggDmg <- as.data.frame(tapply(dataSD$TOTALDMG, list(dataSD$EVTYPE), sum))
names(aggDmg) <- "total"
aggDmg <- sort(aggDmg$total, decreasing = TRUE)
head(aggDmg, 5)
##     FLOOD HURRICANE     STORM   TORNADO      WIND 
## 1.798e+11 9.013e+10 6.457e+10 5.740e+10 1.989e+10

Finally, derive the statistics relating to economic damage: total damages in billions, damages for the top category (the most damaging event type), that category's share of total damages, and the percentage of events in the dataset relating to flooding.

totalDamagesBn <- sum(aggDmg)/10^9
topCategoryDamagesBn <- aggDmg[1]/10^9
topCategoryPctDamages <- round(topCategoryDamagesBn/totalDamagesBn, 2) * 100
pctEventsFlood <- round(nrow(dataSD[dataSD$EVTYPE == "FLOOD", ])/totalRowCount, 
    2) * 100

Results

Impact to Public Health

By combined fatality and injury total, tornadoes are clearly the leading cause of harm to population health among natural storm sources. Tornado events in general represented 4% of the recorded observations.

aggFatals <- aggFatals/1000  # express totals in thousands
barplot(head(aggFatals, 5), col = rainbow(5), ylim = c(0, 100), ylab = "Fatalities and injuries (thousands)", 
    xlab = "Sources of events", main = "Top 5 harmful causes to Population Health", 
    cex.names = 0.6, cex.axis = 0.6)

[Figure: bar plot, "Top 5 harmful causes to Population Health" (chunk popresults)]

The figure shows that tornadoes are by far the leading cause of fatalities and injuries, with Wind (12,901) and Heat (12,362) close contenders for second place.

Economic Consequences

The greatest economic impact is caused by flooding, with combined crop and property damage nearing US$180 billion across all years.
Flooding accounted for 38% of total damages from only 4% of the total recorded events.

aggDmgPlot <- head(aggDmg, 5)/10^9
barplot(aggDmgPlot, col = rainbow(5), ylim = c(0, 200), ylab = "Economic Damage (Billions)", 
    xlab = "Sources of events", main = "Top 5 causes of Economic Consequence", 
    cex.names = 0.6)

[Figure: bar plot, "Top 5 causes of Economic Consequence" (chunk resulteconomic)]

The figure shows that floods are responsible for the greatest economic damage (crops and property combined), nearing US$180 billion. Floods make up only 4% of recorded events but account for 38% of total recorded damages.

Acknowledgements

Operations and Services Performance
Storm Data FAQ
NWS Counties

Disclaimer

NWS does not guarantee the accuracy or validity for Storm Data Information.

Environment

sessionInfo()
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 
## [2] LC_CTYPE=English_United Kingdom.1252   
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] knitr_1.5
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.4   evaluate_0.5.5 formatR_0.10   stringr_0.6.2 
## [5] tools_3.1.0