Synopsis

The NOAA Storm Database records weather events from 1950 through November 2011, including the human and economic impact of each event. This document analyzes which event types have the greatest impact so that local officials can better prioritize resources for them.

Data Processing

Load all dependencies:

library(dplyr)
library(lubridate)
library(ggplot2)
library(gridExtra)
library(grid)  # grid primitives (grid.newpage, textGrob, gTree, ...) used by render_table()

The data was loaded as follows:

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "stormdata.csv.bz2", method="curl")
storm_data=read.csv(bzfile("stormdata.csv.bz2", "rt"), header=TRUE)
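
Because the compressed file is large, re-running the report downloads it again each time. A minimal sketch of a guard that skips the download when a local copy already exists; `fetch_if_missing` and its `downloader` argument are hypothetical names introduced here for illustration and are not part of the original analysis:

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is not on disk yet.
# `downloader` defaults to base R's download.file(); it is a parameter only so the
# helper can be exercised without network access.
fetch_if_missing <- function(url, dest, downloader = download.file) {
  if (!file.exists(dest)) {
    downloader(url, dest)
  }
  invisible(dest)
}

# Possible usage with the file name from this report:
# fetch_if_missing("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#                  "stormdata.csv.bz2")
```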

Note: With more time, the event types would have been scrubbed and de-duplicated. An initial attempt at merging similar event types was made but was not completed before this paper was submitted. The omission is not expected to change the results much; it would most likely only re-order some of the top 10 event types.
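
For reference, a merge of similar event types could start from something like the sketch below. The function name `normalize_evtype`, the regular expressions, and the merged labels are illustrative assumptions; they are not the mapping attempted for this paper, and a real cleanup would need many more rules.

```r
# Hypothetical cleanup: collapse obvious case, whitespace, and spelling variants of EVTYPE.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))
  x <- gsub("\\s+", " ", x)  # collapse repeated whitespace
  # Assumed groupings, chosen only for illustration
  x[grepl("^TSTM WIND|^THUNDERSTORM WIND", x)] <- "THUNDERSTORM WIND"
  x[grepl("^FLASH FLOOD", x)] <- "FLASH FLOOD"
  x[grepl("^TORNADO", x)] <- "TORNADO"
  x
}

normalize_evtype(c(" Tornado", "TSTM WIND", "Thunderstorm Winds", "FLASH FLOODING", "HAIL"))
```

Applied before the aggregation steps, such a function would be called as `storm_data$EVTYPE <- normalize_evtype(storm_data$EVTYPE)`.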

To better understand the overall trends, the year was extracted from BGN_DATE:

storm_data <- mutate(storm_data, BEGIN_YEAR = year(as.Date(BGN_DATE, format="%m/%d/%Y")))
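
The parsing step can be checked on a couple of values in isolation. The sample strings below are illustrative, assuming BGN_DATE values carry a trailing time component; `as.Date()` reads only as much of the string as the format needs and ignores the rest. The sketch uses base R's `format()` as a stand-in for `lubridate::year()` so that it runs without extra packages.

```r
# Illustrative values only; assumed to resemble the BGN_DATE column.
sample_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse month/day/year; the trailing " 0:00:00" is ignored by as.Date().
parsed <- as.Date(sample_dates, format = "%m/%d/%Y")

# Base R equivalent of lubridate::year()
begin_year <- as.integer(format(parsed, "%Y"))
begin_year
```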

Some processing was applied to calculate total casualties by event type (“EVTYPE”), and this was used to determine which types of weather events contribute the most casualties:

storm_data <- mutate(storm_data, CASUALTIES = FATALITIES + INJURIES)
casualties_by_event <- aggregate(CASUALTIES ~ EVTYPE, data=storm_data, FUN=sum)
top_10_events_by_casualties <- casualties_by_event[order(-casualties_by_event$CASUALTIES),][1:10,]
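
To make the aggregation concrete, here is the same sum-then-rank pattern on a small made-up data frame. The `toy` data and its values are invented for illustration and do not come from the NOAA database.

```r
# Made-up records, only to show the shape of the computation.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1, 4, 0),
  INJURIES   = c(10, 1, 5, 0, 2)
)

# Casualties are fatalities plus injuries, summed per event type.
toy$CASUALTIES <- toy$FATALITIES + toy$INJURIES
by_event <- aggregate(CASUALTIES ~ EVTYPE, data = toy, FUN = sum)

# Sort in descending order and keep the top rows.
ranked <- by_event[order(-by_event$CASUALTIES), ]
head(ranked, 2)
```

Here TORNADO ranks first with 18 casualties (12 + 6), followed by FLOOD with 4.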

Next, we computed monetary damage by scaling the property and crop damage figures by their unit codes (K, M, and B for thousands, millions, and billions) and summing the two:

# Property damage
storm_data <- storm_data %>% mutate(NEW_PROPDMG_MONETARY = 
                                      ifelse(toupper(PROPDMGEXP) == 'K',
                                             10^3 * PROPDMG,
                                             ifelse(toupper(PROPDMGEXP) == 'M',
                                                    10^6 * PROPDMG,
                                                    ifelse(toupper(PROPDMGEXP) == 'B',
                                                           10^9 * PROPDMG,
                                                           PROPDMG))))


# Crop damage
storm_data <- storm_data %>% mutate(NEW_CROPDMG_MONETARY = 
                                      ifelse(toupper(CROPDMGEXP) == 'K',
                                             10^3 * CROPDMG,
                                             ifelse(toupper(CROPDMGEXP) == 'M',
                                                    10^6 * CROPDMG,
                                                    ifelse(toupper(CROPDMGEXP) == 'B',
                                                           10^9 * CROPDMG,
                                                           CROPDMG))))

# Total monetary damage
storm_data <- storm_data %>% mutate(TOTAL_MONETARY_DAMAGE = NEW_CROPDMG_MONETARY + NEW_PROPDMG_MONETARY)

# Aggregate by event type
damage_by_event_type <- aggregate(TOTAL_MONETARY_DAMAGE ~ EVTYPE, data=storm_data, FUN=sum)
top_10_events_by_monetary_damage <- damage_by_event_type[order(-damage_by_event_type$TOTAL_MONETARY_DAMAGE),][1:10,c("EVTYPE", "TOTAL_MONETARY_DAMAGE")]
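
The unit-code scaling used for both damage columns can also be written once as a lookup. This is a sketch of an equivalent formulation, not the code used for the results; `exp_multiplier` is a hypothetical helper name. As in the nested `ifelse()` calls, any code other than K, M, or B (in either case) leaves the amount unscaled.

```r
# Hypothetical helper: map a damage exponent code to its multiplier.
# Codes other than K, M, and B (blank, digits, other letters) fall back to 1,
# matching the behaviour of the nested ifelse() calls.
exp_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(exp_code))])
  mult[is.na(mult)] <- 1
  mult
}

# Made-up amounts and codes for illustration
exp_multiplier(c("K", "m", "B", "", "5")) * c(25, 2.5, 1, 10, 3)
```

With this helper, the property column could be computed as `PROPDMG * exp_multiplier(PROPDMGEXP)` and the crop column the same way.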

These totals by event type are shown in “Results” (below) using a custom rendering function built on grid graphics, which draws tables created with the gridExtra library:

render_table <- function(table, caption) {
  grid.newpage()
  h <- grobHeight(table)
  w <- grobWidth(table)
  title <- textGrob(caption, y=unit(0.5,"npc") + 0.5*h, 
                    vjust=0, gp=gpar(fontsize=20))
  gt <- gTree(children=gList(table, title))
  grid.draw(gt)
}

We also analyzed the casualties and damages by year to see if there was a trend over time:

# Let's look at the overall trend
casualties_summary_by_year <- aggregate(CASUALTIES ~ BEGIN_YEAR, data=storm_data, FUN=sum)
damage_summary_by_year <- aggregate(TOTAL_MONETARY_DAMAGE ~ BEGIN_YEAR, data=storm_data, FUN=sum)

Results

The trend over the time frame studied was an overall increase in impact from storms:

barplot(casualties_summary_by_year$CASUALTIES, names.arg=casualties_summary_by_year$BEGIN_YEAR, xlab="Year (1950-2011)", ylab="Casualties", main="Figure 1: Casualties by Year")

Two years stand out as especially destructive: 1998 had the most casualties, and 2006 had the greatest monetary damage:

casualties_summary_by_year[order(-casualties_summary_by_year$CASUALTIES),]$BEGIN_YEAR[1]
## [1] 1998
damage_summary_by_year[order(-damage_summary_by_year$TOTAL_MONETARY_DAMAGE),]$BEGIN_YEAR[1]
## [1] 2006

Shown here is a table of the top events in terms of total casualties:

t <- tableGrob(top_10_events_by_casualties, cols=c("EVTYPE", "casualties"), rows=seq(1,10))
render_table(t, "Figure 2: Casualties")

Similarly, here is a summary of the top events in terms of monetary damages:

t <- tableGrob(top_10_events_by_monetary_damage, cols=c("EVTYPE", "TOTAL_MONETARY_DAMAGE"), rows=seq(1,10))
render_table(t, "Figure 3: Monetary Damage")

Conclusion and Recommendations

Tornadoes stand out as particularly deadly, killing or injuring more than 10 times as many people as the next event type, “Excessive Heat”. Flooding of various kinds (“FLOODING”, “FLASH FLOOD”, “RIVER FLOOD”, “STORM SURGE”, etc.) causes the most monetary damage. Tornadoes and hurricanes/typhoons are the next most expensive in terms of economic damage.

How often tornadoes and other catastrophic events occur compared with flooding was not examined here and could be the subject of further analysis. Although such events might be less frequent, the magnitude of their destruction should not be discounted by those in areas prone to them, such as Tornado Alley or the Atlantic coast. Officials would be wise to prepare for these types of events because of their potential for both human and economic destruction.