Synopsis

In this analysis, we use the National Oceanic and Atmospheric Administration’s (NOAA) storm database to explore the impacts (fatalities, injuries, and property damage) of storm events. Specifically, we want to understand which types of events are associated with more damage than others. The data set covers events from 1950 to 2011, although the analysis is restricted to events reported in 1996 or later.

Supporting documentation for the data set can be found here:

Data Processing

We first load the packages used in the analysis, read in the data set, and prepare it for analysis.

Packages

library(data.table)
library(ggplot2)
library(dplyr)
library(lubridate)
library(knitr)

Download and Load Data

Due to the size of the dataset, it takes some time to load in the data (3-4 minutes).

# Download the file only if it is not already in the working directory
if (!file.exists("StormData.csv.bz2")) {
      download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2")
}
FullData <- read.csv("StormData.csv.bz2")

Preparing Data

The data set includes 902,297 observations of 37 variables. The data provided does not seem to align directly with the format shown on the NOAA website (separate files for the events and fatality details), but most of the variables seem self-explanatory.

Many of the variables are not needed for this analysis, so we keep only the relevant ones. Before dropping the unused variables, we convert the BGN_DATE variable to a date format and the F variable (strength of storms) to a factor. We also use only data reported in 1996 or later, because before then only some event types were recorded.

FullData$BGN_DATE <- mdy_hms(FullData$BGN_DATE)
FullData$F <- as.factor(FullData$F)

StormData <- select(FullData,
                    BGN_DATE,
                    BGN_TIME,
                    TIME_ZONE,
                    EVTYPE,
                    LENGTH,
                    WIDTH,
                    F,
                    MAG,
                    FATALITIES,
                    INJURIES,
                    PROPDMG,
                    PROPDMGEXP,
                    CROPDMG,
                    CROPDMGEXP,
                    REFNUM
                    ) %>%
            filter(year(BGN_DATE)>=1996)
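To make the date conversion and year filter concrete, here is a minimal sketch on a toy data frame. It uses base R in place of lubridate so it runs without extra packages; the three rows and the `toy`, `Year`, and `recent` names are made up for illustration and are not part of the analysis.

```r
# Toy rows (illustrative only); the strings follow the month/day/year
# hour:minute:second layout that mdy_hms() is applied to in this report.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "8/29/2005 0:00:00"),
  EVTYPE   = c("TORNADO", "WINTER STORM", "HURRICANE/TYPHOON"),
  stringsAsFactors = FALSE
)

# Base R equivalent of the date parsing step
toy$BGN_DATE <- as.POSIXct(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Extract the year and keep only events from 1996 onward
toy$Year <- as.integer(format(toy$BGN_DATE, "%Y"))
recent <- toy[toy$Year >= 1996, ]
print(recent)
```

Only the 1996 and 2005 rows remain in `recent`; the 1950 row is dropped by the filter.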

After reducing the size of the data set, there were some additional fields to calculate. First, we converted the PROPDMG and CROPDMG variables to total dollar amounts based on their respective EXP variables. Given how few records use any other code (see the counts below), we only scale values with a K (multiplied by 1,000), an M (1,000,000), or a B (1,000,000,000); for example, a PROPDMG of 10 with a PROPDMGEXP of K becomes a TotPropDmg of 10,000. One record (REFNUM 605943) is excluded from the B multiplier for property damage. We also added fields for a total HumanCost (injuries plus fatalities) and a total EconCost (property plus crop damage).

summary(StormData$PROPDMGEXP)
##             -      ?      +      0      1      2      3      4      5 
## 276185      0      0      0      1      0      0      0      0      0 
##      6      7      8      B      h      H      K      m      M 
##      0      0      0     32      0      0 369938      0   7374
summary(StormData$CROPDMGEXP)
##             ?      0      2      B      k      K      m      M 
## 373069      0      0      0      4      0 278686      0   1771
StormData$TotPropDmg <- mutate(StormData,TotPropDmg = ifelse(
      PROPDMGEXP == "k"|PROPDMGEXP == "K",1000,ifelse(
            PROPDMGEXP == "m"|PROPDMGEXP == "M",1000000,ifelse(
                  PROPDMGEXP=="b"|(PROPDMGEXP=="B"&REFNUM!=605943),1000000000,1)))*PROPDMG)$TotPropDmg

StormData$TotCropDmg <- mutate(StormData,TotCropDmg = ifelse(
      CROPDMGEXP == "k"|CROPDMGEXP == "K",1000,ifelse(
            CROPDMGEXP == "m"|CROPDMGEXP == "M",1000000,ifelse(
                  CROPDMGEXP=="b"|CROPDMGEXP=="B",1000000000,1)))*CROPDMG)$TotCropDmg

StormData$HumanCost = mutate(StormData,HumanCost=FATALITIES+INJURIES)$HumanCost
StormData$EconCost = mutate(StormData,EconCost=TotCropDmg+TotPropDmg)$EconCost
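The same K/M/B scaling can also be sketched with a named lookup vector instead of nested ifelse() calls. This is an illustrative alternative, not the code used to produce the results: `exp_multiplier` is a hypothetical helper, the input values are made up, and the sketch does not reproduce the REFNUM 605943 exception.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to its dollar multiplier. Codes other than K, M, or B map to 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]  # unmatched codes give NA
  mult[is.na(mult)] <- 1                       # leave those values unscaled
  unname(mult)
}

# Made-up damage values and exponent codes
damage <- c(10, 2.5, 1, 7)
codes  <- c("K", "m", "B", "")
total  <- damage * exp_multiplier(codes)
print(total)
```

Here a damage value of 10 with code K becomes 10,000, matching the example in the text, and the value with an empty code is left at 7.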

The other field adjustment was needed because there were many individual event types. We categorized all EVTYPEs into seven Event Types for analysis: (1) Thunderstorms, (2) Tornadoes, (3) Hurricanes, (4) Floods, (5) Hail, (6) Winter Storms, and (7) Other.

StormData$EventType <- "Other"
StormData[grepl("Thunder|Lightn|TSTM",StormData$EVTYPE,ignore.case=T),]$EventType <- "Thunderstorm"
StormData[grepl("hail",StormData$EVTYPE,ignore.case=T),]$EventType <- "Hail"
StormData[grepl("flood|wetness",StormData$EVTYPE,ignore.case=T),]$EventType <- "Flood"
StormData[grepl("snow|ice|blizz|freez|cold|winter|frost",StormData$EVTYPE,ignore.case=T),]$EventType <- "Winter Storm"
StormData[grepl("hurrica|surge|tropica|tsunam",StormData$EVTYPE,ignore.case=T),]$EventType <- "Hurricane"
StormData[grepl("tornado|funnel",StormData$EVTYPE,ignore.case=T),]$EventType <- "Tornado"
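Because the assignments above run one after another, an EVTYPE that matches more than one pattern ends up in the bucket assigned last. The sketch below wraps the same patterns, in the same order, in a hypothetical `bucket_event` helper and applies it to a few illustrative strings to show this behavior.

```r
# Hypothetical helper: apply the report's patterns in the report's order;
# a later match overwrites an earlier one.
bucket_event <- function(evtype) {
  patterns <- c(
    "Thunderstorm" = "Thunder|Lightn|TSTM",
    "Hail"         = "hail",
    "Flood"        = "flood|wetness",
    "Winter Storm" = "snow|ice|blizz|freez|cold|winter|frost",
    "Hurricane"    = "hurrica|surge|tropica|tsunam",
    "Tornado"      = "tornado|funnel"
  )
  bucket <- rep("Other", length(evtype))
  for (name in names(patterns)) {
    bucket[grepl(patterns[[name]], evtype, ignore.case = TRUE)] <- name
  }
  bucket
}

# Illustrative event type strings
examples <- c("TSTM WIND/HAIL", "HURRICANE/TYPHOON", "HEAT", "FLASH FLOOD")
buckets  <- bucket_event(examples)
print(buckets)
# "Hail" "Hurricane" "Other" "Flood"
```

The first string matches both the thunderstorm and hail patterns and is labeled Hail, because the hail assignment runs second.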

Results

After the data was manipulated to a clean format, we started doing some exploratory plotting. The goal was to answer two primary questions:

  1. Which types of storm events are the most harmful for population health; and
  2. Which types of storm events have the greatest economic consequences.

To answer those two questions, we summarized the data by Event Type (the new bucketed version).

FinalData <- StormData %>%
      group_by(EventType) %>%
      summarize(Occurences = n(),
                HumanCost = sum(HumanCost),
                EconCost_mil = sum(EconCost)/1000000) %>%
      mutate(Pop_PerEvent = HumanCost/Occurences,
             Econ_PerEvent_mil = EconCost_mil/Occurences) %>%
      setorder(-Econ_PerEvent_mil)

Summary Information by New Event Type

SummaryOutput <- as.data.table(FinalData)
kable(SummaryOutput,digits=2)
EventType     Occurences  HumanCost  EconCost_mil  Pop_PerEvent  Econ_PerEvent_mil
Hurricane           1423       2060     142769.53          1.45             100.33
Tornado            29223      22179      24900.50          0.76               0.85
Flood              76178       9750      51047.11          0.13               0.67
Other              58566      17329      31732.35          0.30               0.54
Winter Storm       43516       4305       9292.38          0.10               0.21
Hail              209247        830      17201.09          0.00               0.08
Thunderstorm      235377      10254       9577.39          0.04               0.04
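As a quick arithmetic check, the per-event columns can be recomputed from the totals in the table. The sketch below copies the Hurricane and Thunderstorm rows by hand into a small data frame named `check` (a name chosen here for illustration) and divides by the number of occurrences.

```r
# Totals copied from the Hurricane and Thunderstorm rows of the summary table
check <- data.frame(
  EventType    = c("Hurricane", "Thunderstorm"),
  Occurences   = c(1423, 235377),
  HumanCost    = c(2060, 10254),
  EconCost_mil = c(142769.53, 9577.39)
)

# Per-event figures: total divided by number of events, rounded as in kable()
check$Pop_PerEvent      <- round(check$HumanCost / check$Occurences, 2)
check$Econ_PerEvent_mil <- round(check$EconCost_mil / check$Occurences, 2)
print(check)
```

The recomputed values (1.45 and 100.33 for hurricanes, 0.04 and 0.04 for thunderstorms) agree with the table.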

Comparison of New Event Type by Economic Costs and Population Damage per Event

The data reveals that although hurricanes are the least frequently occurring event type in the data, they are the most damaging when they occur, from both an economic and a population health perspective (about $100 million and 1.45 fatalities or injuries per event). Tornadoes are more frequent and are the next most damaging on a per-event basis. Thunderstorms and hail are recorded far more often, but have lower average impacts than the other event types.

qplot(Pop_PerEvent,log10(Econ_PerEvent_mil*1000000),data=FinalData,
      colour = EventType,size=Occurences,
      main = "Storm Events Population and Economic Consequences",
      xlab = "Fatalities or Injuries per Event",
      ylab = "Log of Total Economic Costs per Event ($'s)")
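Recent ggplot2 releases mark qplot() as deprecated, so the same figure can also be sketched with ggplot() directly. This is an assumed equivalent rather than the code that produced the plot in this report; the `plot_data` frame below simply re-enters the rounded per-event values from the summary table so the example is self-contained.

```r
library(ggplot2)

# Per-event values copied (already rounded) from the summary table
plot_data <- data.frame(
  EventType = c("Hurricane", "Tornado", "Flood", "Other",
                "Winter Storm", "Hail", "Thunderstorm"),
  Occurences = c(1423, 29223, 76178, 58566, 43516, 209247, 235377),
  Pop_PerEvent = c(1.45, 0.76, 0.13, 0.30, 0.10, 0.00, 0.04),
  Econ_PerEvent_mil = c(100.33, 0.85, 0.67, 0.54, 0.21, 0.08, 0.04)
)

# Same aesthetics as the qplot() call: x, y, colour, and point size
p <- ggplot(plot_data,
            aes(x = Pop_PerEvent,
                y = log10(Econ_PerEvent_mil * 1000000),
                colour = EventType,
                size = Occurences)) +
  geom_point() +
  labs(title = "Storm Events Population and Economic Consequences",
       x = "Fatalities or Injuries per Event",
       y = "Log of Total Economic Costs per Event ($'s)")
```

Printing `p` draws the plot. Because the inputs here are rounded to two decimals, point positions may differ slightly from a plot built on FinalData.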