Synopsis

Based on storm data from the National Oceanic and Atmospheric Administration’s (NOAA) National Weather Service (NWS), this report assesses the effects that severe weather events had on US population health and the economy between 1996 and 2011. Out of around 66,700 storm-related casualties, excessive heat was the main cause of death, accounting for around 2,000 fatalities, whilst tornadoes caused the highest number of injuries (over 20,000 injured). Property and crop damage amounted to an estimated total of slightly over 400 billion US dollars over the same period, mostly due to floods, with damage mainly affecting property.

Data Processing

Before reading and processing the data, we first load the R libraries that we’ll be using.

library(dplyr)
library(ggplot2)
library(tidyr)
library(lubridate)

The software versions that were used in this analysis, as returned by sessionInfo(), are listed in the appendices.

We download the raw storm data file, which is published online and is a bz2-compressed CSV file.

dataUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dataFile <- "StormData.csv.bz2"

if (!file.exists(dataFile)) {
  download.file(dataUrl, dataFile, mode = "wb")
}

We then read the raw data from the file into a table.

# read data from compressed data file (this may take up to a few minutes)
# (compressed data size: 48 MB, uncompressed data size: 548MB)
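# stringsAsFactors = TRUE keeps text columns as factors (the default before
# R 4.0), which the factor-based steps further down rely on
rawStorm <- read.csv(dataFile, stringsAsFactors = TRUE)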

For exploratory purposes, we now display the dimensions of the data set as well as the first few rows.

dim(rawStorm)
## [1] 902297     37
head(rawStorm)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Having the column names in upper case is somewhat inconvenient, so we’ll convert them to lower case.

# convert column names to lower case for ease of use
colnames(rawStorm) <- tolower(colnames(rawStorm))

We also want to see where values are missing:

# count missing values by column
numNAs <- colSums(is.na(rawStorm))

# extract columns with missing values
numMissingValuesOnly <- numNAs[numNAs != 0]

# display number of missing values
numMissingValuesOnly
## countyendn          f   latitude latitude_e 
##     902297     843563         47         40
# show as ratio to total number of values
numMissingValuesOnly / nrow(rawStorm)
##   countyendn            f     latitude   latitude_e 
## 1.000000e+00 9.349061e-01 5.208928e-05 4.433130e-05

All the values in the countyendn column, most (93.5%) of those in the f column, and a few (around 0.005%) of the latitude and latitude_e values are missing.

We will not be using any of these columns in our analysis (countyendn, latitude and latitude_e hold geographical data, and f appears to hold the tornado F-scale rating), so the missing values can safely be ignored.

Removing events prior to 1996

Based on the Storm Events Database web page, we consider that the storm event database is highly biased towards some types of events prior to 1996, as:

  • From 1950 through 1954, only tornado events were recorded.

  • From 1955 through 1995, only tornado, thunderstorm wind and hail events were recorded.

  • From 1996 to present, 48 event types are recorded as defined in NWS Directive 10-1605.

For this reason, we choose to ignore all the events before 1996, in order to have a more representative selection of event types.

# add a year column with year of event
rawStorm$year <- year(mdy_hms(rawStorm$bgn_date))

# ratio of preserved events within the data set
ratioKeptEvents <- sum(rawStorm$year >= 1996) / nrow(rawStorm)
ratioKeptEvents
## [1] 0.7242959
# keep only data from 1996 onward for our analysis
storm <- subset(rawStorm, year >= 1996)

Our analysis will be based on 72.4% of the original data set, i.e. most of the data, which we consider acceptable.
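
As a cross-check on the year extraction, here is a minimal base R sketch that parses dates in the same month/day/year format without lubridate. The first two values are copied from the bgn_date column shown earlier; the third is a made-up 1996 date added for illustration, and the variable names are ours.

```r
# two values from the bgn_date column, plus one made-up 1996 date
bgnDate <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00", "1/6/1996 0:00:00")

# parse month/day/year followed by a time
parsed <- as.POSIXct(bgnDate, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# extract the year as an integer
eventYear <- as.integer(format(parsed, "%Y"))
eventYear

# flag the events that would be kept, and the share they represent
keep <- eventYear >= 1996
mean(keep)
```

Taking the mean of the logical vector gives the same ratio as dividing the number of kept events by the number of rows.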

Cleaning the event types

Let us consider the number of different event types in the data set.

eventTypesLevels <- nlevels(storm$evtype)
eventTypesLevels
## [1] 985

The evtype column has 985 levels (a count inherited from the raw data set, since subset() does not drop unused factor levels), whereas the data should contain at most 48 event types, namely the ones listed in section 2.1.1 of the Storm Data Documentation, as referenced on the Storm Events Database web page.

We will therefore proceed to match up the event types from the data set to the official event types, noting that this approach should really be reviewed by a subject matter expert.
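
One caveat when counting event types with nlevels(): subset() keeps every level of a factor column, including levels that no longer occur in the filtered rows. The following sketch, using a toy data frame rather than the storm data, shows the difference between the inherited level count and the number of values actually present.

```r
# toy data frame (not the storm data)
toy <- data.frame(
  evtype = factor(c("TORNADO", "HAIL", "FLOOD")),
  year = c(1950, 1996, 2001)
)

recent <- subset(toy, year >= 1996)

# levels are inherited from the unfiltered factor
inheritedLevels <- nlevels(recent$evtype)

# values actually present after filtering
presentTypes <- length(unique(recent$evtype))

# droplevels() removes the unused levels
droppedLevels <- nlevels(droplevels(recent$evtype))

c(inheritedLevels, presentTypes, droppedLevels)
```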

# extract event types for further processing
evtypes <- storm$evtype

### general cleaning

## remove whitespace
cleanEventType <- trimws(evtypes)

## convert to lower case
cleanEventType <- tolower(cleanEventType)

### first match the most specific event types...

## astronomical low tide
cleanEventType <- gsub(".*blow-out tide.*", "astronomical low tide", cleanEventType)

## avalanche
cleanEventType <- gsub(".*(avalance|slide|landslump|landspout).*", 
                        "avalanche", cleanEventType)

## blizzard
cleanEventType <- gsub(".*blizzard.*", 
                        "blizzard", cleanEventType)

## coastal flood
cleanEventType <- gsub(
  paste(".*(astronomical high tide|beach eros|beach flood|cstl|tidal|high tides",
  "|coastal flooding|coastal surge|coastal/tidal|coastalfl).*"), 
  "coastal flood", cleanEventType)

## dust storm
cleanEventType <- gsub(".*(blowing dust|dust ?storm|saharan).*", 
                        "dust storm", cleanEventType)

## dense fog
cleanEventType <- gsub(".*fog.*", 
                        "dense fog", cleanEventType)

## dense smoke
cleanEventType <- gsub(".*smoke.*", 
                        "dense smoke", cleanEventType)

## drought
cleanEventType <- gsub(".*(dry|drought).*", 
                        "drought", cleanEventType)

## dust devil
cleanEventType <- gsub(".*dust dev.*", 
                        "dust devil", cleanEventType)

## flash flood
cleanEventType <- gsub(".*flash.?flooo?d.*", 
                        "flash flood", cleanEventType)

## flood
cleanEventType <- gsub(".*(?<!flash )flood.*", 
                        "flood", cleanEventType, perl = TRUE)
cleanEventType <- gsub("^urban.*", 
                        "flood", cleanEventType) ## -58

## ice storm and glaze (grouped under frost/freeze)
cleanEventType <- gsub(".*(glaze|ice ?storm).*", 
                        "frost/freeze", cleanEventType)

## frost/freeze
cleanEventType <- gsub(".*(freez|frost|ice(?! storm)|icy).*", 
                        "frost/freeze", cleanEventType, perl = TRUE)

## funnel cloud
cleanEventType <- gsub(".*(funnel|wall cloud).*", 
                        "funnel cloud", cleanEventType)

## hail
cleanEventType <- gsub(".*hail.*", 
                        "hail", cleanEventType)

## high surf
cleanEventType <- gsub(
  ".*(surf|swell|high seas|high water|wave|rough seas).*", 
  "high surf", cleanEventType)

## high wind
cleanEventType <- gsub("^high$", "high wind", cleanEventType)
cleanEventType <- gsub("^(high ? ?wind).*",
                        "high wind", cleanEventType)

## hurricane (typhoon)
cleanEventType <- gsub(".*(hurricane|typhoon).*", 
                        "hurricane (typhoon)", cleanEventType)

## lake-effect snow
cleanEventType <- gsub(".*(lake snow|lake effect).*", 
                        "lake-effect snow", cleanEventType)

## lightning
cleanEventType <- gsub("^(lightning|ligntning).*", 
                        "lightning", cleanEventType)

## marine strong wind
cleanEventType <- gsub("^(heavy seas).*", 
                        "marine strong wind", cleanEventType)

## marine thunderstorm wind
cleanEventType <- gsub("^(coastal ?storm).*", 
                        "marine thunderstorm wind", cleanEventType)

## rip current
cleanEventType <- gsub("^(rip current).*", 
                        "rip current", cleanEventType)

## sleet
cleanEventType <- gsub("^(sleet).*", 
                        "sleet", cleanEventType)

## storm surge/tide
cleanEventType <- gsub("^(storm surge).*", 
                        "storm surge/tide", cleanEventType)

## tornado
cleanEventType <- gsub(".*torn(ado|dao).*", 
                        "tornado", cleanEventType)

## tropical storm
cleanEventType <- gsub(".*tropical storm.*", 
                        "tropical storm", cleanEventType)

## volcanic ash
cleanEventType <- gsub(".*(vog|volcanic).*", 
                        "volcanic ash", cleanEventType)

## waterspout
cleanEventType <- gsub(".*way?ter ?spout.*", 
                        "waterspout", cleanEventType)

## wildfire
cleanEventType <- gsub(".*fire.*", 
                        "wildfire", cleanEventType)

## winter storm
cleanEventType <- gsub(".*(thundersnow|winter storm).*", 
                        "winter storm", cleanEventType)

## winter weather
cleanEventType <- gsub(".*(winter.*mix|wintery|wintry).*", 
                        "winter weather", cleanEventType)

### now match less specific event types...

## extreme cold/wind chill
cleanEventType <- gsub(
  paste(".*(wind ?chill|extreme cold|low temperature|record cool",
        "|record low|unseasonably co|unseasonal low|unseasonable cold",
        "|unusually cold).*"), 
  "extreme cold/wind chill", cleanEventType)

## cold/wind chill
cleanEventType <- gsub(".*(cold|cool|hypothermia).*", 
                        "cold/wind chill", cleanEventType)

## excessive heat
cleanEventType <- gsub(
  paste(".*(heat|high temperature|record high|record warm", 
        "|unseasonably Warm|unseasonably hot|warmth|unusually warm",
        "|very warm|warm weather|hyperthermia).*"), 
  "excessive heat", cleanEventType)

## heavy rain
cleanEventType <- gsub(
  paste("^(abnormally wet|excessive precip|excessive rain",
        "|excessive wet|extremely wet|heavy mix",
        "|heavy shower|hvy rain|prolonged rain|rain",
        "|record rain|torrential rain|unseasonably wet|unseasonal rain",
        "|wet).*"),
  "heavy rain", cleanEventType)
cleanEventType <- gsub(
  ".*(heavy precip|heavy rain|excessive rain|record precip).*", 
  "heavy rain", cleanEventType)

## heavy snow
cleanEventType <- gsub(
  paste("^(blowing snow|excessive snow|snow).*"),
  "heavy snow", cleanEventType)
cleanEventType <- gsub(".*(heavy snow|record snow).*", 
                        "heavy snow", cleanEventType)

## strong wind
cleanEventType <- gsub(
  paste(".*(gradient wind|gusty|micr?oburst|non-?thunderstorm wind|turbulence",
        "|downburst|strong wind).*"), 
  "strong wind", cleanEventType)
cleanEventType <- gsub("^(wind|wnd).*",
                        "strong wind", cleanEventType)

## thunderstorm wind
cleanEventType <- gsub(
  paste(".*(tstm|th?und?ee?r?e?s?torm|thuderstorm|thunderstrom|thundertsorm",
        "|gustnado|whirlwind|metro storm",
        "|rotating wall cloud|storm force wind",
        "|windstorm).*"), 
  "thunderstorm wind", cleanEventType)

A number of events could not be tagged (e.g. event types starting with “summary for”, or labelled “excessive” without mentioning what was in excess): these events will be labelled with the catch-all event type “other”.

officialEventTypes <- c("astronomical low tide", "avalanche", "blizzard", 
  "coastal flood", "cold/wind chill", "debris flow", "dense fog", 
  "dense smoke", "drought", "dust devil", "dust storm", "excessive heat", 
  "extreme cold/wind chill", "flash flood", "flood", "frost/freeze", 
  "funnel cloud", "freezing fog", "hail", "heat", "heavy rain", "heavy snow",
  "high surf", "high wind", "hurricane (typhoon)", "ice storm", 
  "lake-effect snow", "lakeshore flood", "lightning", "marine hail", 
  "marine high wind", "marine strong wind", "marine thunderstorm wind", 
  "rip current", "seiche", "sleet", "storm surge/tide", "strong wind", 
  "thunderstorm wind", "tornado", "tropical depression", "tropical storm", 
  "tsunami", "volcanic ash", "waterspout", "wildfire", "winter storm", 
  "winter weather")

# anything that doesn't match an official event type gets assigned the
# type "other"
eventsWithUnmatchedEventTypesVector <- !(cleanEventType %in% officialEventTypes)
cleanEventType[eventsWithUnmatchedEventTypesVector] <- "other"

# calculate some indicators on the impact of this approach on the data set
numEventsWithUnmatchedEventTypes <- sum(eventsWithUnmatchedEventTypesVector)
numEventsWithUnmatchedEventTypes
## [1] 761
ratioUnmatchedEventTypes <- sum(eventsWithUnmatchedEventTypesVector) / 
  nrow(storm)
ratioUnmatchedEventTypes
## [1] 0.001164445

Our strategy has resulted in 761 events (i.e. 0.116%) having unmatched event types, and being assigned the type other. This is a very low ratio, which is deemed satisfactory for our analysis.

# calculate number of distinct event types
numEventTypes <- length(unique(cleanEventType))
numEventTypes
## [1] 39

The clean-up operation produced a total of 39 distinct event types (including the catch-all “other” event type).

We finally add the cleaned event types as a new column in the data set.

storm$cleanEventType <- cleanEventType
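
Two details of the pattern-based clean-up are worth checking on small inputs: how patterns split over several lines are joined, and how the negative lookbehind used for floods behaves. The sketch below uses made-up labels rather than the storm data, and the variable names are ours.

```r
# toy labels (not taken from the storm data)
labels <- c("record cool", "record low", "flash flood", "river flood")

# paste() joins its arguments with a space by default, so the first
# alternative becomes "record cool " (with a trailing space)
patternWithSpace <- paste(".*(record cool", "|record low).*")

# paste0() concatenates without any separator
patternNoSpace <- paste0(".*(record cool", "|record low).*")

matchedWithSpace <- grepl(patternWithSpace, labels)
matchedNoSpace <- grepl(patternNoSpace, labels)
matchedWithSpace
matchedNoSpace

# negative lookbehind (requires perl = TRUE): rewrite any label containing
# "flood" unless that word is immediately preceded by "flash "
floodCleaned <- gsub(".*(?<!flash )flood.*", "flood", labels, perl = TRUE)
floodCleaned
```

With paste(), "record cool" is no longer matched because the pattern expects a space after it; with paste0() both "record" labels match. The lookbehind leaves "flash flood" untouched and rewrites "river flood" to "flood"; any other label ending in "flood" would be rewritten in the same way.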

Cleaning the property damage and crop damage exponents

Economic impacts of storm-related events are measured as property damage and crop damage, each of which is coded in the data set as a base value and a magnitude (B for billions, M for millions, K for thousands), as shown in this extract:

storm %>% 
  select(propdmg, propdmgexp, cropdmg, cropdmgexp) %>% 
  slice(1:10)
##    propdmg propdmgexp cropdmg cropdmgexp
## 1      380          K      38          K
## 2      100          K       0           
## 3        3          K       0           
## 4        5          K       0           
## 5        2          K       0           
## 6        0                  0           
## 7      400          K       0           
## 8       12          K       0           
## 9        8          K       0           
## 10      12          K       0

Let us check if the data set consistently applies this coding scheme by checking the values used for the orders of magnitude.

# orders of magnitude for property damage
unique(storm$propdmgexp)
## [1] K   M B 0
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# orders of magnitude for crop damage
unique(storm$cropdmgexp)
## [1] K   M B
## Levels:  ? 0 2 B k K m M

The coding scheme is mostly respected (we note that subsetting the data set to events from 1996 onwards has filtered out unclean magnitude specifiers such as -, ?, 5 and h), but we still need to investigate what the value 0 and the absence of value correspond to.

Let us show events where the property damage or crop damage base value is non-zero and the order of magnitude is either 0 or missing.

storm %>% select(propdmg, propdmgexp) %>%
  filter(propdmg != 0 & propdmgexp %in% c("", 0))
## [1] propdmg    propdmgexp
## <0 rows> (or 0-length row.names)
storm %>% select(cropdmg, cropdmgexp) %>%
  filter(cropdmg != 0 & cropdmgexp %in% c("", 0))
## [1] cropdmg    cropdmgexp
## <0 rows> (or 0-length row.names)

No events match these conditions, meaning that we can ignore the superfluous or missing magnitude specifiers.

We add two new columns to the data set, one for property damage and one for crop damage, containing the magnitudes expressed as a multiplier, defaulting to 0. We then update this multiplier when the magnitude specifier is either B, M or K.

# create new columns with multiplier, defaulting to 0
storm$propDmgMultiplier <- 0
storm$cropDmgMultiplier <- 0

# update multiplier based on magnitude
storm$propDmgMultiplier[storm$propdmgexp == "B"] <- 1e9
storm$propDmgMultiplier[storm$propdmgexp == "M"] <- 1e6
storm$propDmgMultiplier[storm$propdmgexp == "K"] <- 1e3

storm$cropDmgMultiplier[storm$cropdmgexp == "B"] <- 1e9
storm$cropDmgMultiplier[storm$cropdmgexp == "M"] <- 1e6
storm$cropDmgMultiplier[storm$cropdmgexp == "K"] <- 1e3
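
An alternative way to express the same mapping is a named lookup vector. This is only a sketch on made-up values (the vector names are hypothetical), assuming, as above, that only the upper-case specifiers B, M and K carry a multiplier and that everything else maps to 0.

```r
# multipliers for the recognised magnitude specifiers
magnitudeMultiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# toy base values and specifiers (not taken from the storm data)
baseValue <- c(2.5, 3, 1, 10, 7)
specifier <- c("K", "M", "B", "", "0")

# look up each specifier by name; unrecognised ones come back as NA
multiplier <- unname(magnitudeMultiplier[specifier])

# default to 0 for anything that is not B, M or K
multiplier[is.na(multiplier)] <- 0

damageDollars <- baseValue * multiplier
damageDollars
```

With a factor column such as propdmgexp, the specifiers would need to go through as.character() first, since indexing by a factor uses its integer codes rather than its labels.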

Finally we create two new columns containing the dollar-value amounts of damage, obtained by multiplying the base values by the multiplier, for both property damage and crop damage.

storm$propDamageDollars <- storm$propdmg * storm$propDmgMultiplier
storm$cropDamageDollars <- storm$cropdmg * storm$cropDmgMultiplier

# display first few rows to cross-check
storm %>% 
  select(propdmg, propdmgexp, propDamageDollars, 
         cropdmg, cropdmgexp, cropDamageDollars) %>% 
  slice(1:10)
##    propdmg propdmgexp propDamageDollars cropdmg cropdmgexp
## 1      380          K            380000      38          K
## 2      100          K            100000       0           
## 3        3          K              3000       0           
## 4        5          K              5000       0           
## 5        2          K              2000       0           
## 6        0                            0       0           
## 7      400          K            400000       0           
## 8       12          K             12000       0           
## 9        8          K              8000       0           
## 10      12          K             12000       0           
##    cropDamageDollars
## 1              38000
## 2                  0
## 3                  0
## 4                  0
## 5                  0
## 6                  0
## 7                  0
## 8                  0
## 9                  0
## 10                 0

Results

Analysis of the most harmful types of storm-caused events with respect to population health, across the United States

In order to have a general idea of the harm caused by storms to population health in the United States, we will first calculate the total number of fatalities, injuries, and overall casualties (i.e. injuries and fatalities) that storms have caused from 1996 to 2011.

# calculate total fatalities, injuries, and overall casualties
totalFatalities <- sum(storm$fatalities)
totalFatalities
## [1] 8732
totalInjuries <- sum(storm$injuries)
totalInjuries
## [1] 57975
totalCasualties <- sum(storm$fatalities) + sum(storm$injuries)
totalCasualties
## [1] 66707
ratioFatalities <- totalFatalities / totalCasualties
ratioFatalities
## [1] 0.1309008

During the analysed period, storm-related events were responsible for 66,707 casualties, made up of 57,975 injuries and 8,732 deaths (13.1% of total casualties).

We will estimate the harm caused by storms by event type, expressed as the total number of casualties.

# casualties by event type, ignoring events that have caused no casualties
casualtiesByEventType <- storm %>%
  select(cleanEventType, fatalities, injuries) %>%
  group_by(cleanEventType) %>%
  mutate(casualties = fatalities + injuries) %>%
  filter(casualties != 0) %>%
  summarise_each(funs(sum))

numEventTypeCausingCasualties <- nrow(casualtiesByEventType)
numEventTypeCausingCasualties
## [1] 32

Out of the 39 event types (including the “other” generic event type) in the cleaned data set, 32 have caused casualties.
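
The per-type totals can also be cross-checked without dplyr. The following base R sketch applies the same steps (compute casualties per event, drop events without casualties, sum by type) to a toy data frame; the data and names are made up for illustration.

```r
# toy events (not taken from the storm data)
toy <- data.frame(
  type = c("tornado", "tornado", "flood", "hail"),
  fatalities = c(1, 0, 2, 0),
  injuries = c(10, 5, 0, 0),
  stringsAsFactors = FALSE
)

# casualties per event
toy$casualties <- toy$fatalities + toy$injuries

# ignore events that caused no casualties
withCasualties <- toy[toy$casualties != 0, ]

# sum fatalities, injuries and casualties by event type
totals <- aggregate(cbind(fatalities, injuries, casualties) ~ type,
                    data = withCasualties, FUN = sum)

# sort by decreasing number of casualties
totals[order(-totals$casualties), ]
```

Here "hail" drops out because its only event caused no casualties, which is also why the number of rows in the summary can be read as the number of event types that caused casualties.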

The following graph represents the total number of casualties (fatalities and injuries) in the United States, by event type, during the studied period.

# reorder event types by number of casualties
casualtiesByEventType$cleanEventType <- factor(
  casualtiesByEventType$cleanEventType,
  levels = casualtiesByEventType$cleanEventType[order(
    casualtiesByEventType$casualties)])

# rename columns and reshape data in preparation of stacked bar plot
casualtiesByEventTypeReshaped <- casualtiesByEventType %>%
  rename(fatality = fatalities, injury = injuries) %>%
  gather("casualtyType", "numCasualties", c(fatality, injury))
# create bar plot of casualties, stacked by type of casualty
ggplot(casualtiesByEventTypeReshaped, aes(cleanEventType, numCasualties)) + 
  geom_bar(aes(fill = casualtyType), stat = "identity") +
  coord_flip() +
  scale_fill_discrete(name="Type of casualty") +
  theme(legend.position="bottom") +
  labs(title = "Casualties due to storms in the US, by type of event, from 1996 to 2011", 
       x = "Type of event", y = "Total number of casualties from 1996 to 2011")

Casualties due to storms in the US, by type of event, from 1996 to 2011

The top 10 causes of casualties (fatalities and injuries) are:

top10CasualtiesByEventType <- casualtiesByEventType %>%
  arrange(desc(casualties)) %>%
  slice(1:10)
top10CasualtiesByEventType
## Source: local data frame [10 x 4]
## 
##         cleanEventType fatalities injuries casualties
##                 (fctr)      (dbl)    (dbl)      (dbl)
## 1              tornado       1511    20667      22178
## 2       excessive heat       2037     7615       9652
## 3                flood        450     6846       7296
## 4    thunderstorm wind        398     5071       5469
## 5            lightning        651     4141       4792
## 6          flash flood        887     1674       2561
## 7             wildfire         87     1458       1545
## 8         winter storm        191     1292       1483
## 9  hurricane (typhoon)        125     1328       1453
## 10           high wind        235     1083       1318
# calculate ratio of casualties due to leading cause compared to total
# casualties
ratioLeadingCauseCasualties <- top10CasualtiesByEventType$casualties[1] / 
  totalCasualties
ratioLeadingCauseCasualties
## [1] 0.3324689

With 22,178 casualties, tornadoes were the leading cause of casualties during the considered period, accounting for 33.2% of all casualties.

The top 10 causes of death (fatalities) are:

top10FatalitiesByEventType <- casualtiesByEventType %>%
  arrange(desc(fatalities)) %>%
  slice(1:10) %>%
  select(cleanEventType, fatalities)
top10FatalitiesByEventType
## Source: local data frame [10 x 2]
## 
##       cleanEventType fatalities
##               (fctr)      (dbl)
## 1     excessive heat       2037
## 2            tornado       1511
## 3        flash flood        887
## 4          lightning        651
## 5        rip current        542
## 6              flood        450
## 7  thunderstorm wind        398
## 8    cold/wind chill        396
## 9          avalanche        266
## 10         high wind        235
# calculate ratio of fatalities due to leading cause compared to total
# fatalities
ratioLeadingCauseFatalities <- top10FatalitiesByEventType$fatalities[1] / 
  totalFatalities
ratioLeadingCauseFatalities
## [1] 0.2332799

The leading cause of death, excessive heat, accounted for 2,037 fatalities (23.3% of total fatalities).

The top 10 causes of injuries are:

top10InjuriesByEventType <- casualtiesByEventType %>%
  arrange(desc(injuries)) %>%
  slice(1:10) %>%
  select(cleanEventType, injuries)
top10InjuriesByEventType
## Source: local data frame [10 x 2]
## 
##         cleanEventType injuries
##                 (fctr)    (dbl)
## 1              tornado    20667
## 2       excessive heat     7615
## 3                flood     6846
## 4    thunderstorm wind     5071
## 5            lightning     4141
## 6          flash flood     1674
## 7             wildfire     1458
## 8  hurricane (typhoon)     1328
## 9         winter storm     1292
## 10           high wind     1083
# calculate ratio of injuries due to leading cause compared to total
# injuries
ratioLeadingCauseInjuries <- top10InjuriesByEventType$injuries[1] / 
  totalInjuries
ratioLeadingCauseInjuries
## [1] 0.3564812

Most injuries were caused by tornadoes, which were responsible for 20,667 injuries (35.6% of all injuries).

Analysis of the types of storm-caused events which have had the greatest economic consequences across the United States

As we did for casualties, we will first assess the overall economic consequences of storms in the United States by calculating the total property damage, crop damage, and overall damage (i.e. property and crop) that storms have caused from 1996 to 2011.

# calculate total property damage, crop damage, and overall damage
totalPropertyDamage <- sum(storm$propDamageDollars)
totalPropertyDamage
## [1] 366767615380
totalCropDamage <- sum(storm$cropDamageDollars)
totalCropDamage
## [1] 34752728730
totalDamage <- totalPropertyDamage + totalCropDamage
totalDamage
## [1] 401520344110
# crop damage as ratio of total damage
ratioCropDamage <- totalCropDamage / totalDamage
ratioCropDamage
## [1] 0.08655285

Total damage caused by storms during the analysed period amounts to 402 billion US dollars, combining property damage (US$367B) and crop damage (US$34.8B, or 8.66% of the total damage).
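
The rounded figures quoted above can be reproduced from the totals printed in the output. This short sketch re-enters those totals by hand; the names damageBillions and cropSharePercent are ours.

```r
# totals as printed in the output above
totalPropertyDamage <- 366767615380
totalCropDamage <- 34752728730
totalDamage <- totalPropertyDamage + totalCropDamage

# billions of US dollars, to three significant figures
damageBillions <- signif(
  c(total = totalDamage,
    property = totalPropertyDamage,
    crop = totalCropDamage) / 1e9,
  3)
damageBillions

# crop damage as a percentage of total damage, to two decimals
cropSharePercent <- round(100 * totalCropDamage / totalDamage, 2)
cropSharePercent
```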

We will now estimate the economic damage caused by storms by event type, expressed as the total dollar amount of property and crop damage.

damageByEventType <- storm %>%
  select(cleanEventType, propDamageDollars, cropDamageDollars) %>%
  group_by(cleanEventType) %>%
  mutate(totalDamageDollars = propDamageDollars + cropDamageDollars) %>%
  filter(totalDamageDollars != 0) %>%
  summarise_each(funs(sum))

The following graph represents the total damage (property damage and crop damage) in the United States, by event type, during the studied period.

# reorder event types by damage amount
damageByEventType$cleanEventType <- factor(
  damageByEventType$cleanEventType,
  levels = damageByEventType$cleanEventType[order(
    damageByEventType$totalDamageDollars)])

# rename columns and reshape data in preparation of stacked bar plot
damageByEventTypeReshaped <- damageByEventType %>%
  rename(property = propDamageDollars, crop = cropDamageDollars) %>%
  gather("damageType", "damageDollars", c(property, crop))
# create bar plot of damage, stacked by type of damage
ggplot(damageByEventTypeReshaped, aes(cleanEventType, damageDollars/1e9)) + 
  geom_bar(aes(fill = damageType), stat = "identity") +
  coord_flip() +
  scale_fill_discrete(name="Scope of damage") +
  theme(legend.position="bottom") +
  labs(title = "Damage due to storms in the US, by type of event, from 1996 to 2011", 
       x = "Type of event", y = "Total damage in billions of US dollars from 1996 to 2011")

Damage due to storms in the US, by type of event, from 1996 to 2011

The top 10 causes of overall damage are:

top10DamageByEventType <- damageByEventType %>%
  arrange(desc(totalDamageDollars)) %>%
  select(cleanEventType, totalDamageDollars) %>%
  slice(1:10)
top10DamageByEventType
## Source: local data frame [10 x 2]
## 
##         cleanEventType totalDamageDollars
##                 (fctr)              (dbl)
## 1                flood       149566260260
## 2  hurricane (typhoon)        87068996810
## 3     storm surge/tide        47835579000
## 4              tornado        24900370720
## 5                 hail        17201091620
## 6          flash flood        16557170610
## 7              drought        14415414600
## 8    thunderstorm wind         8827476130
## 9       tropical storm         8320186550
## 10            wildfire         8162704630
# calculate ratio of damage due to leading cause compared to total
# damage
ratioLeadingCauseDamage <- top10DamageByEventType$totalDamageDollars[1] / 
  totalDamage
ratioLeadingCauseDamage
## [1] 0.3724998

With an estimated 150 billion US dollars’ worth of damage, floods were the most economically damaging events during the considered period, accounting for 37.2% of the total US dollar amount of damage.

The top 10 causes of property damage are:

top10PropertyDamageByEventType <- damageByEventType %>%
  arrange(desc(propDamageDollars)) %>%
  slice(1:10) %>%
  select(cleanEventType, propDamageDollars)
top10PropertyDamageByEventType
## Source: local data frame [10 x 2]
## 
##         cleanEventType propDamageDollars
##                 (fctr)             (dbl)
## 1                flood      144553098760
## 2  hurricane (typhoon)       81718889010
## 3     storm surge/tide       47834724000
## 4              tornado       24616945710
## 5          flash flood       15222268910
## 6                 hail       14639572920
## 7    thunderstorm wind        7875179780
## 8             wildfire        7760449500
## 9       tropical storm        7642475550
## 10           high wind        5248378360
# calculate ratio of property damage due to leading cause compared to total
# property damage
ratioLeadingCausePropertyDamage <- 
  top10PropertyDamageByEventType$propDamageDollars[1] / 
  totalPropertyDamage
ratioLeadingCausePropertyDamage
## [1] 0.3941272

The leading cause of property damage, flooding, accounted for 145 billion US dollars’ worth of property damage (39.4% of overall property damage).

The top 10 causes of crop damage are:

top10CropDamageByEventType <- damageByEventType %>%
  arrange(desc(cropDamageDollars)) %>%
  slice(1:10) %>%
  select(cleanEventType, cropDamageDollars)
top10CropDamageByEventType
## Source: local data frame [10 x 2]
## 
##         cleanEventType cropDamageDollars
##                 (fctr)             (dbl)
## 1              drought       13367581000
## 2  hurricane (typhoon)        5350107800
## 3                flood        5013161500
## 4                 hail        2561518700
## 5         frost/freeze        1384421000
## 6      cold/wind chill        1356765500
## 7          flash flood        1334901700
## 8    thunderstorm wind         952296350
## 9           heavy rain         728169800
## 10      tropical storm         677711000
# calculate ratio of crop damage due to leading cause compared to total
# crop damage
ratioLeadingCauseCropDamage <- 
  top10CropDamageByEventType$cropDamageDollars[1] / 
  totalCropDamage
ratioLeadingCauseCropDamage
## [1] 0.3846484

Drought was the main cause of crop damage, and was responsible for 13.4 billion US dollars’ worth of crop damage (38.5% of all crop damage).

Appendices

Impact of event type cleaning and period restriction

Taking into account the complete data set, without cleaning the event types or restricting our analysis to the dates when all event types were taken into account (i.e. 1996 to 2011), would have produced somewhat different results.

We illustrate this by considering the number of casualties by event type in the original (raw) data set.

# total fatalities, injuries and casualties by raw (uncleaned) event type
rawCasualtiesByEventType <- rawStorm %>%
  select(evtype, fatalities, injuries) %>%
  group_by(evtype) %>%
  mutate(casualties = fatalities + injuries) %>%
  summarise_each(funs(sum))

# number of event types that have caused casualties
rawNumEventsCasualties <- sum(rawCasualtiesByEventType$casualties != 0)
rawNumEventsCasualties
## [1] 220
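
The aggregation above relies on summarise_each() and funs(), which later dplyr releases deprecated. As a hedged sketch, assuming dplyr 1.0.0 or newer is installed, the same per-type totals could be written with across(). The toy data frame below is invented for illustration and is not part of the storm data.

```r
library(dplyr)

# Toy stand-in for the storm data (hypothetical values, for illustration only).
toyStorm <- data.frame(
  evtype     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  fatalities = c(2, 0, 1, 0),
  injuries   = c(10, 3, 0, 0)
)

# Sum fatalities and injuries per event type, then derive casualties.
toyCasualtiesByEventType <- toyStorm %>%
  group_by(evtype) %>%
  summarise(across(c(fatalities, injuries), sum), .groups = "drop") %>%
  mutate(casualties = fatalities + injuries)

# Number of event types with at least one casualty.
toyNumEventTypesCasualties <- sum(toyCasualtiesByEventType$casualties != 0)

toyCasualtiesByEventType
toyNumEventTypesCasualties
```

Computing casualties after summarising gives the same totals as summing a per-row casualties column, since both are plain sums.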

Out of the 985 uncleaned event types in the raw data set, 220 have caused casualties. We now extract the 20 event types that caused the highest combined number of fatalities and injuries.

# extract top 20 casualty-causing event types
top20RawCasualtyEventTypes <- rawCasualtiesByEventType %>%
  arrange(desc(casualties)) %>%
  slice(1:20)
top20RawCasualtyEventTypes
## Source: local data frame [20 x 4]
## 
##                evtype fatalities injuries casualties
##                (fctr)      (dbl)    (dbl)      (dbl)
## 1             TORNADO       5633    91346      96979
## 2      EXCESSIVE HEAT       1903     6525       8428
## 3           TSTM WIND        504     6957       7461
## 4               FLOOD        470     6789       7259
## 5           LIGHTNING        816     5230       6046
## 6                HEAT        937     2100       3037
## 7         FLASH FLOOD        978     1777       2755
## 8           ICE STORM         89     1975       2064
## 9   THUNDERSTORM WIND        133     1488       1621
## 10       WINTER STORM        206     1321       1527
## 11          HIGH WIND        248     1137       1385
## 12               HAIL         15     1361       1376
## 13  HURRICANE/TYPHOON         64     1275       1339
## 14         HEAVY SNOW        127     1021       1148
## 15           WILDFIRE         75      911        986
## 16 THUNDERSTORM WINDS         64      908        972
## 17           BLIZZARD        101      805        906
## 18                FOG         62      734        796
## 19        RIP CURRENT        368      232        600
## 20   WILD/FOREST FIRE         12      545        557

Observing this top 20, we note that:

  • The thunderstorm wind event type appears under three different labels: TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS. Its casualties are therefore split across three rows, making this event type seem less significant than it actually was. Similarly, EXCESSIVE HEAT and HEAT are treated as different event types, as are WILDFIRE and WILD/FOREST FIRE.

  • The ranking of the causes of casualties differs from the one obtained with the cleaned data: here TSTM WIND is in third position, whereas floods cause the third highest number of casualties in the cleaned data.
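
To illustrate the first point, the sketch below merges the duplicated labels from the top 20 and re-sums their casualties in base R. The lookup table labelMap is a hypothetical mapping written for this example; it is not the cleaning procedure used earlier in this report, and the totals still cover the whole raw period rather than 1996 to 2011.

```r
# Casualty counts copied from the raw top-20 table above.
rawTop <- data.frame(
  evtype = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
             "EXCESSIVE HEAT", "HEAT", "WILDFIRE", "WILD/FOREST FIRE",
             "FLOOD"),
  casualties = c(7461, 1621, 972, 8428, 3037, 986, 557, 7259),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: raw label -> merged label (assumption).
labelMap <- c(
  "TSTM WIND"          = "thunderstorm wind",
  "THUNDERSTORM WIND"  = "thunderstorm wind",
  "THUNDERSTORM WINDS" = "thunderstorm wind",
  "EXCESSIVE HEAT"     = "excessive heat",
  "HEAT"               = "excessive heat",
  "WILDFIRE"           = "wildfire",
  "WILD/FOREST FIRE"   = "wildfire"
)

# Labels without an entry in the lookup table are simply lower-cased.
mapped <- unname(labelMap[rawTop$evtype])
rawTop$mergedType <- ifelse(is.na(mapped), tolower(rawTop$evtype), mapped)

# Re-sum casualties per merged label and sort in decreasing order.
mergedCasualties <- aggregate(casualties ~ mergedType, data = rawTop, FUN = sum)
mergedCasualties <- mergedCasualties[order(-mergedCasualties$casualties), ]
mergedCasualties
```

Once the three labels are merged, thunderstorm wind accounts for more casualties than floods within this raw extract, which shows how much the split labels understate it.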

Furthermore, let us compare the ratio between casualties caused by tornadoes and the total number of casualties.

totalCasualtiesRaw <- sum(rawStorm$fatalities) + sum(rawStorm$injuries)
ratioTornadoCasualtiesRaw <- top20RawCasualtyEventTypes$casualties[1] /
  totalCasualtiesRaw
ratioTornadoCasualtiesRaw
## [1] 0.6229661
ratioTornadoCasualties <- top10CasualtiesByEventType$casualties[1] /
  totalCasualties
ratioTornadoCasualties
## [1] 0.3324689

According to the raw data set, 62.3% of casualties were caused by tornadoes, whereas this figure is 33.2% (about half as much) in the clean data set. The raw data set is therefore strongly biased towards tornadoes.
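
The comparison can be made explicit with a little arithmetic on the values printed in this report. This is a minimal sketch with variable names of our own choosing; the implied totals are approximate because they are back-calculated from rounded ratios.

```r
# Values copied from the outputs printed in this report.
tornadoCasualtiesRaw   <- 96979
tornadoCasualtiesClean <- 22178
shareRaw   <- 0.6229661
shareClean <- 0.3324689

# Clean share relative to raw share ("about half as much").
relativeShare <- shareClean / shareRaw

# Difference between the two shares, in percentage points.
shareGapPoints <- 100 * (shareRaw - shareClean)

# Total casualties implied by each count and share (approximate).
impliedTotalRaw   <- tornadoCasualtiesRaw / shareRaw
impliedTotalClean <- tornadoCasualtiesClean / shareClean

round(c(relativeShare = relativeShare, shareGapPoints = shareGapPoints), 2)
round(c(impliedTotalRaw = impliedTotalRaw, impliedTotalClean = impliedTotalClean))
```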

In fact, if we extract the top three causes of death, tornadoes rank first in the raw data set, well ahead of the second cause of death, whereas they rank only second in the clean data set.

# top 3 causes of fatality in raw data set
rawCasualtiesByEventType %>%
  arrange(desc(fatalities)) %>%
  slice(1:3)
## Source: local data frame [3 x 4]
## 
##           evtype fatalities injuries casualties
##           (fctr)      (dbl)    (dbl)      (dbl)
## 1        TORNADO       5633    91346      96979
## 2 EXCESSIVE HEAT       1903     6525       8428
## 3    FLASH FLOOD        978     1777       2755
# top 3 causes of fatality in clean data set
casualtiesByEventType %>%
  arrange(desc(fatalities)) %>%
  slice(1:3)
## Source: local data frame [3 x 4]
## 
##   cleanEventType fatalities injuries casualties
##           (fctr)      (dbl)    (dbl)      (dbl)
## 1 excessive heat       2037     7615       9652
## 2        tornado       1511    20667      22178
## 3    flash flood        887     1674       2561

Software versions

This report was produced using the following software versions:

sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 8 x64 (build 9200)
## 
## locale:
## [1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252   
## [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C                  
## [5] LC_TIME=French_France.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] lubridate_1.5.0 tidyr_0.3.1     ggplot2_2.0.0   dplyr_0.4.3    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.3      knitr_1.12.3     magrittr_1.5     munsell_0.4.2   
##  [5] colorspace_1.2-6 R6_2.1.1         stringr_1.0.0    plyr_1.8.3      
##  [9] tools_3.2.2      parallel_3.2.2   grid_3.2.2       gtable_0.1.2    
## [13] DBI_0.3.1        htmltools_0.3    yaml_2.1.13      lazyeval_0.1.10 
## [17] assertthat_0.1   digest_0.6.9     formatR_1.2.1    evaluate_0.8    
## [21] rmarkdown_0.9.2  labeling_0.3     stringi_1.0-1    scales_0.3.0