Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

In this report, we will study which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences.

Data Processing

Load in the packages that we will use

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

Load the data in

storm <- read.csv('repdata-data-StormData.csv', stringsAsFactors = FALSE)

We are interested in the year in which each event occurred, so we extract the year from the BGN_DATE value.

stormX <- mutate(storm, year = as.integer(sapply(strsplit(BGN_DATE,'[/ ]'), '[[', 3)))
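The extraction above can be illustrated on a couple of sample strings. This is a minimal sketch that assumes BGN_DATE values follow a "month/day/year time" layout; the sample values and the names `bgn_date`, `year_split`, and `year_parsed` are hypothetical. It also shows a date-parsing alternative that should give the same result under that assumption.

```r
# Hypothetical sample strings in the layout assumed for BGN_DATE
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Approach used in the report: split on "/" or " " and keep the third field
year_split <- as.integer(sapply(strsplit(bgn_date, "[/ ]"), "[[", 3))

# Alternative: parse the leading date part, then format the year
year_parsed <- as.integer(format(as.Date(bgn_date, format = "%m/%d/%Y"), "%Y"))

print(year_split)
print(identical(year_split, year_parsed))
```

The split-based version depends only on the position of the year field, while the date-based version would return NA for strings that do not parse as dates, which can help surface malformed values.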

Clean up data

The event types (EVTYPE) in the data are messy: there are many duplicates, overlaps, and even misspellings. The number of distinct event types in the original data is:

nrow(as.data.frame(table(storm$EVTYPE)))
## [1] 985

We will clean up these event types by trying to map them to the 48 standard NOAA storm events defined in http://www.ncdc.noaa.gov/stormevents/pd01016005curr.pdf

stormX$EVTYPE <- gsub('.*ASTRONOMICAL LOW TIDE.*|.*BLOW-OUT TIDE.*', 'Astronomical Low Tide', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*THUNDERSTORM.*|.*TSTM WIND.*|.*TSTMW.*|.*TUNDERSTORM.*|.*THUNERSTORM.*|.*THUNDEERSTORM.*|.*THUNDERSTROM.*|.*THUNDESTORM.*|.*THUNDERTSORM.*', 'Thunderstorm Wind', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*MICROBUST.*|.*MICROBURST.*|.*GUSTNADO.*|.*DOWNBURST.*', 'Thunderstorm Wind', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('MARINE STRONG WIND|^MarineStrong Wind$', 'Marine Strong Wind', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*TROPICAL STORM.*|.*COASTAL STORM.*|COASTALSTORM', 'Tropical Storm', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*BLIZZARD.*', 'Blizzard', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*FOG.*', 'Fog', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*FLASH FLOOD.*|.*STREAM.*', 'Flash Flood', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*DROUGHT.*|.*RECORD LOW RAINFALL.*|.*DRY.*', 'Drought', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*ICE STORM.*', 'Ice Storm', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*HEAVY LAKE SNOW.*|.*LAKE-EFFECT SNOW.*|.*LAKE EFFECT SNOW.*', 'Lake-Effect Snow', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*SNOW.*', 'Snow', stormX$EVTYPE, ignore.case =  TRUE)
stormX$EVTYPE <- gsub('.*HIGH WIND.*', 'High Wind', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*CURRENT.*', 'Rip Current', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*SURF.*|.*HIGH TIDE.*|.*HIGH WAVE.*|.*HIGH SEAS.*|.*HIGH.*SWELLS.*|.* HEAVY SWELLS.*|.*BEACH EROSIN.*|.*BEACH EROSION.*|COASTAL EROSION', 'High Surf', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*VOLCANIC.*', 'Volcanic Ash', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*HURRICANE.*|.*TYPHOON.*', 'Hurricane', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*LIGHTNING.*|LIGHTING|LIGNTNING', 'Lightning', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*COASTAL FLOOD.*|.*BEACH FLOOD.*|.*COASTALFLOOD.*|.*COASTAL/TIDAL FLOOD.*|.*Coastl Flood.*|*.Cstl Flood.*|.*Tidal Flood.*', 'Coastal Flood', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*LAKESHORE FLOOD.*', 'Lakeshore Flood', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*FLOOD.*|.*Lakeshore Flood.*', 'Flood', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*URBAN.*', 'Flood', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*HAIL.*', 'Hail', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*WATERSPOUT.*|WAYTERSPOUT', 'Waterspout', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*TORNADO.*|TORNDAO', 'Tornado', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*RAIN.*|.*SHOWER.*|.*WET.*', 'Heavy Rain', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('^HEAT$|.*WARM.*|.*HOT.*', 'Heat', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*HEAT.*|.*Heat Wave.*|.*Heatburst.*', 'Excessive Heat', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*Record Temperature.*|.*Temperature record.*|.*Record High.*|.*Record Warm.*|.*Record Heat.*|.*HIGH TEMPERATURE RECORD.*', 'Excessive Heat', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*EXTREME COLD.*|.*RECORD.*COLD.*|.*SEVERE COLD.*|.*UNSEASONABLY COLD.*|.*UNUSUALLY COLD.*|.*COLD WAVE.*|.*FREEZE.*|.*FREEZING.*|LOW TEMPERATURE RECORD|.*HYPOTHERMIA.*|.*RECORD LOW.*', 'Extreme Cold/Wind Chill', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*COLD.*|.*WIND CHILL.*|.*WINDCHILL.*|.*COOL.*|.*LOW TEMPERATURE.*|.*Cold Temperature.*', 'Cold/Wind Chill', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*WIND.*|.*Gusty.*|.*Strong wind.*|*.Strong Wind.*', 'Strong Wind', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*FUNNEL.*', 'Funnel Cloud', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*DUST.*', 'Dust Storm', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*WINTER STORM.*', 'Winter Storm', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*WINTER.*|.*WINTRY.*|.*Wintry.*', 'Winter Weather', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*LAND.*|.*MUD.*|.*ROCK.*', 'Debris Flow', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*FIRE.*', 'Wildfire', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*FROST.*|.*ICE.*|.*ICY.*|.*Icy.*|^Frost$', 'Frost/Freeze', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*SURGE.*', 'Storm Surge/Tide', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*GLAZE.*', 'Freezing Fog', stormX$EVTYPE, ignore.case = TRUE)

Some event types remain unmapped, but they are few and should not change which events rank as the most damaging.
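The order of the substitutions matters: specific patterns run before general ones, and a general pattern that is case-sensitive leaves labels that have already been mapped to Title Case untouched. A minimal sketch on a hypothetical vector of labels (the `remap` helper and the sample values are illustrative and not part of the report):

```r
# Hypothetical helper: replace any label matching `pattern` with `label`
remap <- function(x, pattern, label, ignore_case = FALSE) {
  gsub(pattern, label, x, ignore.case = ignore_case)
}

# Hypothetical raw labels
types <- c("FLASH FLOODING", "Coastal Flooding", "RIVER FLOOD", "TSTM WIND")

# Specific rules first, matched case-insensitively
types <- remap(types, ".*FLASH FLOOD.*", "Flash Flood", ignore_case = TRUE)
types <- remap(types, ".*COASTAL FLOOD.*", "Coastal Flood", ignore_case = TRUE)

# General rule last and case-sensitive, so "Flash Flood" and
# "Coastal Flood" are not collapsed into "Flood"
types <- remap(types, ".*FLOOD.*", "Flood")
types <- remap(types, ".*TSTM WIND.*", "Thunderstorm Wind")

print(types)
```

If the general `.*FLOOD.*` rule were applied with `ignore.case = TRUE`, it would also match the labels produced by the earlier rules and merge them into a single category.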

Selecting data

In the earlier years of the data, only a few types of storm events were recorded. The figure below shows a surge in the number of recorded events from 1995 onward, not because more events occurred from that year on, but because more types of events have been recorded since then.

qplot(stormX$year, main="Number of events each year", binwidth =1, xlab="Year", ylab="Number of events")

To avoid a bias toward the event types that were recorded earlier, our comparison focuses only on storm data from 1995 onward.

storm_recent <- filter(stormX, year >=1995)

Since the last year in the data is 2011, the portion of data that we analyze spans 17 years.

Preprocessing

To measure the harm an event does to population health, we calculate its casualties, which combine FATALITIES and INJURIES.

storm_recent <- storm_recent %>% mutate(casualties = FATALITIES + INJURIES)

The economic damage of an event is calculated by combining PROPDMG and CROPDMG, each scaled by PROPDMGEXP and CROPDMGEXP respectively. The valid scaling factors are ‘K’ (thousand), ‘M’ (million) and ‘B’ (billion). Some dirty values in PROPDMGEXP and CROPDMGEXP fall outside this set; for those events we apply no scaling and use the raw value.

lookup= c('K', 'k', 'M', 'm', 'B')
multiplier = c(1000, 1000, 1E6, 1E6, 1E9)
storm_recent <- storm_recent %>% mutate(damages = ifelse(PROPDMGEXP %in% lookup, PROPDMG * multiplier[match(PROPDMGEXP, lookup)], PROPDMG))
storm_recent <- storm_recent %>% mutate(damages = damages + ifelse(CROPDMGEXP %in% lookup, CROPDMG * multiplier[match(CROPDMGEXP, lookup)], CROPDMG))
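The scaling rule can be checked on a few hand-picked values. This is a sketch only: it uses a named vector instead of the `lookup`/`match` pair above, and the `scale_damage` helper and the sample inputs are hypothetical.

```r
# Multipliers keyed by exponent code (same codes as in the report)
multiplier <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Hypothetical helper: scale `value` by the multiplier for `exp`;
# unrecognised codes look up as NA and leave the value unscaled
scale_damage <- function(value, exp) {
  m <- unname(multiplier[exp])
  ifelse(is.na(m), value, value * m)
}

# Hypothetical inputs: 25K, 2.5M, 1B, and one unrecognised code
scaled <- scale_damage(c(25, 2.5, 1, 10), c("K", "M", "B", "?"))
print(scaled)
```

Under this rule an unrecognised code such as "?" keeps the raw figure, which matches the choice described above to skip scaling for dirty values.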

Result

We can now sum up the casualties and economic damages for each event type.

storm_summary <- storm_recent %>% group_by(EVTYPE) %>% summarise(count = n(), casualties = sum(casualties), damages = sum(damages), fatalities = sum(FATALITIES), injuries = sum(INJURIES))
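To make the aggregation concrete, here is a small sketch on made-up rows. It uses base R `aggregate` rather than the dplyr pipeline above, and the `toy` data frame and its values are hypothetical, not taken from the storm data.

```r
# Hypothetical toy data with the same column names as the report
toy <- data.frame(
  EVTYPE     = c("Tornado", "Flood", "Tornado", "Flood", "Hail"),
  FATALITIES = c(1, 0, 2, 1, 0),
  INJURIES   = c(10, 2, 5, 0, 1),
  damages    = c(1e6, 5e6, 2e6, 1e7, 1e4)
)
toy$casualties <- toy$FATALITIES + toy$INJURIES

# Sum casualties and damages within each event type
toy_summary <- aggregate(cbind(casualties, damages) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by each measure, largest first
by_casualties <- toy_summary[order(-toy_summary$casualties), ]
by_damages    <- toy_summary[order(-toy_summary$damages), ]

print(by_casualties)
print(by_damages)
```

In this toy example the two rankings differ, which is why the report sorts the summary separately for health impact and for economic impact.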

The types of events that are most harmful to population health

head(arrange(storm_summary, desc(casualties)), 3)
## Source: local data frame [3 x 6]
## 
##           EVTYPE count casualties      damages fatalities injuries
##            (chr) (int)      (dbl)        (dbl)      (dbl)    (dbl)
## 1        Tornado 24365      23328  25227093817       1545    21783
## 2 Excessive Heat  1918       9215    516225750       2157     7058
## 3          Flood 25098       7199 149669709785        428     6771

Tornado tops the list with 1545 fatalities and 21783 injuries, followed by Excessive Heat (2157 fatalities and 7058 injuries) and Flood (428 fatalities and 6771 injuries).

The types of events that have the greatest economic consequences

head(arrange(storm_summary, desc(damages)), 3)
## Source: local data frame [3 x 6]
## 
##             EVTYPE count casualties      damages fatalities injuries
##              (chr) (int)      (dbl)        (dbl)      (dbl)    (dbl)
## 1            Flood 25098       7199 149669709785        428     6771
## 2        Hurricane   292       1465  90656027810        133     1332
## 3 Storm Surge/Tide   401         55  47835579000         13       42

Flood caused the most damage (about 150 billion USD in total), followed by Hurricane (about 91 billion USD) and Storm Surge/Tide (about 48 billion USD).
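As a quick check of the units, the damage totals printed in the table above can be converted to billions of USD. The variable names here are illustrative; the figures are copied from the table.

```r
# Damage totals in USD, copied from the summary table
damages_usd <- c(
  Flood              = 149669709785,
  Hurricane          = 90656027810,
  `Storm Surge/Tide` = 47835579000
)

# Convert to billions of USD, rounded to one decimal place
damages_billion <- round(damages_usd / 1e9, 1)
print(damages_billion)
```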