Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
In this report, we will study which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
storm <- read.csv('repdata-data-StormData.csv', stringsAsFactors = FALSE)
We are interested in the year in which each event occurred, so we extract the year from the BGN_DATE field.
stormX <- mutate(storm, year = as.integer(sapply(strsplit(BGN_DATE,'[/ ]'), '[[', 3)))
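As a minimal sketch of what this extraction does, the snippet below applies the same split to two made-up date strings. It assumes BGN_DATE values look like month/day/year followed by a time; the vector `sample_dates` is hypothetical and not taken from the data set.

```r
# Hypothetical BGN_DATE-style values (assumed format: m/d/yyyy h:mm:ss)
sample_dates <- c("4/18/1950 0:00:00", "12/1/2011 0:00:00")

# Split on "/" or a space; the third piece is the year
parts <- strsplit(sample_dates, "[/ ]")
years_split <- as.integer(sapply(parts, `[[`, 3))

# Alternative: parse the leading date part and format out the year
years_parsed <- as.integer(format(as.Date(sample_dates, format = "%m/%d/%Y"), "%Y"))

print(years_split)
print(years_parsed)
```

Both approaches should agree as long as every value follows the assumed format; parsing as a date has the advantage of returning NA for malformed values rather than failing on a missing third piece.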
The event types (EVTYPE) in the data are messy: they contain many duplicates, overlaps, and even misspellings. The number of distinct event types in the original data is:
nrow(as.data.frame(table(storm$EVTYPE)))
## [1] 985
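To illustrate how such a count becomes inflated, here is a small sketch on made-up labels (the vector `evtype` is hypothetical). It counts distinct values before and after normalizing case and surrounding whitespace, which removes only the most trivial duplicates.

```r
# Hypothetical raw labels with case and whitespace variants
evtype <- c("TSTM WIND", "Tstm Wind", " TSTM WIND", "THUNDERSTORM WINDS", "HAIL")

# Distinct labels as recorded
n_raw <- length(unique(evtype))

# Distinct labels after trimming whitespace and upper-casing
n_normalized <- length(unique(toupper(trimws(evtype))))

print(c(raw = n_raw, normalized = n_normalized))
```

Normalization alone does not merge genuinely different spellings such as "TSTM WIND" and "THUNDERSTORM WINDS", which is why a pattern-based mapping is still needed.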
We will clean up these event types by mapping them to the 48 standard NOAA storm events defined in http://www.ncdc.noaa.gov/stormevents/pd01016005curr.pdf
stormX$EVTYPE <- gsub('.*ASTRONOMICAL LOW TIDE.*|.*BLOW-OUT TIDE.*', 'Astronomical Low Tide', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*THUNDERSTORM.*|.*TSTM WIND.*|.*TSTMW.*|.*TUNDERSTORM.*|.*THUNERSTORM.*|.*THUNDEERSTORM.*|.*THUNDERSTROM.*|.*THUNDESTORM.*|.*THUNDERTSORM.*', 'Thunderstorm Wind', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*MICROBUST.*|.*MICROBURST.*|.*GUSTNADO.*|.*DOWNBURST.*', 'Thunderstorm Wind', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('MARINE STRONG WIND|^MarineStrong Wind$', 'Marine Strong Wind', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*TROPICAL STORM.*|.*COASTAL STORM.*|COASTALSTORM', 'Tropical Storm', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*BLIZZARD.*', 'Blizzard', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*FOG.*', 'Fog', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*FLASH FLOOD.*|.*STREAM.*', 'Flash Flood', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*DROUGHT.*|.*RECORD LOW RAINFALL.*|.*DRY.*', 'Drought', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*ICE STORM.*', 'Ice Storm', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*HEAVY LAKE SNOW.*|.*LAKE-EFFECT SNOW.*|.*LAKE EFFECT SNOW.*', 'Lake-Effect Snow', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*SNOW.*', 'Snow', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*HIGH WIND.*', 'High Wind', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*CURRENT.*', 'Rip Current', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*SURF.*|.*HIGH TIDE.*|.*HIGH WAVE.*|.*HIGH SEAS.*|.*HIGH.*SWELLS.*|.* HEAVY SWELLS.*|.*BEACH EROSIN.*|.*BEACH EROSION.*|COASTAL EROSION', 'High Surf', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*VOLCANIC.*', 'Volcanic Ash', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*HURRICANE.*|.*TYPHOON.*', 'Hurricane', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*LIGHTNING.*|LIGHTING|LIGNTNING', 'Lightning', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*COASTAL FLOOD.*|.*BEACH FLOOD.*|.*COASTALFLOOD.*|.*COASTAL/TIDAL FLOOD.*|.*Coastl Flood.*|.*Cstl Flood.*|.*Tidal Flood.*', 'Coastal Flood', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*LAKESHORE FLOOD.*', 'Lakeshore Flood', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*FLOOD.*|.*Lakeshore Flood.*', 'Flood', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*URBAN.*', 'Flood', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*HAIL.*', 'Hail', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*WATERSPOUT.*|WAYTERSPOUT', 'Waterspout', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*TORNADO.*|TORNDAO', 'Tornado', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*RAIN.*|.*SHOWER.*|.*WET.*', 'Heavy Rain', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('^HEAT$|.*WARM.*|.*HOT.*', 'Heat', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*HEAT.*|.*Heat Wave.*|.*Heatburst.*', 'Excessive Heat', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*Record Temperature.*|.*Temperature record.*|.*Record High.*|.*Record Warm.*|.*Record Heat.*|.*HIGH TEMPERATURE RECORD.*', 'Excessive Heat', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*EXTREME COLD.*|.*RECORD.*COLD.*|.*SEVERE COLD.*|.*UNSEASONABLY COLD.*|.*UNUSUALLY COLD.*|.*COLD WAVE.*|.*FREEZE.*|.*FREEZING.*|LOW TEMPERATURE RECORD|.*HYPOTHERMIA.*|.*RECORD LOW.*', 'Extreme Cold/Wind Chill', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*COLD.*|.*WIND CHILL.*|.*WINDCHILL.*|.*COOL.*|.*LOW TEMPERATURE.*|.*Cold Temperature.*', 'Cold/Wind Chill', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*WIND.*|.*Gusty.*|.*Strong wind.*|*.Strong Wind.*', 'Strong Wind', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*FUNNEL.*', 'Funnel Cloud', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*DUST.*', 'Dust Storm', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*WINTER STORM.*', 'Winter Storm', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*WINTER.*|.*WINTRY.*|.*Wintry.*', 'Winter Weather', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*LAND.*|.*MUD.*|.*ROCK.*', 'Debris Flow', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*FIRE.*', 'Wildfire', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*FROST.*|.*ICE.*|.*ICY.*|.*Icy.*|^Frost$', 'Frost/Freeze', stormX$EVTYPE)
stormX$EVTYPE <- gsub('.*SURGE.*', 'Storm Surge/Tide', stormX$EVTYPE, ignore.case = TRUE)
stormX$EVTYPE <- gsub('.*GLAZE.*', 'Freezing Fog', stormX$EVTYPE, ignore.case = TRUE)
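Because each substitution runs on the output of the previous one, the order of the rules matters: a specific pattern such as FLASH FLOOD has to be applied before the general FLOOD pattern. The sketch below shows the same idea in table-driven form on made-up labels. The helper `map_evtype` and the `rules` table are hypothetical illustrations, not part of the analysis above.

```r
# Hypothetical helper: apply regex rules in order, each on the previous result
map_evtype <- function(x, rules) {
  for (i in seq_len(nrow(rules))) {
    x <- gsub(rules$pattern[i], rules$label[i], x,
              ignore.case = rules$ignore_case[i])
  }
  x
}

# A tiny illustrative rule table; specific patterns come first
rules <- data.frame(
  pattern     = c(".*FLASH FLOOD.*", ".*FLOOD.*", ".*HAIL.*"),
  label       = c("Flash Flood", "Flood", "Hail"),
  ignore_case = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

raw <- c("FLASH FLOODING", "RIVER FLOOD", "small hail", "TORNADO F1")
mapped <- map_evtype(raw, rules)
print(mapped)
```

The general FLOOD rule is case-sensitive here so that it does not overwrite the mixed-case "Flash Flood" label produced by the earlier rule. Labels that match no rule, such as "TORNADO F1", pass through unchanged.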
Some event types remain unmapped, but they are few, so they are unlikely to change which events rank as the most damaging.
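One way to check how much of the data remains unmapped is to compare the cleaned labels against the list of standard names. The sketch below does this on made-up data; `standard` is a stand-in for the full list of 48 names and `evtype` is a hypothetical cleaned column.

```r
# Stand-in subset of the standard event names (assumption: the real list has 48)
standard <- c("Flood", "Hail", "Tornado")

# Hypothetical cleaned labels, one of which was never mapped
evtype <- c("Flood", "Hail", "Hail", "Tornado", "SUMMARY OF MAY 22", "Flood")

# Flag rows whose label is not a standard name
unmapped <- !(evtype %in% standard)

# Which leftover labels occur, and what share of rows they cover
leftover_counts <- sort(table(evtype[unmapped]), decreasing = TRUE)
share_unmapped <- mean(unmapped)

print(leftover_counts)
print(share_unmapped)
```

A small share of unmapped rows, together with small casualty and damage totals for those rows, would support treating them as negligible.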
In the earlier years of the data, only a few types of storm events were recorded. The figure below shows a surge in the number of recorded events from 1995 onward, not because more events occurred from that year on, but because more types of events have been recorded since then.
qplot(stormX$year, main = "Number of events each year", binwidth = 1, xlab = "Year", ylab = "Number of events")
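The histogram shows the number of events per year but not the number of event types. A rough way to look at both is sketched below on a made-up data frame; `toy` is hypothetical and only mirrors the `year` and `EVTYPE` columns used above.

```r
# Hypothetical data with the same two columns as stormX
toy <- data.frame(
  year   = c(1990, 1990, 1990, 1996, 1996, 1996, 1996),
  EVTYPE = c("Tornado", "Tornado", "Hail", "Tornado", "Hail", "Flood", "Wildfire"),
  stringsAsFactors = FALSE
)

# Number of recorded events in each year
events_per_year <- table(toy$year)

# Number of distinct event types recorded in each year
types_per_year <- tapply(toy$EVTYPE, toy$year, function(x) length(unique(x)))

print(events_per_year)
print(types_per_year)
```

Applied to the real data, a jump in the number of distinct types around the same year as the jump in event counts would be consistent with the explanation that recording practice changed.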
To avoid biasing the comparison toward the event types that were recorded earlier, we focus only on storm data from 1995 onward.
storm_recent <- filter(stormX, year >= 1995)
Since the last year in the data is 2011, the portion of data that we analyze spans 17 years.
To measure the harm of an event to population health, we calculate its casualties, which combine FATALITIES and INJURIES.
storm_recent <- storm_recent %>% mutate(casualties = FATALITIES + INJURIES)
The economic damage of an event is calculated by combining PROPDMG and CROPDMG, each scaled by PROPDMGEXP and CROPDMGEXP respectively. The valid scaling factors are ‘K’ (thousand), ‘M’ (million) and ‘B’ (billion). Some dirty values in PROPDMGEXP and CROPDMGEXP fall outside this set; for those events we apply no scaling.
lookup <- c('K', 'k', 'M', 'm', 'B')
multiplier <- c(1000, 1000, 1E6, 1E6, 1E9)
storm_recent <- storm_recent %>% mutate(damages = ifelse(PROPDMGEXP %in% lookup, PROPDMG * multiplier[match(PROPDMGEXP, lookup)], PROPDMG))
storm_recent <- storm_recent %>% mutate(damages = damages + ifelse(CROPDMGEXP %in% lookup, CROPDMG * multiplier[match(CROPDMGEXP, lookup)], CROPDMG))
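The scaling rule can be checked in isolation on a few made-up values. The sketch below uses a named vector as the lookup, with the same five recognized codes as above; the helper `scale_damage` and the sample inputs are hypothetical.

```r
# Hypothetical helper: scale a damage value by its exponent code.
# Unrecognized codes (including "") leave the value unscaled.
scale_damage <- function(value, exp) {
  multiplier <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)
  m <- unname(multiplier[exp])  # NA where the code is not recognized
  m[is.na(m)] <- 1              # no scaling for dirty codes
  value * m
}

# Made-up values covering valid, empty, and dirty exponent codes
value <- c(25, 2.5, 10, 1, 5)
exp   <- c("K", "M", "", "B", "+")

scaled <- scale_damage(value, exp)
print(scaled)
```

Leaving dirty codes unscaled treats those amounts as plain dollars, which may understate the damage for rows whose intended unit was larger.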
We can now sum the casualties and economic damages for each event type.
storm_summary <- storm_recent %>% group_by(EVTYPE) %>% summarise(count = n(), casualties = sum(casualties), damages = sum(damages), fatalities = sum(FATALITIES), injuries = sum(INJURIES))
head(arrange(storm_summary, desc(casualties)), 3)
## Source: local data frame [3 x 6]
##
## EVTYPE count casualties damages fatalities injuries
## (chr) (int) (dbl) (dbl) (dbl) (dbl)
## 1 Tornado 24365 23328 25227093817 1545 21783
## 2 Excessive Heat 1918 9215 516225750 2157 7058
## 3 Flood 25098 7199 149669709785 428 6771
Tornado tops the list with 1545 fatalities and 21783 injuries, followed by Excessive Heat (2157 fatalities and 7058 injuries) and Flood (428 fatalities and 6771 injuries).
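As a quick consistency check, the reported casualties should equal fatalities plus injuries for each event type. The sketch below re-enters the three rows from the table above by hand and also computes the share of casualties that were fatal.

```r
# Top three rows copied from the summary table above
top3 <- data.frame(
  EVTYPE     = c("Tornado", "Excessive Heat", "Flood"),
  casualties = c(23328, 9215, 7199),
  fatalities = c(1545, 2157, 428),
  injuries   = c(21783, 7058, 6771),
  stringsAsFactors = FALSE
)

# Casualties should be the sum of fatalities and injuries
sums_match <- all(top3$casualties == top3$fatalities + top3$injuries)

# Percentage of casualties that were fatalities, by event type
fatal_share <- round(100 * top3$fatalities / top3$casualties, 1)

print(sums_match)
print(setNames(fatal_share, top3$EVTYPE))
```

The shares show that Excessive Heat, although second in total casualties, has a much higher proportion of fatalities than Tornado or Flood.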
head(arrange(storm_summary, desc(damages)), 3)
## Source: local data frame [3 x 6]
##
## EVTYPE count casualties damages fatalities injuries
## (chr) (int) (dbl) (dbl) (dbl) (dbl)
## 1 Flood 25098 7199 149669709785 428 6771
## 2 Hurricane 292 1465 90656027810 133 1332
## 3 Storm Surge/Tide 401 55 47835579000 13 42
Flood caused the most damage (about 150 billion USD in total), followed by Hurricane (about 91 billion USD) and Storm Surge/Tide (about 48 billion USD).
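The dollar totals in the table above are easier to compare when expressed in billions. The short sketch below converts the three reported values, which are copied by hand from the output.

```r
# Damage totals in USD, copied from the summary table above
damages <- c(
  "Flood"            = 149669709785,
  "Hurricane"        = 90656027810,
  "Storm Surge/Tide" = 47835579000
)

# Convert to billions of USD, rounded to one decimal place
billions <- round(damages / 1e9, 1)
print(billions)
```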