Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The present report provides an analysis of storm events in the United States from 1950 to 2011. Data were collected from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The analysis focuses on the impact of storm events on population health and the economy. By considering the total number of deaths for each type of event, we highlight the 5 most harmful events with respect to population health. Similarly, by considering the cumulative economic damage, we highlight the 5 most harmful events with respect to the economy.
The following sections detail the data processing workflow and results.
Data processing was based on the following information :
Data processing and analysis were performed using the R statistical programming language in the RStudio environment.
In addition, the following packages were used:
library(data.table)
library(R.utils)
library(lubridate)
library(ggplot2)
It is assumed that the data file repdata_data_StormData.csv.bz2 is located in the data/ folder of the R working directory. If it is not, the file is downloaded. The file is then decompressed and read into R, and the decompressed copy is deleted after reading to save disk space.
Because the analysis focuses on the impact of storm events on population health and the economy, we only load the variables of interest, which are BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP:
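data.file <- "data/repdata_data_StormData.csv.bz2"
if (!file.exists(data.file)) {
  # create the target folder if needed; mode = "wb" keeps the archive intact on Windows
  dir.create(dirname(data.file), showWarnings = FALSE)
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                data.file, mode = "wb")
}
# decompress to a temporary csv file, keeping the original archive
bunzip2(data.file, destname = "temp.csv", remove = FALSE, overwrite = TRUE)
# "NULL" drops a column, so only the 8 columns of interest are read
data <- fread("temp.csv", sep = ",", stringsAsFactors = FALSE,
              colClasses = c("NULL","character",rep("NULL",5),"character", rep("NULL",14),
                             rep("numeric",3),"character","numeric","character",rep("NULL",9)))
file.remove("temp.csv")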
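The colClasses vector selects columns by position, which is easy to get wrong. As a quick sanity check, the following sketch rebuilds the same vector and lists the positions that are kept. The names col.classes, n.columns and kept.positions are introduced here for illustration only.

```r
# Same type specification as in the fread() call of the report.
col.classes <- c("NULL", "character", rep("NULL", 5), "character", rep("NULL", 14),
                 rep("numeric", 3), "character", "numeric", "character", rep("NULL", 9))

# Total number of columns described by the specification.
n.columns <- length(col.classes)

# Positions of the columns that are actually read (everything that is not "NULL").
kept.positions <- which(col.classes != "NULL")

print(n.columns)
print(kept.positions)
```

The specification covers 37 columns and keeps 8 of them, which matches the 8 variables used in the rest of the analysis.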
Prior to analysis, the dataset has to be cleaned and subset, especially :
data$BGN_DATE <- mdy_hms(data$BGN_DATE)
The following subsections describe in more detail the method used to address points 2 and 3.
For more details regarding the classification of event names one can refer to the documentation or severe weather terminology.
data[grep("marine(.*)(tstm|thunderstorm)", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Marine Thunderstorm Wind"
data[grep("^( ?tstm|(severe )?thunderstorms?)((.*winds?s?)|.*\\d|.*\\)|)$", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Thunderstorm Wind"
data[grep("^high winds?(| \\d+|/)$", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "High Wind"
data[grep("fires?", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Wildfire"
data[grep("(extr|sev|rec|exces|unus|Unseas)(.*)cold(|/wind chill|/frost)$", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Extreme Cold/Wind Chill"
data[grep("landslide", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Debris Flow"
data[grep("coastal flood", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Coastal Flood"
data[grep("^flooding$|stream|^urban|river", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Flood"
data[grep("^( ?flash|flood|local).*floods?(ing)?/?$", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Flash Flood"
data[grep("^rip currents?$", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Rip Current"
data[grep("^storm surge?$", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Storm Surge/Tide"
data[grep("hurricane", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Hurricane (Typhoon)"
data[grep("^(heavy Surf|high surf)", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "High Surf"
data[grep("^Strong Winds?$", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Strong Wind"
data[grep("warm|heat", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Heat"
data[grep("^(deep |small )?hail.*(damage|\\d+|)$", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Hail"
data[grep("frost|freeze", EVTYPE,ignore.case = TRUE)]$EVTYPE <- "Frost/Freeze"
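The re-mapping above is a sequence of (pattern, label) rules applied in order, each one overwriting the labels it matches. The sketch below shows the same idea on a plain character vector in base R. The helper remap.events and the sample labels are hypothetical and only serve as an illustration; the three patterns are taken from the rules above.

```r
# Hypothetical helper: apply (pattern -> label) rules, in order, to a character vector.
remap.events <- function(events, rules) {
  for (pattern in names(rules)) {
    matched <- grepl(pattern, events, ignore.case = TRUE)
    events[matched] <- rules[[pattern]]
  }
  events
}

# Three of the rules used in the report (names are patterns, values are labels).
rules <- c("^rip currents?$" = "Rip Current",
           "hurricane"       = "Hurricane (Typhoon)",
           "frost|freeze"    = "Frost/Freeze")

# Made-up sample labels, for illustration only.
sample.events <- c("RIP CURRENTS", "Hurricane Opal", "FROST", "TSTM WIND/HAIL")

remapped <- remap.events(sample.events, rules)
print(remapped)
```

Labels that match no rule, such as the last one in the sample, are left unchanged. Because each rule rewrites the labels in place, a later pattern can also match a label produced by an earlier rule, so the order of the rules matters.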
The above re-mapping strategy covers more than 99% of the entries. The most frequent events that we do not re-map, because they do not clearly belong to a category, are: WINTER WEATHER/MIX (1104 entries), TSTM WIND/HAIL (1028 entries), SNOW (587 entries), FOG (538 entries).
Damage value and its attribute are gathered into a single variable using the function cleanDamage below (see comments for more details). The function is called when subsetting the dataset (see next step).
cleanDamage <- function(damage, attribute) {
  # Returns the damage in dollars, taking into account its attribute, which can be:
  #   h = hundred, k = thousand, m = million, b = billion
  #   or a digit n, considered as a factor 10^n
  # If the attribute is empty we assume a factor 1.
  # Any other symbol cannot be converted to a number and yields NA (with a warning).
  #
  # Args:
  #   - damage    : numeric, the damage values (here PROPDMG or CROPDMG)
  #   - attribute : character, the damage attribute (here PROPDMGEXP or CROPDMGEXP)
  #
  # Returns: the damage in dollars
  #
  # Example: for damage = 2 and attribute = "k", returns the numeric value 2000.
  attribute <- gsub("[hH]", "2", attribute)
  attribute <- gsub("[kK]", "3", attribute)
  attribute <- gsub("[mM]", "6", attribute)
  attribute <- gsub("[bB]", "9", attribute)
  attribute <- gsub("^$", "0", attribute)
  damage * 10^as.numeric(attribute)
}
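As a worked example of the conversion, the sketch below applies the same logic to a few made-up values. The function body is repeated so that the example runs on its own, and the input vectors are hypothetical.

```r
# Same logic as the report's function, repeated here so the example is self-contained.
cleanDamage <- function(damage, attribute) {
  attribute <- gsub("[hH]", "2", attribute)
  attribute <- gsub("[kK]", "3", attribute)
  attribute <- gsub("[mM]", "6", attribute)
  attribute <- gsub("[bB]", "9", attribute)
  attribute <- gsub("^$", "0", attribute)
  damage * 10^as.numeric(attribute)
}

# Made-up inputs: 2 thousand, 1.5 million, 3 with no attribute, 10 with exponent 2.
values    <- c(2, 1.5, 3, 10)
exponents <- c("k", "M", "", "2")
dollars   <- cleanDamage(values, exponents)
print(dollars)

# A symbol that is neither a letter code nor a digit cannot be converted.
unknown <- suppressWarnings(cleanDamage(5, "?"))
print(unknown)
```

The four values come out as 2000, 1500000, 3 and 1000 dollars. The unrecognised symbol gives NA, which the later sums ignore because they are computed with na.rm = TRUE.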
The last step of the data processing is to keep only the events that belong to the 48 valid event types specified in NWS Directive 10-1605. We also define more explicit variable names. The result is stored in the tidy dataset storm.data.
valid.events <- c("Astronomical Low Tide","Avalanche","Blizzard","Coastal Flood","Cold/Wind Chill",
"Debris Flow","Dense Fog","Dense Smoke","Drought","Dust Devil","Dust Storm",
"Excessive Heat","Extreme Cold/Wind Chill","Flash Flood","Flood","Frost/Freeze",
"Funnel Cloud","Freezing Fog","Hail","Heat","Heavy Rain","Heavy Snow","High Surf",
"High Wind","Hurricane (Typhoon)","Ice Storm","Lake-Effect Snow","Lakeshore Flood",
"Lightning","Marine Hail","Marine High Wind","Marine Strong Wind",
"Marine Thunderstorm Wind","Rip Current","Seiche","Sleet","Storm Surge/Tide",
"Strong Wind","Thunderstorm Wind","Tornado","Tropical Depression","Tropical Storm",
"Tsunami","Volcanic Ash","Waterspout","Wildfire","Winter Storm","Winter Weather")
storm.data <- data[tolower(EVTYPE) %in% tolower(valid.events),
list(event = EVTYPE,
date = BGN_DATE,
injuries = INJURIES,
deaths = FATALITIES,
damage_property = cleanDamage(PROPDMG,PROPDMGEXP),
damage_crop = cleanDamage(CROPDMG,CROPDMGEXP))]
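The filter compares lower-cased labels, so an entry is kept whatever its capitalisation, but the label itself is stored as it was. The following sketch illustrates this on a short excerpt of the valid types and a few made-up labels (valid.excerpt and raw.labels are names introduced for the example).

```r
# Short excerpt of the valid event types, for illustration only.
valid.excerpt <- c("Tornado", "Hail", "Flash Flood")

# Made-up raw labels with mixed capitalisation.
raw.labels <- c("TORNADO", "Hail", "FOG", "flash flood")

# Case-insensitive membership test, as in the subsetting step.
keep <- tolower(raw.labels) %in% tolower(valid.excerpt)

print(raw.labels[keep])  # labels keep their original capitalisation
print(mean(keep))        # fraction of entries retained
```

This is why both upper-case labels (for example TORNADO) and title-case labels (for example Heat) can appear side by side in the result tables.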
We first investigate which types of events are most harmful with respect to population health. To this end, we rank the events according to the total number of deaths (the total number of injuries is also computed for information):
health.data <- storm.data[, list(total.injuries = sum(injuries, na.rm = TRUE),
total.deaths = sum(deaths, na.rm = TRUE)),
by = event]
setorder(health.data,-total.deaths)
We then plot the 5 most harmful events:
# plot the 5 most harmful events by total deaths (total injuries are plotted as well)
top5.health.tot <- melt(health.data[1:5,], "event", c("total.injuries","total.deaths"),
variable.name = "type", value.name = "count")
setorder(top5.health.tot,-type,-count)
# each event appears twice after melt(), so the factor levels must be de-duplicated
g <- ggplot(top5.health.tot,aes(x = factor(event,levels=unique(event)),y = count))
g <- g + geom_bar(aes(fill = type), position = "dodge", stat="identity")
g <- g + scale_y_log10()
g <- g + labs(x = "Type of Event",y = "Count")
g <- g + ggtitle("Top 5 most harmful events (according to total number of deaths)")
g <- g + theme(legend.title=element_blank(),
plot.title = element_text(lineheight=.8, face="bold", vjust = 2),
axis.text.x = element_text(angle = 45, hjust = 1))
g <- g + scale_fill_discrete(breaks=c("total.injuries", "total.deaths"),
labels=c("Total injuries", "Total deaths"))
g
Figure: Total number of deaths and injuries for the 5 most harmful events (across the US between 1950 and 2011).
The graph above shows the 5 most harmful events with respect to the number of deaths. The total number of injuries is also plotted for comparison. The most harmful storm event is the tornado, which also causes by far the largest number of injuries, followed by heat. The numbers are shown below:
health.data[1:5,]
##                event total.injuries total.deaths
## 1:           TORNADO          91346         5633
## 2:              Heat           9243         3178
## 3:       Flash Flood           1800         1035
## 4:         LIGHTNING           5230          816
## 5: Thunderstorm Wind           9385          705
We then investigate which types of events are most harmful with respect to the economy. To this end, we rank the events according to the cumulative damage (property + crop).
eco.data <- storm.data[, list(total.damage_property = sum(damage_property, na.rm = TRUE),
total.damage_crop = sum(damage_crop, na.rm = TRUE)),
by = event]
eco.data[,total.damage := total.damage_property + total.damage_crop]
We then plot the cumulative damage for the 5 most costly events:
# plot top 5 with the greatest economic consequences
# note: reorder() only sets the factor levels (bar order on the x axis);
# it does not sort the rows of eco.data
eco.data$event <- as.factor(eco.data$event)
eco.data$event <- reorder(eco.data$event, -eco.data$total.damage)
top5.eco.tot <- melt(eco.data[1:5,], "event", c("total.damage_property","total.damage_crop"),
variable.name = "type", value.name = "cost")
g <- ggplot(top5.eco.tot,aes(x = event ,y = cost/1e9))
g <- g + geom_bar(aes(fill = type), stat="identity")
g <- g + labs(x = "Type of Event",y = "Total damage in Billions of Dollars")
g <- g + ggtitle("Top 5 events having the greatest economic consequences")
g <- g + theme(legend.title=element_blank(),
plot.title = element_text(lineheight=.8, face="bold", vjust = 2),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position="top")
g <- g + scale_fill_discrete(breaks=c("total.damage_property", "total.damage_crop"),
labels=c("Property damages", "Crop damages"))
g
Figure: Total economic damage for the 5 most harmful events with respect to the economy (across the US between 1950 and 2011).
From the graph above, we see that the event involving the largest cumulative damage is the hurricane (about 90 billion dollars), followed by tornadoes (about 57 billion) and hail (about 19 billion). The numbers are shown below:
eco.data[1:5,]
##                  event total.damage_property total.damage_crop
## 1:             TORNADO           56947380617         414953270
## 2:   Thunderstorm Wind           11132265776        1206795738
## 3:                Hail           15977540013        3046887623
## 4:        WINTER STORM            6688497251          26944000
## 5: Hurricane (Typhoon)           84756180010        5515292800
##    total.damage
## 1:  57362333887
## 2:  12339061514
## 3:  19024427636
## 4:   6715441251
## 5:  90271472810
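The rows of the table above are not in decreasing order of total damage. This is because reorder() changes the order of the factor levels, which controls the order of the bars in the plot, but leaves the rows where they are. The sketch below illustrates the difference in base R, using only the five totals printed above; it is an illustration under that assumption and not a re-ranking of the full dataset. With data.table, setorder(eco.data, -total.damage) would play the role of the explicit sort.

```r
# The five totals printed in the table above, in the order they are printed.
totals <- data.frame(
  event = c("TORNADO", "Thunderstorm Wind", "Hail", "WINTER STORM", "Hurricane (Typhoon)"),
  total.damage = c(57362333887, 12339061514, 19024427636, 6715441251, 90271472810),
  stringsAsFactors = FALSE
)

# reorder() sorts the factor levels by decreasing damage; the values keep their positions.
event.factor <- reorder(factor(totals$event), -totals$total.damage)
print(levels(event.factor))        # levels: largest damage first
print(as.character(event.factor))  # values: same order as the input rows

# Sorting the rows explicitly before selecting the first ones.
sorted <- totals[order(-totals$total.damage), ]
print(sorted$event[1:3])
```

Selecting rows with [1:5,] therefore returns the costliest events only if the table has been sorted beforehand.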