Each year, weather events in the US cause hundreds of deaths, thousands of injuries and billions of dollars in property and crop damage. This study summarises which events cause the most fatalities and injuries, and which cause the most crop and property damage.
This study shows that tornadoes have the highest human cost and the third highest financial damage. Floods are the most damaging weather event from a financial perspective and have the third highest human cost. These results can help policy makers focus policy and regulation on reducing the impact of such events.
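This report answers two questions: which types of weather events are most harmful to population health, and which types cause the greatest crop and property damage?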
The data for this report come in the form of a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size: Storm Data [47 MB]
Documentation of the database is also available; it describes how some of the variables are constructed and defined.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
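The statement about sparser early records could be checked by counting events per year. The sketch below is a hedged illustration on made-up dates: it assumes the raw file has a begin-date column (referred to here as BGN_DATE, an assumption, since that column is not among those retained in this report) holding month/day/year strings.

```r
# Toy stand-in for the assumed BGN_DATE column (hypothetical values)
toy_dates <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "11/28/2011 0:00:00", "11/30/2011 0:00:00")

# Parse the date part; trailing characters after the format are ignored
dates <- as.Date(toy_dates, format = "%m/%d/%Y")

# Extract the year and count events per year
years <- as.integer(format(dates, "%Y"))
per_year <- table(years)
print(per_year)
```

Applied to the full data set, a table like this would show how the number of recorded events changes over time.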
Due to memory restrictions, only those variables required for further analysis have been retained, i.e. EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
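# Download the compressed data file only if it is not already present
if (!file.exists("repdata-data-StormData.csv.bz2")) {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(url, destfile = "repdata-data-StormData.csv.bz2", mode = "wb")
}
# read.csv decompresses the bzip2 file through the bzfile connection
stormdata <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), stringsAsFactors = FALSE)
# Keep only the variables needed for the analysis (columns 8 and 23 to 28)
stormdata <- stormdata[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
str(stormdata)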
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
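The column selection above happens after the whole file has been read. If memory is the constraint, one possible alternative is to skip unwanted columns while reading, using the colClasses argument of read.csv. This is a minimal sketch on a toy CSV string (the toy columns and values are made up for illustration), not the approach used for the results in this report.

```r
# Toy CSV standing in for the storm file (hypothetical content)
csv_text <- "STATE,EVTYPE,MAG,FATALITIES\nAL,TORNADO,0,1\nTX,HAIL,175,0"

# Read only the header row to learn the column names
header <- names(read.csv(text = csv_text, nrows = 1))

# NA lets read.csv guess the type; "NULL" tells it to skip the column
keep <- c("EVTYPE", "FATALITIES")
classes <- ifelse(header %in% keep, NA, "NULL")

toy <- read.csv(text = csv_text, colClasses = classes, stringsAsFactors = FALSE)
str(toy)
```

With the real file, the same colClasses vector could be passed together with the bzfile connection, so that the dropped columns are never held in memory.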
The EVTYPE variable shows a high amount of variation, with 985 unique values over 902,297 observations. To reduce this variety, the values have been grouped into broad categories.
# Lowercase all labels so the patterns below match regardless of the original capitalisation.
# The rules are applied in order, and later patterns can overwrite categories assigned earlier.
stormdata$EVTYPE <- tolower(stormdata$EVTYPE)
stormdata$EVTYPE[grep("blizzard", stormdata$EVTYPE)] <- "Blizzard"
stormdata$EVTYPE[grep("tropical", stormdata$EVTYPE)] <- "Hurricane"
stormdata$EVTYPE[grep("hurricane", stormdata$EVTYPE)] <- "Hurricane"
stormdata$EVTYPE[grep("typhoon", stormdata$EVTYPE)] <- "Hurricane"
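# NOTE: this pattern matches any "n" followed by "a", "d" or "o", so besides tornado and gustnado
# it also captures labels such as "wind" and "snow"
stormdata$EVTYPE[grep("[tor]*[gust]*n[ado]", stormdata$EVTYPE)] <- "Tornado"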
stormdata$EVTYPE[grep("fr", stormdata$EVTYPE)] <- "Cold"
stormdata$EVTYPE[grep("hail", stormdata$EVTYPE)] <- "Hail"
stormdata$EVTYPE[grep("snow", stormdata$EVTYPE)] <- "Snow"
stormdata$EVTYPE[grep("shower", stormdata$EVTYPE)] <- "Rain"
stormdata$EVTYPE[grep("precip", stormdata$EVTYPE)] <- "Rain"
stormdata$EVTYPE[grep("wetness", stormdata$EVTYPE)] <- "Rain"
stormdata$EVTYPE[grep("rain", stormdata$EVTYPE)] <- "Rain"
stormdata$EVTYPE[grep("drou", stormdata$EVTYPE)] <- "Drought"
stormdata$EVTYPE[grep("fire", stormdata$EVTYPE)] <- "Bushfire"
stormdata$EVTYPE[grep("heat", stormdata$EVTYPE)] <- "Heat"
stormdata$EVTYPE[grep("tsunami", stormdata$EVTYPE)] <- "Coastal flood"
stormdata$EVTYPE[grep("coastal", stormdata$EVTYPE)] <- "Coastal flood"
stormdata$EVTYPE[grep("tide", stormdata$EVTYPE)] <- "Coastal flood"
stormdata$EVTYPE[grep("lig[h]*[n]*t", stormdata$EVTYPE)] <- "Lightning"
stormdata$EVTYPE[grep("t[hunder]*st[or]*m", stormdata$EVTYPE)] <- "Thunderstorm"
stormdata$EVTYPE[grep("high water", stormdata$EVTYPE)] <- "Flood"
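# NOTE: this pattern also matches the "Coastal flood" category assigned above, which therefore becomes "Flood"
stormdata$EVTYPE[grep("[fF]l[o]*d", stormdata$EVTYPE)] <- "Flood"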
stormdata$EVTYPE[grep("rising water", stormdata$EVTYPE)] <- "Flood"
stormdata$EVTYPE[grep("drowning", stormdata$EVTYPE)] <- "Flood"
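# NOTE: "storm" also matches the "Thunderstorm" category assigned above, which therefore becomes "Storm"
stormdata$EVTYPE[grep("storm", stormdata$EVTYPE)] <- "Storm"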
stormdata$EVTYPE[grep("wind", stormdata$EVTYPE)] <- "Storm"
stormdata$EVTYPE[grep("microburst", stormdata$EVTYPE)] <- "Storm"
stormdata$EVTYPE[grep("slide", stormdata$EVTYPE)] <- "Erosion"
stormdata$EVTYPE[grep("erosion", stormdata$EVTYPE)] <- "Erosion"
stormdata$EVTYPE[grep("cold", stormdata$EVTYPE)] <- "Cold"
stormdata$EVTYPE[grep("wint[e]*r", stormdata$EVTYPE)] <- "Cold"
stormdata$EVTYPE[grep("hyperthermia", stormdata$EVTYPE)] <- "Cold"
stormdata$EVTYPE[grep("low temp", stormdata$EVTYPE)] <- "Cold"
stormdata$EVTYPE[grep(" *waterspout", stormdata$EVTYPE)] <- "Waterspout"
stormdata$EVTYPE[grep("surf", stormdata$EVTYPE)] <- "Waves"
stormdata$EVTYPE[grep("swell", stormdata$EVTYPE)] <- "Waves"
stormdata$EVTYPE[grep("seas", stormdata$EVTYPE)] <- "Waves"
stormdata$EVTYPE[grep("wave", stormdata$EVTYPE)] <- "Waves"
stormdata$EVTYPE[grep("fog", stormdata$EVTYPE)] <- "Fog"
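# NOTE: "ic" alone satisfies this pattern, so it also matches the "Hurricane" category assigned above
stormdata$EVTYPE[grep("ic[e]*[y]*", stormdata$EVTYPE)] <- "Ice"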
stormdata$EVTYPE[grep("glaze", stormdata$EVTYPE)] <- "Ice"
stormdata$EVTYPE[grep("sleet", stormdata$EVTYPE)] <- "Ice"
stormdata$EVTYPE[grep("rip current", stormdata$EVTYPE)] <- "Rip Current"
stormdata$EVTYPE[grep("dust", stormdata$EVTYPE)] <- "Dust"
stormdata$EVTYPE[grep("avalan", stormdata$EVTYPE)] <- "Avalanche"
stormdata$EVTYPE[grep("other", stormdata$EVTYPE)] <- "Other"
stormdata$EVTYPE[grep("apache", stormdata$EVTYPE)] <- "Other"
stormdata$EVTYPE[grep("\\?", stormdata$EVTYPE)] <- "Other"
stormdata$EVTYPE[grep("marine", stormdata$EVTYPE)] <- "Other"
After this transformation the EVTYPE variable has been reduced to 153 unique values.
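Because the grouping rules are applied one after another to the same column, the order of the rules matters: a later pattern can match a category name written by an earlier rule. The sketch below illustrates this on a few made-up labels and contrasts it with a guarded variant that assigns each label at most once. The guarded variant is an assumption added for illustration, not the method used for the results in this report.

```r
# Toy labels (hypothetical), lowercased as in the report
toy <- tolower(c("Flash Flood", "COASTAL FLOODING", "HURRICANE OPAL", "Hail"))

# Ordered rules: names are regular expressions, values are category labels
rules <- c("hurricane" = "Hurricane", "hail" = "Hail",
           "coastal" = "Coastal flood", "flo*d" = "Flood")

# Sequential overwrite: later rules see the results of earlier rules
sequential <- toy
for (p in names(rules)) {
  sequential[grepl(p, sequential)] <- rules[[p]]
}

# Guarded variant: match against the original text, assign each label once
guarded <- toy
done <- rep(FALSE, length(toy))
for (p in names(rules)) {
  hit <- grepl(p, toy) & !done
  guarded[hit] <- rules[[p]]
  done <- done | hit
}

print(sequential)
print(guarded)
```

In the sequential version the coastal entry ends up as "Flood", because the flood pattern matches the text "Coastal flood"; in the guarded version it keeps its first assignment.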
Four variables are available to calculate the storm damage for crops and property. CROPDMG and PROPDMG provide the amount of damage, with CROPDMGEXP and PROPDMGEXP providing the magnitude suffix, e.g. ‘k’ for thousands, ‘m’ for millions and ‘b’ for billions. The suffixes are recoded to numeric multipliers so that the total damage per event can be determined.
library(car) # provides the recode function
# Map each exponent code to a numeric multiplier; blank and symbol codes count as zero
recodes <- "''=0; '-'=0; '?'=0; '+'=0; '0'=0; '1'=10; '2'=100; 'h'=100; 'H'=100; '3'=10^3; 'k'=10^3; 'K'=10^3; '4'=10^4; '5'=10^5; '6'=10^6; 'm'=10^6;'M'=10^6; '7'=10^7; '8'=10^8; 'B'=10^9"
# as.factor = FALSE returns a plain vector (car versions before 3.0 name this argument as.factor.result)
stormdata$pexp <- recode(stormdata$PROPDMGEXP, recodes, as.factor = FALSE)
stormdata$cexp <- recode(stormdata$CROPDMGEXP, recodes, as.factor = FALSE)
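As a worked example of the arithmetic, a damage amount of 25 with suffix "K" stands for 25 × 1,000 = 25,000 dollars. The sketch below reproduces this on made-up rows using a named lookup vector in base R instead of car::recode; it covers only the letter suffixes and is an illustrative alternative, not the code used for the results in this report.

```r
# Multipliers for the letter suffixes (lowercase keys); toy sketch only
multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Made-up damage records
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 5),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Look up each suffix; codes without an entry (here the empty string) give NA
m <- unname(multiplier[tolower(toy$PROPDMGEXP)])

# Mirror the recode table, where an empty suffix maps to 0
m[is.na(m)] <- 0

toy$damage <- toy$PROPDMG * m
print(toy)
```

Note that mapping the empty suffix to 0 means an amount recorded without a suffix contributes nothing to the total, which is the behaviour of the recode table in this report.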
The human cost of extreme weather events is measured as the number of fatalities and injuries per event type. Only event types with at least one fatality are reported.
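# Total fatalities and injuries per event category
fatalities <- tapply(stormdata$FATALITIES, stormdata$EVTYPE, sum)
injuries <- tapply(stormdata$INJURIES, stormdata$EVTYPE, sum)
humancost <- data.frame(types = rownames(fatalities), fatalities, injuries)
# Keep only event categories with at least one fatality
humancost <- humancost[humancost$fatalities > 0, ]
library(reshape, quietly = TRUE) # provides melt
# Reshape to long format: one row per event category and consequence
humancost_melt <- melt(humancost, id.vars = "types")
names(humancost_melt) <- c("event", "consequence", "number")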
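The aggregation and reshaping steps can be followed on a small example. The sketch below uses made-up records and base R only (the long format is built by hand rather than with melt), so it is an illustration of the idea rather than the exact code path above.

```r
# Made-up event records
toy <- data.frame(EVTYPE = c("Flood", "Tornado", "Flood", "Hail"),
                  FATALITIES = c(1, 3, 2, 0),
                  INJURIES = c(0, 10, 4, 1),
                  stringsAsFactors = FALSE)

# Sum per event category; categories come out in alphabetical order
fat <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
inj <- tapply(toy$INJURIES, toy$EVTYPE, sum)

cost <- data.frame(types = names(fat),
                   fatalities = as.vector(fat),
                   injuries = as.vector(inj),
                   stringsAsFactors = FALSE)

# Drop categories without fatalities (Hail in this toy data)
cost <- cost[cost$fatalities > 0, ]

# Long format: one row per category and consequence
long <- data.frame(event = rep(cost$types, 2),
                   consequence = rep(c("fatalities", "injuries"), each = nrow(cost)),
                   number = c(cost$fatalities, cost$injuries),
                   stringsAsFactors = FALSE)
print(long)
```

The long format has one row per bar in a faceted plot, which is why the data are reshaped before plotting.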
The human cost is visualised in figure 1. The numbers of fatalities and injuries are plotted on a logarithmic scale.
library(ggplot2, quietly = TRUE)
ggplot(humancost_melt, aes(x = reorder(event, number), weight = number)) + geom_bar() +
  facet_wrap(~consequence) + coord_flip() + scale_y_log10() +
  labs(x = "", y = "Number of people (log)", title = "Fig. 1: Human cost per event type.")
The total financial damage is reported for those events with a combined damage of at least 1 billion dollars.
# Damage in dollars: reported amount times the recoded multiplier
stormdata$propdamage <- stormdata$PROPDMG * stormdata$pexp
stormdata$cropdamage <- stormdata$CROPDMG * stormdata$cexp
# Total property and crop damage per event category
propdamage <- tapply(stormdata$propdamage, stormdata$EVTYPE, sum)
cropdamage <- tapply(stormdata$cropdamage, stormdata$EVTYPE, sum)
damage <- data.frame(type = rownames(cropdamage), property = propdamage, crops = cropdamage)
damage$total <- damage$property + damage$crops
damage <- damage[order(damage$total, decreasing = TRUE), ]
# Keep categories with at least 1 billion dollars of combined damage, in long format
damage_melt <- melt(subset(damage[c("type", "property", "crops")], damage$total >= 10^9), id.vars = "type")
damage_melt <- damage_melt[damage_melt$value > 0, ] # Remove zero damage entries
names(damage_melt) <- c("event", "damage", "amount")
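The thresholding step can be illustrated on a few made-up records. The sketch below uses aggregate from base R instead of two tapply calls, which is a swapped-in technique chosen for brevity; the dollar values are hypothetical.

```r
# Made-up damage records in dollars
toy <- data.frame(EVTYPE = c("Flood", "Flood", "Hail", "Tornado"),
                  propdamage = c(8e8, 5e8, 1e8, 2e9),
                  cropdamage = c(1e8, 0, 3e8, 0),
                  stringsAsFactors = FALSE)

# Sum both damage columns per event category
totals <- aggregate(cbind(propdamage, cropdamage) ~ EVTYPE, data = toy, FUN = sum)
totals$total <- totals$propdamage + totals$cropdamage

# Keep categories with at least 1 billion dollars combined, largest first
big <- totals[totals$total >= 1e9, ]
big <- big[order(big$total, decreasing = TRUE), ]
print(big)
```

Here the hail category totals 0.4 billion dollars and is dropped, while the flood category passes the threshold only because its property and crop damage are combined.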
The amount of damage per event type and for each of the damage types is visualised in figure 2.
ggplot(damage_melt, aes(x = reorder(event, amount), fill = damage, weight = amount / 10^9)) +
  geom_bar() + coord_flip() +
  labs(x = "", y = "Amount [billion $]", title = "Fig. 2: Total weather damage per event type.")
Tornadoes have the highest human cost and the third highest financial damage. Floods are the most damaging weather event from a financial perspective and have the third highest human cost.