Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The data set used comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011.
This report explores which weather events have the greatest impact on the human population, in terms of both physical harm and economic damage. We look first at the types of weather events that cause the most physical harm, and then at those that cause the most economic damage. Economic damage is the monetary damage to property and crops.
The data is loaded from the original compressed, comma-separated plain-text file, available for download at this address: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
The database is a collection of data sets sourced from many different preparers. Although there are guidelines for classification and submission, the interpretation and reporting of weather events differ between preparers. This report reclassifies the events into a smaller, coarser set representing key weather patterns: wind, heat, cold, snow, fire, etc.
library(data.table)
library(dplyr)
library(tidyr)
library(ggplot2)
library(scales)
system.time(data <- read.table("repdata-data-StormData.csv.bz2", header = TRUE, sep = ","))
##    user  system elapsed
## 223.784   1.655 229.243
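The loading step above assumes the compressed file is already in the working directory. A minimal sketch of fetching it on demand follows; the helper name `fetch_if_missing` and the variables `storm_url` and `storm_file` are assumptions introduced for illustration and are not part of the original analysis.

```r
# Hypothetical helper: download the compressed file only when no local copy exists.
storm_url  <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
storm_file <- "repdata-data-StormData.csv.bz2"

fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary .bz2 file intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (left commented out because it needs network access):
# fetch_if_missing(storm_url, storm_file)
# data <- read.table(storm_file, header = TRUE, sep = ",")
```

Base R readers such as `read.table` decompress `.bz2` files transparently, so no separate unpacking step should be needed.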
data <- data %>%
  mutate(event_type = EVTYPE) %>%
  mutate(event_type = sub(".*TORNADO.*", "TORNADO", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*HURRICANE.*", "HURRICANE", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*WIND.*", "WIND", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*SNOW.*", "SNOW", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*STORM.*", "STORM", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*FIRE.*", "FIRE", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*FLOOD.*", "FLOOD", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*COLD.*", "COLD", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*HEAT.*", "HEAT", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*SURF.*|.*SEAS.*", "SEAS", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*ICE.*", "COLD", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*ICY ROADS.*", "COLD", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*LANDSLIDE.*", "LANDSLIDE", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*FOG.*", "FOG", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*MARINE.*", "SEAS", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*CURRENT.*", "SEAS", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*FREEZING.*", "WINTER WEATHER", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*WINTER.*", "WINTER WEATHER", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*WINTRY.*", "WINTER WEATHER", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*VOLCANIC.*", "VOLCANO", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*WATER.*SPOUT.*", "WATERSPOUT", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*HAIL.*", "HAIL", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*PRECIPITATION.*", "RAIN", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*RAIN.*", "RAIN", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*MUD.*SLIDE.*", "MUDSLIDE", event_type, ignore.case = TRUE)) %>%
  mutate(event_type = sub(".*LIGHTNING.*", "LIGHTNING", event_type, ignore.case = TRUE))
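Because the substitutions above are applied in sequence, the order of the rules decides how an event that mentions several patterns is classified. The sketch below illustrates this on a handful of made-up labels using only a subset of the rules. The names `rules`, `classify_events`, and `sample_events` are hypothetical and introduced only for this illustration.

```r
# Ordered pattern -> category rules (a subset of the rules used in the report).
rules <- c(
  "TORNADO"   = "TORNADO",
  "HURRICANE" = "HURRICANE",
  "WIND"      = "WIND",
  "SNOW"      = "SNOW",
  "STORM"     = "STORM",
  "FLOOD"     = "FLOOD"
)

# Apply each rule in turn, feeding the result of one substitution into the next.
classify_events <- function(x, rules) {
  out <- as.character(x)
  for (i in seq_along(rules)) {
    pattern <- paste0(".*", names(rules)[i], ".*")
    out <- sub(pattern, rules[[i]], out, ignore.case = TRUE)
  }
  out
}

# Made-up labels for illustration only.
sample_events <- c("THUNDERSTORM WINDS", "Winter Storm", "FLASH FLOOD",
                   "tornado f1", "DROUGHT")
classify_events(sample_events, rules)
# "WIND" "STORM" "FLOOD" "TORNADO" "DROUGHT"
```

"THUNDERSTORM WINDS" ends up as WIND rather than STORM because the WIND rule runs first, and labels that match no rule, such as "DROUGHT", pass through unchanged.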
The property damage exponent field (PROPDMGEXP) is populated inconsistently. In some cases it is a numerical value giving the exponent directly; in others it is k, m, or b (in lower or upper case) for thousands, millions, and billions. In some cases the field value cannot be interpreted; these are assigned an exponent of zero, since the number of rows affected is small. The values and their counts are enumerated below.
table(data$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
For the crop damage exponent the values are more clustered around the SI abbreviations.
table(data$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
We apply the same technique to both fields: replace the abbreviations with their corresponding power of ten, then change any remaining non-numerical values and blanks to 0. We then multiply the recorded damage value by 10^exponent.
data <- data %>%
  # k is 10^3
  mutate(PROPDMGEXP = sub("k", "3", PROPDMGEXP, ignore.case = TRUE)) %>%
  # m is 10^6
  mutate(PROPDMGEXP = sub("m", "6", PROPDMGEXP, ignore.case = TRUE)) %>%
  # b is 10^9
  mutate(PROPDMGEXP = sub("b", "9", PROPDMGEXP, ignore.case = TRUE)) %>%
  # replace remaining non-numerical values with zero
  mutate(PROPDMGEXP = sub("[^0-9]", "0", PROPDMGEXP, ignore.case = TRUE)) %>%
  # replace blank with zero
  mutate(PROPDMGEXP = sub("^$", "0", PROPDMGEXP, ignore.case = TRUE)) %>%
  # convert factor to numerical
  mutate(PROPDMGEXP = as.numeric(as.character(PROPDMGEXP))) %>%
  # same steps for the crop damage exponent
  mutate(CROPDMGEXP = sub("k", "3", CROPDMGEXP, ignore.case = TRUE)) %>%
  mutate(CROPDMGEXP = sub("m", "6", CROPDMGEXP, ignore.case = TRUE)) %>%
  mutate(CROPDMGEXP = sub("b", "9", CROPDMGEXP, ignore.case = TRUE)) %>%
  mutate(CROPDMGEXP = sub("[^0-9]", "0", CROPDMGEXP, ignore.case = TRUE)) %>%
  mutate(CROPDMGEXP = sub("^$", "0", CROPDMGEXP, ignore.case = TRUE)) %>%
  mutate(CROPDMGEXP = as.numeric(as.character(CROPDMGEXP))) %>%
  # calculate total economic cost
  mutate(economic_cost = PROPDMG * 10 ^ PROPDMGEXP + CROPDMG * 10 ^ CROPDMGEXP)
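As a worked example of the exponent conversion, the sketch below applies the same idea to four made-up rows. The helper `exp_to_power` and the data frame `toy` are hypothetical. The helper is a compact base R variant and not the report's own code: it treats anything other than a single digit, K, M, or B as exponent 0, which matches the behaviour above for the single-character codes seen in the data.

```r
# Hypothetical helper: map an exponent code to a numeric power of ten.
exp_to_power <- function(x) {
  x <- toupper(as.character(x))
  x[x == "K"] <- "3"               # thousands
  x[x == "M"] <- "6"               # millions
  x[x == "B"] <- "9"               # billions
  x[!grepl("^[0-9]$", x)] <- "0"   # blanks and codes such as "?", "+", "-", "H"
  as.numeric(x)
}

# Made-up rows for illustration only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 5),
  PROPDMGEXP = c("K", "m", "", "?"),
  stringsAsFactors = FALSE
)
toy$cost <- toy$PROPDMG * 10 ^ exp_to_power(toy$PROPDMGEXP)
toy$cost
# 25000 2500000 10 5
```

Note that a blank or uninterpretable code does not drop the row: the exponent becomes 0, so the recorded value is multiplied by 1 and counted as-is.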
# calculate summary table for physical harm: group by event type, sum the
# injuries and fatalities, filter out events with low harm,
# then gather so that we can produce a faceted plot
harm <- data %>%
  group_by(event_type) %>%
  summarize(Injuries = sum(INJURIES), Fatalities = sum(FATALITIES)) %>%
  filter(Fatalities > mean(Fatalities) | Injuries > mean(Injuries)) %>%
  gather(injury_type, value = count, -event_type)
# plot bar chart, faceted by injury_type
ggplot(data = harm, aes(x = reorder(event_type, -count), y = count, fill = injury_type)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  facet_grid(injury_type ~ ., scales = "free") +
  xlab("Event Type") +
  scale_y_continuous(name = "Count", labels = comma) +
  ggtitle("Number of Fatalities and Injuries by Event Type") +
  labs(fill = "Injury Type")
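To make the filtering and reshaping step concrete, the sketch below runs the same pipeline on a few made-up rows. The data frame `toy` and the result `toy_harm` are hypothetical. It also swaps `gather()` for `pivot_longer()`, its current replacement in tidyr, which is a substitution made for this illustration and not the report's own call.

```r
library(dplyr)
library(tidyr)

# Made-up rows for illustration only.
toy <- data.frame(
  event_type = c("TORNADO", "TORNADO", "HEAT", "FOG", "HAIL"),
  INJURIES   = c(100, 50, 20, 1, 2),
  FATALITIES = c(10, 5, 30, 0, 0)
)

toy_harm <- toy %>%
  group_by(event_type) %>%
  summarise(Injuries = sum(INJURIES), Fatalities = sum(FATALITIES)) %>%
  # keep event types that are above the mean in at least one measure
  filter(Fatalities > mean(Fatalities) | Injuries > mean(Injuries)) %>%
  # long format: one row per (event type, injury type) pair, ready for faceting
  pivot_longer(-event_type, names_to = "injury_type", values_to = "count")

toy_harm
```

Here the mean is 43.25 injuries and 11.25 fatalities per event type, so HEAT is kept for its fatalities and TORNADO for its injuries, while FOG and HAIL are dropped.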
Tornadoes, other strong wind events, heat, and floods caused the most harm to the human population during this time frame. Tornadoes caused by far the most injuries and fatalities.
# calculate summary table for economic cost: group by event type, sum the cost
# we only wish to explore the high impact events so filter to reduce result set
cost <- data %>%
  group_by(event_type) %>%
  summarise(cost = sum(economic_cost)) %>%
  filter(cost > mean(cost))
# plot a bar chart
ggplot(data = cost, aes(x = reorder(event_type, -cost), y = cost / 10^6)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  xlab("Event Type") +
  scale_y_continuous(name = "Cost ($m)", labels = comma) +
  ggtitle("Economic Cost of Weather Events")
Floods, storms, hurricanes, and tornadoes cause the most economic damage.