US storm data from 1966 to 2011 were analysed to identify the types of event that have had the greatest impact across the US. Tornadoes have had the greatest impact on the population, killing about 5,600 people and injuring about 91,000. Winter storms have had the greatest economic impact, causing about 133 trillion dollars of property damage. The relative impact of different types of events can be seen against a logarithmic scale at the bottom of this report.
storm.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
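# mode = "wb" keeps the compressed file intact on Windows
download.file(storm.url, "StormData.csv.bz2", mode = "wb")
# read.csv() treats only double quotes as quoting characters and disables
# comment characters, which suits the free-text fields in this file
storm.data <- read.csv("StormData.csv.bz2")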
A function is needed to interpret the PROPDMGEXP and CROPDMGEXP fields, which scale PROPDMG and CROPDMG by varying powers of 10. damage(x, y) calculates the actual damage \(x \times 10^y\), where x is the coefficient (for example PROPDMG) and y is the exponent (for example PROPDMGEXP).
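damage <- function(x, y) {
  # Test each exponent code individually: is.numeric(y) would return a single
  # TRUE/FALSE for the whole column and collapse the result to one value.
  code <- toupper(as.character(y))
  is.digit <- grepl("^[0-9]+$", code)
  exp <- ifelse(is.digit, suppressWarnings(as.numeric(code)),
         ifelse(code %in% "H", 2,
         ifelse(code %in% "K", 3,
         ifelse(code %in% "M", 6,
         ifelse(code %in% "B", 9,
                0)))))  # blank or unrecognised codes leave x unscaled
  x * (10 ^ exp)
}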
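The mapping from exponent codes to powers of 10 can also be sketched as a lookup table. The names exponent.lookup and damage.lookup are hypothetical and used only for illustration, and the handling of digits and unrecognised codes is an assumption that mirrors the description above.

```r
# Hypothetical lookup table: exponent code -> power of 10.
exponent.lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Sketch of a vectorized alternative to nested ifelse() calls.
damage.lookup <- function(x, y) {
  code <- toupper(as.character(y))
  # Indexing a named vector returns NA for codes that are not in the table.
  exp <- unname(exponent.lookup[code])
  # Assumption: a single digit is a literal exponent.
  is.digit <- grepl("^[0-9]$", code)
  exp[is.digit] <- as.numeric(code[is.digit])
  # Assumption: anything else (blank, "?", "+", "-") means 10^0.
  exp[is.na(exp)] <- 0
  x * 10^exp
}

example.damage <- damage.lookup(c(25, 2.5, 1, 10, 3), c("K", "m", "B", "", "2"))
print(example.damage)
```

Each element is scaled by its own exponent, so 25 with code K becomes 25,000 while 10 with a blank code stays at 10.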
To quantify the effect on population health (people) we will add the number of fatalities and injuries. To quantify the cost (cost) we will add the property and crop damages.
suppressMessages(library(dplyr))
storm.summary <- storm.data %>%
  group_by(EVTYPE) %>%
  mutate(property = damage(PROPDMG, PROPDMGEXP),
         crop = damage(CROPDMG, CROPDMGEXP)) %>%
  summarize(people = sum(FATALITIES + INJURIES),
            fatalities = sum(FATALITIES),
            injuries = sum(INJURIES),
            property = sum(property),
            crop = sum(crop),
            cost = sum(property + crop))
There are 985 distinct values of event type in the data. According to section 2.1.1 of the Storm Data Documentation there are only 48 permitted events. Rationalize the values using a regular expression for each permitted event. This does not need to be perfect, as long as it gives a unique set of values for the top 10 results. Patterns are tried in order and the first match wins; any item that matches none of the named events falls through to the catch-all Other.
First, define the regex for each event.
matrix.events <- matrix(c("Avalanche", "avalanche",
"Blizzard", "blizzard",
"Coastal Flood", "coastal",
"Cold/Wind Chill", "cold",
"Debris Flow", "debris|flow",
"Dense Fog", "fog",
"Dense Smoke", "smoke",
"Drought", "drought",
"Dust Devil", "devil",
"Dust Storm", "dust(.*)storm",
"Excessive Heat", "excessive(.*)heat",
"Extreme Cold/Wind Chill", "extreme(.*)cold",
"Flash Flood", "flash",
"Flood", "(^|urban |major )flood$",
"Frost/Freeze", "frost|freeze",
"Funnel Cloud", "funnel(.*)cloud",
"Freezing Fog", "freezing(.*)fog",
"Hail", "hail",
"Heat", "^heat",
"Heavy Rain", "rain",
"Heavy Snow", "heavy(.*)snow",
"High Surf", "surf",
"High Wind", "high(.*)wind",
"Hurricane (Typhoon)", "hurricane|typhoon",
"Ice Storm", "ice(.*)storm",
"Lake-Effect Snow", "lake(.*)effect",
"Lakeshore Flood", "lakeshore(.*)flood",
"Lightning", "lightning",
"Marine Hail", "marine(.*)hail",
"Marine High Wind", "marine(.*)high(.*)wind",
"Marine Strong Wind", "marine(.*)strong(.*)wind",
"Marine Thunderstorm Wind", "marine(.*)thunderstorm(.*)wind",
"Rip Current", "rip(.*)current",
"Seiche", "seiche",
"Sleet", "sleet",
"Storm Surge/Tide", "surge|tide",
"Strong Wind", "strong(.*)wind",
"Thunderstorm Wind", "thunderstorm|tstm",
"Tornado", "tornado",
"Tropical Depression", "depression",
"Tropical Storm", "tropical(.*)storm",
"Tsunami", "tsunami",
"Volcanic Ash", "volcanic|ash",
"Waterspout", "waterspout",
"Wildfire", "wild(.*)fire",
"Winter Storm", "winter(.*)storm",
"Winter Weather", "winter(.*)weather",
"Other","(.*)"),
ncol = 2,
byrow = TRUE)
# Keep names and patterns as character strings rather than factors
permitted.events <- data.frame(matrix.events, stringsAsFactors = FALSE)
names(permitted.events) <- c("name", "regex")
The rationalize.event() function will loop through these values to find a match. This can then be applied to all 985 events, and the results summarized.
event.count <- nrow(permitted.events)
rationalize.event <- function(x) {
  # Try each pattern in order and return the name of the first match
  for (i in seq_len(event.count)) {
    if (grepl(permitted.events$regex[i], x, ignore.case = TRUE)) {
      return(permitted.events$name[i])
    }
  }
}
storm.summary$event <- factor(sapply(storm.summary$EVTYPE, rationalize.event))
storm.summary <- storm.summary %>%
  group_by(event) %>%
  summarize(people = sum(people),
            fatalities = sum(fatalities),
            injuries = sum(injuries),
            property = sum(property),
            crop = sum(crop),
            cost = sum(cost))
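The first-match-wins behaviour of the lookup can be sketched on a handful of raw labels. The table toy.events, the function first.match and the sample labels are hypothetical, chosen only to show why the order of the patterns matters and how the catch-all row behaves.

```r
# Hypothetical three-row subset of the pattern table, checked in order.
toy.events <- data.frame(
  name  = c("Flash Flood", "Thunderstorm Wind", "Other"),
  regex = c("flash", "thunderstorm|tstm", "(.*)"),
  stringsAsFactors = FALSE
)

# Return the name of the first pattern that matches a single raw label.
first.match <- function(x) {
  hits <- vapply(toy.events$regex,
                 function(p) grepl(p, x, ignore.case = TRUE),
                 logical(1))
  toy.events$name[which(hits)[1]]
}

raw.labels <- c("TSTM WIND", "FLASH FLOODING", "Thunderstorm Winds", "HEAVY SNOW")
toy.result <- vapply(raw.labels, first.match, character(1), USE.NAMES = FALSE)
print(toy.result)
```

Both spellings of thunderstorm wind collapse to one name, and HEAVY SNOW is caught only by the final (.*) pattern, so it is labelled Other in this reduced table.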
The top 10 event types with the greatest impact on the population are shown in the table below. This is calculated as the sum of fatalities and injuries.
library(knitr)
storm.bypeople <- arrange(storm.summary, desc(people))
kable(storm.bypeople[1:10, 1:4],
      digits = 0,
      caption = "Event types with the greatest impact on the population")
| event | people | fatalities | injuries |
|---|---|---|---|
| Tornado | 97043 | 5636 | 91407 |
| Thunderstorm Wind | 10135 | 714 | 9421 |
| Excessive Heat | 8445 | 1920 | 6525 |
| Flood | 7259 | 470 | 6789 |
| Lightning | 6049 | 817 | 5232 |
| Heat | 3593 | 1114 | 2479 |
| Flash Flood | 2837 | 1035 | 1802 |
| Ice Storm | 2079 | 89 | 1990 |
| High Wind | 1815 | 297 | 1518 |
| Wildfire | 1696 | 90 | 1606 |
The top 10 event types with the greatest economic impact are shown in the table below. This is calculated as the sum of property and crop damage.
options(scipen = 10)
storm.bycost <- arrange(storm.summary, desc(cost))
kable(storm.bycost[1:10, c(1, 7, 5, 6)],
      digits = 0,
      caption = "Event types with the greatest economic impact")
| event | cost | property | crop |
|---|---|---|---|
| Winter Storm | 132720591001979 | 132720590500000 | 501979 |
| Tropical Storm | 49049686349 | 49033180400 | 16505949 |
| Hurricane (Typhoon) | 22850914613 | 22684579475 | 166335138 |
| Storm Surge/Tide | 19393818716 | 19393817861 | 855 |
| Flood | 13399068900 | 13398899938 | 168961 |
| Tornado | 3215833901 | 3215732875 | 101026 |
| Heavy Rain | 3022934190 | 2898258754 | 124675436 |
| Cold/Wind Chill | 2707270350 | 611163461 | 2096106889 |
| Hail | 1859219683 | 1851522544 | 7697138 |
| Frost/Freeze | 1196875975 | 1705992 | 1195169984 |
To identify the types of events which the US should prioritize resources for, the plot below combines the top 10 event types in terms of population and economic impact. The scales are relative and use a logarithmic scale so that the differences between smaller values can be seen.
library(tidyr)
tenth.people <- storm.bypeople[[10, "people"]]
tenth.cost <- storm.bycost[[10, "cost"]]
max.people <- max(storm.summary$people)
max.cost <- max(storm.summary$cost)
storm.plot <- storm.summary %>%
  filter(people >= tenth.people | cost >= tenth.cost) %>%
  transmute(event, people, cost) %>%
  mutate(people = log10(people/max.people),
         cost = log10(cost/max.cost)) %>%
  arrange(people) %>%
  gather(type, damage, -event)
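The relative logarithmic scale can be worked through with three of the cost totals from the economic impact table. This is a minimal sketch: the variable names are hypothetical, and the arithmetic simply repeats the log10(value / maximum) transformation used in the pipeline.

```r
# Total costs copied from the economic impact table.
winter.storm.cost   <- 132720591001979
tropical.storm.cost <- 49049686349
frost.freeze.cost   <- 1196875975

costs <- c(winter.storm.cost, tropical.storm.cost, frost.freeze.cost)

# The largest value maps to 0; every other value becomes negative.
relative.log <- log10(costs / max(costs))
print(round(relative.log, 2))
```

Winter Storm sits at 0, Tropical Storm at about -3.4 and Frost/Freeze at about -5, so a gap of three to five orders of magnitude is compressed into a few units on the plot.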
Order the levels of the event factor so that events are plotted in descending levels of population impact. Also update the levels of the type factor so that the facet titles are meaningful.
storm.plot$event <- with(storm.plot, factor(event, unique(event)))
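# gather() returns type as a character column, so build the factor explicitly
storm.plot$type <- factor(storm.plot$type,
                          levels = c("people", "cost"),
                          labels = c("Population", "Economic"))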
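Because the facet titles depend on how the type values are paired with their labels, it can help to see how factor() orders and relabels levels on a toy vector. The vector type.raw is hypothetical; the sketch only illustrates base R behaviour.

```r
# Hypothetical key column, as produced by reshaping people and cost.
type.raw <- c("people", "cost", "people", "cost")

# Without explicit levels, factor() sorts the levels alphabetically.
default.levels <- levels(factor(type.raw))
print(default.levels)

# Explicit levels and labels pair each value with the intended title.
type.factor <- factor(type.raw,
                      levels = c("people", "cost"),
                      labels = c("Population", "Economic"))
print(as.character(type.factor))
```

With the default alphabetical order, cost comes before people, so assigning new level names by position alone could attach the titles to the wrong panels.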
Use a dot plot rather than a bar chart, and remove the axis scale to emphasise that this is a relative comparison, with all values visible on the same logarithmic scale. The financial impact of Winter Storm is actually about 2,700 times greater than that of Tropical Storm, the next most expensive event type.
library(ggplot2)
g <- ggplot(storm.plot, aes(x = event, colour = type)) +
  geom_point(aes(y = damage), stat = "identity") +
  coord_flip(ylim = c(-6, 0.1)) +
  facet_wrap(~ type, ncol = 2) +
  scale_colour_brewer(type = "qual", palette = 6) +
  theme(legend.position = "none",
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()) +
  labs(title = "Relative impact of weather events across the US (logarithmic)",
       x = NULL,
       y = NULL)
g