This analysis aims to identify the storm weather events in the United States that have historically caused the most damage to human health and the economy, and to quantify the extent of that damage. It is based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which covers events recorded across the United States between 1950 and November 2011.
The raw data used is available through the following links:
The analysis also relies on a variety of R packages. These are listed below:
library(tidyverse)
library(ggplot2)
library(ggpubr)
library(RColorBrewer)
The data was loaded into the working directory with the following code:
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, destfile = "weatherdata.csv.bz2", method = "curl")
df0 <- read.csv("weatherdata.csv.bz2", stringsAsFactors=FALSE, header=TRUE)
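The download only needs to happen once per working directory. A minimal sketch of a guard that skips it when the file is already present; the helper name `fetch_once` is hypothetical and not part of the original analysis:

```r
# Hypothetical helper: download a file only if it is not already on disk.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed file intact on every platform
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}
```

It could be called as `fetch_once(url, "weatherdata.csv.bz2")` before the `read.csv()` step.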
The data was then grouped by weather event so that event types can be compared with one another. The column with weather events is first converted to lowercase and stripped of spaces to avoid matching problems. Next, the values for fatalities and injuries are summed for each weather event.
df.health <- df0 %>%
mutate(EVTYPE = tolower(gsub(" ", "", EVTYPE))) %>%
group_by(EVTYPE) %>%
select(EVTYPE, FATALITIES, INJURIES) %>%
summarise_all(list(sum)) %>%
na.omit() %>%
arrange(desc(INJURIES))
head(df.health)
## # A tibble: 6 x 3
##   EVTYPE        FATALITIES INJURIES
##   <chr>              <dbl>    <dbl>
## 1 tornado             5633    91346
## 2 tstmwind             504     6957
## 3 flood                470     6789
## 4 excessiveheat       1903     6525
## 5 lightning            816     5230
## 6 heat                 937     2100
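To see why the event names are normalised before grouping, here is a small sketch on made-up records in which the same event type is spelled in different ways. The data frame `toy` and its values are invented for illustration; `summarise(across(...))` is used as the current dplyr equivalent of `summarise_all()`, which assumes dplyr 1.0 or later.

```r
library(dplyr)

# Made-up records: the same event type appears with different spellings.
toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "Tstm Wind", "FLOOD", "flood"),
  FATALITIES = c(1, 2, 0, 3),
  INJURIES   = c(10, 5, 2, 0),
  stringsAsFactors = FALSE
)

toy_health <- toy %>%
  # lower-case and strip spaces so that spelling variants collapse into one group
  mutate(EVTYPE = tolower(gsub(" ", "", EVTYPE))) %>%
  group_by(EVTYPE) %>%
  summarise(across(c(FATALITIES, INJURIES), sum), .groups = "drop") %>%
  arrange(desc(INJURIES))

toy_health
```

Without the `mutate()` step the four rows would form four separate groups instead of two.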
Exactly the same strategy is used to compare event types in terms of economic damage. There are four variables in the data set related to economic damage, two of which are numeric (PROPDMG and CROPDMG) and two of which are character variables (PROPDMGEXP and CROPDMGEXP). The supplementary documentation states that:
“Estimates should be rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include ‘K’ for thousands, ‘M’ for millions, and ‘B’ for billions” (p. 12).
Accordingly, we merge the four variables to create two numeric variables, one detailing damage to property and the other damage to crops.
Finally, the resulting damage variables are summed for each event type.
df.econ <- df0 %>%
mutate(EVTYPE = tolower(gsub(" ", "", EVTYPE))) %>%
group_by(EVTYPE) %>%
select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
mutate(PROPDMG = ifelse(PROPDMGEXP == "K",
PROPDMG *1000, PROPDMG)) %>%
mutate(PROPDMG = ifelse(PROPDMGEXP == "M",
PROPDMG *1000000, PROPDMG)) %>%
mutate(PROPDMG = ifelse(PROPDMGEXP == "B",
PROPDMG *1000000000, PROPDMG)) %>%
mutate(CROPDMG = ifelse(CROPDMGEXP == "K",
CROPDMG *1000, CROPDMG)) %>%
mutate(CROPDMG = ifelse(CROPDMGEXP == "M",
CROPDMG *1000000, CROPDMG)) %>%
mutate(CROPDMG = ifelse(CROPDMGEXP == "B",
CROPDMG *1000000000, CROPDMG)) %>%
select(EVTYPE, PROPDMG, CROPDMG) %>%
summarise_all(list(sum)) %>%
mutate(TOTDMG = CROPDMG + PROPDMG) %>%
arrange(desc(PROPDMG))
head(df.econ)
## # A tibble: 6 x 4
##   EVTYPE                 PROPDMG     CROPDMG       TOTDMG
##   <chr>                    <dbl>       <dbl>        <dbl>
## 1 flood             144657709807  5661968450 150319678257
## 2 hurricane/typhoon  69305840000  2607872800  71913712800
## 3 tornado            56925660790.  414953270  57340614060.
## 4 stormsurge         43323536000        5000  43323541000
## 5 flashflood         16140862067. 1421317100  17562179167.
## 6 hail               15727367053. 3025537890  18752904943.
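The chained `ifelse()` calls above match the upper-case codes "K", "M" and "B" exactly. As an alternative sketch, the same scaling can be written as a lookup in a named vector. The helper `scale_damage` is hypothetical, and it rests on two assumptions that the analysis above does not make: lower-case codes are treated like their upper-case counterparts, and any other code (including an empty string) leaves the value unscaled.

```r
# Multipliers for the three magnitude codes named in the documentation.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale a damage figure by its magnitude code.
# Assumptions: case is ignored; unknown or empty codes mean "no scaling".
scale_damage <- function(dmg, exp_code) {
  mult <- unname(exp_multiplier[toupper(exp_code)])
  mult[is.na(mult)] <- 1
  dmg * mult
}

# Made-up inputs, including the documentation's own example of 1.55B
scale_damage(c(1.55, 25, 2.5, 10), c("B", "K", "m", ""))
```

If lower-case codes occur in the data, totals computed this way would differ from the table above, so the two approaches are not interchangeable without checking.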
The harm caused to human health by weather events can take the form of either fatality or injury. It is difficult to create a single variable that captures both, because the two counts would plausibly need different weights (a fatality is much graver than an injury), and the choice of such weights remains controversial. Furthermore, as Figure 1 indicates, the strength of the relationship between fatalities and injuries is not consistent across weather events, although the relationship is, as expected, positive.
ggplot(df.health, aes(x = log(FATALITIES), y = log(INJURIES), label = EVTYPE)) +
  geom_text(size = 3, nudge_x = 0.5, alpha = 1/2) +
  labs(title = "Figure 1: Relationship between fatalities and injuries\nper type of weather event in the US",
       x = "Log of Total Fatalities (between 1950 and 2011)",
       y = "Log of Total Injuries\n(between 1950 and 2011)",
       subtitle = "There is variability in the strength of the relationship between fatality and\ninjury across different types of storm events. This means some weather\nevents have a larger tendency to cause fatalities whilst others have a\nlarger tendency to generate injury. ") +
  theme_classic()
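One detail of the log scale used in Figure 1: `log(0)` is `-Inf`, so an event type with zero recorded fatalities or injuries has no finite position on the log-log axes. A small base R illustration with made-up counts; `log1p()` is shown only as one possible alternative, not as what this analysis uses:

```r
counts <- c(0, 1, 10, 100)   # made-up totals, not taken from the data

log_counts   <- log(counts)    # the zero count becomes -Inf
log1p_counts <- log1p(counts)  # log(1 + x): the zero count maps to 0

is.finite(log_counts)
is.finite(log1p_counts)
```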
Accordingly, the impact of harmful weather events on human health should be considered separately for injuries and fatalities. Figure 2 shows the fatality and injury figures for the seven weather events that pose the greatest health risks. The seven events were selected by merging the top five events by total fatalities with the top five events by total injuries and removing any duplicates.
df.top5fatalities <- df.health %>%
arrange(desc(FATALITIES)) %>%
select(EVTYPE) %>%
slice(1:5) %>%
rowid_to_column("rank2")
df.top5injuries <- df.health %>%
arrange(desc(INJURIES)) %>%
select(EVTYPE) %>%
slice(1:5) %>%
rowid_to_column("rank1")
df.top5 <- df.health %>%
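# keep event types that appear in either top-five list
filter(EVTYPE %in% df.top5fatalities$EVTYPE |
EVTYPE %in% df.top5injuries$EVTYPE) %>%
# name the join key explicitly instead of relying on the common column
left_join(df.top5injuries, by = "EVTYPE") %>%
mutate(rank1 = ifelse(is.na(rank1), 11, rank1)) %>%
left_join(df.top5fatalities, by = "EVTYPE") %>%
mutate(rank2 = ifelse(is.na(rank2), 11, rank2)) %>%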
mutate(average = (rank1 + rank2)/2) %>%
arrange(desc(average)) %>%
rowid_to_column("rank") %>%
select(rank, EVTYPE, FATALITIES, INJURIES) %>%
pivot_longer(c(FATALITIES,INJURIES))
ggplot(df.top5, aes(x = rank, y = value, fill = EVTYPE)) +
  theme_classic() +
  facet_wrap(~name, scales = "free", drop = TRUE) +
  # one bar per event type and facet; geom_col() already uses stat = "identity"
  geom_col() +
  coord_flip() +
  labs(title = "Figure 2: Top seven types of weather events in terms\nof total historic fatalities or injuries in the US",
       subtitle = "Tornadoes and excessive heat waves stand out as most \nthreatening to human life. The seven weather events were \nselected by merging the top five events with the highest \ntotal injuries with the top five events with the highest total fatalities.",
       y = "Total between 1950 and 2011") +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank()) +
  guides(fill = guide_legend(title = "Weather Event\nType")) +
  scale_fill_manual(values = brewer.pal(n = 7, "Pastel2"))
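Merging two top-five lists gives between five and ten distinct event types, depending on how many appear in both; a result of seven means that three are shared. A minimal base R sketch of that merge, using placeholder names rather than actual results:

```r
# Placeholder top-five lists; the letters stand in for event types.
top_fatalities <- c("a", "b", "c", "d", "e")
top_injuries   <- c("a", "c", "e", "f", "g")

# union() keeps each entry once, so shared event types are not duplicated
top_events <- union(top_injuries, top_fatalities)

length(top_events)
```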
Figure 2 indicates that tornadoes and heat waves (heat and excessive heat) have caused by far the highest number of fatalities. Tornadoes also overshadow all other event types in terms of injuries, whereas heat waves have not caused markedly more injuries than most of the other weather events on this top-seven list.
The economic harm caused by weather events is recorded as property damage and crop damage. Figure 3 shows the ten event types with the highest total damage (property plus crop), with each bar split by type of damage.
df.top10econ <- df.econ %>%
arrange(desc(TOTDMG)) %>%
slice(1:10) %>%
arrange(TOTDMG) %>%
rowid_to_column("rank")%>%
pivot_longer(c(PROPDMG, CROPDMG))
df.top10econ$EVTYPE <- reorder(df.top10econ$EVTYPE, df.top10econ$rank)
ylab <- c(50, 100, 150, 200, 250)
ggplot(df.top10econ, aes(x = EVTYPE, y = value, fill = name)) +
  theme_classic() +
  geom_bar(stat = "identity", position = "stack") +
  coord_flip() +
  labs(title = "Figure 3: Top ten types of weather events in terms\nof economic damage in the US",
       subtitle = "Floods and hurricanes stand out as the most threatening to\neconomic prosperity. Furthermore, the top 6 weather types\nprimarily cause property damage rather than crop damage.\nThe ten weather events were selected by ranking event types\nby the sum of crop and property damage.",
       y = "Economic damage caused between\n1950 and 2011 (in billions of US $)",
       x = "Event Type") +
  scale_y_continuous(labels = paste0(ylab, "B"),
                     breaks = 10^9 * ylab) +
  scale_fill_manual(values = rev(brewer.pal(n = 2, "Pastel2")),
                    name = "Type of\nDamage",
                    labels = c("Crop Damage", "Property Damage"))
This project was submitted as part of the Reproducible Research course created by Johns Hopkins University. The two questions answered in this analysis and the links to the raw data were supplied by the course organisers. All of the code and analysis choices above are my own.