This short analysis examines which types of weather event cause the most harm to human health and the most economic damage in the United States. It is based on data from the NOAA Storm Database. We measure the effects on human health in terms of injuries and fatalities, and economic costs in terms of damage to property and crops, during the period 2000-2011. The analysis suggests that tornados are the most harmful weather events to human health, while flash floods cause the most damage to property and hail causes the most damage to crops.
We begin by reading in the data using the readr package. We then reduce the size of the dataset by selecting only the variables needed for the analysis: the date when each event occurred (or, more precisely, when it began), the type of event and four measures of its consequences. Two of these relate to population health, namely fatalities and injuries; the other two relate to economic costs, namely property damage and crop damage.
library(readr)
library(tidyverse)
data <- read_csv("./data/StormData.csv.bz2")
stormData <- subset(data, select = c("BGN_DATE", "EVTYPE",
"FATALITIES", "INJURIES", "PROPDMG", "CROPDMG"))
stormData <- rename(stormData, Event.type = EVTYPE,
Fatalities = FATALITIES, Injuries = INJURIES,
Property.Damage = PROPDMG, Crop.Damage = CROPDMG)
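The selection above keeps only the PROPDMG and CROPDMG magnitudes. In the NOAA Storm Database these columns are accompanied by PROPDMGEXP and CROPDMGEXP, which hold a unit code such as K, M or B. The following is a minimal sketch, not part of the analysis reported here, of how such codes could be turned into dollar amounts. The helper `exp_to_multiplier` and the toy values are hypothetical, and treating every other code as a multiplier of 1 is an assumption of the sketch.

```r
# Hypothetical helper: map a damage unit code to a dollar multiplier.
# Assumption: only K, M and B (upper or lower case) are interpreted;
# every other code, including NA, is treated as a multiplier of 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Toy rows standing in for PROPDMG and PROPDMGEXP (made-up values).
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "M", "B", NA),
                  stringsAsFactors = FALSE)

# Damage in dollars = recorded magnitude times the unit multiplier.
toy$Property.Damage.USD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Applying this to the real data would require adding PROPDMGEXP and CROPDMGEXP to the selected columns.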
The NOAA dataset extends back to 1950, but many types of event (especially those that are not "storm" related) were not recorded in the early years of the dataset. More generally, as weather patterns have changed, it makes sense to focus on a more recent period. We opt for 2000-2011. To this end, we create a variable "Year" which we use to filter the dataset. We then compute the total fatalities, injuries, property damage and crop damage caused by each type of weather event over this period.
library(tidyr)
stormData <- separate(stormData, BGN_DATE,
into = c("Date", "Time"), sep = " ")
stormData$Date <- as.POSIXct(stormData$Date, format = "%m/%d/%Y")
stormData$Year <- lubridate::year(stormData$Date)
library(dplyr)
stormData <- filter(stormData, Year %in% c(2000:2011))
stormData <- group_by(stormData, Event.type)
Data_sum_evtype <- summarise(stormData,
Fatalities = sum(Fatalities),
Injuries = sum(Injuries),
Property.Damage = sum(Property.Damage),
Crop.Damage = sum(Crop.Damage))
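The steps above (extract the year, filter to 2000-2011, total by event type) can be tried on a small made-up table. This is a sketch only: the `toy` rows are invented, and it parses the date with base R `as.Date` rather than the `separate` and `as.POSIXct` calls used above.

```r
library(dplyr)

# Made-up rows in the same "month/day/year time" layout as BGN_DATE.
toy <- tibble(
  BGN_DATE   = c("4/18/1950 0:00:00", "5/3/2001 0:00:00",
                 "5/3/2001 0:00:00", "8/29/2005 0:00:00"),
  Event.type = c("TORNADO", "HAIL", "TORNADO", "HAIL"),
  Injuries   = c(15, 0, 3, 2)
)

toy_sum <- toy %>%
  # Drop the time part, parse the date, then pull out the year.
  mutate(Date = as.Date(sub(" .*$", "", BGN_DATE), format = "%m/%d/%Y"),
         Year = as.integer(format(Date, "%Y"))) %>%
  # Keep only events that began in 2000-2011.
  filter(Year >= 2000, Year <= 2011) %>%
  # Total injuries per event type over the retained period.
  group_by(Event.type) %>%
  summarise(Injuries = sum(Injuries), .groups = "drop")

print(toy_sum)
```

The 1950 row is dropped by the filter, so it does not contribute to the tornado total.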
There are many types of weather event in the dataset and we need to focus on the ones that cause the most damage. To this end, we arrange the data to identify which five types of event caused, respectively, the most fatalities, injuries, property damage and crop damage. From the events identified, we pull together the ones causing health damage (i.e. fatalities and injuries) and the ones causing economic damage (i.e. damage to property and crops).
top_fatalities <- arrange(Data_sum_evtype, desc(Fatalities))[1:5,1]
top_injuries <- arrange(Data_sum_evtype, desc(Injuries))[1:5,1]
top_prop.damage <- arrange(Data_sum_evtype, desc(Property.Damage))[1:5,1]
top_crop.damage <- arrange(Data_sum_evtype, desc(Crop.Damage))[1:5,1]
top_health <- unique(rbind(top_fatalities, top_injuries))
top_economic <- unique(rbind(top_prop.damage, top_crop.damage))
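The same pre-selection can also be written with `slice_max` from dplyr (version 1.0.0 or later). The sketch below uses a made-up summary table and picks the top two rather than the top five. The helper `top_n_events` is hypothetical and is shown only as one possible way to avoid repeating the `arrange(...)[1:5, 1]` pattern.

```r
library(dplyr)

# Made-up per-event totals, standing in for Data_sum_evtype.
toy_sum <- tibble(
  Event.type = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  Fatalities = c(100, 1, 30, 90),
  Injuries   = c(1200, 50, 300, 40)
)

# Hypothetical helper: names of the n event types with the largest
# value in the given column (ties are broken by row order).
top_n_events <- function(data, column, n) {
  data %>%
    slice_max(order_by = .data[[column]], n = n, with_ties = FALSE) %>%
    pull(Event.type)
}

# Union of the top events by fatalities and by injuries.
top_health_toy <- union(top_n_events(toy_sum, "Fatalities", 2),
                        top_n_events(toy_sum, "Injuries", 2))
print(top_health_toy)
```

Because an event can rank highly on both measures, the union may contain fewer than twice `n` event types.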
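Having made a pre-selection of the weather events causing the most damage, we use it to filter the dataset for 2000-2011 and, for each type of event, calculate the average number of fatalities and injuries each year and the average annual cost of damage to property and crops in millions of dollars. The totals are divided by 12, the number of years in the period; the damage totals are additionally divided by 1,000, which treats the recorded figures as thousands of dollars.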
Data_health <- subset(stormData, Event.type %in% top_health$Event.type)
Data_health <- group_by(Data_health, Event.type)
Data_health <- summarise(Data_health,
Fatalities = sum(Fatalities)/12,
Injuries = sum(Injuries)/12)
Data_economic <- subset(stormData, Event.type %in% top_economic$Event.type)
Data_economic <- group_by(Data_economic, Event.type)
Data_economic <- summarise(Data_economic,
Property.Damage = sum(Property.Damage)/12000,
Crop.Damage = sum(Crop.Damage)/12000)
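The averaging step can be illustrated on a made-up table. In this sketch the number of years is derived from the data instead of being hard-coded as 12, which is a design choice of the sketch rather than something the analysis above does. As in the code above, it assumes the recorded damage figures are in thousands of dollars. The names `toy`, `n_years` and `toy_avg` are hypothetical.

```r
library(dplyr)

# Made-up damage records (assumed to be in thousands of dollars).
toy <- tibble(
  Year            = c(2000, 2000, 2001, 2001),
  Event.type      = c("HAIL", "TORNADO", "HAIL", "HAIL"),
  Property.Damage = c(500, 2000, 1500, 1000)
)

# Number of distinct years covered by the toy data (2 here).
n_years <- n_distinct(toy$Year)

toy_avg <- toy %>%
  group_by(Event.type) %>%
  # Total per event type, divided by the number of years, then by 1000
  # to convert thousands of dollars into millions of dollars.
  summarise(Property.Damage = sum(Property.Damage) / n_years / 1000,
            .groups = "drop")

print(toy_avg)
```

Here hail totals 3000 over two years, giving an annual average of 1.5 (in millions of dollars under the stated assumption).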
In terms of threat to human health, the most dangerous type of weather event (based on the NOAA classification) during 2000-2011 was the tornado (see Fig.1). Tornados caused, on average, more than 1200 injuries and 100 deaths each year, far more injuries than any other type of weather event. However, excessive heat was not far behind in causing fatalities.
library(ggplot2)
library(gridExtra)
plot1a <- ggplot(data = Data_health) +
geom_bar(mapping = aes(x = Event.type, y = Fatalities, fill = Event.type),
stat = "identity") +
labs(title = "Average number of fatalities each year") +
xlab("") + ylab("") +
theme(title=element_text(size=7)) +
theme(axis.text.x=element_text(size=6,angle = 45)) +
theme(axis.title.y=element_text(size=10)) +
theme(legend.position = "none")
plot1b <- ggplot(data = Data_health) +
geom_bar(mapping = aes(x = Event.type, y = Injuries, fill = Event.type),
stat = "identity") +
labs(title = "Average number of injuries each year") +
xlab("") + ylab("") +
theme(title=element_text(size=7)) +
theme(axis.text.x=element_text(size=6,angle = 45)) +
theme(axis.title.y=element_text(size=10)) +
theme(legend.position = "none")
grid.arrange(plot1a, plot1b, ncol=2,
top = "Fig.1: Weather events causing most threat to health in the U.S. (2000-2011)")
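As an alternative to building two plots and combining them with `grid.arrange`, the two panels could be drawn from a single ggplot call by reshaping the data to long form and faceting. The sketch below uses made-up values and hypothetical event names, so it does not reproduce Fig.1. It also uses `geom_col()`, which draws the same bars as `geom_bar(stat = "identity")`.

```r
library(ggplot2)
library(tidyr)

# Made-up annual averages, standing in for Data_health.
toy_health <- data.frame(
  Event.type = c("EVENT A", "EVENT B", "EVENT C"),
  Fatalities = c(10, 8, 3),
  Injuries   = c(120, 30, 25)
)

# One row per event type and measure.
toy_long <- pivot_longer(toy_health,
                         cols = c(Fatalities, Injuries),
                         names_to = "Measure",
                         values_to = "Average")

# One panel per measure, each with its own y scale.
p <- ggplot(toy_long, aes(x = Event.type, y = Average, fill = Event.type)) +
  geom_col() +
  facet_wrap(~ Measure, scales = "free_y") +
  labs(x = NULL, y = NULL, title = "Toy example: annual averages by event type") +
  theme(axis.text.x = element_text(size = 6, angle = 45, hjust = 1),
        legend.position = "none")

print(p)
```

The free y scales matter here because injuries and fatalities differ by an order of magnitude.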
Turning to the economic consequences of weather events, several types of event (flash floods, thunderstorms, tornados and TSTM wind) caused similar amounts of damage to property (see Fig.2). Most damage was caused by flash floods, at slightly more than 80 million dollars on average each year. Hail caused by far the most damage to crops, at around 30 million dollars on average each year.
plot2a <- ggplot(data = Data_economic) +
geom_bar(mapping = aes(x = Event.type, y = Property.Damage,
fill = Event.type),
stat = "identity") +
labs(title = "Average property damage each year ($ millions)") +
xlab("") + ylab("") +
theme(title=element_text(size=7)) +
theme(axis.text.x=element_text(size=6,angle = 45)) +
theme(axis.title.y=element_text(size=10)) +
theme(legend.position = "none")
plot2b <- ggplot(data = Data_economic) +
geom_bar(mapping = aes(x = Event.type, y = Crop.Damage, fill = Event.type),
stat = "identity") +
labs(title = "Average crop damage each year ($ millions)") +
xlab("") + ylab("") +
theme(title=element_text(size=7)) +
theme(axis.text.x=element_text(size=6,angle = 45)) +
theme(axis.title.y=element_text(size=10)) +
theme(legend.position = "none")
grid.arrange(plot2a, plot2b, ncol=2,
top = "Fig.2: Weather events causing most economic damage in the U.S. (2000-2011)")