We used NOAA’s storm data from 1950 through November 2011 to answer two questions that inform planning and preparation for severe weather events. First, across the United States, which types of events are most harmful to population health? Second, which types of events have the greatest economic consequences? To answer the first question, we totaled the fatalities and injuries directly linked to each type of severe weather event. The resulting plot, shown below, highlights how heavily tornadoes affect population health: they account for nearly 100,000 deaths and injuries, far more than any other event type. We similarly analyzed which event types caused the most monetary damage to property and crops over the same period. Floods caused the most damage, about $150 billion, followed by hurricanes/typhoons and tornadoes.
To conduct the analysis, we first load the necessary packages, then download and read the data:
#Load dplyr and ggplot2
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
# Create a new folder if it does not exist and download the compressed data
if (!file.exists("./NOAA_Analysis")) {
  dir.create("./NOAA_Analysis")
}
dataURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# mode = "wb" keeps the bz2 file intact on Windows
download.file(dataURL, "./NOAA_Analysis/NOAA_Data", mode = "wb")
#Read data. This may take a minute.
NOAAdata <- read.csv("./NOAA_Analysis/NOAA_Data", stringsAsFactors = FALSE, encoding = "latin1")
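Because the download runs unconditionally, every re-run of the report fetches the file again. One way to avoid that is to download only when the destination file is missing. The following is a minimal sketch, not part of the original analysis; the helper name `download_once` is hypothetical.

```r
# Hypothetical helper (not in the original analysis): download `url` to `dest`
# only if `dest` does not already exist, and return the path invisibly.
download_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # Create the parent folder if needed; stay quiet if it already exists.
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    # mode = "wb" keeps the compressed file intact on Windows.
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage with the names from the analysis above (left commented out so that
# running this sketch does not trigger the download):
# download_once(dataURL, "./NOAA_Analysis/NOAA_Data")
```

With a helper like this, the folder check and the download collapse into a single call, and later runs reuse the local copy.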
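We focus on a limited set of variables relevant to population health and economic impact: EVTYPE (event type), FATALITIES and INJURIES (counts of deaths and injuries), PROPDMG and CROPDMG (property and crop damage amounts), and PROPDMGEXP and CROPDMGEXP (the magnitude codes for those amounts).
We select only these variables for analysis.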
# Create a new data frame with the selected variables
stormData <- NOAAdata %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
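We also convert the property and crop damage amounts to dollars so that the magnitude codes no longer sit in a separate variable. The codes K, M, and B (in either case) stand for thousands, millions, and billions. The code below assigns a multiplier of 0 to any other code, so rows with other codes do not contribute to the damage totals.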
# Map damage exponent codes to numeric multipliers (any other code maps to 0)
exp_map <- function(e) {
  ifelse(e %in% c("K", "k"), 1e3,
         ifelse(e %in% c("M", "m"), 1e6,
                ifelse(e %in% c("B", "b"), 1e9, 0)))
}
stormData <- stormData %>%
  mutate(
    prop_dmg_total = PROPDMG * exp_map(PROPDMGEXP),
    crop_dmg_total = CROPDMG * exp_map(CROPDMGEXP),
    total_dmg = prop_dmg_total + crop_dmg_total
  )
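To make the effect of the exponent mapping concrete, the sketch below applies the same `exp_map` function to a few invented rows. The toy values are for illustration only and are not taken from the NOAA file. The last two rows assume that a blank or otherwise unrecognized code can occur, and show that such rows end up with a damage total of zero.

```r
# Same mapping as in the analysis: K/M/B (either case) scale the amount;
# every other code, including a blank, maps to 0.
exp_map <- function(e) {
  ifelse(e %in% c("K", "k"), 1e3,
         ifelse(e %in% c("M", "m"), 1e6,
                ifelse(e %in% c("B", "b"), 1e9, 0)))
}

# Invented rows for illustration; not real storm records.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 500, 3),
  PROPDMGEXP = c("K", "M", "B", "", "5"),
  stringsAsFactors = FALSE
)

# Multiply each amount by the multiplier for its code.
toy$multiplier     <- exp_map(toy$PROPDMGEXP)
toy$prop_dmg_total <- toy$PROPDMG * toy$multiplier
print(toy)
```

Running `table(stormData$PROPDMGEXP)` on the real data would show how many rows fall into the zero-multiplier case.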
We first look at fatalities and injuries by event type, focusing on the 10 event types that caused the most combined fatalities and injuries in the data.
# Summarize fatalities and injuries
health_impact <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(
    fatalities = sum(FATALITIES, na.rm = TRUE),
    injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(total = fatalities + injuries) %>%
  arrange(desc(total))
# Top 10 events
top_health <- head(health_impact, 10)
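As a cross-check on the grouped summary, the same aggregation can be sketched in base R on a small invented data frame. The toy values below are for illustration only, and `aggregate()` is a swapped-in alternative to the dplyr pipeline, not the method used in the analysis.

```r
# Toy stand-in for stormData; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type.
# Note: the formula interface drops rows containing NA instead of using na.rm.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combine the two counts and sort from largest to smallest total.
health$total <- health$FATALITIES + health$INJURIES
health <- health[order(-health$total), ]

# Keep the top two event types (the analysis keeps the top 10).
top2 <- head(health, 2)
print(top2)
```

Here TORNADO totals 17 and HEAT totals 5, so those two rows are kept and FLOOD (total 4) is dropped.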
The following chart visualizes the top 10 event types that most affect population health through injury or death, in descending order from left to right.
# Plot combined fatalities and injuries, largest first
ggplot(top_health, aes(x = reorder(EVTYPE, -total), y = total)) +
  geom_col(fill = "darkred") +
  labs(title = "Top 10 Weather Events by Population Health Impact",
       x = "Event Type", y = "Total Fatalities and Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Tornadoes far exceed every other event type in deaths and injuries, directly causing 96,979 of them over the period covered by the data (1950-2011).
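The left-to-right ordering in the chart comes from `reorder(EVTYPE, -total)`, which sorts the factor levels by the negated total so that the largest bar comes first. A minimal sketch on invented totals (illustrative values only) shows the resulting level order:

```r
# Invented totals for three event types, for illustration only.
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
  total  = c(4, 17, 5),
  stringsAsFactors = FALSE
)

# reorder() sorts levels by increasing value of its second argument,
# so negating the totals puts the largest total first.
ordered_types <- reorder(toy$EVTYPE, -toy$total)
print(levels(ordered_types))
# [1] "TORNADO" "HEAT"    "FLOOD"
```

Without `reorder()`, ggplot2 would place the event types on the axis in alphabetical order.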
We next look at the total monetary impact of property and crop damage by event type, focusing on the 10 event types that caused the greatest damage in dollars.
# Summarize combined property and crop damage by event type
economic_impact <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(total_dmg = sum(total_dmg, na.rm = TRUE)) %>%
  arrange(desc(total_dmg))
# Top 10 events
top_econ <- head(economic_impact, 10)
The following chart visualizes the top 10 event types that caused the most monetary damage to property and crops, in descending order from left to right.
# Plot total damage in billions of dollars, largest first
ggplot(top_econ, aes(x = reorder(EVTYPE, -total_dmg), y = total_dmg / 1e9)) +
  geom_col(fill = "darkgreen") +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type", y = "Damage (in billions of dollars)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Floods are responsible for the most damage, approximately 150 billion U.S. dollars of combined property and crop damage over the period covered by the data.