Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. In this analysis we explore the NOAA Storm Database and answer some basic questions about severe weather events across the United States. We group events into seven categories and examine annual and cumulative totals of fatalities, injuries and financial damage for each category.
Summary of Findings
# Load Libraries
library(ggplot2)
library(plyr)
library(reshape2)
library(scales)
# Define Data Path
root <- "/Users/gabrielm/OneDrive/"
data_path <- "Documents/HW/Coursera/Data Science Specialization/5 - Reproducible Research/Projects/Project 2/"
setwd(paste0(root, data_path))
We convert several fields in the dataset to make the analysis easier to follow. For example, we create a year column and convert property and crop damage from a value-plus-exponent encoding into actual amounts. Finally, we assign each event type to a category following a 2009 National Weather Service report. The 985 event types in our data fall into the following seven categories: Convection, Extreme Temperatures, Flood, Marine, Tropical Cyclones, Winter, and Other.
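To make the exponent conversion concrete, here is a minimal sketch on a toy data frame. The `toy` and `lookup` objects and their values are hypothetical and not part of the analysis; the lookup holds only three of the conversion rules (K, M and B).

```r
# Toy example (hypothetical values): damage amounts with exponent codes
toy <- data.frame(PROPDMG = c(2.5, 10, 1.2),
                  PROPDMGEXP = c("K", "m", "B"),
                  stringsAsFactors = FALSE)

# Subset of the conversion rules: K = 10^3, M = 10^6, B = 10^9
lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# Upper-case the code so "m" and "M" are treated alike, then look up the multiplier
toy$Multiplier <- unname(lookup[toupper(toy$PROPDMGEXP)])

# Actual damage = reported amount * multiplier
toy$PROPERTY.DAMAGE <- toy$PROPDMG * toy$Multiplier
toy
```

A named-vector lookup keeps the original row order, whereas `merge()` sorts by the key column by default; both approaches should give the same product of amount and multiplier.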
# Read Data
data <- read.csv(bzfile("Data/repdata-data-StormData.csv.bz2"))
# Standardize dates. We ignore time zones because they have little effect
# on the analysis.
data$BGN_DATE <- as.Date(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
data$Year <- as.numeric(format(data$BGN_DATE, "%Y"))
# Exponential Conversion Rules
damage.exp <- read.csv("Data/Damage Exp Converter.csv")
damage.exp
##    Exp Zeros Multiplier
## 1          0          1
## 2    -     0          1
## 3    ?     0          1
## 4    +     0          1
## 5    0     0          1
## 6    1     1         10
## 7    2     2        100
## 8    3     3       1000
## 9    4     4      10000
## 10   5     5     100000
## 11   6     6    1000000
## 12   7     7   10000000
## 13   8     8  100000000
## 14   B     9 1000000000
## 15   H     2        100
## 16   K     3       1000
## 17   M     6    1000000
# Format property damage
data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
data <- merge(x = data, y = damage.exp, by.x = "PROPDMGEXP", by.y = "Exp", all.x = TRUE)
data$PROPERTY.DAMAGE <- data$PROPDMG * data$Multiplier
data <- data[, !(names(data) %in% c("Zeros", "Multiplier"))]
# Format crop damage
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)
data <- merge(x = data, y = damage.exp, by.x = "CROPDMGEXP", by.y = "Exp", all.x = TRUE)
data$CROP.DAMAGE <- data$CROPDMG * data$Multiplier
data <- data[, !(names(data) %in% c("Zeros", "Multiplier"))]
data$Total.Damage <- data$PROPERTY.DAMAGE + data$CROP.DAMAGE
# Categorize weather events
weather.events <- read.csv("Data/Weather Events.csv")
weather.events$Event.Description <- factor(weather.events$Event.Description,
levels = c("Convection", "Extreme Temperatures", "Flood", "Marine", "Tropical Cyclones",
"Winter", "Other"))
data <- merge(x = data, y = weather.events, by = "EVTYPE")
# Calculate Summary Table
summary.event.year <- aggregate(x = data[c("FATALITIES", "INJURIES", "Total.Damage")],
by = data[c("Event.Description", "Year")], FUN = sum, na.rm = TRUE)
summary.event.year <- arrange(summary.event.year, Event.Description, Year)
summary.event.year$Type <- "Yearly"
# Calculate Cumulative Outcomes
summary.event.year.cum <- summary.event.year[c("Event.Description", "Year")]
summary.event.year.cum$FATALITIES <- ddply(summary.event.year, "Event.Description",
summarize, cumsum(FATALITIES))[, 2]
summary.event.year.cum$INJURIES <- ddply(summary.event.year, "Event.Description",
summarize, cumsum(INJURIES))[, 2]
summary.event.year.cum$Total.Damage <- ddply(summary.event.year, "Event.Description",
summarize, cumsum(Total.Damage))[, 2]
summary.event.year.cum$Type <- "Cumulative"
# Combine Annual with Cumulative Outcomes
summary <- rbind(summary.event.year, summary.event.year.cum)
# Calculate Cumulative Outcomes Since 1993
summary.event.year.cum <- summary.event.year[summary.event.year$Year >= 1993,
c("Event.Description", "Year")]
summary.event.year.cum$FATALITIES <- ddply(summary.event.year[summary.event.year$Year >=
1993, ], "Event.Description", summarize, cumsum(FATALITIES))[, 2]
summary.event.year.cum$INJURIES <- ddply(summary.event.year[summary.event.year$Year >=
1993, ], "Event.Description", summarize, cumsum(INJURIES))[, 2]
summary.event.year.cum$Total.Damage <- ddply(summary.event.year[summary.event.year$Year >=
1993, ], "Event.Description", summarize, cumsum(Total.Damage))[, 2]
summary.event.year.cum$Type <- "Cumulative Since 1993"
# Append Cumulative Outcomes Since 1993 to the Combined Summary
summary <- rbind(summary, summary.event.year.cum)
# Convert Summary from Wide to Long Format for Graphing Purposes
summary2 <- melt(summary, id.vars = c("Event.Description", "Year", "Type"),
measure.vars = c("FATALITIES", "INJURIES", "Total.Damage"), variable.name = "DAMAGE.FACTOR",
value.name = "DAMAGE.VALUE")
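The per-category running totals computed above can be illustrated on a small example. This sketch uses base R's `ave()` rather than the `ddply()` calls in the analysis, and the `toy` data frame with its numbers is hypothetical.

```r
# Toy yearly totals (hypothetical numbers), sorted by category and year
toy <- data.frame(Event.Description = c("Flood", "Flood", "Flood", "Winter", "Winter"),
                  Year = c(1993, 1994, 1995, 1993, 1994),
                  FATALITIES = c(2, 5, 1, 4, 3))

# Running total within each category; ave() returns the values
# in the original row order, so they can be assigned back directly
toy$CUM.FATALITIES <- ave(toy$FATALITIES, toy$Event.Description, FUN = cumsum)
toy
```

The running total restarts at the first year of each category, which is why the rows need to be sorted by year within category before the cumulative sum is taken.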
We started the analysis by including all years in the dataset, but quickly found that convection events are the only category recorded before 1993. We therefore restrict the remaining analysis to 1993 onward. Even so, the full record shows that convection events have caused the most injuries since tracking began about 60 years ago, and they still do.
ggplot(summary2[summary2$Type == "Yearly", ], aes(x = Year, y = DAMAGE.VALUE,
color = Event.Description, group = Event.Description)) + geom_line() + ggtitle("Outcomes by Event Type") +
scale_y_continuous(name = "") + facet_wrap(~DAMAGE.FACTOR, scales = "free")
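The observation that only convection events appear before 1993 can be checked directly by finding the earliest year recorded for each category. A minimal sketch on a toy data frame follows; the `toy` values are hypothetical and not taken from the Storm Database.

```r
# Toy event records (hypothetical): category and year of each record
toy <- data.frame(Event.Description = c("Convection", "Convection", "Flood",
                                        "Winter", "Convection"),
                  Year = c(1950, 1975, 1993, 1994, 2001))

# Earliest year with a record for each category
first.year <- aggregate(Year ~ Event.Description, data = toy, FUN = min)
first.year
```

Applied to the merged `data` object, the same `aggregate()` call would show the first year in which each of the seven categories appears.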
In the years since 1993, extreme temperatures were responsible for the most fatalities, except for a couple of years in which convection events were the leading cause. Floods stand out in two years, 1998 and 2006. We attribute the 2006 event to Hurricane Katrina; it caused a significant amount of financial damage but few fatalities or injuries. Note that Katrina itself made landfall in August 2005, so this attribution should be verified against the underlying records. In 2011, a strong convection event produced a spike in fatalities and injuries but not in financial damage.
ggplot(summary2[summary2$Type == "Yearly" & summary2$Year >= 1993, ], aes(x = Year,
y = DAMAGE.VALUE, color = Event.Description, group = Event.Description)) +
geom_line() + ggtitle("Outcomes by Event Type Since 1993") + scale_y_continuous(name = "") +
facet_wrap(~DAMAGE.FACTOR, scales = "free")
Ranking the event categories by their cumulative outcomes since 1993 shows the following:
ggplot(summary2[summary2$Type == "Cumulative Since 1993", ], aes(x = Year, y = DAMAGE.VALUE,
color = Event.Description, group = Event.Description)) + geom_line() + ggtitle("Cumulative Outcomes by Event Type Since 1993") +
scale_y_continuous(name = "") + facet_wrap(~DAMAGE.FACTOR, scales = "free")