Weather events can harm public health and cause significant financial loss. This report analyzes data from the National Oceanic and Atmospheric Administration’s Storm Database to answer two fundamental questions: which types of weather events are most harmful to population health, and which have the greatest economic consequences?
Recording of the various weather event types began at different times, so the event types cover different timeframes. Therefore, to allow for relatively consistent comparisons across weather events, the period of analysis has been restricted to the years 1999 through 2011.
Over the years 1999 to 2011, the most harmful events in terms of injuries and fatalities were: Tornados, Excessive Heat, TSTM Wind (thunderstorm wind), Flood, and Lightning.
Over the same timeframe, the events with the greatest economic impact, measured as property damage plus crop damage, were: Floods, Storm Surges, Tornados, Hail, and Flash Floods.
From the National Oceanic and Atmospheric Administration’s Storm Database, we obtained data on weather-related events from 1950 through 2011.
The storm database is provided as a bzip2-compressed CSV file with headers, so download and import are straightforward. The file is downloaded to the “data” subdirectory of the current working directory and is then read directly from the compressed file for further analysis.
# Load libraries used in this script, installing if necessary
if(!("pacman" %in% rownames(installed.packages()))) {
  install.packages("pacman", repos = "http://cran.us.r-project.org")
}
library(pacman)
p_load(dplyr, ggplot2, ggpubr, lubridate, tidyr)
# Set up the destination location for the data file if necessary
if (!file.exists("./data")) {
  dir.create("./data")
}
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL, destfile = "./data/StormData.csv.bz2", method = "curl")
# Read the data (read.csv decompresses the .bz2 file on the fly)
stormData <- read.csv("./data/StormData.csv.bz2")
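Because the download runs every time the script is executed, it may be worth guarding it. The following is a minimal sketch, not part of the original analysis: the helper name `download_if_missing` and its `downloader` argument are hypothetical, introduced here only for illustration.

```r
# Hypothetical helper (not from the original script): download a file only
# when the destination does not already exist. `downloader` defaults to
# download.file but can be replaced, e.g. by a stub when testing offline.
download_if_missing <- function(url, dest, downloader = download.file) {
  if (file.exists(dest)) {
    return(invisible(FALSE))  # nothing to do, file already present
  }
  # Create the destination directory if needed
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  downloader(url, destfile = dest)
  invisible(TRUE)
}
```

Under these assumptions, calling `download_if_missing(fileURL, "./data/StormData.csv.bz2")` would fetch the file on the first run and skip the download on later runs.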
After importing the data, we can check the first few rows and columns of the dataset.
head(stormData[ , 1:8])
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
The first data manipulation step converts the variable “EVTYPE”, which identifies the weather event type used in the analysis, to a factor. The year in which each event began is also extracted from “BGN_DATE” to support the analysis.
stormData$EVTYPE <- as.factor(stormData$EVTYPE)
stormData$date <- as.Date(stormData$BGN_DATE, "%m/%d/%Y %H:%M:%S")
stormData$year <- as.numeric(format(stormData$date, "%Y"))
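As a small illustration of the date handling above, the following sketch parses two “BGN_DATE” strings taken from the rows printed earlier and extracts the year. The vector names `raw`, `parsed`, and `years` are chosen here for illustration only.

```r
# Two BGN_DATE values as they appear in the raw file
raw <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# Same format string as in the analysis; the time part is parsed and dropped
parsed <- as.Date(raw, format = "%m/%d/%Y %H:%M:%S")

# Extract the four-digit year as a number
years <- as.numeric(format(parsed, "%Y"))
print(years)
```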
The database provides information on property damage and crop damage resulting from each event. However, each damage estimate is spread across two variables. For property damage, the variable “PROPDMG” provides the numeric amount and the variable “PROPDMGEXP” indicates the multiplier (K = thousands, M = millions, B = billions). The same is true for crop damage based on “CROPDMG” and “CROPDMGEXP”. In the code below, any other value of the multiplier code leaves the amount unscaled.
New variables are created to identify the cost in dollars for property damage and crop damage.
# Property damage in dollars; rows with any other exponent code keep the raw value
stormData <- stormData %>%
  mutate(propDmgTot = case_when(
    PROPDMGEXP == "K" ~ 1000 * PROPDMG,
    PROPDMGEXP == "M" ~ 1000000 * PROPDMG,
    PROPDMGEXP == "B" ~ 1000000000 * PROPDMG,
    TRUE ~ PROPDMG
  ))
# Crop damage in dollars, using the same rule
stormData <- stormData %>%
  mutate(cropDmgTot = case_when(
    CROPDMGEXP == "K" ~ 1000 * CROPDMG,
    CROPDMGEXP == "M" ~ 1000000 * CROPDMG,
    CROPDMGEXP == "B" ~ 1000000000 * CROPDMG,
    TRUE ~ CROPDMG
  ))
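The same K/M/B mapping can also be written as a named lookup vector. The following is only a sketch in base R on toy data: the data frame `toy`, the vector `multiplier`, and the amounts are made up for illustration and are not values from the storm database.

```r
# Toy stand-in for the PROPDMG / PROPDMGEXP columns (illustrative values)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Multiplier for each recognized exponent code
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes not in the table get a multiplier of 1,
# which mirrors the TRUE ~ PROPDMG fallback used in the analysis
idx <- match(toy$PROPDMGEXP, names(multiplier))
scale <- ifelse(is.na(idx), 1, multiplier[idx])

toy$propDmgTot <- toy$PROPDMG * scale
print(toy)
```

A lookup vector keeps the code-to-multiplier rule in one place, which may make it easier to extend if further exponent codes need handling.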
While the storm database contains data from 1950 to 2011, collection of data for most weather event types did not start until much later. The histogram below shows how many new event types first appear in the database each year.
dataColStart <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(startYear = min(year, na.rm = TRUE))
ggplot(dataColStart, aes(x = startYear)) +
  geom_histogram(binwidth = 1) +
  ggtitle("When Data Collection Started for Each Event Type") +
  xlab("Year") +
  ylab("Count of New Event Types") +
  theme(plot.title = element_text(hjust = 0.5))
From this graph, it is apparent that data collection for the majority of weather event types did not begin until 1993 or later. It is therefore not appropriate to compare total costs across event types whose records span significantly different timeframes. Data collection for 90% of the weather event types identified in the database had begun by 1999, so the analysis is restricted to weather events occurring from 1999 through 2011. Further, event types for which data collection began after 1999 are excluded.
dataColStart <- dataColStart %>%
  filter(startYear <= 1999)
stormData <- stormData %>%
  filter(year >= 1999 & EVTYPE %in% dataColStart$EVTYPE)
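The report does not show how the 90% figure was derived. One way such a cutoff could be found, as a sketch under stated assumptions, is to compute the cumulative share of event types whose first record falls in or before each year. The start years below are invented toy values, and the names `start_years`, `share`, and `cutoff` are illustrative.

```r
# Toy first-record years for eleven hypothetical event types
start_years <- c(1950, 1955, 1993, 1993, 1994, 1995, 1996,
                 1999, 1999, 1999, 2003)

# Candidate cutoff years, in ascending order
yrs <- sort(unique(start_years))

# Share of event types whose data collection began in or before each year
share <- sapply(yrs, function(y) mean(start_years <= y))

# Earliest year by which at least 90% of event types have data
cutoff <- yrs[which(share >= 0.9)[1]]
print(cutoff)
```

In the analysis itself, the same calculation could be applied to `dataColStart$startYear` before it is filtered.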
The provided data includes fatalities and injuries attributed to each weather event. To determine which weather events were most harmful to human health, we will consider the total number of fatalities and injuries over the time period attributed to each type of weather event. We will then identify the top five types of weather events with respect to total deaths, along with the top five types of weather events with respect to total injuries. The union of these two sets of events will be used to explore the harm to human health resulting from these events.
The following stacked bar chart shows the total fatalities and injuries for the event types in this union.
stormLoss <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(sDeaths = sum(FATALITIES, na.rm = TRUE),
            sInjuries = sum(INJURIES, na.rm = TRUE))
topFatalEvents <- stormLoss %>%
  arrange(desc(sDeaths)) %>%
  slice(1:5) %>%
  select(EVTYPE)
topInjuryEvents <- stormLoss %>%
  arrange(desc(sInjuries)) %>%
  slice(1:5) %>%
  select(EVTYPE)
topHHEvents <- union(topFatalEvents$EVTYPE, topInjuryEvents$EVTYPE)
HHCosts <- stormLoss %>%
  filter(EVTYPE %in% topHHEvents) %>%
  gather("harm", "cost", -EVTYPE) %>%
  ggplot(aes(x = reorder(EVTYPE, -cost), y = cost, fill = harm)) +
  geom_bar(position = "stack", stat = "identity") +
  ggtitle("Weather Event Related Impact to Human Health (1999 - 2011)") +
  xlab("Event Type") +
  ylab("Count of Injuries and Fatalities") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45)) +
  scale_fill_discrete(name = "Human Impact", labels = c("Fatalities", "Injuries"))
HHCosts
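The top-five-and-union selection used for this chart can be illustrated on a small scale. The sketch below uses base R, a made-up data frame, and a top-two cutoff instead of top-five; the helper `top_events` and all values are hypothetical.

```r
# Toy per-event totals (invented numbers, four hypothetical event types)
toy <- data.frame(
  EVTYPE   = c("A", "B", "C", "D"),
  deaths   = c(10, 5, 1, 0),
  injuries = c(2, 50, 40, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of the n event types with the largest `col`
top_events <- function(df, col, n) {
  ranked <- df[order(df[[col]], decreasing = TRUE), ]
  head(ranked$EVTYPE, n)
}

top_deaths   <- top_events(toy, "deaths", 2)
top_injuries <- top_events(toy, "injuries", 2)

# The union keeps every event type that ranks highly on either measure,
# so it can contain more than n entries
top_union <- union(top_deaths, top_injuries)
print(top_union)
```

Taking the union means an event type that ranks highly on only one of the two measures still appears in the chart.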
The provided data includes property damage and crop damage cost estimates attributed to each weather event. To determine which weather events had the most significant economic impacts, we will consider the total cost of property damage and crop damage over the given time period attributed to each type of weather event. We will then identify the top five types of weather events with respect to total property damage, along with the top five with respect to total crop damage, and take the union of these two sets.
The following stacked bar chart shows the property and crop damage totals for the event types in this union.
stormLoss <- stormData %>%
  group_by(EVTYPE) %>%
  summarize(sPropDamage = sum(propDmgTot, na.rm = TRUE),
            sCropDamage = sum(cropDmgTot, na.rm = TRUE))
topPropDmgEvents <- stormLoss %>%
  arrange(desc(sPropDamage)) %>%
  slice(1:5) %>%
  select(EVTYPE)
topCropDmgEvents <- stormLoss %>%
  arrange(desc(sCropDamage)) %>%
  slice(1:5) %>%
  select(EVTYPE)
topEconEvents <- union(topPropDmgEvents$EVTYPE, topCropDmgEvents$EVTYPE)
EconCosts <- stormLoss %>%
  filter(EVTYPE %in% topEconEvents) %>%
  gather("damage", "cost", -EVTYPE) %>%
  ggplot(aes(x = reorder(EVTYPE, -cost), y = cost, fill = damage)) +
  geom_bar(position = "stack", stat = "identity") +
  ggtitle("Weather Event Related Economic Impact (1999 - 2011)") +
  xlab("Event Type") +
  ylab("Estimated Financial Impact (USD)") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45)) +
  scale_fill_discrete(name = "Economic Impact", labels = c("Crop", "Property"))
EconCosts
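The `gather()` step reshapes the per-event totals from one column per damage type into one row per event and damage type, which is the layout a stacked bar chart needs. The sketch below reproduces that reshape in base R on made-up numbers; the data frames `wide` and `long` and their values are illustrative only. Note that tidyr now marks `gather()` as superseded by `pivot_longer()`, although it remains available.

```r
# Toy wide-format totals: one row per event type (invented values)
wide <- data.frame(
  EVTYPE      = c("FLOOD", "HAIL"),
  sPropDamage = c(100, 20),
  sCropDamage = c(5, 30),
  stringsAsFactors = FALSE
)

# Long format: one row per (event type, damage type) pair
long <- data.frame(
  EVTYPE = rep(wide$EVTYPE, times = 2),
  damage = rep(c("sPropDamage", "sCropDamage"), each = nrow(wide)),
  cost   = c(wide$sPropDamage, wide$sCropDamage),
  stringsAsFactors = FALSE
)

# Height of each stacked bar: total cost per event type
totals <- tapply(long$cost, long$EVTYPE, sum)
print(long)
print(totals)
```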
The storm data provided by NOAA was assessed to determine the human health and financial impacts due to weather events. The analysis indicated that tornados caused the greatest human health impact in terms of fatalities and injuries from 1999 to 2011. The greatest economic impact over the same period of time was the result of floods.