The purpose of this analysis is to examine the Storm Data dataset provided by the U.S. National Oceanic and Atmospheric Administration (NOAA), which records weather events from 1950 to 2011, in order to answer two questions: 1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? 2) Across the United States, which types of events have the greatest economic consequences?
Using the Storm Data dataset, this analysis finds that tornadoes are by far the most harmful weather events to the US population in terms of both fatalities and injuries. In terms of economic impact, floods cause the most property damage and drought causes the most crop damage.
The data for this analysis come in the form of a comma-separated-value file compressed with the bzip2 algorithm to reduce its size. You can download the file from the course web site: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
Documentation of the database is also available; it explains how some of the variables are constructed and defined:
National Weather Service Storm Data Documentation can be downloaded here: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
National Climatic Data Center Storm Events FAQ can be downloaded here: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
Consequently, observations prior to 1966 are discarded from this analysis because the data collected before then are sparse and unreliable. Additionally, since the focus of this analysis is on the human and economic impact of weather events, observations are limited to weather events that had associated fatalities, injuries, property damage, or crop damage.
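The following variables from the Storm Data dataset are used in this analysis: BGN_DATE (event begin date), EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage amount and its magnitude code), and CROPDMG and CROPDMGEXP (crop damage amount and its magnitude code).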
R packages needed for this analysis:
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(ggplot2))
#disable scientific notation
options(scipen=999)
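StormData <- "Storm-Dataset.csv.bz2"
# download the compressed data only if it is not already in the working directory
if (!file.exists(StormData)) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                StormData, mode = "wb", quiet = TRUE)
  message("Downloaded file to working directory")
}
# read.csv() reads the bzip2-compressed file through a bzfile() connection
data <- read.csv(bzfile(StormData))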
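# parse BGN_DATE (month/day/year hour:minute:second) into a date-time
data$BGN_DATE <- mdy_hms(data$BGN_DATE)
# convert EVTYPE to uppercase so that differently-cased spellings of the same event type match
data$EVTYPE <- toupper(data$EVTYPE)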
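Because the file is read through a bzfile() connection, no separate decompression step is needed. The following is a minimal, self-contained sketch of that round trip using a temporary file with a few made-up rows; the file, the object name sample.data, and its contents are hypothetical, not the real Storm Data:

```r
# Write a tiny bzip2-compressed CSV to a temporary file (hypothetical rows).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "tstm wind,0"), con)
close(con)

# read.csv() decompresses on the fly through the bzfile() connection.
sample.data <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)

# Upper-casing makes differently-cased spellings of the same EVTYPE identical.
sample.data$EVTYPE <- toupper(sample.data$EVTYPE)
print(sample.data)
unlink(tmp)
```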
One finding in the Storm Data dataset is that the EVTYPE variable does not always match one of the 48 valid event types listed in the documentation. To categorize the weather events consistently, a mapping from each recorded EVTYPE to the correct event type was created in the file EVTYPE_Mapping.csv; its EVTYPE.CLEAN column always holds one of the 48 valid event types.
# read in the EVTYPE -> EVTYPE.CLEAN mapping (kept as character columns for joining)
evtype.mapping <- read.csv("EVTYPE_Mapping.csv", stringsAsFactors = FALSE)
head(evtype.mapping, 10)
## EVTYPE EVTYPE.CLEAN
## 1 TORNADO TORNADO
## 2 EXCESSIVE HEAT EXCESSIVE HEAT
## 3 FLASH FLOOD FLASH FLOOD
## 4 HEAT HEAT
## 5 LIGHTNING LIGHTNING
## 6 TSTM WIND THUNDERSTORM WIND
## 7 FLOOD FLOOD
## 8 RIP CURRENT RIP CURRENT
## 9 HIGH WIND HIGH WIND
## 10 AVALANCHE AVALANCHE
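Before relying on the mapping, it can help to list the raw EVTYPE values that have no entry in EVTYPE_Mapping.csv, because the left join used below leaves EVTYPE.CLEAN as NA for them. Here is a minimal base-R sketch on made-up values. The vectors raw.evtype and mapping are hypothetical stand-ins for data$EVTYPE and evtype.mapping. It assumes each EVTYPE appears only once in the mapping, in which case match() returns the same EVTYPE.CLEAN values a left join would:

```r
# Hypothetical stand-ins for data$EVTYPE and evtype.mapping.
raw.evtype <- toupper(c("Tornado", "TSTM WIND", "tstm wind", "HEAT WAVE DROUGHT"))
mapping <- data.frame(EVTYPE = c("TORNADO", "TSTM WIND"),
                      EVTYPE.CLEAN = c("TORNADO", "THUNDERSTORM WIND"),
                      stringsAsFactors = FALSE)

# match() gives NA for any EVTYPE with no row in the mapping,
# which is what the left join would leave in EVTYPE.CLEAN.
clean <- mapping$EVTYPE.CLEAN[match(raw.evtype, mapping$EVTYPE)]
unmapped <- unique(raw.evtype[is.na(clean)])
print(clean)
print(unmapped)
```

Any value printed in unmapped would need a new row in the mapping file to be counted under one of the 48 event types.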
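PROPDMGEXP and CROPDMGEXP hold a magnitude code for the PROPDMG and CROPDMG amounts: H (hundreds), K (thousands), M (millions), or B (billions). The data frame below maps each code to its multiplier so that the damage amounts can be converted to US dollars; any other code in these variables is treated as a multiplier of 0 in the calculation that follows.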
exp.multiplier.df <- data.frame(exp=c("H", "K", "M", "B"), multiplier=c(100, 1000, 1000000, 1000000000))
head(exp.multiplier.df)
## exp multiplier
## 1 H 100
## 2 K 1000
## 3 M 1000000
## 4 B 1000000000
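The same code-to-multiplier lookup can also be written without a join by indexing a named vector. The sketch below uses hypothetical amounts and codes, with "" and "5" standing in for undocumented codes. It shows that lowercase codes work once upper-cased, and that any code outside H/K/M/B ends up contributing 0, as in the join-based calculation below:

```r
# Multipliers for the documented magnitude codes (same values as exp.multiplier.df).
multipliers <- c(H = 100, K = 1000, M = 1000000, B = 1000000000)

# Hypothetical amounts and codes; "" and "5" stand in for undocumented codes.
amount <- c(25, 2.5, 1.5, 10, 3)
code   <- toupper(c("K", "k", "m", "", "5"))

# Indexing a named vector by a name it does not contain yields NA;
# treat those as 0, mirroring the ifelse(is.na(...), 0, ...) logic.
mult <- unname(multipliers[code])
mult[is.na(mult)] <- 0
value <- amount * mult
print(value)
```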
Processing begins by limiting the observations to those recorded from 1966 onward that had fatalities, injuries, property damage, or crop damage; all other observations are discarded for this analysis.
This step also maps each EVTYPE to one of the 48 valid documented event types and converts lowercase damage exponents to uppercase.
stormdata <- data %>%
filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) %>%
filter(BGN_DATE >= '1966-01-01') %>%
left_join(evtype.mapping, by = c("EVTYPE" = "EVTYPE")) %>%
transmute(event.type = EVTYPE.CLEAN, # convert to one of the 48 valid event types
fatalities = ifelse(is.na(FATALITIES), 0, FATALITIES),
injuries = ifelse(is.na(INJURIES), 0, INJURIES),
property.damage = PROPDMG,
property.damage.exp = toupper(PROPDMGEXP), # convert lowercase exponents to uppercase
crop.damage = CROPDMG,
crop.damage.exp = toupper(CROPDMGEXP)) # convert lowercase exponents to uppercase
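The two filter conditions can be checked on a handful of rows in base R. The sketch below assumes BGN_DATE strings in the month/day/year hour:minute:second form that mdy_hms() expects. The data frame events and its values are hypothetical. It compares against an explicit date-time, whereas the pipeline above compares against the string '1966-01-01' and lets R coerce it:

```r
# Hypothetical rows: BGN_DATE as a month/day/year string plus impact columns.
events <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "5/3/1999 0:00:00", "7/4/2005 0:00:00"),
  FATALITIES = c(1, 0, 0),
  INJURIES   = c(0, 0, 3),
  PROPDMG    = c(25, 0, 0),
  CROPDMG    = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Parse the dates in UTC (the default time zone of lubridate::mdy_hms).
begin <- as.POSIXct(strptime(events$BGN_DATE, "%m/%d/%Y %H:%M:%S", tz = "UTC"))

# Keep rows with any human or economic impact, recorded from 1966 onward.
has.impact <- with(events, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
recent <- begin >= as.POSIXct("1966-01-01", tz = "UTC")
kept <- events[has.impact & recent, ]
print(kept)
```

The 1950 row is dropped for its date and the 1999 row for having no impact, leaving only the 2005 row.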
Finally, convert the property and crop damage amounts to US dollars by applying the multipliers, producing the data frame used for the analysis:
stormdata <- stormdata %>%
  # attach the multiplier for each exponent code; codes other than H/K/M/B get NA
  left_join(exp.multiplier.df, by = c("property.damage.exp" = "exp")) %>%
  left_join(exp.multiplier.df, by = c("crop.damage.exp" = "exp")) %>%
  # damage in USD = amount * multiplier; a missing amount or multiplier counts as 0
  mutate(property.damage.value = ifelse(is.na(property.damage), 0, property.damage) *
           ifelse(is.na(multiplier.x), 0, multiplier.x),
         crop.damage.value = ifelse(is.na(crop.damage), 0, crop.damage) *
           ifelse(is.na(multiplier.y), 0, multiplier.y)) %>%
  # drop the helper columns that are no longer needed
  select(-multiplier.x, -multiplier.y, -property.damage.exp, -crop.damage.exp)
The resulting data look like this:
head(stormdata, 10)
## event.type fatalities injuries property.damage crop.damage
## 1 TORNADO 0 11 250.0 0
## 2 TORNADO 0 2 25.0 0
## 3 TORNADO 1 3 250.0 0
## 4 TORNADO 0 0 25.0 0
## 5 TORNADO 0 0 2.5 0
## 6 TORNADO 0 1 250.0 0
## 7 TORNADO 0 0 25.0 0
## 8 TORNADO 0 0 250.0 0
## 9 TORNADO 0 0 25.0 0
## 10 TORNADO 0 0 25.0 0
## property.damage.value crop.damage.value
## 1 250000 0
## 2 25000 0
## 3 250000 0
## 4 25000 0
## 5 2500 0
## 6 250000 0
## 7 25000 0
## 8 250000 0
## 9 25000 0
## 10 25000 0
Results of the analysis - Fatalities by event:
stormdata.fatalities <- stormdata %>%
group_by(event.type) %>%
summarise(fatalities = sum(fatalities)) %>%
top_n(10) %>%
transmute(event = event.type,
impact = "Fatalities",
total = fatalities)
## Selecting by fatalities
arrange(stormdata.fatalities, desc(total))
## Source: local data frame [10 x 3]
##
## event impact total
## 1 TORNADO Fatalities 3681
## 2 EXCESSIVE HEAT Fatalities 2195
## 3 FLASH FLOOD Fatalities 1036
## 4 HEAT Fatalities 937
## 5 LIGHTNING Fatalities 817
## 6 THUNDERSTORM WIND Fatalities 712
## 7 RIP CURRENT Fatalities 577
## 8 FLOOD Fatalities 512
## 9 HIGH WIND Fatalities 323
## 10 EXTREME COLD/WIND CHILL Fatalities 317
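top_n(10) ranks with ties, so it keeps every event tied with the 10th-ranked one and can return more than 10 rows; the tables in this analysis happen to have exactly 10 rows. Below is a minimal base-R sketch of the two behaviours on made-up totals, where the data frame totals is hypothetical. In dplyr 1.0.0 and later, slice_max() with with_ties = FALSE is one way to get a strict cut-off:

```r
# Hypothetical per-event totals with a tie at the cut-off (not real Storm Data values).
totals <- data.frame(event = c("A", "B", "C", "D"),
                     fatalities = c(10, 7, 7, 2),
                     stringsAsFactors = FALSE)
n <- 2

# Tie-keeping rule: keep rows whose "min" rank is within n, so every
# row tied at the cut-off survives.
with.ties <- totals[rank(-totals$fatalities, ties.method = "min") <= n, ]

# Strict cut: sort in decreasing order and take exactly n rows.
exact <- head(totals[order(totals$fatalities, decreasing = TRUE), ], n)

print(with.ties)
print(exact)
```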
Results of the analysis - Injuries by event:
stormdata.injuries <- stormdata %>%
group_by(event.type) %>%
summarise(injuries = sum(injuries)) %>%
top_n(10) %>%
transmute(event = event.type,
impact = "Injuries",
total = injuries)
## Selecting by injuries
arrange(stormdata.injuries, desc(total))
## Source: local data frame [10 x 3]
##
## event impact total
## 1 TORNADO Injuries 67618
## 2 THUNDERSTORM WIND Injuries 9511
## 3 EXCESSIVE HEAT Injuries 7111
## 4 FLOOD Injuries 6873
## 5 LIGHTNING Injuries 5232
## 6 ICE STORM Injuries 2483
## 7 HEAT Injuries 2100
## 8 FLASH FLOOD Injuries 1800
## 9 HIGH WIND Injuries 1615
## 10 WILDFIRE Injuries 1608
Figure 1 shows that tornadoes are by far the most harmful events for the human population: they are the leading cause of both fatalities (3,681) and injuries (67,618). Tornado injuries far exceed those of any other single event; thunderstorm wind is next with 9,511. Excessive heat is also a major contributor, ranking second for fatalities (2,195) and third for injuries (7,111).
stormdata.top10 <- rbind(stormdata.fatalities, stormdata.injuries)
ggplot(stormdata.top10, aes(x = total, y = event)) +
  geom_segment(aes(yend = event), xend = 0, colour = "grey50") +
  geom_point(size = 3, aes(colour = impact)) +
  # guide = "none" hides the legend; guide = FALSE is deprecated in recent ggplot2 releases
  scale_colour_brewer(palette = "Set1", limits = c("Fatalities", "Injuries"), guide = "none") +
  theme_grey() + theme(panel.grid.major.y = element_blank()) +
  facet_grid(impact ~ ., scales = "free_y", space = "free_y") +
  labs(title = "Figure 1: Top 10 Weather Events - Fatalities vs. Injuries", x = "Total", y = "Weather Event")
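The size of the gap between tornadoes and the next-ranked event can be quantified directly from the totals reported in the fatality and injury tables. This small worked example uses only those reported values:

```r
# Totals copied from the fatality and injury tables above.
tornado.injuries   <- 67618
tstm.wind.injuries <- 9511   # next-highest injury count (thunderstorm wind)
tornado.fatalities <- 3681
heat.fatalities    <- 2195   # next-highest fatality count (excessive heat)

injury.ratio   <- round(tornado.injuries / tstm.wind.injuries, 1)
fatality.ratio <- round(tornado.fatalities / heat.fatalities, 1)
cat("Tornado injuries vs. thunderstorm wind:", injury.ratio, "\n")
cat("Tornado fatalities vs. excessive heat:", fatality.ratio, "\n")
```

Tornadoes account for roughly seven times as many injuries as the next event, but fewer than twice as many fatalities as excessive heat.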
Results of the analysis - Property Damage by event:
stormdata.property.damage <- stormdata %>%
group_by(event.type) %>% na.omit() %>%
summarise(property.damage = sum(property.damage.value)) %>%
top_n(10) %>%
transmute(event = event.type,
impact = "Property Damage",
total = property.damage)
## Selecting by property.damage
arrange(stormdata.property.damage, desc(total))
## Source: local data frame [10 x 3]
##
## event impact total
## 1 FLOOD Property Damage 150234934300
## 2 HURRICANE/TYPHOON Property Damage 85366885010
## 3 TORNADO Property Damage 53040326470
## 4 STORM SURGE/TIDE Property Damage 47965244000
## 5 FLASH FLOOD Property Damage 16906877610
## 6 HAIL Property Damage 15974564720
## 7 THUNDERSTORM WIND Property Damage 9821780280
## 8 WILDFIRE Property Damage 8496628500
## 9 TROPICAL STORM Property Damage 7703890550
## 10 WINTER STORM Property Damage 6689064800
Results of the analysis - Crop Damage by event:
# Crop Damage
stormdata.crop.damage <- stormdata %>%
group_by(event.type) %>%
summarise(crop.damage = sum(crop.damage.value)) %>%
top_n(10) %>%
transmute(event = event.type,
impact = "Crop Damage",
total = crop.damage)
## Selecting by crop.damage
arrange(stormdata.crop.damage, desc(total))
## Source: local data frame [10 x 3]
##
## event impact total
## 1 DROUGHT Crop Damage 13972621780
## 2 FLOOD Crop Damage 10855941050
## 3 HURRICANE/TYPHOON Crop Damage 5532667800
## 4 ICE STORM Crop Damage 5022114300
## 5 HAIL Crop Damage 3046937600
## 6 FROST/FREEZE Crop Damage 1700831000
## 7 FLASH FLOOD Crop Damage 1532197150
## 8 EXTREME COLD/WIND CHILL Crop Damage 1330023000
## 9 THUNDERSTORM WIND Crop Damage 1258359900
## 10 HEAVY RAIN Crop Damage 1021770800
Figure 2 shows that floods, hurricanes/typhoons, and tornadoes cause the most property damage. Floods are by far the single biggest contributor to property damage in the United States, at about 150.2 billion USD compared with about 85.4 billion USD for hurricanes/typhoons.
Drought (about 14.0 billion USD) and floods (about 10.9 billion USD) cause the most crop damage.
stormdata.top10.damage <- rbind(stormdata.property.damage, stormdata.crop.damage)
ggplot(stormdata.top10.damage, aes(x = total/10^9, y = event)) +
  geom_segment(aes(yend = event), xend = 0, colour = "grey50") +
  geom_point(size = 3, aes(colour = impact)) +
  # guide = "none" hides the legend; guide = FALSE is deprecated in recent ggplot2 releases
  scale_colour_brewer(palette = "Set1", limits = c("Property Damage", "Crop Damage"), guide = "none") +
  theme_grey() + theme(panel.grid.major.y = element_blank()) +
  facet_grid(impact ~ ., scales = "free_y", space = "free_y") +
  labs(title = "Figure 2: Top 10 Weather Events and Economic Impact", x = "Total Damage (Billions USD)", y = "Weather Event")
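Each damage table lists only the top 10 events. A combined property-plus-crop figure can therefore be computed from the tables only for events that appear in both lists. The sketch below does that arithmetic in billions of USD, using only the values reported above:

```r
# Property and crop damage totals (USD) copied from the two tables above,
# for the five events that appear in both top-10 lists.
events   <- c("FLOOD", "HURRICANE/TYPHOON", "FLASH FLOOD", "HAIL", "THUNDERSTORM WIND")
property <- c(150234934300, 85366885010, 16906877610, 15974564720, 9821780280)
crop     <- c(10855941050, 5532667800, 1532197150, 3046937600, 1258359900)

# Combined damage per event, in billions of USD rounded to one decimal.
combined <- setNames(round((property + crop) / 1e9, 1), events)
print(combined)
```

Among these five events, floods have the largest combined damage (about 161.1 billion USD). Events that appear in only one list, such as drought or tornadoes, cannot be combined from these tables alone.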