The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database covers storm events from 1950 to November 2011. For each event it records the effect on human health (fatalities and injuries) and the economic effect (property and crop damage in USD). This paper analyzes the data to determine which event types cause the most harm to human health and the greatest economic damage.
As the results section shows, tornadoes have by far the largest effect on human health, and floods have the largest economic effect.
### Download storm data file and read into 'stormdata'
library(ggplot2)
library(dplyr)
library(stringr)
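fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download only once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists("stormdata.csv.bz2")) {
  download.file(fileurl, "stormdata.csv.bz2", mode = "wb")
}
# read.csv() reads the .bz2 file directly, no manual decompression needed
stormdata <- read.csv("stormdata.csv.bz2")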
Fatalities and injuries are the deciding factors for whether an event type is harmful to population health.
### Filter by FATALITIES and INJURIES, keeping event types in the top 1% of fatalities or the top 0.5% of injuries, then keep the 10 with the most fatalities.
# Total fatalities and injuries per event type
fatal.injury <- stormdata %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarize(fatalities = sum(FATALITIES), injuries = sum(INJURIES))
# Keep event types above the 99th percentile of fatalities or the 99.5th
# percentile of injuries, sort by fatalities (ties broken by injuries), keep 10
most.harmful <- fatal.injury %>%
  filter(fatalities > quantile(fatalities, probs = 0.99) |
           injuries > quantile(injuries, probs = 0.995)) %>%
  arrange(desc(fatalities), desc(injuries)) %>%
  rename(eventtype = EVTYPE) %>%
  slice_head(n = 10)
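The filter above keeps an event type when its total clears a percentile cutoff for either measure. A minimal base-R sketch of the same idea follows, on a hypothetical ten-row table with made-up numbers; the 90th-percentile cutoff is chosen only so the toy example keeps a couple of rows, and none of this is the analysis itself.

```r
# Hypothetical toy totals per event type (made-up numbers, not NOAA data)
toy <- data.frame(
  evtype     = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  fatalities = c(0, 1, 2, 3, 5, 8, 13, 21, 34, 500),
  injuries   = c(0, 0, 1, 1, 2, 3, 5, 8, 900, 13),
  stringsAsFactors = FALSE
)

# Cutoffs at the 90th percentile, i.e. roughly the top 10% of event types
fat.cut <- quantile(toy$fatalities, probs = 0.90)
inj.cut <- quantile(toy$injuries, probs = 0.90)

# Keep an event type if it clears either cutoff, then rank by fatalities
top <- toy[toy$fatalities > fat.cut | toy$injuries > inj.cut, ]
top <- top[order(-top$fatalities, -top$injuries), ]
print(top$evtype)
```

Here `I` survives only because of its injury total, which is the point of combining the two cutoffs with `|`.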
Property damage (PROPDMG) and crop damage (CROPDMG) are used to determine economic consequences. The variables PROPDMGEXP and CROPDMGEXP indicate whether the corresponding value in PROPDMG or CROPDMG is in thousands (K), millions (M), or billions (B) of USD.
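A minimal sketch of the exponent conversion as a lookup table, assuming that only K, M and B are meaningful and that lowercase codes mean the same as uppercase. The function name `exp_to_multiplier` is hypothetical; codes outside the lookup (blank, `H`, digits, `+`, `-`, `?`) come back as `NA`, which the analysis then treats as zero damage.

```r
# Hypothetical helper: map a damage exponent code to a USD multiplier.
# Assumption: only K/M/B (either case) are meaningful; any other code
# (blank, "H", digits, "+", "-", "?") is returned as NA.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  unname(lookup[toupper(code)])
}

codes <- c("K", "m", "B", "", "H")
mult <- exp_to_multiplier(codes)
print(mult)
```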
### Convert PROPDMGEXP and CROPDMGEXP to numeric multipliers and multiply them into PROPDMG and CROPDMG to get the value of property and crop damage (in millions of USD).
### Create a new variable (totaldamage) by adding property and crop damage for each event type, then create a top 10 list.
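To make the arithmetic concrete, a value of 2.5 with exponent `B` is 2.5 * 10^9 USD, or 2,500 million USD. The sketch below applies that conversion to three hypothetical rows with made-up values, sums property and crop damage per event type with base R's `aggregate()`, and sorts the totals. The helpers `multiplier` and `to_millions` are illustrative names, not part of the original analysis.

```r
# Hypothetical rows (made-up values) in the same shape as the NOAA columns
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 500),
  PROPDMGEXP = c("B", "M", "K"),
  CROPDMG    = c(0, 1.5, 20),
  CROPDMGEXP = c("", "M", "K"),
  stringsAsFactors = FALSE
)

# Illustrative helper: K/M/B to multipliers, anything else to NA
multiplier <- function(code) unname(c(K = 1e3, M = 1e6, B = 1e9)[toupper(code)])

# Damage in millions of USD; unrecognised exponent codes contribute 0
to_millions <- function(value, code) {
  m <- value * multiplier(code) / 1e6
  ifelse(is.na(m), 0, m)
}
toy$property <- to_millions(toy$PROPDMG, toy$PROPDMGEXP)
toy$crop     <- to_millions(toy$CROPDMG, toy$CROPDMGEXP)
toy$total    <- toy$property + toy$crop

# Sum per event type, then sort from most to least damaging
totals <- aggregate(total ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$total), ]
print(totals)
```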
# Map exponent codes to numeric multipliers; any other code becomes NA
exp.multiplier <- function(x) {
  case_when(toupper(x) == "K" ~ 1e3,
            toupper(x) == "M" ~ 1e6,
            toupper(x) == "B" ~ 1e9,
            TRUE ~ NA_real_)
}
# Keep events with any property or crop damage
damage <- stormdata %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  filter(PROPDMG > 0 | CROPDMG > 0) %>%
  mutate(PROPDMGEXP = exp.multiplier(PROPDMGEXP),
         CROPDMGEXP = exp.multiplier(CROPDMGEXP),
         # Damage in millions of USD; unrecognised codes (NA) count as 0
         PROPDMG = coalesce(PROPDMG * PROPDMGEXP / 1e6, 0),
         CROPDMG = coalesce(CROPDMG * CROPDMGEXP / 1e6, 0)) %>%
  group_by(EVTYPE) %>%
  summarize(property = sum(PROPDMG), crop = sum(CROPDMG)) %>%
  mutate(totaldamage = property + crop) %>%
  select(EVTYPE, totaldamage) %>%
  arrange(desc(totaldamage)) %>%
  rename(eventtype = EVTYPE) %>%
  slice_head(n = 10)
Three charts show the results of the analysis. The first two answer the question on human health effects by plotting fatalities and injuries by storm type. The third answers the question on economic effects by plotting total economic damage by storm type in millions of USD.
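By default ggplot2 draws a character x variable in alphabetical order, so bars are not sorted by size unless the levels are set explicitly. One option, sketched here with made-up totals, is base R's `reorder()`, which returns a factor whose levels follow a score; using the negated total puts the largest event type first.

```r
# Hypothetical event totals (made-up numbers)
eventtype <- c("FLOOD", "HAIL", "TORNADO")
total     <- c(150, 20, 60)

# Levels sorted by descending total; ggplot2 draws discrete axes in level order
ordered.events <- reorder(eventtype, -total)
print(levels(ordered.events))
```

Inside a plot this would be written as `aes(x = reorder(eventtype, -totaldamage), y = totaldamage)`.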
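# Bars ordered by value; x labels rotated so long event names stay readable
pf <- ggplot(most.harmful, aes(x = reorder(eventtype, -fatalities), y = fatalities)) +
  geom_col(aes(fill = eventtype)) +
  guides(fill = "none") +
  xlab("Storm Event Type") + ylab("Total Fatalities") +
  ggtitle("Fatalities by Storm Type") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1))
# Named pinj rather than pi so the built-in constant pi is not masked
pinj <- ggplot(most.harmful, aes(x = reorder(eventtype, -injuries), y = injuries)) +
  geom_col(aes(fill = eventtype)) +
  guides(fill = "none") +
  xlab("Storm Event Type") + ylab("Total Injuries") +
  ggtitle("Injuries by Storm Type") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1))
print(pf)
print(pinj)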
ggplot(damage, aes(x = reorder(eventtype, -totaldamage), y = totaldamage)) +
  geom_col(aes(fill = eventtype)) +
  guides(fill = "none") +
  xlab("Storm Event Type") + ylab("Total Damage (M USD)") +
  ggtitle("Total Economic Damage by Storm Type") +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text.x = element_text(angle = 45, hjust = 1))