In this document, we look at injuries and fatalities caused by the most dangerous weather events in the NOAA storm data set. We then explore the economic damage caused by the worst weather events. We find that tornadoes are by a large margin the most dangerous and most expensive events, and that they are relatively frequent compared to other events that cause injury and death. We also find that floods are relatively uncommon compared to other severe weather events, but they cause a disproportionate amount of damage in terms of both health outcomes and economic outcomes.
setwd("C:/Users/USER/Dropbox/Coursera/Reproducable Research/Project 2")
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.2
library(reshape2)
if(!file.exists("stormdata.csv.bz2")){
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "stormdata.csv.bz2")
}
storm <- read.csv("stormdata.csv.bz2")
df.casualties <- storm %>%
group_by(EVTYPE) %>%
summarize(TotalDeaths = sum(FATALITIES), TotalInjured = sum(INJURIES),
TotalEvents = length(EVTYPE)) %>%
filter(TotalDeaths > 0, TotalInjured > 0) %>%
top_n(10, TotalEvents) %>%
arrange(desc(TotalEvents))
In words, the code above does the following: for each type of severe weather event indicated by EVTYPE (tornado, lightning, etc.), it calculates the total number of fatalities, the total number of non-fatal injuries, and the number of times that event type appears in the dataset. It then removes any event type that did not cause at least one fatality and at least one injury, and keeps the 10 most frequent of the remaining event types.
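To make the steps concrete, here is a minimal sketch of the same group, summarize, and filter pattern on a tiny invented data frame. The `toy` data and the name `toy.casualties` are assumptions for illustration only; they are not part of the NOAA data or the analysis above.

```r
library(dplyr)

# Toy stand-in for the storm data; all values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "HAIL", "HAIL", "FLOOD"),
  FATALITIES = c(2, 1, 0, 0, 0, 3),
  INJURIES   = c(10, 5, 0, 1, 0, 4),
  stringsAsFactors = FALSE
)

toy.casualties <- toy %>%
  group_by(EVTYPE) %>%
  summarize(TotalDeaths  = sum(FATALITIES),
            TotalInjured = sum(INJURIES),
            TotalEvents  = n()) %>%
  # Both conditions must hold, so HAIL (injuries but no deaths) is dropped.
  filter(TotalDeaths > 0, TotalInjured > 0) %>%
  arrange(desc(TotalEvents))

print(toy.casualties)
```

Note that the filter runs before the top-10 cut in the original pipeline, so the ranking by frequency only considers event types that caused both deaths and injuries.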
df.propdmg <- storm
df.propdmg$PROPDMG[df.propdmg$PROPDMGEXP == "K"] = df.propdmg$PROPDMG[df.propdmg$PROPDMGEXP == "K"]*10^3
df.propdmg$PROPDMG[df.propdmg$PROPDMGEXP == "M"] = df.propdmg$PROPDMG[df.propdmg$PROPDMGEXP == "M"]*10^6
df.propdmg$PROPDMG[df.propdmg$PROPDMGEXP == "B"] = df.propdmg$PROPDMG[df.propdmg$PROPDMGEXP == "B"]*10^9
df.propdmg$CROPDMG[df.propdmg$CROPDMGEXP == "K"] = df.propdmg$CROPDMG[df.propdmg$CROPDMGEXP == "K"]*10^3
df.propdmg$CROPDMG[df.propdmg$CROPDMGEXP == "M"] = df.propdmg$CROPDMG[df.propdmg$CROPDMGEXP == "M"]*10^6
df.propdmg$CROPDMG[df.propdmg$CROPDMGEXP == "B"] = df.propdmg$CROPDMG[df.propdmg$CROPDMGEXP == "B"]*10^9
df.economic <- df.propdmg %>%
group_by(EVTYPE) %>%
summarize(TotalPropDmg = sum(PROPDMG), TotalCropDmg = sum(CROPDMG)) %>%
mutate(TotalEconomicDmg = TotalPropDmg + TotalCropDmg) %>%
top_n(10, TotalEconomicDmg) %>%
arrange(desc(TotalEconomicDmg))
The code above converts the PROPDMG and CROPDMG values into dollar amounts using the indicators in the PROPDMGEXP and CROPDMGEXP columns (K for thousand, M for million, B for billion); values with any other indicator are left unscaled. It then calculates the total property and crop damage for each event type and keeps the top 10 event types by combined economic damage.
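The same conversion can also be written with a lookup vector instead of six separate assignments. The sketch below is one possible way to do it, shown on invented values; the helper `to_dollars` and the `multiplier` vector are hypothetical names, not part of the original analysis. Like the code above, it matches only uppercase K, M, and B and leaves everything else unscaled.

```r
# Toy rows; values are invented for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 40),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: exponent indicator -> multiplier.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale an amount by its indicator.
to_dollars <- function(amount, exp_code) {
  m <- multiplier[as.character(exp_code)]
  m[is.na(m)] <- 1  # any indicator other than K, M, B leaves the amount as-is
  unname(amount * m)
}

toy$PropDollars <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

Keeping the multipliers in one place makes it easier to see which indicators are handled and to apply the same rule to both damage columns.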
# TSTM WIND, THUNDERSTORM WINDS, and THUNDERSTORM WIND appear in the data as separate
# EVTYPE values because the event names are not standardized. The lines below fold
# rows 4 and 9 into row 1 and then drop them; the indices assume the row order
# produced by the arrange() call above.
df.casualties[1,2:4] <- df.casualties[1,2:4] + df.casualties[4,2:4] + df.casualties[9,2:4]
df.casualties <- df.casualties[-4,]
df.casualties <- df.casualties[-8,]
df.melt <- melt(df.casualties)
## Using EVTYPE as id variables
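The merge of the three thunderstorm labels above depends on row positions. As a hedged alternative, the labels could be matched by name and the rows re-aggregated, which does not depend on sort order. The sketch below uses an invented summary table; `toy.casualties`, `toy.merged`, and `wind.labels` are hypothetical names, and the numbers are not taken from the NOAA data.

```r
library(dplyr)

# Toy summary table; counts are invented for illustration.
toy.casualties <- data.frame(
  EVTYPE       = c("TSTM WIND", "HAIL", "THUNDERSTORM WIND", "TORNADO",
                   "THUNDERSTORM WINDS"),
  TotalDeaths  = c(5, 1, 2, 30, 1),
  TotalInjured = c(50, 10, 20, 400, 5),
  TotalEvents  = c(200, 250, 80, 60, 20),
  stringsAsFactors = FALSE
)

# The three spellings that should be treated as one event type.
wind.labels <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS")

toy.merged <- toy.casualties %>%
  mutate(EVTYPE = ifelse(EVTYPE %in% wind.labels, "THUNDERSTORM WIND", EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarize(TotalDeaths  = sum(TotalDeaths),
            TotalInjured = sum(TotalInjured),
            TotalEvents  = sum(TotalEvents)) %>%
  arrange(desc(TotalEvents))

print(toy.merged)
```

If the relabeling were applied to the raw data before the first summary, the top-10 cut would also see the combined thunderstorm counts; whether that changes the ranking depends on the data.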
p4 <- ggplot(df.melt, aes(x = EVTYPE, y=value)) +
geom_bar(stat = "identity") +
facet_grid(variable ~ ., scales = "free_y") +
theme_bw() +
xlab("") +
ylab("") +
ggtitle("Severe Weather Event Health Summary") +
theme(plot.title = element_text(lineheight = 1, hjust = 0, family = "serif", face = "bold",
size = 19, color = "grey45"),
legend.position = "top",
axis.title.x = element_text(hjust = 1, size = 14, color = "grey45", family = "serif"),
axis.title.y = element_text(size = 14, color = "grey45", family = "serif"),
axis.text = element_text(size = 11, color = "grey45", angle=30, hjust=1),
legend.text = element_text(size = 14, family = "serif", hjust = 0),
axis.ticks = element_line(colour = 'grey45'),
panel.border = element_rect(color = "grey85"),
panel.grid.major = element_line(colour = "grey45"))
p4
Based on the figure above, tornadoes are by a large margin the most dangerous severe weather events in this data set; they have killed over 4000 people and injured more than 60,000 others. They are much less frequent than the two most frequent events that have caused injury and death, namely thunderstorms with strong wind and hailstorms. However, those two event types are associated with relatively low rates of injury and death. The other noteworthy events in this chart are floods and flash floods; they are comparatively very rare, appearing in the data about 150 times combined, but they cause a disproportionately large number of deaths relative to how rarely they occur.
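The comparison between frequency and harm can be made explicit by dividing casualties by the number of events. The following is a minimal sketch on invented numbers, assuming a summary table with the same column names as `df.casualties`; `toy.casualties` and `toy.rates` are hypothetical names, and the values are not the NOAA figures.

```r
# Toy summary table; values are invented for illustration.
toy.casualties <- data.frame(
  EVTYPE       = c("TORNADO", "HAIL", "FLOOD"),
  TotalDeaths  = c(40, 2, 30),
  TotalInjured = c(600, 50, 90),
  TotalEvents  = c(1000, 20000, 150),
  stringsAsFactors = FALSE
)

# Casualties per recorded event.
toy.rates <- transform(
  toy.casualties,
  DeathsPerEvent  = TotalDeaths / TotalEvents,
  InjuredPerEvent = TotalInjured / TotalEvents
)

# Sort so the deadliest event type per occurrence comes first.
toy.rates <- toy.rates[order(-toy.rates$DeathsPerEvent), ]
print(toy.rates)
```

In this toy table the rare event type ranks first per occurrence even though it has fewer total deaths than the most harmful type, which is the kind of pattern the paragraph above describes for floods.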
p5 <- ggplot(df.economic, aes(x = EVTYPE, y=TotalEconomicDmg)) +
geom_bar(stat = "identity") +
theme_bw() +
xlab("") +
ylab("Total Damage (Dollars)") +
ggtitle("Severe Weather Event Economic Damage Summary") +
theme(plot.title = element_text(lineheight = 1, hjust = 0, family = "serif", face = "bold",
size = 19, color = "grey45"),
legend.position = "top",
axis.title.x = element_text(hjust = 1, size = 14, color = "grey45", family = "serif"),
axis.title.y = element_text(size = 14, color = "grey45", family = "serif"),
axis.text = element_text(size = 11, color = "grey45", angle=30, hjust=1),
legend.text = element_text(size = 14, family = "serif", hjust = 0),
axis.ticks = element_line(colour = 'grey45'),
panel.border = element_rect(color = "grey85"),
panel.grid.major = element_line(colour = "grey45"))
p5
Again, we see that tornadoes are public enemy number one, causing more than triple the damage of the next most damaging event, river flooding. The pattern from the health figure continues, however: flooding is among the most severe event types even though floods do not occur very frequently. Ice storms and winter storms make an appearance here but not in the health figure; apparently people who live in areas with such storms cannot avoid the economic hardship they bring but are able to keep themselves safe.
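A statement such as "more than triple" can be checked directly from a ranked damage table. The sketch below shows the calculation on invented totals, assuming a table shaped like `df.economic`; the event labels and dollar amounts are placeholders, not results from the NOAA data.

```r
# Toy damage table; labels and amounts are invented for illustration.
toy.economic <- data.frame(
  EVTYPE           = c("EVENT A", "EVENT B", "EVENT C"),
  TotalEconomicDmg = c(9e9, 2.5e9, 1e9),
  stringsAsFactors = FALSE
)

# Rank by damage, then compare the largest total with the runner-up.
ranked <- toy.economic[order(-toy.economic$TotalEconomicDmg), ]
ratio  <- ranked$TotalEconomicDmg[1] / ranked$TotalEconomicDmg[2]

print(ratio)
```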