This report investigates the impact of severe weather events on public health and the economy. The data come from the National Weather Service Storm Data collection, which spans 61 years (1950 to 2011) and contains 902297 observations of 37 variables. The purpose of the report is to show which weather-related event types have the most damaging impact. The first graph shows the eight event types that cause the most direct personal injuries, and the second graph shows the eight event types that cause the most economic damage. Both graphs show tornadoes as the leading cause of personal injuries and of economic damage, with other wind- or water-related event types (e.g. flooding, flash flooding) a distant second. The results suggest that preventing wind- and water-related disasters deserves a closer look as the measure with the highest return on investment.
library(ggplot2)
# working directory set in console
## Download the dataset into a new directory
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
datadir <- "./data"
zipfile <- "projectdataset.bz2"
zipfilefullpath <- paste(datadir, "/", zipfile, sep="")
# create data dir if it doesn't exist yet
if(!file.exists(datadir)){dir.create(datadir)}
# download data file if it wasn't downloaded before
if(!file.exists(zipfilefullpath)){
  # mode = "wb" keeps the compressed file intact when downloading on Windows
  download.file(fileurl, destfile = zipfilefullpath, mode = "wb")
}
d <- read.csv(zipfilefullpath)

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
# reshape2 provides melt(); library() stops with an error if the package is missing
library(reshape2)
#transform data
d.aggr <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = d, FUN = sum)
d.order <- d.aggr[order(d.aggr$INJURIES, decreasing = TRUE),]
d.head <- head(d.order, 8)
d.melt <- melt(d.head, id.vars = "EVTYPE")
#create plot
plot1 <- ggplot(d.melt, aes(x = EVTYPE, y = value, fill = variable)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = .95)) +
  ylab("Number of people affected") + xlab("Event Type")
plot1
# show data used in a table
head(d.head, 8)
##             EVTYPE FATALITIES INJURIES
## 834        TORNADO       5633    91346
## 856      TSTM WIND        504     6957
## 170          FLOOD        470     6789
## 130 EXCESSIVE HEAT       1903     6525
## 464      LIGHTNING        816     5230
## 275           HEAT        937     2100
## 427      ICE STORM         89     1975
## 153    FLASH FLOOD        978     1777
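
EVTYPE is free text, so different spellings of one event are aggregated as separate types. The economic table in this report, for example, lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as three rows. The sketch below shows one way such labels could be merged before aggregating. It is not part of the original analysis: the data frame `toy` holds made-up numbers, and `normalise_evtype` is a hypothetical helper that only covers the thunderstorm-wind spellings named here.

```r
# Toy data: made-up counts, not taken from the Storm Data file.
toy <- data.frame(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
                 " tornado", "TORNADO"),
  FATALITIES = c(1, 2, 0, 3, 4),
  INJURIES   = c(10, 20, 5, 30, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: upper-case and trim the label, then map the known
# thunderstorm-wind spellings onto a single label.
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))
  x[grepl("^(TSTM|THUNDERSTORM) WINDS?$", x)] <- "THUNDERSTORM WIND"
  x
}

toy$EVTYPE <- normalise_evtype(toy$EVTYPE)

# Same aggregate-then-order steps as in the report, applied to the cleaned labels.
toy.aggr  <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy.order <- toy.aggr[order(toy.aggr$INJURIES, decreasing = TRUE), ]
print(toy.order)
```

Applied to the full dataset, a mapping like this would probably need many more patterns, and whether two labels describe the same event is a judgment call for the analyst.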
Across the United States, which types of events have the greatest economic consequences?
#transform data
eco <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = d, FUN = sum)
eco.ordered <- eco[order(eco$PROPDMG, eco$CROPDMG, decreasing = TRUE),]
eco.dmg <- head(eco.ordered, 8)
#create plot
gp2 <- ggplot(eco.dmg, aes(x = EVTYPE, y = PROPDMG, fill = EVTYPE)) +
  geom_bar(stat = "identity") +
  # labels reuse the plot's x and y mappings instead of eco.dmg$ columns inside aes()
  geom_text(aes(label = paste("$", floor(PROPDMG), sep = "")), hjust = 0.5, vjust = -0.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8, color = 2),
        legend.position = "none") +
  # only PROPDMG is plotted, so the axis label names property damage alone
  ylab("Property damage (PROPDMG)") + xlab("Event Type")
gp2
# show data used in a table
head(eco.dmg, 8)
##                 EVTYPE   PROPDMG   CROPDMG
## 834            TORNADO 3212258.2 100018.52
## 153        FLASH FLOOD 1420124.6 179200.46
## 856          TSTM WIND 1335965.6 109202.60
## 170              FLOOD  899938.5 168037.88
## 760  THUNDERSTORM WIND  876844.2  66791.45
## 244               HAIL  688693.4 579596.28
## 464          LIGHTNING  603351.8   3580.61
## 786 THUNDERSTORM WINDS  446293.2  18684.93
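
The PROPDMG and CROPDMG sums above are raw magnitudes. In the Storm Data file each is paired with an exponent column (PROPDMGEXP, CROPDMGEXP) that gives the unit, so a ranking in dollars would need the magnitudes scaled first. The sketch below shows one possible way to do that on made-up rows. `exp_multiplier` is a hypothetical helper, and treating every code other than K, M or B as a multiplier of 1 is an assumption, not a rule taken from the data documentation.

```r
# Toy rows with made-up values; the column names follow the Storm Data file.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 1.2, 50),
  PROPDMGEXP = c("M", "K", "B", ""),
  CROPDMG    = c(0, 5, 300, 2),
  CROPDMGEXP = c("", "K", "M", "k"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate an exponent code into a numeric multiplier.
# Assumption: K/M/B mean thousands/millions/billions (case-insensitive);
# any other code (blank, digits, symbols) is treated as 1.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1
  mult
}

# Scale both damage columns to dollars and add them per observation.
toy$TOTAL <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
             toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

# Aggregate per event type and rank by combined damage.
toy.dmg   <- aggregate(TOTAL ~ EVTYPE, data = toy, FUN = sum)
toy.order <- toy.dmg[order(toy.dmg$TOTAL, decreasing = TRUE), ]
print(toy.order)
```

Summing a combined TOTAL column would also let a single plot show property and crop damage together, rather than property damage alone.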