Reproducible research - week 4 - Peer graded assignment 2 - weather data

Synopsis

This report investigates the impact of weather events on public health and the economy. The data used is the National Weather Service Storm Data collection covering 61 years (1950 to 2011), which contains 902297 observations of 37 variables. The purpose of the report is to show which weather-related event types are the most damaging. The first graph shows the eight event types that caused the most direct personal injuries, and the second graph shows the eight event types with the highest recorded property damage. In both graphs the tornado event type clearly leads, with other wind- or water-related event types (e.g. flooding, flash flooding) following at some distance. This suggests that measures against wind- and water-related disasters deserve a closer look, as they are likely to offer the highest return on investment.

Data Processing

library(ggplot2)
# working directory set in console
## Download the dataset into a new directory
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
datadir <- "./data"
zipfile <- "projectdataset.bz2"
zipfilefullpath <- file.path(datadir, zipfile)

# create data dir if it doesn't exist yet
if(!file.exists(datadir)){dir.create(datadir)}

# download data file if it wasn't downloaded before
if(!file.exists(zipfilefullpath)){
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(fileurl, destfile = zipfilefullpath, mode = "wb")
}

# read.csv reads the bzip2-compressed file directly; no separate unzip step is needed
d <- read.csv(zipfilefullpath)
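
The loading step above calls read.csv on the .bz2 file without decompressing it first. The sketch below illustrates why that works, using a small invented data frame written to a temporary bzip2 file; the object names (toy, tmp, reloaded) and the values are hypothetical and are not part of the analysis.

```r
# Toy stand-in for the storm data (values invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(2, 0),
  INJURIES   = c(3, 1),
  stringsAsFactors = FALSE
)

# Write the toy data through a bzip2 connection to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back by file name only; R detects the compression when reading
reloaded <- read.csv(tmp, stringsAsFactors = FALSE)
dim(reloaded)
```

Running dim(d) in the same way on the full dataset is a quick check that the download and load produced the expected number of rows and columns.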

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

library(reshape2)

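#transform data: total fatalities and injuries per event type,
#ranked by injuries, top eight kept and reshaped to long format for plotting
d.aggr <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = d, FUN = sum)
d.order <- d.aggr[order(d.aggr$INJURIES, decreasing = TRUE),]
d.head <- head(d.order, 8)
d.melt <- melt(d.head, id.vars = "EVTYPE")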

#create plot
plot1 <- ggplot(d.melt, aes(x = EVTYPE, y = value, fill = variable)) +
  geom_bar(stat="identity") + 
  theme(axis.text.x = element_text(angle=45, hjust=1, vjust=.95)) +
  ylab("Number of people affected") + xlab("Event Type")
plot1

#show data used in a table
head(d.head, 8)
##             EVTYPE FATALITIES INJURIES
## 834        TORNADO       5633    91346
## 856      TSTM WIND        504     6957
## 170          FLOOD        470     6789
## 130 EXCESSIVE HEAT       1903     6525
## 464      LIGHTNING        816     5230
## 275           HEAT        937     2100
## 427      ICE STORM         89     1975
## 153    FLASH FLOOD        978     1777
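
EVTYPE is free text, so one phenomenon can be recorded under several labels; the storm data contains TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS, for example, and each is aggregated separately above. A minimal sketch of merging such variants before aggregating follows. The helper normalize_evtype, the column EVTYPE_CLEAN and the toy values are hypothetical, and the rules cover only these few spellings, not the full set of labels in the data.

```r
# Toy rows with inconsistent labels (values invented for illustration)
toy <- data.frame(
  EVTYPE   = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS",
               " tornado", "TORNADO"),
  INJURIES = c(4, 2, 1, 3, 5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim whitespace, upper-case, and fold the
# thunderstorm wind spellings into a single label
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^TSTM", "THUNDERSTORM", x)
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)
  x
}

toy$EVTYPE_CLEAN <- normalize_evtype(toy$EVTYPE)

# Aggregate on the cleaned label instead of the raw one
merged <- aggregate(INJURIES ~ EVTYPE_CLEAN, data = toy, FUN = sum)
merged
```

Applied to the full dataset, a step like this would change the totals per event type, so the ranking shown above could shift.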

Across the United States, which types of events have the greatest economic consequences?

#transform data: total property and crop damage per event type,
#ranked by property damage first, then crop damage
eco <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = d, FUN = sum)
eco.ordered <- eco[order(eco$PROPDMG, eco$CROPDMG, decreasing = TRUE),]
eco.dmg <- head(eco.ordered, 8)

#create plot
#only PROPDMG is plotted; the values are raw PROPDMG sums, not scaled by PROPDMGEXP
gp2 <- ggplot(eco.dmg, aes(x = EVTYPE, y = PROPDMG, fill = EVTYPE))
gp2 <- gp2 + geom_bar(stat = "identity")
gp2 <- gp2 + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8, color = 2))
#label each bar with its value; x and y are inherited from the main aes()
gp2 <- gp2 + geom_text(aes(label = floor(PROPDMG)), hjust = 0.5, vjust = -0.5) + theme(legend.position = "none")
gp2 <- gp2 + ylab("Property damage (sum of PROPDMG, unscaled)") + xlab("Event Type")
gp2

#show data used in a table
head(eco.dmg, 8)
##                 EVTYPE   PROPDMG   CROPDMG
## 834            TORNADO 3212258.2 100018.52
## 153        FLASH FLOOD 1420124.6 179200.46
## 856          TSTM WIND 1335965.6 109202.60
## 170              FLOOD  899938.5 168037.88
## 760  THUNDERSTORM WIND  876844.2  66791.45
## 244               HAIL  688693.4 579596.28
## 464          LIGHTNING  603351.8   3580.61
## 786 THUNDERSTORM WINDS  446293.2  18684.93
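
The PROPDMG and CROPDMG totals above are sums of the raw columns. In the storm data each damage figure comes with a magnitude code in a companion column (PROPDMGEXP for property, CROPDMGEXP for crops), so a dollar amount needs the two combined. The sketch below shows one way to do that on invented toy rows. It assumes the usual reading K = thousand, M = million, B = billion and treats every other code as a multiplier of 1, which is a simplification; exp_to_multiplier and PROPDMG_USD are hypothetical names.

```r
# Toy rows standing in for the storm data (values invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.5, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: K/M/B mean 1e3/1e6/1e9; any other code counts as 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Damage in dollars = raw value times the multiplier for its code
toy$PROPDMG_USD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# Aggregate the dollar amounts per event type
totals <- aggregate(PROPDMG_USD ~ EVTYPE, data = toy, FUN = sum)
totals
```

Crop damage could be handled the same way with CROPDMG and CROPDMGEXP, after which the two dollar columns can be added to give a combined economic total per event type.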