Synopsis

This analysis identifies which types of weather events have been most harmful to human life and to the economy in the United States. It uses storm data from NOAA, available [here][1]. Harm to human life is measured by the number of fatalities attributed to each type of weather event. Economic harm is measured by the sum of property damage and crop damage. Between 1950 and 2011, tornadoes caused the most fatalities, while flooding caused the most financial damage.

[1]: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 "here"

Data Processing

First, the data is downloaded and loaded into the stormData variable.

# The download is a bzip2-compressed CSV; read.csv() decompresses it transparently
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "stormdata.csv.bz2")
stormData <- read.csv("stormdata.csv.bz2")
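
Re-running the analysis repeats the download each time. A minimal sketch of a guard that downloads only when no local copy exists is shown here; `fetch_once` is a hypothetical helper name and the URL in the demonstration is a placeholder, neither of which is part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_once <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the archive in binary mode on every platform
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Demonstration with a file that already exists: no download is attempted,
# so the placeholder URL is never contacted.
existing <- tempfile(fileext = ".csv.bz2")
file.create(existing)
result <- fetch_once("https://example.invalid/StormData.csv.bz2", existing)
```

In the analysis itself, the call would take the NOAA URL and a local file name, followed by `read.csv()` on the returned path.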

The number of fatalities is taken as the best measure of how harmful a type of weather event is to human life (as opposed to the number of injuries, which is far more ambiguous). The data is therefore processed by summing fatalities for each weather type (EVTYPE). Only weather types with more than 300 total deaths are kept, to identify the worst offenders.

fatSum <- aggregate(FATALITIES~EVTYPE, stormData, sum)
topFat <- subset(fatSum, fatSum[,2] > 300)
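
To make the sum-then-threshold step concrete, here is a small sketch on made-up records. The numbers are invented for illustration and are not NOAA figures; the final sort is an extra step not present in the analysis code.

```r
# Made-up event records, for illustration only (not NOAA figures)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(200, 50, 150, 30, 20)
)

# Sum fatalities within each event type
toySum <- aggregate(FATALITIES ~ EVTYPE, toy, sum)

# Keep event types above the cutoff (300, as in the analysis)
toyTop <- subset(toySum, FATALITIES > 300)

# Extra step (assumption, not in the original): sort from most to least deadly
toyTop <- toyTop[order(-toyTop$FATALITIES), ]
```

Here only TORNADO survives the cutoff, because its two records add up to 350 deaths while FLOOD and HEAT total 70 and 30.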

Additional processing is required to determine the total economic cost of each weather type. Total cost is defined here as the sum of property damage and crop damage (the PROPDMG and CROPDMG columns). Each of these columns has an alphabetic magnitude modifier in an adjacent column, PROPDMGEXP or CROPDMGEXP (h/H for hundred, k/K for thousand, m/M for million, b/B for billion). First, events that caused neither property nor crop damage are removed. Next, the data is reduced to the weather type and the four damage columns. The letter codes are then replaced with the numeric multipliers they represent; a blank modifier is treated as a multiplier of 1, and rows with any other modifier are dropped. Note that the code below maps b/B only for property damage, so rows with a b/B crop modifier are also dropped. Finally, property and crop damage are summed by weather type, and the most costly types (more than $15 billion in total) are retained.

if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)
# Keep only events that caused property or crop damage
stormDmg <- stormData[stormData$PROPDMG != 0 | stormData$CROPDMG != 0, ]
Dmg <- select(stormDmg, EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Work on character columns so the recoding behaves the same whether or not
# read.csv() produced factors
Dmg$PROPDMGEXP <- as.character(Dmg$PROPDMGEXP)
Dmg$CROPDMGEXP <- as.character(Dmg$CROPDMGEXP)
# Recode property damage modifiers as numeric multipliers
Dmg$PROPDMGEXP[Dmg$PROPDMGEXP %in% c("H", "h")] <- "100"
Dmg$PROPDMGEXP[Dmg$PROPDMGEXP %in% c("K", "k")] <- "1000"
Dmg$PROPDMGEXP[Dmg$PROPDMGEXP %in% c("M", "m")] <- "1000000"
Dmg$PROPDMGEXP[Dmg$PROPDMGEXP %in% c("B", "b")] <- "1000000000"
Dmg$PROPDMGEXP[Dmg$PROPDMGEXP == ""] <- "1"
# Recode crop damage modifiers; B/b is not mapped here, so rows with a
# B/b crop modifier are dropped by the filter below
Dmg$CROPDMGEXP[Dmg$CROPDMGEXP %in% c("H", "h")] <- "100"
Dmg$CROPDMGEXP[Dmg$CROPDMGEXP %in% c("K", "k")] <- "1000"
Dmg$CROPDMGEXP[Dmg$CROPDMGEXP %in% c("M", "m")] <- "1000000"
Dmg$CROPDMGEXP[Dmg$CROPDMGEXP == ""] <- "1"
# Drop rows whose modifier is not one of the recognized multipliers
Dmg <- filter(Dmg, PROPDMGEXP %in% c("1", "100", "1000", "1000000", "1000000000"))
Dmg <- filter(Dmg, CROPDMGEXP %in% c("1", "100", "1000", "1000000"))
# Total cost per event, in billions of US dollars
Dmg <- mutate(Dmg, TOTCOST = (PROPDMG * as.numeric(PROPDMGEXP) +
                              CROPDMG * as.numeric(CROPDMGEXP)) / 1e9)
# Sum by weather type and keep types costing more than $15 billion
DmgSum <- aggregate(TOTCOST ~ EVTYPE, Dmg, sum)
DmgSumHigh <- subset(DmgSum, TOTCOST > 15)
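
The modifier-to-multiplier step can also be expressed as a single lookup table. The following is a sketch under stated assumptions, not the method used above: `multipliers` and `exp_to_multiplier` are hypothetical names, the sample rows are made up, and, unlike the code above, the same table is applied to both columns, so B/b is recognized for crop damage as well.

```r
# Multipliers for the letter codes described in the text
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert modifier codes to numeric multipliers.
# Blank becomes 1; unrecognized codes become NA.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  out <- unname(multipliers[code])   # NA where the code has no entry
  out[code %in% ""] <- 1             # blank modifier means no scaling
  out
}

codes <- c("K", "m", "", "B", "?")
mult <- exp_to_multiplier(codes)

# Made-up damage rows, for illustration only (not NOAA figures)
toy <- data.frame(
  PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 10),   CROPDMGEXP = c("", "K")
)
toy$TOTCOST <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
               toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
```

Rows with an unrecognized modifier end up with an NA cost, so they can be dropped afterwards with `!is.na(TOTCOST)` rather than with a separate filter per column.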

Results

The plot below summarizes the eight deadliest types of weather event, each responsible for more than 300 deaths. Tornadoes caused the most deaths between 1950 and 2011 (~5600).

if (!requireNamespace("lattice", quietly = TRUE)) install.packages("lattice")
library(lattice)
fat_plot <- xyplot(FATALITIES~EVTYPE, topFat, main = "Deaths from US Weather Events, 1950-2011", 
                   xlab = "Weather Type", ylab = "Fatalities", type = "o")
update(fat_plot, par.settings = list(fontsize = list(text = 8, points = 15)))

The plot below summarizes the six most costly types of weather event, each responsible for more than $15 billion in combined property and crop damage. Flooding caused the most economic damage (~$150 billion) between 1950 and 2011.

cost_plot <- xyplot(TOTCOST~EVTYPE, DmgSumHigh, main = "Economic Cost of Various Weather Types in the US", 
                   xlab = "Weather Type", ylab = "Cost, in Billions of US Dollars", type = "o")
update(cost_plot, par.settings = list(fontsize = list(text = 8, points = 15)))
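
When the weather type is a factor, its levels are sorted alphabetically by default, so the x axis is not in rank order. As a sketch of one way to rank the categories before plotting, `reorder()` from the stats package can sort the levels by value. The data here are made up, and the reordering is an optional extra, not something the original plots do.

```r
# Made-up totals, for illustration only (not NOAA figures)
toy <- data.frame(
  EVTYPE     = factor(c("FLOOD", "HEAT", "TORNADO")),
  FATALITIES = c(70, 30, 350)
)

# reorder() sorts levels by ascending score, so negate for descending order
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$FATALITIES)
ranked_levels <- levels(toy$EVTYPE)

# Sort rows by level so that type = "o" connects points left to right
toy <- toy[order(toy$EVTYPE), ]

# Build the plot object only if lattice is installed; print() would draw it
if (requireNamespace("lattice", quietly = TRUE)) {
  ranked_plot <- lattice::xyplot(FATALITIES ~ EVTYPE, toy, type = "o",
                                 xlab = "Weather Type", ylab = "Fatalities")
}
```

The same pattern would apply to the cost plot by reordering on TOTCOST instead of FATALITIES.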