Synopsis

This report explores the NOAA Storm Database to answer the following two questions from Reproducible Research Assignment 2:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

Load libraries and read in data

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)

if(!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
  destfile = "stormData.csv.bz2", method = "curl")
}

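# Read the file that was just downloaded (same name as destfile above);
# read.csv can read the bzip2-compressed file directly
df <- read.csv("stormData.csv.bz2", header = TRUE)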
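As a small illustration of reading compressed data, the sketch below writes a made-up two-row table to a temporary .bz2 file and reads it back with read.csv without decompressing it first. The file and its contents are hypothetical and only stand in for the real storm data.

```r
# Write a tiny, made-up table to a temporary bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     FATALITIES = c(1, 0)),
          con, row.names = FALSE)
close(con)

# read.csv detects the compression and reads the file directly
roundtrip <- read.csv(tmp, header = TRUE)
```

The same call works on the downloaded archive, so no separate unzip step is needed.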

Answering both questions requires only a few of the variables, so we subset the columns needed for the analysis.

tidydf <- df[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]

Property and crop damage use letter codes for their units: H, K, M, and B mean hundred, thousand, million, and billion, respectively. We therefore create a new column, initialized to zero, and fill it with the damage expressed in dollars so that values can be compared directly.

#for property damage
tidydf$PROPDMGNUM = 0

tidydf[tidydf$PROPDMGEXP == "H", ]$PROPDMGNUM = tidydf[tidydf$PROPDMGEXP == "H", ]$PROPDMG * 10^2
tidydf[tidydf$PROPDMGEXP == "K", ]$PROPDMGNUM = tidydf[tidydf$PROPDMGEXP == "K", ]$PROPDMG * 10^3
tidydf[tidydf$PROPDMGEXP == "M", ]$PROPDMGNUM = tidydf[tidydf$PROPDMGEXP == "M", ]$PROPDMG * 10^6
tidydf[tidydf$PROPDMGEXP == "B", ]$PROPDMGNUM = tidydf[tidydf$PROPDMGEXP == "B", ]$PROPDMG * 10^9

#for crop damage
tidydf$CROPDMGNUM = 0

tidydf[tidydf$CROPDMGEXP == "H", ]$CROPDMGNUM = tidydf[tidydf$CROPDMGEXP == "H", ]$CROPDMG * 10^2
tidydf[tidydf$CROPDMGEXP == "K", ]$CROPDMGNUM = tidydf[tidydf$CROPDMGEXP == "K", ]$CROPDMG * 10^3
tidydf[tidydf$CROPDMGEXP == "M", ]$CROPDMGNUM = tidydf[tidydf$CROPDMGEXP == "M", ]$CROPDMG * 10^6
tidydf[tidydf$CROPDMGEXP == "B", ]$CROPDMGNUM = tidydf[tidydf$CROPDMGEXP == "B", ]$CROPDMG * 10^9
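The same conversion can also be sketched with a named lookup vector instead of one assignment per unit letter. This is an alternative illustration on made-up values, not the code used for the results below; the names toy, multipliers, and DMGNUM are hypothetical. As in the code above, any code other than H, K, M, or B leaves the converted value at zero.

```r
# Toy data standing in for a damage column and its unit column (values are made up)
toy <- data.frame(
  DMG = c(2.5, 10, 1, 3, 7),
  EXP = c("K", "M", "B", "H", ""),
  stringsAsFactors = FALSE
)

# Multiplier for each unit letter described above
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up each code by name; as.character() guards against factor columns.
# Codes that are not in the table come back as NA.
mult <- unname(multipliers[as.character(toy$EXP)])

# Treat unrecognized codes as zero, matching the zero-initialized column above
mult[is.na(mult)] <- 0

toy$DMGNUM <- toy$DMG * mult
```

Keeping the multipliers in one vector means a new unit code can be handled by adding a single entry.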

Results

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

fatalities <- aggregate(FATALITIES ~ EVTYPE, data=tidydf, sum)

fatalities <- fatalities[order(-fatalities$FATALITIES), ][1:10, ]
fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)

injuries <- aggregate(INJURIES ~ EVTYPE, data=tidydf, sum)
injuries <- injuries[order(-injuries$INJURIES), ][1:10, ]
injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)
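Since dplyr is already loaded, the aggregate, sort, and top-N steps above could also be written as a single pipeline. The sketch below uses a small made-up table (toy) and keeps the top 2 rather than the top 10; it illustrates the pattern and is not the code behind the figures.

```r
library(dplyr)

# Made-up events standing in for tidydf
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 4, 0, 3),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, sort in descending order, keep the top rows
top_events <- toy %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES)) %>%
  arrange(desc(FATALITIES)) %>%
  head(2)

# Fix the factor levels in sorted order so ggplot keeps the bars ranked
top_events$EVTYPE <- factor(top_events$EVTYPE, levels = top_events$EVTYPE)
```

Replacing FATALITIES with INJURIES and head(2) with head(10) would give the equivalent of the injuries table.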

ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) + 
    geom_bar(stat = "identity", fill = "red") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Fatalities") + ggtitle("Number of Fatalities by Top 10 Weather Events")

ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) + 
    geom_bar(stat = "identity", fill = "orange") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Injuries") + ggtitle("Number of Injuries by Top 10 Weather Events")

From the figures, we see that tornadoes cause more fatalities and more injuries than any other weather event in the US. Therefore, tornadoes are the most harmful weather event with respect to population health.

Question 2: Across the United States, which types of events have the greatest economic consequences?

damages <- aggregate(PROPDMGNUM + CROPDMGNUM ~ EVTYPE, data=tidydf, sum)
names(damages) = c("EVTYPE", "TOTALDAMAGE")
damages <- damages[order(-damages$TOTALDAMAGE), ][1:10, ]
damages$EVTYPE <- factor(damages$EVTYPE, levels = damages$EVTYPE)

ggplot(damages, aes(x = EVTYPE, y = TOTALDAMAGE)) + 
    geom_bar(stat = "identity", fill = "purple") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Damages ($US)") + ggtitle("Property & Crop Damages by Top 10 Weather Events")

From the figure, we see that floods cause the most combined property and crop damage. Therefore, floods have the greatest economic consequences in the US.