Title: “Reproducible Research: Peer Assessment 2”

Impact of Severe Weather Events on Public Health and Economy in the United States

Synopsis

Severe weather events can cause fatalities, injuries, and property damage. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to address two questions: which types of weather events are most harmful to population health, and which types have the greatest economic consequences.

Configuration and libraries

knitr::opts_chunk$set(cache=TRUE)
knitr::opts_chunk$set(fig.width=8, fig.height=4, fig.path='figure-html/',
                      echo=TRUE, warning=FALSE, message=FALSE)


options(scipen = 1)  # Prefer fixed over scientific notation when printing numbers

if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
options("scipen" = 10)
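
To illustrate what the scipen option does, here is a minimal sketch that formats the same number under two settings. The value 1e10 is an arbitrary example, not a figure from the storm data.

```r
old <- options(scipen = 0)        # R's default: no bias toward either notation
default_fmt <- format(1e10)       # the shorter scientific form is chosen
options(scipen = 10)              # penalize scientific notation, as in this report
fixed_fmt <- format(1e10)         # now formatted in fixed notation
options(old)                      # restore the previous setting

default_fmt
fixed_fmt
```

A positive scipen adds a width penalty to the scientific form, so large damage totals are displayed as plain digits.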

Data Processing

Download and read data

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
desfile <- "storm.csv.bz2"
if (!file.exists(desfile)) {
   download.file(url, desfile, mode = "wb")  # binary mode keeps the .bz2 archive intact on Windows
}
stormDT <- read.csv(desfile, header=TRUE)
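
Note that read.csv is given the compressed file directly. As a minimal sketch of why that works, the example below writes a tiny bzip2-compressed CSV to a temporary file and reads it back without decompressing it first. The toy rows are invented for illustration and are not taken from the NOAA file.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv opens the path through file(), which detects the compression
toy <- read.csv(tmp, header = TRUE)
dim(toy)      # rows and columns, a quick sanity check after loading
unlink(tmp)   # remove the temporary file
```

The same dim() check on stormDT is a cheap way to confirm that the full dataset loaded before any columns are dropped.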

Processing data

The original dataset is large, so to analyze it more efficiently we keep only the following variables:
* EVTYPE: the event type (e.g. tornado, flood)
* FATALITIES: a measure of harm to human health
* INJURIES: a measure of harm to human health
* PROPDMG: property damage amount in USD, before scaling by PROPDMGEXP
* PROPDMGEXP: the magnitude of property damage (e.g. thousands or millions of USD)
* CROPDMG: crop damage amount in USD, before scaling by CROPDMGEXP
* CROPDMGEXP: the magnitude of crop damage (e.g. thousands or millions of USD)

stormDT <- select(stormDT, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(stormDT)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

Across the United States, which types of events are most harmful with respect to population health?

Two variables relate to population health: FATALITIES and INJURIES. Let’s look at the top 10 weather event types by fatalities and by injuries, respectively.

Fatalities

fatality <- aggregate(FATALITIES ~ EVTYPE, data=stormDT, sum)
fatalityTop10 <- arrange(fatality, desc(FATALITIES))[1:10,]
# Top 10 fatality weather events
fatalityTop10
##            EVTYPE FATALITIES
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224
ggplot(fatalityTop10, aes(x = reorder(factor(EVTYPE),FATALITIES), y=FATALITIES)) + 
  geom_bar(stat = "identity") + 
  coord_flip() +
  xlab("Event Types") +
  ylab("FATALITIES") + 
  ggtitle("Top 10 Fatal Weather Events in 1950-2011") +
  theme(legend.position="none")
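
The aggregate, sort, and subset steps used here recur for each measure, so they could be wrapped in a small helper. The sketch below uses base R only; the function name top_events and the toy data frame are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: sum `value` by event type and keep the n largest totals
top_events <- function(data, value, n = 10) {
  totals <- aggregate(data[[value]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- value                              # restore the column name
  totals <- totals[order(totals[[value]], decreasing = TRUE), ]
  rownames(totals) <- NULL                               # renumber rows 1..n
  head(totals, n)
}

# Toy data, invented for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)
top_events(toy, "FATALITIES", n = 2)
```

With the storm data, the same call would take stormDT and a column name such as "FATALITIES" or "INJURIES".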

Injuries

injury <- aggregate(INJURIES ~ EVTYPE, data=stormDT, sum)
injuryTop10 <- arrange(injury, desc(INJURIES))[1:10,]
# Top 10 injury weather events
injuryTop10
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
ggplot(injuryTop10, aes(x = reorder(factor(EVTYPE),INJURIES), y =INJURIES )) + 
  geom_bar(stat = "identity") + 
  coord_flip() +
  xlab("Event Types") +
  ylab("INJURIES") + 
  ggtitle("Top 10 Weather Events Causing Injuries in 1950-2011")

Based on these results, TORNADO accounts for the most fatalities and the most injuries, so it is the most harmful weather event type with respect to population health across the United States.
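
The injury table lists TSTM WIND and THUNDERSTORM WIND as separate rows, so totals for similar events are split across labels. A minimal sketch of one possible cleanup step follows, to be applied to EVTYPE before aggregating. The function name normalize_evtype and the single replacement rule are illustrative assumptions, not a mapping taken from the NOAA documentation or from this analysis.

```r
# Hypothetical cleanup: standardize case and whitespace, then expand one
# abbreviation. The TSTM rule is an assumption chosen for illustration.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("[[:space:]]+", " ", x)          # collapse repeated whitespace
  x <- sub("^TSTM ", "THUNDERSTORM ", x)     # expand the TSTM abbreviation
  x
}

normalize_evtype(c("TSTM WIND", " thunderstorm  wind", "HAIL"))
```

Whether two labels really describe the same kind of event is a judgment call, so any such mapping should be checked against the dataset's documentation before being used for ranking.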

Across the United States, which types of events have the greatest economic consequences?

Let’s look at the variables PROPDMG and CROPDMG with respect to event type. First, though, we need to convert them into comparable dollar values.

The variables PROPDMGEXP and CROPDMGEXP have the following values:

levels(stormDT$PROPDMGEXP)
##  [1] ""  "+" "-" "0" "1" "2" "3" "4" "5" "6" "7" "8" "?" "B" "H" "K" "M"
## [18] "h" "m"
levels(stormDT$CROPDMGEXP)
## [1] ""  "0" "2" "?" "B" "K" "M" "k" "m"

These alphabetical characters signify magnitude: “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions, in either upper or lower case.
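
As a worked illustration of how these codes turn a recorded amount into dollars, the sketch below maps each code to a power of ten with a lookup vector. The helper name exp_to_multiplier and the sample values are hypothetical. Treating blank entries and the symbols "+", "-", and "?" as a multiplier of 1 is the convention this analysis adopts, not an official NOAA rule.

```r
# Hypothetical helper: map an exponent code to a power-of-ten multiplier
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  letters_map <- c(H = 2, K = 3, M = 6, B = 9)   # hundreds, thousands, millions, billions
  power <- suppressWarnings(as.numeric(code))    # digit codes convert directly
  is_letter <- code %in% names(letters_map)
  power[is_letter] <- letters_map[code[is_letter]]
  power[is.na(power)] <- 0                       # "", "+", "-", "?" -> 10^0
  10^power
}

# Sample values, invented for illustration
amount <- c(25.0, 2.5, 1.2, 3)
codes  <- c("K", "M", "", "?")
amount * exp_to_multiplier(codes)
```

For example, an amount of 25.0 with code "K" becomes 25 * 10^3 = 25,000 USD.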

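Reformatting the data so that magnitude levels are uniform for both property and crop damage. The symbols “-”, “+”, and “?” and blank entries are treated as an exponent of 0 (a multiplier of 1), and digits are used directly as powers of ten.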

# Property damage: convert exponent codes to powers of ten
stormDT$PROPDMGEXP <- gsub("\\-|\\+|\\?", "0", stormDT$PROPDMGEXP)  # symbols -> exponent 0
stormDT$PROPDMGEXP <- gsub("B|b", "9", stormDT$PROPDMGEXP)  # billions
stormDT$PROPDMGEXP <- gsub("M|m", "6", stormDT$PROPDMGEXP)  # millions
stormDT$PROPDMGEXP <- gsub("K|k", "3", stormDT$PROPDMGEXP)  # thousands
stormDT$PROPDMGEXP <- gsub("H|h", "2", stormDT$PROPDMGEXP)  # hundreds
stormDT$PROPDMGEXP <- as.numeric(stormDT$PROPDMGEXP)
stormDT$PROPDMGEXP[is.na(stormDT$PROPDMGEXP)] <- 0  # blank entries -> exponent 0
stormDT$ActPROPDMG <- stormDT$PROPDMG * 10^stormDT$PROPDMGEXP  # property damage in USD

# Crop damage: convert exponent codes to powers of ten
stormDT$CROPDMGEXP <- gsub("\\-|\\+|\\?", "0", stormDT$CROPDMGEXP)  # symbols -> exponent 0
stormDT$CROPDMGEXP <- gsub("B|b", "9", stormDT$CROPDMGEXP)  # billions
stormDT$CROPDMGEXP <- gsub("M|m", "6", stormDT$CROPDMGEXP)  # millions
stormDT$CROPDMGEXP <- gsub("K|k", "3", stormDT$CROPDMGEXP)  # thousands
stormDT$CROPDMGEXP <- gsub("H|h", "2", stormDT$CROPDMGEXP)  # hundreds
stormDT$CROPDMGEXP <- as.numeric(stormDT$CROPDMGEXP)
stormDT$CROPDMGEXP[is.na(stormDT$CROPDMGEXP)] <- 0  # blank entries -> exponent 0
stormDT$ActCROPDMG <- stormDT$CROPDMG * 10^stormDT$CROPDMGEXP  # crop damage in USD

# Total Damage (Property + Crop)
stormDT$TotDMG <- stormDT$ActPROPDMG + stormDT$ActCROPDMG

# Damage by Event Type
damage <- aggregate(TotDMG ~ EVTYPE, data=stormDT, sum)
damageTop10 <- arrange(damage, desc(TotDMG))[1:10,]
damageTop10
##               EVTYPE       TotDMG
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57362333946
## 4        STORM SURGE  43323541000
## 5               HAIL  18761221986
## 6        FLASH FLOOD  18243991078
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041360
ggplot(damageTop10, aes(x = reorder(factor(EVTYPE),TotDMG), y =TotDMG )) + 
  geom_bar(stat = "identity") + 
  coord_flip() +
  xlab("Event Types") +
  ylab("Total damage (USD)") + 
  ggtitle("Top 10 Weather Events Causing Economic Damage in 1950-2011")

Results

Based on the results, TORNADO caused the most fatalities and the most injuries, so it was the most harmful weather event type with respect to population health across the United States between 1950 and 2011.

Across the United States, FLOOD had the greatest economic consequences between 1950 and 2011.