Severe weather events can cause fatalities, injuries, and property damage. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to address two questions: which types of weather events are most harmful to population health, and which have the greatest economic consequences.
knitr::opts_chunk$set(cache=TRUE)
knitr::opts_chunk$set(fig.width=8, fig.height=4, fig.path='figure-html/',
echo=TRUE, warning=FALSE, message=FALSE)
options(scipen = 1) # Prefer fixed over scientific notation when printing numbers
# Install dplyr only if it is not already available
if (!requireNamespace("dplyr", quietly = TRUE)) install.packages("dplyr")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
options("scipen" = 10)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
desfile <- "storm.csv.bz2"
if (!file.exists(desfile)) {
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(url, desfile, mode = "wb")
}
stormDT <- read.csv(desfile, header=TRUE)
The original dataset is large, so to analyze it more efficiently we keep only the following variables:
* EVTYPE as a measure of event type (e.g. tornado, flood, etc.)
* FATALITIES as a measure of harm to human health
* INJURIES as a measure of harm to human health
* PROPDMG as a measure of property damage and hence economic damage in USD
* PROPDMGEXP as a measure of magnitude of property damage (e.g. thousands, millions USD, etc.)
* CROPDMG as a measure of crop damage and hence economic damage in USD
* CROPDMGEXP as a measure of magnitude of crop damage (e.g. thousands, millions USD, etc.)
stormDT <- select(stormDT, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
head(stormDT)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
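Since the full file is large, one possible alternative to reading every column and then selecting is to skip the unwanted columns at read time. The following is a minimal sketch on a small temporary CSV, not the NOAA file; the file contents and the `keep`, `hdr`, `cls`, and `small` names are hypothetical and used only for illustration.

```r
# Toy CSV standing in for the storm file (hypothetical values, not NOAA data).
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "STATE,EVTYPE,FATALITIES,REMARKS",
  "AL,TORNADO,0,none",
  "TX,FLOOD,2,none"
), tmp)

# Columns we want to keep (the real analysis would list all seven variables).
keep <- c("EVTYPE", "FATALITIES")

# Read a single row just to learn the column names.
hdr <- names(read.csv(tmp, nrows = 1))

# In colClasses, "NULL" skips a column and NA lets read.csv guess its type.
cls <- ifelse(hdr %in% keep, NA, "NULL")

small <- read.csv(tmp, colClasses = cls)
names(small)
```

Whether this is noticeably faster on the compressed storm file has not been measured here; it mainly reduces the memory held by columns that are never used.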
Two variables relate to population health: FATALITIES and INJURIES. Let’s look at the top 10 weather events for fatalities and injuries, respectively.
fatality <- aggregate(FATALITIES ~ EVTYPE, data=stormDT, sum)
fatalityTop10 <- arrange(fatality, desc(FATALITIES))[1:10,]
# Top 10 fatality weather events
fatalityTop10
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
ggplot(fatalityTop10, aes(x = reorder(factor(EVTYPE), FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  xlab("Event Types") +
  ylab("FATALITIES") +
  ggtitle("Top 10 Fatal Weather Events in 1950-2011") +
  theme(legend.position = "none")
injury <- aggregate(INJURIES ~ EVTYPE, data=stormDT, sum)
injuryTop10 <- arrange(injury, desc(INJURIES))[1:10,]
# Top 10 injury weather events
injuryTop10
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
ggplot(injuryTop10, aes(x = reorder(factor(EVTYPE), INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  xlab("Event Types") +
  ylab("INJURIES") +
  ggtitle("Top 10 Weather Events Causing Injuries in 1950-2011")
Based on these results, TORNADO accounts for both the most fatalities and the most injuries, so it is the most harmful weather event with respect to population health across the United States.
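The two rankings can also be placed side by side in a single table. The sketch below uses toy values rather than the storm data, and the `toy` and `health` objects and the `CASUALTIES` column are hypothetical names. Adding fatalities and injuries together weights them equally, which is an assumption of this sketch and not a choice made in the analysis above.

```r
# Toy data (hypothetical values) with the same column names as stormDT.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 1, 3),
  INJURIES   = c(10, 5, 0, 1)
)

# Sum both health variables per event type in one aggregate() call.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Assumption: a fatality and an injury count equally toward the combined total.
health$CASUALTIES <- health$FATALITIES + health$INJURIES

# Sort from most to fewest combined casualties.
health <- health[order(-health$CASUALTIES), ]
health
```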
Let’s look at the variables PROPDMG and CROPDMG with respect to event type. First, we need to convert them into comparable dollar values.
The variables PROPDMGEXP and CROPDMGEXP take the following values:
levels(factor(stormDT$PROPDMGEXP)) # factor() also works when read.csv returns character columns (R >= 4.0)
## [1] "" "+" "-" "0" "1" "2" "3" "4" "5" "6" "7" "8" "?" "B" "H" "K" "M"
## [18] "h" "m"
levels(factor(stormDT$CROPDMGEXP)) # factor() also works when read.csv returns character columns (R >= 4.0)
## [1] "" "0" "2" "?" "B" "K" "M" "k" "m"
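The alphabetical characters signify magnitude: “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions. In the conversion code that follows, a digit is treated as the power of ten itself, and “+”, “-”, “?”, and blank entries are treated as a power of zero.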
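As a worked example of this mapping, the first row of the data has PROPDMG 25.0 with exponent “K”, which corresponds to 25.0 × 10^3 = 25,000 USD. The sketch below expresses the same conventions as the substitution code in this report with a lookup table; `exp_to_power` is a hypothetical helper introduced only for illustration, not part of the original analysis.

```r
# Map an exponent code to a power of ten. Assumptions in this sketch:
# letters are case-insensitive, a single digit is taken as the power itself,
# and anything else ("", "+", "-", "?") is treated as a power of 0.
exp_to_power <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)

  # Codes missing from the lookup come back as NA.
  power <- unname(lookup[code])

  # Single digits are used directly as the power.
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])

  # Everything still unmatched falls back to 10^0.
  power[is.na(power)] <- 0
  power
}

exp_to_power(c("K", "m", "B", "5", "", "?"))  # 3 6 9 5 0 0
25.0 * 10^exp_to_power("K")                   # 25000
```

A lookup table keeps all the rules in one place, whereas a chain of `gsub()` calls depends on the order of the substitutions.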
# Property damage: convert exponent codes to powers of ten
stormDT$PROPDMGEXP <- gsub("\\-|\\+|\\?", "0", stormDT$PROPDMGEXP) # symbols -> 10^0
stormDT$PROPDMGEXP <- gsub("B|b", "9", stormDT$PROPDMGEXP) # billions
stormDT$PROPDMGEXP <- gsub("M|m", "6", stormDT$PROPDMGEXP) # millions
stormDT$PROPDMGEXP <- gsub("K|k", "3", stormDT$PROPDMGEXP) # thousands
stormDT$PROPDMGEXP <- gsub("H|h", "2", stormDT$PROPDMGEXP) # hundreds
stormDT$PROPDMGEXP <- as.numeric(stormDT$PROPDMGEXP) # blank codes become NA
stormDT$PROPDMGEXP[is.na(stormDT$PROPDMGEXP)] <- 0 # treat blank codes as 10^0
stormDT$ActPROPDMG <- stormDT$PROPDMG * 10^stormDT$PROPDMGEXP # property damage in USD
# Crop damage: same conversion as for property damage
stormDT$CROPDMGEXP <- gsub("\\-|\\+|\\?", "0", stormDT$CROPDMGEXP) # symbols -> 10^0
stormDT$CROPDMGEXP <- gsub("B|b", "9", stormDT$CROPDMGEXP) # billions
stormDT$CROPDMGEXP <- gsub("M|m", "6", stormDT$CROPDMGEXP) # millions
stormDT$CROPDMGEXP <- gsub("K|k", "3", stormDT$CROPDMGEXP) # thousands
stormDT$CROPDMGEXP <- gsub("H|h", "2", stormDT$CROPDMGEXP) # hundreds
stormDT$CROPDMGEXP <- as.numeric(stormDT$CROPDMGEXP) # blank codes become NA
stormDT$CROPDMGEXP[is.na(stormDT$CROPDMGEXP)] <- 0 # treat blank codes as 10^0
stormDT$ActCROPDMG <- stormDT$CROPDMG * 10^stormDT$CROPDMGEXP # crop damage in USD
# Total Damage (Property + Crop)
stormDT$TotDMG <- stormDT$ActPROPDMG + stormDT$ActCROPDMG
# Damage by Event Type
damage <- aggregate(TotDMG ~ EVTYPE, data=stormDT, sum)
damageTop10 <- arrange(damage, desc(TotDMG))[1:10,]
damageTop10
## EVTYPE TotDMG
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57362333946
## 4 STORM SURGE 43323541000
## 5 HAIL 18761221986
## 6 FLASH FLOOD 18243991078
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
ggplot(damageTop10, aes(x = reorder(factor(EVTYPE), TotDMG), y = TotDMG)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  xlab("Event Types") +
  ylab("TotDMG") +
  ggtitle("Top 10 Weather Events Causing Economic Damage in 1950-2011")
Based on the results, TORNADO caused the most fatalities and the most injuries, so it was the most harmful weather event with respect to population health across the United States between 1950 and 2011.
Across the United States, FLOOD had the greatest economic consequences between 1950 and 2011.