Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
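The goal of this assignment is to explore the NOAA Storm Database and answer two basic questions about severe weather events: which types of events are most harmful to population health, and which types of events have the greatest economic consequences.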
To avoid misusing data or mixing files, it is recommended to create a dedicated folder for the project inside your working directory:
mainDir <- getwd()
subDir <- "storm_data"
## Create the project folder if it does not exist yet, then switch to it
if (!dir.exists(file.path(mainDir, subDir))) {
  dir.create(file.path(mainDir, subDir))
}
setwd(file.path(mainDir, subDir))
Loading required packages:
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(tidyr)
library(knitr)
Complementary information:
sessionInfo()
## R version 3.3.2 (2016-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: macOS Sierra 10.12.4
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] knitr_1.15.1 tidyr_0.6.1 ggplot2_2.2.1 dplyr_0.5.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.9 codetools_0.2-15 digest_0.6.12 rprojroot_1.2
## [5] assertthat_0.1 plyr_1.8.4 grid_3.3.2 R6_2.2.0
## [9] gtable_0.2.0 DBI_0.6 backports_1.0.5 magrittr_1.5
## [13] scales_0.4.1 evaluate_0.10 stringi_1.1.2 lazyeval_0.2.0
## [17] rmarkdown_1.3 tools_3.3.2 stringr_1.2.0 munsell_0.4.3
## [21] yaml_2.1.14 colorspace_1.3-2 htmltools_0.3.5 tibble_1.2
The data for this project come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The download URL appears in the code below.
Some documentation of the database is also available; it describes how some of the variables are constructed or defined.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
## Download dataset from website into new working directory
## (mode = "wb" because the bzip2 archive is a binary file)
if (!file.exists("repdata_data_StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "repdata_data_StormData.csv.bz2", quiet = FALSE,
                mode = "wb", cacheOK = TRUE, method = "libcurl")
}
## Read file
FullStormData <- read.csv("repdata_data_StormData.csv.bz2")
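The remark that earlier years contain fewer records can be checked by counting events per year. The sketch below uses a small toy data frame rather than the real file; the `BGN_DATE` column name and its month/day/year format are assumptions about the dataset, since neither is shown in this report.

```r
# Toy stand-in for FullStormData. The BGN_DATE column name and its
# "m/d/Y H:M:S" layout are assumptions, not taken from this report.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "5/1/2011 0:00:00", "5/2/2011 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# as.Date() parses as much of the string as the format needs and
# ignores the trailing time part
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Number of recorded events in each year
events_per_year <- table(toy$YEAR)
print(events_per_year)
```

Applied to the full data, the same `table()` call would show how the record count changes between 1950 and 2011.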
For the purpose of this project an event is considered harmful when it causes injuries and/or fatalities.
Create data frame with the necessary data and remove NA values:
## Keep only the columns needed for the population health analysis
harmfulevents <- FullStormData[, c("EVTYPE", "FATALITIES", "INJURIES")]
harmfulevents$FATALITIES <- as.numeric(harmfulevents$FATALITIES)
harmfulevents$INJURIES <- as.numeric(harmfulevents$INJURIES)
## Drop rows where both counts are missing
harmfulevents <- harmfulevents[!is.na(harmfulevents$FATALITIES) | !is.na(harmfulevents$INJURIES), ]
Aggregate the numbers of injuries and fatalities for each type of event and sort them in descending order:
sumharmfulevents <- aggregate(. ~ EVTYPE, harmfulevents, sum)
sumharmfulevents <- arrange(sumharmfulevents, desc(FATALITIES + INJURIES))
## Remove no longer necessary data frame from memory
rm(harmfulevents)
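The aggregation step can be illustrated on a toy data frame. This is a minimal sketch with made-up numbers; it uses base R `order()` in place of `arrange()` so that it runs without extra packages. It also shows that the formula interface of `aggregate()` drops any row containing an `NA` by default.

```r
# Made-up example data; the last row has a missing fatality count
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 1, 4, 1, NA),
  INJURIES   = c(10, 3, 5, 0, 2, 7),
  stringsAsFactors = FALSE
)

# Sum both columns per event type; rows with any NA are omitted
# (the default na.action of aggregate's formula method is na.omit)
totals <- aggregate(. ~ EVTYPE, toy, sum)

# Sort by combined harm, largest first
totals <- totals[order(-(totals$FATALITIES + totals$INJURIES)), ]
print(totals)
```

Because the incomplete HEAT row is dropped entirely, its 7 injuries do not appear in the totals.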
Create a plot for the 5 events with the greatest numbers of injuries and fatalities:
sumharmfulevents <- sumharmfulevents[1:5,]
sumharmfulevents <- gather(sumharmfulevents, HMTYPE, TOTAL, FATALITIES:INJURIES)
hfeplot <- ggplot(sumharmfulevents, aes(x = reorder(EVTYPE, -TOTAL),
                                        y = TOTAL, fill = HMTYPE)) +
  geom_bar(stat = "identity") +
  labs(x = "Weather Event", y = "Number of Occurrences") +
  labs(title = "Top 5 Most Harmful Weather Events") +
  labs(fill = "Occurrence Type") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0))
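The reshaping used before plotting can be seen on a two-row example. The sketch below, with made-up numbers, shows how `gather()` stacks the two count columns into one `TOTAL` column keyed by `HMTYPE`, and how `reorder()` then decides the order of the bars.

```r
library(tidyr)

# Made-up wide table: one row per event type
wide <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD"),
  FATALITIES = c(3, 1),
  INJURIES   = c(15, 5),
  stringsAsFactors = FALSE
)

# Long format: one row per (event type, harm type) pair
long <- gather(wide, HMTYPE, TOTAL, FATALITIES:INJURIES)
print(long)

# reorder() sorts the levels by the mean of -TOTAL within each event type,
# so the event with the largest counts comes first on the x axis
bar_order <- reorder(long$EVTYPE, -long$TOTAL)
print(levels(bar_order))
```

Since every event type has exactly two rows in the long table, ordering by the mean gives the same result as ordering by the sum.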
For the purpose of this project, economic consequences are extracted from the following data frame columns: PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
Create data frame with the necessary data and remove NA values:
economicconsequences <- FullStormData
economicconsequences$PROPDMG <- as.numeric(economicconsequences$PROPDMG)
economicconsequences$CROPDMG <- as.numeric(economicconsequences$CROPDMG)
economicconsequences <- economicconsequences[(!is.na(economicconsequences$PROPDMG)) |
(!is.na(economicconsequences$CROPDMG)) | (!is.na(economicconsequences$PROPDMGEXP)) |
(!is.na(economicconsequences$CROPDMGEXP)), c("EVTYPE","PROPDMG","CROPDMG","PROPDMGEXP","CROPDMGEXP")]
## Remove no longer necessary data frame from memory
rm(FullStormData)
Before adding the values for property and crop damages, it is necessary to convert all of them to the same unit (US dollars) by applying the order of magnitude stored in the corresponding exponent column.
Magnitudes for property damages:
levels(as.factor(economicconsequences$PROPDMGEXP))
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
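Assuming that "B" stands for billions (10^9); "8" for 10^8; "7" for 10^7; "m", "M", and "6" for millions (10^6); "5" for 10^5; "4" for 10^4; "K" and "3" for thousands (10^3); "h", "H", and "2" for hundreds (10^2); "1" for tens (10); and that any other symbol ("", "-", "?", "+", "0") leaves the value unchanged.
Then: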
for(i in 1:length(economicconsequences$PROPDMGEXP)) {
if (economicconsequences$PROPDMGEXP[i] == "B") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 1000000000
} else if (economicconsequences$PROPDMGEXP[i] == "8") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 100000000
} else if (economicconsequences$PROPDMGEXP[i] == "7") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 10000000
} else if (economicconsequences$PROPDMGEXP[i] == "m" | economicconsequences$PROPDMGEXP[i] == "M" | economicconsequences$PROPDMGEXP[i] == "6") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 1000000
} else if (economicconsequences$PROPDMGEXP[i] == "5") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 100000
} else if (economicconsequences$PROPDMGEXP[i] == "4") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 10000
} else if (economicconsequences$PROPDMGEXP[i] == "K" | economicconsequences$PROPDMGEXP[i] == "3") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 1000
} else if (economicconsequences$PROPDMGEXP[i] == "h" | economicconsequences$PROPDMGEXP[i] == "H" | economicconsequences$PROPDMGEXP[i] == "2") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 100
} else if (economicconsequences$PROPDMGEXP[i] == "1") {
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 10
} else
economicconsequences$PROPDMG[i] <- economicconsequences$PROPDMG[i] * 1
}
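The same conversion can also be written without a loop. The following is a minimal sketch, not the method used in this report: it stores the multipliers from the loop above in a named vector (called `multiplier` here, a name chosen for illustration) and looks them up for all rows at once, shown on made-up toy data.

```r
# Multipliers copied from the loop above; symbols not listed map to 1
multiplier <- c(B = 1e9, "8" = 1e8, "7" = 1e7, m = 1e6, M = 1e6, "6" = 1e6,
                "5" = 1e5, "4" = 1e4, K = 1e3, "3" = 1e3,
                h = 1e2, H = 1e2, "2" = 1e2, "1" = 10)

# Made-up example rows
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3, 7),
  PROPDMGEXP = c("K", "M", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Look up one multiplier per row; unknown symbols (and "") yield NA
m <- unname(multiplier[as.character(toy$PROPDMGEXP)])
m[is.na(m)] <- 1

toy$PROPDMG_USD <- toy$PROPDMG * m
print(toy)
```

On a data frame with several hundred thousand rows, a vectorized lookup like this typically runs much faster than updating one element at a time.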
Magnitudes for crop damages:
levels(as.factor(economicconsequences$CROPDMGEXP))
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
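Assuming that "B" stands for billions (10^9); "m" and "M" for millions (10^6); "k" and "K" for thousands (10^3); "2" for hundreds (10^2); and that any other symbol ("", "?", "0") leaves the value unchanged.
Then: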
## Every condition tests the crop exponent column (CROPDMGEXP)
for (i in seq_along(economicconsequences$CROPDMGEXP)) {
  if (economicconsequences$CROPDMGEXP[i] == "B") {
    economicconsequences$CROPDMG[i] <- economicconsequences$CROPDMG[i] * 1000000000
  } else if (economicconsequences$CROPDMGEXP[i] == "m" | economicconsequences$CROPDMGEXP[i] == "M") {
    economicconsequences$CROPDMG[i] <- economicconsequences$CROPDMG[i] * 1000000
  } else if (economicconsequences$CROPDMGEXP[i] == "k" | economicconsequences$CROPDMGEXP[i] == "K") {
    economicconsequences$CROPDMG[i] <- economicconsequences$CROPDMG[i] * 1000
  } else if (economicconsequences$CROPDMGEXP[i] == "2") {
    economicconsequences$CROPDMG[i] <- economicconsequences$CROPDMG[i] * 100
  } else {
    economicconsequences$CROPDMG[i] <- economicconsequences$CROPDMG[i] * 1
  }
}
Aggregate the damages for each type of event and sort them in descending order:
economicconsequences <- economicconsequences[c("EVTYPE","PROPDMG","CROPDMG")]
sumeconomicconsequences <- aggregate(. ~ EVTYPE, economicconsequences, sum)
sumeconomicconsequences <- arrange(sumeconomicconsequences, desc(PROPDMG + CROPDMG))
## Remove no longer necessary data frame from memory
rm(economicconsequences)
Create a plot for the top 10 events with highest economic consequences:
sumeconomicconsequences <- sumeconomicconsequences[1:10,]
sumeconomicconsequences <- gather(sumeconomicconsequences, DMTYPE, TOTAL, PROPDMG:CROPDMG)
ecplot <- ggplot(sumeconomicconsequences, aes(x = reorder(EVTYPE, -TOTAL),
                                              y = TOTAL/10^9, fill = DMTYPE)) +
  geom_bar(stat = "identity") +
  labs(x = "Event", y = "Damages in billions of US$") +
  labs(title = "Top 10 Weather Events with Highest Economic Consequences") +
  labs(fill = "Damage Type") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0))
Since we have changed the default working directory, it is now recommended that we change it back to the previous one:
setwd(mainDir)
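An alternative that restores the working directory automatically, even if an error interrupts the analysis, is to wrap the work in a function and register the reset with `on.exit()`. This is a sketch only; `run_in_dir` is a hypothetical helper name, and a temporary folder is used so that the example leaves no files behind.

```r
# Hypothetical helper: run `code` inside `dir`, then always switch back
run_in_dir <- function(dir, code) {
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  old <- setwd(dir)                 # setwd() returns the previous directory
  on.exit(setwd(old), add = TRUE)   # runs when the function exits, error or not
  code()
}

start <- getwd()

# Report the folder name seen from inside the helper
inside <- run_in_dir(file.path(tempdir(), "storm_data"),
                     function() basename(getwd()))
print(inside)

# The original working directory is back in place
print(identical(getwd(), start))
```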
From the plot created on item 2.4:
print(hfeplot)
sumharmfulevents
## EVTYPE HMTYPE TOTAL
## 1 TORNADO FATALITIES 5633
## 2 EXCESSIVE HEAT FATALITIES 1903
## 3 TSTM WIND FATALITIES 504
## 4 FLOOD FATALITIES 470
## 5 LIGHTNING FATALITIES 816
## 6 TORNADO INJURIES 91346
## 7 EXCESSIVE HEAT INJURIES 6525
## 8 TSTM WIND INJURIES 6957
## 9 FLOOD INJURIES 6789
## 10 LIGHTNING INJURIES 5230
tornadoinjuries <- subset(sumharmfulevents, EVTYPE=="TORNADO"&HMTYPE=="INJURIES", TOTAL)
tornadoinjuries <- tornadoinjuries/10^3
tornadofatalities <- subset(sumharmfulevents, EVTYPE=="TORNADO"&HMTYPE=="FATALITIES", TOTAL)
tornadofatalities <- tornadofatalities/10^3
It is possible to see that, across the United States, tornadoes (with 5.6 thousand fatalities and 91.35 thousand injuries) are the most harmful weather event.
From the plot created on item 2.5:
print(ecplot)
sumeconomicconsequences
## EVTYPE DMTYPE TOTAL
## 1 FLOOD PROPDMG 144657709807
## 2 HURRICANE/TYPHOON PROPDMG 69305840000
## 3 TORNADO PROPDMG 56947380676
## 4 STORM SURGE PROPDMG 43323536000
## 5 HAIL PROPDMG 15735267513
## 6 FLASH FLOOD PROPDMG 16822673978
## 7 DROUGHT PROPDMG 1046106000
## 8 HURRICANE PROPDMG 11868319010
## 9 RIVER FLOOD PROPDMG 5118945500
## 10 ICE STORM PROPDMG 3944927860
## 11 FLOOD CROPDMG 5605254720
## 12 HURRICANE/TYPHOON CROPDMG 2605174701
## 13 TORNADO CROPDMG 380761846
## 14 STORM SURGE CROPDMG 5
## 15 HAIL CROPDMG 2850955098
## 16 FLASH FLOOD CROPDMG 1380145763
## 17 DROUGHT CROPDMG 13965343230
## 18 HURRICANE CROPDMG 2741910000
## 19 RIVER FLOOD CROPDMG 5028185275
## 20 ICE STORM CROPDMG 5021610504
floodpropdmg <- subset(sumeconomicconsequences, EVTYPE=="FLOOD"&DMTYPE=="PROPDMG", TOTAL)
floodpropdmg <- floodpropdmg/10^9
floodtotaldmg <- subset(sumeconomicconsequences, EVTYPE=="FLOOD"&DMTYPE=="PROPDMG", TOTAL) +
subset(sumeconomicconsequences, EVTYPE=="FLOOD"&DMTYPE=="CROPDMG", TOTAL)
floodtotaldmg <- floodtotaldmg/10^9
droughtcrpdmg <- subset(sumeconomicconsequences, EVTYPE=="DROUGHT"&DMTYPE=="CROPDMG", TOTAL)
droughtcrpdmg <- droughtcrpdmg/10^9
It is possible to see that, across the United States, flood (with around $144.7 billion) has the highest property damage value and drought (with around $13.97 billion) has the highest crop damage value. However, the total of $150.3 billion makes flood the event with the greatest economic consequences.
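The rounded figures quoted above follow directly from the totals printed in the table. As a quick arithmetic check, with the three values copied from that table:

```r
# Totals in US$, copied from the sumeconomicconsequences output above
flood_prop   <- 144657709807
flood_crop   <- 5605254720
drought_crop <- 13965343230

# Convert to billions of US$ and round as in the text
flood_prop_bn   <- round(flood_prop / 1e9, 1)
flood_total_bn  <- round((flood_prop + flood_crop) / 1e9, 1)
drought_crop_bn <- round(drought_crop / 1e9, 2)

print(c(flood_prop_bn, flood_total_bn, drought_crop_bn))
```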