We use the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which covers events recorded between 1950 and 2011, to investigate the effects that major storms and other weather events have on the United States.
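We base our analyses on the following two questions: (1) which types of events are most harmful to population health, and (2) which types of events have the greatest economic consequences?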
This report also provides analyses that should help prioritize resources by event type, in order to mitigate the health and economic problems caused by severe weather in the United States.
#loading libraries
library(xtable)
library(gridExtra)
## Loading required package: grid
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.
##
## Attaching package: 'R.oo'
##
## The following objects are masked from 'package:methods':
##
## getClasses, getMethods
##
## The following objects are masked from 'package:base':
##
## attach, detach, gc, load, save
##
## R.utils v1.34.0 (2014-10-07) successfully loaded. See ?R.utils for help.
##
## Attaching package: 'R.utils'
##
## The following object is masked from 'package:utils':
##
## timestamp
##
## The following objects are masked from 'package:base':
##
## cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
##
## Attaching package: 'tidyr'
##
## The following object is masked from 'package:R.utils':
##
## extract
#Set ggplot theme to black and white
theme_set(theme_bw())
#reads in data
filename = "data/repdata_data_StormData.csv.bz2"
if(!file.exists(filename)) {
download.file(url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile=filename, method="curl")
}
# decompress the archive only if the CSV is not already present
csvFile <- "data/repdata_data_StormData.csv"
if(!file.exists(csvFile)) {
  bunzip2(filename, destname = csvFile, overwrite=TRUE, remove=FALSE)
}
data = tbl_df(read.csv(csvFile))
numEVTYPE = length(unique(data$EVTYPE))
str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "000","0000","0001",..: 152 167 2645 1563 2524 3126 122 1563 3126 3126 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 826 826 826 826 826 826 826 826 826 826 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","+","-","0",..: 16 16 16 16 16 16 16 16 16 16 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","0","2","?",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
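The explicit decompression step is not strictly required: `read.csv()` opens its input through a file connection that detects bzip2 compression on read. The sketch below shows this on a tiny temporary file; the file name and its contents are made up for illustration and are not part of the storm data.

```r
# Minimal sketch: read a bzip2-compressed CSV without decompressing it first.
# `tmp` is a hypothetical stand-in for the downloaded .csv.bz2 archive.
tmp <- tempfile(fileext = ".csv.bz2")

# Write a tiny compressed CSV so the example is self-contained
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "HAIL,0"), con)
close(con)

# read.csv() detects the compression when it opens the file for reading
toy <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy)
```

Reading the archive directly avoids keeping a second, much larger copy of the data on disk, at the cost of decompressing on every read.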
Of the 985 event types documented, we set out to find which were most harmful in terms of (1) population health and (2) economic consequences.
#Fatalities
pubHealth = data %>%
group_by(EVTYPE) %>%
summarise(
totFatal = sum(FATALITIES),
totInjuries = sum(INJURIES)
)
pubHealth.long =
pubHealth %>%
filter(totFatal >0 & totInjuries >0) %>%
gather(type.of.dmg, count , -EVTYPE)
#Plot
p0 =ggplot(pubHealth.long, aes(x=reorder(EVTYPE, log10(count)), y=log10(count), fill=type.of.dmg))+
geom_bar(stat="identity")+#, position="dodge")+
scale_fill_brewer("Type of Dmg", labels=c("Fatalities", "Injuries"), palette="Set1")+
theme(axis.text.x=element_text(angle=90, size=5, hjust=1))+
xlab("Weather Event")+
ylab("Frequency (log10)")
p1 = ggplot(subset(pubHealth.long, type.of.dmg == 'totFatal'),
aes(x=reorder(EVTYPE, log10(count)), y=log10(count)))+
geom_bar(fill="#e41a1c", stat="identity")+#, position="dodge")+
theme(axis.text.x=element_text(angle=90, size=5, hjust=1), legend.position="none")+
xlab("Weather Event")+
ylab("Frequency (log10)")+
ggtitle("Fatalities")
p2 =ggplot(subset(pubHealth.long, type.of.dmg == 'totInjuries'),
aes(x=reorder(EVTYPE, log10(count)), y=log10(count)))+
geom_bar(fill="#377eb8", stat="identity")+#, position="dodge")+
theme(axis.text.x=element_text(angle=90, size=5, hjust=1), legend.position="none")+
xlab("Weather Event")+
ylab("Frequency (log10)")+
ggtitle("Injuries")
grid.arrange(p0, arrangeGrob(p1, p2, nrow=1), ncol=1)
The top 10 event types by total fatalities and by total injuries are shown in the following two tables.
| EVTYPE | totFatal |
|---|---|
| TORNADO | 5633.00 |
| EXCESSIVE HEAT | 1903.00 |
| FLASH FLOOD | 978.00 |
| HEAT | 937.00 |
| LIGHTNING | 816.00 |
| TSTM WIND | 504.00 |
| FLOOD | 470.00 |
| RIP CURRENT | 368.00 |
| HIGH WIND | 248.00 |
| AVALANCHE | 224.00 |

| EVTYPE | totInjuries |
|---|---|
| TORNADO | 91346.00 |
| TSTM WIND | 6957.00 |
| FLOOD | 6789.00 |
| EXCESSIVE HEAT | 6525.00 |
| LIGHTNING | 5230.00 |
| HEAT | 2100.00 |
| ICE STORM | 1975.00 |
| FLASH FLOOD | 1777.00 |
| THUNDERSTORM WIND | 1488.00 |
| HAIL | 1361.00 |
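
The report does not show the code that produced the two tables above. One plausible way to obtain such a ranking from per-event totals is to sort in decreasing order and keep the first rows. The sketch below uses base R on a toy data frame; `toyHealth` and its values are hypothetical and are not taken from the storm data.

```r
# Toy per-event totals (illustrative values only)
toyHealth <- data.frame(
  EVTYPE   = c("A", "B", "C", "D"),
  totFatal = c(5, 20, 0, 12),
  stringsAsFactors = FALSE
)

# Sort by fatalities, largest first, and keep the top n rows.
# With dplyr the same step would be: arrange(desc(totFatal)) followed by head(n)
n <- 3
topFatal <- head(toyHealth[order(-toyHealth$totFatal), ], n)

print(topFatal)
```

On the real data, `n` would be 10 and the input would be the `pubHealth` summary, ranked once by `totFatal` and once by `totInjuries`.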
We assess the economic effects through the damage sustained by property and crops.
#Part1::Property damage
# convert PROPDMGEXP in upper case
data$PROPDMGEXP = toupper(data$PROPDMGEXP)
# transformation of Prop multiplier
data$PROPEXP[data$PROPDMGEXP == "H"] <- 100
data$PROPEXP[data$PROPDMGEXP == "K"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "M"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "B"] <- 1e+09
data$PROPEXP[data$PROPDMGEXP == "0"] <- 1
data$PROPEXP[data$PROPDMGEXP == "1"] <- 10
data$PROPEXP[data$PROPDMGEXP == "2"] <- 100
data$PROPEXP[data$PROPDMGEXP == "3"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "4"] <- 10000
data$PROPEXP[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPEXP[data$PROPDMGEXP == "6"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPEXP[data$PROPDMGEXP == "8"] <- 1e+08
data$PROPEXP[data$PROPDMGEXP == "+"] <- 0
data$PROPEXP[data$PROPDMGEXP == "-"] <- 0
data$PROPEXP[data$PROPDMGEXP == "?"] <- 0
data$PROPEXP[data$PROPDMGEXP == ""] <- 1
# convert to Property Damage Value
data$property.value.dmg = data$PROPEXP * data$PROPDMG
#Part2 Crop damage
# convert CROPDMGEXP in upper case
data$CROPDMGEXP = toupper(data$CROPDMGEXP)
# transformation of Crop multiplier
data$CROPEXP[data$CROPDMGEXP == "M"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "K"] <- 1000
data$CROPEXP[data$CROPDMGEXP == "B"] <- 1e+09
data$CROPEXP[data$CROPDMGEXP == "0"] <- 1
data$CROPEXP[data$CROPDMGEXP == "2"] <- 100
data$CROPEXP[data$CROPDMGEXP == "?"] <- 0
data$CROPEXP[data$CROPDMGEXP == ""] <- 1
# convert to Crop Damage Value
data$crop.value.dmg = data$CROPEXP * data$CROPDMG
economy = data %>%
group_by(EVTYPE) %>%
summarise(
totPropDmg = sum(property.value.dmg),
totCropDmg = sum(crop.value.dmg)
)
economy.long =
economy %>%
filter(totPropDmg >0 & totCropDmg >0) %>%
gather(type.of.dmg, count , -EVTYPE)
#Plot
p0 =ggplot(economy.long, aes(x=reorder(EVTYPE, log10(count)), y=log10(count), fill=type.of.dmg))+
geom_bar(stat="identity")+#, position="dodge")+
scale_fill_brewer("Type of Dmg", labels=c("Property", "Crop"), palette="Set1")+
theme(axis.text.x=element_text(angle=90, size=5, hjust=1))+
xlab("Weather Event")+
ylab("Frequency (log10)")
p1 = ggplot(subset(economy.long, type.of.dmg == 'totPropDmg'),
aes(x=reorder(EVTYPE, log10(count)), y=log10(count)))+
geom_bar(fill="#e41a1c", stat="identity")+#, position="dodge")+
theme(axis.text.x=element_text(angle=90, size=5, hjust=1), legend.position="none")+
xlab("Weather Event")+
ylab("Frequency (log10)")+
ggtitle("PropertyDamages")
p2 =ggplot(subset(economy.long, type.of.dmg == 'totCropDmg'),
aes(x=reorder(EVTYPE, log10(count)), y=log10(count)))+
geom_bar(fill="#377eb8", stat="identity")+#, position="dodge")+
theme(axis.text.x=element_text(angle=90, size=5, hjust=1), legend.position="none")+
xlab("Weather Event")+
ylab("Frequency (log10)")+
ggtitle("CropDamages")
grid.arrange(p0, arrangeGrob(p1, p2, nrow=1), ncol=1)
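The exponent conversion above assigns one multiplier per line. The same mapping can be written more compactly as a named lookup vector, which keeps all codes in one place and can be reused for both the property and the crop columns. The sketch below is one possible formulation; `expMultiplier` and `toDollars` are hypothetical names, and the multiplier values are copied from the assignments above.

```r
# Named lookup: exponent code -> multiplier (same values as the assignments above)
expMultiplier <- c(
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "0" = 1,   "1" = 10,  "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "+" = 0,   "-" = 0,   "?" = 0
)

# Hypothetical helper: scale a damage figure by its exponent code
toDollars <- function(dmg, expCode) {
  code <- toupper(as.character(expCode))
  mult <- unname(expMultiplier[code])  # NA for codes missing from the table
  mult[code == ""] <- 1                # blank exponent: value is taken as-is
  dmg * mult
}

toDollars(c(25, 2.5, 10, 3), c("K", "m", "", "?"))
# 25000 2500000 10 0
```

As in the line-by-line version, a code that is not listed yields `NA` rather than a silent default, so unexpected codes remain visible in the result.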
The top 10 event types by property damage and by crop damage (values in U.S. dollars) are shown in the following two tables, with floods and droughts, respectively, being the main culprits.
| EVTYPE | totPropDmg |
|---|---|
| FLOOD | 144657709807.00 |
| HURRICANE/TYPHOON | 69305840000.00 |
| TORNADO | 56947380616.50 |
| STORM SURGE | 43323536000.00 |
| FLASH FLOOD | 16822673978.50 |
| HAIL | 15735267512.70 |
| HURRICANE | 11868319010.00 |
| TROPICAL STORM | 7703890550.00 |
| WINTER STORM | 6688497251.00 |
| HIGH WIND | 5270046260.00 |

| EVTYPE | totCropDmg |
|---|---|
| DROUGHT | 13972566000.00 |
| FLOOD | 5661968450.00 |
| RIVER FLOOD | 5029459000.00 |
| ICE STORM | 5022113500.00 |
| HAIL | 3025954473.00 |
| HURRICANE | 2741910000.00 |
| HURRICANE/TYPHOON | 2607872800.00 |
| FLASH FLOOD | 1421317100.00 |
| EXTREME COLD | 1292973000.00 |
| FROST/FREEZE | 1094086000.00 |
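
Several rows in these tables carry closely related labels: TSTM WIND and THUNDERSTORM WIND appear separately in the injuries table, and the first EVTYPE level in the `str()` output has a leading space. Totals are therefore split across near-duplicate labels. The sketch below shows one way such labels could be tidied before grouping; `normalizeEvtype` is a hypothetical helper, and treating TSTM as an abbreviation of THUNDERSTORM is an assumption, not something the report itself applies.

```r
# Hypothetical helper: tidy raw EVTYPE labels before grouping
normalizeEvtype <- function(x) {
  x <- toupper(trimws(as.character(x)))  # drop stray spaces, unify case
  x <- gsub("\\s+", " ", x)              # collapse repeated whitespace
  x <- sub("^TSTM", "THUNDERSTORM", x)   # assumption: TSTM abbreviates THUNDERSTORM
  x
}

raw <- c(" HIGH SURF ADVISORY", "TSTM WIND", "Thunderstorm  Wind", "HAIL")
normalizeEvtype(raw)
# "HIGH SURF ADVISORY" "THUNDERSTORM WIND" "THUNDERSTORM WIND" "HAIL"
```

Whether broader merges are appropriate (for example HURRICANE with HURRICANE/TYPHOON) is a judgment call that this sketch deliberately leaves out.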
Based on our analyses, tornadoes cause by far the most injuries and deaths.
Floods cause the most property damage, and droughts cause the most crop damage.
We therefore suggest that environmental agencies establish protocols for responding promptly and decisively when any of these events is likely, set up advance warning systems for tornadoes, and invest heavily in flood prevention in flood-prone areas and drought remediation in drought-prone areas.