Reproducible Research: Peer Assessment 2

Consequences of weather events on public health and the economy of the United States

Synopsis

We use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which covers events recorded between 1950 and 2011, to investigate the effects that major storms and other weather events have on the United States.

We aim to achieve the above by basing our analyses around the following two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

The analyses in this report are also intended to help prioritize resources by event type, in order to reduce the health and economic harm brought on by severe weather events in the United States.

Data Processing

#loading libraries
library(xtable)
library(gridExtra)
## Loading required package: grid
library(R.utils)
## Loading required package: R.oo
## Loading required package: R.methodsS3
## R.methodsS3 v1.6.1 (2014-01-04) successfully loaded. See ?R.methodsS3 for help.
## R.oo v1.18.0 (2014-02-22) successfully loaded. See ?R.oo for help.
## 
## Attaching package: 'R.oo'
## 
## The following objects are masked from 'package:methods':
## 
##     getClasses, getMethods
## 
## The following objects are masked from 'package:base':
## 
##     attach, detach, gc, load, save
## 
## R.utils v1.34.0 (2014-10-07) successfully loaded. See ?R.utils for help.
## 
## Attaching package: 'R.utils'
## 
## The following object is masked from 'package:utils':
## 
##     timestamp
## 
## The following objects are masked from 'package:base':
## 
##     cat, commandArgs, getOption, inherits, isOpen, parse, warnings
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## 
## Attaching package: 'tidyr'
## 
## The following object is masked from 'package:R.utils':
## 
##     extract
#Set ggplot theme to black and white
theme_set(theme_bw())
#reads in data
#make sure the target directory exists before downloading
dir.create("data", showWarnings=FALSE)
filename = "data/repdata_data_StormData.csv.bz2"
if(!file.exists(filename)) {
        download.file(url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile=filename, method="curl")
}
#decompress the archive only if the CSV is not already present
csvFile <- "data/repdata_data_StormData.csv"
if(!file.exists(csvFile)) {
        bunzip2(filename, destname = csvFile, overwrite=TRUE, remove=FALSE)
}

data      = tbl_df(read.csv("data/repdata_data_StormData.csv"))
numEVTYPE = length(unique(data$EVTYPE))
str(data)
## Classes 'tbl_df', 'tbl' and 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "000","0000","0001",..: 152 167 2645 1563 2524 3126 122 1563 3126 3126 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 826 826 826 826 826 826 826 826 826 826 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","+","-","0",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","0","2","?",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
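
As a side note on the loading step: base R's read.csv can normally read a bzip2-compressed file directly, because file connections detect the compression, so the explicit bunzip2 step is optional. A minimal sketch on a toy file (the toy data frame and the temporary path are made up for illustration and are not part of the storm data):

```r
# Write a tiny CSV through a bzip2 connection, then read it back directly
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the compressed file without a separate decompression step
toyBack <- read.csv(tmp, stringsAsFactors = FALSE)
```

For the real data this would amount to passing the .csv.bz2 path to read.csv, at the cost of decompressing on every read.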

Of the 985 types of weather events documented, we set out to find which were most harmful with respect to (1) population health and (2) the economy.
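
Many of the 985 labels are spelling variants of the same event: the str output shows a level with leading spaces ("   HIGH SURF ADVISORY"), and "TSTM WIND" and "THUNDERSTORM WIND" both occur in the data. The analysis in this report uses the raw labels as they are. A minimal sketch of how such labels could be normalised before grouping is given here; cleanEvtype is a hypothetical helper, and treating TSTM as an abbreviation of THUNDERSTORM is an assumption:

```r
# Hypothetical helper: normalise raw event labels before grouping
cleanEvtype <- function(x) {
    x <- toupper(trimws(as.character(x)))   # drop stray spaces, unify case
    x <- gsub("\\s+", " ", x)               # collapse repeated whitespace
    x <- sub("^TSTM", "THUNDERSTORM", x)    # assumed abbreviation
    x
}

raw     <- c("   HIGH SURF ADVISORY", "TSTM WIND", "Thunderstorm Wind", "TORNADO")
cleaned <- cleanEvtype(raw)
```

Applying such a step would merge some categories and could therefore change the rankings reported in the sections that follow.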

Public Health

#Total fatalities and injuries per event type
pubHealth = data %>%
group_by(EVTYPE) %>%
summarise(
    totFatal    = sum(FATALITIES),
    totInjuries = sum(INJURIES)
)

pubHealth.long = 
    pubHealth %>% 
    filter(totFatal >0 & totInjuries >0) %>%
    gather(type.of.dmg, count , -EVTYPE)
#Plot
p0 =ggplot(pubHealth.long, aes(x=reorder(EVTYPE, log10(count)), y=log10(count), fill=type.of.dmg))+
    geom_bar(stat="identity")+#, position="dodge")+
        scale_fill_brewer("Type of Dmg", labels=c("Fatalities", "Injuries"), palette="Set1")+
        theme(axis.text.x=element_text(angle=90, size=5, hjust=1))+
        xlab("Weather Event")+
        ylab("Frequency (log10)")
p1 = ggplot(subset(pubHealth.long, type.of.dmg == 'totFatal'), 
       aes(x=reorder(EVTYPE, log10(count)), y=log10(count)))+
    geom_bar(fill="#e41a1c", stat="identity")+#, position="dodge")+
        theme(axis.text.x=element_text(angle=90, size=5, hjust=1), legend.position="none")+
        xlab("Weather Event")+
        ylab("Frequency (log10)")+
        ggtitle("Fatalities")

p2 =ggplot(subset(pubHealth.long, type.of.dmg == 'totInjuries'), 
       aes(x=reorder(EVTYPE, log10(count)), y=log10(count)))+
    geom_bar(fill="#377eb8", stat="identity")+#, position="dodge")+
        theme(axis.text.x=element_text(angle=90, size=5, hjust=1), legend.position="none")+
        xlab("Weather Event")+
        ylab("Frequency (log10)")+
        ggtitle("Injuries")
grid.arrange(p0, arrangeGrob(p1, p2, nrow=1), ncol=1)

The top 10 events in terms of fatalities and injuries observed are shown in the following tables.

EVTYPE              totFatal
TORNADO              5633.00
EXCESSIVE HEAT       1903.00
FLASH FLOOD           978.00
HEAT                  937.00
LIGHTNING             816.00
TSTM WIND             504.00
FLOOD                 470.00
RIP CURRENT           368.00
HIGH WIND             248.00
AVALANCHE             224.00

EVTYPE              totInjuries
TORNADO                91346.00
TSTM WIND               6957.00
FLOOD                   6789.00
EXCESSIVE HEAT          6525.00
LIGHTNING               5230.00
HEAT                    2100.00
ICE STORM               1975.00
FLASH FLOOD             1777.00
THUNDERSTORM WIND       1488.00
HAIL                    1361.00
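
The code that produced these tables is not shown in the report. A minimal sketch of one way to obtain such a ranking from a per-event summary, using base R on a toy data frame (topN and the toy values are hypothetical, chosen only for illustration):

```r
# Toy per-event summary standing in for the real pubHealth table
toy <- data.frame(
    EVTYPE   = c("TORNADO", "HEAT", "FLOOD", "HAIL"),
    totFatal = c(50, 20, 30, 0),
    stringsAsFactors = FALSE
)

# Hypothetical helper: rank events by one column and keep the first n rows
topN <- function(df, column, n = 10) {
    ranked <- df[order(df[[column]], decreasing = TRUE), c("EVTYPE", column)]
    head(ranked, n)
}

top2 <- topN(toy, "totFatal", n = 2)
```

On the real data the call would presumably look like topN(pubHealth, "totFatal"), with the result passed to xtable for rendering.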

Economy

We assess the economic effects through damages sustained by property and crops.
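
Each damage figure in the database is stored as a number (PROPDMG, CROPDMG) plus an exponent code (PROPDMGEXP, CROPDMGEXP), so the dollar value is the number times a multiplier derived from the code. A minimal sketch of that arithmetic on toy rows, using a named lookup vector instead of one assignment per code (expMultiplier and the toy rows are illustrative; the multipliers are a subset of those used in this report):

```r
# Lookup from exponent code to multiplier (subset, for illustration)
expMultiplier <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9, "0" = 1, "2" = 1e2, "?" = 0)

toy <- data.frame(
    DMG = c(25, 2.5, 1.2, 3),
    EXP = c("K", "m", "B", ""),
    stringsAsFactors = FALSE
)

code <- toupper(toy$EXP)               # "m" and "M" mean the same thing
mult <- unname(expMultiplier[code])    # codes not in the lookup become NA
mult[code == ""] <- 1                  # an empty code leaves the number unchanged

toy$value <- toy$DMG * mult            # e.g. 25 with code "K" is 25000
```

A lookup vector keeps the whole mapping in one place, and any code that is missing from it surfaces as NA rather than being silently skipped.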

#Part1::Property damage

# convert PROPDMGEXP to upper case
data$PROPDMGEXP = toupper(data$PROPDMGEXP)
# transformation of Prop multiplier
data$PROPEXP[data$PROPDMGEXP == "H"] <- 100
data$PROPEXP[data$PROPDMGEXP == "K"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "M"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "B"] <- 1e+09
data$PROPEXP[data$PROPDMGEXP == "0"] <- 1
data$PROPEXP[data$PROPDMGEXP == "1"] <- 10
data$PROPEXP[data$PROPDMGEXP == "2"] <- 100
data$PROPEXP[data$PROPDMGEXP == "3"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "4"] <- 10000
data$PROPEXP[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPEXP[data$PROPDMGEXP == "6"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPEXP[data$PROPDMGEXP == "8"] <- 1e+08
data$PROPEXP[data$PROPDMGEXP == "+"] <- 0
data$PROPEXP[data$PROPDMGEXP == "-"] <- 0
data$PROPEXP[data$PROPDMGEXP == "?"] <- 0
data$PROPEXP[data$PROPDMGEXP == ""] <- 1
# convert to property damage value
data$property.value.dmg = data$PROPEXP * data$PROPDMG

#Part2 Crop damage

# convert CROPDMGEXP to upper case
data$CROPDMGEXP = toupper(data$CROPDMGEXP)
# transformation of Crop multiplier
data$CROPEXP[data$CROPDMGEXP == "M"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "K"] <- 1000
data$CROPEXP[data$CROPDMGEXP == "B"] <- 1e+09
data$CROPEXP[data$CROPDMGEXP == "0"] <- 1
data$CROPEXP[data$CROPDMGEXP == "2"] <- 100
data$CROPEXP[data$CROPDMGEXP == "?"] <- 0
data$CROPEXP[data$CROPDMGEXP == ""] <- 1
# convert to Crop Damage Value
data$crop.value.dmg = data$CROPEXP * data$CROPDMG
economy = data %>%
group_by(EVTYPE) %>%
summarise(
    totPropDmg = sum(property.value.dmg),
    totCropDmg = sum(crop.value.dmg)
)

economy.long = 
    economy %>% 
    filter(totPropDmg >0 & totCropDmg >0) %>%
    gather(type.of.dmg, count , -EVTYPE)
#Plot
#breaks pins each legend label to its own series regardless of factor order
p0 =ggplot(economy.long, aes(x=reorder(EVTYPE, log10(count)), y=log10(count), fill=type.of.dmg))+
    geom_bar(stat="identity")+#, position="dodge")+
        scale_fill_brewer("Type of Dmg", breaks=c("totPropDmg", "totCropDmg"), labels=c("Property", "Crop"), palette="Set1")+
        theme(axis.text.x=element_text(angle=90, size=5, hjust=1))+
        xlab("Weather Event")+
        ylab("Damage in USD (log10)")

p1 = ggplot(subset(economy.long, type.of.dmg == 'totPropDmg'), 
       aes(x=reorder(EVTYPE, log10(count)), y=log10(count)))+
    geom_bar(fill="#e41a1c", stat="identity")+#, position="dodge")+
        theme(axis.text.x=element_text(angle=90, size=5, hjust=1), legend.position="none")+
        xlab("Weather Event")+
        ylab("Damage in USD (log10)")+
        ggtitle("Property Damages")

p2 =ggplot(subset(economy.long, type.of.dmg == 'totCropDmg'), 
       aes(x=reorder(EVTYPE, log10(count)), y=log10(count)))+
    geom_bar(fill="#377eb8", stat="identity")+#, position="dodge")+
        theme(axis.text.x=element_text(angle=90, size=5, hjust=1), legend.position="none")+
        xlab("Weather Event")+
        ylab("Damage in USD (log10)")+
        ggtitle("Crop Damages")
grid.arrange(p0, arrangeGrob(p1, p2, nrow=1), ncol=1)

The top 10 events in terms of property and crop damage observed are shown in the following tables, with floods and droughts, respectively, being the main culprits.

EVTYPE                   totPropDmg
FLOOD               144657709807.00
HURRICANE/TYPHOON    69305840000.00
TORNADO              56947380616.50
STORM SURGE          43323536000.00
FLASH FLOOD          16822673978.50
HAIL                 15735267512.70
HURRICANE            11868319010.00
TROPICAL STORM        7703890550.00
WINTER STORM          6688497251.00
HIGH WIND             5270046260.00

EVTYPE                  totCropDmg
DROUGHT             13972566000.00
FLOOD                5661968450.00
RIVER FLOOD          5029459000.00
ICE STORM            5022113500.00
HAIL                 3025954473.00
HURRICANE            2741910000.00
HURRICANE/TYPHOON    2607872800.00
FLASH FLOOD          1421317100.00
EXTREME COLD         1292973000.00
FROST/FREEZE         1094086000.00
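
Totals of this size are easier to compare when expressed in billions. A minimal sketch of such a formatting step, assuming the totals are in U.S. dollars (toBillions is a hypothetical helper, not part of the original analysis):

```r
# Hypothetical helper: express a dollar total in billions with fixed decimals
toBillions <- function(x, digits = 1) {
    paste0(formatC(x / 1e9, format = "f", digits = digits), " bn USD")
}

# The two largest totals from the tables above
labels <- toBillions(c(144657709807, 13972566000))
```

Here labels holds "144.7 bn USD" for flood property damage and "14.0 bn USD" for drought crop damage.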

Results

Health effects

Based on our analyses, tornadoes appear to cause the greatest number of injuries and deaths.

Economic effects

Floods cause the most property damage, whereas droughts cause the most crop damage.

Conclusion

We therefore suggest that the environmental agencies set up the necessary protocols to respond promptly and decisively when any of the above events are likely, establish advance warning systems for tornadoes, and invest heavily in flood prevention in flood-prone areas and drought remediation in drought-prone areas.