Severe weather has health and economic consequences, including injuries, deaths, and damage to property and crops. This report examines storm data to determine which types of severe weather events have the greatest health and economic impact. Health impact is measured by injuries and fatalities; economic impact is measured by property and crop damage. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and cover the period from 1950 through 2011.
The source data are distributed from the NOAA storm database as a comma-separated file compressed with bzip2 (file extension bz2). R can read this format directly through a bzfile connection, so the file does not need to be decompressed first.
stormUS <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), stringsAsFactors=FALSE)
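To illustrate how a bz2 file can be read without a separate decompression step, here is a minimal round-trip sketch on a toy data frame. The toy values and the temporary file are assumptions for illustration only; they are not part of the NOAA data.

```r
# Toy round trip: write a small CSV through a bzip2 connection, then read it back.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  INJURIES = c(15, 0),
                  stringsAsFactors = FALSE)

tmp <- tempfile(fileext = ".csv.bz2")   # hypothetical temporary file
con <- bzfile(tmp, open = "w")          # compressing write connection
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the bzfile connection, decompresses on the fly, and closes it
toyBack <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
unlink(tmp)                             # remove the temporary file
```

The same pattern is what the read.csv(bzfile(...)) call above relies on: the connection handles decompression, and read.csv sees ordinary CSV text.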
Property and crop damage must be calculated by combining fields in the source data. PROPDMGEXP and CROPDMGEXP hold the multipliers to apply to PROPDMG and CROPDMG, respectively. The product of each pair gives the damage amount for property and for crops.
Define a function that converts each multiplier from its character code to the corresponding numeric value. Codes other than H, K, M, and B are given a multiplier of 0.
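# Convert a single exponent code (H, K, M, or B) to its numeric multiplier.
# Any other code yields 0, so that record contributes no damage.
GetMultiple <- function(strMult)
{
  if(strMult=="H")
  {
    mult <- 100
  }
  else if(strMult=="K")
  {
    mult <- 1000
  }
  else if(strMult=="M")
  {
    mult <- 1000000
  }
  else if(strMult=="B")
  {
    mult <- 1000000000
  }
  else
  {
    mult <- 0
  }
  return(mult)
}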
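As an alternative to calling the function once per row through sapply, the same mapping could be sketched as a named-vector lookup that handles a whole column at once. The names multipliers and GetMultipleVec are hypothetical and not part of the original analysis. The sketch keeps the original behavior of assigning 0 to every code other than uppercase H, K, M, and B.

```r
# Named vector mapping each recognized exponent code to its multiplier.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical vectorized variant of GetMultiple.
GetMultipleVec <- function(strMult)
{
  # Lookup by name; codes that are not H, K, M, or B come back as NA
  mult <- unname(multipliers[as.character(strMult)])
  mult[is.na(mult)] <- 0   # unrecognized codes get a multiplier of 0
  mult
}

# Toy codes, including blank, lowercase, and "?" entries
codes <- c("K", "M", "", "B", "k", "H", "?")
GetMultipleVec(codes)
```

Because lowercase codes such as k fall through to 0 here, just as they do in the function above, any damage recorded with such codes is excluded from the totals.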
Create a simpler data frame that contains only the necessary variables, and calculate property and crop damage using the function defined above. A further variable, ECONDMG, holds the sum of property damage and crop damage.
dat <- data.frame(EVTYPE=stormUS$EVTYPE, INJURIES=stormUS$INJURIES, FATALITIES=stormUS$FATALITIES, stringsAsFactors=FALSE)
dat$PROPDMG <- stormUS$PROPDMG*sapply(stormUS$PROPDMGEXP, GetMultiple)
dat$CROPDMG <- stormUS$CROPDMG*sapply(stormUS$CROPDMGEXP, GetMultiple)
dat$ECONDMG <- dat$PROPDMG + dat$CROPDMG
head(dat)
## EVTYPE INJURIES FATALITIES PROPDMG CROPDMG ECONDMG
## 1 TORNADO 15 0 25000 0 25000
## 2 TORNADO 0 0 2500 0 2500
## 3 TORNADO 2 0 25000 0 25000
## 4 TORNADO 2 0 2500 0 2500
## 5 TORNADO 2 0 2500 0 2500
## 6 TORNADO 6 0 2500 0 2500
On inspection, some event types appear under several names and are worth combining. For example, THUNDERSTORM WIND, THUNDERSTORM WINDS, and TSTM WIND refer to the same kind of event; in this study all three are recorded as THUNDERSTORM WIND. In many other cases of this “name dispersion”, one name dominates the counts, which reduces the need to combine every variant.
library(plyr)
dat$EVTYPE <- replace(dat$EVTYPE, dat$EVTYPE=="THUNDERSTORM WINDS", "THUNDERSTORM WIND")
dat$EVTYPE <- replace(dat$EVTYPE, dat$EVTYPE=="TSTM WIND", "THUNDERSTORM WIND")
summaryDat <- ddply(dat, .(EVTYPE), summarize, INJURIES=sum(INJURIES), FATALITIES=sum(FATALITIES), PROPDMG=sum(PROPDMG), CROPDMG=sum(CROPDMG), ECONDMG=sum(ECONDMG))
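The consolidation and per-event summation can also be sketched in base R on a toy data frame. The toy values and the aliases lookup are assumptions for illustration; the analysis itself uses replace and ddply as shown above.

```r
# Toy data with three spellings of the same event type
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WINDS", "THUNDERSTORM WIND", "FLOOD"),
  INJURIES = c(1, 2, 3, 4),
  FATALITIES = c(0, 1, 0, 2),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: variant spelling -> name it is merged into
aliases <- c("THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
             "TSTM WIND" = "THUNDERSTORM WIND")
hit <- toy$EVTYPE %in% names(aliases)
toy$EVTYPE[hit] <- unname(aliases[toy$EVTYPE[hit]])

# Base R counterpart of the ddply() call: one row of sums per event type
toySummary <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
toySummary
```

Keeping the variants in one lookup vector makes it easier to add further spellings later without repeating the replace call for each one.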
The five event types with the most injuries, the most deaths, and the greatest total economic impact are identified and graphed below.
library(ggplot2)
top5Injuries <- head(summaryDat[order(summaryDat$INJURIES, decreasing=TRUE),],5)
print(format(data.frame(Event=top5Injuries$EVTYPE, Injuries=top5Injuries$INJURIES), big.mark=","), row.names=FALSE)
## Event Injuries
## TORNADO 91,346
## THUNDERSTORM WIND 9,353
## FLOOD 6,789
## EXCESSIVE HEAT 6,525
## LIGHTNING 5,230
ggplot(top5Injuries, aes(x=factor(1), y=INJURIES, fill=factor(EVTYPE))) + geom_bar(stat="identity") + coord_polar(theta="y")+xlab("")+labs(fill="Event")
The figure above shows the five leading weather-related causes of injury. Tornadoes have caused by far the most injuries: 91,346, nearly ten times the 9,353 attributed to thunderstorm wind.
top5Deaths <- head(summaryDat[order(summaryDat$FATALITIES, decreasing=TRUE),],5)
print(format(data.frame(Event=top5Deaths$EVTYPE, Deaths=top5Deaths$FATALITIES), big.mark=","), row.names=FALSE)
## Event Deaths
## TORNADO 5,633
## EXCESSIVE HEAT 1,903
## FLASH FLOOD 978
## HEAT 937
## LIGHTNING 816
ggplot(top5Deaths, aes(x=factor(1), y=FATALITIES, fill=factor(EVTYPE))) + geom_bar(stat="identity") + coord_polar(theta="y")+xlab("")+labs(fill="Event")+ylab("DEATHS")
The figure above shows the five leading weather-related causes of death. Tornadoes have also caused the most deaths: 5,633, roughly three times the 1,903 attributed to excessive heat.
top5Econ <- head(summaryDat[order(summaryDat$ECONDMG, decreasing=TRUE),],5)
print(format(data.frame(Event=top5Econ$EVTYPE, Property=top5Econ$PROPDMG, Crops=top5Econ$CROPDMG, Economic=top5Econ$ECONDMG), big.mark=","), row.names=FALSE)
## Event Property Crops Economic
## FLOOD 144,657,709,800 5,661,968,450 150,319,678,250
## HURRICANE/TYPHOON 69,305,840,000 2,607,872,800 71,913,712,800
## TORNADO 56,925,660,480 414,953,110 57,340,613,590
## STORM SURGE 43,323,536,000 5,000 43,323,541,000
## HAIL 15,727,367,220 3,025,537,450 18,752,904,670
ggplot(top5Econ, aes(x=factor(1), y=ECONDMG, fill=factor(EVTYPE))) + geom_bar(stat="identity") + coord_polar(theta="y")+xlab("")+labs(fill="Event")+ylab("ECONOMIC IMPACT (PROPERTY + CROP)")
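The figure above shows the five leading weather-related causes of economic loss. Floods have caused the greatest loss, about $150 billion in combined property and crop damage, roughly twice the $72 billion attributed to hurricanes and typhoons.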
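The three rankings above repeat the same order-then-head pattern. One way to express it once is a small helper, sketched below. The name TopN is hypothetical, and the toy data frame reuses the injury totals printed earlier in this report.

```r
# Hypothetical helper: the n rows of df with the largest values in the named column
TopN <- function(df, column, n = 5)
{
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

# Toy summary built from the injury totals reported above, in shuffled order
toySummary <- data.frame(
  EVTYPE = c("LIGHTNING", "TORNADO", "FLOOD", "EXCESSIVE HEAT", "THUNDERSTORM WIND"),
  INJURIES = c(5230, 91346, 6789, 6525, 9353),
  stringsAsFactors = FALSE
)

top3 <- TopN(toySummary, "INJURIES", n = 3)
top3$EVTYPE
```

With such a helper, the rankings for FATALITIES and ECONDMG would differ only in the column name passed in.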