Synopsis

Severe weather has health and economic consequences, including injury, death, and damage to property and crops. This report examines storm data to determine which types of severe weather event have the greatest health and economic impact. Health impact is measured by injuries and fatalities; economic impact is measured by property and crop damage. The data come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and cover the period from 1950 through 2011.

Data Processing

The source data come from the NOAA storm database as a CSV file compressed with bzip2 (file extension bz2). R can read this format directly through a bzfile() connection, so no separate decompression step is needed.

stormUS <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), stringsAsFactors=FALSE)
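To illustrate that a bzip2-compressed CSV can be read without a separate decompression step, here is a minimal round-trip sketch using only base R. The data frame `toy` and its values are hypothetical illustration data, not NOAA records.

```r
# Hypothetical toy data, shaped like a few of the storm database columns.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  PROPDMG = c(25, 2.5),
                  PROPDMGEXP = c("K", "M"),
                  stringsAsFactors = FALSE)

path <- tempfile(fileext = ".csv.bz2")

# Write through a bzfile() connection so the CSV is compressed on the way out.
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back the same way the report reads the storm data.
roundTrip <- read.csv(bzfile(path), stringsAsFactors = FALSE)
str(roundTrip)
```

The read call has the same shape as the one used for the full storm file; only the path differs.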

Isolate and process variables of interest

Health impact is measured by injuries and fatalities, and economic impact by property and crop damage. The damage figures must be calculated by combining fields in the source data: PROPDMGEXP and CROPDMGEXP hold the multipliers to apply to PROPDMG and CROPDMG respectively. The product of each pair gives the economic impact of property or crop damage.

Define necessary functions

Define a function that converts a multiplier from its character code to the corresponding numeric value. Codes other than H, K, M, and B map to 0, so records with any other code contribute no damage.

GetMultiple <- function(strMult)
{
  if(strMult=="H")
  {
    mult <- 100
  }
  else if(strMult=="K")
  {
    mult <- 1000
  }
  else if(strMult=="M")
  {
    mult <- 1000000
  }
  else if(strMult=="B")
  {
    mult <- 1000000000
  }
  else
  {
    mult <- 0
  }
  return(mult)
}
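The same mapping can also be written as a lookup table, which works on a whole column at once and avoids calling the function once per row through sapply. This is a sketch of an alternative, not the code that produced the results in this report; the names `multipliers` and `GetMultipleVec` are hypothetical.

```r
# Named vector acting as a lookup table; codes mirror GetMultiple above.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical vectorized helper: returns one multiplier per input code,
# with 0 for anything not in the table (same fallback as GetMultiple).
GetMultipleVec <- function(codes) {
  mult <- unname(multipliers[codes])
  mult[is.na(mult)] <- 0
  mult
}

# Made-up sample of codes, including an empty string and unrecognized values.
codes <- c("K", "M", "", "B", "k", "H", "?")
GetMultipleVec(codes)
# numeric values: 1000, 1e6, 0, 1e9, 0, 100, 0
```

Note that the match is case-sensitive in both versions, so a lowercase code such as "k" falls through to 0. If the data contain lowercase codes that should count, applying toupper() to the codes first would be one way to include them, though that would change the totals reported below.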

Create a data frame with only the variables of interest.

Build a smaller data frame holding only the necessary variables. Property and crop damage are calculated with the function defined above, and a new variable, ECONDMG, holds their sum.

dat <- data.frame(EVTYPE=stormUS$EVTYPE, INJURIES=stormUS$INJURIES, FATALITIES=stormUS$FATALITIES)
dat$PROPDMG <- stormUS$PROPDMG*sapply(stormUS$PROPDMGEXP, GetMultiple)
dat$CROPDMG <- stormUS$CROPDMG*sapply(stormUS$CROPDMGEXP, GetMultiple)
dat$ECONDMG <- dat$PROPDMG + dat$CROPDMG
head(dat)
##    EVTYPE INJURIES FATALITIES PROPDMG CROPDMG ECONDMG
## 1 TORNADO       15          0   25000       0   25000
## 2 TORNADO        0          0    2500       0    2500
## 3 TORNADO        2          0   25000       0   25000
## 4 TORNADO        2          0    2500       0    2500
## 5 TORNADO        2          0    2500       0    2500
## 6 TORNADO        6          0    2500       0    2500

Group the processed data by weather event type.

On inspection of the data, some event types are clearly the same event recorded under different names, for example THUNDERSTORM WIND, THUNDERSTORM WINDS, and TSTM WIND. In this study, these three are all replaced with THUNDERSTORM WIND. Most other cases of “name dispersion” have one dominant name, which reduces the need to combine every variant.

library(plyr)
dat$EVTYPE <- replace(dat$EVTYPE, dat$EVTYPE=="THUNDERSTORM WINDS", "THUNDERSTORM WIND")
dat$EVTYPE <- replace(dat$EVTYPE, dat$EVTYPE=="TSTM WIND", "THUNDERSTORM WIND")
summaryDat <- ddply(dat, .(EVTYPE), summarize, INJURIES=sum(INJURIES), FATALITIES=sum(FATALITIES), PROPDMG=sum(PROPDMG), CROPDMG=sum(CROPDMG), ECONDMG=sum(ECONDMG))
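For readers without plyr, the rename-then-sum step can be sketched in base R. The example below uses a hypothetical toy data frame (not NOAA data) and a hypothetical `aliases` lookup; it illustrates the technique rather than reproducing the report's exact call.

```r
# Hypothetical toy records, not NOAA data.
toy <- data.frame(
  EVTYPE   = c("TSTM WIND", "THUNDERSTORM WINDS", "THUNDERSTORM WIND", "FLOOD"),
  INJURIES = c(1, 2, 3, 4),
  ECONDMG  = c(100, 200, 300, 400),
  stringsAsFactors = FALSE
)

# Map variant spellings to one canonical name; unmatched names pass through.
aliases <- c("THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
             "TSTM WIND"          = "THUNDERSTORM WIND")
hit <- toy$EVTYPE %in% names(aliases)
toy$EVTYPE[hit] <- unname(aliases[toy$EVTYPE[hit]])

# Base R counterpart of the ddply() call: sum each measure per event type.
summaryToy <- aggregate(cbind(INJURIES, ECONDMG) ~ EVTYPE, data = toy, FUN = sum)
summaryToy
```

Keeping the variants in one named vector makes it easy to add further spellings later without another replace() line per variant.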

Results

The five weather event types responsible for the most injuries, deaths, and total economic loss are identified and graphed below.

Injuries

library(ggplot2)
top5Injuries <- head(summaryDat[order(summaryDat$INJURIES, decreasing=TRUE),],5)
print(format(data.frame(Event=top5Injuries$EVTYPE, Injuries=top5Injuries$INJURIES), big.mark=","), row.names=FALSE)
##              Event Injuries
##            TORNADO   91,346
##  THUNDERSTORM WIND    9,353
##              FLOOD    6,789
##     EXCESSIVE HEAT    6,525
##          LIGHTNING    5,230
ggplot(top5Injuries, aes(x=factor(1), y=INJURIES, fill=factor(EVTYPE))) + geom_bar(stat="identity") + coord_polar(theta="y")+xlab("")+labs(fill="Event")

The figure above shows the five largest weather-related causes of injury. Tornadoes caused by far the most injuries (91,346), nearly ten times as many as thunderstorm wind, the next largest cause.
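How dominant tornadoes are can be put as a proportion. The sketch below takes the totals from the injuries table above and computes each event's share of the top-five total. The shares are relative to these five events only, not to all events in the database.

```r
# Totals copied from the top-five injuries table above.
injuries <- c(TORNADO = 91346, "THUNDERSTORM WIND" = 9353, FLOOD = 6789,
              "EXCESSIVE HEAT" = 6525, LIGHTNING = 5230)

# Share of each event within the top five (not within all events).
share <- injuries / sum(injuries)
round(100 * share, 1)
```

By this measure, tornadoes account for about 77% of the injuries among the five leading causes.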

Deaths

top5Deaths <- head(summaryDat[order(summaryDat$FATALITIES, decreasing=TRUE),],5)
print(format(data.frame(Event=top5Deaths$EVTYPE, Deaths=top5Deaths$FATALITIES), big.mark=","), row.names=FALSE)
##           Event Deaths
##         TORNADO  5,633
##  EXCESSIVE HEAT  1,903
##     FLASH FLOOD    978
##            HEAT    937
##       LIGHTNING    816
ggplot(top5Deaths, aes(x=factor(1), y=FATALITIES, fill=factor(EVTYPE))) + geom_bar(stat="identity") + coord_polar(theta="y")+xlab("")+labs(fill="Event")+ylab("DEATHS")

The figure above shows the five largest weather-related causes of death. Tornadoes caused the most deaths (5,633), about three times as many as excessive heat, the next largest cause.

Economic Loss

top5Econ <- head(summaryDat[order(summaryDat$ECONDMG, decreasing=TRUE),],5)
print(format(data.frame(Event=top5Econ$EVTYPE, Property=top5Econ$PROPDMG, Crops=top5Econ$CROPDMG, Economic=top5Econ$ECONDMG), big.mark=","), row.names=FALSE)
##              Event        Property         Crops        Economic
##              FLOOD 144,657,709,800 5,661,968,450 150,319,678,250
##  HURRICANE/TYPHOON  69,305,840,000 2,607,872,800  71,913,712,800
##            TORNADO  56,925,660,480   414,953,110  57,340,613,590
##        STORM SURGE  43,323,536,000         5,000  43,323,541,000
##               HAIL  15,727,367,220 3,025,537,450  18,752,904,670
ggplot(top5Econ, aes(x=factor(1), y=ECONDMG, fill=factor(EVTYPE))) + geom_bar(stat="identity") + coord_polar(theta="y")+xlab("")+labs(fill="Event")+ylab("ECONOMIC IMPACT (PROPERTY + CROP)")

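The figure above shows the five largest weather-related causes of economic loss. Over the period covered, floods caused the greatest loss, about 150 billion dollars in combined property and crop damage, roughly twice the total for hurricanes/typhoons.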