Synopsis

This report examines the NOAA Storm Database to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

This analysis uses the EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG, PROPDMGEXP and CROPDMGEXP variables to classify damage by event type in terms of population health and economic damage.

We find tornadoes, excessive heat, thunderstorm winds (TSTM WIND), floods and lightning to be the most damaging to population health, in that order. We find floods, hurricanes/typhoons, storm surges, droughts and hurricanes to be the most damaging economically, in that order.

These results loosely agree with a study conducted by University of South Carolina researchers: http://www.prb.org/Publications/Articles/2011/disasters-by-type.aspx. Although the variable names and disaster type groupings differ, similar events appear in both our top 10 and theirs.

Data Processing

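#load the packages used below: dplyr for %>%, group_by and summarise; ggplot2 for the plot
library(dplyr)
library(ggplot2)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "Stormdata.csv.bz2")
StormData <- read.csv("Stormdata.csv.bz2")
StormData <- StormData %>% group_by(EVTYPE)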

#check if data loaded properly
head(StormData)
## Source: local data frame [6 x 37]
## Groups: EVTYPE [1]
## 
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME  STATE
##     <dbl>             <fctr>   <fctr>    <fctr>  <dbl>     <fctr> <fctr>
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE     AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN     AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE     AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON     AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN     AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE     AL
## # ... with 30 more variables: EVTYPE <fctr>, BGN_RANGE <dbl>,
## #   BGN_AZI <fctr>, BGN_LOCATI <fctr>, END_DATE <fctr>, END_TIME <fctr>,
## #   COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>, END_AZI <fctr>,
## #   END_LOCATI <fctr>, LENGTH <dbl>, WIDTH <dbl>, F <int>, MAG <dbl>,
## #   FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <fctr>,
## #   CROPDMG <dbl>, CROPDMGEXP <fctr>, WFO <fctr>, STATEOFFIC <fctr>,
## #   ZONENAMES <fctr>, LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>,
## #   LONGITUDE_ <dbl>, REMARKS <fctr>, REFNUM <dbl>

Analyzing damage to humans

The dataset contains two variables pertaining to harm to human health: INJURIES and FATALITIES. Since we need to collapse both into one number for ranking, I’ve chosen to weight each injury as half of a fatality. The tentative reasoning is that minor injuries (like scratches) are unlikely to have been reported, while serious injuries such as broken bones are the ones most likely to appear in the data.

#isolate the variables we need
HumanDamage <- subset(StormData, select = c(EVTYPE, INJURIES, FATALITIES))
#group, then aggregate data
HumanDamage <- group_by(HumanDamage, EVTYPE)
HumanDamage<- summarise(HumanDamage, INJURIES = sum(INJURIES), FATALITIES = sum(FATALITIES))
#drop event types that have neither injuries nor fatalities
HumanDamage <- subset(HumanDamage, INJURIES > 0 | FATALITIES > 0)
#combined score: each injury counts as half a fatality
HumanDamage$DAMAGE <- (HumanDamage$FATALITIES + (HumanDamage$INJURIES / 2))
#sort descending
HumanDamage <- HumanDamage[order(-HumanDamage$DAMAGE),]
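
The 1/2 weight is a judgment call, so it can help to see how sensitive the ranking is to it. The following is a minimal sketch, not part of the original analysis: rank_by_weight() is a hypothetical helper, and the totals are copied from the top five rows of the population health table in the Results section.

```r
# Totals copied from the top five rows of the population health results table.
totals <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  INJURIES   = c(91346, 6525, 6957, 6789, 5230),
  FATALITIES = c(5633, 1903, 504, 470, 816),
  stringsAsFactors = FALSE
)

# rank_by_weight() is a hypothetical helper (an assumption for illustration).
# It scores each event type as FATALITIES + weight * INJURIES, sorted descending.
rank_by_weight <- function(df, weight = 0.5) {
  df$DAMAGE <- df$FATALITIES + weight * df$INJURIES
  df[order(-df$DAMAGE), ]
}

half  <- rank_by_weight(totals, weight = 0.5)  # the weighting used in this report
tenth <- rank_by_weight(totals, weight = 0.1)  # injuries count for much less

print(half[, c("EVTYPE", "DAMAGE")])
print(tenth[, c("EVTYPE", "DAMAGE")])
```

Among these five rows only, lowering the weight to 0.1 moves LIGHTNING ahead of TSTM WIND and FLOOD, while TORNADO and EXCESSIVE HEAT keep the top two places. Event types outside these five rows are not considered in this sketch, so it says nothing about the full top 10.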

Analyzing economic damage

There are four variables pertaining to economic damage: CROPDMG, PROPDMG, and their respective …EXP variables. The …EXP variables give the order of magnitude of the associated value and are encoded as letters (K = 1,000; M = 1,000,000; B = 1,000,000,000). There’s no need to weight anything here, but some conversion is required before we can calculate the total damage.

#isolate the variables we need
EconDamage <- subset(StormData, select = c(EVTYPE, CROPDMG, CROPDMGEXP, PROPDMG, PROPDMGEXP))

#the ...EXP columns are factors with blank entries; convert them to character so the codes can be replaced (blanks become exponent 0 below)
EconDamage$CROPDMGEXP <- as.character(EconDamage$CROPDMGEXP)
EconDamage$PROPDMGEXP <- as.character(EconDamage$PROPDMGEXP)

#the ...EXP variables indicate the magnitude of the value in the associated column. They're coded
#as K (10^3), M (10^6) and B (10^9), in upper or lower case. Convert these codes to powers of ten.

EconDamage[(EconDamage$PROPDMGEXP == ""), ]$PROPDMGEXP <- 0
EconDamage[(EconDamage$PROPDMGEXP == "K" | EconDamage$PROPDMGEXP == "k"), ]$PROPDMGEXP <- 3
EconDamage[(EconDamage$PROPDMGEXP == "M" | EconDamage$PROPDMGEXP == "m"), ]$PROPDMGEXP <- 6
EconDamage[(EconDamage$PROPDMGEXP == "B" | EconDamage$PROPDMGEXP == "b"), ]$PROPDMGEXP <- 9

EconDamage[(EconDamage$CROPDMGEXP == ""), ]$CROPDMGEXP <- 0
EconDamage[(EconDamage$CROPDMGEXP == "K" | EconDamage$CROPDMGEXP == "k"), ]$CROPDMGEXP <- 3
EconDamage[(EconDamage$CROPDMGEXP == "M" | EconDamage$CROPDMGEXP == "m"), ]$CROPDMGEXP <- 6
EconDamage[(EconDamage$CROPDMGEXP == "B" | EconDamage$CROPDMGEXP == "b"), ]$CROPDMGEXP <- 9


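#compute the damage by multiplying property and crop damage by 10 ^ the associated exponent
#note: any code that is not blank, K, M, B or a number becomes NA here (hence the warnings below);
#an NA makes the later per-event sum NA, and the DAMAGE > 0 filter then drops that event type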
EconDamage$PROPDMG <- EconDamage$PROPDMG * 10^as.numeric(EconDamage$PROPDMGEXP) 
## Warning: NAs introduced by coercion
EconDamage$CROPDMG <- EconDamage$CROPDMG * 10^as.numeric(EconDamage$CROPDMGEXP)
## Warning: NAs introduced by coercion
# compute combined economic damage (property damage + crops damage)
EconDamage$DAMAGE <- (EconDamage$PROPDMG + EconDamage$CROPDMG)

#group, then aggregate data
EconDamage <- group_by(EconDamage, EVTYPE)
EconDamage<- summarise(EconDamage, CROPDMG = sum(CROPDMG), PROPDMG = sum(PROPDMG), DAMAGE = sum(DAMAGE))

#drop cases of zero damage
EconDamage<- subset(EconDamage, DAMAGE > 0)

#sort descending
EconDamage <- EconDamage[order(-EconDamage$DAMAGE),]
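
As an alternative to the row-by-row replacements above, the exponent conversion can be sketched with a named lookup vector. This is a minimal illustration under stated assumptions, not the method used to produce the results in this report: to_dollars() and exp_lookup are hypothetical names, the toy rows are invented, and treating unrecognised codes as 10^0 is an assumption that differs from the code above, where such codes become NA.

```r
# Toy rows for illustration only; these values are not taken from the NOAA data.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3, 4),
  PROPDMGEXP = c("K", "m", "B", "", "?", "2"),
  stringsAsFactors = FALSE
)

# Named lookup vector: letter code -> power of ten.
exp_lookup <- c(K = 3, M = 6, B = 9)

# to_dollars() is a hypothetical helper (an assumption for illustration).
to_dollars <- function(value, code) {
  code  <- toupper(code)
  power <- unname(exp_lookup[code])        # NA for anything not K, M or B
  digit <- grepl("^[0-9]$", code)
  power[digit] <- as.numeric(code[digit])  # single digits are used as the power
  power[is.na(power)] <- 0                 # ASSUMPTION: blank or unknown codes mean 10^0
  value * 10^power
}

toy$DAMAGE <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
print(toy)
```

With this variant no NA values are produced, so no event type would be dropped for having an unrecognised code. Whether 10^0 is the right reading of those codes is a separate question that this sketch does not settle.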

Results

Which events have the greatest impact on population health?

#print top 10 most damaging events ranked by damage
head(HumanDamage, 10)
## # A tibble: 10 × 4
##               EVTYPE INJURIES FATALITIES  DAMAGE
##               <fctr>    <dbl>      <dbl>   <dbl>
## 1            TORNADO    91346       5633 51306.0
## 2     EXCESSIVE HEAT     6525       1903  5165.5
## 3          TSTM WIND     6957        504  3982.5
## 4              FLOOD     6789        470  3864.5
## 5          LIGHTNING     5230        816  3431.0
## 6               HEAT     2100        937  1987.0
## 7        FLASH FLOOD     1777        978  1866.5
## 8          ICE STORM     1975         89  1076.5
## 9  THUNDERSTORM WIND     1488        133   877.0
## 10      WINTER STORM     1321        206   866.5

Which events have the largest economic consequences?

#print top 10 most damaging events ranked by damage
head(EconDamage, 10)
## # A tibble: 10 × 4
##               EVTYPE     CROPDMG      PROPDMG       DAMAGE
##               <fctr>       <dbl>        <dbl>        <dbl>
## 1              FLOOD  5661968450 144657709807 150319678257
## 2  HURRICANE/TYPHOON  2607872800  69305840000  71913712800
## 3        STORM SURGE        5000  43323536000  43323541000
## 4            DROUGHT 13972566000   1046106000  15018672000
## 5          HURRICANE  2741910000  11868319010  14610229010
## 6        RIVER FLOOD  5029459000   5118945500  10148404500
## 7          ICE STORM  5022113500   3944927860   8967041360
## 8     TROPICAL STORM   678346000   7703890550   8382236550
## 9       WINTER STORM    26944000   6688497251   6715441251
## 10          WILDFIRE   295472800   4765114000   5060586800
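#plot top 5 events by economic damage caused, with bars ordered from most to least damaging
testplot <- head(EconDamage, 5)
ggplot(testplot, aes(x = reorder(EVTYPE, -DAMAGE), y = DAMAGE)) + geom_bar(stat = "identity") + labs(x = "Event type", y = "Combined property and crop damage")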