Health and Economic Impacts of Severe Storms in the US (NOAA Storm Database analysis)

The following analysis uses historical data from the NOAA Storm Database. Our goal is to understand which types of storm events have the greatest impact on population health and the economy. We cleaned the data and created summary measures for the variables of interest. The analysis shows that TORNADO is the most dangerous event type in terms of public health, while FLOOD is the most damaging for the economy.

Data Processing

The following section describes all the manipulation performed on the data, including reading, recoding, and creating summary variables.

First, load the required libraries and read the data from the compressed CSV file.

library(descr)
library(plyr)
library(ggplot2)
indata <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)

Create a new variable, HEALTHT, which summarizes the numbers of fatalities and injuries in a single index. I decided to weight fatalities by a coefficient of 10 because that is roughly the ratio of the means of the two variables of interest (injuries to fatalities).

indata$HEALTHT <- indata$FATALITIES * 10 + indata$INJURIES
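As a rough illustration of how such a weight might be chosen and applied, the sketch below computes the ratio of means and the resulting index on a small toy data frame. The values are hypothetical and are not taken from the NOAA file; the names `toy`, `ratio`, and `weight` are introduced here only for illustration.

```r
# Toy records (hypothetical values, not from the NOAA file)
toy <- data.frame(
  FATALITIES = c(0, 1, 0, 2, 0),
  INJURIES   = c(3, 10, 0, 15, 2)
)

# Ratio of mean injuries to mean fatalities; a value near 10 would
# motivate counting one fatality as ten injuries
ratio <- mean(toy$INJURIES) / mean(toy$FATALITIES)

# Weighted index, same form as HEALTHT above
weight <- 10
toy$HEALTHT <- toy$FATALITIES * weight + toy$INJURIES
toy
```

The weight is a modelling choice, so a ranking built on it is worth comparing with the raw fatality and injury totals, as the results tables do.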

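Economic damage is recorded as property damage and crop damage. Each type is stored in two variables: the first is a number and the second indicates the unit (K for thousands, M for millions, B for billions). I recode all values into millions and then create a summary variable for total economic damage. Records with any other unit code keep a multiplier of 0 and therefore do not contribute to the totals.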

indata$PROPMULT <- 0
indata$PROPMULT[indata$PROPDMGEXP == "k"] <- 0.001
indata$PROPMULT[indata$PROPDMGEXP == "K"] <- 0.001
indata$PROPMULT[indata$PROPDMGEXP == "m"] <- 1
indata$PROPMULT[indata$PROPDMGEXP == "M"] <- 1
indata$PROPMULT[indata$PROPDMGEXP == "b"] <- 1000
indata$PROPMULT[indata$PROPDMGEXP == "B"] <- 1000

indata$PROPDMGr <- indata$PROPDMG * indata$PROPMULT

indata$CROPMULT <- 0
indata$CROPMULT[indata$CROPDMGEXP == "k"] <- 0.001
indata$CROPMULT[indata$CROPDMGEXP == "K"] <- 0.001
indata$CROPMULT[indata$CROPDMGEXP == "m"] <- 1
indata$CROPMULT[indata$CROPDMGEXP == "M"] <- 1
indata$CROPMULT[indata$CROPDMGEXP == "b"] <- 1000
indata$CROPMULT[indata$CROPDMGEXP == "B"] <- 1000

indata$CROPDMGr <- indata$CROPDMG * indata$CROPMULT

indata$TOTDMGr.M <- indata$PROPDMGr + indata$CROPDMGr
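The repeated assignments above could also be written as a single lookup. The following is a minimal sketch of that alternative, assuming the same rule as above (K, M, B in either case; everything else counts as 0). The names `unit_to_millions` and `to_millions` are hypothetical helpers, not part of the original analysis.

```r
# Multipliers that convert each unit code to millions
unit_to_millions <- c(K = 0.001, M = 1, B = 1000)

# Hypothetical helper: convert a damage value and its unit code to millions.
# Codes not present in the lookup (blank, digits, symbols) map to 0,
# mirroring the default multiplier used above.
to_millions <- function(value, unit) {
  mult <- unit_to_millions[toupper(unit)]  # NA for unknown codes
  mult[is.na(mult)] <- 0
  value * unname(mult)
}

# Example call on made-up values
converted <- to_millions(c(25, 2.5, 1, 10), c("K", "m", "B", "?"))
converted
```

With such a helper, the property and crop columns could each be converted in one call, for example `to_millions(indata$PROPDMG, indata$PROPDMGEXP)`.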

Recode EVTYPE as a factor variable for further analysis and charting.

indata$EVENTT <- as.factor(indata$EVTYPE)

Results

Storm Impact on Population Health

The data file contains only limited information on the effect of storms on public health: the numbers of fatalities and injuries. Let's look at these results along with the summary variable (Health Damage Index) calculated above.

summHtable <- ddply(indata, .(EVENTT), summarize, StormHD.I = sum(HEALTHT), 
    StormHD.F = sum(FATALITIES), StormHD.Inj = sum(INJURIES))
summHtable <- arrange(summHtable, desc(StormHD.I))
summHtable10 <- summHtable[1:10, ]
summHtable10
##            EVENTT StormHD.I StormHD.F StormHD.Inj
## 1         TORNADO    147676      5633       91346
## 2  EXCESSIVE HEAT     25555      1903        6525
## 3       LIGHTNING     13390       816        5230
## 4       TSTM WIND     11997       504        6957
## 5     FLASH FLOOD     11557       978        1777
## 6           FLOOD     11489       470        6789
## 7            HEAT     11470       937        2100
## 8     RIP CURRENT      3912       368         232
## 9       HIGH WIND      3617       248        1137
## 10   WINTER STORM      3381       206        1321
mxlimits <- as.character(summHtable10$EVENTT)
ggplot(summHtable10, aes(x = EVENTT, y = StormHD.I)) + xlim(mxlimits) + geom_line(aes(group = 1), 
    colour = "#000099") + geom_point(size = 3, colour = "#CC0000") + ggtitle("Most harmful events with respect to population health") + 
    xlab("event") + ylab("Health Damage Index (10*FATALITIES+INJURIES)")

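[Figure: Most harmful events with respect to population health. Health Damage Index (10*FATALITIES+INJURIES) by event for the ten highest-ranked events.]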

The most dangerous storm event is TORNADO on all the parameters measured: summary index, fatalities, and injuries.
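The table in this section follows a summarize, sort, and keep-the-top-rows pattern. For readers without plyr, a rough base R equivalent is sketched below on a toy data frame. The values are hypothetical, and `totals` and `top_events` are names introduced only for this example.

```r
# Toy event records (hypothetical values, not from the NOAA file)
toy <- data.frame(
  EVENTT     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 1, 3, 1, 0),
  INJURIES   = c(30, 5, 12, 4, 0, 6)
)
toy$HEALTHT <- toy$FATALITIES * 10 + toy$INJURIES

# Sum each measure within event type
totals <- aggregate(cbind(HEALTHT, FATALITIES, INJURIES) ~ EVENTT,
                    data = toy, FUN = sum)

# Sort by the index, largest first, and keep the top n rows
top_events <- head(totals[order(-totals$HEALTHT), ], n = 2)
top_events
```

This is only a sketch of the same idea; the analysis itself uses `ddply` and `arrange` from plyr.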

Storm Damage to the Economy

Run the same analysis for the economic damage data.

summEtable <- ddply(indata, .(EVENTT), summarize, StormED.M = sum(TOTDMGr.M), 
    StormED.P.M = sum(PROPDMGr), StormED.C.M = sum(CROPDMGr))
summEtable <- arrange(summEtable, desc(StormED.M))
summEtable10 <- summEtable[1:10, ]
summEtable10
##               EVENTT StormED.M StormED.P.M StormED.C.M
## 1              FLOOD    150320      144658    5661.968
## 2  HURRICANE/TYPHOON     71914       69306    2607.873
## 3            TORNADO     57352       56937     414.953
## 4        STORM SURGE     43324       43324       0.005
## 5               HAIL     18758       15732    3025.954
## 6        FLASH FLOOD     17562       16141    1421.317
## 7            DROUGHT     15019        1046   13972.566
## 8          HURRICANE     14610       11868    2741.910
## 9        RIVER FLOOD     10148        5119    5029.459
## 10         ICE STORM      8967        3945    5022.114
mxlimits <- as.character(summEtable10$EVENTT)
ggplot(summEtable10, aes(x = EVENTT, y = StormED.M)) + xlim(mxlimits) + geom_line(aes(group = 1), 
    colour = "#000099") + geom_point(size = 3, colour = "#CC0000") + ggtitle("Events with the greatest economic consequences") + 
    xlab("event") + ylab("Total economic damage (millions of USD)")

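[Figure: Events with the greatest economic consequences. Total economic damage in millions by event for the ten highest-ranked events.]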

The most economically damaging event is FLOOD, followed by a group of three: HURRICANE/TYPHOON, TORNADO, and STORM SURGE.

Conclusions

The most dangerous event is TORNADO; the most costly is FLOOD. It would be interesting to run a similar analysis on a regional basis.