The following analysis uses historical data from the NOAA Storm Database. Our goal is to understand which types of storms have the greatest impact on population health and on the economy. We cleaned the data and created summary measures for the variables of interest. The analysis shows that TORNADO is the most dangerous event type in terms of public health, while FLOOD is the most damaging for the economy.
The following section describes all the manipulations performed on the data, including reading, recoding, and creating summary variables.
First, load the required libraries and read the data from the compressed CSV file.
library(descr)
library(plyr)
library(ggplot2)
indata <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)
Create a new variable, HEALTHT, which combines the numbers of fatalities and injuries. I decided to weight fatalities by a coefficient of 10 because that is roughly the ratio of the mean number of injuries to the mean number of fatalities.
indata$HEALTHT <- indata$FATALITIES * 10 + indata$INJURIES
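The choice of coefficient can be checked with a one-line ratio of means. Below is a minimal sketch on a small made-up data frame (`toy` and its values are invented for illustration, not taken from the storm data); on the real data the same ratio would be computed from `indata$INJURIES` and `indata$FATALITIES`.

```r
# Toy rows standing in for the storm data (values invented for illustration)
toy <- data.frame(
  FATALITIES = c(0, 1, 0, 2, 0),
  INJURIES   = c(3, 12, 0, 14, 1)
)

# Ratio of mean injuries to mean fatalities; a value near 10 would
# motivate weighting each fatality as roughly ten injuries
ratio <- mean(toy$INJURIES) / mean(toy$FATALITIES)

# Weighted index: fatalities scaled by the rounded coefficient
weight <- 10
toy$HEALTHT <- toy$FATALITIES * weight + toy$INJURIES
print(ratio)
print(toy)
```

The coefficient is a judgment call; a different weight would change the index values and could change the ranking of events that cause many injuries but few deaths.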
The economic data is recorded as property damage and crop damage. Each type is stored in two variables: the first is a number and the second indicates the unit (K, M, or B, for thousands, millions, or billions). I recode all the data into millions and then create a summary variable for total economic damage.
indata$PROPMULT <- 0
indata$PROPMULT[indata$PROPDMGEXP == "k"] <- 0.001
indata$PROPMULT[indata$PROPDMGEXP == "K"] <- 0.001
indata$PROPMULT[indata$PROPDMGEXP == "m"] <- 1
indata$PROPMULT[indata$PROPDMGEXP == "M"] <- 1
indata$PROPMULT[indata$PROPDMGEXP == "b"] <- 1000
indata$PROPMULT[indata$PROPDMGEXP == "B"] <- 1000
indata$PROPDMGr <- indata$PROPDMG * indata$PROPMULT
indata$CROPMULT <- 0
indata$CROPMULT[indata$CROPDMGEXP == "k"] <- 0.001
indata$CROPMULT[indata$CROPDMGEXP == "K"] <- 0.001
indata$CROPMULT[indata$CROPDMGEXP == "m"] <- 1
indata$CROPMULT[indata$CROPDMGEXP == "M"] <- 1
indata$CROPMULT[indata$CROPDMGEXP == "b"] <- 1000
indata$CROPMULT[indata$CROPDMGEXP == "B"] <- 1000
indata$CROPDMGr <- indata$CROPDMG * indata$CROPMULT
indata$TOTDMGr.M <- indata$PROPDMGr + indata$CROPDMGr
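The line-by-line recoding above could also be written as a lookup in a named vector. The sketch below shows that alternative on made-up rows; `unit_to_millions`, `exp_to_mult`, and `toy` are hypothetical names introduced here, not part of the analysis. As in the code above, any unit code other than K, M, or B (in either case) gets a multiplier of 0.

```r
# Multipliers that convert each unit code to millions
unit_to_millions <- c(K = 0.001, M = 1, B = 1000)

# Hypothetical helper: map unit codes to multipliers, ignoring case;
# unrecognized or blank codes get 0, matching the recoding above
exp_to_mult <- function(codes) {
  mult <- unname(unit_to_millions[toupper(codes)])
  mult[is.na(mult)] <- 0
  mult
}

# Toy rows (values invented for illustration)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 5),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)
toy$PROPDMGr <- toy$PROPDMG * exp_to_mult(toy$PROPDMGEXP)
print(toy)
```

The same helper would serve both PROPDMGEXP and CROPDMGEXP, which avoids repeating the six assignments for each variable.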
Recode EVTYPE as a factor variable for further analysis and charting.
indata$EVENTT <- as.factor(indata$EVTYPE)
The data file contains limited information on the effect of storms on public health: just the numbers of fatalities and injuries. Let's look at these results along with the summary variable (Health Damage Index) calculated above.
summHtable <- ddply(indata, .(EVENTT), summarize, StormHD.I = sum(HEALTHT),
StormHD.F = sum(FATALITIES), StormHD.Inj = sum(INJURIES))
summHtable <- arrange(summHtable, desc(StormHD.I))
summHtable10 <- summHtable[1:10, ]
summHtable10
##            EVENTT StormHD.I StormHD.F StormHD.Inj
## 1         TORNADO    147676      5633       91346
## 2  EXCESSIVE HEAT     25555      1903        6525
## 3       LIGHTNING     13390       816        5230
## 4       TSTM WIND     11997       504        6957
## 5     FLASH FLOOD     11557       978        1777
## 6           FLOOD     11489       470        6789
## 7            HEAT     11470       937        2100
## 8     RIP CURRENT      3912       368         232
## 9       HIGH WIND      3617       248        1137
## 10   WINTER STORM      3381       206        1321
mxlimits <- as.character(summHtable10$EVENTT)
ggplot(summHtable10, aes(x = EVENTT, y = StormHD.I)) + xlim(mxlimits) + geom_line(aes(group = 1),
colour = "#000099") + geom_point(size = 3, colour = "#CC0000") + ggtitle("Most harmful events with respect to population health") +
xlab("event") + ylab("Health Damage Index (10*FATALITIES+INJURIES)")
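In the plot code above, `xlim(mxlimits)` keeps the events in ranked order on the x axis; without it, a discrete axis follows the factor level order, which is alphabetical by default. An alternative that I did not use here is to fix the order in the factor levels themselves. A minimal sketch, using the top three rows of the health table (`top3` is a name introduced for illustration):

```r
# Top three rows of the health table, in ranked order
top3 <- data.frame(
  EVENTT    = c("TORNADO", "EXCESSIVE HEAT", "LIGHTNING"),
  StormHD.I = c(147676, 25555, 13390),
  stringsAsFactors = FALSE
)

# Default factor levels are sorted alphabetically
default_levels <- levels(factor(top3$EVENTT))
print(default_levels)

# Setting the levels explicitly preserves the ranking
top3$EVENTT <- factor(top3$EVENTT, levels = top3$EVENTT)
print(levels(top3$EVENTT))
```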
The most dangerous storm type is TORNADO on all the parameters measured: summary index, fatalities, and injuries.
Run the same analysis for the economic damage data.
summEtable <- ddply(indata, .(EVENTT), summarize, StormED.M = sum(TOTDMGr.M),
StormED.P.M = sum(PROPDMGr), StormED.C.M = sum(CROPDMGr))
summEtable <- arrange(summEtable, desc(StormED.M))
summEtable10 <- summEtable[1:10, ]
summEtable10
##               EVENTT StormED.M StormED.P.M StormED.C.M
## 1              FLOOD    150320      144658    5661.968
## 2  HURRICANE/TYPHOON     71914       69306    2607.873
## 3            TORNADO     57352       56937     414.953
## 4        STORM SURGE     43324       43324       0.005
## 5               HAIL     18758       15732    3025.954
## 6        FLASH FLOOD     17562       16141    1421.317
## 7            DROUGHT     15019        1046   13972.566
## 8          HURRICANE     14610       11868    2741.910
## 9        RIVER FLOOD     10148        5119    5029.459
## 10         ICE STORM      8967        3945    5022.114
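The per-event totals above come from `ddply` in plyr. For readers without plyr, the same group, sum, and sort steps can be sketched in base R with `aggregate` and `order`. The rows below are made up for illustration (`toy` and `summ` are names introduced here); only the column names follow the analysis.

```r
# Toy rows with damage already converted to millions (values invented)
toy <- data.frame(
  EVENTT   = c("FLOOD", "HAIL", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMGr = c(10, 2, 5, 0.5, 1),
  CROPDMGr = c(1, 3, 0, 8, 0.5),
  stringsAsFactors = FALSE
)
toy$TOTDMGr.M <- toy$PROPDMGr + toy$CROPDMGr

# Sum each damage column within each event type
summ <- aggregate(cbind(TOTDMGr.M, PROPDMGr, CROPDMGr) ~ EVENTT,
                  data = toy, FUN = sum)

# Sort by total damage, largest first, and show the top rows
summ <- summ[order(-summ$TOTDMGr.M), ]
print(head(summ, 2))
```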
mxlimits <- as.character(summEtable10$EVENTT)
ggplot(summEtable10, aes(x = EVENTT, y = StormED.M)) + xlim(mxlimits) + geom_line(aes(group = 1),
colour = "#000099") + geom_point(size = 3, colour = "#CC0000") + ggtitle("Events with the greatest economic consequences") +
xlab("event") + ylab("Total economic damage (millions)")
The most economically damaging event is FLOOD, followed by a group of three: HURRICANE/TYPHOON, TORNADO, and STORM SURGE.
The most dangerous event is TORNADO; the most costly is FLOOD. It would be interesting to run a similar analysis by region.