Population and Economic Costs of Weather Events in the US (1950-2011)

Synopsis

This report shows the effect of the most dangerous weather events on population health and the economy across the US. As shown below, tornadoes cause more deaths and injuries than any other weather event. Economically, floods caused the most damage (property and crop damage combined). A plot of the yearly frequency of the most dangerous weather events is also presented.

Data Processing

Data was obtained from the Peer Assignment 2 webpage. The file was saved as ‘stormdata.csv.bz2’. It is a database of major storms and weather events in the United States from the beginning of 1950 to November 2011. Because the database is compressed, it was read using the read.csv and bzfile commands.

dat <- read.csv(bzfile("stormdata.csv.bz2"))
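To illustrate how read.csv and bzfile work together, here is a minimal sketch on a tiny made-up file. The temporary path and the two toy rows are assumptions for illustration only; they are not part of the storm database.

```r
# Toy stand-in for the storm data file; the path and contents are made up.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1)),
          con, row.names = FALSE)
close(con)

# read.csv decompresses on the fly when given a bzfile connection.
toy <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
dim(toy)
```

The compressed file never has to be unpacked on disk, which is convenient for a file of this size.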

The data.frame dat has dimensions:

dim(dat)
## [1] 902297     37

The columns already have names, so there is no need to set them:

names(dat)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

The variables I’ll work with are BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP.

suppressMessages(library(dplyr))
dat <- select(dat, BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

Modifying variables

Before starting, I need to modify some variables. First, the dates:

dat$BGN_DATE <- as.Date(as.character(dat$BGN_DATE), "%m/%d/%Y")
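As a small illustration of this conversion, the sketch below parses two sample strings. The sample values, including the trailing time part, are an assumption about what the raw BGN_DATE strings look like; as.Date reads only as much of each string as the format needs and ignores the rest.

```r
# Assumed examples of raw BGN_DATE strings (month/day/year plus a time part).
raw <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Only the "%m/%d/%Y" part is parsed; trailing characters are ignored.
parsed <- as.Date(raw, "%m/%d/%Y")

# The year can later be extracted with format().
format(parsed, "%Y")
```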

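Next, the damage exponent codes are converted to numeric multipliers. The code is not neat, but it is explicit; note that blank, "+", "-" and "?" codes get a multiplier of 0. After that I added a new variable TD (total damage), which combines property and crop damage in dollars.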

dat$PROPDMGEXP <- as.character(dat$PROPDMGEXP)
dat$PROPDMGEXP[dat$PROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
dat$PROPDMGEXP[dat$PROPDMGEXP == "B"] <- "1000000000"
dat$PROPDMGEXP[dat$PROPDMGEXP %in% c("M", "m")] <- "1000000"
dat$PROPDMGEXP[dat$PROPDMGEXP == "K"] <- "1000"
dat$PROPDMGEXP[dat$PROPDMGEXP %in% c("H", "h")] <- "100"
dat$PROPDMGEXP[dat$PROPDMGEXP == "1"] <- "10"
dat$PROPDMGEXP[dat$PROPDMGEXP == "2"] <- "100"
dat$PROPDMGEXP[dat$PROPDMGEXP == "3"] <- "1000"
dat$PROPDMGEXP[dat$PROPDMGEXP == "4"] <- "10000"
dat$PROPDMGEXP[dat$PROPDMGEXP == "5"] <- "100000"
dat$PROPDMGEXP[dat$PROPDMGEXP == "6"] <- "1000000"
dat$PROPDMGEXP[dat$PROPDMGEXP == "7"] <- "10000000"
dat$PROPDMGEXP[dat$PROPDMGEXP == "8"] <- "100000000"
dat$PROPDMGEXP <- as.numeric(dat$PROPDMGEXP)
levels(factor(dat$PROPDMGEXP))
##  [1] "0"     "10"    "100"   "1000"  "10000" "1e+05" "1e+06" "1e+07"
##  [9] "1e+08" "1e+09"
dat$CROPDMGEXP <- as.character(dat$CROPDMGEXP)
dat$CROPDMGEXP[dat$CROPDMGEXP %in% c("", "?")] <- "0"
dat$CROPDMGEXP[dat$CROPDMGEXP == "B"] <- "1000000000"
dat$CROPDMGEXP[dat$CROPDMGEXP %in% c("M", "m")] <- "1000000"
dat$CROPDMGEXP[dat$CROPDMGEXP %in% c("K", "k")] <- "1000"
dat$CROPDMGEXP[dat$CROPDMGEXP == "2"] <- "100"
dat$CROPDMGEXP <- as.numeric(dat$CROPDMGEXP)
levels(factor(dat$CROPDMGEXP))
## [1] "0"     "100"   "1000"  "1e+06" "1e+09"
dat$TD <- dat$PROPDMG*dat$PROPDMGEXP + dat$CROPDMG*dat$CROPDMGEXP
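The same mapping can also be expressed as a lookup table. The following is only a sketch of an alternative, not the code used for this report. The function name exp_to_multiplier is made up for illustration, and unlike the code above it treats letter codes case-insensitively.

```r
# Hypothetical helper: map exponent codes to numeric multipliers.
exp_to_multiplier <- function(code) {
  lookup <- c(B = 1e9, M = 1e6, K = 1e3, H = 1e2,
              "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
              "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8)
  mult <- lookup[toupper(as.character(code))]
  # Anything not in the table ("", "0", "+", "-", "?") gets multiplier 0,
  # mirroring the choice made in the code above.
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy values: 25 with code "K" is 25000 dollars; a blank code contributes 0.
propdmg <- c(25, 2.5, 10)
propexp <- c("K", "M", "")
propdmg * exp_to_multiplier(propexp)
```

A lookup table keeps all codes in one place, so adding or changing a code means editing a single line.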

Combining weather events

Here are the top 10 weather types by fatalities and by total damage:

suppressMessages(library(dplyr))
head(dat %>% group_by(EVTYPE) %>% summarise(Fatal=sum(FATALITIES)) %>% arrange(desc(Fatal)), 10)
## Source: local data frame [10 x 2]
## 
##            EVTYPE Fatal
##            (fctr) (dbl)
## 1         TORNADO  5633
## 2  EXCESSIVE HEAT  1903
## 3     FLASH FLOOD   978
## 4            HEAT   937
## 5       LIGHTNING   816
## 6       TSTM WIND   504
## 7           FLOOD   470
## 8     RIP CURRENT   368
## 9       HIGH WIND   248
## 10      AVALANCHE   224
head(dat %>% group_by(EVTYPE) %>% summarise(Damage=sum(TD)) %>% arrange(desc(Damage)), 10)
## Source: local data frame [10 x 2]
## 
##               EVTYPE       Damage
##               (fctr)        (dbl)
## 1              FLOOD 150319678250
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57362333590
## 4        STORM SURGE  43323541000
## 5               HAIL  18761221670
## 6        FLASH FLOOD  18243990610
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041310

It’s easy to see that some types must be merged (for example, HEAT and EXCESSIVE HEAT, or HURRICANE and HURRICANE/TYPHOON). Which types should be merged remains an open question. I decided to merge the data this way:

dat$type <- as.character(dat$EVTYPE)
dat$type[grepl("torn", dat$type, ignore.case = TRUE)] <- "TORNADO"
dat$type[grepl("thunderstorm|tstm", dat$type, ignore.case = TRUE)] <- "THUNDERSTORM"
dat$type[grepl("flood", dat$type, ignore.case = TRUE)] <- "FLOOD"
dat$type[grepl("heat|hot", dat$type, ignore.case = TRUE)] <- "HEAT"
dat$type[grepl("hail", dat$type, ignore.case = TRUE)] <- "HAIL"
dat$type[grepl("hurric", dat$type, ignore.case = TRUE)] <- "HURRICANE"
dat$type[grepl("snow|ice|blizzard", dat$type, ignore.case = TRUE)] <- "SNOW"
dat$type[grepl("lightning", dat$type, ignore.case = TRUE)] <- "LIGHTNING"
dat$type[grepl("current", dat$type, ignore.case = TRUE)] <- "RIP CURRENT"
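Because the replacements run in sequence, the order of the rules matters: once a label has been rewritten, later patterns are tested against the new label. The sketch below applies the same rules to a few sample strings. The function name merge_types and the sample event names are chosen for illustration only.

```r
# Hypothetical helper: apply the merge rules above, in the same order.
merge_types <- function(x) {
  rules <- c("torn" = "TORNADO",
             "thunderstorm|tstm" = "THUNDERSTORM",
             "flood" = "FLOOD",
             "heat|hot" = "HEAT",
             "hail" = "HAIL",
             "hurric" = "HURRICANE",
             "snow|ice|blizzard" = "SNOW",
             "lightning" = "LIGHTNING",
             "current" = "RIP CURRENT")
  x <- as.character(x)
  for (pattern in names(rules)) {
    x[grepl(pattern, x, ignore.case = TRUE)] <- rules[[pattern]]
  }
  x
}

# "TSTM WIND/HAIL" matches both the thunderstorm and the hail rule;
# the thunderstorm rule runs first, so the hail rule no longer sees "HAIL".
merge_types(c("TSTM WIND/HAIL", "FLASH FLOOD", "Excessive Heat",
              "RIP CURRENTS", "DROUGHT"))
```

Events that match none of the patterns, such as DROUGHT here, keep their original name.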

Results

Two summary variables, plotted with ggplot2, clarify the influence on the economy and public health:

pop <- dat %>% group_by(type) %>% summarise(Fatalities = sum(FATALITIES), Injured = sum(INJURIES)) %>% arrange(desc(Fatalities))
dam <- dat %>% group_by(type) %>% summarise(Damage = sum(TD)) %>% arrange(desc(Damage))
library(ggplot2)

Effect on population health

The top 10 types of weather events causing the most injuries and deaths:

(pop %>% arrange(desc(Injured)) %>% select(type, Injured))[1:10,]
## Source: local data frame [10 x 2]
## 
##            type Injured
##           (chr)   (dbl)
## 1       TORNADO   91407
## 2  THUNDERSTORM    9544
## 3          HEAT    9224
## 4         FLOOD    8604
## 5     LIGHTNING    5231
## 6          SNOW    4123
## 7          HAIL    1371
## 8     HURRICANE    1328
## 9  WINTER STORM    1321
## 10    HIGH WIND    1137
(pop %>% arrange(desc(Fatalities)) %>% select(type, Fatalities))[1:10,]
## Source: local data frame [10 x 2]
## 
##            type Fatalities
##           (chr)      (dbl)
## 1       TORNADO       5661
## 2          HEAT       3138
## 3         FLOOD       1525
## 4     LIGHTNING        817
## 5  THUNDERSTORM        729
## 6   RIP CURRENT        577
## 7          SNOW        367
## 8     HIGH WIND        248
## 9     AVALANCHE        224
## 10 WINTER STORM        206

To compare the 5 most harmful disasters (by fatalities), look at the plot below (note that Tornado injuries are off the scale; see the table above).

library(reshape2)
pop5 <- melt(pop[1:5, ], id = "type")
ggplot(pop5, aes(x=type, y = value, fill=variable)) + geom_bar(position = "dodge", stat = "identity") + coord_cartesian(ylim = c(0,10000)) + ggtitle("5 Most harmful disasters for population health")
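The plot uses coord_cartesian to limit the y axis. A minimal sketch on made-up numbers shows why this matters: coord_cartesian zooms the view and still draws the oversized bar, whereas setting limits on the scale with ylim would remove that bar with a warning. The toy data frame below is an assumption for illustration only.

```r
library(ggplot2)

# Made-up counts: one bar is far taller than the rest.
toy <- data.frame(type = c("A", "B", "C"), value = c(90000, 9000, 3000))
base <- ggplot(toy, aes(x = type, y = value)) + geom_bar(stat = "identity")

# Zooming: bar A is cut off at the top of the panel but still drawn.
zoomed <- base + coord_cartesian(ylim = c(0, 10000))

# Scale limits: bar A falls outside the limits and is dropped with a warning.
dropped <- base + ylim(0, 10000)
```

Printing zoomed and dropped side by side makes the difference visible.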

Effect on the economy

The 10 types of weather events that cause the most damage. The totals include both property damage and crop damage.

(dam %>% arrange(desc(Damage)) %>% select(type, Damage))[1:10,]
## Source: local data frame [10 x 2]
## 
##              type       Damage
##             (chr)        (dbl)
## 1           FLOOD 180591769420
## 2       HURRICANE  90271472810
## 3         TORNADO  59020779590
## 4     STORM SURGE  43323541000
## 5            HAIL  19024451820
## 6         DROUGHT  15018672000
## 7    THUNDERSTORM  12456456330
## 8            SNOW  10917175400
## 9  TROPICAL STORM   8382236550
## 10   WINTER STORM   6715441250

The next plot compares the 5 most damaging weather events, in dollars.

ggplot(dam[1:5, ], aes(x = type, y = Damage, fill = type)) + geom_bar(stat = "identity") + ggtitle("Damage from weather events 1950-2011")

Recorded weather events through the years

First, the most dangerous weather events for both the economy and the population need to be identified. Second, the frequency of each of these disasters per year is extracted. These new variables allow the frequencies of disasters per year to be plotted.

mostDang <- union(dam$type[1:5], pop$type[1:5])
disFreq <- dat %>% filter(type %in% mostDang) %>% mutate(Year = format(BGN_DATE, "%Y")) %>% group_by(Year, type) %>% summarise(Frequency = n())
ggplot(disFreq, aes(x = Year, y = Frequency, group = type, colour = type)) + geom_line()
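The counting step can be checked on a small example. The sketch below uses four made-up events (the dates and types are assumptions, not rows from the database) and the same group_by and summarise pattern to count events per year and type.

```r
suppressMessages(library(dplyr))

# Four made-up events, for illustration only.
toy <- data.frame(
  BGN_DATE = as.Date(c("1950-04-18", "1950-06-08", "1951-05-22", "1951-07-01")),
  type     = c("TORNADO", "TORNADO", "FLOOD", "TORNADO"),
  stringsAsFactors = FALSE
)

# One row per (Year, type) pair, with the number of events in that pair.
toyFreq <- toy %>%
  mutate(Year = format(BGN_DATE, "%Y")) %>%
  group_by(Year, type) %>%
  summarise(Frequency = n()) %>%
  as.data.frame()

toyFreq
```

Note that format returns the year as a character string, so the x axis of the plot is discrete; wrapping it in as.integer would give a continuous axis instead.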

Fin.