Consequences of storm events between 1950 and 2011

Synopsis

The aim of this report is to describe the consequences of storm events in the USA from 1950 to 2011. To that end, we try to answer two questions: which types of events are most harmful to population health, and which types of events have the greatest economic consequences?
We found that tornadoes had the greatest impact on population health, in terms of both fatalities and injuries, and that hurricanes had the greatest economic consequences over this period.

Data processing

We obtained the data on the consequences of storms across the USA between 1950 and 2011 from the Coursera Reproducible Research class.

Reading the data

This step can take a few minutes, so some patience may be needed.

storm <- read.csv("stormData.csv.bz2", na.strings="")
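
If the read is slow, one option is to skip unneeded columns while reading. The sketch below is not part of the original analysis: it uses a toy CSV written to a temporary file, and the column names and positions in it are made up for illustration. It relies on the `colClasses` argument of `read.csv`, where the string "NULL" drops a column. Whether this noticeably shortens the read of the real file is an assumption that would need to be measured.

```r
# Toy CSV written to a temporary file (a stand-in, not the storm data)
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(EVTYPE     = c("TORNADO", "FLOOD"),
             REMARKS    = c("long free text", "more text"),
             FATALITIES = c(0, 2)),
  tmp, row.names = FALSE
)

# One colClasses entry per column, in file order:
# "NULL" skips the column entirely, the others force a type
small <- read.csv(tmp, na.strings = "",
                  colClasses = c("character", "NULL", "numeric"))
print(names(small))
```

With the real file, the `colClasses` vector would need one entry per column of the CSV, in the order they appear.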

We load the packages we need and keep only the columns used in the analysis.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
data <- select(storm, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, REFNUM)
head(data, n=3)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP REFNUM
## 1 TORNADO          0       15    25.0          K       0       <NA>      1
## 2 TORNADO          0        0     2.5          K       0       <NA>      2
## 3 TORNADO          0        2    25.0          K       0       <NA>      3

Results

Harm to the population, by event type

We calculate the sum of injuries and fatalities, for each event type.

harm <- group_by(data, EVTYPE) %>% 
        summarize(INJURIES=sum(INJURIES), FATALITIES=sum(FATALITIES))

Top 6 event types for injuries:

injuries <- arrange(harm, desc(INJURIES))
head(injuries[,-3])
## # A tibble: 6 x 2
##           EVTYPE INJURIES
##           <fctr>    <dbl>
## 1        TORNADO    91346
## 2      TSTM WIND     6957
## 3          FLOOD     6789
## 4 EXCESSIVE HEAT     6525
## 5      LIGHTNING     5230
## 6           HEAT     2100

Top 6 event types for fatalities:

fatalities <- arrange(harm, desc(FATALITIES))
head(fatalities[,-2])
## # A tibble: 6 x 2
##           EVTYPE FATALITIES
##           <fctr>      <dbl>
## 1        TORNADO       5633
## 2 EXCESSIVE HEAT       1903
## 3    FLASH FLOOD        978
## 4           HEAT        937
## 5      LIGHTNING        816
## 6      TSTM WIND        504

Tornadoes caused the most injuries and the most fatalities between 1950 and 2011.
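
As a cross-check that does not depend on dplyr, the same per-event sums and ranking can be sketched in base R with `aggregate()` and `order()`. The data frame `toy` and the result `harm_toy` below are made-up names and values for illustration, not rows from the storm data.

```r
# Toy data: made-up values, not taken from the storm data set
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 3, 2, 1),
  INJURIES   = c(10, 4, 5, 0, 6)
)

# Sum injuries and fatalities within each event type
harm_toy <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by number of injuries, largest first
harm_toy <- harm_toy[order(-harm_toy$INJURIES), ]
print(harm_toy)
```

Applied to the real `data` object, the totals should agree with the `harm` table computed with `group_by` and `summarize`.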

We can plot the results for injuries:

injuries10 <- head(injuries, n=10)
ggplot(data=injuries10, aes(x=reorder(EVTYPE, INJURIES), y=INJURIES))+geom_bar(stat="identity", fill="navy")+coord_flip()+labs(title="Number of injuries by type of events, top 10", x="Event types", y="Number of injuries")

And for fatalities:

fatalities10 <- head(fatalities, n=10)
ggplot(data=fatalities10, aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES))+geom_bar(stat="identity", fill="orange")+coord_flip()+labs(title="Number of fatalities by type of events, top 10", x="Event types", y="Number of fatalities")

Economic consequences by event type

First we have to replace the exponent letters with numeric multipliers in order to compute the value of the damage, starting with property damage.

data$pd <- 0
data[data$PROPDMGEXP %in% c("H","h"),]$pd <- 100
data[data$PROPDMGEXP %in% c("K","k"),]$pd <- 1000
data[data$PROPDMGEXP %in% c("M","m"),]$pd <- 10^6
data[data$PROPDMGEXP %in% c("B","b"),]$pd <- 10^9
data$propTotal <- data$PROPDMG * data$pd

Then we do the same for crop damage.

data$cd <- 0
data[data$CROPDMGEXP %in% c("K","k"),]$cd <- 1000
data[data$CROPDMGEXP %in% c("M","m"),]$cd <- 10^6
data[data$CROPDMGEXP %in% c("B","b"),]$cd <- 10^9
# Multiply by the crop multiplier cd (pd is the property multiplier)
data$cropTotal <- data$CROPDMG * data$cd
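
The letter-to-number conversion for the PROPDMGEXP and CROPDMGEXP columns can also be sketched as a single lookup applied to both columns, so that each damage column is always paired with its own exponent column. The helper `exp_to_multiplier` and the data frame `toy` are hypothetical names introduced here for illustration. As in the code above, any code other than H, K, M or B (including NA) is assumed to mean a multiplier of 0.

```r
# Hypothetical helper: map an exponent letter to its numeric multiplier.
# Codes not listed (digits, "+", "-", "?", NA) fall back to 0.
exp_to_multiplier <- function(exp_code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy data: made-up values, not taken from the storm data set
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3),
  PROPDMGEXP = c("K", "m", NA, "B"),
  CROPDMG    = c(0, 10, 5, 0),
  CROPDMGEXP = c(NA, "k", "M", "?")
)

# Each damage column is multiplied by the multiplier of its own exponent column
toy$propTotal <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy$cropTotal <- toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
toy$totalDmg  <- toy$propTotal + toy$cropTotal
print(toy[, c("propTotal", "cropTotal", "totalDmg")])
```

Using one function for both columns removes the need for the intermediate `pd` and `cd` columns, which makes it harder to apply the wrong multiplier by accident.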

And calculate the total damage (crop + property).

data$totalDmg <- data$propTotal + data$cropTotal

We can now find which event types caused the most damage.

dmg <- group_by(data, EVTYPE) %>% 
        summarize(totalDmg=sum(totalDmg))
dmg <- arrange(dmg, desc(totalDmg))
head(dmg)
## # A tibble: 6 x 2
##              EVTYPE     totalDmg
##              <fctr>        <dbl>
## 1         HURRICANE 814750235010
## 2 HURRICANE/TYPHOON 802074291330
## 3             FLOOD 231909682070
## 4           TORNADO  85207032660
## 5       FLASH FLOOD  54962948390
## 6       STORM SURGE  43328536000

In this table, the two hurricane categories (HURRICANE and HURRICANE/TYPHOON) show the highest total damage over the period. Let’s plot the top 10.

dmg10 <- head(dmg, n=10)
ggplot(data=dmg10, aes(x=reorder(EVTYPE, totalDmg), y=totalDmg))+geom_bar(stat="identity", fill="green")+coord_flip()+labs(title="Cost of the damages by event type, top 10", x="Event types", y="Damages, in $")