This report describes the consequences of storm events in the USA from 1950 to 2011. We address two questions: which types of events are most harmful to population health, and which types of events have the greatest economic consequences?
We found that tornadoes had the greatest impact on population health, in both fatalities and injuries, and that hurricanes had the greatest economic consequences over this period.
We obtained the data on storm consequences across the USA between 1950 and 2011 from the Coursera Reproducible Research class.
Reading the file can take a few minutes.
storm <- read.csv("stormData.csv.bz2", na.strings = "")
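To illustrate what treating empty strings as missing values does, here is a small sketch that reads inline text rather than the real file. The three sample rows are invented for illustration and are not taken from the storm data set.

```r
# Toy CSV (invented rows, not taken from the storm data set)
csv_text <- "EVTYPE,PROPDMG,PROPDMGEXP
TORNADO,25,K
HAIL,0,
FLOOD,1.5,M"

# With na.strings = "", empty fields are read as NA instead of ""
toy <- read.csv(text = csv_text, na.strings = "", stringsAsFactors = FALSE)

# Count the rows whose exponent is missing
n_missing <- sum(is.na(toy$PROPDMGEXP))
print(toy)
print(n_missing)
```

Reading empty exponent fields as NA makes it easy to tell records without a recorded exponent apart from records with one.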
We load the packages and keep only the columns needed for the analysis.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
data <- select(storm, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, REFNUM)
head(data, n=3)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP REFNUM
## 1 TORNADO 0 15 25.0 K 0 <NA> 1
## 2 TORNADO 0 0 2.5 K 0 <NA> 2
## 3 TORNADO 0 2 25.0 K 0 <NA> 3
We sum the injuries and fatalities for each event type.
harm <- group_by(data, EVTYPE) %>%
  summarize(INJURIES = sum(INJURIES), FATALITIES = sum(FATALITIES))
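As a cross-check of the grouping logic, the same per-event sums can be sketched in base R with `aggregate()`. The data frame below is made up for illustration (the name `toy_harm` and all values are assumptions, not real records).

```r
# Toy records (invented values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  INJURIES   = c(15, 2, 3, 0),
  FATALITIES = c(0, 1, 2, 4),
  stringsAsFactors = FALSE
)

# Sum injuries and fatalities within each event type
toy_harm <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by injuries, largest first
toy_harm <- toy_harm[order(-toy_harm$INJURIES), ]
print(toy_harm)
```

Each event type ends up on a single row, so sorting that table by one column gives the ranking for that kind of harm.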
Top 6 event types by injuries:
injuries <- arrange(harm, desc(INJURIES))
head(injuries[,-3])
## # A tibble: 6 x 2
## EVTYPE INJURIES
## <fctr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
Top 6 event types by fatalities:
fatalities <- arrange(harm, desc(FATALITIES))
head(fatalities[,-2])
## # A tibble: 6 x 2
## EVTYPE FATALITIES
## <fctr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
Tornadoes caused the most injuries and the most fatalities between 1950 and 2011.
We can plot the results for injuries:
injuries10 <- head(injuries, n=10)
ggplot(data = injuries10, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity", fill = "navy") +
  coord_flip() +
  labs(title = "Number of injuries by type of events, top 10", x = "Event types", y = "Number of injuries")
And for fatalities:
fatalities10 <- head(fatalities, n=10)
ggplot(data = fatalities10, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "orange") +
  coord_flip() +
  labs(title = "Number of fatalities by type of events, top 10", x = "Event types", y = "Number of fatalities")
To calculate the value of the damage, we first have to replace the letter exponents with numeric multipliers. We start with property damage.
data$pd <- 0
data[data$PROPDMGEXP %in% c("H","h"),]$pd <- 100
data[data$PROPDMGEXP %in% c("K","k"),]$pd <- 1000
data[data$PROPDMGEXP %in% c("M","m"),]$pd <- 10^6
data[data$PROPDMGEXP %in% c("B","b"),]$pd <- 10^9
data$propTotal <- data$PROPDMG * data$pd
Then we do the same for crop damage.
data$cd <- 0
data[data$CROPDMGEXP %in% c("K","k"),]$cd <- 1000
data[data$CROPDMGEXP %in% c("M","m"),]$cd <- 10^6
data[data$CROPDMGEXP %in% c("B","b"),]$cd <- 10^9
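# Crop damage uses the crop multiplier cd (multiplying by pd would apply the property exponent instead)
data$cropTotal <- data$CROPDMG * data$cd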
We then calculate the total damage (crop + property).
data$totalDmg <- data$propTotal + data$cropTotal
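The exponent handling can also be sketched with a single lookup vector. The helper name `exp_to_multiplier` and the toy rows are assumptions made for illustration. Unlike the steps above, this sketch also accepts `H` for crop damage, and it treats every other code (including `NA`) as a multiplier of 0.

```r
# Map an exponent letter to its multiplier (hypothetical helper).
# Unknown codes and NA give 0, matching the default used in this report.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 100, K = 1000, M = 10^6, B = 10^9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy records (invented values)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "m", NA),
  CROPDMG    = c(0, 10, 3),
  CROPDMGEXP = c(NA, "K", "B"),
  stringsAsFactors = FALSE
)

# Each damage column is scaled by its own exponent column
toy$propTotal <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy$cropTotal <- toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
toy$totalDmg  <- toy$propTotal + toy$cropTotal
print(toy)
```

Keeping the mapping in one place means property and crop damage are always scaled by the same rule, each from its own exponent column.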
We can now find which event type caused the most damage.
dmg <- group_by(data, EVTYPE) %>%
summarize(totalDmg=sum(totalDmg))
dmg <- arrange(dmg, desc(totalDmg))
head(dmg)
## # A tibble: 6 x 2
## EVTYPE totalDmg
## <fctr> <dbl>
## 1 HURRICANE 814750235010
## 2 HURRICANE/TYPHOON 802074291330
## 3 FLOOD 231909682070
## 4 TORNADO 85207032660
## 5 FLASH FLOOD 54962948390
## 6 STORM SURGE 43328536000
Hurricanes caused the most expensive damage over the period. Let’s plot it.
dmg10 <- head(dmg, n=10)
ggplot(data = dmg10, aes(x = reorder(EVTYPE, totalDmg), y = totalDmg)) +
  geom_bar(stat = "identity", fill = "green") +
  coord_flip() +
  labs(title = "Cost of the damages by event type, top 10", x = "Event types", y = "Damages, in $")
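The damage table lists HURRICANE and HURRICANE/TYPHOON as separate event types. If one wanted to count them as a single group, the labels could be merged before summing. The sketch below shows one possible way on invented totals. The grouping rule (every label starting with HURRICANE) is an assumption chosen for illustration, not a rule used in this report.

```r
# Toy totals (invented values, not the figures of this report)
toy <- data.frame(
  EVTYPE   = c("HURRICANE", "HURRICANE/TYPHOON", "FLOOD", "hurricane "),
  totalDmg = c(10, 20, 25, 5),
  stringsAsFactors = FALSE
)

# Normalise case and whitespace, then fold label variants into one group
# (hypothetical rule: any label starting with HURRICANE)
label <- toupper(trimws(toy$EVTYPE))
label[grepl("^HURRICANE", label)] <- "HURRICANE"

# Re-aggregate on the cleaned labels and rank by total damage
merged <- aggregate(toy$totalDmg, by = list(EVTYPE = label), FUN = sum)
names(merged)[2] <- "totalDmg"
merged <- merged[order(-merged$totalDmg), ]
print(merged)
```

Whether such a merge is appropriate depends on how the event types are defined, so any rule of this kind should be checked against the data documentation.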