Synopsis

In this report we describe which weather events in the United States between 1950 and 2011 were most harmful to population health and which had the greatest economic consequences. For our analysis we used storm data obtained from the National Weather Service. From these data, we found that tornadoes caused the most fatalities and injuries. Furthermore, floods had the greatest economic consequences.

Packages

For our analysis we used the following packages on top of the R base package:

library(plyr)
## Warning: package 'plyr' was built under R version 3.1.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.3
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.1.3

Data Processing

From the National Weather Service we download data on weather events for the years between 1950 and 2011.

tmp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2", tmp)

We read the bzip2-compressed CSV file into a data frame and group the data by event type.

data <- read.csv(tmp)
data <- group_by(data, EVTYPE)
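The read step relies on read.csv() handling compressed input even though the temporary file has no .bz2 extension. The following is a minimal sketch of that behaviour on a toy file; the data values are made up for illustration and only the column names are borrowed from the storm data.

```r
# Toy stand-in for the storm data (values are illustrative, not real records).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write a bzip2-compressed CSV to a temporary path without a .bz2 extension.
path <- tempfile()
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression
# from the file contents rather than from the file name.
roundtrip <- read.csv(path, stringsAsFactors = FALSE)
print(roundtrip)
```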

Then we calculate the total number of fatalities and select the ten event types with the highest total number of fatalities.

fatalities <- summarise(data, number = sum(FATALITIES))
fatalities <- arrange(fatalities, desc(number))[1:10, ]
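To make the group, sum and rank step concrete, here is a small base R sketch on made-up data. It is not the code used in this report: tapply() and sort() stand in for summarise() and arrange(), and the toy values are assumptions chosen only to show the mechanics.

```r
# Made-up observations; several rows share an event type.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 7, 4, 1, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, then keep the three largest totals.
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
top <- head(sort(totals, decreasing = TRUE), 3)
print(top)
```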

We do the same for the total number of injuries.

injuries <- summarise(data, number = sum(INJURIES))
injuries <- arrange(injuries, desc(number))[1:10, ]

Property damage is represented by two fields: an amount PROPDMG in dollars and an exponent code PROPDMGEXP. We calculate the total property damage for each observation by multiplying PROPDMG by 10 raised to the power encoded in PROPDMGEXP. To do so, we first have to decode PROPDMGEXP to a numeric value.

data$NUM.PROPDMGEXP <- revalue(data$PROPDMGEXP, c("?" = "0", "-" = "0", "+" = "0", "h" = "2", "H" = "2", "K" = "3", "m" = "6", "M" = "6", "B" = "9"))
data$NUM.PROPDMGEXP[data$PROPDMGEXP == ""] <- "0"
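# Convert through as.character(): as.numeric() on a factor returns the level codes, not the digits.
data$TOT.PROPDMG <- data$PROPDMG * 10^(as.numeric(as.character(data$NUM.PROPDMGEXP)))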
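As a worked example of the decoding arithmetic, the sketch below applies the same code-to-exponent mapping to a few made-up amounts. The helper decode_exponent() and the sample values are hypothetical and are not part of the analysis code; single digits are assumed to stand for themselves and an empty code is assumed to mean an exponent of 0.

```r
# Lookup table from exponent code to power of ten, following the mapping above.
exponent <- c("?" = 0, "-" = 0, "+" = 0, "h" = 2, "H" = 2,
              "K" = 3, "m" = 6, "M" = 6, "B" = 9)

# Hypothetical helper: turn exponent codes into numeric powers of ten.
decode_exponent <- function(code) {
  code <- as.character(code)
  power <- unname(exponent[code])            # NA for codes not in the table
  digits <- grepl("^[0-9]$", code)
  power[digits] <- as.numeric(code[digits])  # single digits stand for themselves
  power[code == ""] <- 0                     # empty code: no multiplier
  power
}

amount <- c(2.5, 10, 1.2, 5)
code <- c("K", "M", "B", "")
total <- amount * 10^decode_exponent(code)
print(total)
```

With these inputs, 2.5 with code "K" becomes 2.5 * 10^3 dollars, and the amount with an empty code is left unchanged.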

We do the same for crop damage.

data$NUM.CROPDMGEXP <- revalue(data$CROPDMGEXP, c("?" = "0", "k" = "3", "K" = "3", "m" = "6", "M" = "6", "B" = "9"))
data$NUM.CROPDMGEXP[data$CROPDMGEXP == ""] <- "0"
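# Convert through as.character(): as.numeric() on a factor returns the level codes, not the digits.
data$TOT.CROPDMG <- data$CROPDMG * 10^(as.numeric(as.character(data$NUM.CROPDMGEXP)))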
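One detail of the conversion is worth checking. In R versions before 4.0.0, read.csv() reads text columns as factors by default, and as.numeric() applied directly to a factor returns the internal level codes rather than the numbers shown in the labels. The toy example below illustrates the difference; the vector is made up and does not come from the storm data.

```r
# A factor whose labels look like numbers (levels are "3", "6", "9").
codes <- factor(c("3", "6", "9", "3"))

# Direct conversion yields the level codes.
as_codes <- as.numeric(codes)

# Converting via character yields the labelled values.
as_values <- as.numeric(as.character(codes))

print(as_codes)
print(as_values)
```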

Finally we calculate the total property damage by event type and select the ten event types which have caused the largest damage to property.

propdmg <- summarise(data, damage = sum(TOT.PROPDMG))
propdmg <- arrange(propdmg, desc(damage))[1:10, ]

We do the same for the total crop damage.

cropdmg <- summarise(data, damage = sum(TOT.CROPDMG))
cropdmg <- arrange(cropdmg, desc(damage))[1:10, ]

Results

To answer the question of which types of weather events are most harmful to public health, we create two ordered lists. The first lists the ten types of weather events that caused the most fatalities across the USA between 1950 and 2011.

print(fatalities)
## Source: local data frame [10 x 2]
## 
##            EVTYPE number
## 1         TORNADO   5633
## 2  EXCESSIVE HEAT   1903
## 3     FLASH FLOOD    978
## 4            HEAT    937
## 5       LIGHTNING    816
## 6       TSTM WIND    504
## 7           FLOOD    470
## 8     RIP CURRENT    368
## 9       HIGH WIND    248
## 10      AVALANCHE    224

The second lists the ten types of weather events that caused the most injuries.

print(injuries)
## Source: local data frame [10 x 2]
## 
##               EVTYPE number
## 1            TORNADO  91346
## 2          TSTM WIND   6957
## 3              FLOOD   6789
## 4     EXCESSIVE HEAT   6525
## 5          LIGHTNING   5230
## 6               HEAT   2100
## 7          ICE STORM   1975
## 8        FLASH FLOOD   1777
## 9  THUNDERSTORM WIND   1488
## 10              HAIL   1361

Next we create a pair of graphs of total fatalities and total injuries caused by these most harmful weather events.

plot1 <- ggplot(fatalities, aes(x = reorder(EVTYPE, desc(number)), y = number)) +
         geom_bar(stat="identity") +
         ylab("Number of fatalities") +
         xlab("Event type") + 
         ggtitle("Total number of fatalities per weather event type\nacross the USA (1950-2011)") + 
         theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot2 <- ggplot(injuries, aes(x = reorder(EVTYPE, desc(number)), y = number)) +
         geom_bar(stat="identity") +
         ylab("Number of injuries") +
         xlab("Event type") +
         ggtitle("Total number of injuries per weather event type\nacross the USA (1950-2011)") + 
         theme(axis.text.x = element_text(angle = 90, hjust = 1))

grid.arrange(plot1, plot2, ncol = 1)
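The bars appear in descending order because the x aesthetic wraps EVTYPE in reorder(). The sketch below shows the effect on the factor levels, using three of the fatality totals from the table above. It uses plain negation in place of desc(), on the assumption that the two behave the same for numeric input.

```r
# Three rows taken from the fatalities table, deliberately out of order.
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "HEAT"),
                  number = c(470, 5633, 937),
                  stringsAsFactors = FALSE)

# reorder() sorts levels by increasing score, so negating the counts
# puts the event type with the largest count first.
sorted_types <- reorder(toy$EVTYPE, -toy$number)
print(levels(sorted_types))
```

ggplot2 draws a discrete axis in level order, so the first level ends up leftmost.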

Based on the above pair of bar charts, we find that tornadoes caused the most fatalities and the most injuries in the United States from 1950 to 2011.

The question of which types of weather events have the greatest economic consequences is answered by analyzing total property and crop damage. The ten event types that caused the largest overall property damage are:

print(propdmg)
## Source: local data frame [10 x 2]
## 
##               EVTYPE       damage
## 1              FLOOD 1.446577e+13
## 2  HURRICANE/TYPHOON 6.930584e+12
## 3            TORNADO 5.694738e+12
## 4        STORM SURGE 4.332354e+12
## 5        FLASH FLOOD 1.682267e+12
## 6               HAIL 1.573527e+12
## 7          HURRICANE 1.186832e+12
## 8     TROPICAL STORM 7.703891e+11
## 9       WINTER STORM 6.688497e+11
## 10         HIGH WIND 5.270046e+11

The top 10 event types that caused the largest overall crop damage are:

print(cropdmg)
## Source: local data frame [10 x 2]
## 
##                EVTYPE      damage
## 1                HAIL 60161277300
## 2               FLOOD 21753275000
## 3         FLASH FLOOD 19039070000
## 4             DROUGHT 14595735000
## 5           TSTM WIND 11320985000
## 6             TORNADO 10269737000
## 7   THUNDERSTORM WIND  6992705000
## 8           HURRICANE  2999310000
## 9           HIGH WIND  2288040000
## 10 THUNDERSTORM WINDS  2014708800

We display our findings in the following pair of bar charts.

plot1 <- ggplot(propdmg, aes(x = reorder(EVTYPE, desc(damage)), y = damage)) +
         geom_bar(stat="identity") +
         ylab("Amount of damage (in dollars)") +
         xlab("Event type") + 
         ggtitle("Total amount of damage to property per weather event type\nacross the USA (1950-2011)") + 
         theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot2 <- ggplot(cropdmg, aes(x = reorder(EVTYPE, desc(damage)), y = damage)) +
         geom_bar(stat="identity") +
         ylab("Amount of damage (in dollars)") +
         xlab("Event type") +
         ggtitle("Total amount of damage to crops per weather event type\nacross the USA (1950-2011)") + 
         theme(axis.text.x = element_text(angle = 90, hjust = 1))

grid.arrange(plot1, plot2, ncol = 1)

We find that, overall, floods are the weather event type that has had the greatest economic consequences in the USA from 1950 to 2011.