In this report we describe which weather events in the United States between 1950 and 2011 were most harmful to population health and which had the greatest economic consequences. For our analysis we used storm data obtained from the National Weather Service. From these data, we found that tornadoes caused the most fatalities and injuries. Furthermore, floods had the greatest economic consequences.
For our analysis we used the following packages on top of the R base package:
library(plyr)
## Warning: package 'plyr' was built under R version 3.1.3
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.1.3
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:plyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.1.3
From the National Weather Service we download data on weather events for the years between 1950 and 2011.
tmp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2", tmp)
We read the bzip2-compressed CSV file into a data frame and group the data by event type.
data <- read.csv(tmp)
data <- group_by(data, EVTYPE)
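The read step relies on `read.csv()` opening a compressed file directly, so no separate decompression is needed. As a minimal sketch of that behaviour, the following writes a small bzip2-compressed CSV and reads it back. The file and its two rows are hypothetical and only illustrate the round trip.

```r
# Hypothetical two-row table, written to a bzip2-compressed temporary file
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(2, 0)),
  con,
  row.names = FALSE
)
close(con)

# read.csv() detects the compression when it opens the file for reading
back <- read.csv(path)
print(back)
```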
Then we calculate the total number of fatalities and select the ten event types with the highest total number of fatalities.
fatalities <- summarise(data, number = sum(FATALITIES))
fatalities <- arrange(fatalities, desc(number))[1:10, ]
We do the same for the total number of injuries.
injuries <- summarise(data, number = sum(INJURIES))
injuries <- arrange(injuries, desc(number))[1:10, ]
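The group, sum, sort, and truncate steps above can be cross-checked on a small example. The sketch below uses base R (`aggregate()`, `order()`, `head()`) rather than dplyr so that it runs without extra packages. The data frame `toy` and its values are hypothetical.

```r
# Hypothetical toy observations, not taken from the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 1)
)

# Sum fatalities per event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from highest to lowest total and keep the top two rows
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
top2 <- head(totals, 2)
print(top2)
```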
Property damage is represented by two fields: a number PROPDMG in dollars and a multiplier PROPDMGEXP that encodes a power of ten. We calculate the property damage for each observation by multiplying PROPDMG by 10 raised to the power encoded in PROPDMGEXP. But first we have to decode PROPDMGEXP to a numeric value.
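# Map the exponent codes to powers of ten; digit codes are left unchanged
data$NUM.PROPDMGEXP <- revalue(data$PROPDMGEXP, c("?" = "0", "-" = "0", "+" = "0", "h" = "2", "H" = "2", "K" = "3", "m" = "6", "M" = "6", "B" = "9"))
data$NUM.PROPDMGEXP[data$PROPDMGEXP == ""] <- "0"
# Convert via as.character(): as.numeric() on a factor returns level codes, not the digits
data$TOT.PROPDMG <- data$PROPDMG * 10^(as.numeric(as.character(data$NUM.PROPDMGEXP)))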
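A detail worth checking at this step: if the decoded exponent column is a factor (as `read.csv()` produces when strings are read as factors), `as.numeric()` returns the internal level codes rather than the digits stored in the labels. The short sketch below illustrates the difference on a hypothetical factor.

```r
# Hypothetical exponent column stored as a factor
exponent <- factor(c("3", "6", "9", "3"))

# Level codes: positions of the labels in levels(exponent)
codes <- as.numeric(exponent)

# Actual digits: convert the labels to character first
values <- as.numeric(as.character(exponent))

print(codes)
print(values)
```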
We do the same for crop damage.
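# Map the exponent codes to powers of ten; digit codes are left unchanged
data$NUM.CROPDMGEXP <- revalue(data$CROPDMGEXP, c("?" = "0", "k" = "3", "K" = "3", "m" = "6", "M" = "6", "B" = "9"))
data$NUM.CROPDMGEXP[data$CROPDMGEXP == ""] <- "0"
# Convert via as.character(): as.numeric() on a factor returns level codes, not the digits
data$TOT.CROPDMG <- data$CROPDMG * 10^(as.numeric(as.character(data$NUM.CROPDMGEXP)))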
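The decoding can also be expressed as a named lookup vector, which makes the mapping from exponent code to power of ten easy to inspect. This is a sketch in base R, not the `revalue()` approach used above. The data frame `toy`, the vector `exp_lookup`, and the treatment of unrecognised codes as 0 are assumptions made for illustration.

```r
# Hypothetical toy rows; amounts are illustrative only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Exponent code -> power of ten; digit codes map to themselves
exp_lookup <- c(setNames(0:9, 0:9), H = 2, K = 3, M = 6, B = 9)

# Look up each code case-insensitively; unknown or empty codes give NA
power <- unname(exp_lookup[toupper(toy$PROPDMGEXP)])
power[is.na(power)] <- 0  # assumption: treat unknown or empty codes as 10^0

toy$TOT.PROPDMG <- toy$PROPDMG * 10^power
print(toy)
```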
Finally we calculate the total property damage by event type and select the ten event types which have caused the largest damage to property.
propdmg <- summarise(data, damage = sum(TOT.PROPDMG))
propdmg <- arrange(propdmg, desc(damage))[1:10, ]
We do the same for the total crop damage.
cropdmg <- summarise(data, damage = sum(TOT.CROPDMG))
cropdmg <- arrange(cropdmg, desc(damage))[1:10, ]
To answer the question of which types of weather events are most harmful to public health, we create two ordered lists. The first lists the ten types of weather events that caused the most fatalities across the USA between 1950 and 2011.
print(fatalities)
## Source: local data frame [10 x 2]
##
##            EVTYPE number
## 1         TORNADO   5633
## 2  EXCESSIVE HEAT   1903
## 3     FLASH FLOOD    978
## 4            HEAT    937
## 5       LIGHTNING    816
## 6       TSTM WIND    504
## 7           FLOOD    470
## 8     RIP CURRENT    368
## 9       HIGH WIND    248
## 10      AVALANCHE    224
The second lists the ten types of weather events that caused the most injuries.
print(injuries)
## Source: local data frame [10 x 2]
##
##               EVTYPE number
## 1            TORNADO  91346
## 2          TSTM WIND   6957
## 3              FLOOD   6789
## 4     EXCESSIVE HEAT   6525
## 5          LIGHTNING   5230
## 6               HEAT   2100
## 7          ICE STORM   1975
## 8        FLASH FLOOD   1777
## 9  THUNDERSTORM WIND   1488
## 10              HAIL   1361
Next we create a pair of graphs of total fatalities and total injuries caused by these most harmful weather events.
# Bars are ordered from the highest to the lowest total
plot1 <- ggplot(fatalities, aes(x = reorder(EVTYPE, desc(number)), y = number)) +
  geom_bar(stat = "identity") +
  ylab("Number of fatalities") +
  xlab("Event type") +
  ggtitle("Total number of fatalities per weather event type\nacross the USA (1950-2011)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot2 <- ggplot(injuries, aes(x = reorder(EVTYPE, desc(number)), y = number)) +
  geom_bar(stat = "identity") +
  ylab("Number of injuries") +
  xlab("Event type") +
  ggtitle("Total number of injuries per weather event type\nacross the USA (1950-2011)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(plot1, plot2, ncol = 1)
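The bar order in these charts comes from `reorder()`, which rearranges the factor levels of the event type by a score. The sketch below shows the effect on three of the fatality totals reported above. It uses `-number` in place of dplyr's `desc(number)` so that it runs in base R; the data frame `toy` is a hypothetical subset.

```r
# Three event types with their fatality totals from the table above
toy <- data.frame(
  EVTYPE = factor(c("FLOOD", "HEAT", "TORNADO")),
  number = c(470, 937, 5633)
)

# Default factor levels are alphabetical
print(levels(toy$EVTYPE))

# reorder() sorts levels by ascending score, so negating puts the largest first
ordered_types <- reorder(toy$EVTYPE, -toy$number)
print(levels(ordered_types))
```

Because ggplot2 draws discrete axes in level order, the first level ends up as the leftmost bar.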
Based on the above pair of bar charts, we find that tornadoes caused the most fatalities and the most injuries in the United States from 1950 to 2011.
To answer the question of which types of weather events have the greatest economic consequences, we analyze total property and crop damage. The top 10 event types that caused the largest overall property damage are:
print(propdmg)
## Source: local data frame [10 x 2]
##
##               EVTYPE       damage
## 1              FLOOD 1.446577e+13
## 2  HURRICANE/TYPHOON 6.930584e+12
## 3            TORNADO 5.694738e+12
## 4        STORM SURGE 4.332354e+12
## 5        FLASH FLOOD 1.682267e+12
## 6               HAIL 1.573527e+12
## 7          HURRICANE 1.186832e+12
## 8     TROPICAL STORM 7.703891e+11
## 9       WINTER STORM 6.688497e+11
## 10         HIGH WIND 5.270046e+11
The top 10 event types that caused the largest overall crop damage are:
print(cropdmg)
## Source: local data frame [10 x 2]
##
##                EVTYPE      damage
## 1                HAIL 60161277300
## 2               FLOOD 21753275000
## 3         FLASH FLOOD 19039070000
## 4             DROUGHT 14595735000
## 5           TSTM WIND 11320985000
## 6             TORNADO 10269737000
## 7   THUNDERSTORM WIND  6992705000
## 8           HURRICANE  2999310000
## 9           HIGH WIND  2288040000
## 10 THUNDERSTORM WINDS  2014708800
We display our findings in the following pair of bar charts.
# Bars are ordered from the highest to the lowest total
plot1 <- ggplot(propdmg, aes(x = reorder(EVTYPE, desc(damage)), y = damage)) +
  geom_bar(stat = "identity") +
  ylab("Amount of damage (in dollars)") +
  xlab("Event type") +
  ggtitle("Total amount of damage to property per weather event type\nacross the USA (1950-2011)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot2 <- ggplot(cropdmg, aes(x = reorder(EVTYPE, desc(damage)), y = damage)) +
  geom_bar(stat = "identity") +
  ylab("Amount of damage (in dollars)") +
  xlab("Event type") +
  ggtitle("Total amount of damage to crops per weather event type\nacross the USA (1950-2011)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
grid.arrange(plot1, plot2, ncol = 1)
We find that, overall, floods are the weather event type that has had the greatest economic consequences in the USA from 1950 to 2011.