In this report we aim to determine which types of severe weather events in the United States are most harmful, both to population health (fatalities and injuries) and to the economy (property and crop damage).
To investigate these questions, we obtained storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA). The events in the database start in 1950 and end in November 2011. From these data, we found that tornadoes account for the largest total numbers of fatalities and injuries, and that floods cause the greatest economic damage.
From the Reproducible Research course web site we obtained data on characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
We first read in the data from the comma-separated-value file compressed via the bzip2 algorithm.
con <- bzfile("repdata-data-StormData.csv.bz2")
open(con, "r")
data <- read.csv(con)
close(con)
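As a side note, R's file connections detect bzip2 compression automatically when a path is opened for reading, so read.csv can usually be given the .bz2 path directly. The sketch below demonstrates this on a small temporary file; the object names and the three sample rows are made up for illustration and are not part of the storm data.

```r
# Write a tiny, made-up CSV through a bzip2 connection (illustrative only).
sample.rows <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
                          FATALITIES = c(3, 1, 0))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
write.csv(sample.rows, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the compression,
# so no explicit bzfile()/open()/close() calls are needed when reading.
round.trip <- read.csv(tmp)
round.trip
```

The explicit connection used in this report does the same job; passing the path is simply shorter.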
After reading in the data, we convert it to a tbl so that it can be used with the dplyr package.
library(dplyr)
data <- tbl_df(data)
First we create fatalities.data, a tbl containing the 10 event types with the largest total numbers of fatalities.
fatalities.data <- group_by(data, EVTYPE) %>%
    summarize(total.fatalities = sum(FATALITIES)) %>%
    arrange(desc(total.fatalities)) %>%
    top_n(10, total.fatalities)
Then we print fatalities.data.
fatalities.data
## Source: local data frame [10 x 2]
##
##            EVTYPE total.fatalities
##            (fctr)            (dbl)
## 1         TORNADO             5633
## 2  EXCESSIVE HEAT             1903
## 3     FLASH FLOOD              978
## 4            HEAT              937
## 5       LIGHTNING              816
## 6       TSTM WIND              504
## 7           FLOOD              470
## 8     RIP CURRENT              368
## 9       HIGH WIND              248
## 10      AVALANCHE              224
As we can see, tornadoes caused the largest total number of fatalities (5633), followed by excessive heat and flash floods with 1903 and 978 fatalities respectively. Avalanches rank 10th with 224 fatalities.
We can construct a barplot to visualize our data.
library(ggplot2)
qplot(EVTYPE, data = fatalities.data, geom = "bar", weight = total.fatalities,
      xlab = "Type of events", ylab = "Total number of fatalities",
      main = "Top 10 events that have the biggest total numbers of fatalities") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
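The plot above draws the bars in factor-level (alphabetical) order of EVTYPE, which makes the ranking harder to read, and qplot() is marked as deprecated in recent ggplot2 releases. A possible alternative is sketched here, assuming ggplot2 is installed. It rebuilds the top five rows by hand from the printed fatalities table (the object names top.fatalities and p are hypothetical) and uses reorder() so that the bars are sorted by total.

```r
library(ggplot2)

# Top five rows copied from the printed fatalities table (illustrative subset).
top.fatalities <- data.frame(
    EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
    total.fatalities = c(5633, 1903, 978, 937, 816)
)

# reorder() sorts the factor levels by the negated totals, so the
# tallest bar is drawn first instead of following alphabetical order.
p <- ggplot(top.fatalities,
            aes(x = reorder(EVTYPE, -total.fatalities), y = total.fatalities)) +
    geom_col() +
    labs(x = "Type of events", y = "Total number of fatalities") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
p
```

The same pattern would apply to the full fatalities.data tbl by passing it in place of top.fatalities.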
We repeat the analysis for injuries by creating injuries.data, a tbl containing the 10 event types with the largest total numbers of injuries.
injuries.data <- group_by(data, EVTYPE) %>%
    summarize(total.injuries = sum(INJURIES)) %>%
    arrange(desc(total.injuries)) %>%
    top_n(10, total.injuries)
Then we print injuries.data.
injuries.data
## Source: local data frame [10 x 2]
##
##               EVTYPE total.injuries
##               (fctr)          (dbl)
## 1            TORNADO          91346
## 2          TSTM WIND           6957
## 3              FLOOD           6789
## 4     EXCESSIVE HEAT           6525
## 5          LIGHTNING           5230
## 6               HEAT           2100
## 7          ICE STORM           1975
## 8        FLASH FLOOD           1777
## 9  THUNDERSTORM WIND           1488
## 10              HAIL           1361
As we can see, tornadoes caused the largest total number of injuries (91346), followed by thunderstorm wind (TSTM WIND) and floods with 6957 and 6789 injuries respectively. Hail ranks 10th with 1361 injuries.
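The injuries table lists TSTM WIND and THUNDERSTORM WIND as separate event types, although both labels describe thunderstorm winds. A possible cleanup step, which is not applied in this report, would be to merge such labels before aggregating. The sketch below shows the idea on the two rows in question; the rename rule and the object names injuries and merged are assumptions made for illustration.

```r
# Two rows copied from the printed injuries table.
injuries <- data.frame(
    EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND"),
    total.injuries = c(6957, 1488)
)

# Hypothetical cleanup rule: expand the TSTM abbreviation, then re-aggregate.
injuries$EVTYPE <- sub("^TSTM", "THUNDERSTORM", injuries$EVTYPE)
merged <- aggregate(total.injuries ~ EVTYPE, data = injuries, FUN = sum)
merged
```

With the labels merged, thunderstorm wind would total 8445 injuries, which would still leave it second to tornadoes.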
We can construct a barplot to visualize our data.
qplot(EVTYPE, data = injuries.data, geom = "bar", weight = total.injuries,
      xlab = "Type of events", ylab = "Total number of injuries",
      main = "Top 10 events that have the biggest total numbers of injuries") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
The database contains estimates of property damage and crop damage. The variables PROPDMG and CROPDMG hold the numeric estimates of property and crop damage respectively, while PROPDMGEXP and CROPDMGEXP hold characters that give the magnitude of each estimate, e.g., “K” for thousands, “M” for millions, and “B” for billions. To convert PROPDMGEXP and CROPDMGEXP into a convenient form, we create the extract.magnitude function, which takes a magnitude character and returns the corresponding power of ten.
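extract.magnitude <- function(x) {
    # Return the power of ten encoded by a single exponent code.
    mag <- if (x %in% c("", "-", "?", "+")) {
        0    # blank or symbol: no magnitude given
    } else if (x == "B") {
        9    # billions
    } else if (x %in% c("h", "H")) {
        2    # hundreds
    } else if (x %in% c("k", "K")) {
        3    # thousands
    } else if (x %in% c("m", "M")) {
        6    # millions
    } else {
        as.numeric(as.character(x))    # digit codes are used as-is
    }
    mag
}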
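Because extract.magnitude is applied to one value at a time, it can be slow on a table of this size. One possible alternative, shown as a sketch and not as the method used in this report, is a named lookup vector that converts a whole column in a single step. The names magnitude.lookup and to.magnitude are hypothetical, and the sketch covers only the letter codes handled by extract.magnitude plus single digits.

```r
# Hypothetical lookup table: exponent code -> power of ten.
magnitude.lookup <- c("h" = 2, "H" = 2, "k" = 3, "K" = 3,
                      "m" = 6, "M" = 6, "B" = 9)

to.magnitude <- function(codes) {
    codes <- as.character(codes)            # also accepts factors
    mag <- unname(magnitude.lookup[codes])  # NA for codes not in the table
    # Single digits are taken literally, as in extract.magnitude.
    is.digit <- grepl("^[0-9]$", codes)
    mag[is.digit] <- as.numeric(codes[is.digit])
    # Blank and symbol codes carry no magnitude information.
    mag[codes %in% c("", "-", "?", "+")] <- 0
    mag
}

to.magnitude(c("K", "m", "B", "", "5", "?"))
# 3 6 9 0 5 0
```

Any code outside these cases comes back as NA, which makes unexpected values easy to spot before they enter the damage totals.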
Then we create damage.data, a tbl containing the 10 event types with the greatest total damage (property plus crop).
damage.data <- mutate(data,
                      PROPDMGEXP = sapply(PROPDMGEXP, extract.magnitude),
                      CROPDMGEXP = sapply(CROPDMGEXP, extract.magnitude),
                      damage = PROPDMG * 10^PROPDMGEXP + CROPDMG * 10^CROPDMGEXP) %>%
    group_by(EVTYPE) %>%
    summarize(total.damage = sum(damage)) %>%
    arrange(desc(total.damage)) %>%
    top_n(10, total.damage)
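To make the damage formula concrete, the following sketch applies it to three made-up rows using base R only. The values and the object names toy and totals are invented for illustration; the exponents are already numeric, as they would be after the conversion step.

```r
# Made-up rows: exponents are powers of ten (3 = thousands, 6 = millions).
toy <- data.frame(
    EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
    PROPDMG    = c(25, 1.5, 10),
    PROPDMGEXP = c(3, 6, 3),
    CROPDMG    = c(0, 2, 5),
    CROPDMGEXP = c(0, 3, 3)
)

# Same formula as in the pipeline: scale each estimate by its magnitude.
toy$damage <- toy$PROPDMG * 10^toy$PROPDMGEXP + toy$CROPDMG * 10^toy$CROPDMGEXP

# Sum per event type and sort from largest to smallest.
totals <- sort(tapply(toy$damage, toy$EVTYPE, sum), decreasing = TRUE)
totals
```

Here the two FLOOD rows contribute 25,000 and 1,502,000, giving 1,527,000 in total, while HAIL contributes 15,000.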
We take a look at damage.data.
damage.data
## Source: local data frame [10 x 2]
##
##               EVTYPE total.damage
##               (fctr)        (dbl)
## 1              FLOOD 150319678257
## 2  HURRICANE/TYPHOON  71913712800
## 3            TORNADO  57362333946
## 4        STORM SURGE  43323541000
## 5               HAIL  18761221986
## 6        FLASH FLOOD  18243991078
## 7            DROUGHT  15018672000
## 8          HURRICANE  14610229010
## 9        RIVER FLOOD  10148404500
## 10         ICE STORM   8967041360
As we can see, floods caused the greatest total damage at $150,319,678,257, followed by hurricanes/typhoons and tornadoes with $71,913,712,800 and $57,362,333,946 respectively. Ice storms rank 10th with $8,967,041,360.
We can construct a barplot to visualize our data.
qplot(EVTYPE, data = damage.data, geom = "bar", weight = total.damage,
      xlab = "Type of events", ylab = "Total damage amounts",
      main = "Top 10 events that have the greatest total damage amounts") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))