This analysis uses data from the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database to examine the public health and economic consequences of storms and other weather events.
The NOAA database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The events in the database start in the year 1950 and end in November 2011.
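library(R.utils)  # provides bunzip2()
library(ggplot2)
if (!file.exists("data/StormData.csv")) {
  # download.file() fails if the target folder does not exist yet
  dir.create("data", showWarnings = FALSE)
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                "data/StormData.csv.bz2", method = "curl")
  bunzip2("data/StormData.csv.bz2")
}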
storm.data <- read.csv("data/StormData.csv")
data <- storm.data[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
"CROPDMG", "CROPDMGEXP")]
str(data)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 985 levels "?","ABNORMALLY DRY",..: 830 830 830 830 830 830 830 830 830 830 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
tail(data, n = 3)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 902295 HIGH WIND 0 0 0 K 0
## 902296 BLIZZARD 0 0 0 K 0
## 902297 HEAVY SNOW 0 0 0 K 0
## CROPDMGEXP
## 902295 K
## 902296 K
## 902297 K
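As a side note on the loading step, base R can usually read a bzip2-compressed CSV directly, because `read.csv` opens the file through a connection that detects the compression. The sketch below shows this on a small temporary file; the `toy` data frame and the temporary path are made up for illustration and are not part of the NOAA data.

```r
# Hypothetical toy data, written to a temporary bzip2-compressed CSV
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))
path <- tempfile(fileext = ".csv.bz2")

con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file without an explicit bunzip2() step
toy.back <- read.csv(path)
str(toy.back)
```

Applied to this analysis, the same idea would mean calling `read.csv` on `data/StormData.csv.bz2` directly, at the cost of decompressing the file on every read.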
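# Convert the damage exponent codes to numeric powers of ten, so that they
# can be used to calculate the total damage
exponents.to.numeric <- function(codes) {
  codes <- toupper(as.character(codes))
  codes[codes == ""] <- "0"                  # no code: multiply by 10^0 = 1
  codes[codes %in% c("+", "-", "?")] <- "1"  # unclear symbols are treated as 10^1
  codes[codes == "H"] <- "2"                 # hundreds
  codes[codes == "K"] <- "3"                 # thousands
  codes[codes == "M"] <- "6"                 # millions
  codes[codes == "B"] <- "9"                 # billions
  as.numeric(codes)                          # digit codes pass through unchanged
}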
data$PROPDMGEXP <- exponents.to.numeric(data$PROPDMGEXP)
data$CROPDMGEXP <- exponents.to.numeric(data$CROPDMGEXP)
data$ECONOMICDAMAGE <- data$PROPDMG * 10^data$PROPDMGEXP + data$CROPDMG * 10^data$CROPDMGEXP
tail(data, n = 3)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 902295 HIGH WIND 0 0 0 3 0
## 902296 BLIZZARD 0 0 0 3 0
## 902297 HEAVY SNOW 0 0 0 3 0
## CROPDMGEXP ECONOMICDAMAGE
## 902295 3 0
## 902296 3 0
## 902297 3 0
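To make the damage arithmetic concrete, here is a minimal worked sketch of the same idea: each damage figure is multiplied by ten raised to the power that its exponent code stands for. The values in `damage` and `code` are invented for illustration and are not rows from the NOAA data; the lookup vector only covers the letter codes.

```r
# Toy values for illustration only; they are not rows from the NOAA data
damage <- c(25, 2.5, 1.2)
code <- c("K", "M", "B")

# Letter codes mapped to powers of ten (hundreds, thousands, millions, billions)
power <- c(H = 2, K = 3, M = 6, B = 9)[code]

# Damage in USD: 25 K is 25 thousand, 2.5 M is 2.5 million, 1.2 B is 1.2 billion
dollars <- unname(damage * 10^power)
dollars
```

So a record with `PROPDMG` of 25 and `PROPDMGEXP` of K contributes 25,000 USD to the total.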
Using the weather events data set, we investigate which event types were the most harmful, both economically and with respect to population health.
To find the event types most harmful to population health, we compare the total number of fatalities and the total number of injuries per event type.
# Base function for all plots: bar chart of the six event types with the
# largest total of `var`
weather.event.plot <- function(data, var, ylabel = paste("Number of", var),
                               title = paste(var, "per event type")) {
  totals <- tapply(data[[toupper(var)]], data$EVTYPE, sum)
  top <- head(sort(totals, decreasing = TRUE))
  # Keep the plot data local and give the columns explicit names
  top.events <- data.frame(event = names(top), total = as.numeric(top))
  ggplot(data = top.events, aes(x = event, y = total)) +
    geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 90)) +
    xlab("Event Type") + ylab(ylabel) + ggtitle(title)
}
weather.event.plot(data, "Fatalities")
weather.event.plot(data, "Injuries")
Tornadoes are clearly the most harmful event type for population health, in terms of both fatalities and injuries. Excessive heat and floods also account for a substantial share of the harm.
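The plots rank event types by a summed count, and the same ranking can be inspected as a plain table. The following is a minimal sketch of that aggregation on a hypothetical data frame; the `toy` records are invented for illustration and do not come from the NOAA data.

```r
# Hypothetical records for illustration; not taken from the NOAA data
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "TORNADO"),
  FATALITIES = c(3, 1, 2, 4, 0, 1)
)

# Sum fatalities within each event type, then order from largest to smallest
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
ranked <- sort(totals, decreasing = TRUE)

# Show the two event types with the highest totals
head(ranked, n = 2)
```

Swapping `FATALITIES` for `INJURIES` or `ECONOMICDAMAGE` on the real data would give the tables behind the other plots.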
weather.event.plot(data, "economicdamage", ylabel = "Economic damage (USD)",
title = "Economic damage per event type")
Floods have by far the greatest economic impact; tornadoes and typhoons also contribute significantly.