We analyze the NOAA storm data from 1951 to 2011 to assess the impact of natural disasters on human health and on the economy.
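if (!file.exists("data")) {
  dir.create("data")
}
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the file only if it is not already present locally
if (!file.exists("./data/data.csv.bz2")) {
  download.file(url, destfile = "./data/data.csv.bz2", method = "curl")
}
# Read every column as character; numeric fields are converted later
data <- read.csv("./data/data.csv.bz2", colClasses = "character")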
As mentioned in the documentation, the variables PROPDMG and CROPDMG record the damage caused by natural disasters in US dollars, and the variables PROPDMGEXP and CROPDMGEXP contain the units of these estimates. Let’s look at the contents of PROPDMGEXP:
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
Now let’s look at the content of CROPDMGEXP:
unique(data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
As can be seen, the damage amounts use several exponent codes for their units. Some are familiar, for example h, which stands for hundreds, and K, which stands for thousands, but others are not. Let’s see how frequent each code is in the data, starting with the property data:
table(data$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5      6
## 465934      1      8      5    216     25     13      4      4     28      4
##      7      8      B      h      H      K      m      M
##      5      1     40      1      6 424665      7  11330
Now let’s look at the crop data:
table(data$CROPDMGEXP)
##
##             ?      0      2      B      k      K      m      M
## 618413      7     19      1      9     21 281832      1   1994
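Raw counts can be hard to compare at a glance, so relative frequencies are sometimes easier to read. The following is a minimal sketch on a made-up vector (`codes` is hypothetical and is not the NOAA data); the same calls could be applied to `data$PROPDMGEXP` or `data$CROPDMGEXP`.

```r
# Toy vector of exponent codes (made up for illustration, not the NOAA data)
codes <- c("K", "K", "K", "", "", "M", "K", "", "B", "?")

# Relative frequency of each code, as percentages rounded to one decimal
shares <- round(100 * prop.table(table(codes)), 1)

# Sort from the most common code to the least common
shares <- sort(shares, decreasing = TRUE)
print(shares)
```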
As can be seen, most records carry either a familiar exponent or no exponent at all. However, the unfamiliar codes, such as “-” in the property data and “?” in the crop data, should also be taken into account. Searching the internet, we found this web page that explains how to deal with the exponents. We use this information:
# Multiplier for each exponent code
expcodes <- c("", "?", "+", "-", as.character(0:8),
              "h", "H", "k", "K", "m", "M", "B")
expmult <- c(0, 0, 0, 0, rep(10, 9),
             1e02, 1e02, 1e03, 1e03, 1e06, 1e06, 1e09)
# Map every code in a single pass with match(). Replacing codes one at a
# time would turn "", "?", "+" and "-" into "0" and then into 10, because
# the later rule for the code "0" would match them again.
# For the property data
data$PROPDMGEXP <- expmult[match(data$PROPDMGEXP, expcodes)]
# For the crop data
data$CROPDMGEXP <- expmult[match(data$CROPDMGEXP, expcodes)]
# Transforming the data
data$PROPDMG <- as.numeric(data$PROPDMG) * data$PROPDMGEXP
data$CROPDMG <- as.numeric(data$CROPDMG) * data$CROPDMGEXP
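As a quick sanity check of the conversion, the sketch below applies the same code-to-multiplier mapping to a handful of made-up records. The helper `exp_to_multiplier` and the vectors `amounts` and `codes` are hypothetical names introduced only for this illustration; the multipliers are the ones used in this report.

```r
# Hypothetical helper: map exponent codes to multipliers in one pass
exp_to_multiplier <- function(code) {
  keys   <- c("", "?", "+", "-", as.character(0:8),
              "h", "H", "k", "K", "m", "M", "B")
  values <- c(0, 0, 0, 0, rep(10, 9),
              1e02, 1e02, 1e03, 1e03, 1e06, 1e06, 1e09)
  values[match(code, keys)]  # NA for any code not listed in keys
}

# Made-up records, not taken from the NOAA file
amounts <- c("25", "2.5", "1", "10", "3")
codes   <- c("K", "M", "B", "", "5")

# Damage in US dollars: amount times the multiplier of its code
dollars <- as.numeric(amounts) * exp_to_multiplier(codes)
print(dollars)
```

Because each code is looked up exactly once, a value that has already been converted can never be matched again by a rule for another code.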
Now that the variables PROPDMG and CROPDMG have been converted to US dollars, it is time to aggregate the data by event type.
fatdata <- aggregate(as.numeric(FATALITIES) ~ EVTYPE, data = data, FUN = sum)
injdata <- aggregate(as.numeric(INJURIES) ~ EVTYPE, data = data, FUN = sum)
propdata <- aggregate(PROPDMG ~ EVTYPE, data = data, FUN = sum)
cropdata <- aggregate(CROPDMG ~ EVTYPE, data = data, FUN = sum)
There are 985 event types to consider, which is a large number. We focus on the 15 most harmful ones; to do so, we first sort each data frame in decreasing order of impact.
fatdata <- fatdata[order(fatdata[, 2], decreasing = TRUE), ]
injdata <- injdata[order(injdata[, 2], decreasing = TRUE), ]
propdata <- propdata[order(propdata[, 2], decreasing = TRUE), ]
cropdata <- cropdata[order(cropdata[, 2], decreasing = TRUE), ]
Now we can focus on the first 15 events.
fatdata <- fatdata[1:15, ]
injdata <- injdata[1:15, ]
propdata <- propdata[1:15, ]
cropdata <- cropdata[1:15, ]
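The aggregate, sort and truncate steps above follow the same pattern for each of the four measures, so they could also be wrapped in one function. The following is only a sketch under that assumption: `top_events` and the `toy` data frame are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: total a measure by event type and keep the n largest
top_events <- function(df, measure, n = 15) {
  totals <- aggregate(df[[measure]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- measure
  totals <- totals[order(totals[[measure]], decreasing = TRUE), ]
  head(totals, n)
}

# Made-up data for illustration only
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 7, 4, 1, 0)
)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

Using `head(totals, n)` rather than `totals[1:n, ]` avoids rows of `NA` when there are fewer than `n` event types.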
Finally, we express the economic damage in billions of US dollars and the fatalities and injuries in thousands.
fatdata[, 2] <- fatdata[, 2] / 1e03
injdata[, 2] <- injdata[, 2] / 1e03
propdata[, 2] <- propdata[, 2] / 1e09
cropdata[, 2] <- cropdata[, 2] / 1e09
We first analyze the human impact of natural disasters by looking at the number of fatalities and injuries. The following plot shows the results:
par(mfrow = c(1, 2))
barplot(fatdata[, 2], names.arg = fatdata$EVTYPE, las = 3, cex.names = 0.6,
        ylab = "Total Fatalities in thousands")
barplot(injdata[, 2], names.arg = injdata$EVTYPE, las = 3, cex.names = 0.6,
        ylab = "Total Injuries in thousands")
We can see that there are many more injuries than fatalities, and that the most harmful natural disaster is Tornado.
Finally, we analyze the economic impact of natural disasters in billions of US dollars. The following plot shows the results:
par(mfrow = c(1, 2))
barplot(propdata[, 2], names.arg = propdata$EVTYPE, las = 3, cex.names = 0.6,
        ylab = "Billions of US Dollars", main = "Property Damage")
barplot(cropdata[, 2], names.arg = cropdata$EVTYPE, las = 3, cex.names = 0.6,
        ylab = "Billions of US Dollars", main = "Crop Damage")
We clearly see that the economic impact is more severe for property than for crops. The most harmful natural disaster for property is Flood, and for crops it is Drought.