The goal of this assignment is to explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and answer questions about severe weather events. In particular, it focuses on determining which types of events, across the United States, are most harmful to population health and which have the greatest economic consequences.
Data loaded from the NOAA’s database
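# Create the target directory first; download.file() fails if it is missing
if (!dir.exists("./data")) {
  dir.create("./data")
}
if (!file.exists("./data/NOAA_data.bz2")) {
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileUrl, destfile = "./data/NOAA_data.bz2", method = "curl")
}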
Data unpacked and assigned to the variable NOAA (results are cached)
NOAA <- read.csv(bzfile("./data/NOAA_data.bz2"), sep = ",", header = TRUE)
Data cleanup to remove unnecessary information for the study
cleanNOAA <- NOAA[c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
Convert from exponential to standard values
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "K"] <- 1e3
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "M"] <- 1e6
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == ""] <- 1
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "B"] <- 1e9
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "m"] <- 1e6
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "+"] <- 0
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "0"] <- 1
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "5"] <- 1e5
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "6"] <- 1e6
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "?"] <- 0
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "4"] <- 1e4
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "2"] <- 1e2
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "3"] <- 1e3
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "h"] <- 1e2
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "7"] <- 1e7
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "H"] <- 1e2
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "-"] <- 0
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "1"] <- 10
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "8"] <- 1e8
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "M"] <- 1e6
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "K"] <- 1e3
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "m"] <- 1e6
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "B"] <- 1e9
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "0"] <- 1
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "k"] <- 1e3
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "2"] <- 1e2
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == ""] <- 1
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "?"] <- 0
cleanNOAA$PROPERTY_DAMAGE <- cleanNOAA$PROPDMG * cleanNOAA$PROPEXP
cleanNOAA$CROP_DAMAGE <- cleanNOAA$CROPDMG * cleanNOAA$CROPEXP
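The one-assignment-per-code mapping above can also be expressed as a single function. The following is a minimal sketch, not the method used in this report: `exp_multiplier` is a hypothetical helper name, the toy data frame is made up for illustration, and the rules are assumed to mirror the assignments above (letters H/K/M/B in either case, digits as powers of ten, an empty code as 1, and "+", "-", "?" as 0).

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
# Assumption: same rules as the explicit assignments in this report.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  letters_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  out <- rep(NA_real_, length(code))          # unknown codes stay NA

  is_letter <- code %in% names(letters_map)
  out[is_letter] <- unname(letters_map[code[is_letter]])

  is_digit <- code %in% as.character(0:9)     # "5" means 10^5
  out[is_digit] <- 10^as.numeric(code[is_digit])

  out[code == ""] <- 1                        # no exponent given
  out[code %in% c("+", "-", "?")] <- 0        # treated as unusable
  out
}

# Toy data, made up for illustration only
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "m", "", "?"),
                  stringsAsFactors = FALSE)
toy$PROPERTY_DAMAGE <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

Leaving unrecognised codes as NA, rather than silently mapping them, makes it easy to check with `anyNA()` whether the mapping covers every code in the data.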
Datasets to be plotted:
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = cleanNOAA, FUN = sum)
injuries <- aggregate(INJURIES ~ EVTYPE, data = cleanNOAA, FUN = sum)
prop_dam <- aggregate(PROPERTY_DAMAGE ~ EVTYPE, data = cleanNOAA, FUN = sum)
crop_dam <- aggregate(CROP_DAMAGE ~ EVTYPE, data = cleanNOAA, FUN = sum)
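As an aside, `aggregate()` can sum several columns in one call by wrapping them in `cbind()` on the left-hand side of the formula. The sketch below uses a small made-up data frame (`toy`) rather than the NOAA data. One caveat to keep in mind: with the formula interface, a row that has a missing value in any of the listed columns is dropped from all of the sums.

```r
# Toy data, made up for illustration only
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(3, 1, 2, 4),
                  INJURIES   = c(10, 0, 5, 2),
                  stringsAsFactors = FALSE)

# One call, two summed columns, one row per event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(totals)
```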
We consider the top 10 events in terms of public health and economic damage as follows:
# Public health
fatalities_10 <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ][1:10, ]
injuries_10 <- injuries[order(injuries$INJURIES, decreasing = TRUE), ][1:10, ]
# Economic
property_10 <- prop_dam[order(prop_dam$PROPERTY_DAMAGE, decreasing = TRUE), ][1:10, ]
crop_10 <- crop_dam[order(crop_dam$CROP_DAMAGE, decreasing = TRUE), ][1:10, ]
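The same sort-then-slice pattern is repeated four times above, so it could be wrapped in a small function. This is only a sketch: `top_events` is a hypothetical helper name and the toy data is made up. It uses `head()` instead of `[1:10, ]`, because indexing past the last row of a data frame pads the result with NA rows, whereas `head()` simply returns the rows that exist.

```r
# Hypothetical helper: the n rows of df with the largest values in value_col
top_events <- function(df, value_col, n = 10) {
  ranked <- df[order(df[[value_col]], decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

# Toy data, made up for illustration only
toy <- data.frame(EVTYPE     = c("FLOOD", "TORNADO", "HEAT"),
                  FATALITIES = c(1, 5, 4),
                  stringsAsFactors = FALSE)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```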
We first consider the most harmful events for the population in terms of fatalities.
# Load the plotting library
library(ggplot2)
ggplot(fatalities_10, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "red", na.rm = FALSE, show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Fatalities") +
  ggtitle("Fatalities in the U.S. by top 10 events") +
  theme(plot.title = element_text(hjust = 0.5))
Tornadoes are the leading cause of fatalities in the observed period. Second place, however, is occupied by a less spectacular event, namely excessive heat. This is an important warning that has to be taken into account when trying to prevent further fatalities.
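Because EVTYPE is mapped directly to the x axis, ggplot2 arranges the bars alphabetically by event name rather than by rank. A possible way to show the ranking in the plot itself is to reorder the factor levels by the plotted value with `reorder()`. The sketch below uses a made-up data frame (`toy`), and the plot is only built if ggplot2 is installed.

```r
# Toy data, made up for illustration only
toy <- data.frame(EVTYPE     = c("FLOOD", "TORNADO", "EXCESSIVE HEAT"),
                  FATALITIES = c(2, 9, 5),
                  stringsAsFactors = FALSE)

# Order the factor levels by decreasing fatalities
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$FATALITIES)
print(levels(toy$EVTYPE))

# Bars now follow the level order instead of the alphabet
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = EVTYPE, y = FATALITIES)) +
    ggplot2::geom_col()
  # print(p) draws the plot
}
```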
Furthermore, we consider the most harmful events for the population in terms of injuries.
ggplot(injuries_10, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(stat = "identity", fill = "blue", na.rm = FALSE, show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Injuries") +
  ggtitle("Injuries in the U.S. by top 10 events") +
  theme(plot.title = element_text(hjust = 0.5))
Here too, the greatest harm to population health comes from tornadoes, which exceed every other cause by an order of magnitude. Excessive heat is still present, but its share of injuries is less prominent than its share of fatalities. Other events, such as floods, lightning and TSTM (thunderstorm) winds, appear with an effect similar to that of excessive heat.
The following plot compares the damage to property and crops, in millions of dollars.
par(mfrow = c(1, 2), mar = c(10, 4, 3, 3))
# Scale to millions of dollars so the axis matches the stated units
barplot(property_10$PROPERTY_DAMAGE / 1e6, las = 3, names.arg = property_10$EVTYPE,
        main = "Top 10 Property Damage", ylab = "Cost [million $]")
barplot(crop_10$CROP_DAMAGE / 1e6, las = 3, names.arg = crop_10$EVTYPE,
        main = "Top 10 Crop Damage", ylab = "Cost [million $]")
Floods are the main cause of property damage, with about double the damage caused by typhoons, tornadoes or storm surges. Droughts, on the other hand, are the main cause of crop damage.