Synopsis

The goal of this assignment is to explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and answer questions about severe weather events. In particular, the focus is on determining, across the United States, which types of events are most harmful with respect to population health and which types of events have the greatest economic consequences.

Data Processing

Data loading

The data are downloaded from the NOAA storm database, unless a local copy already exists.

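# Create the data directory if needed, then download the file only once
if (!dir.exists("./data")) dir.create("./data")
if (!file.exists("./data/NOAA_data.bz2")) {
  fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileUrl, destfile = "./data/NOAA_data.bz2", method = "curl")
}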

The data are unpacked and assigned to the variable NOAA (results are cached).

NOAA <- read.csv(bzfile("./data/NOAA_data.bz2"), sep = ",", header = TRUE)

The data are cleaned up by keeping only the columns needed for the study.

cleanNOAA <- NOAA[c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]

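The damage exponent codes (PROPDMGEXP and CROPDMGEXP) are converted to numeric multipliers, which are then applied to the recorded damage values (PROPDMG and CROPDMG) to obtain amounts in dollars.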

cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "K"] <- 1e3
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "M"] <- 1e6
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == ""]  <- 1
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "B"] <- 1e9
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "m"] <- 1e6
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "+"] <- 0
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "0"] <- 1
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "5"] <- 1e5
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "6"] <- 1e6
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "?"] <- 0
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "4"] <- 1e4
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "2"] <- 1e2
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "3"] <- 1e3
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "h"] <- 1e2
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "7"] <- 1e7
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "H"] <- 1e2
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "-"] <- 0
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "1"] <- 10
cleanNOAA$PROPEXP[cleanNOAA$PROPDMGEXP == "8"] <- 1e8


cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "M"] <- 1e6
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "K"] <- 1e3
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "m"] <- 1e6
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "B"] <- 1e9
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "0"] <- 1
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "k"] <- 1e3
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "2"] <- 1e2
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == ""] <- 1
cleanNOAA$CROPEXP[cleanNOAA$CROPDMGEXP == "?"] <- 0

cleanNOAA$PROPERTY_DAMAGE <- cleanNOAA$PROPDMG * cleanNOAA$PROPEXP
cleanNOAA$CROP_DAMAGE <- cleanNOAA$CROPDMG * cleanNOAA$CROPEXP
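The one-assignment-per-code conversion above can also be written as a single lookup. The following is only a minimal sketch, not the method used in this report: the helper name exp_multiplier is hypothetical, and the sketch assumes the same conventions as the assignments above (letter codes are case-insensitive, a digit d means 10^d, a blank code means a multiplier of 1, and "+", "-" and "?" are treated as 0).

```r
# Hypothetical helper (not part of the original analysis): map exponent
# codes to numeric multipliers with one lookup instead of many assignments.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[code])                # NA for codes not in the table
  digit <- grepl("^[0-9]$", code)
  mult[digit] <- 10^as.numeric(code[digit])   # e.g. "5" becomes 1e5
  mult[code %in% ""] <- 1                     # blank code: value as recorded
  mult[code %in% c("+", "-", "?")] <- 0       # same choice as the assignments above
  mult
}

# Small illustrative input, not taken from the database
demo_mult <- exp_multiplier(c("K", "m", "", "5", "?", "B"))
print(demo_mult)
```

Any code that falls outside these assumptions is left as NA, which makes unmapped values easy to spot with is.na().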

Datasets to be plotted:

  1. Fatalities as a function of the event type
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = cleanNOAA, FUN = sum)
  2. Injuries as a function of the event type
injuries <- aggregate(INJURIES ~ EVTYPE, data = cleanNOAA, FUN = sum)
  3. Property damage as a function of the event type
prop_dam <- aggregate(PROPERTY_DAMAGE ~ EVTYPE, data = cleanNOAA, FUN = sum)
  4. Crop damage as a function of the event type
crop_dam <- aggregate(CROP_DAMAGE ~ EVTYPE, data = cleanNOAA, FUN = sum)
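One property of the formula interface of aggregate() is worth keeping in mind for these totals: by default it drops rows with missing values before summing. The toy data frame below is invented for illustration (it is not taken from the storm database) and shows the effect that an NA damage value, for example from an exponent code left unmapped, would have.

```r
# Invented toy data: the second TORNADO row has a missing damage value
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "TORNADO", "TORNADO"),
  PROPERTY_DAMAGE = c(1000, 2500, 400, NA)
)

# The formula method uses na.action = na.omit by default,
# so the NA row is removed before the sum is computed
toy_totals <- aggregate(PROPERTY_DAMAGE ~ EVTYPE, data = toy, FUN = sum)
print(toy_totals)

# Counting missing values beforehand shows how many rows would be dropped
n_missing <- sum(is.na(toy$PROPERTY_DAMAGE))
print(n_missing)
```

Running the same is.na() count on cleanNOAA$PROPERTY_DAMAGE and cleanNOAA$CROP_DAMAGE is a quick way to confirm that no rows are silently excluded.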

We consider the top 10 event types in terms of public health and economic damage, as follows:

# Public health
fatalities_10 <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ][1:10, ]
injuries_10 <- injuries[order(injuries$INJURIES, decreasing = TRUE), ][1:10, ]
# Economic
property_10 <- prop_dam[order(prop_dam$PROPERTY_DAMAGE, decreasing = TRUE), ][1:10, ]
crop_10 <- crop_dam[order(crop_dam$CROP_DAMAGE, decreasing = TRUE), ][1:10, ]
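The four selections above repeat the same pattern: sort by one column in decreasing order and keep the first rows. As a sketch only, that pattern could be wrapped in a small function; the name top_events and the toy data are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: return the n rows with the largest values in value_col
top_events <- function(df, value_col, n = 10) {
  ranked <- df[order(df[[value_col]], decreasing = TRUE), ]
  head(ranked, n)   # head() returns fewer rows if df has fewer than n
}

# Invented toy data to show the behaviour
toy <- data.frame(
  EVTYPE = c("A", "B", "C"),
  FATALITIES = c(5, 20, 1)
)
toy_top2 <- top_events(toy, "FATALITIES", n = 2)
print(toy_top2)
```

Unlike indexing with [1:10, ], head() does not pad the result with NA rows when fewer than n event types are available.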

Results

Population health

We first consider the most harmful events for the population in terms of fatalities.

# Load the plotting library
library(ggplot2)
ggplot(fatalities_10, aes(x = EVTYPE, y = FATALITIES))+ 
  geom_bar(stat = "identity", fill = "red", na.rm = FALSE, show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Fatalities") +
  ggtitle("Fatalities in the U.S. by top 10 events") +
  theme(plot.title = element_text(hjust = 0.5))

Tornadoes are the leading cause of fatalities in the observed period. The second position, however, is occupied by a less catastrophic event, namely excessive heat. This is an important warning that has to be taken into account when trying to prevent further fatalities.
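Because ggplot2 orders a discrete x axis alphabetically by default, the ranking discussed here is easier to read off the plot when the bars are sorted by value. The following is a minimal sketch on invented toy data, assuming ggplot2 is installed; the column name EVTYPE_ranked is hypothetical.

```r
library(ggplot2)

# Invented toy data, not taken from the storm database
toy <- data.frame(
  EVTYPE = c("FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(5, 20, 50)
)

# reorder() turns EVTYPE into a factor whose levels are sorted by the
# second argument; negating the counts puts the largest bar first
toy$EVTYPE_ranked <- reorder(toy$EVTYPE, -toy$FATALITIES)

p <- ggplot(toy, aes(x = EVTYPE_ranked, y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "red") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Fatalities")
print(p)
```

The same reorder() call can be placed directly inside aes() in the plots of this report, for example aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES).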

Next, we consider the most harmful events for the population in terms of injuries.

ggplot(injuries_10, aes(x = EVTYPE, y = INJURIES))+ 
  geom_bar(stat = "identity", fill = "blue", na.rm = FALSE, show.legend = FALSE) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  xlab("Event Type") + ylab("Injuries") +
  ggtitle("Injuries in the U.S. by top 10 events") +
  theme(plot.title = element_text(hjust = 0.5))

In this case too, tornadoes cause the most harm to population health, exceeding the other causes by an order of magnitude. Excessive heat is still present, but its influence on injuries is less evident than its influence on fatalities. Other events also appear, such as floods, lightning and TSTM (thunderstorm) winds, with an effect similar to that of excessive heat.

Economic consequences

We compare the damage (in U.S. dollars) to properties and crops in the following plot.

par(mfrow = c(1, 2), mar = c(10, 4, 3, 3))
barplot(property_10$PROPERTY_DAMAGE, las = 3, names.arg = property_10$EVTYPE, 
    main = "Top 10 Property Damage", ylab = "Cost [$]")
barplot(crop_10$CROP_DAMAGE, las = 3, names.arg = crop_10$EVTYPE, 
    main = "Top 10 Crop Damages", ylab = "Cost [$]")

Floods are the main cause of property damage, causing about twice the damage due to typhoons, tornadoes or storm surges. Droughts, on the other hand, are the main cause of crop damage.