Synopsis

This publication uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which kinds of weather events cause the most harm to public health and the most property damage in the US. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

The following code loads the necessary libraries, downloads the data and reads it into a data frame named stormdata.

#loading libraries
library(tidyverse)
library(reshape2)
library(ggthemes)

#Creates a directory to work in, if it does not exist yet
if(!dir.exists("reproducible"))
    dir.create("reproducible")

#Path of the data file, relative to the current working directory
datafile <- file.path("reproducible", "repdata_data_StormData.csv.bz2")

#Downloads the file (mode = "wb" keeps the compressed file intact on Windows)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = datafile, mode = "wb")

#Reads the data into a data frame (read.csv decompresses .bz2 files directly)
stormdata <- read.csv(datafile)
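Since the compressed file is sizeable, re-running the report does not need to download it every time. The following is a minimal sketch of one way to skip the download when a local copy is already present. The helper name download_if_missing is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (an assumption, not in the original analysis):
# downloads a file only when no local copy exists yet.
download_if_missing <- function(url, destfile) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))  # local copy found, nothing downloaded
  }
  # Make sure the target directory exists before downloading
  dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" keeps the compressed file byte-for-byte intact on Windows
  download.file(url, destfile = destfile, mode = "wb")
  invisible(TRUE)
}

# Usage sketch (commented out so the block runs without network access):
# download_if_missing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   file.path("reproducible", "repdata_data_StormData.csv.bz2")
# )
```

The function returns TRUE when it downloaded the file and FALSE when it reused an existing copy, which makes it easy to log what happened on each run.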

With the data on hand, we want to know which kinds of events are most harmful to population health and which kinds of events generate the most economic damage. For population health, we’ll look at the number of fatalities and injuries caused by each type of event. The following code groups the data by event type, sums all fatalities and injuries and finally sorts the result in descending order of fatalities (with injuries as a tie breaker). We’ll also print the first 5 results to check that our transformations worked as intended.

For economic damage, we’ll consider property damage and crop damage. We’ll use the PROPDMGEXP and CROPDMGEXP columns to convert all values to USD (multiplying values marked k, m and b by 1,000, 1,000,000 and 1,000,000,000 respectively), add property and crop damage together and then convert the total to Million USD for the final report. Values with any other exponent code are left unscaled.
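To make the unit conversion concrete, here is a small sketch on made-up values. It expresses the same rule with a lookup table instead of chained ifelse calls. The names exp_multiplier and to_usd are illustrative assumptions and do not appear in the analysis code.

```r
# Multipliers for the exponent codes handled in this report
# (illustrative lookup table, not part of the original code)
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: converts a damage value and its exponent code to USD
to_usd <- function(value, exp) {
  # toupper() makes "k" and "K" (and so on) equivalent
  mult <- exp_multiplier[toupper(as.character(exp))]
  # Any other code (empty, digits, symbols) leaves the value unscaled
  mult[is.na(mult)] <- 1
  unname(value * mult)
}

# Toy example with made-up values: 25K, 2.5m, 1B and a value with no code
to_usd(c(25, 2.5, 1, 3), c("K", "m", "B", ""))
```

On this toy input the four values become 25,000, 2,500,000, 1,000,000,000 and 3 USD respectively.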

#Selects the relevant variables from the data frame and groups them by event type, using sum as the summarizing function
#Converts property and crop damage to US$ before summarizing and then stores the total as Million US$

storm_grouped <- select(stormdata, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
                 mutate(PROPDMG = ifelse(PROPDMGEXP == "K" | PROPDMGEXP == "k",yes = PROPDMG * 1000, no = PROPDMG)) %>%
                 mutate(PROPDMG = ifelse(PROPDMGEXP == "M" | PROPDMGEXP == "m",yes = PROPDMG * 1000000,no = PROPDMG)) %>%
                 mutate(PROPDMG = ifelse(PROPDMGEXP == "B" | PROPDMGEXP == "b",yes = PROPDMG * 1000000000,no = PROPDMG)) %>%
                 mutate(CROPDMG = ifelse(CROPDMGEXP == "K" | CROPDMGEXP == "k",yes = CROPDMG * 1000, no = CROPDMG)) %>%
                 mutate(CROPDMG = ifelse(CROPDMGEXP == "M" | CROPDMGEXP == "m",yes = CROPDMG * 1000000,no = CROPDMG)) %>%
                 mutate(CROPDMG = ifelse(CROPDMGEXP == "B" | CROPDMGEXP == "b",yes = CROPDMG * 1000000000,no = CROPDMG)) %>%
                 group_by(EVTYPE) %>%
                 summarise(fatal = sum(FATALITIES), injur = sum(INJURIES), 
                           property_m = sum(PROPDMG+CROPDMG)/1000000, n = n()) %>%
                 arrange(desc(fatal), desc(injur))

#Shows the top 5 events by fatalities
head(storm_grouped, 5)
## # A tibble: 5 x 5
##   EVTYPE         fatal injur property_m     n
##   <fct>          <dbl> <dbl>      <dbl> <int>
## 1 TORNADO         5633 91346     57352. 60652
## 2 EXCESSIVE HEAT  1903  6525       500.  1678
## 3 FLASH FLOOD      978  1777     17562. 54277
## 4 HEAT             937  2100       403.   767
## 5 LIGHTNING        816  5230       941. 15754
#Shows the top 5 events by property and crop damage
head(storm_grouped %>% arrange(desc(property_m)), 5)
## # A tibble: 5 x 5
##   EVTYPE            fatal injur property_m      n
##   <fct>             <dbl> <dbl>      <dbl>  <int>
## 1 FLOOD               470  6789    150320.  25326
## 2 HURRICANE/TYPHOON    64  1275     71914.     88
## 3 TORNADO            5633 91346     57352.  60652
## 4 STORM SURGE          13    38     43324.    261
## 5 HAIL                 15  1361     18758. 288661

It seems like everything is working and we can proceed with our analysis.

Results

Let’s plot the data so we can visualize the results.

melt_storm <- melt(head(storm_grouped,5), id.vars = "EVTYPE", measure.vars = 2:3, variable.name = "kind")

plot1 <- ggplot(melt_storm, aes(x = EVTYPE, y = value, fill = kind)) +
         geom_col() +
         theme_few() +
         xlab("") +
         ylab("") +
         ggtitle("Injuries per Event Type") +
         scale_fill_discrete(name = "Kind of Injury", breaks = c("fatal", "injur"), labels = c("Fatal", "Injury"))

plot1

We can observe that Tornadoes are, by far, the deadliest event type. Counting both fatalities and injuries, they affected over 10 times as many people as the second deadliest event type (Excessive Heat). We can conclude that Tornadoes are the events that cause the most damage to public health.
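The "over 10 times" figure can be checked directly from the summary table printed in the Data Processing section. The short sketch below copies the fatality and injury counts for the two event types from that table. The variable names are chosen here for illustration only.

```r
# Counts copied from the summary table in the Data Processing section
tornado        <- c(fatal = 5633, injur = 91346)
excessive_heat <- c(fatal = 1903, injur = 6525)

# People affected (fatalities plus injuries) by each event type
affected_tornado <- sum(tornado)
affected_heat    <- sum(excessive_heat)

# Ratio between the two totals
ratio <- affected_tornado / affected_heat
round(ratio, 1)
```

The ratio comes out at roughly 11.5, which is consistent with the statement above.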

As for property damage:

plot2 <- ggplot(head(storm_grouped %>% arrange(desc(property_m)), 5),
                aes(x = EVTYPE, y = property_m, fill = EVTYPE)) +
         geom_col() +
         theme_few() +
         xlab("") +
         ylab("Property damage (Million US$)") +
         ggtitle("Property and Crop Damage by Event type") +
         labs(fill = "Event Type")

plot2
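By default the bars are laid out in alphabetical order of event type rather than by damage. As an optional refinement, the sketch below shows how base R’s reorder() can sort the event types by descending damage. The data frame top5 is a stand-in built from the rounded values printed earlier and is not part of the original code.

```r
# Top five event types by combined damage (Million US$), rounded as printed
# in the Data Processing section; top5 is an illustrative stand-in
top5 <- data.frame(
  EVTYPE     = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL"),
  property_m = c(150320, 71914, 57352, 43324, 18758)
)

# reorder() sorts factor levels by ascending score, so negating the damage
# puts the most damaging event type first
top5$EVTYPE <- reorder(top5$EVTYPE, -top5$property_m)

levels(top5$EVTYPE)
```

Passing the reordered factor as the x aesthetic would draw the bars from the most to the least damaging event type, which makes the ranking easier to read.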

We can observe that Floods are the main offenders with regard to property and crop damage. Hurricanes/Typhoons, Tornadoes, Storm Surges and Hail also cause a significant amount of damage. When considering economic damage, the harm is spread across more event types than it is for public health.

Conclusions

We conclude that Tornadoes cause the vast majority of public health damage amongst the observed event types, with over 10 times as many fatalities and injuries as the second deadliest event type. We also conclude that Floods are the largest cause of property and crop damage, but Hurricanes/Typhoons, Tornadoes, Storm Surges and Hail also pose a significant threat to property.