Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database.

Synopsis

The basic goal of this assignment is to explore the NOAA Storm Database and answer the following questions about severe weather events:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

First, the necessary libraries are loaded.

library(readr)
library(tidyverse)
library(reshape2)

The data is downloaded if it is not already present and then loaded into a data frame.

if(!file.exists("stormData.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "stormData.csv.bz2", method = "curl")
}

stormData <- read_csv("stormData.csv.bz2")

We trim the data frame down to the columns needed for the analysis.

df <- stormData %>% select('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'CROPDMG')

Now we are ready to address the two questions. For the first question we group the data by event type and sum the fatalities and injuries. The two measures are kept separate because they are different kinds of harm and cannot meaningfully be put on a common scale. We then look at the top ten contributing event types. As the assignment does not ask for any specific figures, the results will only be visualised.

# Total fatalities and injuries per event type, keeping the ten largest
q1 <- df %>% select(EVTYPE, FATALITIES, INJURIES) %>% group_by(EVTYPE) %>% 
             summarise(TOTAL_FATALITIES = sum(FATALITIES), TOTAL_INJURIES = sum(INJURIES)) %>% 
             # desc() accepts a single vector, so each sort key gets its own call
             arrange(desc(TOTAL_FATALITIES), desc(TOTAL_INJURIES)) %>% 
             head(10)
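
As an illustration of the group, summarise and sort steps, here is a minimal sketch on a toy data frame. The values are made up and `toy` and `top_events` are hypothetical names; the point is only to show that a tie on the first sort key is broken by the second when each key has its own `desc()` call.

```r
library(dplyr)

# Toy records standing in for the storm data (values are invented)
toy <- tibble(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 5, 0),
  INJURIES   = c(10, 2, 5, 1, 3)
)

top_events <- toy %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_FATALITIES = sum(FATALITIES),
            TOTAL_INJURIES   = sum(INJURIES)) %>%
  # TORNADO and HEAT tie on fatalities; injuries decide the order
  arrange(desc(TOTAL_FATALITIES), desc(TOTAL_INJURIES)) %>%
  head(2)

print(top_events)
```

In this toy case TORNADO and HEAT both total five fatalities, and TORNADO is listed first because it has more injuries.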

At this point I want to reshape the data into long (tidy) format, with one row per event type and measure, which is suitable for plotting.

melted1 <- q1 %>% melt(id.vars = "EVTYPE")
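
For readers who prefer to stay within the tidyverse, the same wide-to-long reshaping can probably be done with `tidyr::pivot_longer()`. This is a sketch on a toy data frame, not the method used in this report; `toy_wide` and `toy_long` are hypothetical names and the values are made up.

```r
library(dplyr)
library(tidyr)

# Toy summary table in the same wide shape as q1 (values are invented)
toy_wide <- tibble(
  EVTYPE           = c("TORNADO", "HEAT"),
  TOTAL_FATALITIES = c(5, 4),
  TOTAL_INJURIES   = c(15, 1)
)

# One row per event type and measure, using the column names melt() produces
toy_long <- toy_wide %>%
  pivot_longer(cols = -EVTYPE, names_to = "variable", values_to = "value")

print(toy_long)
```

One difference to keep in mind: `melt()` returns `variable` as a factor, whereas `pivot_longer()` returns it as a character column, so the legend order in a plot may need to be set explicitly.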

The data is now ready to answer the first question (see Results). A similar preparation process is used for the second question. Here I could have simply added damage to property and damage to crops together but, as I have already said, no figures are required for this analysis, and showing each contribution separately is more informative in the chart to follow.

# Total property and crop damage per event type, keeping the ten largest
q2 <- df %>% select(EVTYPE, PROPDMG, CROPDMG) %>% group_by(EVTYPE) %>% 
             summarise(DamageToProperty = sum(PROPDMG), DamageToCrops = sum(CROPDMG)) %>% 
             # desc() accepts a single vector, so each sort key gets its own call
             arrange(desc(DamageToProperty), desc(DamageToCrops)) %>% 
             head(10) 

melted2 <- q2 %>% melt(id.vars = "EVTYPE")
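
The sums above use PROPDMG and CROPDMG as recorded. The full storm data also carries magnitude-code columns (PROPDMGEXP and CROPDMGEXP), which are not selected in this analysis. If dollar amounts were wanted, one way to apply such codes might look like the sketch below. It runs on a toy data frame with made-up values; the `magnitude` lookup is an assumption that covers only the K, M and B codes, and treating any other or missing code as a multiplier of 1 is also an assumption.

```r
library(dplyr)

# Assumed lookup for the common magnitude codes only
magnitude <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy records (values are invented)
toy_dmg <- tibble(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", NA)
)

scaled <- toy_dmg %>%
  mutate(
    # Look up the multiplier by code; unmatched or missing codes give NA
    multiplier  = unname(magnitude[toupper(PROPDMGEXP)]),
    # Assumption: fall back to 1 when the code is missing or unrecognised
    multiplier  = coalesce(multiplier, 1),
    PROPDMG_USD = PROPDMG * multiplier
  )

print(scaled)
```

The same pattern would apply to CROPDMG with its own code column.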

Results

For the first question, which types of events are most harmful with respect to population health, Tornado is by far the most harmful.

plot1 <- ggplot(melted1, aes(x = reorder(EVTYPE, -value), value, fill=as.factor(variable))) + geom_col() + coord_flip() 
plot1 + labs(x = "Event Type", y = "Count") +
      scale_fill_discrete(name = "Type of Harm", labels = c("Total Fatalities", "Total Injuries"))

For the second question, which types of events have the greatest economic consequences, again, Tornado is in the lead.

plot2 <- ggplot(melted2, aes(x = reorder(EVTYPE, -value), value, fill=as.factor(variable))) + geom_col() + coord_flip() 
plot2 + labs(x = "Event Type", y = "Total Damage (as recorded)") +
             scale_fill_discrete(name = "Type of Damage", labels = c("Damage to Property", "Damage to Crops"))