Synopsis

The Storm Events Database contains the records used to create the official NOAA Storm Data publication.

The database currently contains data from January 1950 to March 2014, as entered by NOAA’s National Weather Service (NWS). Due to changes in the data collection and processing procedures over time, there are unique periods of record available depending on the event type.

NCDC has performed data reformatting and standardization of event types but has not changed any data values for locations, fatalities, injuries, damage, narratives and any other event specific information. Please refer to the Database Details page for more information.

Reproducible Research: Peer Assessment 2

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to address two questions: which types of events are most harmful with respect to population health, and which have the greatest economic consequences?

Data Processing

First, we set the working directory.

setwd("~/GitHub/RepData_PeerAssessment2/")

Next, we load the packages and the raw data required for our analysis.

library(knitr)      # Tool for dynamic report generation in R
library(markdown)   # Creation of dynamic reports from R
library(data.table) # Large data manipulation package
library(lubridate)  # Date manipulation package
library(ggplot2)    # Graphical package
StormData <- read.csv("~/GitHub/RepData_PeerAssessment2/data/StormData.csv", na.strings=",,", stringsAsFactors=FALSE)

In addition to what we just stated about the data, it is worth noting that it comprises 902,297 observations of 37 variables.

dim(StormData)
## [1] 902297     37

At this point, we can select only the variables (columns) required for our analysis, which are the following:

id.vars <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
StormData <- StormData[, id.vars]

Despite the variable reduction (37 to 8), we are still dealing with a fairly large data frame. To address that, we convert the data structure from a data frame to a data table, and we convert the variable names and the character columns to lower case.

library(data.table)
DTs <- data.table(StormData)

# Set 'variable names' to lower case characters
setnames(DTs, names(DTs), tolower(names(DTs)))

# Set column entries to lower case characters
DTs <- DTs[, evtype     := tolower(evtype)]
DTs <- DTs[, propdmgexp := tolower(propdmgexp)]
DTs <- DTs[, cropdmgexp := tolower(cropdmgexp)]
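
To see why lower-casing helps, the following minimal sketch counts distinct labels before and after tolower. The sample labels are made up for illustration and are not taken from the database.

```r
# Hypothetical event labels that differ only in letter case
evtype <- c("TORNADO", "Tornado", "tornado", "Flash Flood", "FLASH FLOOD")

n_before <- length(unique(evtype))          # case-sensitive count
n_after  <- length(unique(tolower(evtype))) # count after normalizing case

print(c(before = n_before, after = n_after))
```

Labels that differ only in case collapse into a single value, which makes the later matching against the official event types less fragile.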

Next, we import a text file that holds the 48 different ‘Events’ listed in the supplementary document, National Weather Service Instruction 10-1605.

DTe <- fread("~/GitHub/RepData_PeerAssessment2/data/eventType.txt", header=FALSE, colClasses = "character")
setnames(DTe, "V1", "type")

We define a function called ‘abbr’ that takes a string as input and returns an abbreviation made of its first nine characters, after removing spaces and punctuation and converting to lower case.

abbr <- function(x) {
        x <- gsub(" |[[:punct:]]", "", x) # drop spaces and punctuation
        x <- tolower(x)                   # normalize case
        x <- substr(x, 1, 9)              # keep the first nine characters
        return(x)
}
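
As a quick illustration of what abbr returns, the sketch below repeats the function's three steps and applies them to a few labels. The labels are chosen for illustration only. Spelling variants that share their first nine cleaned characters collapse to one key, while differently worded variants do not.

```r
# Same steps as abbr above, repeated so the example runs on its own
abbr <- function(x) {
    x <- gsub(" |[[:punct:]]", "", x)
    x <- tolower(x)
    substr(x, 1, 9)
}

labels <- c("Flash Flood", "FLASH FLOODING", "TSTM WIND/HAIL", "Thunderstorm Wind")
keys <- abbr(labels)
print(keys)
# "flashfloo" "flashfloo" "tstmwindh" "thunderst"
```

Note that "TSTM WIND/HAIL" and "Thunderstorm Wind" receive different keys, so shorthand spellings in the raw data are not matched to the official event type by this approach.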

Now we merge the two data sets.

# Append a new column 'abbr' to both data sets (DTs and DTe) so we can easily merge them by this new column
DTs <- DTs[, abbr := abbr(evtype)]
DTe <- DTe[, abbr := abbr(type)]

# Set keys
setkey(DTs, abbr)
setkey(DTe, abbr)

# Merge data
DTm <- DTs[DTe]

# Select columns from merged data
DTm <- DTm[, list(type, fatalities, injuries, propdmg, propdmgexp, cropdmg, cropdmgexp)]
setkey(DTm, type)

# Get rid of NAs
DTm <- DTm[complete.cases(DTm)]
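
The keyed join DTs[DTe] keeps every row of DTe and attaches the matching rows of DTs. Official types with no matching record come back with NA values, and records whose key matches no official type are dropped. The sketch below reproduces that behaviour with base R's merge and all.y = TRUE, used here as a stand-in for the data.table join. The two small tables and their values are invented for illustration.

```r
# Toy stand-ins for the storm records and the official event types (made-up values)
storms <- data.frame(abbr     = c("flashfloo", "flashfloo", "tornado", "summaryof"),
                     injuries = c(0, 2, 5, 0),
                     stringsAsFactors = FALSE)
events <- data.frame(abbr = c("flashfloo", "tornado", "tsunami"),
                     type = c("Flash Flood", "Tornado", "Tsunami"),
                     stringsAsFactors = FALSE)

# Right outer join: keep all rows of 'events', as DTs[DTe] does
joined <- merge(storms, events, by = "abbr", all.y = TRUE)

# Drop rows that contain NA, mirroring the complete.cases step
matched <- joined[complete.cases(joined), ]

print(joined)
print(matched)
```

In this toy case the "summaryof" record disappears in the join, and the "tsunami" row, which has no matching record, is removed by the complete.cases step.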

Then we multiply the damage variables by their respective magnitudes.

# Define DamageMag
DamageMag <- function(x){
    library(data.table)
    x <- data.table(x)

    x[propdmgexp == "k", propdmg:= propdmg*1000]
    x[propdmgexp == "m", propdmg:= propdmg*1000000]
    x[propdmgexp == "b", propdmg:= propdmg*1000000000]

    x[cropdmgexp == "k", cropdmg:= cropdmg*1000]
    x[cropdmgexp == "m", cropdmg:= cropdmg*1000000]
    x[cropdmgexp == "b", cropdmg:= cropdmg*1000000000]
    
    x <- x[, economyDamage:= propdmg + cropdmg]
    x <- x[, c("propdmg", "cropdmg", "propdmgexp", "cropdmgexp"):=NULL]

    return(x)
}

# Calculate economic damage
DTm <- DamageMag(DTm)
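
The scaling performed by DamageMag can also be written as a lookup table. The sketch below is an alternative formulation in base R, not the function used in this report; scale_damage and the sample values are hypothetical. Like DamageMag, it scales only the codes k, m and b and leaves amounts with any other code unchanged.

```r
# Hypothetical helper: multiply a damage amount by the magnitude its code stands for
scale_damage <- function(value, exp_code) {
    multipliers <- c(k = 1e3, m = 1e6, b = 1e9)
    mult <- unname(multipliers[exp_code]) # NA for codes other than k, m, b
    mult[is.na(mult)] <- 1                # leave those amounts unscaled
    value * mult
}

# Made-up amounts and codes
damage <- scale_damage(c(25, 2.5, 1, 50, 10), c("k", "m", "b", "", "h"))
print(damage)
```

Amounts with an empty or unrecognized code (the last two here) pass through unscaled, so totals may understate damage for any records that use other codes.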

In the absence of a definition of the term ‘harmful’ with respect to public health in this scenario, we simply add the fatalities and injuries variables for each observation.

DTm <- DTm[, healthDamage:= fatalities + injuries]
DTm <- DTm[, list(type, healthDamage, economyDamage)]

Now we split the data into two different tables (DT.health and DT.economy).

DT.health <- DTm[, sum(healthDamage), by=type]
DT.health <- DT.health[order(DT.health[, V1], decreasing = TRUE)]

# We use the log of the casualties to deal with the dispersed and near-zero values.
DT.health <- DT.health[, V1:= log(V1 + 1)]

DT.economy <- DTm[, sum(economyDamage), by=type]
DT.economy <- DT.economy[order(DT.economy[, V1], decreasing = TRUE)]
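
The aggregation and log transform above can be illustrated on a small scale. The sketch below uses base R (tapply and log1p) rather than data.table, with made-up numbers. log1p(x) computes log(x + 1), which maps a total of zero to zero instead of the -Inf that log(0) would give, and it does not change the ranking.

```r
# Made-up per-observation casualties for three event types
type         <- c("Tornado", "Flood", "Tornado", "Tsunami", "Flood", "Tsunami")
healthDamage <- c(120, 30, 80, 0, 10, 0)

# Sum by type, then sort from largest to smallest
totals <- sort(tapply(healthDamage, type, sum), decreasing = TRUE)

# log(x + 1) keeps zero totals finite
log_totals <- log1p(totals)

print(totals)
print(log_totals)
```

Because the transform is monotonic, sorting before or after taking the log yields the same order of event types.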

Results

Finally, we can plot the 10 most harmful and the 10 most economically costly events:

Health Damage

# Casualties plot
ggplot(DT.health[1:10], aes(x=V1, y=reorder(type, V1))) +
    geom_segment(aes(yend=type), xend=0, colour="grey50") +
    geom_point(size=3) +
    ggtitle("Most Harmful Events with Respect to Population Health") +
    labs(x="log(Casualties + 1)", y="Events") +
    theme_bw() +
    theme(panel.grid.major.x = element_blank(),
              panel.grid.minor.x = element_blank(),
              panel.grid.major.y = element_line(colour="grey60", linetype="dashed"))

Answer: TORNADO.

Economic Damage

# Economic damage plot
ggplot(DT.economy[1:10], aes(x=V1, y=reorder(type, V1))) +
    geom_segment(aes(yend=type), xend=0, colour="grey50") +
    geom_point(size=3) +
    ggtitle("Events with the Greatest Economic Consequences") +
    labs(x="US Dollars", y="Events") +
    theme_bw() +
    theme(panel.grid.major.x = element_blank(),
              panel.grid.minor.x = element_blank(),
              panel.grid.major.y = element_line(colour="grey60", linetype="dashed"))

Answer: FLOOD.