Health and Economic Impact of Hydrometeorological Events in the United States

Synopsis

The Storm Events Database contains the records used to create the official NOAA Storm Data publication, documenting:

The occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce;
Rare, unusual weather phenomena that generate media attention, such as snow flurries in South Florida or the San Diego coastal area; and
Other significant meteorological events, such as record maximum or minimum temperatures or precipitation that occur in connection with another event.

The database currently contains data from January 1950 to March 2014, as entered by NOAA’s National Weather Service (NWS). Due to changes in the data collection and processing procedures over time, there are unique periods of record available depending on the event type.

NCDC has performed data reformatting and standardization of event types but has not changed any data values for locations, fatalities, injuries, damage, narratives and any other event specific information. Please refer to the Database Details page for more information.

Reproducible Research: Peer Assessment 2

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database in order to address the following questions:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

Data Processing

First, we set the working directory.

setwd("~/GitHub/RepData_PeerAssessment2/")

Next, we load the packages and the raw data required for our analysis.

library(knitr)      # Tool for dynamic report generation in R
library(markdown)   # Creation of dynamic reports from R
library(data.table) # Large data manipulation package
library(lubridate)  # Date manipulation package
library(ggplot2)    # Graphical package

StormData <- read.csv("~/GitHub/RepData_PeerAssessment2/data/StormData.csv", na.strings=",,", stringsAsFactors=FALSE)

In addition to what we just stated about the data, it is worth noting that the data set comprises 902,297 observations of 37 variables.

dim(StormData)

## [1] 902297     37

At this point, we can select only the variables (columns) required for our analysis, which are the following:

“BGN_DATE” : Date of a given event observation
“EVTYPE” : Event type (48 official types in NWS Instruction 10-1605)
“FATALITIES”: Absolute number of fatalities for a given event observation
“INJURIES” : Absolute number of injuries for a given event observation
“PROPDMG” : Private and public property damage, rounded to three significant digits
“PROPDMGEXP”: Alphabetical character signifying the magnitude of PROPDMG (for example, K for thousands, M for millions, B for billions)
“CROPDMG” : Crop-related damage, rounded to three significant digits
“CROPDMGEXP”: Alphabetical character signifying the magnitude of CROPDMG

id.vars <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
StormData <- StormData[, id.vars]

Despite the reduction from 37 to 8 variables, we are still dealing with a fairly large data frame. To handle it more efficiently, we convert the data frame to a data table, and we convert the column names and character entries to lower case.

library(data.table)
DTs <- data.table(StormData)

# Set 'variable names' to lower case characters
setnames(DTs, names(DTs), tolower(names(DTs)))

# Set column entries to lower case characters
DTs[, evtype     := tolower(evtype)]
DTs[, propdmgexp := tolower(propdmgexp)]
DTs[, cropdmgexp := tolower(cropdmgexp)]

Next, we import a text file that holds the 48 different event types listed in National Weather Service Instruction 10-1605.

DTe <- fread("~/GitHub/RepData_PeerAssessment2/data/eventType.txt", header=FALSE, colClasses = "character")
setnames(DTe, "V1", "type")

We then define a function called ‘abbr’ that takes a string as input, strips spaces and punctuation, converts it to lower case, and returns the first nine characters of the result.

abbr <- function(x) {
        # Remove spaces and punctuation
        x <- gsub(" |[[:punct:]]", "", x)
        # Convert to lower case
        x <- tolower(x)
        # Keep the first nine characters
        x <- substr(x, 1, 9)
        return(x)
}
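
To see what ‘abbr’ produces, here is a small illustration on a few input strings. The inputs are assumed spellings chosen for the example, not rows quoted from the database. The function is repeated so the snippet runs on its own. The output also hints at a limitation of prefix matching: a shortened spelling such as "TSTM WIND" does not share a key with "Thunderstorm Wind".

```r
# Same logic as abbr() in the report, repeated so the snippet is self-contained
abbr <- function(x) {
    x <- gsub(" |[[:punct:]]", "", x)  # drop spaces and punctuation
    x <- tolower(x)                    # normalise letter case
    substr(x, 1, 9)                    # keep the first nine characters
}

# Illustrative inputs (assumed spellings)
keys <- abbr(c("Flash Flood", "FLASH FLOODING", "Thunderstorm Wind", "TSTM WIND"))
print(keys)
# [1] "flashfloo" "flashfloo" "thunderst" "tstmwind"
```

The first two strings collapse to the same key and would be merged under one event type, while the last two would not.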

Now we merge the two data sets.

# Append a new column 'abbr' to both data tables (storm data and event types) so we can merge them on it
DTs[, abbr := abbr(evtype)]
DTe[, abbr := abbr(type)]

# Set keys
setkey(DTs, abbr)
setkey(DTe, abbr)

# Merge data
DTm <- DTs[DTe]

# Select columns from merged data
DTm <- DTm[, list(type, fatalities, injuries, propdmg, propdmgexp, cropdmg, cropdmgexp)]
setkey(DTm, type)

# Get rid of NAs
DTm <- DTm[complete.cases(DTm)]
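
The keyed join above can be easier to follow on a toy example. The sketch below uses two small made-up tables (‘storms’ and ‘events’ are hypothetical stand-ins for DTs and DTe) to show that X[Y] keeps every row of Y, fills unmatched rows with NA, and drops rows of X whose key has no match in Y.

```r
library(data.table)

# Toy stand-ins for DTs and DTe (values are made up for illustration)
storms <- data.table(abbr     = c("flashfloo", "flashfloo", "tstmwind"),
                     injuries = c(1, 2, 3))
events <- data.table(type = c("Flash Flood", "Tsunami"),
                     abbr = c("flashfloo", "tsunami"))

setkey(storms, abbr)
setkey(events, abbr)

# X[Y]: for every row of events, look up the matching rows of storms
joined <- storms[events]
print(joined)

# The "tsunami" row has no storm record, so its injuries value is NA
# and complete.cases() removes it
matched <- joined[complete.cases(joined)]
print(nrow(matched))
```

In this sketch the "tstmwind" record never appears in the result, which is the same way storm records with unofficial event names drop out of the merged data.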

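Then we multiply the damage values by their respective magnitudes (k for thousands, m for millions, b for billions) and add property and crop damage into a single economic damage value.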

# Define DamageMag: scale damage values by their exponent codes and sum them
DamageMag <- function(x){
    x <- data.table(x)

    x[propdmgexp == "k", propdmg:= propdmg*1000]
    x[propdmgexp == "m", propdmg:= propdmg*1000000]
    x[propdmgexp == "b", propdmg:= propdmg*1000000000]

    x[cropdmgexp == "k", cropdmg:= cropdmg*1000]
    x[cropdmgexp == "m", cropdmg:= cropdmg*1000000]
    x[cropdmgexp == "b", cropdmg:= cropdmg*1000000000]
    
    x <- x[, economyDamage:= propdmg + cropdmg]
    x <- x[, c("propdmg", "cropdmg", "propdmgexp", "cropdmgexp"):=NULL]

    return(x)
}

# Calculate total economic damage
DTm <- DamageMag(DTm)
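
As an alternative way to read the scaling step, the same mapping can be written as a lookup in a named vector. This is only a sketch in base R: ‘to_dollars’ is a hypothetical helper name, and treating every code other than k, m or b as "no scaling" is an assumption that mirrors DamageMag, which leaves such values unchanged.

```r
# Multipliers for the three exponent codes handled by DamageMag
mult <- c(k = 1e3, m = 1e6, b = 1e9)

# Hypothetical helper: scale values by their exponent codes
to_dollars <- function(value, exp) {
    factor <- unname(mult[exp])   # NA where the code is not k, m or b
    factor[is.na(factor)] <- 1    # assumption: unknown codes mean no scaling
    value * factor
}

dollars <- to_dollars(c(25, 2.5, 1.5, 10), c("k", "m", "b", ""))
print(dollars)
```

Here 25 with code "k" becomes 25,000, 2.5 with "m" becomes 2.5 million, 1.5 with "b" becomes 1.5 billion, and 10 with an empty code stays at 10.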

Because the term "harmful" is not defined with respect to public health in this scenario, we simply add the fatalities and injuries for each observation.

DTm <- DTm[, healthDamage:= fatalities + injuries]
DTm <- DTm[, list(type, healthDamage, economyDamage)]

Now we aggregate the data by event type into two separate tables (DT.health and DT.economy), each sorted in decreasing order of total damage.

DT.health <- DTm[, sum(healthDamage), by=type]
DT.health <- DT.health[order(-V1)]

# We use the log of the casualties to deal with widely dispersed and near-zero values.
DT.health <- DT.health[, V1:= log(V1 + 1)]

DT.economy <- DTm[, sum(economyDamage), by=type]
DT.economy <- DT.economy[order(-V1)]
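
The aggregation pattern used here can be sketched on toy data. The table ‘toy’ and its values are made up for illustration. The sketch names the aggregated column explicitly instead of relying on the default name V1, and it uses log1p(x), which computes log(x + 1).

```r
library(data.table)

# Toy data standing in for DTm (values are made up)
toy <- data.table(type         = c("Flood", "Tornado", "Flood", "Tornado", "Hail"),
                  healthDamage = c(1, 50, 2, 40, 0))

# Sum per event type, naming the result instead of using the default V1
totals <- toy[, .(total = sum(healthDamage)), by = type]

# Sort from the largest to the smallest total
totals <- totals[order(-total)]

# log1p(x) is log(x + 1), so a total of zero maps to zero
totals[, logTotal := log1p(total)]
print(totals)
```

With these values the event types come out in the order Tornado, Flood, Hail, and the zero total for Hail stays at zero after the transformation.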

Results

Finally, we can plot the 10 most harmful and the 10 most economically costly event types:

Health Damage

# Casualties plot
ggplot(DT.health[1:10], aes(x=V1, y=reorder(type, V1))) +
    geom_segment(aes(yend=type), xend=0, colour="grey50") +
    geom_point(size=3) +
    ggtitle("Most Harmful Events with Respect to Population Health") +
    labs(x="log(Casualties)", y="Events") +
    theme_bw() +
    theme(panel.grid.major.x = element_blank(),
              panel.grid.minor.x = element_blank(),
              panel.grid.major.y = element_line(colour="grey60", linetype="dashed"))

[Figure: Most Harmful Events with Respect to Population Health (x-axis: log(Casualties), y-axis: Events)]

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Answer: TORNADO.

Economic Damage

# Economic damage plot
ggplot(DT.economy[1:10], aes(x=V1, y=reorder(type, V1))) +
    geom_segment(aes(yend=type), xend=0, colour="grey50") +
    geom_point(size=3) +
    ggtitle("Events with the Greatest Economic Consequences") +
    labs(x="US Dollars", y="Events") +
    theme_bw() +
    theme(panel.grid.major.x = element_blank(),
              panel.grid.minor.x = element_blank(),
              panel.grid.major.y = element_line(colour="grey60", linetype="dashed"))

[Figure: Events with the Greatest Economic Consequences (x-axis: US Dollars, y-axis: Events)]

Across the United States, which types of events have the greatest economic consequences?

Answer: FLOOD.

Danilo Carvalho

July 13, 2014
