Synopsis

In this article we’ll use the NOAA storm database to determine which types of storm are the most harmful to population health, and which have the greatest economic consequences. The storm database contains measurements of human fatalities and injuries, as well damage to property and crops categorized by storm type. We will focus on identifying the storm types that have the greatest relative impact and their frequency, so that resources can be allocated in an efficient manner.

Setup

We’ll be using data.table instead of data.frame, reshape2 to help with aggregation, ggplot2 for graphics, knitr for processing R Markdown to HTML, xtable for HTML table formatting and the R.utils package to help with file decompression. Here we set global knitr options and allow printing of larger numerics in fixed format and set the working directory.

suppressMessages({ 
  library(data.table)
  library(reshape2)
  library(ggplot2)
  library(knitr)
  library(xtable)
  library(R.utils)
}) 

opts_knit$set(progress=FALSE)
opts_chunk$set(echo=TRUE, message=FALSE, tidy=TRUE, comment=NA,
               fig.path="figure/", fig.keep="high", fig.width=10, fig.height=6,
               fig.align="center")

options(scipen=10, stringsAsFactors=FALSE)

setwd('~/R_projects/jhuds/c5p2')

Data Processing

The data used in this analysis was originally sourced from the U.S. Dept. of Commerce, National Oceanic & Atmospheric Administration (NOAA), National Weather Service. The specific version of the storm database we’ll be using is:

Department of Commerce / National Oceanic & Atmospheric Administration / National Weather Service NATIONAL WEATHER SERVICE INSTRUCTION 10-1605 AUGUST 17, 2007 Operations and Services Performance, NWSPD 10-16 STORM DATA PREPARATION

This document is availble available at http://www.nws.noaa.gov/directives/

If necessary, download and unzip the storm database. Read decompressed file into a data.table.

csvFile <- file.path(getwd(), "StormData.csv")
bz2File <- paste0(csvFile, ".bz2")
if (!file.exists(csvFile)) {
    if (!file.exists(bz2File)) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
            bz2File)
    }
    bunzip2(bz2File)
}
data <- as.data.table(read.csv(csvFile))

From the storm database documentation the following items have been determined to be relevant to this analysis:

  1. EVTYPE - categorical event type
  2. FATALITIES - count of deaths attributed to event
  3. INJURIES - count of injuries attributed to event
  4. PROPDMG - property damage units (to 2 significant digits)
  5. PROPDMGEXP - base 10 exponent of property damage units
  6. CROPDMG - crop damage units (to 2 significant digits)
  7. CROPDMGEXP - base exponent of crop damage units

Create a subset of these columns for both the health & ecomonic analysis. Since there is mixed case in the character fields, convert them to uppercase to help with aggregation.

ads <- data[, list(EVTYPE = toupper(EVTYPE), FATALITIES, INJURIES, PROPDMG, 
    PROPDMGEXP = toupper(PROPDMGEXP), CROPDMG, CROPDMGEXP = toupper(CROPDMGEXP))]

There are 0 NA values in this data set.

Find the event types that have recorded adverse health impacts, then subset the data to just those event types.

health_impact_events <- unique(ads[FATALITIES > 0 | INJURIES > 0, EVTYPE])
health_impact_data <- ads[EVTYPE %in% health_impact_events, list(EVTYPE = as.factor(EVTYPE), 
    FATALITIES, INJURIES)]

Find the event types that have recorded adverse economic impacts, then subset the data to just those event types.

econ_impact_events <- unique(ads[PROPDMG > 0 | CROPDMG > 0, EVTYPE])
econ_impact_data <- ads[EVTYPE %in% econ_impact_events, list(EVTYPE = as.factor(EVTYPE), 
    PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)]

Only two significant digits for the crop and property damage measurements are stored, so we’ll need to convert these to a common scale using the corresponding exponent. Display a frequency table of all exponents used. Set non-numeric exponents to their numeric equivalents, and inspect converted values.

sort(table(c(econ_impact_data$PROPDMGEXP, econ_impact_data$CROPDMGEXP)), decreasing = TRUE)

              K       M       0       B       5       1       ?       2 
1080977  706403   13332     235      49      27      25      15      14 
      H       +       7       3       4       6       -       8 
      7       5       5       4       4       4       1       1 
econ_impact_data <- econ_impact_data[PROPDMGEXP %in% c("", "?", "+", "-"), `:=`(PROPDMGEXP, 
    "0")]
econ_impact_data <- econ_impact_data[CROPDMGEXP %in% c("", "?", "+", "-"), `:=`(CROPDMGEXP, 
    "0")]
econ_impact_data <- econ_impact_data[PROPDMGEXP == "H", `:=`(PROPDMGEXP, "2")]
econ_impact_data <- econ_impact_data[CROPDMGEXP == "H", `:=`(CROPDMGEXP, "2")]
econ_impact_data <- econ_impact_data[PROPDMGEXP == "K", `:=`(PROPDMGEXP, "3")]
econ_impact_data <- econ_impact_data[CROPDMGEXP == "K", `:=`(CROPDMGEXP, "3")]
econ_impact_data <- econ_impact_data[PROPDMGEXP == "M", `:=`(PROPDMGEXP, "6")]
econ_impact_data <- econ_impact_data[CROPDMGEXP == "M", `:=`(CROPDMGEXP, "6")]
econ_impact_data <- econ_impact_data <- econ_impact_data[PROPDMGEXP == "B", 
    `:=`(PROPDMGEXP, "9")]
econ_impact_data <- econ_impact_data[CROPDMGEXP == "B", `:=`(CROPDMGEXP, "9")]
sort(table(c(econ_impact_data$PROPDMGEXP, econ_impact_data$CROPDMGEXP)), decreasing = TRUE)

      0       3       6       9       5       1       2       7       4 
1081233  706407   13336      49      27      25      21       5       4 
      8 
      1 

Convert the exponet columns to numeric and then use the exponent to convert the measurements to a common scale

econ_impact_data <- econ_impact_data[, `:=`(PROPDMGEXP, as.numeric(PROPDMGEXP))]
econ_impact_data <- econ_impact_data[, `:=`(CROPDMGEXP, as.numeric(CROPDMGEXP))]
econ_impact_data <- econ_impact_data[, `:=`(PROPDMGSCALED, (PROPDMG * 10^PROPDMGEXP)/10^6)]
econ_impact_data <- econ_impact_data[, `:=`(CROPDMGSCALED, (CROPDMG * 10^CROPDMGEXP)/10^6)]

Results

Storm Type Impact on Population Health

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? Compute total and average fatalities and injuries by event type. Plot the top 10 storm event types having biggest impact on health in terms of average number of fatalities or injuries.

agg_health <- health_impact_data[, list(n = as.double(.N), tot_fatalities = as.double(sum(FATALITIES)), 
    tot_injuries = as.double(sum(INJURIES)), avg_fatalities = sum(FATALITIES)/.N, 
    avg_injuries = sum(INJURIES)/.N), by = EVTYPE]

top <- 10
agg_health_top <- agg_health[order(avg_fatalities, decreasing = TRUE), ][1:top]
agg_health_top <- rbind(agg_health_top, agg_health[order(avg_injuries, decreasing = TRUE), 
    ][1:top])

agg_health_melt <- melt(agg_health_top, id = "EVTYPE", measure.vars = c("avg_fatalities", 
    "avg_injuries"))

gh <- ggplot(data = agg_health_melt, aes(x = reorder(EVTYPE, value), y = value))
gh <- gh + geom_bar(stat = "identity") + coord_flip()
gh <- gh + facet_grid(~variable)
gh <- gh + ggtitle("Top Storm Events with respect to population health")
gh <- gh + xlab("Storm Event Type")
gh + ylab("Average impact (number of health casualties)")

plot of chunk unnamed-chunk-8

On average, we can see that tornadoes, thunderstorm wind, hail, and tropical storms are the most deadly storm events. Tropical storms, high wind and seas, and wild fires cause the most injuries. All of these have wind as a component or are exacerbated by wind.

Storm Type Economic Impact

Across the United States, which types of events have the greatest economic consequences? Compute total and average property & crop damage by event type. Plot the top 10 storm event types having biggest economic impact in terms of the average cost of property and crop damage.

agg_econ <- econ_impact_data[, list(n = as.double(.N), tot_prop_damage = as.double(sum(PROPDMGSCALED)), 
    tot_crop_damage = as.double(sum(CROPDMGSCALED)), avg_prop_damage = sum(PROPDMGSCALED)/.N, 
    avg_crop_damage = sum(CROPDMGSCALED)/.N), by = EVTYPE]

top <- 10
agg_econ_top <- agg_econ[order(avg_prop_damage, decreasing = TRUE), ][1:top]
agg_econ_top <- rbind(agg_econ_top, agg_econ[order(avg_crop_damage, decreasing = TRUE), 
    ][1:top])

agg_econ_melt <- melt(agg_econ_top, id = "EVTYPE", measure.vars = c("avg_prop_damage", 
    "avg_crop_damage"))

ge <- ggplot(data = agg_econ_melt, aes(x = reorder(EVTYPE, value), y = value))
ge <- ge + geom_bar(stat = "identity") + coord_flip()
ge <- ge + facet_grid(~variable)
ge <- ge + ggtitle("Top Storm Events with respect to Economic Health Impact")
ge <- ge + xlab("Storm Event Type")
ge + ylab("Average impact (milions $ US)")

plot of chunk unnamed-chunk-9

On average, we can see that tornadoes, heavy rain / severe weather, and hurricanes / typhoons cause the most property damage. The cost of crop damage from these events is much less than property damage. Excessive wetness and hurricanes / typhoons cause the most crop damage. All of these have wind as a component or are exacerbated by wind.

Storm Type Frequency

Finally, we’ll take a look a overall storm type frequency, by combining the frequency figures from the health and economic impact sumamries.

event_freq <- rbind(agg_health[, list(EVTYPE, n)], agg_econ[, list(EVTYPE, n)])
agg_event <- event_freq[, list(n = sum(n)), by = EVTYPE]

agg_event_top <- agg_event[order(n, decreasing = TRUE), ][1:20]

gf <- ggplot(data = agg_event_top, aes(x = reorder(EVTYPE, n), y = n))
gf <- gf + geom_bar(stat = "identity") + coord_flip()
gf <- gf + ggtitle("Top 20 Storm Event Types")
gf <- gf + xlab("Storm Event Type")
gf + ylab("Number of Storms")

plot of chunk unnamed-chunk-10

Here we can see that hail & thunderstom wind occur far more frequently than other types of storms, followed by tornadoes and flash floods. Of these high-frequency storm types, we notice that the top three also had very high average impact in terms population health and damage costs.