Synopsis

To determine the primary impacts of various weather events on US health and economic wellbeing, we analyze data from the NOAA National Climatic Data Center. The dataset covers the years 1950 to 2011. However, only the data for 1996 to 2011 include all 48 recorded event types, so our analysis covers only those years.

The data include, for each storm event, counts of injuries and fatalities and estimates of property and crop damage. We compare totals of these measures by event type over the entire period to determine the ten event types with the greatest impact. We then group these totals by month to plot the seasonal occurrence of these events. Knowing which weather events are most significant, and when in the year their impacts occur, can help decision makers allocate resources to manage those impacts.

Across the United States from 1996 to 2011, the types of events most associated with adverse impacts to population health are tornadoes, heat, flooding, lightning, and thunderstorm winds. Floods, hurricanes, typhoons, storm surge, tornadoes, and hail have the greatest economic consequences. Other important events are rip currents, high winds, ice storms, avalanches, and drought.

Data Processing

Load and preprocess the data

Download the compressed CSV file and load it into a data.table. Remove extra columns, format the date columns, add a Month column, filter to include only data from the fifty states of the US, and convert property and crop damage values to US dollars.

# Download the data file (if missing).
url <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
zipfile <- "StormData.csv.bz2"
if (!file.exists(zipfile)) {
    # Binary mode keeps the compressed file intact on every platform.
    download.file(url, zipfile, method = "auto", mode = "wb")
}

# Load into a data.table (faster and more feature-rich than a data.frame).
library(data.table)
stormdata <- as.data.table(read.csv(bzfile(zipfile), header = TRUE, 
                      stringsAsFactors = FALSE))

# Remove extra columns (to free up memory).
stormdata[, c("STATE__","BGN_TIME","TIME_ZONE","COUNTY","COUNTYNAME",
              "BGN_RANGE","BGN_AZI","BGN_LOCATI","END_TIME","COUNTY_END",
              "COUNTYENDN","END_RANGE","END_AZI","END_LOCATI","LENGTH",
              "WIDTH","F","MAG","WFO","STATEOFFIC","ZONENAMES","LATITUDE",
              "LONGITUDE","LATITUDE_E","LONGITUDE_","REMARKS","REFNUM") := NULL]

# Prepare date columns. (Convert from character variable type to date type.)
suppressMessages(library(lubridate))
stormdata[, c("BGN_DATE", "END_DATE") 
                    := list(mdy_hms(BGN_DATE), mdy_hms(END_DATE))]
stormdata[, "Month" := as.factor(month(BGN_DATE))]

# Filter storm data to include only the 50 US states and years 1996 to present.
# See: http://www.ncdc.noaa.gov/stormevents/details.jsp
# Compare on the year so the filter does not depend on whether mdy()
# returns a Date or a POSIXct value.
library(datasets)
stormdata <- stormdata[STATE %in% state.abb & year(BGN_DATE) >= 1996]

# Convert damage estimates to US dollars. See section 2.7 "Damage" in this PDF:
# http://www.ncdc.noaa.gov/stormevents/pd01016005curr.pdf

# Convert property damage amounts to USD. (Convert exponent code and multiply.)
stormdata[, PROPDMGEXP := toupper(PROPDMGEXP)]
stormdata[, PROPDMGEXP := sub("^[^KMB]*$", 0, PROPDMGEXP)]
stormdata[, PROPDMGEXP := sub("K", 3, PROPDMGEXP, fixed=TRUE)]
stormdata[, PROPDMGEXP := sub("M", 6, PROPDMGEXP, fixed=TRUE)]
stormdata[, PROPDMGEXP := sub("B", 9, PROPDMGEXP, fixed=TRUE)]
stormdata[, PROPDMG := as.numeric(PROPDMG)*10^as.numeric(PROPDMGEXP)]

# Convert crop damage amounts to USD. (Convert exponent code and multiply.)
stormdata[, CROPDMGEXP := toupper(CROPDMGEXP)]
stormdata[, CROPDMGEXP := sub("^[^KMB]*$", 0, CROPDMGEXP)]
stormdata[, CROPDMGEXP := sub("K", 3, CROPDMGEXP, fixed=TRUE)]
stormdata[, CROPDMGEXP := sub("M", 6, CROPDMGEXP, fixed=TRUE)]
stormdata[, CROPDMGEXP := sub("B", 9, CROPDMGEXP, fixed=TRUE)]
stormdata[, CROPDMG := as.numeric(CROPDMG)*10^as.numeric(CROPDMGEXP)]
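The exponent conversion above can be checked in isolation. The following is a minimal sketch in base R, not the code used for the report: the helper name `to_dollars` and the sample amounts are hypothetical, and it assumes, as the steps above do, that any code other than K, M, or B means an exponent of zero.

```r
# Hypothetical helper: map exponent codes to powers of ten.
# Codes other than K, M, or B (any case) are treated as 10^0.
to_dollars <- function(amount, code) {
    powers <- c(K = 3, M = 6, B = 9)
    exponent <- unname(powers[toupper(code)])
    exponent[is.na(exponent)] <- 0
    as.numeric(amount) * 10^exponent
}

# Made-up amounts and codes, including a lower-case and an unknown code.
usd <- to_dollars(c(25, 2.5, 1.2, 10, 7), c("K", "m", "B", "", "?"))
print(usd)
```

A named lookup vector keeps the code-to-exponent mapping in one place, which makes it harder to apply a step to the wrong column.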

Public Health Effects

Group fatalities and injuries by event type.

health <- stormdata[, .(FATALITIES, INJURIES, Month), by=EVTYPE]

Top 10 Weather Event Types for Injuries

Determine the top ten weather event types for injuries with a sum of all injuries, grouped by event type, for the entire time period.

# Calculate sums by event type, then sort, and find top ten.
injuries <- health[, .(Injuries=sum(INJURIES)), EVTYPE]
setkey(injuries, Injuries)
topteninjuries <- tail(injuries, 10)
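The sum, sort, and select pattern above can be tried on a small example. This is a sketch in base R rather than data.table, with made-up records in a hypothetical `toy` data frame and a top three instead of a top ten.

```r
# Hypothetical injury records; only the column names mirror the storm data.
toy <- data.frame(
    EVTYPE   = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL", "HEAT"),
    INJURIES = c(10, 1, 5, 4, 2, 0)
)

# Sum injuries per event type, sort in descending order, keep the top three.
totals <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$INJURIES), ]
top3 <- head(totals, 3)
print(top3)
```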

From the list of top ten weather event types for injuries, calculate injury sums by event type and month.

# Sum by event type and month, convert event type to factor.
injuriesbymonth <- health[EVTYPE %in% topteninjuries$EVTYPE,
                        .(Injuries=sum(INJURIES)), .(EVTYPE, Month)]
injuriesbymonth[, "EventType" := factor(EVTYPE, levels=rev(topteninjuries$EVTYPE))]
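The factor conversion fixes the order in which event types appear in a plot legend. As a small illustration with hypothetical values: `tail()` on a table keyed by the total returns event types in ascending order, so reversing them puts the event type with the largest total first among the factor levels.

```r
# Hypothetical top three, in ascending order of total as tail() would return them.
top <- c("HAIL", "FLOOD", "TORNADO")

# Made-up event types for four rows.
ev <- c("TORNADO", "HAIL", "FLOOD", "TORNADO")

# Reverse the order so the largest total becomes the first level.
f <- factor(ev, levels = rev(top))
print(levels(f))
print(as.integer(f))
```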

Top 10 Weather Event Types for Fatalities

Determine the top ten weather event types for fatalities with a sum of all fatalities, grouped by event type, for the entire time period.

# Calculate sums by event type, then sort, and find top ten.
fatalities <- health[, .(Fatalities=sum(FATALITIES)), EVTYPE]
setkey(fatalities, Fatalities)
toptenfatalities <- tail(fatalities, 10)

From the list of top ten weather event types for fatalities, calculate fatality sums by event type and month.

# Sum by event type and month, convert event type to factor.
fatalitiesbymonth <- health[EVTYPE %in% toptenfatalities$EVTYPE,
                        .(Fatalities=sum(FATALITIES)), .(EVTYPE, Month)]
fatalitiesbymonth[, "EventType" := factor(EVTYPE, levels=rev(toptenfatalities$EVTYPE))]

Economic Effects

Group property and crop damage by event type.

economic <- stormdata[, .(PROPDMG, CROPDMG, Month), by=EVTYPE]

Top 10 Weather Event Types for Damage

Determine the top ten weather event types for property and crop damage with a sum of all damage, grouped by event type, for the entire time period.

# Calculate sums by event type (in millions of USD), then sort, and find top ten.
damage <- economic[, .(Damage=sum(PROPDMG, CROPDMG, na.rm=TRUE)/1e6), EVTYPE]
setkey(damage, Damage)
toptendamage <- tail(damage, 10)

From the list of top ten weather event types for crop and property damage, calculate damage sums by event type and month. Convert damage sums to millions of US dollars.

# Sum by event type and month (in millions of USD), convert event type to factor.
damagebymonth <- economic[EVTYPE %in% toptendamage$EVTYPE,
    .(Damage=sum(PROPDMG, CROPDMG, na.rm=TRUE)/1e6),
    .(EVTYPE, Month)]
damagebymonth[, "EventType" := factor(EVTYPE, levels=rev(toptendamage$EVTYPE))]
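The monthly damage totals combine property and crop damage and ignore missing values. Here is a minimal base R sketch of the same idea on made-up numbers; the `toy` data frame and the `Total` column are hypothetical, and `aggregate()` stands in for the data.table grouping used above.

```r
# Hypothetical damage records in US dollars; NA marks a missing estimate.
toy <- data.frame(
    EVTYPE  = c("FLOOD", "FLOOD", "HAIL", "HAIL", "FLOOD"),
    Month   = c(1, 1, 4, 4, 6),
    PROPDMG = c(2e6, 1e6, 5e5, NA, 3e6),
    CROPDMG = c(NA, 5e5, 5e5, 1e6, 0)
)

# Add property and crop damage per row, treating missing values as zero.
toy$Total <- rowSums(toy[, c("PROPDMG", "CROPDMG")], na.rm = TRUE)

# Sum by event type and month, expressed in millions of US dollars.
bymonth <- aggregate(Total ~ EVTYPE + Month, data = toy,
                     FUN = function(x) sum(x) / 1e6)
print(bymonth)
```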

Results

The three plots below show the ten storm event types with the greatest impact on population health and on the economy.

# Use a color-blind-friendly palette for plotting 10 colors.
# See: http://www.cookbook-r.com/Graphs/Colors_%28ggplot2%29/
cbPalette <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", 
               "#0072B2", "#D55E00", "#CC79A7", "#996633", 
               "#999999", "#000000")

Public Health Effects

This plot shows the top ten weather event types for injuries.

library(ggplot2)
ggplot(injuriesbymonth, 
       aes(Month, Injuries, group=EventType, fill=EventType)) +
    geom_bar(stat="identity") +
    ggtitle("Public Health Impact of\nTop 10 Most Harmful Weather Event Types\n(Total US Injuries per Month, 1996-2011)") +
    ylab("Number of Injuries") +
    theme(plot.title = element_text(lineheight=.8, face="bold")) +
    scale_fill_manual(values=cbPalette) + 
    theme(legend.title=element_blank())

Here is a table showing the total number of injuries from 1996 to 2011 for the top ten extreme weather types.

topteninjuries[order(-Injuries)]
##                EVTYPE Injuries
##  1:           TORNADO    91346
##  2:         TSTM WIND     6941
##  3:             FLOOD     6786
##  4:    EXCESSIVE HEAT     6209
##  5:         LIGHTNING     5212
##  6:              HEAT     2100
##  7:         ICE STORM     1975
##  8:       FLASH FLOOD     1767
##  9: THUNDERSTORM WIND     1471
## 10:              HAIL     1361
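The table lists some closely related labels separately, for example TSTM WIND and THUNDERSTORM WIND, or HEAT and EXCESSIVE HEAT. The sketch below shows one way such labels could be combined, using the injury totals from the table. The grouping in `groups` is an assumption made for illustration and is not part of the analysis above.

```r
# Injury totals copied from the table above.
injuries <- c("TORNADO" = 91346, "TSTM WIND" = 6941, "FLOOD" = 6786,
              "EXCESSIVE HEAT" = 6209, "LIGHTNING" = 5212, "HEAT" = 2100,
              "ICE STORM" = 1975, "FLASH FLOOD" = 1767,
              "THUNDERSTORM WIND" = 1471, "HAIL" = 1361)

# Hypothetical mapping from a raw label to a broader group;
# labels not listed here keep their own name.
groups <- c("TSTM WIND" = "THUNDERSTORM WIND",
            "EXCESSIVE HEAT" = "HEAT",
            "FLASH FLOOD" = "FLOOD")

label <- names(injuries)
mapped <- ifelse(label %in% names(groups), groups[label], label)

# Sum the totals within each group and sort in descending order.
combined <- sort(sapply(split(injuries, mapped), sum), decreasing = TRUE)
print(combined)
```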

This plot shows the top ten weather event types for fatalities.

library(ggplot2)
ggplot(fatalitiesbymonth, 
       aes(Month, Fatalities, group=EventType, fill=EventType)) +
    geom_bar(stat="identity") +
    ggtitle("Public Health Impact of\nTop 10 Most Harmful Weather Event Types\n(Total US Fatalities per Month, 1996-2011)") +
    ylab("Number of Fatalities") +
    theme(plot.title = element_text(lineheight=.8, face="bold")) +
    scale_fill_manual(values=cbPalette) +
    theme(legend.title=element_blank())

Here is a table showing the total number of fatalities from 1996 to 2011 for the top ten extreme weather types.

toptenfatalities[order(-Fatalities)]
##             EVTYPE Fatalities
##  1:        TORNADO       5633
##  2: EXCESSIVE HEAT       1883
##  3:    FLASH FLOOD        939
##  4:           HEAT        935
##  5:      LIGHTNING        806
##  6:      TSTM WIND        502
##  7:          FLOOD        464
##  8:    RIP CURRENT        343
##  9:      HIGH WIND        248
## 10:      AVALANCHE        224

Summary of Health Effects

Across the United States, tornadoes are associated with the highest number of injuries among the weather events recorded from 1996 to 2011. Tornadoes are most prevalent in the first half of the year. Heat, particularly in midsummer, appears the most deadly. Floods, especially in October, as well as flash floods, lightning, and thunderstorm winds in the summer months, also have a high impact on population health. Other important events are rip currents, high winds, ice storms, and avalanches.

Economic Effects

This plot shows the top ten weather event types for economic impact.

library(ggplot2)
ggplot(damagebymonth, 
       aes(Month, Damage, group=EventType, fill=EventType)) +
    geom_bar(stat="identity") +
    ggtitle("Economic Impact of\nTop 10 Most Harmful Weather Event Types\n(Total US Damage per Month, 1996-2011)") +
    ylab("Damage in Millions of US$") +
    theme(plot.title = element_text(lineheight=.8, face="bold")) +
    scale_fill_manual(values=cbPalette) + 
    theme(legend.title=element_blank())

Here is a table showing the total damage in millions of US dollars from 1996 to 2011 for the top ten extreme weather types.

toptendamage[order(-Damage)]
##                EVTYPE     Damage
##  1:             FLOOD 150145.287
##  2: HURRICANE/TYPHOON  71636.601
##  3:           TORNADO  57351.641
##  4:       STORM SURGE  43323.466
##  5:              HAIL  18758.215
##  6:       FLASH FLOOD  17275.076
##  7:           DROUGHT  15013.467
##  8:         HURRICANE  12103.928
##  9:       RIVER FLOOD  10148.401
## 10:         ICE STORM   8967.021

Summary of Economic Effects

Across the United States, flooding, especially in the winter and spring, causes the most economic damage, followed by hurricanes, typhoons, storm surge, and tornadoes. Hail also causes significant damage, mostly in the spring and fall. Other important events are flash floods, river floods, drought, and ice storms.