Synopsis

This study seeks to identify which storm events recorded in the United States from 1990 to 2011 were associated with the greatest population and economic tolls. The data are taken from the National Oceanic and Atmospheric Administration’s storm database.

The data indicate that tornadoes have caused the greatest recorded toll on population health, while thunderstorms have been the most damaging in economic terms.

Documentation of the data can be found at the following URLs:

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

Data Processing

Loading Packages

The analysis uses:

  • dplyr for cleaning the data
  • data.table for fast operations
  • car for recoding the property and crop damage multipliers
  • ggplot2 for plotting
  • scales for formatting axis labels in standard rather than scientific notation

library(dplyr)
library(data.table)
library(car)
library(ggplot2)
library(scales)

Loading Data

To speed up loading, we read only the columns needed to determine the effects of storms on population and economic health. Because most data recorded before 1990 covers only tornadoes, we also read the event beginning date so that we can restrict the analysis to events from 1990 onward. Finally, we convert the data frame to a data table for better performance later in the analysis.

stormData <- read.table("repdata-data-StormData.csv.bz2",
                        sep = ",",
                        header = TRUE,
                        na.strings = "",
                        nrows = 902298,
                        colClasses = c(
                            "NULL",           # skip column 1
                            "character",      # BGN_DATE
                            rep("NULL", 5),   # skip columns 3-7
                            "factor",         # EVTYPE
                            rep("NULL", 14),  # skip columns 9-22
                            "numeric",        # FATALITIES
                            "numeric",        # INJURIES
                            "numeric",        # PROPDMG
                            "factor",         # PROPDMGEXP
                            "numeric",        # CROPDMG
                            "factor",         # CROPDMGEXP
                            rep("NULL", 9)    # skip columns 29-37
                        ))

stormData <- data.table(stormData)
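
As a small illustration of the column-skipping technique used above, the sketch below reads a made-up three-column CSV from an inline string and drops the middle column by giving it the class "NULL". The toy data and the names toyCsv and toy are assumptions for illustration only and are not part of the analysis.

```r
# Toy CSV held in a string (hypothetical data, not from the storm database)
toyCsv <- "ID,NOTE,DAMAGE\n1,ignored,2.5\n2,ignored,10"

# "NULL" in colClasses tells read.table to skip that column entirely
toy <- read.table(text = toyCsv,
                  sep = ",",
                  header = TRUE,
                  colClasses = c("integer", "NULL", "numeric"))

names(toy)   # only ID and DAMAGE remain
```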

Cleaning Data

First, we remove all events that took place before 1990. We also remove summary events, as these are not weather events but rather summaries of particular dates.

# Upper-case EVTYPE before filtering so that mixed-case "Summary ..." entries are removed too
stormData <- stormData %>%
    mutate(DATE = as.Date(BGN_DATE, format = '%m/%d/%Y %H:%M:%S'),
           EVTYPE = toupper(EVTYPE)) %>%
    filter(year(DATE) >= 1990 & !grepl("SUMMARY", EVTYPE))
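
The date parsing and filtering can be tried on a few made-up records. This is a minimal base-R sketch, assuming dates in the same text format as BGN_DATE; the data frame toy and its contents are hypothetical. It also shows that a mixed-case label such as "Summary of Jan 6" is only caught by the pattern "SUMMARY" once the text has been upper-cased.

```r
# Hypothetical event records in the same date format as BGN_DATE
toy <- data.frame(
    BGN_DATE = c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "5/2/1995 0:00:00"),
    EVTYPE   = c("TORNADO", "Summary of Jan 6", "hail"),
    stringsAsFactors = FALSE
)

# Parse the full timestamp; as.Date() keeps only the calendar date
toy$DATE   <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")
toy$EVTYPE <- toupper(toy$EVTYPE)

# Keep events from 1990 onward that are not daily summaries
keep <- as.integer(format(toy$DATE, "%Y")) >= 1990 & !grepl("SUMMARY", toy$EVTYPE)
toy[keep, ]
```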

We recode the property and crop damage multipliers to the value they represent, and then obtain the true property and crop damage values.

recodeString <- "'K'=1000; 'M'=1000000; 'B'=1000000000;
                 c('+', '0', '5', '6', '4', 'H', '2', '7', '3', '-', '?')=0; else=1"

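# recode() returns a factor for factor input, so convert via character to get
# the multiplier values rather than the factor's internal level codes
stormData$PROPDMGEXP <- as.numeric(as.character(recode(stormData$PROPDMGEXP, recodeString)))
stormData$CROPDMGEXP <- as.numeric(as.character(recode(stormData$CROPDMGEXP, recodeString)))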

stormData$PROPDMG <- stormData$PROPDMG * stormData$PROPDMGEXP
stormData$CROPDMG <- stormData$CROPDMG * stormData$CROPDMGEXP
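
To make the multiplier step concrete, here is a simplified base-R sketch that uses a named lookup vector instead of car::recode. It is not the method used in this analysis: it handles only the codes K, M and B plus one invalid code, and the vectors dmg, dmgExp and multipliers are hypothetical.

```r
# Hypothetical damage amounts with their exponent codes
dmg    <- c(25, 2.5, 1, 10)
dmgExp <- factor(c("K", "M", "B", "?"))

# Lookup of the three recognised multipliers
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Index by the character code, not the factor itself: as.numeric() on a
# factor would return internal level codes instead of the multipliers
mult <- unname(multipliers[as.character(dmgExp)])
mult[is.na(mult)] <- 0   # "?" is in the zero-multiplier group of the recode string

dmg * mult
```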

Aggregating Data

Many events share several storm types, such as “Thunderstorm/Tornado”. So, instead of trying to split all observations into different, discrete categories of storm type, we allow events to match multiple storm types. This is simply to reflect the way that a single observation can detail several different storm events, which are often highly coupled and difficult to distinguish.

First, we create a vector of all storm types. Extremely similar events are consolidated into a single type (such as those related to snow, flood and dust, as well as marine events and their terrestrial counterparts). We also include common misspellings, as these occur frequently in the data set.

This aggregation leaves out landslide and ice-related events, which have large effects on population and economic health in the data set, because NOAA does not classify them as storm events.

stormTypes <- c(
    "Astronomical Low Tide",
    "Avalanche|AVALANCE",
    "Blizzard",
    "Cold/Wind Chill|Cold|Wind Chill|Windchill",
    "Debris Flow",
    "Dense Fog",
    "Dense Smoke",
    "Drought",
    "Dust Storm|Dust Devil",
    "Flood|Stream FLD|Flash Flood|Coastal Flood",
    "Frost/Freeze|Frost|Freeze",
    "Funnel Cloud",
    "Freezing Fog",
    "Hail", 
    "Heat|Excessive Heat",
    "High Surf",
    "Hurricane|Hurricane (Typhoon)|Typhoon",
    "Ice Storm",
    "Lakeshore Flood",
    "Lightning|LIGNTNING",
    "Rain",
    "Rip Current",
    "Seiche",
    "Sleet",
    "Snow",
    "Storm Surge/Tide|Storm Surge",
    "Strong Wind|High Wind|^winds$|^wind$|^gusty wind",
    "Thunderstorm|TSTM|THUNDERESTORM|THUNDERSTROM|THUNDERTSORM|THUDERSTORM|THUNDEERSTORM|THUNDERTORM|THUNDESTORM|THUNERSTORM",
    "Tornado|Torndao",
    "Tropical Depression",
    "Tropical Storm",
    "Tsunami",
    "Volcanic Ash",
    "Waterspout",
    "Wildfire|Wild Fires|Wild/Forest Fire|Wild/Forest Fires|Forest Fires",
    "Winter Storm",
    "Winter Weather")

stormTypes <- toupper(stormTypes)
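
Each entry in stormTypes is a regular expression whose alternatives are separated by "|", so a single event label can match more than one storm type. The sketch below shows this on a few made-up labels; the vectors patterns and events are hypothetical.

```r
# Two patterns in the same form as stormTypes (already upper-cased)
patterns <- c("THUNDERSTORM|TSTM", "TORNADO|TORNDAO")

# Hypothetical event labels
events <- c("THUNDERSTORM/TORNADO", "TSTM WIND", "TORNADO F2", "HAIL")

# One column per pattern: TRUE where the label matches any alternative
matches <- sapply(patterns, function(p) grepl(p, events))
rownames(matches) <- events
matches
```

The first label matches both patterns, which is the overlap the aggregation deliberately allows.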

Next, for each storm type, we isolate the observations that match it. We then sum the fatalities, injuries, property damage and crop damage, and add the result to a new data table that records the total damages (both human and financial) for each storm type.

stormDataTotals <- data.table()

for (type in stormTypes) {
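    # Label each group by the first alternative in its pattern
    mainType <- base::strsplit(type, "\\|")[[1]][1]

    # Observations whose event type matches any alternative in the pattern
    stormTypeData <- stormData[grepl(type, stormData$EVTYPE), ]

    fatalities <- sum(stormTypeData$FATALITIES)
    injuries <- sum(stormTypeData$INJURIES)
    propdmg <- sum(stormTypeData$PROPDMG)
    cropdmg <- sum(stormTypeData$CROPDMG)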

    newRow <- list(
        stormType = mainType,
        fatalities = fatalities,
        injuries = injuries,
        propdmg = propdmg,
        cropdmg = cropdmg
    )
    stormDataTotals <- rbind(stormDataTotals, newRow)
}
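
The same per-type totals can be built without growing a table inside a loop. The following is a base-R sketch on made-up data, not the code used for the results; toy, patterns and totals are hypothetical names. Note that the combined event is counted under both storm types, as in the loop above.

```r
# Hypothetical observations
toy <- data.frame(
    EVTYPE     = c("THUNDERSTORM/TORNADO", "TSTM WIND", "TORNADO", "HAIL"),
    FATALITIES = c(2, 0, 5, 0),
    INJURIES   = c(10, 1, 30, 0),
    stringsAsFactors = FALSE
)
patterns <- c("THUNDERSTORM|TSTM", "TORNADO")

# Build one single-row data frame per pattern, then bind them together
totals <- do.call(rbind, lapply(patterns, function(p) {
    hit <- grepl(p, toy$EVTYPE)
    data.frame(
        stormType  = strsplit(p, "\\|")[[1]][1],
        fatalities = sum(toy$FATALITIES[hit]),
        injuries   = sum(toy$INJURIES[hit]),
        stringsAsFactors = FALSE
    )
}))
totals
```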

Results

First we look at the storm events with the greatest injuries and fatalities. We take our table of total storm damages, order it by fatalities and injuries together, and take the top ten items. After further cleaning the data for graphing purposes, plotting reveals that tornadoes cause the highest population damage, both in terms of fatalities and injuries. They are followed by heat, floods, thunderstorms, and lightning.

topPopDmg <- stormDataTotals[order(-(fatalities + injuries)), list(stormType, fatalities, injuries)][1:10]
topPopDmg <- melt(topPopDmg, id.vars = "stormType")
topPopDmg$stormType <- factor(topPopDmg$stormType, levels = unique(topPopDmg$stormType))

p1 = ggplot(topPopDmg, aes(x=stormType, y=value, fill=variable)) +
    geom_bar(position = "stack", stat = "identity") +
    theme(axis.text.x = element_text(angle = -45, hjust = -0.1)) +
    scale_fill_discrete(name = NULL) + 
    ggtitle("Storm types with greatest impact on population health") +
    labs(x = "storm type", y = "people")
p1
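
The reshaping done before plotting can be seen on a two-row example. This sketch assumes the data.table package is installed; the table wide and its values are hypothetical. melt() turns the wide totals into one row per storm type and measure, and setting the factor levels explicitly keeps the bars in ranked order rather than alphabetical order.

```r
library(data.table)

# Hypothetical totals, already ordered by fatalities + injuries
wide <- data.table(stormType  = c("TORNADO", "HEAT"),
                   fatalities = c(7, 3),
                   injuries   = c(40, 9))

# One row per (stormType, variable) pair; the numbers go to the "value" column
long <- melt(wide, id.vars = "stormType")

# Fix the level order so the plot keeps the ranking
long$stormType <- factor(long$stormType, levels = unique(long$stormType))

long
levels(long$stormType)
```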

Repeating the process for the economic toll - that is, property damage and crop damage together - reveals that thunderstorms are the most impactful in this category, followed by floods, tornadoes, hail, and lightning.

topPropDmg <- stormDataTotals[order(-(propdmg + cropdmg)), list(stormType, propdmg, cropdmg)][1:10]
topPropDmg <- melt(topPropDmg, id.vars = "stormType")
topPropDmg$stormType <- factor(topPropDmg$stormType, levels = unique(topPropDmg$stormType))

p2 = ggplot(topPropDmg, aes(x=stormType, y=value, fill=variable)) +
    geom_bar(position = "stack", stat = "identity") +
    theme(axis.text.x = element_text(angle = -45, hjust = -0.1)) +
    scale_y_continuous(labels = comma) +
    scale_fill_discrete(name = NULL) +
    ggtitle("Storm types with greatest impact on economic health") +
    labs(x = "storm type", y = "dollars")
p2