Storm and Weather Event Types Most Harmful to Population Health or With the Greatest Economic Consequences Between 1996 and 2011

Author: Ken Ho

Synopsis

This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. Based on the data from 1996 through 2011, the top 3 event types most harmful to population health are, in order, Tornado, Excessive Heat, and Flood. The top 3 event types with the greatest economic consequences are, in order, Flood, Hurricane (Typhoon), and Storm Surge/Tide. Flood is the only event type that appears in both top-3 lists.

Data Processing

Loading and Processing the Raw Data

The data file is the NOAA Storm Data set. Documentation of the database is also available:
. National Weather Service Storm Data Documentation
. National Climatic Data Center Storm Events FAQ

Reading the Data

First download the data file, then read the data directly from the bzip2-compressed CSV file (read.csv decompresses it transparently). Fields are delimited with the ‘,’ character and missing values are coded as blank fields. Some string fields are enclosed in double quotes ("). The data also contains a header line.

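library(data.table)  # data.table()
library(dplyr)       # filter(), group_by(), summarize(), arrange(), %>%
library(ggplot2)     # plots in the Results section

fileLocal <- "StormData.csv.bz2"
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Download the compressed file once; mode "wb" keeps the binary archive intact.
if(!file.exists(fileLocal)) {
        download.file(fileURL, destfile = fileLocal, mode = "wb")
}

# Read the data file into a data table
data <- data.table(read.csv(fileLocal, header = TRUE, sep = ",", quote = "\"", na.strings = ""))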
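To illustrate how these read.csv arguments treat blank and quoted fields, here is a minimal sketch on an in-memory CSV. The three-row sample and the names `sample_csv` and `sample_rows` are made up for illustration and are not taken from the storm data.

```r
# Hypothetical three-row sample mimicking the layout described above.
sample_csv <- 'EVTYPE,FATALITIES,PROPDMGEXP
"TORNADO",0,K
"HAIL",0,
"FLOOD, MINOR",1,M'

sample_rows <- read.csv(text = sample_csv, header = TRUE, sep = ",",
                        quote = "\"", na.strings = "",
                        stringsAsFactors = FALSE)

# The blank PROPDMGEXP field is read as NA, and the quoted comma
# stays inside a single EVTYPE value.
print(sample_rows)
print(is.na(sample_rows$PROPDMGEXP))
```

Reading blanks as NA matters later, because the damage multiplier columns are tested with pattern matching and a missing multiplier must not count as valid.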

After reading in the data we check the first few rows. The dataset has 902,297 rows and 37 columns.

dim(data)
## [1] 902297     37
head(data[, 2:8, with = FALSE], n = 6)
##              BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1:  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2:  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3:  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4:   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5: 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6: 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO

According to NOAA’s National Climatic Data Center, only a few storm and weather event types were recorded before 1996. From 1996 onward the records are more complete, covering the 48 event types defined in NWS Directive 10-1605. To keep the comparison fair, this analysis uses only data from 1996 and later:

data[["DATE_OBJ"]] <- as.POSIXct(strptime(data$BGN_DATE, "%m/%d/%Y"))
data1 <- subset(data, DATE_OBJ >= as.POSIXct("1996-01-01"))
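The date handling can be checked on a few values in isolation. In this sketch the first date string comes from the head() output above, the other two are made-up examples, and the names `bgn_date`, `date_obj`, and `keep` are illustrative.

```r
# BGN_DATE values look like "4/18/1950 0:00:00"; the format string
# "%m/%d/%Y" parses the date part and the trailing time is ignored.
bgn_date <- c("4/18/1950 0:00:00", "1/1/1996 0:00:00", "6/8/2011 0:00:00")
date_obj <- as.POSIXct(strptime(bgn_date, "%m/%d/%Y"))

# Keep only events that began on or after 1 January 1996.
keep <- date_obj >= as.POSIXct("1996-01-01")
print(data.frame(bgn_date, keep))
```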

To simplify and speed up the cleaning and processing in later steps, only observations that meet at least one of the following criteria are kept:
. Has fatalities, or
. Has injuries, or
. Has property damage with valid monetary multiplier, or
. Has crop damage with valid monetary multiplier.

data1 <- subset(data1, (FATALITIES > 0 | INJURIES > 0) |
                        (PROPDMG > 0 & grepl("b|B|h|H|k|K|m|M", PROPDMGEXP)) | 
                        (CROPDMG > 0 & grepl("b|B|h|H|k|K|m|M", CROPDMGEXP)))
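As a rough check of the filter logic, the same condition can be applied to a small hand-made table. The rows below are hypothetical; only the column names follow the storm data.

```r
# Hypothetical rows; column names follow the storm data.
toy <- data.frame(
  FATALITIES = c(0, 1, 0, 0),
  INJURIES   = c(0, 0, 0, 0),
  PROPDMG    = c(5, 0, 2, 0),
  PROPDMGEXP = c("K", NA, "0", NA),
  CROPDMG    = c(0, 0, 0, 3),
  CROPDMGEXP = c(NA, NA, NA, "m"),
  stringsAsFactors = FALSE
)

valid <- "b|B|h|H|k|K|m|M"
kept <- subset(toy, (FATALITIES > 0 | INJURIES > 0) |
                 (PROPDMG > 0 & grepl(valid, PROPDMGEXP)) |
                 (CROPDMG > 0 & grepl(valid, CROPDMGEXP)))

# Row 3 is dropped: it has damage but "0" is not an accepted multiplier.
print(kept)
```

Note that grepl() returns FALSE for NA input, so rows with a missing multiplier are excluded unless they also report casualties.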

Because the recorded event types (“EVTYPE”) are not as clean as expected, they need to be cleaned up and recoded. First, create a white list of event types based on the National Weather Service Storm Data Documentation. Then keep the event types that exactly match the white list (after trimming whitespace and converting to upper case) and mark the rest as “OTHER”.

weatherEventsStd <- c(
        "ASTRONOMICAL LOW TIDE", "AVALANCHE", "BLIZZARD",
        "COASTAL FLOOD", "COLD/WIND CHILL", "DEBRIS FLOW",
        "DENSE FOG", "DENSE SMOKE", "DROUGHT",
        "DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT",
        "EXTREME COLD/WIND CHILL", "FLASH FLOOD", "FLOOD",
        "FROST/FREEZE", "FUNNEL CLOUD", "FREEZING FOG",
        "HAIL", "HEAT", "HEAVY RAIN",
        "HEAVY SNOW", "HIGH SURF", "HIGH WIND",
        "HURRICANE (TYPHOON)", "ICE STORM", "LAKE-EFFECT SNOW", 
        "LAKESHORE FLOOD", "LIGHTNING", "MARINE HAIL",
        "MARINE HIGH WIND", "MARINE STRONG WIND", "MARINE THUNDERSTORM WIND", 
        "RIP CURRENT", "SEICHE", "SLEET", 
        "STORM SURGE/TIDE", "STRONG WIND", "THUNDERSTORM WIND", 
        "TORNADO", "TROPICAL DEPRESSION", "TROPICAL STORM", 
        "TSUNAMI", "VOLCANIC ASH", "WATERSPOUT", 
        "WILDFIRE", "WINTER STORM", "WINTER WEATHER"
)

data1$EVENT_SDT <- ifelse(trimws(toupper(data1$EVTYPE)) %in% weatherEventsStd, 
                          trimws(toupper(as.character(data1$EVTYPE))), "OTHER")
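The effect of exact matching is easiest to see on a few values. This sketch uses a shortened white list and made-up raw labels; `std`, `raw`, `normalized`, and `recoded` are illustrative names.

```r
# A shortened, illustrative white list (the full list has 48 entries).
std <- c("TORNADO", "HIGH WIND", "RIP CURRENT")
raw <- c("Tornado", " HIGH WIND", "TSTM WIND", "RIP CURRENTS")

# Normalize case and surrounding whitespace, then match exactly.
normalized <- trimws(toupper(raw))
recoded <- ifelse(normalized %in% std, normalized, "OTHER")
print(data.frame(raw, recoded))
```

Case and whitespace differences are absorbed, but near misses such as the plural "RIP CURRENTS" or the abbreviation "TSTM WIND" fall into "OTHER", which is why a second recoding pass is needed.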

Quite a few “non-standard” event types remain:

length(unique(filter(data1, EVENT_SDT == "OTHER")$EVTYPE))
## [1] 171
head(unique(filter(data1, EVENT_SDT == "OTHER")$EVTYPE), n = 10)
##  [1] TSTM WIND            FREEZING RAIN        EXTREME COLD        
##  [4] TSTM WIND/HAIL       RIP CURRENTS         Other               
##  [7] WILD/FOREST FIRE     STORM SURGE          Ice jam flood (minor
## [10] Tstm Wind           
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

Before trying to categorize everything in this list, let’s look at it from the perspective of casualties and economic consequences. Perhaps only a handful of event types account for the majority of casualties or economic damage. If so, only those event types need to be categorized, and the remaining uncategorized event types will have no significant impact on the analysis.
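One way to test the "handful of event types" idea is a cumulative share. The sketch below uses hypothetical counts and placeholder labels A to F, not figures from the storm data.

```r
# Hypothetical casualty counts per uncategorized event type.
casualties <- c(A = 500, B = 300, C = 120, D = 50, E = 20, F = 10)

# Sort from largest to smallest and accumulate the share of the total.
sorted <- sort(casualties, decreasing = TRUE)
cum_share <- cumsum(sorted) / sum(sorted)
print(round(cum_share, 2))

# Number of types needed to cover at least 90% of the total.
n_needed <- unname(which(cum_share >= 0.9)[1])
print(n_needed)
```

If a small number of types already covers most of the total, recoding just those types is enough for the ranking to be stable.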

First, extract the observations whose event types were categorized as “OTHER” in the earlier step, then convert EVENT_SDT to a factor for grouping:

dataOther <- filter(data1, EVENT_SDT %in% "OTHER")
dataOther$EVENT_SDT <- as.factor(dataOther$EVENT_SDT)

Top 30 uncategorized event types (OTHER) that are most harmful to population health:

# Summarizes the data "dataOther" to get top harmful events to population health.
dataPopHealth_other <- dataOther %>% group_by(EVTYPE) %>%
        summarize("POPULATION" = sum(FATALITIES + INJURIES)) %>%
        arrange(desc(POPULATION))
head(dataPopHealth_other, n = 30)
##                  EVTYPE POPULATION
## 1             TSTM WIND       3870
## 2     HURRICANE/TYPHOON       1339
## 3                   FOG        772
## 4      WILD/FOREST FIRE        557
## 5          RIP CURRENTS        496
## 6                 GLAZE        213
## 7          EXTREME COLD        192
## 8  URBAN/SML STREAM FLD        107
## 9             HURRICANE        107
## 10                 WIND        102
## 11       TSTM WIND/HAIL        100
## 12   WINTER WEATHER/MIX        100
## 13 HEAVY SURF/HIGH SURF         90
## 14            LANDSLIDE         89
## 15           WINTRY MIX         78
## 16            Heat Wave         70
## 17   WINTER WEATHER MIX         68
## 18           HEAVY SURF         45
## 19          STORM SURGE         39
## 20          SNOW SQUALL         37
## 21       DRY MICROBURST         28
## 22         MIXED PRECIP         28
## 23                 COLD         27
## 24         STRONG WINDS         27
## 25            ICY ROADS         26
## 26            BLACK ICE         25
## 27    EXTREME WINDCHILL         22
## 28    UNSEASONABLY WARM         17
## 29     MARINE TSTM WIND         17
## 30     FREEZING DRIZZLE         15

Top 20 uncategorized event types (OTHER) that have the greatest economic consequences:

# Summarizes the data "dataOther" to get the top events with the greatest economic consequences
dataEcoDmg_other <- dataOther %>% mutate(prop_dmg = ifelse(
                grepl("b|B", PROPDMGEXP), PROPDMG * 1e+09,
                ifelse(
                        grepl("m|M", PROPDMGEXP), PROPDMG * 1e+06,
                        ifelse(
                                grepl("k|K", PROPDMGEXP), PROPDMG * 1e+03,
                                ifelse(grepl("h|H", PROPDMGEXP), PROPDMG * 1e+02, 0)
                        )
                )
        )) %>%
        mutate(crop_dmg = ifelse(
                grepl("b|B", CROPDMGEXP), CROPDMG * 1e+09,
                ifelse(
                        grepl("m|M", CROPDMGEXP), CROPDMG * 1e+06,
                        ifelse(
                                grepl("k|K", CROPDMGEXP), CROPDMG * 1e+03,
                                ifelse(grepl("h|H", CROPDMGEXP), CROPDMG * 1e+02, 0)
                        )
                )
        )) %>%
        group_by(EVTYPE) %>%
        summarize("Total_DMG" = sum(prop_dmg + crop_dmg)) %>%
        arrange(desc(Total_DMG))
head(dataEcoDmg_other, n = 20)
##                      EVTYPE   Total_DMG
## 1         HURRICANE/TYPHOON 71913712800
## 2               STORM SURGE 43193541000
## 3                 HURRICANE 14554229010
## 4                 TSTM WIND  5031941790
## 5          WILD/FOREST FIRE  3108564830
## 6              EXTREME COLD  1308733400
## 7                   TYPHOON   601055000
## 8                 LANDSLIDE   344595000
## 9                    FREEZE   146425000
## 10           River Flooding   134175000
## 11           TSTM WIND/HAIL   109031750
## 12         COASTAL FLOODING    97484000
## 13     URBAN/SML STREAM FLD    66797750
## 14              Early Frost    42000000
## 15          Damaging Freeze    34130000
## 16      AGRICULTURAL FREEZE    28820000
## 17        UNSEASONABLY COLD    25042500
## 18              RIVER FLOOD    22157000
## 19               SMALL HAIL    20863000
## 20 COASTAL FLOODING/EROSION    20030000
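The nested ifelse() calls map a multiplier code to a power of ten. A more compact alternative is a named lookup vector, sketched below. The names `dmg_multiplier` and `to_dollars` are hypothetical, and the sketch assumes each code is a single character, whereas the grepl() calls above match the letter anywhere in the field.

```r
# Multipliers for the exponent codes used in this report; any other
# code (or NA) contributes 0, as in the nested ifelse() version.
dmg_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(amount, exp_code) {
  # Look up each code by name; unknown codes come back as NA.
  mult <- dmg_multiplier[toupper(as.character(exp_code))]
  mult[is.na(mult)] <- 0
  unname(amount * mult)
}

# Made-up amounts and codes for illustration.
dollars <- to_dollars(c(25, 2.5, 1, 10), c("K", "m", "B", "?"))
print(dollars)
```

The same helper could serve both PROPDMG and CROPDMG, which would avoid repeating the four-level ifelse() for each column.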

Let’s perform a brute force recoding of these event types. The following helper function is used:

recodeEventType
## function (data, pattern, eventType) 
## {
##     ifelse(data$EVENT_SDT == "OTHER", ifelse(grepl(pattern, 
##         trimws(data$EVTYPE), ignore.case = TRUE), eventType, 
##         data$EVENT_SDT), data$EVENT_SDT)
## }
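To see what the helper does, here is a self-contained variant applied to a toy table. `recode_sketch` is a hypothetical name; it follows the same rule (only rows still marked "OTHER" are recoded, matching is case-insensitive) but reads only from its `data` argument.

```r
recode_sketch <- function(data, pattern, eventType) {
  # A row is recoded only if it is still "OTHER" and its EVTYPE matches.
  hit <- data$EVENT_SDT == "OTHER" &
    grepl(pattern, trimws(data$EVTYPE), ignore.case = TRUE)
  ifelse(hit, eventType, data$EVENT_SDT)
}

# Made-up rows for illustration.
toy <- data.frame(
  EVTYPE    = c("TSTM WIND", "Tstm Wind", "TORNADO", "FOG"),
  EVENT_SDT = c("OTHER", "OTHER", "TORNADO", "OTHER"),
  stringsAsFactors = FALSE
)

toy$EVENT_SDT <- recode_sketch(toy, "^TSTM", "THUNDERSTORM WIND")
toy$EVENT_SDT <- recode_sketch(toy, "^FOG$", "DENSE FOG")
print(toy)
```

Because already categorized rows are left untouched, the calls can be chained in any order without overwriting earlier results.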

To recode the event types, the top 20 uncategorized event types by casualties and the top 10 by monetary loss were chosen. These two cutoffs were found by rerunning this markdown file several times until the remaining uncategorized event types were confirmed to have no significant impact on the analysis.

# Top 20 - Non-standard event types with highest casualties.
data1$EVENT_SDT <- recodeEventType(data1, "^TSTM", "THUNDERSTORM WIND")
data1$EVENT_SDT <- recodeEventType(data1, "^HURRICANE|^TYPHOON", "HURRICANE (TYPHOON)")
data1$EVENT_SDT <- recodeEventType(data1, "^FOG$", "DENSE FOG")
data1$EVENT_SDT <- recodeEventType(data1, "^WILD.*FIRE$", "WILDFIRE")
data1$EVENT_SDT <- recodeEventType(data1, "^RIP CURRENT", "RIP CURRENT")
data1$EVENT_SDT <- recodeEventType(data1, "^GLAZE$|^FREEZE$", "FROST/FREEZE")
data1$EVENT_SDT <- recodeEventType(data1, "^EXTREME COLD$", "EXTREME COLD/WIND CHILL")
data1$EVENT_SDT <- recodeEventType(data1, "^URBAN/SML STREAM FLD$|^RIVER FLOODING$", "FLOOD")
data1$EVENT_SDT <- recodeEventType(data1, "^WIND$", "HIGH WIND")
data1$EVENT_SDT <- recodeEventType(data1, "^WINTER WEATHER|^WINTRY MIX$", "WINTER WEATHER")
data1$EVENT_SDT <- recodeEventType(data1, "^HEAVY SURF|HIGH SURF$", "HIGH SURF")
data1$EVENT_SDT <- recodeEventType(data1, "^STORM SURGE$", "STORM SURGE/TIDE")
data1$EVENT_SDT <- recodeEventType(data1, "^LANDSLIDE$", "DEBRIS FLOW")
data1$EVENT_SDT <- recodeEventType(data1, "^HEAT WAVE$", "EXCESSIVE HEAT")
data1$EVENT_SDT <- recodeEventType(data1, "^SNOW SQUALL$", "BLIZZARD")

# Top 10 - Non-standard event types with greatest economic consequences.
# All event types were already covered by the above function calls. 
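The recoding patterns depend on how `^`, `$`, and `|` combine. Alternation has the lowest precedence, so "^HEAVY SURF|HIGH SURF$" means "starts with HEAVY SURF" or "ends with HIGH SURF". The sketch below checks this on a few labels; "ROUGH SEAS/HIGH SURF" is a made-up label used only for illustration.

```r
surf <- c("HEAVY SURF/HIGH SURF", "HEAVY SURF",
          "ROUGH SEAS/HIGH SURF", "HIGH SURF ADVISORY")
surf_hits <- grepl("^HEAVY SURF|HIGH SURF$", surf, ignore.case = TRUE)
print(data.frame(surf, surf_hits))

# A pattern anchored on both sides matches the whole label only.
wind <- c("WIND", "STRONG WINDS", "HIGH WIND")
wind_hits <- grepl("^WIND$", wind, ignore.case = TRUE)
print(data.frame(wind, wind_hits))
```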

Now convert EVENT_SDT to a factor for summary and plotting purposes:

data1$EVENT_SDT <- as.factor(data1$EVENT_SDT)

Summarize the data for the top 10 most harmful events:

dataPopHealth <- data1 %>% group_by(EVENT_SDT) %>%
        summarize("POPULATION" = sum(FATALITIES + INJURIES)) %>%
        arrange(desc(POPULATION))

dataPopHealth_10 <- head(dataPopHealth, 10)

List the top 10 most harmful events:

dataPopHealth_10
##              EVENT_SDT POPULATION
## 1              TORNADO      22178
## 2       EXCESSIVE HEAT       8258
## 3                FLOOD       7281
## 4    THUNDERSTORM WIND       5505
## 5            LIGHTNING       4792
## 6          FLASH FLOOD       2561
## 7             WILDFIRE       1543
## 8         WINTER STORM       1483
## 9                 HEAT       1459
## 10 HURRICANE (TYPHOON)       1453

List the uncategorized events (OTHER) to confirm that their casualty count is small relative to at least the top 3 categorized events.

filter(dataPopHealth, EVENT_SDT %in% "OTHER")
## Source: local data table [1 x 2]
## 
##   EVENT_SDT POPULATION
##      (fctr)      (dbl)
## 1     OTHER        423

It turns out that the casualties from the remaining uncategorized events (OTHER) are small compared with each of the top 6 categorized events.
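The comparison can be made concrete by expressing the OTHER total as a percentage of each of the top 6 totals. The figures below are copied from the two tables above; the variable names are illustrative.

```r
# Casualty totals copied from the top-10 table above.
top6 <- c(TORNADO = 22178, `EXCESSIVE HEAT` = 8258, FLOOD = 7281,
          `THUNDERSTORM WIND` = 5505, LIGHTNING = 4792, `FLASH FLOOD` = 2561)
other <- 423

# OTHER as a percentage of each top-6 event type.
other_pct <- round(100 * other / top6, 1)
print(other_pct)
```

The percentage is about 2% for Tornado and stays below 17% for every one of the six, so redistributing the OTHER casualties would be unlikely to change the top-3 ranking.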

Summarize the data for the top 10 events with the greatest economic consequences:

dataEcoDmg <- data1 %>% mutate(prop_dmg = ifelse(
                grepl("b|B", PROPDMGEXP), PROPDMG * 1e+09,
                ifelse(
                        grepl("m|M", PROPDMGEXP), PROPDMG * 1e+06,
                        ifelse(
                                grepl("k|K", PROPDMGEXP), PROPDMG * 1e+03,
                                ifelse(grepl("h|H", PROPDMGEXP), PROPDMG * 1e+02, 0)
                        )
                )
        )) %>%
        mutate(crop_dmg = ifelse(
                grepl("b|B", CROPDMGEXP), CROPDMG * 1e+09,
                ifelse(
                        grepl("m|M", CROPDMGEXP), CROPDMG * 1e+06,
                        ifelse(
                                grepl("k|K", CROPDMGEXP), CROPDMG * 1e+03,
                                ifelse(grepl("h|H", CROPDMGEXP), CROPDMG * 1e+02, 0)
                        )
                )
        )) %>%
        group_by(EVENT_SDT) %>%
        summarize("Total_DMG" = sum(prop_dmg + crop_dmg)) %>%
        arrange(desc(Total_DMG))

dataEcoDmg_10 <- head(dataEcoDmg, 10)

# Express the monetary damage in billions of dollars.
dataEcoDmg_10$Total_DMG_BIL <- round(dataEcoDmg_10$Total_DMG / 1e+09, digits = 2)

List the top 10 events with the greatest economic consequences:

dataEcoDmg_10
##              EVENT_SDT    Total_DMG Total_DMG_BIL
## 1                FLOOD 149120584700        149.12
## 2  HURRICANE (TYPHOON)  87068996810         87.07
## 3     STORM SURGE/TIDE  47835579000         47.84
## 4              TORNADO  24900370720         24.90
## 5                 HAIL  17071172870         17.07
## 6          FLASH FLOOD  16557155610         16.56
## 7              DROUGHT  14413667000         14.41
## 8    THUNDERSTORM WIND   8930498480          8.93
## 9       TROPICAL STORM   8320186550          8.32
## 10            WILDFIRE   8162704630          8.16

List the uncategorized events (OTHER) to confirm that their property and crop damage is small relative to at least the top 3 categorized events.

filter(dataEcoDmg, EVENT_SDT %in% "OTHER")
## Source: local data table [1 x 2]
## 
##   EVENT_SDT Total_DMG
##      (fctr)     (dbl)
## 1     OTHER 420551090

The damage from the remaining uncategorized events (OTHER) is small compared with each of the top 10 categorized events.
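As a quick arithmetic check, the OTHER damage can be compared with the largest and the smallest entries of the top-10 table. The figures are copied from the output above; the variable names are illustrative.

```r
# Damage totals in dollars, copied from the tables above.
other_dmg <- 420551090
flood_dmg <- 149120584700   # rank 1
wildfire_dmg <- 8162704630  # rank 10

# OTHER as a percentage of the first and the tenth ranked event type.
pct_of_first <- 100 * other_dmg / flood_dmg
pct_of_tenth <- 100 * other_dmg / wildfire_dmg
print(round(c(first = pct_of_first, tenth = pct_of_tenth), 2))
```

Even against the tenth-ranked type the uncategorized damage is only a few percent, and against Flood it is a fraction of one percent.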

Results

The code and plot below show the top 10 most harmful event types by number of casualties:

ggplot(dataPopHealth_10, aes(EVENT_SDT, POPULATION, fill=EVENT_SDT)) + 
        geom_bar(stat = "identity", width = 0.8) + coord_flip() + 
        labs(title = "Top 10 Most Harmful Event Types with Respect to Population Health (1996 - 2011)") +
        labs(x = "Event Type", y = "Number of Casualties") + theme_bw() +
        geom_text(aes(y=POPULATION, label=POPULATION), 
                  position= position_dodge(width=0.9), vjust=.5, hjust=-0.1, color="black") +
        # Hide the fill legend; the axis labels already name each event type.
        guides(fill = "none") +
        # The y values are counts, so a continuous scale is the appropriate choice.
        scale_y_continuous(expand = c(0, 4000), breaks = seq(0, 40000, 5000)) 

The top 3 most harmful storms and weather events with respect to population health are: Tornado, Excessive Heat, and Flood, respectively.
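Because EVENT_SDT is a factor, the bars are drawn in factor-level order, which is alphabetical by default rather than ranked by value. A possible refinement, sketched here in base R on the top 3 totals from the table above, is to reorder the levels by the plotted value before plotting. The data frame `top3` is an illustrative name.

```r
# Casualty totals for the top three event types, from the table above.
top3 <- data.frame(
  EVENT_SDT  = factor(c("TORNADO", "EXCESSIVE HEAT", "FLOOD")),
  POPULATION = c(22178, 8258, 7281)
)
print(levels(top3$EVENT_SDT))  # alphabetical by default

# reorder() sorts the levels by the given values (ascending), so after
# coord_flip() the largest bar would be drawn at the top.
top3$EVENT_SDT <- reorder(top3$EVENT_SDT, top3$POPULATION)
print(levels(top3$EVENT_SDT))
```

The same call could be placed inside aes(), for example aes(reorder(EVENT_SDT, POPULATION), POPULATION), without changing the underlying data.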

The code and plot below show the top 10 event types with the greatest economic consequences, by amount of damage in billions of dollars:

ggplot(dataEcoDmg_10, aes(EVENT_SDT, Total_DMG_BIL, fill=EVENT_SDT)) + 
        geom_bar(stat = "identity", width = 0.8) + coord_flip() + 
        labs(title = "Top 10 Event Types with the Greatest Economic Consequences (1996 - 2011)") +
        labs(x = "Event Type", y = "Amount of Damage (in Billion Dollars)") + theme_bw() +
        geom_text(aes(y=Total_DMG_BIL, label=Total_DMG_BIL), 
                  position= position_dodge(width=0.9), vjust=.5, hjust=-0.1, color="black") +
        # Hide the fill legend; the axis labels already name each event type.
        guides(fill = "none") +
        # The y values are dollar amounts, so a continuous scale is the appropriate choice.
        scale_y_continuous(expand = c(0, 18), breaks = seq(0, 200, 40)) 

The top 3 storm and weather event types with the greatest economic consequences are, in order, Flood, Hurricane (Typhoon), and Storm Surge/Tide.