Synopsis

We analyse the U.S. NOAA Storm Database, looking specifically at the harm caused to both populations and property. We consider the entire data set (1950-2011). To assist with setting resource priorities, we show the top 10 harm-causing weather events in each case.

Data Processing

Required libraries:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(reshape2)

We retrieve the US NOAA Storm database, which is a BZipped CSV file. We also download the supplemental PDF documentation. This analysis assumes R has the working directory set to analysis directory already (i.e. ‘.’ is the analysis directory).

get_file <- function(url,datafile) {
    if (!file.exists(datafile)) {
        download.file(url,datafile)
    }
}
get_file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2','./noaa-storm-db.csv.bz2')
get_file('https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf','./noaa-storm-data-docs.pdf')
get_file('https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf','./noaa-storm-data-faq.pdf')
storm_db <- as_tibble(read.csv('./noaa-storm-db.csv.bz2'))

A quick overview of the dataset.

names(storm_db)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

head(storm_db)

## # A tibble: 6 x 37
##   STATE… BGN_… BGN_… TIME… COUN… COUN… STATE EVTY… BGN_… BGN_… BGN_… END_…
##    <dbl> <fct> <fct> <fct> <dbl> <fct> <fct> <fct> <dbl> <fct> <fct> <fct>
## 1   1.00 4/18… 0130  CST   97.0  MOBI… AL    TORN…     0 ""    ""    ""   
## 2   1.00 4/18… 0145  CST    3.00 BALD… AL    TORN…     0 ""    ""    ""   
## 3   1.00 2/20… 1600  CST   57.0  FAYE… AL    TORN…     0 ""    ""    ""   
## 4   1.00 6/8/… 0900  CST   89.0  MADI… AL    TORN…     0 ""    ""    ""   
## 5   1.00 11/1… 1500  CST   43.0  CULL… AL    TORN…     0 ""    ""    ""   
## 6   1.00 11/1… 2000  CST   77.0  LAUD… AL    TORN…     0 ""    ""    ""   
## # ... with 25 more variables: END_TIME <fctr>, COUNTY_END <dbl>,
## #   COUNTYENDN <lgl>, END_RANGE <dbl>, END_AZI <fctr>, END_LOCATI <fctr>,
## #   LENGTH <dbl>, WIDTH <dbl>, F <int>, MAG <dbl>, FATALITIES <dbl>,
## #   INJURIES <dbl>, PROPDMG <dbl>, PROPDMGEXP <fctr>, CROPDMG <dbl>,
## #   CROPDMGEXP <fctr>, WFO <fctr>, STATEOFFIC <fctr>, ZONENAMES <fctr>,
## #   LATITUDE <dbl>, LONGITUDE <dbl>, LATITUDE_E <dbl>, LONGITUDE_ <dbl>,
## #   REMARKS <fctr>, REFNUM <dbl>

We can see that the EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP columns will be the focus of this analysis.

Population Harm

For the first question, the impact on public health, we examine the FATALITIES and INJURIES columns, grouped by EVTYPE. This will tell us the most impactful weather event types, and for interest we keep both the fatalities and the injuries, as well as the sum.

health <- storm_db %>% 
            select(EVTYPE, FATALITIES, INJURIES) %>% 
            group_by(EVTYPE) %>% 
            mutate(IMPACT = FATALITIES + INJURIES) %>% 
            summarize_all(sum) %>% 
            arrange(desc(IMPACT))

Economic Harm

For the second question, the economic impact, we first need to construct a cost estimate. We note there are four relevant fields - PROPDMG, PROPDMGEXP, CROPDMG, & CROPDMGEXP. The documents do not show the usage of these columns, but we assume these form pairs of the form $PROPDMG x10 ^ $PROPDMGEXP.

Looking at PROPDMGEXP, we note several types of usage.

summary(storm_db$PROPDMGEXP)

##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330

There are numbers, letters, and other characters here. With no clear index of what these mean, we make the following assumptions:

Numerical values are base-10 exponents
Letters represent the usual multipliers (k = thousands, m = millions, b = billions)
No difference between upper and lower case (k == K, …)
‘h’, ‘H’, ‘+’, ‘?’ and ‘-’ have no obvious interpretation
- These are only a few records, so they are discounted

To do this, we use mutate with a case statement to convert the character fields to the equivalent numerical exponent, and then multiply the raw values by the exponent to get a final value of the damage.

raw_costs <- storm_db %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
                mutate(PROPDMGEXP = case_when(PROPDMGEXP == 'K' ~ '3', #thousands
                                              PROPDMGEXP == 'm' ~ '6', # millions
                                              PROPDMGEXP == 'M' ~ '6',
                                              PROPDMGEXP == 'B' ~ '9', #billions
                                              PROPDMGEXP == '' ~ '0', #empty value
                                              TRUE ~ as.character(PROPDMGEXP))) %>%
                mutate(CROPDMGEXP = case_when(CROPDMGEXP == 'k' ~ '3', #thousands
                                              CROPDMGEXP == 'K' ~ '3',
                                              CROPDMGEXP == 'm' ~ '6', # millions
                                              CROPDMGEXP == 'M' ~ '6',
                                              CROPDMGEXP == 'B' ~ '9', #billions
                                              CROPDMGEXP == '' ~ '0', #empty value
                                              TRUE ~ as.character(CROPDMGEXP))) %>%
                filter(PROPDMGEXP %in% as.character(seq(0,9))) %>%
                filter(CROPDMGEXP %in% as.character(seq(0,9)))

costs <- raw_costs %>% 
            mutate(PROPCOST = PROPDMG * 10 ^ as.numeric(PROPDMGEXP)) %>%
            mutate(CROPCOST = CROPDMG * 10 ^ as.numeric(PROPDMGEXP)) %>%
            mutate(COST = PROPCOST + CROPCOST) %>%
            select(EVTYPE, PROPCOST, CROPCOST, COST) %>%
            group_by(EVTYPE) %>%
            summarize_all(sum) %>% 
            arrange(desc(COST))

Results

Harm to population health

“Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?”

To answer this, we’ll take the top 10 event types by total of fatalities and injuries, and then plot the resulting data.

health_10 <- head(health, n = 10) %>%
    select(EVTYPE,FATALITIES,INJURIES) %>%
    melt(id.vars=c("EVTYPE"))

ggplot(health_10, aes(x = reorder(EVTYPE, -value), y = value, fill = variable)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_fill_discrete(name='') +
    labs(title = 'Top ten weather events by harm to U.S. population',
         x = 'Event Type', y = 'Total Injuries & Fatalities')

We can clearly see that tornadoes represent the vast majority of both the injuries and fatalities to the U.S. population. Excluding that, we see largly similar values for ‘Excessive Heat’, ‘Thunderstorm Wind’, ‘Flood’, and ‘Lightning’.

Harm to the U.S. Economy

“Across the United States, which types of events have the greatest economic consequences?”

Again, we’ll take the top 10 event types by total of property damages and crop damages, and then plot the resulting data. We’ll divide the damage totals by 1 billion so that the y-axis is readable.

costs_10 <- head(costs, n = 10) %>%
    select(EVTYPE,PROPCOST,CROPCOST) %>%
    mutate(PROPCOST = PROPCOST/1000000000, CROPCOST = CROPCOST/1000000000) %>%
    melt(id.vars=c("EVTYPE"))

ggplot(costs_10, aes(x = reorder(EVTYPE, -value), y = value, fill = variable)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_fill_discrete(name = '', labels = c('Property', 'Crops')) +
    labs(title = 'Top ten weather events by damage to U.S. Economy',
         x = 'Event Type', y = 'Damages (billions of dollars)')

The largest damage by far comes from the ‘Hurricane’ and ‘Hurricane/Typhoon’ category, and we also see that this is driven strongly by damage to crops. Looking more closely at property damage, we see that tornadoes and flood-related events (flash floods, storm surges, etc) are also noteworthy.

Impact of Storm Damage Across the United States

Greg Sutcliffe

15 April 2018