Severe Weather Events 1996-2011: Population Health and Economic Impact

Synopsis

This analysis uses Storm Data collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) to determine which types of storms and weather events cause the most fatalities, injuries, property damage and crop damage. To make the work reproducible, we provide code and prose describing how the NOAA data were processed, cleaned and subsetted, followed by details of the analysis and visualization of the resulting data.

Data Processing

First we set our library requirements and load in the raw data.

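# plyr supplies arrange() and mutate(); ggplot2 is used for the plots.
library(plyr)
library(ggplot2)
# read.csv() decompresses the .bz2 archive on the fly.
noaa <- read.csv('repdata%2Fdata%2FStormData.csv.bz2')
# Lower-case the column names so they are easier to type.
names(noaa) <- tolower(names(noaa))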

Then we subset the data so that it only covers events from 1996 onward. Data collection from 1950 to 1954 recorded only tornados, and the period from 1955 to 1995 covers only tornados, thunderstorm wind and hail, so earlier years would bias any comparison across event types. Our study covers the period from 1996 to 2011.

noaa$bgn_date <- as.Date(noaa$bgn_date,"%m/%d/%Y")
noaa2 <- subset(noaa, bgn_date >= "1996-01-01")
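
As a small illustration of the date filter, the sketch below applies the same parsing and cutoff to a toy data frame. The `toy` frame and its values are invented for the example, and the `m/d/Y H:M:S` shape of the date strings is an assumption about how the raw `bgn_date` column looks.

```r
# Toy stand-in for the raw data; all values are invented for illustration.
toy <- data.frame(
    bgn_date = c("4/18/1950 0:00:00", "12/31/1995 0:00:00",
                 "1/1/1996 0:00:00", "11/28/2011 0:00:00"),
    evtype   = c("TORNADO", "HAIL", "WINTER STORM", "FLOOD"),
    stringsAsFactors = FALSE
)

# as.Date() reads the leading month/day/year and ignores the trailing time.
toy$bgn_date <- as.Date(toy$bgn_date, "%m/%d/%Y")

# Compare against an explicit Date so the cutoff does not rely on coercion.
cutoff <- as.Date("1996-01-01")
toy_recent <- subset(toy, bgn_date >= cutoff)

# Only the 1996 and 2011 rows remain.
toy_recent
```

Comparing against a character string, as in the code above, also works because R converts the string to a Date; spelling out `as.Date()` just makes the intent explicit.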

Next we subset so our data only includes columns for variables related to event types, fatalities, injuries, property damage, and crop damage.

# Base subsetting is used here because select() is not part of plyr.
noaa_sub <- noaa2[, c("evtype", "fatalities", "injuries", "propdmg", "propdmgexp", "cropdmg", "cropdmgexp")]

Unfortunately the event type data is extremely messy and would require considerable work to categorize. We can see evidence of this by briefly checking for the number of evtypes that begin with the letter “C”.

sample <- noaa_sub$evtype
sample <- grep("^[cC].*", sample, value = T)
table(sample)
## sample
##    COASTAL  FLOODING/EROSION              COASTAL EROSION 
##                            1                            1 
##                Coastal Flood                COASTAL FLOOD 
##                            6                          589 
##             coastal flooding             Coastal Flooding 
##                            2                           38 
##             COASTAL FLOODING     COASTAL FLOODING/EROSION 
##                          107                            5 
##                Coastal Storm                COASTAL STORM 
##                            2                            8 
##                 COASTALFLOOD                 COASTALSTORM 
##                            1                            1 
##                         Cold                         COLD 
##                           10                           34 
##               Cold and Frost               COLD AND FROST 
##                            6                            1 
##                COLD AND SNOW             Cold Temperature 
##                            1                            2 
##            COLD TEMPERATURES                 COLD WEATHER 
##                            4                            1 
## COLD WIND CHILL TEMPERATURES              COLD/WIND CHILL 
##                            6                          539 
##                   COOL SPELL        CSTL FLOODING/EROSION 
##                            1                            2

A simple approach is to standardize the case of evtype using ‘toupper’. This cuts the number of factor levels by more than half, from 985 to 438. Lazy, but effective.

ev_all <- noaa_sub$evtype
str(ev_all)
##  Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 972 834 856 856 856 244 359 856 856 856 ...
noaa_sub$evtype <- as.factor(toupper(noaa_sub$evtype))
str(noaa_sub$evtype)
##  Factor w/ 438 levels "   HIGH SURF ADVISORY",..: 432 358 365 365 365 113 142 365 365 365 ...
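
To show why case normalization collapses so many labels, here is a minimal sketch on a handful of invented labels modelled on the variants listed earlier. The `trimws` step is an extra suggestion that is not part of the analysis above; it is included only to show how padded labels such as "   HIGH SURF ADVISORY" could be merged as well.

```r
# Invented labels that mimic the case and spacing variants seen in evtype.
labels <- c("Coastal Flood", "COASTAL FLOOD", "coastal flooding",
            "Coastal Flooding", "COASTAL FLOODING",
            "   HIGH SURF ADVISORY", "HIGH SURF ADVISORY")

n_raw <- length(unique(labels))     # 7 distinct raw labels

# Upper-casing alone merges the case variants.
upper <- toupper(labels)
n_upper <- length(unique(upper))    # 4

# Trimming whitespace (an extra step, not used in the analysis above)
# also merges the padded label with its unpadded twin.
clean <- trimws(upper)
n_clean <- length(unique(clean))    # 3

c(raw = n_raw, upper = n_upper, trimmed = n_clean)
```

Near-duplicates such as COASTAL FLOOD and COASTAL FLOODING survive both steps, so a full cleanup would still need a manual mapping to the official event categories.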

Next we need to clean the property and crop damage fields and their corresponding exponent fields to produce a single clean cost field per event for each type of damage.

table(noaa_sub$propdmgexp)
## 
##             -      ?      +      0      1      2      3      4      5 
## 276185      0      0      0      1      0      0      0      0      0 
##      6      7      8      B      h      H      K      m      M 
##      0      0      0     32      0      0 369938      0   7374
table(noaa_sub$cropdmgexp)
## 
##             ?      0      2      B      k      K      m      M 
## 373069      0      0      0      4      0 278686      0   1771

We can see that there are only a few exponent codes we need to account for: B, K and M. We’ll write a function called ‘calcfunc’ that handles those codes plus the majority of instances where the field is blank; any other code yields a cost of 0. We then apply it to the property and crop fields to create two new fields, propcost and cropcost, holding the damage cost per event.

calcfunc <- function(damage, exp) {
    if (exp == ""){
        damage 
    } else if (exp == 'K'){
        damage * 1000
    } else if (exp == "M"){
        damage * 1e+06
    } else if (exp == "B"){
        damage * 1e+09
    } else 0
}

noaa_sub$propcost <- unlist(mapply(calcfunc, noaa_sub$propdmg, noaa_sub$propdmgexp, SIMPLIFY=FALSE))
noaa_sub$cropcost <- unlist(mapply(calcfunc, noaa_sub$cropdmg, noaa_sub$cropdmgexp, SIMPLIFY=FALSE))
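
The same conversion can also be written without `mapply`, which tends to be slow on several hundred thousand rows. The sketch below is one possible vectorized variant using a named lookup vector; `multiplier` and `scale_damage` are hypothetical names introduced here, and the rules are meant to mirror those of `calcfunc` (blank keeps the value, K/M/B scale it, anything else gives 0).

```r
# Multipliers for the exponent codes handled by calcfunc.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: a vectorized take on the calcfunc rules.
scale_damage <- function(damage, exp) {
    exp <- as.character(exp)            # accept factors as well as strings
    mult <- unname(multiplier[exp])     # NA for "" and for unknown codes
    mult[exp == ""] <- 1                # blank exponent: keep value as is
    mult[is.na(mult)] <- 0              # any other code: cost of 0
    damage * mult
}

# Invented example values covering every branch.
damage <- c(25, 2.5, 1.2, 10, 5)
exp    <- c("K", "M", "B", "", "?")
costs  <- scale_damage(damage, exp)
costs
```

Under these assumptions the calls would become `noaa_sub$propcost <- scale_damage(noaa_sub$propdmg, noaa_sub$propdmgexp)`, and likewise for the crop fields.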

Results

1: Across the United States, which types of events are most harmful with respect to population health?

First we look at the five event types with the most fatalities.

fatal <- arrange(aggregate(fatalities ~ evtype, noaa_sub, sum), fatalities, decreasing = T)
fatal <- head(fatal, 5)
names(fatal) <- c("evtype", "total")
fatal
##           evtype total
## 1 EXCESSIVE HEAT  1797
## 2        TORNADO  1511
## 3    FLASH FLOOD   887
## 4      LIGHTNING   651
## 5          FLOOD   414

Next we do the same for injuries.

injure <- arrange(aggregate(injuries ~ evtype, noaa_sub, sum), injuries, decreasing = T)
injure <- head(injure, 5)
names(injure) <- c("evtype", "total")
injure
##           evtype total
## 1        TORNADO 20667
## 2          FLOOD  6758
## 3 EXCESSIVE HEAT  6391
## 4      LIGHTNING  4141
## 5      TSTM WIND  3629
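
Because the fatalities and injuries tables are built the same way, the aggregate-sort-truncate pattern can be wrapped in a small function. The sketch below uses base R only; `top_events` is a hypothetical helper and the `toy` data are invented, so it illustrates the pattern rather than reproducing the code used above.

```r
# Hypothetical helper: sum `value` by `group` and keep the n largest totals.
top_events <- function(data, value, group = "evtype", n = 5) {
    totals <- aggregate(data[[value]], by = list(data[[group]]), FUN = sum)
    names(totals) <- c(group, "total")
    totals <- totals[order(-totals$total), ]   # largest totals first
    rownames(totals) <- NULL
    head(totals, n)
}

# Invented data for illustration.
toy <- data.frame(
    evtype     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HAIL"),
    fatalities = c(3, 1, 2, 0, 2, 0),
    injuries   = c(10, 2, 5, 1, 0, 0),
    stringsAsFactors = FALSE
)

top_fatal  <- top_events(toy, "fatalities", n = 2)
top_injure <- top_events(toy, "injuries", n = 2)
top_fatal
top_injure
```

With the real data, `top_events(noaa_sub, "fatalities")` would be expected to give the same ranking as the `arrange`/`aggregate` call, since both sum by event type and sort in decreasing order.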

Next we plot them together.

harmful <- rbind(mutate(fatal, factor="fatalities"), mutate(injure, factor="injuries"))
ggplot(harmful, aes(evtype, total, fill=factor)) +
    geom_bar(position='stack', stat='identity') +
    labs(title = "Public Health - Most Harmful Events since 1996", fill = "", x = "Event Type", y = "# of Fatalities / Injuries") + 
    theme(axis.text.x = element_text(angle = 45, hjust=1))

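[Figure: stacked bar chart "Public Health - Most Harmful Events since 1996"; x axis: Event Type, y axis: # of Fatalities / Injuries]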

From this we can see clearly that the events most detrimental to public health are tornados and excessive heat: tornados cause by far the most injuries, while excessive heat causes the most fatalities.

2: Across the United States, which types of events have the greatest economic consequences?

First we look at the five event types with the most property damage.

property <- aggregate(propcost ~ evtype, noaa_sub, sum)
property <- property[order(-property$propcost),]
property <- head(property, 5)
names(property) <- c("evtype", "total")
property
##                evtype     total
## 88              FLOOD 1.439e+11
## 150 HURRICANE/TYPHOON 6.931e+10
## 279       STORM SURGE 4.319e+10
## 358           TORNADO 2.462e+10
## 85        FLASH FLOOD 1.522e+10

Next we do the same for crops.

crop <- aggregate(cropcost ~ evtype, noaa_sub, sum)
crop <- crop[order(-crop$cropcost),]
crop <- head(crop, 5)
names(crop) <- c("evtype", "total")
crop
##                evtype     total
## 53            DROUGHT 1.337e+10
## 88              FLOOD 4.975e+09
## 148         HURRICANE 2.741e+09
## 150 HURRICANE/TYPHOON 2.608e+09
## 113              HAIL 2.476e+09

Next we plot them together.

ecoimpact <- rbind(mutate(property, factor="property damage"), mutate(crop, factor="crop damage"))
ggplot(ecoimpact, aes(evtype, total, fill=factor)) +
    geom_bar(position='stack', stat='identity') +
    labs(title = "Economic Impact - Most Harmful Events since 1996", fill = "", x = "Event Type", y = "$s of Damage") + 
    theme(axis.text.x = element_text(angle = 45, hjust=1))

[Figure: stacked bar chart "Economic Impact - Most Harmful Events since 1996"; x axis: Event Type, y axis: $s of Damage]

From this analysis we can conclude that floods and hurricanes/typhoons have caused the most property damage, while drought has caused the most crop damage.

Conclusion

From 1996 to 2011, the severe weather events most detrimental to public health were tornados and excessive heat. Floods and hurricanes caused the most property damage, and drought caused the most crop damage.