Severe weather analysis: most influential events of 2000-2011

Synopsis

Across the United States, which types of severe weather events have the most economic and public health impact? To answer this question, we explore the U.S. National Oceanic and Atmospheric Adminstration’s (NOAA) storm database. We focus on the years 2000-2011, and find top ten events across each of three measures: total monetary damage, injuries, and fatalities.

The results indicate that tornadoes, excessive heat, floods, and hurricanes are the most impactful events.

Data Processing

Since the database is compiled from user reports going as far back as the 1950’s, there are several important considerations to take into account. The event type (EVTYPE) column is not standardized, and contains many more unique values than the 47 described in the documentation. Additionally, going back in time, there are less events reported, thus giving the appearance of less damage. Finally, damage amounts several decades ago would be increased if inflation were accounted for. For these reasons, and to avoid spending the bulk of project time on data cleanup, we consider only events in the year 2000 and onwards.

The data was downloaded from here. Processing starts from the downloaded .bz2 archive.

Read in data and check the dimensions:

data <- read.csv(bzfile('data/repdata-data-StormData.csv.bz2'))
dim(data)
## [1] 902297     37

Remove columns not used in analysis:

keep <- which(names(data) %in% c("EVTYPE","BGN_DATE","FATALITIES","INJURIES",
                                 "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP"))
data <- data[, keep]

Convert BGN_DATE (event beginning date) to Date format, and discard data prior to 2000:

data$BGN_DATE <- as.Date(data$BGN_DATE,format="%m/%d/%Y 0:00:00")
data <- data[data$BGN_DATE >= '2000-01-01',]

Further reduce the dataset by removing any rows where damage, fatalities and injuries are all zero:

data <- data[data$FATALITIES != 0 | data$INJURIES != 0 | 
                 data$PROPDMG != 0 | data$CROPDMG != 0, ]

Convert the PROPDMGEXP and CROPDMGEXP columns to integer multipliers, and combine with PROPDMG and CROPDMG to create columns with dollar damage amounts. Then combine these two columns to get TOTAL_DMG, the total dollar damage.

data$PROPDMG_NEW <- NA
data$PROPDMG_NEW <- sapply(1:nrow(data), function(x) {
    if (data$PROPDMGEXP[x]=="K") {data$PROPDMG[x] * 1000}
    else if (data$PROPDMGEXP[x]=="M") {data$PROPDMG[x] * 1000000}
    else if (data$PROPDMGEXP[x]=="B") {data$PROPDMG[x] * 1000000000}
    else {data$PROPDMG[x]}
})

data$CROPDMG_NEW <- NA
data$CROPDMG_NEW <- sapply(1:nrow(data), function(x) {
    if (data$CROPDMGEXP[x]=="K") {data$CROPDMG[x] * 1000}
    else if (data$CROPDMGEXP[x]=="M") {data$CROPDMG[x] * 1000000}
    else if (data$CROPDMGEXP[x]=="B") {data$CROPDMG[x] * 1000000000}
    else {data$CROPDMG[x]}
})

data$TOTAL_DMG <- data$PROPDMG_NEW + data$CROPDMG_NEW

Some of the values in EVTYPE have more than one spelling. For example, TSTM WIND and THUNDERSTORM WIND refer to the same event type. Examine the top 50 current `EVTYPE values:

head(sort(table(data$EVTYPE),decreasing=T),50)
## 
##       THUNDERSTORM WIND               TSTM WIND                    HAIL 
##                   43097                   37975                   17732 
##             FLASH FLOOD                 TORNADO               LIGHTNING 
##                   14950                    9495                    8377 
##                   FLOOD               HIGH WIND             STRONG WIND 
##                    7857                    4229                    3344 
##            WINTER STORM                WILDFIRE              HEAVY RAIN 
##                    1098                     847                     846 
##              HEAVY SNOW               ICE STORM          EXCESSIVE HEAT 
##                     682                     468                     449 
##          WINTER WEATHER          TROPICAL STORM             RIP CURRENT 
##                     402                     381                     363 
##        WILD/FOREST FIRE               AVALANCHE        LAKE-EFFECT SNOW 
##                     227                     224                     194 
##               LANDSLIDE                 DROUGHT    URBAN/SML STREAM FLD 
##                     190                     187                     181 
##                    HEAT          TSTM WIND/HAIL                BLIZZARD 
##                     159                     156                     154 
##            RIP CURRENTS           COASTAL FLOOD      WINTER WEATHER/MIX 
##                     146                     144                     139 
##            FROST/FREEZE EXTREME COLD/WIND CHILL               HIGH SURF 
##                     116                     111                     110 
##        MARINE TSTM WIND         COLD/WIND CHILL             STORM SURGE 
##                     109                      90                      89 
##              DUST STORM       HURRICANE/TYPHOON              DUST DEVIL 
##                      77                      72                      70 
##              LIGHT SNOW               DENSE FOG                     FOG 
##                      65                      58                      53 
##    HEAVY SURF/HIGH SURF        STORM SURGE/TIDE      MARINE STRONG WIND 
##                      50                      47                      46 
##               HURRICANE            EXTREME COLD          DRY MICROBURST 
##                      43                      41                      38 
##                    WIND     TROPICAL DEPRESSION 
##                      36                      35

Fix some frequently-occurring strings:

levels(data$EVTYPE)[levels(data$EVTYPE)=='TSTM WIND'] <- 'THUNDERSTORM WIND'
levels(data$EVTYPE)[levels(data$EVTYPE)=='TSTM WIND/HAIL'] <- 'THUNDERSTORM WIND'
levels(data$EVTYPE)[levels(data$EVTYPE)=='THUNDERSTORM WIND/HAIL'] <- 'THUNDERSTORM WIND'
levels(data$EVTYPE)[levels(data$EVTYPE)=='WILD/FOREST FIRE'] <- 'WILDFIRE'
levels(data$EVTYPE)[levels(data$EVTYPE)=='URBAN/SML STREAM FLD'] <- 'FLOOD'
levels(data$EVTYPE)[levels(data$EVTYPE)=='RIP CURRENTS'] <- 'RIP CURRENT'
levels(data$EVTYPE)[levels(data$EVTYPE)=='WINTER WEATHER/MIX'] <- 'WINTER WEATHER'
levels(data$EVTYPE)[levels(data$EVTYPE)=='STORM SURGE/TIDE'] <- 'STORM SURGE'

Drop unused EVTYPE levels, and check number of levels and dimensions of the processed dataset:

data$EVTYPE <- droplevels(data$EVTYPE)
length(levels(data$EVTYPE))
## [1] 102
dim(data)
## [1] 156334     11

The data is now ready for analysis.

Analysis

In this section, we aggregate the data by event type to find the most impactful events.

For each of the three measures (total damage, injuries, fatalities), group each variable by event type, and calculate the percentage of total for each event:

damage_by_evtype <- aggregate(TOTAL_DMG ~ EVTYPE, data=data, sum)
damage_by_evtype <- damage_by_evtype[order(-damage_by_evtype$TOTAL_DMG),]
damage_by_evtype$TOTAL_DMG_PERC <- round(100*damage_by_evtype$TOTAL_DMG / 
                                             sum(damage_by_evtype$TOTAL_DMG),2)

injuries_by_evtype <- aggregate(INJURIES ~ EVTYPE,data=data,sum)
injuries_by_evtype <- injuries_by_evtype[order(-injuries_by_evtype$INJURIES),]
injuries_by_evtype$INJURIES_PERC <- round(100*injuries_by_evtype$INJURIES / 
                                              sum(injuries_by_evtype$INJURIES),2)

fatalities_by_evtype <- aggregate(FATALITIES ~ EVTYPE,data=data,sum)
fatalities_by_evtype <- fatalities_by_evtype[order(-fatalities_by_evtype$FATALITIES),]
fatalities_by_evtype$FATALITIES_PERC <- round(100*fatalities_by_evtype$FATALITIES / 
                                                  sum(fatalities_by_evtype$FATALITIES),2)

For each measure, get top 10 events, calculate percentage of total contained in the top 10, and list the top 10 events for each group.

damage_top10 <- head(damage_by_evtype,10)
damage_top10 <- transform(damage_top10, EVTYPE = reorder(EVTYPE,TOTAL_DMG))

injuries_top10 <- head(injuries_by_evtype,10)
injuries_top10 <- transform(injuries_top10, EVTYPE = reorder(EVTYPE,INJURIES))

fatalities_top10 <- head(fatalities_by_evtype,10)
fatalities_top10 <- transform(fatalities_top10, EVTYPE = reorder(EVTYPE,FATALITIES))

damage_top10_perc <- sum(damage_top10$TOTAL_DMG_PERC)
injuries_top10_perc <- sum(injuries_top10$INJURIES_PERC)
fatalities_top10_perc <- sum(fatalities_top10$FATALITIES_PERC)

print(damage_top10)
##               EVTYPE TOTAL_DMG TOTAL_DMG_PERC
## 29             FLOOD 1.389e+11          39.20
## 49 HURRICANE/TYPHOON 7.191e+10          20.29
## 80       STORM SURGE 4.781e+10          13.49
## 86           TORNADO 1.970e+10           5.56
## 37              HAIL 1.377e+10           3.89
## 28       FLASH FLOOD 1.278e+10           3.61
## 18           DROUGHT 9.982e+09           2.82
## 88    TROPICAL STORM 7.607e+09           2.15
## 98          WILDFIRE 7.419e+09           2.09
## 84 THUNDERSTORM WIND 6.246e+09           1.76
print(injuries_top10)
##               EVTYPE INJURIES INJURIES_PERC
## 86           TORNADO    15213         43.30
## 23    EXCESSIVE HEAT     3708         10.55
## 84 THUNDERSTORM WIND     3187          9.07
## 59         LIGHTNING     2993          8.52
## 49 HURRICANE/TYPHOON     1275          3.63
## 39              HEAT     1222          3.48
## 98          WILDFIRE     1197          3.41
## 28       FLASH FLOOD      812          2.31
## 47         HIGH WIND      677          1.93
## 37              HAIL      540          1.54
print(fatalities_top10)
##               EVTYPE FATALITIES FATALITIES_PERC
## 86           TORNADO       1193           19.90
## 23    EXCESSIVE HEAT       1013           16.90
## 28       FLASH FLOOD        600           10.01
## 59         LIGHTNING        466            7.77
## 73       RIP CURRENT        462            7.71
## 29             FLOOD        278            4.64
## 84 THUNDERSTORM WIND        248            4.14
## 39              HEAT        231            3.85
## 6          AVALANCHE        179            2.99
## 47         HIGH WIND        131            2.19

Create plots for each of the top 10 measures:

library(ggplot2)

options(scipen = 999)
p_damage <- ggplot(data=damage_top10, aes(x=EVTYPE,y=TOTAL_DMG/1e+09)) + 
    geom_bar(stat="identity",aes(fill=TOTAL_DMG_PERC)) + 
    theme(text = element_text(size=10)) + 
    labs(title="Top 10 Events by Economic Damage (2000-2011)", 
         x = "Event", y="Damage (Billion USD)") +
    coord_flip()

p_injuries <- ggplot(data=injuries_top10, aes(x=EVTYPE,y=INJURIES)) + 
    geom_bar(stat="identity",aes(fill=INJURIES_PERC)) + 
    theme(text = element_text(size=10)) + 
    labs(title="Top 10 Events by Injuries (2000-2011)", 
         x = "Event", y="Injuries") + 
    coord_flip()

p_fatalities <- ggplot(data=fatalities_top10, aes(x=EVTYPE,y=FATALITIES)) + 
    geom_bar(stat="identity",aes(fill=FATALITIES_PERC)) + 
    theme(text = element_text(size=10)) + 
    labs(title="Top 10 Events by Fatalities (2000-2011)", 
         x = "Event", y="Fatalities") + 
    coord_flip()

Results

The plots at the end of the report show the top ten event types in each of three measures for 2000-2011: economic damage (property and crop) in US dollars, injury counts, and fatality counts. In each plot, the bar fill gradient corresponds to percentage of the total.

The top-10 groups account 94.86%, 87.74%, and 80.1% of the total damage, injuries, and fatalities, respectively.

Tornadoes were most harmful in terms of injuries, causing 15213 injuries (43% of total).

Tornadoes and excessive heat were most harmful in terms of fatalities, causing 1193 and 1013 deaths (20% and 17% of total).

However, the most economic damage was caused by floods and hurricanes/typhoons, with 138.9 billion and 71.9 billion U.S. dollars in damage (39% and 20% of total).

print(p_damage)

plot of chunk plot_results1

print(p_injuries)

plot of chunk plot_results1

print(p_fatalities)

plot of chunk plot_results1

Additional information

This analysis was written as the course project for the Reproducible Research Course on Coursera.