This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

We are asking the questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Synopsis: Tornadoes are the most harmful in terms of both fatalities and injuries. Floods do the most property damage.

Data Processing

We first download the Storm Data as a bz2 file (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2) into our current directory. The file is read in its compressed form, since read.table decompresses bz2 files transparently.

storm <- read.table("repdata-data-StormData.csv.bz2", sep = ",", header=TRUE, na.strings = "")
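
To illustrate reading a bz2-compressed CSV by file name, here is a minimal sketch on a tiny made-up file. The temporary file, its two rows, and the `demo` data frame are assumptions for illustration only and are not part of the Storm Data.

```r
# Write a tiny CSV through a bzip2 connection (made-up rows, for illustration only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HEAT,"), con)
close(con)

# Read it back by file name; R detects the bzip2 compression on its own.
# na.strings = "" turns empty fields into NA, as in the call above.
demo <- read.csv(tmp, na.strings = "")
str(demo)
```

The same call pattern applies to the full file, which takes considerably longer to read.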

Population Health

For the first question, which event types do the most harm to population health, we look at the FATALITIES and INJURIES values.

library(dplyr)
dim(storm)
## [1] 902297     37
#Remove rows not in US states and DC
data(state) #built in list of states
storm_us <- storm[which(storm$STATE %in% c(state.abb, "DC")),]
dim(storm_us)
## [1] 883623     37
storm_unhealthy <- storm_us %>% filter(FATALITIES > 0 | INJURIES > 0) %>% select(EVTYPE, STATE, FATALITIES, INJURIES) %>% arrange(desc(FATALITIES), desc(INJURIES))
dim(storm_unhealthy)
## [1] 21709     4
head(storm_unhealthy, 20)
##            EVTYPE STATE FATALITIES INJURIES
## 1            HEAT    IL        583        0
## 2         TORNADO    MO        158     1150
## 3         TORNADO    MI        116      785
## 4         TORNADO    TX        114      597
## 5  EXCESSIVE HEAT    IL         99        0
## 6         TORNADO    MA         90     1228
## 7         TORNADO    KS         75      270
## 8  EXCESSIVE HEAT    PA         74      135
## 9  EXCESSIVE HEAT    PA         67        0
## 10        TORNADO    MS         57      504
## 11   EXTREME HEAT    WI         57        0
## 12        TORNADO    AR         50      325
## 13 EXCESSIVE HEAT    TX         49        0
## 14 EXCESSIVE HEAT    CA         46       18
## 15        TORNADO    AL         44      800
## 16        TORNADO    TX         42     1700
## 17 EXCESSIVE HEAT    MO         42      397
## 18 EXCESSIVE HEAT    NY         42        0
## 19        TORNADO    MS         38      270
## 20        TORNADO    MO         37      176
#Compare fatalities to injuries
plot(log(storm_unhealthy$FATALITIES+1), log(storm_unhealthy$INJURIES+1), main="Log of Fatalities vs Injuries")

#Look at the top 100 of each
head(sort(storm_unhealthy$FATALITIES, decreasing = TRUE), 100)
##   [1] 583 158 116 114  99  90  75  74  67  57  57  50  49  46  44  42  42
##  [18]  42  38  37  36  34  33  33  33  32  32  31  31  31  30  30  30  29
##  [35]  29  29  27  27  27  26  25  25  25  25  25  24  24  24  24  23  23
##  [52]  23  22  22  22  22  22  22  21  21  21  20  20  20  20  20  20  20
##  [69]  19  18  18  17  17  17  17  17  17  17  16  16  16  16  16  16  16
##  [86]  16  16  16  16  15  15  15  15  15  14  14  14  14  14  14
head(sort(storm_unhealthy$INJURIES, decreasing = TRUE), 100)
##   [1] 1700 1568 1228 1150 1150  800  800  785  780  750  700  600  597  560
##  [15]  550  519  504  500  500  500  500  500  500  500  463  450  450  450
##  [29]  437  411  410  397  385  350  350  350  350  342  325  306  300  300
##  [43]  300  300  300  293  280  280  275  270  270  270  266  258  257  257
##  [57]  252  252  250  250  250  246  241  240  234  230  225  225  224  223
##  [71]  216  215  210  207  200  200  200  200  200  200  200  200  200  200
##  [85]  200  200  200  200  200  200  200  200  200  200  195  192  192  190
##  [99]  185  185

The vast majority of events have very few fatalities or injuries. Now let's see what the EVTYPE is for the top 200 rows of storm_unhealthy.

sort(table(factor(head(storm_unhealthy$EVTYPE, 200))), decreasing = TRUE)
## 
##                    TORNADO             EXCESSIVE HEAT 
##                        123                         34 
##                       HEAT                  HEAT WAVE 
##                          8                          6 
##                FLASH FLOOD                      FLOOD 
##                          4                          3 
##               EXTREME HEAT                  LANDSLIDE 
##                          2                          2 
##                   WILDFIRE              COLD AND SNOW 
##                          2                          1 
##                 DUST STORM    EXTREME COLD/WIND CHILL 
##                          1                          1 
##                        FOG                  HIGH SURF 
##                          1                          1 
##                  HURRICANE          HURRICANE/TYPHOON 
##                          1                          1 
##      RECORD/EXCESSIVE HEAT                STORM SURGE 
##                          1                          1 
##           STORM SURGE/TIDE TORNADOES, TSTM WIND, HAIL 
##                          1                          1 
##             TROPICAL STORM                  TSTM WIND 
##                          1                          1 
##          UNSEASONABLY WARM  UNSEASONABLY WARM AND DRY 
##                          1                          1 
##              WINTER STORMS 
##                          1

There are some common themes related to storms (including hurricanes), heat waves, and floods. Let's try to group these into a new EVGROUP factor.

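#Map a single upper-case event type to a coarse group.
#The checks run in order, so the first matching pattern wins
#(e.g. "HEAT WAVE" is HEAT RELATED even though WAVE appears in the flood pattern).
evgroup <- function(ev){
    if(grepl("WARM|HEAT|DRY", ev)) 
        "HEAT RELATED"
    else if(grepl("TORNADO|FUNNEL", ev)) 
        "TORNADO"
    else if(grepl("LIGHTNING", ev)) 
        "LIGHTNING"
    else if(grepl("FIRE", ev)) 
        "WILDFIRE"
    #Note: LOW also matches inside longer words (e.g. BLOWING), and HYPERTHERMIA is arguably heat related
    else if(grepl("WINTER|WINTRY|ICE|ICY|BLIZZARD|COLD|FREEZE|FROST|LOW|FREEZING|SNOW|AVALANCHE|CHILL|HYPOTHERMIA|HYPERTHERMIA|SLEET|HAIL", ev)) 
        "COLD AND SNOW RELATED"
    else if(grepl("FLOOD|TIDE|SURGE|SURF|MARINE|SEAS|WATER|FLD|RIP|DROWNING|WAVE|TSUNAMI", ev)) 
        "FLOOD AND SEA RELATED"
    else if(grepl("DUST|FOG", ev)) 
        "DUST OR FOG RELATED"
    else if(grepl("STORM|HURRICANE|WIND|TYPHOON|RAIN|PRECIP", ev)) 
        "STORM AND WIND RELATED"
    else
        "OTHER"
}
#Apply the grouping to every row (EVTYPE is upper-cased first)
storm_unhealthy$EVGROUP <- factor(sapply(toupper(storm_unhealthy$EVTYPE), evgroup))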
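
Because the grouping depends on the order of the checks, it can help to try it on a few labels. The sketch below is a hypothetical vectorized variant (`evgroup_vec` and `group_patterns` are names introduced here, not part of the original analysis). It reuses the same patterns in the same priority order.

```r
# Ordered group patterns copied from evgroup; earlier entries take priority.
group_patterns <- c(
  "HEAT RELATED"           = "WARM|HEAT|DRY",
  "TORNADO"                = "TORNADO|FUNNEL",
  "LIGHTNING"              = "LIGHTNING",
  "WILDFIRE"               = "FIRE",
  "COLD AND SNOW RELATED"  = "WINTER|WINTRY|ICE|ICY|BLIZZARD|COLD|FREEZE|FROST|LOW|FREEZING|SNOW|AVALANCHE|CHILL|HYPOTHERMIA|HYPERTHERMIA|SLEET|HAIL",
  "FLOOD AND SEA RELATED"  = "FLOOD|TIDE|SURGE|SURF|MARINE|SEAS|WATER|FLD|RIP|DROWNING|WAVE|TSUNAMI",
  "DUST OR FOG RELATED"    = "DUST|FOG",
  "STORM AND WIND RELATED" = "STORM|HURRICANE|WIND|TYPHOON|RAIN|PRECIP"
)

# Hypothetical helper: classify a whole vector of event types at once.
evgroup_vec <- function(ev) {
  ev <- toupper(as.character(ev))
  out <- rep("OTHER", length(ev))
  # Walk from lowest to highest priority so earlier patterns overwrite later ones.
  for (i in rev(seq_along(group_patterns))) {
    out[grepl(group_patterns[[i]], ev)] <- names(group_patterns)[i]
  }
  out
}

# A few labels taken from the tables in this report, plus DROUGHT
sample_types <- c("Heat Wave", "TSTM WIND", "FLASH FLOOD", "DROUGHT",
                  "TORNADOES, TSTM WIND, HAIL")
data.frame(EVTYPE = sample_types, EVGROUP = evgroup_vec(sample_types))
```

Keeping the patterns in one ordered vector makes the priority explicit and avoids calling the function once per row.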

First, let's look at fatality and injury counts by these new groups to get a sense of their variance.

library(ggplot2);library(Hmisc);library(gridExtra);
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: splines
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:dplyr':
## 
##     combine, src, summarize
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
#boxplot for fatalities by new group
qplot(EVGROUP, log10(FATALITIES), data=storm_unhealthy, fill=EVGROUP, geom=c("boxplot"), main = "Log10 FATALITIES Box Plot by Group Type")

#boxplot for injuries by new group
qplot(EVGROUP, log10(INJURIES), data=storm_unhealthy, fill=EVGROUP, geom=c("boxplot"), main = "Log10 INJURIES Box Plot by Group Type")

Note that HEAT RELATED has the single largest loss-of-life event (583 fatalities in the listing above), followed by TORNADO (158).

Economic Consequence

For the second question, about economic consequences, we look at PROPDMG and CROPDMG and adjust for their units.

storm_costly <- storm_us %>% filter(PROPDMG > 0 | CROPDMG > 0) %>% select(EVTYPE, STATE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
#Adjust for units
k <- which(toupper(storm_costly$PROPDMGEXP) == "K")
storm_costly$PROPDMG[k] = (storm_costly$PROPDMG[k] * 1e3)
m <- which(toupper(storm_costly$PROPDMGEXP) == "M")
storm_costly$PROPDMG[m] = (storm_costly$PROPDMG[m] * 1e6)
b <- which(toupper(storm_costly$PROPDMGEXP) == "B")
storm_costly$PROPDMG[b] = (storm_costly$PROPDMG[b] * 1e9)

k <- which(toupper(storm_costly$CROPDMGEXP) == "K")
storm_costly$CROPDMG[k] = (storm_costly$CROPDMG[k] * 1e3)
m <- which(toupper(storm_costly$CROPDMGEXP) == "M")
storm_costly$CROPDMG[m] = (storm_costly$CROPDMG[m] * 1e6)
b <- which(toupper(storm_costly$CROPDMGEXP) == "B")
storm_costly$CROPDMG[b] = (storm_costly$CROPDMG[b] * 1e9)

storm_costly$EVGROUP <- factor(sapply(toupper(storm_costly$EVTYPE), evgroup))
dim(storm_costly)
## [1] 243911      7
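
The unit adjustment above repeats the same step for each code and column. A minimal sketch of the same idea with a lookup table follows; `unit_multiplier` is a hypothetical helper introduced here. As in the code above, any code other than K, M or B (in either case) leaves the value unchanged.

```r
# Hypothetical helper: turn unit codes into numeric multipliers.
# Only K, M and B are recognised; anything else (including "" and NA) maps to 1.
unit_multiplier <- function(exp_code) {
  mult <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- unname(mult[toupper(as.character(exp_code))])
  m[is.na(m)] <- 1
  m
}

# Made-up values for illustration only
dmg  <- c(25, 2.5, 1, 10, 7)
code <- c("K", "m", "B", "", NA)
dmg * unit_multiplier(code)
```

Applied to the data, this would read `storm_costly$PROPDMG * unit_multiplier(storm_costly$PROPDMGEXP)`, and likewise for the crop columns, starting from the unadjusted values.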

Results

We tabulate our total fatalities and injuries.

storm_unhealthy %>% group_by(EVGROUP) %>% summarise(TotalFatalities=sum(FATALITIES), TotalInjuries=sum(INJURIES)) %>% arrange(desc(TotalFatalities), desc(TotalInjuries))
## Source: local data frame [9 x 3]
## 
##                  EVGROUP TotalFatalities TotalInjuries
## 1                TORNADO            5661         91410
## 2           HEAT RELATED            3181          9272
## 3  FLOOD AND SEA RELATED            2219          9471
## 4 STORM AND WIND RELATED            1378         12889
## 5  COLD AND SNOW RELATED            1378          8125
## 6              LIGHTNING             807          5226
## 7    DUST OR FOG RELATED             104          1559
## 8               WILDFIRE              90          1607
## 9                  OTHER              49           276

Tornadoes are the most harmful to population health, with the most fatalities (5,661) and the most injuries (91,410). Heat-related events rank second in fatalities, and storm- and wind-related events rank second in injuries.

We can tabulate our total property and crop damage.

storm_costly %>% group_by(EVGROUP) %>% summarise(TotalProperty=sum(PROPDMG), TotalCrop=sum(CROPDMG)) %>% arrange(desc(TotalProperty), desc(TotalCrop))
## Source: local data frame [9 x 3]
## 
##                  EVGROUP TotalProperty   TotalCrop
## 1  FLOOD AND SEA RELATED  215346005032 12332670200
## 2 STORM AND WIND RELATED  110333489447  8300863738
## 3                TORNADO   58592827629   417460520
## 4  COLD AND SNOW RELATED   28737600284 11863838773
## 5               WILDFIRE    8489501500   402116630
## 6                  OTHER    1370338650 14140418950
## 7              LIGHTNING     937999447    12097090
## 8    DUST OR FOG RELATED      29147130     3600000
## 9           HEAT RELATED      27058350   904494280

Flood- and sea-related events do the most property damage (about $215 billion), while OTHER (which includes Drought) does the most crop damage (about $14 billion).
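
Since the second question is about overall economic consequences, one option is to add property and crop damage together. The sketch below does this with the group totals printed in the table above. The `damage` data frame and the `TotalDamage` and `Billions` columns are names introduced here for illustration.

```r
# Group totals copied from the summary table above
damage <- data.frame(
  EVGROUP = c("FLOOD AND SEA RELATED", "STORM AND WIND RELATED", "TORNADO",
              "COLD AND SNOW RELATED", "WILDFIRE", "OTHER", "LIGHTNING",
              "DUST OR FOG RELATED", "HEAT RELATED"),
  TotalProperty = c(215346005032, 110333489447, 58592827629, 28737600284,
                    8489501500, 1370338650, 937999447, 29147130, 27058350),
  TotalCrop = c(12332670200, 8300863738, 417460520, 11863838773,
                402116630, 14140418950, 12097090, 3600000, 904494280),
  stringsAsFactors = FALSE
)

# Combined damage per group, sorted from largest to smallest
damage$TotalDamage <- damage$TotalProperty + damage$TotalCrop
damage <- damage[order(damage$TotalDamage, decreasing = TRUE), ]
damage$Billions <- round(damage$TotalDamage / 1e9, 2)
damage[, c("EVGROUP", "Billions")]
```

On these totals the flood- and sea-related group remains the largest when the two kinds of damage are combined.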