Economic and health related harmfulness of US weather events

This report analyzes the NOAA storm database between 1950 and November 2011 in order to find the most harmful weather event types regarding population health and economic damage.

Tornado events are the top cause of harm to population health.

Flood events are the top cause of economic damage.

Data Processing

original_data <- read.csv("repdata-data-StormData.csv.bz2")

The event type is in the column EVTYPE. Health harm combines the columns FATALITIES and INJURIES, and economic damage combines the columns PROPDMG and CROPDMG.

Each damage column has a corresponding exponent column (PROPDMGEXP and CROPDMGEXP) for scaling the amounts. The exponents handled here are ‘K’, ‘M’ and ‘B’ for thousands, millions and billions respectively; any other value is treated as a factor of 1.

To get raw dollar values for the damages, we multiply each value by the multiplier that corresponds to its exponent.

numeric_exponent <- function(exponent) {
  if (exponent == 'K') {
    1000
  } else if (exponent == 'M') {
    1000000
  } else if (exponent == 'B') {
    1000000000
  } else {
    1
  }
}
data <- data.frame(evtype = original_data$EVTYPE,
                   fatalities = original_data$FATALITIES,
                   injuries = original_data$INJURIES,
                   propdmg = original_data$PROPDMG * sapply(original_data$PROPDMGEXP,
                                                            numeric_exponent),
                   cropdmg = original_data$CROPDMG * sapply(original_data$CROPDMGEXP,
                                                            numeric_exponent))
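The element-wise `sapply` call above can be slow on roughly 900,000 rows. A minimal sketch of a vectorized alternative using a named lookup vector follows. The names `exponent_lookup` and `scale_by_exponent` are hypothetical and not part of the original analysis, and the input is a toy vector rather than storm data. Like `numeric_exponent`, the sketch treats anything other than 'K', 'M' and 'B' as a factor of 1.

```r
# Hypothetical vectorized equivalent of numeric_exponent (illustration only).
exponent_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

scale_by_exponent <- function(values, exponents) {
  # as.character() ensures factor columns are matched by label, not by integer code
  multiplier <- unname(exponent_lookup[as.character(exponents)])
  # Anything not in the lookup (empty string, digits, lowercase, ...) scales by 1
  multiplier[is.na(multiplier)] <- 1
  values * multiplier
}

# Toy input, not taken from the storm data
toy_result <- scale_by_exponent(c(25, 2.5, 1, 7), c("K", "M", "B", ""))
```

With the real data this could be called as `scale_by_exponent(original_data$PROPDMG, original_data$PROPDMGEXP)`.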

We add a total damage column containing the sum of propdmg and cropdmg, and a total harm column containing the sum of fatalities and injuries (although one could well question whether it is correct to simply add these two types of harm).

data$dmg <- data$cropdmg + data$propdmg
data$harm <- data$fatalities + data$injuries
str(data)
## 'data.frame':    902297 obs. of  7 variables:
##  $ evtype    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ fatalities: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propdmg   : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
##  $ cropdmg   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ dmg       : num  25000 2500 25000 2500 2500 2500 2500 2500 25000 25000 ...
##  $ harm      : num  15 0 2 2 2 6 1 0 15 0 ...

The data contains slightly more than 900,000 events (902,297 observations).

To find the event type that causes the most harm or damage, we group the data by event type and sum the harm and damage within each group.

agg <- aggregate(list(fatalities = data$fatalities,
                      injuries = data$injuries,
                      harm = data$harm,
                      propdmg = data$propdmg,
                      cropdmg = data$cropdmg,
                      dmg = data$dmg),
                 list(evtype = data$evtype),
                 sum)
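To make the grouping step concrete, here is a small sketch on made-up data that uses the formula interface of `aggregate`. The data frame `toy` and its values are invented for illustration; only the column names mirror the ones used above.

```r
# Toy data with made-up values; column names mirror the report's data frame
toy <- data.frame(evtype = c("FLOOD", "TORNADO", "FLOOD"),
                  harm = c(1, 5, 2),
                  dmg = c(100, 50, 300),
                  stringsAsFactors = FALSE)

# Sum harm and dmg within each event type
toy_agg <- aggregate(cbind(harm, dmg) ~ evtype, data = toy, FUN = sum)
```

The result has one row per event type, with the two FLOOD rows collapsed into a single row of totals.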

Now let's calculate the top five event types for each harm and damage column.

top_fatalities <- head(agg[order(agg$fatalities, decreasing = TRUE), c('evtype', 'fatalities')], n = 5)
top_injuries <- head(agg[order(agg$injuries, decreasing = TRUE), c('evtype', 'injuries')], n = 5)
top_harm <- head(agg[order(agg$harm, decreasing = TRUE), c('evtype', 'harm')], n = 5)
top_propdmg <- head(agg[order(agg$propdmg, decreasing = TRUE), c('evtype', 'propdmg')], n = 5)
top_cropdmg <- head(agg[order(agg$cropdmg, decreasing = TRUE), c('evtype', 'cropdmg')], n = 5)
top_dmg <- head(agg[order(agg$dmg, decreasing = TRUE), c('evtype', 'dmg')], n = 5)
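The six near-identical lines above could be folded into one helper. The following is a sketch only: `top_by` is a hypothetical name, and `toy_agg` is a made-up stand-in for `agg`.

```r
# Hypothetical helper: the n rows of df with the largest values in `column`
top_by <- function(df, column, n = 5) {
  ranked <- df[order(df[[column]], decreasing = TRUE), c("evtype", column)]
  head(ranked, n = n)
}

# Toy stand-in for agg (made-up numbers)
toy_agg <- data.frame(evtype = c("A", "B", "C"),
                      harm = c(10, 30, 20),
                      dmg = c(5, 1, 9),
                      stringsAsFactors = FALSE)
toy_top <- top_by(toy_agg, "harm", n = 2)
```

With the real data this would read, for example, `top_harm <- top_by(agg, 'harm')`.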

Results

Tornadoes are the top cause of both fatalities and injuries. Total harm (defined ad hoc as the sum of fatalities and injuries) is therefore also highest for tornadoes.

top_fatalities
##             evtype fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
top_injuries
##             evtype injuries
## 834        TORNADO    91346
## 856      TSTM WIND     6957
## 170          FLOOD     6789
## 130 EXCESSIVE HEAT     6525
## 464      LIGHTNING     5230
top_harm
##             evtype  harm
## 834        TORNADO 96979
## 130 EXCESSIVE HEAT  8428
## 856      TSTM WIND  7461
## 170          FLOOD  7259
## 464      LIGHTNING  6046

For economic damage, property damage and crop damage have different top causes: flood caused the highest property damage total, while drought caused the highest crop damage total. Taken together, the top cause of economic damage is flood.

top_propdmg
##                evtype      propdmg
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56925660790
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16140812067
top_cropdmg
##          evtype     cropdmg
## 95      DROUGHT 13972566000
## 170       FLOOD  5661968450
## 590 RIVER FLOOD  5029459000
## 427   ICE STORM  5022113500
## 244        HAIL  3025537890
top_dmg
##                evtype          dmg
## 170             FLOOD 150319678257
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57340614060
## 670       STORM SURGE  43323541000
## 244              HAIL  18752904943
palette <- c(rgb(1, 0, 0),
             rgb(0.8, 0.5, 0),
             rgb(0.7, 0.45, 0),
             rgb(0.6, 0.4, 0),
             rgb(0.5, 0.35, 0))
barplot(top_harm$harm, 
        legend = top_harm$evtype,
        col = palette,
        ylab = 'Harm (Fatalities + Injuries)',
        main = 'Top five causes of storm related fatalities and injuries')

The bar plot above shows the total harm (fatalities plus injuries) for each of the five most harmful event types over the full data period (1950 through November 2011).

barplot(top_dmg$dmg / 1000000000, 
        legend = top_dmg$evtype,
        col = palette,
        ylab = 'Damage [B$]',
        main = 'Top five causes of storm related damage')

The bar plot above shows the total damage (property plus crop) for each of the five most damaging event types over the full data period (1950 through November 2011).

Further questions

This report only looks at the totals for the whole period from 1950 to 2011. One obvious question to investigate is whether the damage totals changed over this time. In other words, were the last ten years, for example, different from the 1950s?
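One way to approach this is to sum the damage per decade. The sketch below is not part of the analysis. It assumes a date column holding strings such as "4/18/1950 0:00:00" (called `bgn_date` here as an illustration) and runs on made-up rows.

```r
# Toy rows with made-up values; bgn_date and its format are assumptions
toy <- data.frame(bgn_date = c("4/18/1950 0:00:00",
                               "6/8/1951 0:00:00",
                               "11/28/2011 0:00:00"),
                  dmg = c(25000, 2500, 1000000),
                  stringsAsFactors = FALSE)

# Extract the year, then round down to the start of its decade
toy$year <- as.integer(format(as.Date(toy$bgn_date, format = "%m/%d/%Y"), "%Y"))
toy$decade <- (toy$year %/% 10) * 10

# Total damage per decade
dmg_by_decade <- aggregate(dmg ~ decade, data = toy, FUN = sum)
```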

Another question to pursue: which event type has the highest harm or damage per event?
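A per-event view could replace the sums with means and keep the event count alongside, since an average over very few events is not very informative. This is a sketch on made-up data; `per_event` and its column names are hypothetical.

```r
# Toy data with made-up values
toy <- data.frame(evtype = c("TORNADO", "TORNADO", "TORNADO", "HEAT"),
                  harm = c(0, 2, 4, 9),
                  stringsAsFactors = FALSE)

# Mean harm and number of events per event type
mean_harm <- tapply(toy$harm, toy$evtype, mean)
n_events <- table(toy$evtype)

per_event <- data.frame(evtype = names(mean_harm),
                        mean_harm = as.numeric(mean_harm),
                        events = as.integer(n_events[names(mean_harm)]),
                        stringsAsFactors = FALSE)

# Highest mean harm first
per_event <- per_event[order(per_event$mean_harm, decreasing = TRUE), ]
```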

Finally, as far as I can tell, the damage numbers are nominal dollar values from the time each event happened. Comparing earlier periods with later periods would therefore require adjusting the values for inflation.
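The adjustment itself is a simple rescaling by a price index. In the sketch below the index values are placeholders chosen for easy arithmetic, not real price data; a real analysis would need an official price index series. All names are hypothetical.

```r
# Placeholder index values, NOT real price data
price_index <- c("1950" = 1, "2011" = 2)

# Toy damages with made-up values
toy <- data.frame(year = c(1950, 2011), dmg = c(1000, 1000))

# Express every amount in dollars of the base year
base_year <- "2011"
index_for_row <- unname(price_index[as.character(toy$year)])
toy$dmg_adjusted <- toy$dmg * price_index[[base_year]] / index_for_row
```

With these placeholder values, the 1950 amount doubles when expressed in base-year dollars, while the 2011 amount is unchanged.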