Synopsis

In this report we identify the most harmful types of severe weather event in the USA during the years 1950-2011. Harm is investigated on two levels: public health and economic consequences. The data for this analysis come from the publicly available U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States. From these data we found that tornadoes cause the most fatalities and also the most injuries. However, the most costly event type, when combining damage caused to property and crops, is flood.

Source of Data

The storm data is available in comma-separated-value format, in a file compressed with the bzip2 algorithm, at this web site: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

The documentation of the database can be found in:

Data Processing

The code for loading the data directly from the above-mentioned source is presented below. The programming language is R version 3.1.2 (2014-10-31). The data.table package is used for subsequent processing and ggplot2 for plotting.

library(data.table)
library(ggplot2)

# Download the dataset unless it is already in the current working directory
filename <- "repdata_data_StormData.csv.bz2"
if (!file.exists(filename)) {
    fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    download.file(fileURL, destfile=filename, method="curl")
}
# Read the compressed CSV directly and keep text columns as character
data <- read.csv(bzfile(filename, open="r"), stringsAsFactors=FALSE)
# Convert to a data.table for the grouped summaries below
data <- as.data.table(data)

Before the actual study, let’s take a quick look at the data.

The structure of the data looks like this:

str(data)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>

The relevant variables in this study are EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

The number of different event types:

length(unique(data$EVTYPE))
## [1] 985

The time range of the data:

range(as.Date(data$BGN_DATE, format="%m/%d/%Y"))
## [1] "1950-01-03" "2011-11-30"
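The date parsing above relies on as.Date() matching only the part of each string covered by the format and ignoring any trailing characters, so the " 0:00:00" suffix does no harm. A minimal sketch of that behaviour, using three BGN_DATE values copied from the str() output; the variable names bgn, bgn_date, n_failed and date_range are introduced here for illustration only:

```r
# Sample values copied from the BGN_DATE column shown in str(data)
bgn <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "6/8/1951 0:00:00")

# Only the month/day/year part is matched; the trailing " 0:00:00" is ignored
bgn_date <- as.Date(bgn, format = "%m/%d/%Y")

# A string that failed to parse would become NA and make range() return NA,
# so counting NAs is a cheap sanity check
n_failed <- sum(is.na(bgn_date))
date_range <- range(bgn_date)

print(n_failed)
print(date_range)
```

Since the range reported for the full data set is not NA, every BGN_DATE value in it parsed successfully.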

Results

Most Harmful Event Types with Respect to Population Health

The harm caused to public health by severe weather events comes in two forms: fatalities and injuries. Both are first summed by event type over the entire data set.

fatal <- data[, .(fatalities=sum(FATALITIES), injuries=sum(INJURIES)), by=EVTYPE]
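The grouped sum can be cross-checked without data.table. The sketch below uses base R's aggregate() on a small made-up table, so the numbers are illustrative only and are not taken from the storm database; toy and toy_sums are hypothetical names:

```r
# Toy data standing in for the storm table (values are made up)
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(2, 0, 1, 4, 1),
    INJURIES   = c(10, 3, 5, 0, 2),
    stringsAsFactors = FALSE
)

# Sum both health variables within each event type
toy_sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Order by fatalities, breaking ties by injuries, both descending
toy_sums <- toy_sums[order(-toy_sums$FATALITIES, -toy_sums$INJURIES), ]
print(toy_sums)
```

Running the same aggregate() call on the full data should give the same per-type totals as the data.table expression, which makes it a simple independent check.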

The summed data is then ordered by fatality count to show the 20 event types with the most fatalities over the whole period.

fatal <- fatal[order(-fatalities, -injuries)]
ggplot(fatal[1:20], aes(x=reorder(EVTYPE, fatalities), y=fatalities)) + 
    geom_bar(stat="identity", fill="grey30") + coord_flip() +
    ylab("Fatalities") +
    theme(axis.title.y=element_blank(), plot.title=element_text(face="bold")) +
    ggtitle("Top 20 most fatal event types")

Exact numbers of event types ordered by fatality:

fatal[1:20]
##                      EVTYPE fatalities injuries
##  1:                 TORNADO       5633    91346
##  2:          EXCESSIVE HEAT       1903     6525
##  3:             FLASH FLOOD        978     1777
##  4:                    HEAT        937     2100
##  5:               LIGHTNING        816     5230
##  6:               TSTM WIND        504     6957
##  7:                   FLOOD        470     6789
##  8:             RIP CURRENT        368      232
##  9:               HIGH WIND        248     1137
## 10:               AVALANCHE        224      170
## 11:            WINTER STORM        206     1321
## 12:            RIP CURRENTS        204      297
## 13:               HEAT WAVE        172      309
## 14:            EXTREME COLD        160      231
## 15:       THUNDERSTORM WIND        133     1488
## 16:              HEAVY SNOW        127     1021
## 17: EXTREME COLD/WIND CHILL        125       24
## 18:             STRONG WIND        103      280
## 19:                BLIZZARD        101      805
## 20:               HIGH SURF        101      152

The ranking of event types is different for injuries, so it is shown next.

fatal <- fatal[order(-injuries, -fatalities)]
ggplot(fatal[1:20], aes(x=reorder(EVTYPE, injuries), y=injuries)) + 
    geom_bar(stat="identity", fill="tomato2") + coord_flip() +
    ylab("Injuries") +
    theme(axis.title.y=element_blank(), plot.title=element_text(face="bold")) +
    ggtitle("Top 20 most injury causing event types")

Exact numbers of event types ordered by injuries:

fatal[1:20]
##                 EVTYPE fatalities injuries
##  1:            TORNADO       5633    91346
##  2:          TSTM WIND        504     6957
##  3:              FLOOD        470     6789
##  4:     EXCESSIVE HEAT       1903     6525
##  5:          LIGHTNING        816     5230
##  6:               HEAT        937     2100
##  7:          ICE STORM         89     1975
##  8:        FLASH FLOOD        978     1777
##  9:  THUNDERSTORM WIND        133     1488
## 10:               HAIL         15     1361
## 11:       WINTER STORM        206     1321
## 12:  HURRICANE/TYPHOON         64     1275
## 13:          HIGH WIND        248     1137
## 14:         HEAVY SNOW        127     1021
## 15:           WILDFIRE         75      911
## 16: THUNDERSTORM WINDS         64      908
## 17:           BLIZZARD        101      805
## 18:                FOG         62      734
## 19:   WILD/FOREST FIRE         12      545
## 20:         DUST STORM         22      440

Most Harmful Event Types with Respect to Economic Consequences

To calculate the economic consequences of different weather events, two variables are added together: PROPDMG and CROPDMG, which denote the damage in dollars caused to property and crops, respectively. According to the database documentation, these values should be scaled by the contents of the PROPDMGEXP and CROPDMGEXP fields, whose documented values mean:

  • Empty field = no scaling
  • K = thousands of dollars
  • M = millions of dollars
  • B = billions of dollars

However, by taking a look at the data it can be seen that there are other (undocumented) values as well.

table(data$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
table(data$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
nrow(data[CROPDMGEXP %in% c("", "K", "M", "B"), ])/nrow(data)
## [1] 0.9999457
nrow(data[PROPDMGEXP %in% c("", "K", "M", "B"), ])/nrow(data)
## [1] 0.9996365

The lowercase letters may simply be typing errors, but we cannot be sure, so the safest option is to ignore the undocumented codes and leave the corresponding damage figures unscaled. Rows with undocumented codes make up less than 0.04% of the data in either field, so ignoring the codes is well justified.

The damage costs are multiplied by the given scale and summed over all the data by weather event type. After that, property and crop damage are added into a new variable holding the total damage cost of each event type.

data[PROPDMGEXP == "B", PROPDMG:=PROPDMG * 1e9]
data[PROPDMGEXP == "M", PROPDMG:=PROPDMG * 1e6]
data[PROPDMGEXP == "K", PROPDMG:=PROPDMG * 1e3]

data[CROPDMGEXP == "B", CROPDMG:=CROPDMG * 1e9]
data[CROPDMGEXP == "M", CROPDMG:=CROPDMG * 1e6]
data[CROPDMGEXP == "K", CROPDMG:=CROPDMG * 1e3]

costly <- data[, .(propdmg=sum(PROPDMG), cropdmg=sum(CROPDMG)), by=EVTYPE]
costly[, totaldmg:=propdmg+cropdmg]
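The scaling rule can also be written as a single lookup, which makes the treatment of undocumented codes explicit. This is a sketch on made-up values, not the code used for the results above; multiplier and scale_damage are hypothetical names introduced for illustration:

```r
# Multipliers for the documented exponent codes; "" (no scaling) is handled below
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

scale_damage <- function(dmg, exp) {
    # Look up each code by name; empty and undocumented codes yield NA
    m <- unname(multiplier[exp])
    # Leave those values unscaled, matching the approach taken above
    m[is.na(m)] <- 1
    dmg * m
}

# Made-up damage values paired with documented and undocumented codes
dmg <- c(25, 2.5, 1, 500, 3, 7)
exp <- c("K", "M", "B", "", "h", "5")
scaled <- scale_damage(dmg, exp)
print(scaled)

# Share of rows carrying a documented code
documented_share <- mean(exp %in% c("", "K", "M", "B"))
print(documented_share)
```

Because the lookup is case-sensitive, lowercase codes such as "h" or "m" fall through to the unscaled branch, just as they do in the conditional updates above.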

The total costs are ordered by total damage value.

costly <- costly[order(-totaldmg)]
ggplot(costly[1:20], aes(x=reorder(EVTYPE, totaldmg), y=totaldmg/1e9)) + 
    geom_bar(stat="identity", fill="steelblue3") + coord_flip() +
    ylab("Total cost (billions of $)") +
    theme(axis.title.y=element_blank(), plot.title=element_text(face="bold")) +
    ggtitle("Top 20 most damaging event types in total")

Exact numbers of event types ordered by total cost:

costly[1:20]
##                        EVTYPE      propdmg     cropdmg     totaldmg
##  1:                     FLOOD 144657709807  5661968450 150319678257
##  2:         HURRICANE/TYPHOON  69305840000  2607872800  71913712800
##  3:                   TORNADO  56925660790   414953270  57340614060
##  4:               STORM SURGE  43323536000        5000  43323541000
##  5:                      HAIL  15727367053  3025537890  18752904943
##  6:               FLASH FLOOD  16140812067  1421317100  17562129167
##  7:                   DROUGHT   1046106000 13972566000  15018672000
##  8:                 HURRICANE  11868319010  2741910000  14610229010
##  9:               RIVER FLOOD   5118945500  5029459000  10148404500
## 10:                 ICE STORM   3944927860  5022113500   8967041360
## 11:            TROPICAL STORM   7703890550   678346000   8382236550
## 12:              WINTER STORM   6688497251    26944000   6715441251
## 13:                 HIGH WIND   5270046295   638571300   5908617595
## 14:                  WILDFIRE   4765114000   295472800   5060586800
## 15:                 TSTM WIND   4484928495   554007350   5038935845
## 16:          STORM SURGE/TIDE   4641188000      850000   4642038000
## 17:         THUNDERSTORM WIND   3483121284   414843050   3897964334
## 18:            HURRICANE OPAL   3152846020     9000010   3161846030
## 19:          WILD/FOREST FIRE   3001829500   106796830   3108626330
## 20: HEAVY RAIN/SEVERE WEATHER   2500000000           0   2500000000

Conclusions

We were able to present the most harmful weather event types with respect to both public health and economic consequences with high certainty, even though a small part of the data had to be ignored because the documentation does not explain certain values found in the exponent fields. There also seem to be some overlapping event types with very similar names. However, in this study they were all treated as separate event types, since the available documentation does not let us be sure which of them are actually identical.
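For readers who want to inspect the overlapping names, the sketch below flags candidate groups without merging anything. The normalisation rule (upper-case, trim whitespace, drop one trailing "S") is an assumption made for illustration, not a rule from the database documentation, and normalise, key and candidates are hypothetical names. The event type names are taken from the tables above.

```r
# A few event type names taken from the result tables
evtypes <- c("RIP CURRENT", "RIP CURRENTS", "THUNDERSTORM WIND",
             "THUNDERSTORM WINDS", "TSTM WIND", "FLOOD")

# Hypothetical normalisation: trim whitespace, upper-case, drop a trailing "S"
normalise <- function(x) {
    x <- toupper(gsub("^\\s+|\\s+$", "", x))
    sub("S$", "", x)
}

# Group the original names by their normalised key
key <- normalise(evtypes)
candidates <- split(evtypes, key)

# Keep only keys shared by more than one original name
candidates <- candidates[sapply(candidates, length) > 1]
print(candidates)
```

A rule this simple does not catch abbreviations such as "TSTM WIND", and whether any flagged group really describes one event type would still need to be confirmed against the documentation.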