Tornados and Floods are the Most Harmful Weather Events to Population Health and Economy

Synopsis

In this paper we investigate which weather events are most harmful to population health and to the economy. The analysis is based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which covers the years 1950 to 2011. Harm to population health is measured by the number of injuries and fatalities; economic impact is measured by crop and property damage. We found that tornados are most harmful to population health and that floods have the greatest economic consequences.

Loading and Processing the Raw Data

library(ggplot2)
library(plyr)

We download the storm data from the Coursera webpage (URL in the code below) and read the bzip2-compressed file directly in R.

if(!"data_raw.csv.bz2" %in% dir("./")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
  destfile = "./data_raw.csv.bz2")
}

if(!exists("data_raw")) {
  data_raw  <- read.csv("data_raw.csv.bz2")
}

As also stated on the Coursera page, documentation can be found here:

Quick Inspection of the raw data:

names(data_raw)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We focus on the data that is relevant for this analysis, i.e. the columns EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.

data <- data_raw[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Quick Inspection of the data:

str(data)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "?","ABNORMALLY DRY",..: 830 830 830 830 830 830 830 830 830 830 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...

Damage value preparation

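Property and crop damages are expressed as a combination of xxxDMG and xxxDMGEXP, i.e. a value and an exponent code. The damage in USD is the value multiplied by the magnitude the code stands for, e.g. a PROPDMG of 25 with PROPDMGEXP "K" is 25,000 USD.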

unique(data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

We need to translate the exponent codes into numeric multipliers before use.

Translate function:

# Map a damage exponent code to its numeric multiplier.
translateDMG <- function(f_exp) {
  if (f_exp %in% c("k", "K", "3")) { return(1e+03) }
  if (f_exp %in% c("m", "M", "6")) { return(1e+06) }
  if (f_exp %in% c("b", "B"))      { return(1e+09) }
  if (f_exp %in% c("0", ""))       { return(1e+00) }
  if (f_exp %in% c("1"))           { return(1e+01) }
  # "h"/"H" are taken to mean hundreds (assumption; previously unhandled)
  if (f_exp %in% c("h", "H", "2")) { return(1e+02) }
  if (f_exp %in% c("4"))           { return(1e+04) }
  if (f_exp %in% c("5"))           { return(1e+05) }
  if (f_exp %in% c("7"))           { return(1e+07) }
  if (f_exp %in% c("8"))           { return(1e+08) }
  if (f_exp %in% c("-", "?", "+")) { return(0)     }
  0  # fallback: an unrecognized code contributes no damage
}

Creating the new columns:

data$PROPDMGEXPtransl <- sapply(data$PROPDMGEXP, translateDMG)
data$CROPDMGEXPtransl <- sapply(data$CROPDMGEXP, translateDMG)
data$PROPDMGtransl <- data$PROPDMG * data$PROPDMGEXPtransl
data$CROPDMGtransl <- data$CROPDMG * data$CROPDMGEXPtransl

head(data)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0           
##   PROPDMGEXPtransl CROPDMGEXPtransl PROPDMGtransl CROPDMGtransl
## 1             1000                1         25000             0
## 2             1000                1          2500             0
## 3             1000                1         25000             0
## 4             1000                1          2500             0
## 5             1000                1          2500             0
## 6             1000                1          2500             0
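
Since sapply calls translateDMG once per row (about 900,000 times per column), the same mapping could also be written as a vectorized lookup. The following is only a sketch: the names exp_lookup and translate_vec are hypothetical, and reading "h"/"H" as hundreds is an assumption about the notation.

```r
# Hypothetical vectorized alternative to translateDMG (illustrative names).
exp_lookup <- c(
  "k" = 1e3, "K" = 1e3, "3" = 1e3,
  "m" = 1e6, "M" = 1e6, "6" = 1e6,
  "b" = 1e9, "B" = 1e9,
  "h" = 1e2, "H" = 1e2, "2" = 1e2,   # assumption: h/H mean hundreds
  "0" = 1e0, "1" = 1e1, "4" = 1e4,
  "5" = 1e5, "7" = 1e7, "8" = 1e8
)

translate_vec <- function(f_exp) {
  key  <- as.character(f_exp)        # works for factors and character vectors
  mult <- unname(exp_lookup[key])    # NA for codes that are not in the table
  mult[which(key == "")] <- 1        # empty code: use the value as is
  mult[is.na(mult)] <- 0             # "-", "?", "+" and unknown codes: no damage
  mult
}

translate_vec(c("K", "m", "", "?", "B"))
```

It would be applied to a whole column at once, e.g. data$PROPDMG * translate_vec(data$PROPDMGEXP).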

Event value preparation

As seen above, there are 985 different values for the factor variable EVTYPE. In the following we keep only the 10 event types with the highest totals in each analysis. For a more sophisticated approach it might be useful to merge related labels into subgroups to reduce the number of types. For example, TSTM WIND and THUNDERSTORM WIND are counted as separate event types, and both appear in the health top 10 below.
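
Such a grouping is not part of this analysis, but a rough sketch could look as follows. The helper name group_evtype and the patterns are hypothetical choices for illustration; a real grouping would need to be checked against all 985 labels.

```r
# Hypothetical helper: collapse spelling variants of EVTYPE into coarser groups.
# The patterns below are illustrative assumptions, not an official NOAA mapping.
group_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))   # normalize case and outer spaces
  x <- gsub("\\s+", " ", x)                    # collapse repeated whitespace
  x[grepl("TSTM|THUNDERSTORM", x)] <- "THUNDERSTORM WIND"
  x[grepl("FLOOD", x)]             <- "FLOOD"  # also absorbs FLASH/RIVER FLOOD
  x[grepl("HURRICANE|TYPHOON", x)] <- "HURRICANE/TYPHOON"
  x[grepl("HEAT", x)]              <- "HEAT"
  x
}

group_evtype(c("TSTM WIND", " Flash Flood", "HURRICANE", "HAIL"))
```

The result could be stored in a new column and used in place of EVTYPE in the aggregations. Note that the rules are applied in order, so a label that matches several patterns ends up in the last matching group.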

Results

To answer the question of which types of events are most harmful to population health, we combine the data of injuries and fatalities.

data$FAT_AND_INJ <- data$FATALITIES + data$INJURIES
health <- ddply(data, .(EVTYPE), function(x) data.frame(FAT_AND_INJ = sum(x$FAT_AND_INJ)))
health <- health[with(health, order(-FAT_AND_INJ)), ]
health <- health[1:10, ]
# Fix the factor level order so the bars are plotted in descending order
health$EVTYPE <- factor(health$EVTYPE, levels = health$EVTYPE)
health
##                EVTYPE FAT_AND_INJ
## 830           TORNADO       96979
## 123    EXCESSIVE HEAT        8428
## 854         TSTM WIND        7461
## 164             FLOOD        7259
## 452         LIGHTNING        6046
## 269              HEAT        3037
## 147       FLASH FLOOD        2755
## 424         ICE STORM        2064
## 759 THUNDERSTORM WIND        1621
## 972      WINTER STORM        1527
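
The sum-per-event-type and sort steps can be illustrated on a small made-up data frame. This sketch uses base R's aggregate instead of ddply; the toy values are invented for illustration and are not taken from the NOAA data.

```r
# Toy data (invented values) with the same column names as the analysis.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 0, 2, 3),
  INJURIES   = c(10, 4, 5, 0)
)

# Combine both health measures, then sum them per event type.
toy$FAT_AND_INJ <- toy$FATALITIES + toy$INJURIES
toy_health <- aggregate(FAT_AND_INJ ~ EVTYPE, data = toy, FUN = sum)

# Sort in descending order of the combined count.
toy_health <- toy_health[order(-toy_health$FAT_AND_INJ), ]
toy_health
```

Here TORNADO comes first with 18 (1 + 10 + 2 + 5), followed by FLOOD with 4 and HEAT with 3.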

We show a bar plot of the top 10 in descending order.

p <- ggplot(data = health, aes(x = EVTYPE, y = FAT_AND_INJ))
p <- p + geom_bar(stat = "identity")
p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p <- p + xlab("Event type") 
p <- p + ylab("Fatalities and Injuries") 
p <- p + ggtitle("Top 10 events for number of fatalities and injuries")
p

To answer the question of which types of events are causing the most damage, we combine the available data for crop and property damage in a similar fashion.

data$PROP_AND_CROP <- data$PROPDMGtransl + data$CROPDMGtransl
damage <- ddply(data, .(EVTYPE), function(x) data.frame(PROP_AND_CROP = sum(x$PROP_AND_CROP)))
damage <- damage[with(damage, order(-PROP_AND_CROP)), ]
damage <- damage[1:10, ]
# Fix the factor level order so the bars are plotted in descending order
damage$EVTYPE <- factor(damage$EVTYPE, levels = damage$EVTYPE)
damage
##                EVTYPE PROP_AND_CROP
## 164             FLOOD  150319678257
## 406 HURRICANE/TYPHOON   71913712800
## 830           TORNADO   57362333886
## 666       STORM SURGE   43323541000
## 238              HAIL   18761221566
## 147       FLASH FLOOD   18243991078
## 88            DROUGHT   15018672000
## 397         HURRICANE   14610229010
## 586       RIVER FLOOD   10148404500
## 424         ICE STORM    8967041360

We show a bar plot of the top 10 in descending order.

p <- ggplot(data = damage, aes(x = EVTYPE, y = PROP_AND_CROP))
p <- p + geom_bar(stat = "identity")
p <- p + theme(axis.text.x = element_text(angle = 90, hjust = 1))
p <- p + xlab("Event type") 
p <- p + ylab("Damage in USD (combined property and crop)") 
p <- p + ggtitle("Top 10 events for combined crop and property damage")
p

Conclusion

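Tornados are by far the most harmful events with respect to population health: with 96,979 combined injuries and fatalities they exceed the next event type, excessive heat (8,428), by more than a factor of ten. According to the data, flood events have the greatest economic consequences, with about 150 billion USD in combined property and crop damage.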