Synopsis

In this report we analyze the human and economic costs of natural disasters in the US. We aim to illustrate that certain disasters carry heavier social costs and that, from this perspective, public resources might be best targeted toward mitigating the costs of these disasters. We leverage NOAA data and perform our own mappings of storm types to facilitate summary analysis. We measure human costs using data on storm-related fatalities and injuries, and economic costs using NOAA-calculated figures on property and crop damages. We conclude from these data that extreme temperatures generate the greatest average human cost, while hurricanes generate the greatest average economic cost and the second greatest human cost. The data further illustrate that tropical storms, which manifest in a manner similar to hurricanes, are among the next highest in terms of human and economic costs and could benefit from policies or resources that aim to mitigate the social costs of hurricanes.

Data Processing

We leverage Storm Data collected by NOAA to document natural disasters and their consequences.

Reading in the data

if(!require(plyr)) {install.packages("plyr"); library(plyr)}
## Loading required package: plyr
if(!require(dplyr)) {install.packages("dplyr"); library(dplyr)}
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:plyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
if(!require(knitr)) {install.packages("knitr"); library(knitr)}
## Loading required package: knitr
# download NOAA data if not already in working directory
if (!file.exists("./StormData.csv.bz2")) {
      src <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
      download.file(url = src, destfile = "./StormData.csv.bz2")
}
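# bunzip2() comes from the R.utils package; remove = FALSE keeps the archive
if(!require(R.utils)) {install.packages("R.utils"); library(R.utils)}
if(!file.exists("./StormData.csv")) { bunzip2("./StormData.csv.bz2", remove = FALSE) }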

data <- read.csv("./StormData.csv")
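As an alternative to decompressing the archive first, base R can usually read a bzip2-compressed CSV directly, because read.csv() opens the path through file(), which detects the compression. The following is a minimal sketch on a made-up two-row file, not the NOAA data; the object names toy, tmp and toy_back are illustrative only.

```r
# Toy data standing in for the NOAA file (values are made up for illustration)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))

# Write a bzip2-compressed CSV to a temporary path
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads the compressed file without a separate decompression step
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

Applied to this report, the same idea would mean passing "./StormData.csv.bz2" straight to read.csv(), at the cost of decompressing on every read.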

Cleaning the Data

Much of the data is not relevant for our analysis. Since we are interested in analyzing human and economic damages, we will first subset the data to only include records where fatalities, injuries, property damage or crop damage have occurred.

subs <- filter(data, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

# also select back the exponent variables to correctly scale damages
cdata <- select(subs, one_of("EVTYPE","FATALITIES","INJURIES","PROPDMG",
                             "PROPDMGEXP","CROPDMG","CROPDMGEXP"))

Next, we need to create new property and crop damage fields with a common denomination (i.e., $000’s) based on the corresponding exponent values.

We’ll first review the property damage exponent field to see how this field should be interpreted.

summarise(group_by(cdata, PROPDMGEXP), SumPropDmg = sum(PROPDMG), 
          Count = sum(!is.na(PROPDMG)), Avg = mean(PROPDMG))
## Source: local data frame [16 x 4]
## 
##    PROPDMGEXP  SumPropDmg  Count         Avg
## 1                  527.41  11585  0.04552525
## 2           -       15.00      1 15.00000000
## 3           +      117.00      5 23.40000000
## 4           0     7108.30    210 33.84904762
## 5           2       12.00      1 12.00000000
## 6           3       20.00      1 20.00000000
## 7           4       14.50      4  3.62500000
## 8           5      210.50     18 11.69444444
## 9           6       65.00      3 21.66666667
## 10          7       82.00      3 27.33333333
## 11          B      275.85     40  6.89625000
## 12          h        2.00      1  2.00000000
## 13          H       25.00      6  4.16666667
## 14          K 10735292.10 231428 46.38717917
## 15          m       38.90      7  5.55714286
## 16          M   140694.45  11320 12.42883834

A significant number (11,585) of records do not have any value to indicate the exponent. Since the average damage amount for these records is extremely low, we will assume that no exponent should be applied. Six other records have an exponent value of “-” or “+”. Since there are so few instances of these exponent values, we will also assume no transformation is necessary. Numeric exponent values are applied directly as powers of ten, and the letters H, K, M and B are read as hundreds, thousands, millions and billions.

We create a new field representing the property damage value, denominated in $000’s:

# convert from factor to character to enable simple differentiation between
# numeric & non-numeric
cdata$PROPDMGEXP <- as.character(cdata$PROPDMGEXP)

# define new exponent value to apply for transformation
cdata$StdPropExp <- as.numeric(cdata$PROPDMGEXP)
## Warning: NAs introduced by coercion
# resolve NAs for non-numeric values
cdata[cdata$PROPDMGEXP == "H" | cdata$PROPDMGEXP == "h",]$StdPropExp <- 2
cdata[cdata$PROPDMGEXP == "K" | cdata$PROPDMGEXP == "k",]$StdPropExp <- 3
cdata[cdata$PROPDMGEXP == "M" | cdata$PROPDMGEXP == "m",]$StdPropExp <- 6
cdata[cdata$PROPDMGEXP == "B" | cdata$PROPDMGEXP == "b",]$StdPropExp <- 9
# set all others to 0
cdata[is.na(cdata$StdPropExp),]$StdPropExp <- 0

# create new field standardizing value in thousands
cdata <- mutate(cdata, StdPropDmg = PROPDMG * 10 ^ StdPropExp / 1000)
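The exponent handling above can also be expressed as a single lookup. The sketch below is one possible formulation under the same assumptions stated in the text (letters are case-insensitive, digits are powers of ten, everything else means no scaling); exp_to_power is a hypothetical helper name, not part of the original analysis.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to the power of ten applied to the raw damage figure.
exp_to_power <- function(exp_code) {
  code <- toupper(as.character(exp_code))
  # Letter codes: hundreds, thousands, millions, billions
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[code])
  # Single digits are used directly as the power of ten
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  # Blank, "+", "-", "?" and anything else: assume no scaling
  power[is.na(power)] <- 0
  power
}

codes <- c("K", "m", "", "5", "+", "B", "h")
print(exp_to_power(codes))

# Damage in thousands of dollars: 25 with exponent "K" is $25,000, i.e. 25
print(25 * 10^exp_to_power("K") / 1000)
```

Keeping the mapping in one function means the property and crop fields can share it, rather than repeating the assignments for each field.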

Rinse, and repeat for the crop damage exponent field:

summarise(group_by(cdata, CROPDMGEXP), SumCropDmg = sum(CROPDMG), 
          Count = sum(!is.na(CROPDMG)), Avg = mean(CROPDMG))
## Source: local data frame [8 x 4]
## 
##   CROPDMGEXP SumCropDmg  Count          Avg
## 1                 11.00 152664 7.205366e-05
## 2          ?       0.00      6 0.000000e+00
## 3          0     260.00     17 1.529412e+01
## 4          B      13.61      7 1.944286e+00
## 5          k     436.00     21 2.076190e+01
## 6          K 1342955.91  99932 1.343870e+01
## 7          m      10.00      1 1.000000e+01
## 8          M   34140.80   1985 1.719940e+01

Here there appears to be more standardization, although we still find 152,664 records with no exponent value, and 6 with a value of “?”. Again, we will assume these values are in single-dollar denominations (i.e., an exponent of 0).

We create a new field representing the crop damage value, denominated in $000’s:

# convert from factor to character to enable simple differentiation between
# numeric & non-numeric
cdata$CROPDMGEXP <- as.character(cdata$CROPDMGEXP)

# define new exponent value to apply for transformation
cdata$StdCropExp <- as.numeric(cdata$CROPDMGEXP)
## Warning: NAs introduced by coercion
# resolve NAs for non-numeric values
cdata[cdata$CROPDMGEXP == "K" | cdata$CROPDMGEXP == "k",]$StdCropExp <- 3
cdata[cdata$CROPDMGEXP == "M" | cdata$CROPDMGEXP == "m",]$StdCropExp <- 6
cdata[cdata$CROPDMGEXP == "B" | cdata$CROPDMGEXP == "b",]$StdCropExp <- 9
# set all others to 0
cdata[is.na(cdata$StdCropExp),]$StdCropExp <- 0

# create new field standardizing value in thousands
cdata <- mutate(cdata, StdCropDmg = CROPDMG * 10 ^ StdCropExp / 1000)

Now that we’ve standardized all of the damage amounts, we can create a variable containing the sum of both:

cdata <- mutate(cdata, TotalDamage = StdPropDmg + StdCropDmg)
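As a worked example of the scaling and the sum, consider a single made-up record (the figures are illustrative, not taken from the NOAA data) with PROPDMG = 2.5 and exponent "M", and CROPDMG = 500 and exponent "K":

```r
# Made-up record for illustration only
prop_thousands <- 2.5 * 10^6 / 1000   # $2.5 million of property damage, in $000's
crop_thousands <- 500 * 10^3 / 1000   # $500 thousand of crop damage, in $000's

# Both fields share a denomination, so they can be added directly
total_thousands <- prop_thousands + crop_thousands
print(c(prop_thousands, crop_thousands, total_thousands))
```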

Our second and final transformation aims to logically standardize the storm type variable (i.e., EVTYPE) to support meaningful analysis of storm severity.

numtypes <- length(unique(cdata$EVTYPE))

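There are a total of 488 distinct EVTYPE values. We will standardize the values into the following 9 categories: snow, hurricane, other tropical storm, thunderstorm, hail, extreme temperatures (cold and heat), flooding (including tsunamis and other coastal surge/flooding), tornados and other wind, and all other. Categories are assigned in this order, so an event type that matches several patterns is placed in the first category it matches.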

#1) snow & winter storms
snow <- grepl("snow|blizzard|ic[ey]|glaze|sleet|freezing [rd]|wint|mix", 
              cdata$EVTYPE, ignore.case = TRUE)

#2) hurricanes & typhoons
hurricane <- grepl("hurricane|typhoon", cdata$EVTYPE, ignore.case = TRUE) &
      !snow

#3) other tropical (storm or depression)
tropical <- grepl("tropical", cdata$EVTYPE, ignore.case = TRUE) &
      !snow & !hurricane

#4) thunderstorms & other heavy rain
tstorm <- grepl("tstm|thunderstorm|rain|shower|downburst|lightning|precip", 
                cdata$EVTYPE, ignore.case = TRUE) &
      !snow & !hurricane & !tropical

#5) hail
hail <- grepl("hail", cdata$EVTYPE, ignore.case = TRUE) &
      !snow & !hurricane & !tropical & !tstorm

#6) extreme temperatures
cold <- grepl("cold|chill|freeze|hypotherm|frost|low temp|cool|exposure|heat|warm", 
              cdata$EVTYPE, ignore.case = TRUE) &
      !snow & !hurricane & !tropical & !tstorm & !hail

#7) tsunami, flooding & coastal surge
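# paste0() avoids the separator that paste() inserts; the space before "seas"
# is kept explicitly so the pattern matches exactly as before
flood <- grepl(paste0("flood|coast|current|tsunami|seiche|",
                      " seas|wave|surf|tide|high water|swells|surge"), 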
              cdata$EVTYPE, ignore.case = TRUE) &
      !snow & !hurricane & !tropical & !tstorm & !hail & !cold

#8) tornado & other wind
tornado <- grepl("tornado|wind|gustnado|dust|torndao", 
                 cdata$EVTYPE, ignore.case = TRUE) &
      !snow & !hurricane & !tropical & !tstorm & !hail & !cold & !flood

#first add in the "all other" bucket
cdata$StormType <- "Other"

#now input all remaining types based on the logical vectors we created above
cdata[snow,]$StormType <- "Snow"
cdata[hurricane,]$StormType <- "Hurricane"
cdata[tropical,]$StormType <- "Other Tropical"
cdata[tstorm,]$StormType <- "Thunderstorm"
cdata[hail,]$StormType <- "Hail"
cdata[cold,]$StormType <- "Extreme Temp"
cdata[flood,]$StormType <- "Flood and Coastal"
cdata[tornado,]$StormType <- "Tornado and Other Wind"


#review the total allocations:
tally(group_by(cdata,StormType))
## Source: local data frame [9 x 2]
## 
##                StormType      n
## 1           Extreme Temp   1645
## 2      Flood and Coastal  33601
## 3                   Hail  26166
## 4              Hurricane    233
## 5                  Other   3103
## 6         Other Tropical    456
## 7                   Snow   5117
## 8           Thunderstorm 134282
## 9 Tornado and Other Wind  50030

Most of the storms fall into our classification of thunderstorm (including all other non-freezing rain events), which seems intuitively reasonable given the relatively higher frequency of less severe thunderstorms.
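The chain of exclusions used to build the categories above amounts to a "first match wins" rule over an ordered list of patterns. A minimal sketch of that rule follows, assuming the same patterns and order as in the code above; classify_storm is a hypothetical helper name, and the sample event types are chosen only for illustration.

```r
# Hypothetical helper: ordered label -> pattern pairs, first match wins
classify_storm <- function(evtype) {
  patterns <- c(
    "Snow"                   = "snow|blizzard|ic[ey]|glaze|sleet|freezing [rd]|wint|mix",
    "Hurricane"              = "hurricane|typhoon",
    "Other Tropical"         = "tropical",
    "Thunderstorm"           = "tstm|thunderstorm|rain|shower|downburst|lightning|precip",
    "Hail"                   = "hail",
    "Extreme Temp"           = "cold|chill|freeze|hypotherm|frost|low temp|cool|exposure|heat|warm",
    "Flood and Coastal"      = "flood|coast|current|tsunami|seiche| seas|wave|surf|tide|high water|swells|surge",
    "Tornado and Other Wind" = "tornado|wind|gustnado|dust|torndao"
  )
  out <- rep("Other", length(evtype))
  # Apply patterns from last to first so earlier categories overwrite later ones
  for (label in rev(names(patterns))) {
    hit <- grepl(patterns[[label]], evtype, ignore.case = TRUE)
    out[hit] <- label
  }
  out
}

sample_types <- c("TSTM WIND", "HURRICANE OPAL/HIGH WINDS", "WINTER STORM", "DROUGHT")
print(classify_storm(sample_types))
```

"TSTM WIND" shows why the order matters: it matches both the thunderstorm and the wind patterns, and lands in Thunderstorm because that category comes first.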

Results

We’ll first review summary data on fatalities, injuries and damages across our storm type categories. Fatalities and injuries are reported separately, whereas property and crop damages have been combined, on the assumption that the social impact of a fatality is substantially higher than that of an injury and therefore warrants differentiation. As indicated in the data processing section, the damage figures are displayed in thousands of dollars ($000’s).

summary.stats <- summarise(group_by(cdata, StormType), 
                           Total.Events = sum(!is.na(StormType)),
                           Avg.Fatalities = round(mean(FATALITIES),2), 
                           Max.Fatalities = max(FATALITIES),
                           Avg.Injuries = round(mean(INJURIES),2), 
                           Max.Injuries = max(INJURIES),
                           Avg.Damages = round(mean(TotalDamage),2),
                           Max.Damages = max(TotalDamage))

# print summary with unconstrained number of columns
print(summary.stats, width = Inf)
## Source: local data frame [9 x 8]
## 
##                StormType Total.Events Avg.Fatalities Max.Fatalities
## 1           Extreme Temp         1645           2.22            583
## 2      Flood and Coastal        33601           0.07             32
## 3                   Hail        26166           0.00              4
## 4              Hurricane          233           0.58             15
## 5                  Other         3103           0.16             14
## 6         Other Tropical          456           0.14             22
## 7                   Snow         5117           0.13             14
## 8           Thunderstorm       134282           0.01             25
## 9 Tornado and Other Wind        50030           0.12            158
##   Avg.Injuries Max.Injuries Avg.Damages Max.Damages
## 1         5.82          519     2812.26      596000
## 2         0.29          800     6805.47   115032500
## 3         0.05          109      727.07     1800000
## 4         5.72          780   390010.85    16930000
## 5         0.99          150     7903.02     1500000
## 6         0.84          200    18445.23     5150000
## 7         1.25         1568     3472.83     5000500
## 8         0.11           70      142.83     2500000
## 9         1.87         1700     1284.25     2800000

We can see here that the extreme temperature category has by far the highest average and maximum per-event fatalities (2.22 and 583), and the highest average injuries (5.82), narrowly ahead of hurricanes and typhoons (5.72). Hurricanes also rank second in average fatalities (0.58). Snow and tornado and other wind disasters, on the other hand, have caused the most extreme single-event injury counts (1,568 and 1,700). Hurricanes have by far the highest average damages per event, while flooding and coastal disasters have the highest single-occurrence damages. These data suggest that deploying resources toward mitigating loss of life from extreme temperatures and hurricanes could have the greatest potential social benefit.
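The rankings described here can be checked by sorting the per-event averages. The sketch below copies the averages from the summary table above into a small data frame; top_by is a hypothetical helper name introduced only for this illustration.

```r
# Per-event averages copied from the summary table above
stats <- data.frame(
  StormType = c("Extreme Temp", "Flood and Coastal", "Hail", "Hurricane",
                "Other", "Other Tropical", "Snow", "Thunderstorm",
                "Tornado and Other Wind"),
  Avg.Fatalities = c(2.22, 0.07, 0.00, 0.58, 0.16, 0.14, 0.13, 0.01, 0.12),
  Avg.Injuries   = c(5.82, 0.29, 0.05, 5.72, 0.99, 0.84, 1.25, 0.11, 1.87),
  Avg.Damages    = c(2812.26, 6805.47, 727.07, 390010.85, 7903.02,
                     18445.23, 3472.83, 142.83, 1284.25),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n storm types with the largest value in a column
top_by <- function(df, column, n = 3) {
  df$StormType[order(df[[column]], decreasing = TRUE)][seq_len(n)]
}

print(top_by(stats, "Avg.Fatalities"))
print(top_by(stats, "Avg.Injuries"))
print(top_by(stats, "Avg.Damages"))
```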

To gain a more nuanced view of the relative human and economic costs of disasters, we will exclude hurricanes and extreme temperature and plot average fatalities versus average damages and average injuries versus average damages by the remaining types.

storms <- as.factor(summary.stats[summary.stats$StormType != "Extreme Temp" & 
                         summary.stats$StormType != "Hurricane", ]$StormType)

par(mfrow = c(1, 2), mar = c(4, 5, 2, 1))
with(summary.stats[summary.stats$StormType != "Extreme Temp" & 
                         summary.stats$StormType != "Hurricane", ],
      plot(Avg.Fatalities, Avg.Damages, col = as.factor(StormType), pch = 19,
           xlab = "Average Fatalities per Event", 
           ylab = "Average Damages per Event"))
legend("topleft",legend = storms, pch = 19, 
       col = storms, cex = .5)


with(summary.stats[summary.stats$StormType != "Extreme Temp" & 
                         summary.stats$StormType != "Hurricane", ],
plot(Avg.Injuries, Avg.Damages, col = as.factor(StormType), pch = 19,
     xlab = "Average Injuries per Event", ylab = "Average Damages per Event"))

The plots above make clear that tropical storms and depressions also carry both high human and high economic costs: of the remaining types, they have the highest average damages and the second-highest average fatalities per event. These storms are of a similar nature to hurricanes, but of a lesser magnitude, which suggests that policies geared toward mitigating the loss of life, injury and economic damages of hurricanes might reap the greatest degree of social benefit.
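The plotting code above repeats the same row filter three times and relies on two separately built factors sharing the same levels. One way to make the link between point colors and legend entries explicit is to subset once and derive both from a single factor. This is a sketch only, using the averages from the summary table with hurricanes and extreme temperatures excluded; plot.data, type.factor and point.cols are illustrative names.

```r
# Averages copied from the summary table above, with hurricanes and
# extreme temperatures excluded
plot.data <- data.frame(
  StormType = c("Flood and Coastal", "Hail", "Other", "Other Tropical",
                "Snow", "Thunderstorm", "Tornado and Other Wind"),
  Avg.Fatalities = c(0.07, 0.00, 0.16, 0.14, 0.13, 0.01, 0.12),
  Avg.Damages    = c(6805.47, 727.07, 7903.02, 18445.23, 3472.83,
                     142.83, 1284.25),
  stringsAsFactors = FALSE
)

# One factor drives both the point colors and the legend
type.factor <- factor(plot.data$StormType)
point.cols <- as.integer(type.factor)

plot(plot.data$Avg.Fatalities, plot.data$Avg.Damages,
     col = point.cols, pch = 19,
     xlab = "Average Fatalities per Event",
     ylab = "Average Damages per Event")
legend("topleft", legend = levels(type.factor), pch = 19,
       col = seq_along(levels(type.factor)), cex = 0.5)
```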