Synopsis: Using NOAA's storm database, we look at which weather events were most damaging to the US in terms of health impacts (fatalities and injuries) and economic losses (property and crop damage) from 1996 through 2011. This range of years presents the most complete historical weather event reporting. Individual event types are grouped into categories following the style of the National Weather Service.
From both humanitarian and capitalist points of view, convective severe weather events (tornados, thunderstorms etc.) are very damaging on a consistent year-to-year basis. Tropical cyclones are a greater destroyer of the economy, with a harm pattern dominated by sporadic, highly damaging events in contrast to the steady destruction of convection events.
Data reading
In a somewhat vain effort to make R read in data at what modern humans might consider an acceptable speed, we'll load only the columns that we'll be using. These are the event beginning date, the event type, injuries, fatalities, and economic cost information. (The number of rows in the csv file was determined on the command line.) We'll further reduce the dataset down to those events that incur either public health or economic costs. Furthermore, we'll extract the year from the date information so that we can consider only dates from 1996 onwards; per this site, these are the only years with full recording of the currently used event types.
classes = rep("NULL", 37)
classes[2] = "character"
classes[8] = "factor"
classes[c(23, 24, 25, 27)] <- "numeric"
classes[c(26, 28)] <- "character"
rawData <- read.csv("repdata-data-StormData.csv.bz2", header = TRUE, nrows = 1232705,
colClasses = classes)
damaging <- subset(rawData, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG >
0)
# working copy of intermediate data product
# classes=c('character','factor','numeric','numeric','numeric','character','numeric','character')
# damaging <- read.csv('damagingData.csv', colClasses = classes)
# convert the start date to a date type, extract the year as a number
# restrict to events after 1996 inclusive as described above
damaging$BGN_DATE <- as.Date(damaging$BGN_DATE, "%m/%d/%Y")
damaging$year <- as.numeric(format(damaging$BGN_DATE, "%Y"))
damaging <- damaging[damaging$year > 1995, ]
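The column-selection trick above can be tried on a small scale. What follows is a minimal sketch using a made-up four-column CSV (the column values and the names csvText, toyClasses and toy are illustrative, not taken from the NOAA file). Setting an entry of colClasses to the string "NULL" makes read.csv skip that column, and the year is then pulled out of the date string in the same way as above.

```r
# Toy CSV standing in for the storm data; all values are invented.
csvText <- "STATE,BGN_DATE,EVTYPE,INJURIES
AL,4/18/1950 0:00:00,TORNADO,15
TX,1/2/1996 0:00:00,FLOOD,0
FL,8/24/2005 0:00:00,HURRICANE,3"

# the string 'NULL' tells read.csv to skip a column entirely
toyClasses <- c("NULL", "character", "factor", "numeric")
toy <- read.csv(text = csvText, header = TRUE, colClasses = toyClasses)

# as.Date stops parsing once the format is matched, so the trailing
# time of day is ignored
toy$BGN_DATE <- as.Date(toy$BGN_DATE, "%m/%d/%Y")
toy$year <- as.numeric(format(toy$BGN_DATE, "%Y"))

# keep 1996 onwards, as in the report
toy <- toy[toy$year > 1995, ]
```

In this toy case the STATE column never reaches memory, and only the 1996 and 2005 rows survive the year filter.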
Data massage
First we'll lowercase the raw columns that we'll use for final analysis, then we'll convert the economic costs into dollars. With the restrictions on the data range imposed above, at this point all of the exponent information is one of K, M, or B; there are no strange or missing values, and each non-zero value of *ROPDMG is matched to a standard *ROPDMGEXP.
# make a copy of EVTYPE in lower case
damaging$evtype <- tolower(damaging$EVTYPE)
# lower case some of the column names
colnames(damaging)[3] <- "fatalities"
colnames(damaging)[4] <- "injuries"
# create base orders of magnitude for property and crop damage
damaging$propCost <- rep(0, nrow(damaging))
damaging$cropCost <- rep(0, nrow(damaging))
damaging$propCost[damaging$PROPDMGEXP == "K"] <- 1000
damaging$propCost[damaging$PROPDMGEXP == "M"] <- 1e+06
damaging$propCost[damaging$PROPDMGEXP == "B"] <- 1e+09
damaging$cropCost[damaging$CROPDMGEXP == "K"] <- 1000
damaging$cropCost[damaging$CROPDMGEXP == "M"] <- 1e+06
damaging$cropCost[damaging$CROPDMGEXP == "B"] <- 1e+09
# multiply the orders of magnitude by the coefficients. At this point every
# non-zero *ROPDMG is matched to one of 'K', 'M', or 'B' in its corresponding
# *ROPDMGEXP, so there is no need to impute any data for strange or missing
# values of the exponents.
damaging$cropCost <- damaging$cropCost * damaging$CROPDMG
damaging$propCost <- damaging$propCost * damaging$PROPDMG
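The same exponent conversion can also be written as a lookup in a named vector. This is only a sketch of an alternative, not the approach used in this report; the toy data frame and the names multiplier and mult are assumptions made for illustration. It relies on the exponent column being character rather than factor, as it is with the colClasses used above.

```r
# Hypothetical toy records using the same K/M/B codes as the real data
toy <- data.frame(PROPDMG = c(2.5, 10, 1.2, 0),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# named vector mapping each exponent code to its multiplier
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# look up the multiplier by name; codes that are not in the table
# (here the empty string) come back as NA and are set to 0
mult <- unname(multiplier[toy$PROPDMGEXP])
mult[is.na(mult)] <- 0

# cost in dollars
toy$propCost <- toy$PROPDMG * mult
```

Adding another exponent code would then mean adding one entry to multiplier rather than another assignment line.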
There's quite a bit of fragmentation in the event types. For example, there are several categories of thunderstorms and their winds. Following the lead of the National Weather Service (e.g. this report), let's group the event types into 7 categories: convection, extreme temperatures, flood, maritime, tropical cyclones, winter, and other.
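To trim the multitude of reported event types to a manageable number, we'll restrict the event types to those that result in a total, from 1996 onwards, of at least one of: more than 10 fatalities, more than 10 injuries, more than 1 million dollars of property damage, or more than 1 million dollars of crop damage.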
This subset of event types is then manually categorized to create a new factor 'evgroup':
crop <- tapply(damaging$cropCost, damaging$evtype, sum)
cropNames <- names(crop[!is.na(crop) & crop > 1e+06])
prop <- tapply(damaging$propCost, damaging$evtype, sum)
propNames <- names(prop[!is.na(prop) & prop > 1e+06])
inj <- tapply(damaging$injuries, damaging$evtype, sum)
injNames <- names(inj[!is.na(inj) & inj > 10])
fat <- tapply(damaging$fatalities, damaging$evtype, sum)
fatNames <- names(fat[!is.na(fat) & fat > 10])
allNames <- union(union(fatNames, injNames), union(cropNames, propNames))
weatherGroups <- vector("list", 7)
names(weatherGroups) <- c("convection", "extremeTemperatures", "flood", "maritime",
"tropicalCyclones", "winter", "other")
weatherGroups$convection <- c("tornado", "high wind", "gusty winds", "lightning",
"thunderstorm wind", "wind", "hail", "tstm wind/hail", "dry microburst",
"tstm wind", "strong wind", "strong winds", "small hail")
weatherGroups$extremeTemperatures <- c("excessive heat", "extreme cold", "cold and snow",
"unseasonably warm", "freeze", "damaging freeze", "unseasonably cold", "hard freeze",
"agricultural freeze", "early frost", "frost/freeze", "extreme windchill",
"cold", "cold/wind chill", "extreme cold/wind chill", "heat", "heat wave")
weatherGroups$flood <- c("urban/sml stream fld", "river flooding", "river flood",
"erosion/cstl flood", "coastal flooding/erosion", "coastal flooding/erosion",
"coastal flooding", "coastal flood", "flood", "flash flood")
weatherGroups$maritime <- c("rip current", "rip currents", "high surf", "heavy surf/high surf",
"tsunami", "marine strong wind", "marine tstm wind", "heavy rain/high surf",
"astronomical high tide", "waterspout", "marine thunderstorm wind", "storm surge",
"heavy surf", "storm surge/tide")
weatherGroups$tropicalCyclones <- c("hurricane/typhoon", "tropical storm", "typhoon",
"hurricane")
weatherGroups$winter <- c("blizzard", "winter weather", "wintry mix", "excessive snow",
"avalanche", "heavy snow", "ice storm", "winter weather/mix", "snow squall",
"black ice", "freezing drizzle", "lake-effect snow", "winter storm", "winter weather mix",
"mixed precip", "icy roads", "snow", "light snow", "glaze")
weatherGroups$other <- c("heavy rain", "wild/forest fire", "wildfire", "dust storm",
"dust devil", "fog", "drought", "landslide", "dense fog", "other")
damaging$evgroup[damaging$evtype %in% weatherGroups$convection] <- 0
damaging$evgroup[damaging$evtype %in% weatherGroups$extremeTemperatures] <- 1
damaging$evgroup[damaging$evtype %in% weatherGroups$flood] <- 2
damaging$evgroup[damaging$evtype %in% weatherGroups$maritime] <- 3
damaging$evgroup[damaging$evtype %in% weatherGroups$tropicalCyclones] <- 4
damaging$evgroup[damaging$evtype %in% weatherGroups$winter] <- 5
damaging$evgroup[damaging$evtype %in% weatherGroups$other] <- 6
damaging$evgroup <- factor(damaging$evgroup, levels = c(0, 1, 2, 3, 4, 5, 6),
labels = c("convection", "extreme temperatures", "flood", "maritime", "tropical cyclones",
"winter", "other"))
Now, convert the economic costs to billions of dollars and create a total economic cost for each event, and create aggregate totals of injuries, fatalities, and costs broken down by event group and year.
# convert costs to billions of dollars and create total cost
damaging$cropCost <- damaging$cropCost/1e+09
damaging$propCost <- damaging$propCost/1e+09
damaging["totalCost"] <- damaging$cropCost + damaging$propCost
yearlyTotals <- aggregate(cbind(injuries, fatalities, propCost, cropCost, totalCost) ~
year + evgroup, sum, data = damaging)
totals <- aggregate(cbind(injuries, fatalities, propCost, cropCost, totalCost) ~
evgroup, sum, data = damaging)
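The grouping and aggregation steps above can be sketched end to end on toy data. This is an illustrative alternative to the chain of %in% assignments, not the code used to produce the results here; toyGroups, lookup, toy and toyTotals are made-up names, and the event list is deliberately tiny.

```r
# Toy version of the group list; only a few event types are shown
toyGroups <- list(convection = c("tornado", "hail"),
                  flood = c("flood", "flash flood"))

# stack() turns the named list into a lookup table with
# columns 'values' (event type) and 'ind' (group, as a factor)
lookup <- stack(toyGroups)

# invented events; 'volcanic ash' belongs to no group
toy <- data.frame(evtype = c("hail", "flash flood", "volcanic ash", "tornado"),
                  injuries = c(2, 5, 1, 10),
                  stringsAsFactors = FALSE)

# match() gives NA for unlisted types, so those rows get an NA group
toy$evgroup <- lookup$ind[match(toy$evtype, lookup$values)]

# the formula interface of aggregate() omits rows containing NA by default
toyTotals <- aggregate(injuries ~ evgroup, data = toy, FUN = sum)
```

One thing the sketch makes visible: an event type that is not assigned to any group ends up with an NA group and silently drops out of the aggregated totals.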
First we'll simply report the total damage across all years (1996-2011 inclusive), in decreasing order of total economic cost:
require(xtable)
## Loading required package: xtable
# kable(totals)
# order by total cost (column 6), largest first
totals2 <- totals[order(totals$totalCost, decreasing = TRUE), ]
colnames(totals2) <- c("event group", "injuries", "fatalities", "property damage (G$)",
"crop damage (G$)", "total damage (G$)")
print(xtable(totals2), type = "html", include.rownames = FALSE)
event group | injuries | fatalities | property damage (G$) | crop damage (G$) | total damage (G$) |
---|---|---|---|---|---|
flood | 8519.00 | 1337.00 | 159.76 | 6.35 | 166.11 |
tropical cyclones | 1664.00 | 182.00 | 89.36 | 6.03 | 95.39 |
convection | 32157.00 | 2914.00 | 53.29 | 4.50 | 57.79 |
maritime | 970.00 | 761.00 | 48.11 | 0.00 | 48.11 |
other | 3016.00 | 300.00 | 9.74 | 14.52 | 24.26 |
winter | 3757.00 | 750.00 | 6.41 | 0.12 | 6.53 |
extreme temperatures | 7832.00 | 2418.00 | 0.06 | 3.21 | 3.27 |
Table 1 Total fatality, injury, and economic damage due to the severe weather categories for the years 1996-2011.
From a population health standpoint, convection events are the most destructive category. Note that winter storms and extreme temperature events tend to be much more damaging to humans than to crops and property.
Economically, at face value flooding is the most destructive category, with tropical cyclones a distant second place. However, I have my doubts about one particular critical datapoint.
We should look at the year-to-year variation in the destruction:
require(lattice)
## Loading required package: lattice
xyplot((injuries + fatalities) ~ year | evgroup, data = yearlyTotals, panel = function(x,
y, ...) {
panel.xyplot(x, y, ...)
panel.abline(h = median(y), col.line = "skyblue3", lwd = 2)
panel.abline(h = 500, col.line = "grey")
})
Figure 1 Yearly public health incidents from 1996 onwards by event group. Blue circles: yearly total; blue line: median value; gray line: 500 public health events.
The gray lines show 500 injuries plus fatalities per year. Convective events are consistently the most harmful weather phenomenon, i.e. the total over the time period considered is not driven by a single bad year. Note that this is not the case for some other types of event, such as floods and extreme temperatures, where the distribution is bimodal with particularly bad years periodically punctuating a lower level of destruction.
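One rough way to put a number on "consistent" versus "driven by a few bad years" is to compare each group's worst year with its median year. The sketch below uses invented yearly totals and made-up names (toyYearly, harmed, ratio); it is not computed from the storm data.

```r
# Invented yearly totals for two groups: one steady, one spiky
toyYearly <- data.frame(
  evgroup = rep(c("steady", "spiky"), each = 4),
  harmed  = c(900, 1000, 1100, 1000,  50, 60, 2000, 40))

# median and worst year per group
med   <- tapply(toyYearly$harmed, toyYearly$evgroup, median)
worst <- tapply(toyYearly$harmed, toyYearly$evgroup, max)

# a ratio near 1 means no single year dominates the group's total;
# a large ratio points to sporadic, very bad years
ratio <- worst / med
```

Applied to yearlyTotals, a summary of this kind would be expected to give a small ratio for convection and larger ones for floods and extreme temperatures, in line with the figure.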
require(lattice)
xyplot(totalCost ~ year | evgroup, data = yearlyTotals, panel = function(x,
y, ...) {
panel.xyplot(x, y, ...)
panel.abline(h = median(y), col.line = "skyblue3", lwd = 2)
panel.abline(h = 0.25, col.line = "grey")
}, ylab = "total economic cost / gigadollars")
Figure 2 Yearly economic cost, in billions of dollars, from 1996 onwards by event group. Blue circles: yearly total; blue line: median value; gray line: 250 million dollars.
Here there is an obvious outlying point for floods, the year 2006 with about 120 billion dollars of damage. Further investigation reveals that this is driven almost entirely by one event, flooding over the new year of 2005-2006 in Napa Valley, CA. Other estimates of this flood suggest damages of 300 million dollars, rather than the (frankly unbelievable) 115 billion in the dataset. Assume for the moment that rather than “B”, an “M” should have been entered in the dataset for this event. We get a revised table of damages:
require(xtable)
# kable(totals)
totals3 <- totals
# subtract off that huge event, put back in as millions instead of billions;
# note that only the total cost column is adjusted here
floodRow <- which(totals3$evgroup == "flood")
totals3[floodRow, "totalCost"] <- totals3[floodRow, "totalCost"] - 115 + 0.115
totals3 <- totals3[order(totals3$totalCost, decreasing = TRUE), ]
colnames(totals3) <- c("event group", "injuries", "fatalities", "property damage (G$)",
"crop damage (G$)", "total damage (G$)")
print(xtable(totals3), type = "html", include.rownames = FALSE)
event group | injuries | fatalities | property damage (G$) | crop damage (G$) | total damage (G$) |
---|---|---|---|---|---|
tropical cyclones | 1664.00 | 182.00 | 89.36 | 6.03 | 95.39 |
convection | 32157.00 | 2914.00 | 53.29 | 4.50 | 57.79 |
flood | 8519.00 | 1337.00 | 159.76 | 6.35 | 51.22 |
maritime | 970.00 | 761.00 | 48.11 | 0.00 | 48.11 |
other | 3016.00 | 300.00 | 9.74 | 14.52 | 24.26 |
winter | 3757.00 | 750.00 | 6.41 | 0.12 | 6.53 |
extreme temperatures | 7832.00 | 2418.00 | 0.06 | 3.21 | 3.27 |
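Table 2 Revised total fatality, injury, and economic damage due to the severe weather categories for the years 1996-2011. Only the total damage column has been adjusted; the flood property damage figure still includes the original entry.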
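The check that a single event dominates a group's yearly total, and the billions-to-millions correction, can be sketched as follows. The events below are invented (only the 115 figure echoes the value quoted above), and toyEvents, floods2006, biggest, share and corrected are hypothetical names.

```r
# Invented events; costs are in billions of dollars as in the report
toyEvents <- data.frame(
  evgroup   = c("flood", "flood", "flood", "convection"),
  year      = c(2006, 2006, 2005, 2006),
  totalCost = c(115, 0.3, 1.2, 2.0))

# pick out the flood events of the suspicious year
floods2006 <- subset(toyEvents, evgroup == "flood" & year == 2006)

# the single most expensive event and its share of that year's flood total
biggest <- floods2006[which.max(floods2006$totalCost), ]
share   <- biggest$totalCost / sum(floods2006$totalCost)

# re-entering the event as millions instead of billions is a factor of 1000
corrected <- biggest$totalCost / 1000
```

A share close to 1 is the signal that one record, rather than a generally bad year, is responsible for the outlying point.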
This seems more believable, but calls into question the integrity of the whole dataset. Nevertheless, we base the conclusions off of this table: tropical cyclones are the most destructive event type in terms of property and crop damage. Note, however, that from a consistent year-to-year standpoint convection events are the largest destroyer of human possessions; focussing on convection event mitigation would have a large impact on the average health of both the population and the economy.
Summary: convective events, whether one is operating out of humanitarian concern or cold, naked capitalistic interest, are bad news. Economically, tropical cyclones are a more significant source of harm, driven by sporadic highly destructive events rather than a sustained year-to-year level of destruction.