Economic and Population Health Effects of Severe Storms

(United States, 1975-2011)

This document examines NOAA data on Severe Weather Events and attempts to ascertain what kinds of severe weather cause the most economic and population health damage. The data is fairly messy, so a decent amount of pre-processing and cleaning must be done before we can gain much insight from the data. This cleaning is detailed below, then the data is split into an sdh data set containing the data relevant to population health, and an sdp data set containing the data relevant to economic damage. Then, in the Results section, we examine total damage by event type for each data set.

Loading The Data

The raw data and documentation can be found and downloaded and unzipped by running the following code. NOTE: you will need the R.utils package, which can be installed through CRAN.

download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2")
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf", "StormDocumentation.pdf")
library(R.utils)
bunzip2("StormData.csv.bz2")

Then the data can be loaded in R as follows.

stormData <- read.csv("StormData.csv")

Processing The Data

Date

The BGN_DATE column is a factor variable. We will convert it to an object of class “Date” and put the data in chronological order.

stormData$BGN_DATE <- as.character(stormData$BGN_DATE)
stormData$BGN_DATE <- as.Date(stormData$BGN_DATE, "%m/%d/%Y")
stormData <- stormData[order(stormData$BGN_DATE),]

The following is a histogram which shows the number of severe weather events documented in each calendar year. We are looking at the first 300,000 entries, which takes us from January 1950 through July 1997.

From this we can see that the number of events reported in this data set rises sharply beginning in the mid 1970’s. For this reason, we will subset the data to include only those reports from 1975 and later. This will give us a more accurate picture because it will filter out data from the earlier years when reporting was much more scattered and therefore cannot be counted on to be fully representitive.

sd <- stormData[54960:dim(stormData)[1],]
head(sd[,1:5])

##        STATE__   BGN_DATE BGN_TIME TIME_ZONE COUNTY
## 171797      49 1975-01-04     1030       CST     45
## 171798      49 1975-01-04     1040       CST     35
## 171799      49 1975-01-04     1050       CST     11
## 61785       22 1975-01-07     1532       CST    113
## 61786       22 1975-01-07     1930       CST     55
## 155753      48 1975-01-07     0958       CST    245

tail(sd[,1:5])

##        STATE__   BGN_DATE    BGN_TIME TIME_ZONE COUNTY
## 902209       6 2011-11-30 12:40:00 PM       PST    520
## 902260       6 2011-11-30 10:00:00 AM       PST    506
## 902266      30 2011-11-30 02:00:00 PM       MST     63
## 902270      30 2011-11-30 06:00:00 PM       MST     56
## 902271      56 2011-11-30 04:00:00 PM       MST     99
## 902293      56 2011-11-30 10:30:00 PM       MST      7

Event Type

For this analysis, we will be examining what types of severe weather events caused the most damage. Unfortunately, the EVTYPE variable is rather messy. We will clean it up by adding another variable called event and putting broader, more useful, labels for each event into that variable.

levels(sd$EVTYPE) <- tolower(levels(sd$EVTYPE))
sd$event <- "other"
sd$event[grep("lightning", sd$EVTYPE)] <- "lightning"
sd$event[grep("thunder|rain", sd$EVTYPE)] <- "heavy rain"
sd$event[grep("snow|blizzard|chill|wint|cold|sleet|ice|freez", sd$EVTYPE)] <- "winter storm"
sd$event[grep("hail", sd$EVTYPE)] <- "hail"
sd$event[grep("seas|surf|tide|swell", sd$EVTYPE)] <- "coastal event"
sd$event[grep("tornado|gustnado|funnel|spout", sd$EVTYPE)] <- "tornado"
sd$event[grep("fire|smoke", sd$EVTYPE)] <- "fire"
sd$event[grep("wind", sd$EVTYPE)] <- "heavy wind"
sd$event[grep("volcan", sd$EVTYPE)] <- "volcano"
sd$event[grep("flood|stream", sd$EVTYPE)] <- "flood"
sd$event[grep("heat|temp|hot", sd$EVTYPE)] <- "heat wave"
sd$event[grep("dry|drought", sd$EVTYPE)] <- "drought"
sd$event[grep("microburst", sd$EVTYPE)] <- "microburst"
sd$event[grep("tropical", sd$EVTYPE)] <- "tropical storm"
sd$event[grep("hurricane|cyclone|surge", sd$EVTYPE)] <- "hurricane"
sd$event <- as.factor(sd$event)

Subsetting The Data

By looking at the column names of our sd data set, and consulting the documentation provided, we conclude that the columns related to property damage are 25 through 28 and those related to population health are 23 and 24.

names(sd)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"     "event"

We will then create two data sets, subsetting out those two groups, as well as our recently created event variable.

sdp <- sd[, c(2, 25:28, 38)]
sdh <- sd[, c(2, 23, 24, 38)]

Calculating Total Economic Damage

We have to do some cleaning before we can calculate the Crop Damage and Property Damage properly. We learn from the page 12 of the documentation that the DMG variables represent US Dollar amounts rounded to three significant digits and that the EXP variables should contain either “K”, “M”, or “B” to signify the order of magnitude of the number being “thousand”, “million”, or “billion” respectively. However, by doing a summary of those variables, we note that there are a number of other values as well.

summary(sdp$CROPDMGEXP)

##             ?      0      2      B      k      K      m      M 
## 563454      7     19      1      9     21 281832      1   1994

summary(sdp$PROPDMGEXP)

##             -      ?      +      0      1      2      3      4      5 
## 428242      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 408319      7  10409

None of these values are mentioned in the documentation. However, we notice that none of the other values are present in even 0.01% of the rows. This fact, coupled with their lack of explanation in the documentation, leads us to decide that they are both misleading and statistically insignificant, so we will purge them from the data set. The following code does this, after first transforming all letter characters to capitals for ease of processing.

sdp$CROPDMGEXP <- toupper(as.character(sdp$CROPDMGEXP))
sdp$PROPDMGEXP <- toupper(as.character(sdp$PROPDMGEXP))
sdp <- sdp[-grep("[^K|M|B]", sdp$CROPDMGEXP),]
sdp <- sdp[-grep("[^K|M|B]", sdp$PROPDMGEXP),]

Now that the unsavory factors have been eliminated, and all of the remaining characters in the EXP columns have been converted to upper case, we can calculate the actual damage amounts in US Dollars.

sdp$propmult <- 1
sdp$propmult[sdp$PROPDMGEXP == "K"] <- 1000
sdp$propmult[sdp$PROPDMGEXP == "M"] <- 1000000
sdp$propmult[sdp$PROPDMGEXP == "B"] <- 1000000000
sdp$PROPDMG <- sdp$PROPDMG*sdp$propmult
sdp$cropmult <- 1
sdp$cropmult[sdp$CROPDMGEXP == "K"] <- 1000
sdp$cropmult[sdp$CROPDMGEXP == "M"] <- 1000000
sdp$cropmult[sdp$CROPDMGEXP == "B"] <- 1000000000
sdp$CROPDMG <- sdp$CROPDMG*sdp$cropmult

Finally we will combine PROPDMG and CROPDMG into a variable called totaldmg which we will use to compare different event types’ economic damge.

sdp$totaldmg <- sdp$CROPDMG+sdp$PROPDMG

Before we go any further, we will check the totaldmg variable to make sure we don’t have any unexplained outliers. We do this by looking at the Top 10 totaldmg events make sure they all seem realistic.

sdpo <- sdp[,c(1,2,4,6,9)]
sdpo <- sdpo[order(sdpo$totaldmg, decreasing=TRUE),]
rownames(sdpo) <- 1:nrow(sdpo)
top10p <- sdpo[1:10,]
top10p

##      BGN_DATE   PROPDMG  CROPDMG          event     totaldmg
## 1  2006-01-01 1.150e+11 3.25e+07          flood 115032500000
## 2  2005-08-29 3.130e+10 0.00e+00      hurricane  31300000000
## 3  2005-08-28 1.693e+10 0.00e+00      hurricane  16930000000
## 4  2005-08-29 1.126e+10 0.00e+00      hurricane  11260000000
## 5  1993-08-31 5.000e+09 5.00e+09          flood  10000000000
## 6  2005-10-24 1.000e+10 0.00e+00      hurricane  10000000000
## 7  2005-08-29 5.880e+09 1.51e+09      hurricane   7390000000
## 8  2005-08-28 7.350e+09 0.00e+00      hurricane   7350000000
## 9  2004-08-13 5.420e+09 2.85e+08      hurricane   5705000000
## 10 2001-06-05 5.150e+09 0.00e+00 tropical storm   5150000000

Unfortunately, things seem strange. There is one event - on January 1, 2006 - that registers over $115 Billion in damage. This seems unlikely, especially when we rack our memories and can’t remember any catastrophic weather events happening on New Year’s Day less than 10 years ago.

We go back to the sd data set which still has a REMARKS variable, to see if we can learn anything.

Napa <- sd[sd$BGN_DATE == "2006-01-01",]
Napa <- Napa[23,]
Napa[,c(25:28, 36)]

##        PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 605953     115          B    32.5          M
##                                                                                                                                                                                                                                                                                                                                                                                               REMARKS
## 605953 Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.

And here is the event: a flood in Napa, CA. From the remarks, we can see that it did “at least $70 million” in property damage. This is significant, but nowhere near the $115 Billion that’s reflected in the PROPDMG AND PROPDMGEXP variables. We have to assume that this was a typo, the most likely situation being that someone typed a “B” instead of an “M” in the CROPDMGEXP column. This is fixed with the code below and we move on.

sdp$PROPDMG[sdp$totaldmg==115032500000] <- 1.15e+8
sdp$PROPDMGEXP[sdp$totaldmg==115032500000] <- "M"
sdp$propmult[sdp$totaldmg==115032500000] <- 1e+06
sdp$totaldmg[sdp$totaldmg==115032500000] <- 115032500

Calculating Total Health Damage

To calculate the population health effects, we will be looking at the FATALITIES and INJURIES columns of our sdh data set. In order to compare types of events, we will create a totaldmg variable. It seems wrong to weigh fatalies and injuries as the same amount of “health damage”, so we arbitrarily multiply the fatalities by 3 to give us a better picture of overall damage. Everyone knows dying is three times worse than being injured.

sdh$totaldmg <- sdh$INJURIES+sdh$FATALITIES*3

Again, before we go any further, we will check the totaldmg variable to make sure we don’t have any unexplained outliers.

sdho <- sdh
sdho <- sdho[order(sdho$totaldmg, decreasing=TRUE),]
rownames(sdho) <- 1:nrow(sdho)
top10h <- sdho[1:10,]
top10h

##      BGN_DATE FATALITIES INJURIES        event totaldmg
## 1  1979-04-10         42     1700      tornado     1826
## 2  1995-07-12        583        0    heat wave     1749
## 3  2011-05-22        158     1150      tornado     1624
## 4  1994-02-08          1     1568 winter storm     1571
## 5  2011-04-27         44      800      tornado      932
## 6  1998-10-17          2      800        flood      806
## 7  2004-08-13          7      780    hurricane      801
## 8  2011-04-27         20      700      tornado      760
## 9  1998-10-17          0      750        flood      750
## 10 1998-10-17         11      600        flood      633

Luckily, this time everything seems normal.

Results

Economic Damage

Now that everything is cleaned up, we create a new object sdpa which aggregates the total damage for each type of event that we have defined in our event variable.

sdpa <- sdp[,c("event", "totaldmg")]
sdpa <- aggregate(sdpa$totaldmg, list(sdpa$event), sum)
colnames(sdpa) <- c("event", "totaldmg")
sdpa <- sdpa[order(sdpa$totaldmg, decreasing=TRUE),]
rownames(sdpa) <- 1:nrow(sdpa)
sdpa

##             event     totaldmg
## 1       hurricane 138237551810
## 2           flood  65057699894
## 3         tornado  47108721273
## 4    winter storm  21092558560
## 5      heavy wind  19671868158
## 6            hail  18995930230
## 7         drought  15018927780
## 8            fire   8905010130
## 9  tropical storm   8411023550
## 10     heavy rain   5238188040
## 11          other   1392274180
## 12      lightning    940781373
## 13      heat wave    924539250
## 14  coastal event    166588000
## 15     microburst      7312600
## 16        volcano       500000

The figure below shows the total damage done by each category of severe weather event.

It’s clear that Hurricanes do the most economic damage, far ahead of anything else. Hurricane Katrina alone did more total damage than many of the other entire categories in the previous plot.

Katrina <- rbind(sd[sd$BGN_DATE == "2005-08-29",],sd[sd$BGN_DATE == "2005-08-28",])
Katrina <- Katrina[grep("hurricane|surge", Katrina$EVTYPE),]
Katrina[,c(2,7,8,25,26)]

##          BGN_DATE STATE            EVTYPE PROPDMG PROPDMGEXP
## 566200 2005-08-29    AR hurricane/typhoon  400.00          K
## 570290 2005-08-29    GA hurricane/typhoon    0.00           
## 577676 2005-08-29    LA       storm surge   31.30          B
## 577677 2005-08-29    LA hurricane/typhoon   30.00          K
## 577678 2005-08-29    LA hurricane/typhoon    3.60          M
## 581535 2005-08-29    MS       storm surge   11.26          B
## 581536 2005-08-29    MS hurricane/typhoon    0.00           
## 581537 2005-08-29    MS hurricane/typhoon    5.88          B
## 581545 2005-08-29    MS hurricane/typhoon    0.00           
## 581551 2005-08-29    MS hurricane/typhoon    0.00           
## 598405 2005-08-29    TX       storm surge   40.00          K
## 569215 2005-08-28    FL       storm surge    0.00           
## 569217 2005-08-28    FL hurricane/typhoon    1.70          M
## 569218 2005-08-28    FL       storm surge  200.00          K
## 569219 2005-08-28    FL       storm surge  200.00          K
## 569221 2005-08-28    FL       storm surge  750.00          K
## 569222 2005-08-28    FL       storm surge  250.00          K
## 569223 2005-08-28    FL       storm surge  300.00          K
## 569224 2005-08-28    FL       storm surge    0.00           
## 569225 2005-08-28    FL       storm surge    0.00           
## 577675 2005-08-28    LA hurricane/typhoon   16.93          B
## 581533 2005-08-28    MS hurricane/typhoon    7.35          B

Floods do the second most damage and tornadoes the third. Winter storms come in a distant fourth, although if you combined them with hail (which is often, but not always, the case) they would be very close to tornadoes.

Population Health Damage

We create a new object sdha which aggregates the total health damage for each type of event that we have defined in our event variable.

sdha <- sdh[,c("event", "totaldmg")]
sdha <- aggregate(sdha$totaldmg, list(sdha$event), sum)
colnames(sdha) <- c("event", "totaldmg")
sdha <- sdha[order(sdha$totaldmg, decreasing=TRUE),]
rownames(sdha) <- 1:nrow(sdha)
sdha

##             event totaldmg
## 1         tornado    53112
## 2       heat wave    18636
## 3      heavy wind    15844
## 4           flood    13342
## 5    winter storm     8939
## 6       lightning     7682
## 7           other     5801
## 8            fire     1878
## 9       hurricane     1848
## 10           hail     1416
## 11  coastal event      855
## 12     heavy rain      623
## 13 tropical storm      581
## 14        drought      125
## 15     microburst       37
## 16        volcano        0

The figure below shows the total health damage done by each category of severe weather event.

By this measure, tornadoes cause by the far the most population health damage. Heat waves are a distant second, followed by heavy winds and then floods.

Considering hurricanes did so much property damage, we are confused that they are so far down the list for health damage. To look into this, we check the health data for the ten most destructive hurricanes in our data set.

hurhealth <- sdh[grep("hurricane", sdh$event),]
hurhealth <- hurhealth[order(sdh$totaldmg),]
hurhealth <- hurhealth[order(hurhealth$totaldmg, decreasing=TRUE),]
hurhealth[1:10,]

##          BGN_DATE FATALITIES INJURIES     event totaldmg
## 529351 2004-08-13          7      780 hurricane      801
## 484801 2002-12-08          1      316 hurricane      319
## 581537 2005-08-29         15      104 hurricane      149
## 366678 1999-09-14         13        0 hurricane       39
## 529499 2004-09-15          6       16 hurricane       34
## 739576 2008-09-12         11        0 hurricane       33
## 196184 1993-03-13         10        0 hurricane       30
## 282882 1997-01-01          0       27 hurricane       27
## 267894 1996-09-05          7        2 hurricane       23
## 281013 1996-09-09          7        0 hurricane       21

Among other things, we notice that Hurricane Katrina (August 29th, 2005) registers only 15 fatalities and 104 injuries. We know - from our own memory, from Wikipedia, and from the REMARKS column in our Katrina data set above - that this wildly underestimates the fatalities and injuries. Wikipedia tells us that 1,833 fatalities were directly associated with Hurricane Katrina. Our data set registers only 15.

These discrepencies cast some doubt on this whole portion of the analysis, but there seems to be no easy way to fix these entries, short of scanning through the REMARKS column of our original sd set and manually double-checking it against FATALITIES and INJURIES. Unfortunately, with 847,338 observations, that seems unreasonably time-consuming. We therefore accept our current analysis but take it, as they say, with a grain of salt. Tornadoes are bad.

In conclusion, don’t live anywhere with hurricanes, floods, or tornadoes. Unless it’s New Orleans, because New Orleans is the bomb.

Exhibit A: http://www.youtube.com/watch?v=cLPXcYo3uvc