Synopsis

This analysis merely scratches the surface of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, limited to the period from January 3rd, 1996 through November 2011, and to key variables for the questions we were to answer: Across the United States, which types of events (1) are most harmful with respect to population health, and (2) have the greatest economic consequences? I found that excessive heat (fatalities), tornadoes (injuries) and hurricanes (cost) are topping the lists. Some steps were taken to clean, transform and enrich the data, and these steps are all reproducible and possible to investigate through the included R code. The applied method is just simple aggregations. Further analyses could be done into dimensions like geography, demographics, time/trends, and even into the database’s rich narratives.

Background information about the data

The storm data is NOAA’s official publication which documents the occurrence of storms and other significant natural hazards having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce. The data analyzed here start in the year 1950 and end in November 2011, with increasingly more complete data. All event types are included from January 3rd 1996 (ncdc.noaa.gov). You will find more detailed and interesting information about the input procedures at National Weather Service Storm Data Documentation and National Climatic Data Center Storm Events FAQ.

Prerequisites

Activate/install neccessary libraries.

require(xlsx) ## For read.xlsx2() to download the price index file
require(reshape2) ## For melt()
require(knitr) ## For kable()

The code below is only tested on 32bit Windows 7, R version 3.1.3. See session info at the bottom of this document.

Global R options:

Sys.setlocale(category = "LC_ALL", locale = "C")  ## Change locale to C language default to allow input format
options(scipen=999) ## To avoid scientific number notation in plots etc.

Data Processing

Downloading the storm database
Note that the setInternet2 option had to be set on my Windows system for download.file to work with knitr. It is only available on Windows, so if you are on another OS you must omit that line, and also most likely have to change mode=“wb” to method=“curl”. Alternatively, you can change the url from https: to http:.

if (!file.exists("./data")) { # Create a data folder if needed
        dir.create("./data") 
} 

url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2?accessType=DOWNLOAD"

bz2file = file.path( "./data" , "StormData.csv.bz2" )

setInternet2(use = TRUE) ## Neccessary for download.file to work with knitr on Windows 7

if (!file.exists(bz2file)) { ## Download zipped file if it is not already done
        download.file( url = url, 
                       destfile = bz2file, 
                       mode = "wb") ## Windows. Other OS may use method=curl
} 

# No need to unzip, bz2 unzipping is handled by R internally through read.csv. 

Reading the storm data into R
After looking through the first lines of the data and studying the documentation, I decided to read these ten variables (of the total 37) into R:

#stormsample <- read.csv("./data/StormData.csv.bz2")[1,] ## 1 obs. of 37 variables
#str(stormsample)

# Choosing relevant variables for this analysis, to limit memory burden
storm<- read.csv("./data/StormData.csv.bz2", 
                 strip.white=TRUE,
                 stringsAsFactors = FALSE)[ ,c('BGN_DATE', 'COUNTY', 'STATE', 'EVTYPE',
                                               'FATALITIES','INJURIES',
                                               'PROPDMG', 'PROPDMGEXP',
                                               'CROPDMG', 'CROPDMGEXP')] 

At this point, the storm data frame has the following number of observations and variables:

dim(storm)
## [1] 902297     10

Now changing the date variable to a date format and add a year variable to allow easier analyses.

storm$BGN_DATE <- as.Date(storm$BGN_DATE, "%m/%d/%Y")
storm$year <- format(storm$BGN_DATE, "%Y")

Subset to relevant rows from time period with complete input
The official documentation shows that in the earliest years, only tornado data was registered. The exploratory data analysis I performed also showed that input parameters like cost exponential codes were of variable quality in the early data. From January 3rd 1996, NOAA states that all event types are included. They also seem to be of more consistent good quality otherwise. I select to use only data from January 3rd 1996 onwards, and only rows with quantified fatalities, injuries or economical consequences.

storm <- subset(storm, BGN_DATE >= '1996-01-03' & 
                        (FATALITIES > 0|INJURIES > 0|PROPDMG > 0|CROPDMG > 0)) 

We are then at this number of observations and variables:

dim(storm)
## [1] 201289     11

The most extreme cost is wrong

billioncost <- storm[with(storm, PROPDMGEXP =="B"), 1:10 ]
head(billioncost[order(-billioncost$PROPDMG), c(1,2,3,4,7,8) ],5)
##          BGN_DATE COUNTY STATE            EVTYPE PROPDMG PROPDMGEXP
## 605953 2006-01-01     55    CA             FLOOD  115.00          B
## 577676 2005-08-29     40    LA       STORM SURGE   31.30          B
## 577675 2005-08-28     34    LA HURRICANE/TYPHOON   16.93          B
## 581535 2005-08-29     80    MS       STORM SURGE   11.26          B
## 569308 2005-10-24     68    FL HURRICANE/TYPHOON   10.00          B

I had to search the Internet for the USD 115 Billion damage reported in Napa River in 2006, a much higher cost than the flooding of New Orleans after hurricane Katrina. The U.S. Government publication found here estimates the total Northern California floodings in Dec 2005 and Jan 2006 to be 300 Million, so clearly this should be M, not B. I could update it from B to M. But in the same week, a 115M otherwise identical record already exists.

storm[with(storm, PROPDMG == 115 & COUNTY==55), ]
##          BGN_DATE COUNTY STATE EVTYPE FATALITIES INJURIES PROPDMG
## 567251 2005-12-31     55    CA  FLOOD          0        0     115
## 605953 2006-01-01     55    CA  FLOOD          0        0     115
##        PROPDMGEXP CROPDMG CROPDMGEXP year
## 567251          M    32.5          M 2005
## 605953          B    32.5          M 2006

It could be that the two 115M records should be added together, or that one is a duplicate. The remarks mention the exact same incident and costs. I ultimately chose to consider the 115B record as a duplicate to be removed.

storm <- storm[!(storm$PROPDMG == 115 & storm$PROPDMGEXP =="B" & storm$COUNTY==55), ] ## Delete that one row
dim(storm) ## One row less
## [1] 201288     11

It is no longer among the billion cost rows:

billioncost <- storm[with(storm, PROPDMGEXP =="B"), 1:10 ] 
head(billioncost[order(-billioncost$PROPDMG), c(1,2,3,4,7,8) ],5) 
##          BGN_DATE COUNTY STATE            EVTYPE PROPDMG PROPDMGEXP
## 577676 2005-08-29     40    LA       STORM SURGE   31.30          B
## 577675 2005-08-28     34    LA HURRICANE/TYPHOON   16.93          B
## 581535 2005-08-29     80    MS       STORM SURGE   11.26          B
## 569308 2005-10-24     68    FL HURRICANE/TYPHOON   10.00          B
## 581533 2005-08-28     68    MS HURRICANE/TYPHOON    7.35          B

Same sanity check for fatality and injury records reveals no errors
The 2011 Missouri tornado is the worst event with 158 fatalities…

head(storm[order(-storm$FATALITIES), 1:6 ],5)
##          BGN_DATE COUNTY STATE         EVTYPE FATALITIES INJURIES
## 862634 2011-05-22     97    MO        TORNADO        158     1150
## 355128 1999-07-28      5    IL EXCESSIVE HEAT         99        0
## 371112 1999-07-04     54    PA EXCESSIVE HEAT         74      135
## 598500 2005-09-21    163    TX EXCESSIVE HEAT         49        0
## 606363 2006-07-16     89    CA EXCESSIVE HEAT         46       18

…and 1150 injured.

head(storm[order(-storm$INJURIES), 1:6 ],5) 
##          BGN_DATE COUNTY STATE            EVTYPE FATALITIES INJURIES
## 862634 2011-05-22     97    MO           TORNADO        158     1150
## 344159 1998-10-17     91    TX             FLOOD          2      800
## 860386 2011-04-27    125    AL           TORNADO         44      800
## 529351 2004-08-13     55    FL HURRICANE/TYPHOON          7      780
## 344178 1998-10-17    206    TX             FLOOD          0      750

K/M/B cost magnitude characters
Cost estimates are rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. We have:

table(storm$year, storm$PROPDMGEXP)
##       
##                  B     K     M
##   1996   735     0  8776   500
##   1997   834     1  9095   392
##   1998   948     1 12507   557
##   1999   792     1  9446   370
##   2000   978     1 10247   282
##   2001   971     1  8977   349
##   2002   651     0  9487   294
##   2003   655     2  9854   504
##   2004   725     5  9309   445
##   2005   568    10  9063   373
##   2006   588     0 10923   462
##   2007     0     0 11498   455
##   2008     0     2 16919   712
##   2009     0     0 13997   437
##   2010     0     2 15601   416
##   2011     0     5 19750   815

We see “K” for thousands, “M” for millions, and “B” for billions. If additional precision is available, it may be provided in the narrative part of the entry. 1996-2006: No damage, no magnitude value. After 2006 default magnitude. Similar data found for storm$CROPDMGEXP. We can then add more practical magnitude multipliers:

#Cost magnitude letters, translated to numbers
storm$PROPDMGEXP.value[storm$PROPDMGEXP == "B"] <- 1000000000
storm$PROPDMGEXP.value[storm$PROPDMGEXP == "M"] <- 1000000
storm$PROPDMGEXP.value[storm$PROPDMGEXP == "K"] <- 1000
storm$PROPDMGEXP.value[is.na(storm$PROPDMGEXP.value)] <- 1
table(storm$PROPDMGEXP.value)
## 
##          1       1000    1000000 1000000000 
##       8445     185449       7363         31
storm$CROPDMGEXP.value[storm$CROPDMGEXP == "B"] <- 1000000000
storm$CROPDMGEXP.value[storm$CROPDMGEXP == "M"] <- 1000000
storm$CROPDMGEXP.value[storm$CROPDMGEXP == "K"] <- 1000
storm$CROPDMGEXP.value[is.na(storm$CROPDMGEXP.value)] <- 1
table(storm$CROPDMGEXP.value)
## 
##          1       1000    1000000 1000000000 
##     102745      96781       1760          2

Use of construction price index (2005=100) from US Dept of Commerce, Survey of Construction
I decided to index adjust the cost estimates using the price index for single-family houses under construction. This is based on an unconfirmed assumption that property costs is driven mostly by the reconstruction costs for homes and infrastructure, and not by for example lost frozen food, damaged electrical appliances and interior. This would have to be done differently in a more thorough analysis.

setInternet2(use = TRUE) ## Neccessary for download.file to work with knitr on Windows 7

download.file("https://www.census.gov/construction/nrs/xls/price_uc_cust.xls",
              "price_uc_cust.xls", 
              mode="wb")

prices <- read.xlsx2("price_uc_cust.xls", 
                     sheetName = "fixed", 
                     startRow = 7, 
                     colClasses = rep("numeric",30))

prices <- prices[ ,c(1,3)] ## Year and fixed annual index columns

colnames(prices) <- c("year","index")

prices <- subset(prices, prices$year>=1996 & prices$year<=2011) ## Relevant years only

prices$adjustment_factor_to_2011 <- prices$index / prices$index[prices$year == 2011] ## Anchor calc. to 2011

These are then the factors used for cost adjustment to 2011 dollar value:

print(prices)
##    year index adjustment_factor_to_2011
## 33 1996  68.6                 0.7043121
## 34 1997  70.6                 0.7248460
## 35 1998  72.5                 0.7443532
## 36 1999  72.7                 0.7464066
## 37 2000  75.9                 0.7792608
## 38 2001  79.7                 0.8182752
## 39 2002  81.7                 0.8388090
## 40 2003  85.9                 0.8819302
## 41 2004  93.1                 0.9558522
## 42 2005 100.0                 1.0266940
## 43 2006 106.0                 1.0882957
## 44 2007 107.0                 1.0985626
## 45 2008 103.3                 1.0605749
## 46 2009  98.1                 1.0071869
## 47 2010  96.4                 0.9897331
## 48 2011  97.4                 1.0000000

Adding prices’ columns to storm.

storm <- merge(storm, prices) ## Merge the adjustment factor into storm (key=year)

Now we can use this to calculate values for the property and crop losses that are indexed to 2011 dollars.

storm$PROPDMG.2011.value <- (storm$PROPDMG * storm$PROPDMGEXP.value) / storm$adjustment_factor_to_2011
storm$CROPDMG.2011.value <- (storm$CROPDMG * storm$CROPDMGEXP.value) / storm$adjustment_factor_to_2011

How event types were grouped
Even if post-1996 data are of decent quality, some event types (EVTYPE) must be mapped to each other. For example, many different TSTM… variants exist.

head(table(storm$EVTYPE, storm$year) ,10)
##                         
##                          1996 1997 1998 1999 2000 2001 2002 2003 2004 2005
##      HIGH SURF ADVISORY     0    0    0    0    0    1    0    0    0    0
##    FLASH FLOOD              0    0    0    0    1    0    0    0    0    0
##    TSTM WIND                0    0    0    1    1    0    0    0    0    0
##    TSTM WIND (G45)          0    0    1    0    0    0    0    0    0    0
##   AGRICULTURAL FREEZE       0    3    0    0    0    0    0    0    0    0
##   ASTRONOMICAL HIGH TIDE    0    0    0    0    0    0    0    3    1    4
##   ASTRONOMICAL LOW TIDE     0    0    0    0    0    0    0    0    0    0
##   AVALANCHE                 5    2   12   21   23   25   20    9   13   18
##   BLACK ICE                 0    1    0    0    0    0    0    0    0    0
##   BLIZZARD                 26   31   12    5    8   13    4    4    3   16
##                         
##                          2006 2007 2008 2009 2010 2011
##      HIGH SURF ADVISORY     0    0    0    0    0    0
##    FLASH FLOOD              0    0    0    0    0    0
##    TSTM WIND                0    0    0    0    0    0
##    TSTM WIND (G45)          0    0    0    0    0    0
##   AGRICULTURAL FREEZE       0    0    0    0    0    0
##   ASTRONOMICAL HIGH TIDE    0    0    0    0    0    0
##   ASTRONOMICAL LOW TIDE     0    1    1    0    0    0
##   AVALANCHE                13   20   24    8   39   12
##   BLACK ICE                 0    0    0    0    0    0
##   BLIZZARD                  6    7    7   52   17   17

A total of 222 (length(unique(storm$EVTYPE))) unique EVTYPEs exists in my filtered storm data set. I retrieved the Storm Data Event Table in Section 2.1.1 of NWS Directive 10-1605 August 17, 2007. It has 48 event types.

# All the 48 official event types cut & pasted from NWSI 10-1605 AUGUST 17, 2007, chapter 2.1.1
eventtable <- data.frame(
        event.type=c(
                "Astronomical Low Tide","Avalanche","Blizzard","Coastal Flood","Cold/Wind Chill","Debris Flow",
                "Dense Fog","Dense Smoke","Drought","Dust Devil","Dust Storm","Excessive Heat", 
                "Extreme Cold/Wind Chill","Flash Flood","Flood","Frost/Freeze","Funnel Cloud","Freezing Fog",
                "Hail","Heat","Heavy Rain","Heavy Snow","High Surf","High Wind","Hurricane (Typhoon)",
                "Ice Storm","Lake-Effect Snow","Lakeshore Flood","Lightning","Marine Hail","Marine High Wind", 
                "Marine Strong Wind","Marine Thunderstorm Wind","Rip Current","Seiche","Sleet",
                "Storm Surge/Tide","Strong Wind","Thunderstorm Wind","Tornado","Tropical Depression",
                "Tropical Storm","Tsunami","Volcanic Ash","Waterspout","Wildfire","Winter Storm",
                "Winter Weather"),
        stringsAsFactors = FALSE)

From this table, a separate mapping table evmap is created:

evmap <- as.data.frame(unique(storm$EVTYPE)) 
names(evmap) <- "EVTYPE"

To this mapping table I apply the official event name (event.type) where a direct match is found.

evmap$event.type<- factor(eventtable$event.type[match(toupper(levels(evmap$EVTYPE)[evmap$EVTYPE]),
                                                      toupper(eventtable$event.type))])

evmap[] <- lapply(evmap, as.character)

Of the total 222 (nrow(evmap)) EVTYPEs, 172 (sum(is.na(evmap$event.type))) remain unmatched at this point. To work with textual mapping, I now pre-created a result and result.agr.totdata frame by merging relevant columns from storm to the mapping table evmap. Unmapped event types are marked as “Unmapped!” here. Implemented using functions since a rerun is required when mapping is done.

PrepareResult <- function() {
        # Merges storm with the current event mapping, trims EVTYPE
        #
        # Args: 
        #    None
        # Returns: 
        #    The data frame `result`, with unmapped event types marked as such
        result <<- merge(x=evmap, y=storm[ ,c(1,5,6,7,16,17)], by = "EVTYPE")
        result$EVTYPE <<- toupper(gsub("^\\s+|\\s+$", "", result$EVTYPE)) ## Remove leading/trailing space
        result$event.type[is.na(result$event.type)] <<- "Unmapped!"
}

PrepareResultAgr <<- function(aggrby="EVTYPE + event.type") {
        # Aggregates `result` by given grouping variable(s) 
        #
        # Args: 
        #     aggrby: Aggregate by. If "event.type" then groups by only that 
        #             variable, otherwise by both EVTYPE and event.type
        # Returns: 
        #     The data frame `result.agr.tot`, with a `total.cost` variable instead
        #     of the separate PROPDMG and CROPDMG 2011 value variables
        result.agr.tot <<- 
                if (aggrby=="event.type") {
                        aggregate(cbind(FATALITIES, 
                                        INJURIES, 
                                        PROPDMG.2011.value, 
                                        CROPDMG.2011.value) ~ event.type, 
                                  data=result,
                                  sum)
                } else {
                        aggregate(cbind(FATALITIES, 
                                        INJURIES, 
                                        PROPDMG.2011.value, 
                                        CROPDMG.2011.value) ~ EVTYPE+event.type, 
                                  data=result, 
                                  sum)
                }
        result.agr.tot$total.cost <<- round(result.agr.tot$PROPDMG.2011.value+
                                                    result.agr.tot$CROPDMG.2011.value, 0)
        
        result.agr.tot <<- 
                result.agr.tot[ ,!(colnames(result.agr.tot) %in% 
                                           c("PROPDMG.2011.value", "CROPDMG.2011.value"))] 
}

Run those functions and see how the plotting basis would look at this stage. Note that some EVTYPEs where grouped together after trimming and upper-casing them all, so total number of distinct EVTYPEs decrease somewhat.

PrepareResult()
PrepareResultAgr()
kable(result.agr.tot)
EVTYPE event.type FATALITIES INJURIES total.cost
ASTRONOMICAL LOW TIDE Astronomical Low Tide 0 0 297811
AVALANCHE Avalanche 223 156 3938103
BLIZZARD Blizzard 70 385 699572522
COASTAL FLOOD Coastal Flood 3 2 254258558
COLD/WIND CHILL Cold/Wind Chill 95 12 2556986
DENSE FOG Dense Fog 9 143 7487943
DENSE SMOKE Dense Smoke 0 0 91028
DROUGHT Drought 0 4 16995641241
DUST DEVIL Dust Devil 2 39 698760
DUST STORM Dust Storm 11 376 8906423
EXCESSIVE HEAT Excessive Heat 1797 6391 461214640
EXTREME COLD/WIND CHILL Extreme Cold/Wind Chill 125 24 8846980
FLASH FLOOD Flash Flood 887 1674 18275422159
FLOOD Flood 414 6758 37981905773
FREEZING FOG Freezing Fog 0 0 2169617
FROST/FREEZE Frost/Freeze 0 0 1064299095
FUNNEL CLOUD Funnel Cloud 0 1 150893
HAIL Hail 7 713 19033928551
HEAT Heat 237 1222 1679745
HEAVY RAIN Heavy Rain 94 230 1461957054
HEAVY SNOW Heavy Snow 106 698 890910018
HIGH SURF High Surf 90 150 83258124
HIGH WIND High Wind 235 1083 6209445781
ICE STORM Ice Storm 82 318 4311045550
LAKE-EFFECT SNOW Lake-Effect Snow 0 0 37671706
LAKESHORE FLOOD Lakeshore Flood 0 0 7536497
LIGHTNING Lightning 650 4140 822642394
MARINE HAIL Marine Hail 0 0 3772
MARINE HIGH WIND Marine High Wind 1 1 1279355
MARINE STRONG WIND Marine Strong Wind 14 22 415146
MARINE THUNDERSTORM WIND Marine Thunderstorm Wind 10 26 467662
RIP CURRENT Rip Current 340 209 993
SEICHE Seiche 0 0 1256720
STORM SURGE/TIDE Storm Surge/Tide 11 5 4379147426
STRONG WIND Strong Wind 103 278 244869801
THUNDERSTORM WIND Thunderstorm Wind 130 1400 3668176912
TORNADO Tornado 1511 20667 27086010017
TROPICAL DEPRESSION Tropical Depression 0 0 1683602
TROPICAL STORM Tropical Storm 57 338 9880855441
TSUNAMI Tsunami 33 129 142760381
AGRICULTURAL FREEZE Unmapped! 0 0 39760170
ASTRONOMICAL HIGH TIDE Unmapped! 0 0 10334066
BEACH EROSION Unmapped! 0 0 141983
BLACK ICE Unmapped! 1 24 0
BLOWING DUST Unmapped! 0 0 24442
BLOWING SNOW Unmapped! 1 1 20694
BRUSH FIRE Unmapped! 0 2 0
COASTAL FLOODING/EROSION Unmapped! 0 0 20151724
COASTAL EROSION Unmapped! 0 0 1029081
COASTAL FLOODING Unmapped! 3 0 137868181
COASTAL FLOODING/EROSION Unmapped! 0 5 26909269
COASTAL STORM Unmapped! 3 2 70991
COASTALSTORM Unmapped! 1 0 0
COLD Unmapped! 18 12 746223
COLD AND SNOW Unmapped! 14 0 0
COLD TEMPERATURE Unmapped! 2 0 0
COLD WEATHER Unmapped! 2 0 0
DAM BREAK Unmapped! 0 0 1286027
DAMAGING FREEZE Unmapped! 0 0 59817230
DOWNBURST Unmapped! 0 0 2840
DROWNING Unmapped! 1 0 0
DRY MICROBURST Unmapped! 3 25 2313627
EARLY FROST Unmapped! 0 0 59632653
EROSION/CSTL FLOOD Unmapped! 0 0 23001166
EXCESSIVE SNOW Unmapped! 0 2 2592421
EXTENDED COLD Unmapped! 1 0 141983
EXTREME COLD Unmapped! 115 79 1785714227
EXTREME WINDCHILL Unmapped! 17 5 20317704
FALLING SNOW/ICE Unmapped! 1 1 0
FLASH FLOOD Unmapped! 0 0 64163
FLASH FLOOD/FLOOD Unmapped! 0 0 6699
FLOOD/FLASH/FLOOD Unmapped! 0 0 13398
FOG Unmapped! 60 712 16932905
FREEZE Unmapped! 0 0 205807146
FREEZING DRIZZLE Unmapped! 2 13 146869
FREEZING RAIN Unmapped! 2 0 843327
FREEZING SPRAY Unmapped! 1 0 0
FROST Unmapped! 1 3 20694
GLAZE Unmapped! 1 212 208391
GRADIENT WIND Unmapped! 0 0 51045
GUSTY WIND Unmapped! 1 1 497705
GUSTY WIND/HAIL Unmapped! 0 0 26795
GUSTY WIND/HVY RAIN Unmapped! 0 0 2840
GUSTY WIND/RAIN Unmapped! 0 0 2840
GUSTY WINDS Unmapped! 3 11 2011203
HARD FREEZE Unmapped! 0 0 17796884
HAZARDOUS SURF Unmapped! 0 1 0
HEAT WAVE Unmapped! 0 70 0
HEAVY RAIN/HIGH SURF Unmapped! 0 0 21297376
HEAVY SEAS Unmapped! 1 0 0
HEAVY SNOW SHOWER Unmapped! 0 2 14198
HEAVY SURF Unmapped! 6 40 1796318
HEAVY SURF AND WIND Unmapped! 3 0 0
HEAVY SURF/HIGH SURF Unmapped! 42 48 10943188
HIGH SEAS Unmapped! 3 7 20096
HIGH SURF ADVISORY Unmapped! 0 0 244417
HIGH SWELLS Unmapped! 1 0 6898
HIGH WATER Unmapped! 3 0 0
HIGH WIND (G40) Unmapped! 0 0 24182
HIGH WINDS Unmapped! 0 0 689802
HURRICANE Unmapped! 61 46 18479180880
HURRICANE EDOUARD Unmapped! 0 2 0
HURRICANE/TYPHOON Unmapped! 64 1275 71610691192
HYPERTHERMIA/EXPOSURE Unmapped! 1 0 0
HYPOTHERMIA/EXPOSURE Unmapped! 7 0 0
ICE JAM FLOOD (MINOR Unmapped! 0 0 1420
ICE ON ROAD Unmapped! 1 0 0
ICE ROADS Unmapped! 0 1 16077
ICY ROADS Unmapped! 4 22 446548
LAKE EFFECT SNOW Unmapped! 0 0 91238
LANDSLIDE Unmapped! 37 52 333354235
LANDSLIDES Unmapped! 1 1 6898
LANDSLUMP Unmapped! 0 0 809300
LANDSPOUT Unmapped! 0 0 9378
LATE SEASON SNOW Unmapped! 0 0 219975
LIGHT FREEZING RAIN Unmapped! 0 0 582967
LIGHT SNOW Unmapped! 1 2 3298462
LIGHT SNOWFALL Unmapped! 0 0 120685
MARINE ACCIDENT Unmapped! 1 2 70991
MARINE TSTM WIND Unmapped! 9 8 5763371
MICROBURST Unmapped! 0 0 28397
MIXED PRECIP Unmapped! 2 26 0
MIXED PRECIPITATION Unmapped! 0 0 1005462
MUD SLIDE Unmapped! 0 0 122346
MUDSLIDE Unmapped! 4 2 1497051
MUDSLIDES Unmapped! 1 0 0
NON TSTM WIND Unmapped! 0 1 0
NON-SEVERE WIND DAMAGE Unmapped! 0 7 6416
NON-TSTM WIND Unmapped! 0 0 48883
OTHER Unmapped! 0 4 1505159
RAIN Unmapped! 0 0 705797
RAIN/SNOW Unmapped! 4 2 0
RECORD HEAT Unmapped! 2 0 0
RIP CURRENTS Unmapped! 202 294 195221
RIVER FLOOD Unmapped! 1 0 30024208
RIVER FLOODING Unmapped! 1 1 190505029
ROCK SLIDE Unmapped! 0 0 201517
ROGUE WAVE Unmapped! 0 2 0
ROUGH SEAS Unmapped! 8 5 0
ROUGH SURF Unmapped! 4 1 14198
SMALL HAIL Unmapped! 0 10 27370603
SNOW Unmapped! 2 12 3382057
SNOW AND ICE Unmapped! 1 0 0
SNOW SQUALL Unmapped! 2 35 41388
SNOW SQUALLS Unmapped! 1 0 94364
STORM SURGE Unmapped! 2 37 42094551052
STRONG WINDS Unmapped! 7 21 2923581
THUNDERSTORM Unmapped! 1 1 0
THUNDERSTORM WIND (G40) Unmapped! 1 0 0
TIDAL FLOODING Unmapped! 0 1 18229
TORRENTIAL RAINFALL Unmapped! 0 4 0
TSTM WIND Unmapped! 241 3629 6223626575
TSTM WIND (G45) Unmapped! 0 0 6717
TSTM WIND (41) Unmapped! 0 0 11037
TSTM WIND (G35) Unmapped! 1 0 40303
TSTM WIND (G40) Unmapped! 0 1 60112
TSTM WIND (G45) Unmapped! 0 3 310550
TSTM WIND 40 Unmapped! 0 0 1380
TSTM WIND 45 Unmapped! 0 0 13796
TSTM WIND AND LIGHTNING Unmapped! 0 0 110368
TSTM WIND G45 Unmapped! 0 0 1283
TSTM WIND/HAIL Unmapped! 5 95 142016694
TYPHOON Unmapped! 0 5 829217521
UNSEASONABLE COLD Unmapped! 0 0 7241108
UNSEASONABLY COLD Unmapped! 0 0 34547182
UNSEASONABLY WARM Unmapped! 0 17 13434
UNSEASONAL RAIN Unmapped! 0 0 13416003
URBAN/SML STREAM FLD Unmapped! 28 79 90203132
WARM WEATHER Unmapped! 0 2 0
WET MICROBURST Unmapped! 0 0 47563
WHIRLWIND Unmapped! 1 0 15444
WILD/FOREST FIRE Unmapped! 12 545 4019565607
WIND Unmapped! 18 84 3202952
WIND AND WAVE Unmapped! 0 0 1339752
WIND DAMAGE Unmapped! 0 0 14198
WINDS Unmapped! 1 0 0
WINTER WEATHER MIX Unmapped! 0 68 71530
WINTER WEATHER/MIX Unmapped! 28 72 6400452
WINTRY MIX Unmapped! 1 77 16984
VOLCANIC ASH Volcanic Ash 0 0 611041
WATERSPOUT Waterspout 2 2 5884591
WILDFIRE Wildfire 75 911 5235566860
WINTER STORM Winter Storm 190 1287 1582133030
WINTER WEATHER Winter Weather 33 343 35302095

Conclusion: Must at least map HURRICANE/TYPHOON, STORM SURGE, HURRICANE, TSTM WIND, WILD/FOREST FIRE, FOG, RIP CURRENTS and EXTREME COLD to their new official equivalents. Also other groups like flash flood/flood and anything with adjectives like gusty, gradient, high and heavy needs mapping.

Corr <- function(search.list, replacement){
        # Updates `evmap` with new mappings between storm$EVMAP and official 48 names
        #
        # Args:
        #     search.list: The terms to be searched for in EVMAP by the grep() function,
        #                  e.g. (?=.*TSTM)(?=.*WIND)
        #     replacement: The name (from the official 48) to map it to
        #            
        # Returns:
        #     Changes evmap$event.type from NA to official name where match is found   
        evmap[is.na(evmap$event.type),][c(grep(paste("(?=.*",search.list,sep="",collapse="",")"), 
                                               evmap[is.na(evmap$event.type),][,1], 
                                               ignore.case=TRUE,
                                               perl=TRUE)), 2] <<- replacement
}

I ran that function on these combinations and in this order as a minimum mapping to get the right picture:

evmap$EVTYPE <- gsub("^\\s+|\\s+$", "", evmap$EVTYPE) ##Remove leading/trailing space

Corr("HURRICANE", "Hurricane (Typhoon)")
Corr("TYPHOON", "Hurricane (Typhoon)") 
Corr("STORM SURGE", "Storm Surge/Tide")
Corr("MARINE TSTM WIND", "Marine Thunderstorm Wind")
Corr(c("^NON","WIND"), "High Wind")
Corr(c("^WIND|GUSTY|HIGH","WIND"), "High Wind")
Corr(c("^TSTM","WIND"), "Thunderstorm Wind")
Corr(c("THUNDERSTORM","WIND"), "Thunderstorm Wind")
Corr("THUNDERSTORM", "Thunderstorm Wind") 
Corr("FIRE", "Wildfire")
Corr(c("EXTREME","COLD"), "Extreme Cold/Wind Chill")
Corr(c("EXTREME","WINDCHILL"), "Extreme Cold/Wind Chill")
Corr("COLD", "Cold/Wind Chill")
Corr("RIP CURRENTS", "Rip Current") 
Corr("FOG", "Dense Fog")
Corr("^HIGH SURF", "High Surf")
Corr(c("COASTAL|CSTL","FLOOD"), "Coastal Flood")
Corr(c("^RIVER","FLOOD"), "Flood")
Corr("STREAM FLD", "Flood")
Corr(c("FLASH","FLOOD"), "Flash Flood")
Corr("FREEZE|FROST", "Frost/Freeze")
Corr("WINTER|WINTRY", "Winter Weather")
Corr("HAIL", "Hail")
Corr("SURF", "High Surf")
Corr("FREEZING", "Ice Storm")
Corr("LIGHT SNOW", "Winter Weather")
Corr("SNOW", "Heavy Snow")
Corr("RAIN", "Heavy Rain")
Corr("WINDS", "Strong Wind")

These rules are, of course subjective, and should be considered critically. The result was this:

PrepareResult()
PrepareResultAgr()
kable(result.agr.tot)
EVTYPE event.type FATALITIES INJURIES total.cost
ASTRONOMICAL LOW TIDE Astronomical Low Tide 0 0 297811
AVALANCHE Avalanche 223 156 3938103
BLIZZARD Blizzard 70 385 699572522
COASTAL FLOODING/EROSION Coastal Flood 0 0 20151724
COASTAL FLOOD Coastal Flood 3 2 254258558
COASTAL FLOODING Coastal Flood 3 0 137868181
COASTAL FLOODING/EROSION Coastal Flood 0 5 26909269
EROSION/CSTL FLOOD Coastal Flood 0 0 23001166
COLD Cold/Wind Chill 18 12 746223
COLD AND SNOW Cold/Wind Chill 14 0 0
COLD TEMPERATURE Cold/Wind Chill 2 0 0
COLD WEATHER Cold/Wind Chill 2 0 0
COLD/WIND CHILL Cold/Wind Chill 95 12 2556986
EXTENDED COLD Cold/Wind Chill 1 0 141983
UNSEASONABLE COLD Cold/Wind Chill 0 0 7241108
UNSEASONABLY COLD Cold/Wind Chill 0 0 34547182
DENSE FOG Dense Fog 9 143 7487943
FOG Dense Fog 60 712 16932905
DENSE SMOKE Dense Smoke 0 0 91028
DROUGHT Drought 0 4 16995641241
DUST DEVIL Dust Devil 2 39 698760
DUST STORM Dust Storm 11 376 8906423
EXCESSIVE HEAT Excessive Heat 1797 6391 461214640
EXTREME COLD Extreme Cold/Wind Chill 115 79 1785714227
EXTREME COLD/WIND CHILL Extreme Cold/Wind Chill 125 24 8846980
EXTREME WINDCHILL Extreme Cold/Wind Chill 17 5 20317704
FLASH FLOOD Flash Flood 1774 3348 36550844317
FLASH FLOOD/FLOOD Flash Flood 0 0 6699
FLOOD/FLASH/FLOOD Flash Flood 0 0 13398
FLOOD Flood 414 6758 37981905773
RIVER FLOOD Flood 1 0 30024208
RIVER FLOODING Flood 1 1 190505029
URBAN/SML STREAM FLD Flood 28 79 90203132
FREEZING FOG Freezing Fog 0 0 2169617
AGRICULTURAL FREEZE Frost/Freeze 0 0 39760170
DAMAGING FREEZE Frost/Freeze 0 0 59817230
EARLY FROST Frost/Freeze 0 0 59632653
FREEZE Frost/Freeze 0 0 205807146
FROST Frost/Freeze 1 3 20694
FROST/FREEZE Frost/Freeze 0 0 1064299095
HARD FREEZE Frost/Freeze 0 0 17796884
FUNNEL CLOUD Funnel Cloud 0 1 150893
HAIL Hail 7 713 19033928551
SMALL HAIL Hail 0 10 27370603
HEAT Heat 237 1222 1679745
HEAVY RAIN Heavy Rain 94 230 1461957054
RAIN Heavy Rain 0 0 705797
TORRENTIAL RAINFALL Heavy Rain 0 4 0
UNSEASONAL RAIN Heavy Rain 0 0 13416003
BLOWING SNOW Heavy Snow 1 1 20694
EXCESSIVE SNOW Heavy Snow 0 2 2592421
FALLING SNOW/ICE Heavy Snow 1 1 0
HEAVY SNOW Heavy Snow 106 698 890910018
HEAVY SNOW SHOWER Heavy Snow 0 2 14198
LAKE EFFECT SNOW Heavy Snow 0 0 91238
LATE SEASON SNOW Heavy Snow 0 0 219975
RAIN/SNOW Heavy Snow 4 2 0
SNOW Heavy Snow 2 12 3382057
SNOW AND ICE Heavy Snow 1 0 0
SNOW SQUALL Heavy Snow 2 35 41388
SNOW SQUALLS Heavy Snow 1 0 94364
HAZARDOUS SURF High Surf 0 1 0
HEAVY RAIN/HIGH SURF High Surf 0 0 21297376
HEAVY SURF High Surf 6 40 1796318
HEAVY SURF AND WIND High Surf 3 0 0
HEAVY SURF/HIGH SURF High Surf 42 48 10943188
HIGH SURF High Surf 90 150 83258124
ROUGH SURF High Surf 4 1 14198
GUSTY WIND High Wind 1 1 497705
GUSTY WIND/HAIL High Wind 0 0 26795
GUSTY WIND/HVY RAIN High Wind 0 0 2840
GUSTY WIND/RAIN High Wind 0 0 2840
GUSTY WINDS High Wind 3 11 2011203
HIGH WIND High Wind 235 1083 6209445781
HIGH WIND (G40) High Wind 0 0 24182
HIGH WINDS High Wind 0 0 689802
NON TSTM WIND High Wind 0 1 0
NON-SEVERE WIND DAMAGE High Wind 0 7 6416
NON-TSTM WIND High Wind 0 0 48883
WIND High Wind 18 84 3202952
WIND AND WAVE High Wind 0 0 1339752
WIND DAMAGE High Wind 0 0 14198
WINDS High Wind 1 0 0
HURRICANE Hurricane (Typhoon) 61 46 18479180880
HURRICANE EDOUARD Hurricane (Typhoon) 0 2 0
HURRICANE/TYPHOON Hurricane (Typhoon) 64 1275 71610691192
TYPHOON Hurricane (Typhoon) 0 5 829217521
FREEZING DRIZZLE Ice Storm 2 13 146869
FREEZING RAIN Ice Storm 2 0 843327
FREEZING SPRAY Ice Storm 1 0 0
ICE STORM Ice Storm 82 318 4311045550
LIGHT FREEZING RAIN Ice Storm 0 0 582967
LAKE-EFFECT SNOW Lake-Effect Snow 0 0 37671706
LAKESHORE FLOOD Lakeshore Flood 0 0 7536497
LIGHTNING Lightning 650 4140 822642394
MARINE HAIL Marine Hail 0 0 3772
MARINE HIGH WIND Marine High Wind 1 1 1279355
MARINE STRONG WIND Marine Strong Wind 14 22 415146
MARINE THUNDERSTORM WIND Marine Thunderstorm Wind 10 26 467662
MARINE TSTM WIND Marine Thunderstorm Wind 9 8 5763371
RIP CURRENT Rip Current 340 209 993
RIP CURRENTS Rip Current 202 294 195221
SEICHE Seiche 0 0 1256720
STORM SURGE Storm Surge/Tide 2 37 42094551052
STORM SURGE/TIDE Storm Surge/Tide 11 5 4379147426
STRONG WIND Strong Wind 103 278 244869801
STRONG WINDS Strong Wind 7 21 2923581
THUNDERSTORM Thunderstorm Wind 1 1 0
THUNDERSTORM WIND Thunderstorm Wind 130 1400 3668176912
THUNDERSTORM WIND (G40) Thunderstorm Wind 1 0 0
TSTM WIND Thunderstorm Wind 482 7258 12426410326
TSTM WIND (G45) Thunderstorm Wind 0 0 6717
TSTM WIND (41) Thunderstorm Wind 0 0 11037
TSTM WIND (G35) Thunderstorm Wind 1 0 40303
TSTM WIND (G40) Thunderstorm Wind 0 1 60112
TSTM WIND (G45) Thunderstorm Wind 0 6 599604
TSTM WIND 40 Thunderstorm Wind 0 0 1380
TSTM WIND 45 Thunderstorm Wind 0 0 13796
TSTM WIND AND LIGHTNING Thunderstorm Wind 0 0 110368
TSTM WIND G45 Thunderstorm Wind 0 0 1283
TSTM WIND/HAIL Thunderstorm Wind 5 95 142016694
TORNADO Tornado 1511 20667 27086010017
TROPICAL DEPRESSION Tropical Depression 0 0 1683602
TROPICAL STORM Tropical Storm 57 338 9880855441
TSUNAMI Tsunami 33 129 142760381
ASTRONOMICAL HIGH TIDE Unmapped! 0 0 10334066
BEACH EROSION Unmapped! 0 0 141983
BLACK ICE Unmapped! 1 24 0
BLOWING DUST Unmapped! 0 0 24442
COASTAL EROSION Unmapped! 0 0 1029081
COASTAL STORM Unmapped! 3 2 70991
COASTALSTORM Unmapped! 1 0 0
DAM BREAK Unmapped! 0 0 1286027
DOWNBURST Unmapped! 0 0 2840
DROWNING Unmapped! 1 0 0
DRY MICROBURST Unmapped! 3 25 2313627
GLAZE Unmapped! 1 212 208391
GRADIENT WIND Unmapped! 0 0 51045
HEAT WAVE Unmapped! 0 70 0
HEAVY SEAS Unmapped! 1 0 0
HIGH SEAS Unmapped! 3 7 20096
HIGH SWELLS Unmapped! 1 0 6898
HIGH WATER Unmapped! 3 0 0
HYPERTHERMIA/EXPOSURE Unmapped! 1 0 0
HYPOTHERMIA/EXPOSURE Unmapped! 7 0 0
ICE JAM FLOOD (MINOR Unmapped! 0 0 1420
ICE ON ROAD Unmapped! 1 0 0
ICE ROADS Unmapped! 0 1 16077
ICY ROADS Unmapped! 4 22 446548
LANDSLIDE Unmapped! 37 52 333354235
LANDSLIDES Unmapped! 1 1 6898
LANDSLUMP Unmapped! 0 0 809300
LANDSPOUT Unmapped! 0 0 9378
MARINE ACCIDENT Unmapped! 1 2 70991
MICROBURST Unmapped! 0 0 28397
MIXED PRECIP Unmapped! 2 26 0
MIXED PRECIPITATION Unmapped! 0 0 1005462
MUD SLIDE Unmapped! 0 0 122346
MUDSLIDE Unmapped! 4 2 1497051
MUDSLIDES Unmapped! 1 0 0
OTHER Unmapped! 0 4 1505159
RECORD HEAT Unmapped! 2 0 0
ROCK SLIDE Unmapped! 0 0 201517
ROGUE WAVE Unmapped! 0 2 0
ROUGH SEAS Unmapped! 8 5 0
TIDAL FLOODING Unmapped! 0 1 18229
UNSEASONABLY WARM Unmapped! 0 17 13434
WARM WEATHER Unmapped! 0 2 0
WET MICROBURST Unmapped! 0 0 47563
WHIRLWIND Unmapped! 1 0 15444
VOLCANIC ASH Volcanic Ash 0 0 611041
WATERSPOUT Waterspout 2 2 5884591
BRUSH FIRE Wildfire 0 2 0
WILD/FOREST FIRE Wildfire 12 545 4019565607
WILDFIRE Wildfire 75 911 5235566860
WINTER STORM Winter Storm 190 1287 1582133030
LIGHT SNOW Winter Weather 1 2 3298462
LIGHT SNOWFALL Winter Weather 0 0 120685
WINTER WEATHER Winter Weather 33 343 35302095
WINTER WEATHER MIX Winter Weather 0 68 71530
WINTER WEATHER/MIX Winter Weather 28 72 6400452
WINTRY MIX Winter Weather 1 77 16984

Impute remaining unmapped types with given event type in the storm database, so that all rows have content in event.type.

evmap[is.na(evmap$event.type),][,2] <- evmap[is.na(evmap$event.type),][,1]

Finally, aggregate the measures again, now only around the newly mapped event type values.

PrepareResult()
PrepareResultAgr("event.type")

Results

Fatality and injury reasons
Most harmful events with respect to human health between 1996-03-01 and 2011-11-30 are found as shown below. First, fatalities are summed by reason into fat. Event types ordered by their number of fatalities.

fat <- melt(head(result.agr.tot[order(-result.agr.tot$FATALITIES, result.agr.tot$event.type), 1:2], 12),
            id="event.type")
fat$event.type <- factor(fat$event.type, levels=fat[order(fat$value), "event.type"]) 

A simple barplot for the 12 event types causing the most fatalities:

par(mar=c(5.1,10.1,4.1,2.1), las=1) ## Expand left margin and set horiz. axis labels
bp <- barplot(rev(fat$value), 
              main="Fatalities, total, top 12 reasons 1996-2011", 
              names=rev(fat$event.type), 
              col="black",
              border=NA,
              horiz=T, 
              xlim=c(0,max(fat$value)*1.25)
)
text(x=fat$value+100, y=rev(bp), labels=fat$value, cex=0.8)

Similar aggregation for injury reasons:

inj <- melt(head(result.agr.tot[ order(-result.agr.tot$INJURIES, result.agr.tot$event.type), c(1,3)], 12),
            id="event.type")
inj$event.type <- factor(inj$event.type, levels=inj[order(inj$value), "event.type"])

Plotting the 12 event types with the most injuries:

## Plot injuries
par(mar=c(5.1,10.1,4.1,2.1), las=1) 
bp <- barplot(rev(inj$value), 
              main="Injuries, total, top 12 reasons 1996-2011", 
              names=rev(inj$event.type), 
              col="#cc3333",
              border=NA,
              horiz=T, 
              xlim=c(0,max(inj$value)*1.25)
)
text(x=inj$value+1300, y=rev(bp), labels=inj$value, cex=0.8)

The period’s most harmful event types with respect to human health are excessive heat, flash floods and tornadoes ranked descending by number of fatalities. Regarding injuries, tornadoes top that list ahead of thunderstorm winds and floods.

Economic cost reasons

# Greatest economic consequences, total 
econ <- melt(head(result.agr.tot[ order(-result.agr.tot$total.cost, result.agr.tot$event.type), c(1,4)], 12),
             id="event.type")
econ$event.type <- factor(econ$event.type, levels=econ[order(econ$value), "event.type"])

Plotting the 12 most costly event types as ranked by the total of property and crop damage costs.

## Plot economic consequences
par(mar=c(5.1,10.1,4.1,2.1), las=1) 
bp <- barplot(rev(econ$value), 
              main="Economic cost, total, top 12 reasons 1996-2011", 
              names=rev(econ$event.type), 
              col="dodgerblue3",
              border=NA,
              horiz=TRUE, 
              xlim=c(0,max(econ$value)*1.25),
              xlab="US Dollars, billions, 2011 value",
              axes=FALSE
              
)
text(x=econ$value+5.5e09, y=rev(bp), labels=paste("$",round(econ$value/1000000000,1),"B", sep=""), cex=0.8) ##Bar value labels
pts <- pretty(econ$value)
axis(1, at = pts, labels = paste("$",round(pts/1e09,0),"B", sep = ""))

The most economically damaging weather event types in the U.S. 1996-2011 were hurricanes, storm surge/tide and floods/flash floods.


Session information

I work under a different locale/collate, but have developed this using the C default locale. Key system information:

devtools::session_info()
## Session info --------------------------------------------------------------
##  setting  value                       
##  version  R version 3.1.3 (2015-03-09)
##  system   i386, mingw32               
##  ui       RTerm                       
##  language (EN)                        
##  collate  C                           
##  tz       Europe/Berlin
## Packages ------------------------------------------------------------------
##  package    * version date       source        
##  Rcpp       * 0.11.6  2015-05-01 CRAN (R 3.1.3)
##  devtools   * 1.7.0   2015-01-17 CRAN (R 3.1.3)
##  digest     * 0.6.8   2014-12-31 CRAN (R 3.1.3)
##  evaluate   * 0.7     2015-04-21 CRAN (R 3.1.3)
##  formatR    * 1.2     2015-04-21 CRAN (R 3.1.3)
##  highr      * 0.5     2015-04-21 CRAN (R 3.1.3)
##  htmltools  * 0.2.6   2014-09-08 CRAN (R 3.1.3)
##  knitr        1.10.5  2015-05-06 CRAN (R 3.1.3)
##  plyr       * 1.8.2   2015-04-21 CRAN (R 3.1.3)
##  rJava        0.9-6   2013-12-24 CRAN (R 3.1.3)
##  reshape2     1.4.1   2014-12-06 CRAN (R 3.1.3)
##  rmarkdown  * 0.7     2015-06-13 CRAN (R 3.1.3)
##  rstudioapi * 0.3.1   2015-04-07 CRAN (R 3.1.3)
##  stringr    * 0.6.2   2012-12-06 CRAN (R 3.1.3)
##  xlsx         0.5.7   2014-08-02 CRAN (R 3.1.3)
##  xlsxjars     0.6.1   2014-08-22 CRAN (R 3.1.3)
##  yaml       * 2.1.13  2014-06-12 CRAN (R 3.1.3)