NOAA Storm Data: Health vs. Economic Loss by Event Type and Event Type Class

Synopsis

NOAA’s storm data archive at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 is extremely large and messy, a heritage archive from what appear to be many different sources with many of the field reporters untrained or poorly trained in the protocol for reporting weather events, with some observations copied directly from field reports and others apparently transcribed long after the fact.

After the initial data were downloaded, an iteratively improved recoding protocol converted EVTYPE, the event type code, into StandardEventType, a code initially based on NOAA’s 48 categories of weather events, with 9 more types added, and 2 additional types for observations which should be excluded from the analysis either because they were not weather events or because they could not be definitely attributed to any of the 57 standard event types.

57 standard event types was still too large a set to produce a clear answer to the initial questions about the most significant consequences for human health/safety and economic property/production, so the 57 types were organized into 17 “event type classes,” which allowed a quick, robust exploration.

The concept of “harm” also required some definition and assembly. First, the NOAA database included some numbers recorded in nonstandard formats which needed regularization. Next, the variables “Casualties” (death+injury) and TotalEconomicDamage (property loss + crop loss) were created to measure health and economic issues respectively.

Most importantly, because harmful is a multiple-meaning term, more than one definition of “most harmful to health” and “greatest economic harm” was possible; this investigation looked at four of them:
•greatest likelihood that an individual event of given type/class would cause some harm
•total cumulative historical harm done by events of given type/class
•mean harm per individual event of given type/class, given that it did some harm
•estimated probability and maximum harm by individual event of given type/class for ten, one hundred, and one thousand years into the future.

Within some event type classes, the great majority of the harm was done by one single standard event type, whereas in other event type classes many different standard event types contributed significant harms, so data was summarized by either event type class or standard event type, whichever seemed more relevant and informative.

Overall, the analysis yielded these results:
• most weather events are harmless
• high overall historical harms generally associate with:
> •high probability of harm for single incidents
> •high frequency of event types
> •high mean damage per single incident
• some weather events seem to be disproportionately reported mainly when they are harmful
• tornados have historically caused by far the most casualties
• hurricanes, floods, tornados, and storm surges are historically the most economically destructive
• mean casualities for an individual event is worst by far for heat waves
• mean economic damage for an individual event is worst for hurricanes
• the worst of the worst-case scenarios for casualties is freezing fog
• the worst of the worst-case scenarios for economic damage are hurricanes

Housekeeping and downloading

housekeeping; load necessary packages

suppressPackageStartupMessages(library(data.table))
suppressPackageStartupMessages(library(datasets))
suppressPackageStartupMessages(library(plyr))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(GGally))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(xtable))

download, read, and correct StormData

WeatherData<-tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              WeatherData,
              "curl")
StormData<-data.table()
StormData<-read.csv(bzfile(WeatherData))
##Correct known miscoded entry. Earlier work with data and notes by other analysts
##revealed this one particular entry was off by a factor of 1000. Correction:
StormData[605953,26]<-as.factor("M")

Data Processing

Initial inspection of independent variable EVTYPE

Since the assignment is to analyze the effect of the independent variable “type of event,” coded in the original data as EVTYPE, on variables that measure health outcomes and economic outcomes, the first step was to make sure the dataset was tidy, clearly coded, and useful with respect to that independent variable, either by confirming that EVTYPE could be used as is, or by outlining the transformations needed to derive an accurate, tidy, clearly coded, and useful variable from EVTYPE.

NumberOfEVTYPEs<-length(unique(StormData$EVTYPE)) #how many EVTYPES are there?
EVTYPE_frequencies<-summary(StormData$EVTYPE,maxsum=NumberOfEVTYPEs) #Counts of how often different EVTYPES occur
TypesWithMoreThan.001OfObs<-EVTYPE_frequencies>=(length(StormData$EVTYPE)/1000) #Logic vector designates  EVTYPEs that account for at least 0.1% of obs
CatalogTypesWithMoreThan.001OfObs<-EVTYPE_frequencies[TypesWithMoreThan.001OfObs]
EVTYPECutoffFrequency<-ceiling(length(StormData$EVTYPE)/1000)
#0.1% (or 1/1000) is an arbitrary criterion;
#there  should only be fewer than 100 EVYTPEs according to NOAA documentation,
#so a type that occurs less th an 1/10 as much as that is probably not relevant
#to questions about most damage
#
NumberOfTypesWithMoreThan.001OfObs<-sum(TypesWithMoreThan.001OfObs) #Counts EVTYPEs that account for at least 0.1% of obs
FractionOfTypesWithMoreThan.001OfObs<-NumberOfTypesWithMoreThan.001OfObs/NumberOfEVTYPEs #Proportion of all EVTYPEs that account for at least 0.1% of obs
SingleMostCommonType<-EVTYPE_frequencies[EVTYPE_frequencies==max(EVTYPE_frequencies)] #Identifies EVTYPE accounting for most entries
FractionOfSingleMostCommonType<-SingleMostCommonType/length(StormData$EVTYPE) #Proportion of the single EVTYPE accounting for most entries

Using length(unique(StormData$EVTYPE) to get an idea of the scope of the coding problem revealed that EVTYPE contained 985 different type names, far too many to process meaningfully. Frequency counts revealed that out of those 985, only 31 occurred 902.3 or more times (i.e. represented more than 1/1000 of the total data), which is to say 97% of the possible values of EVTYPE represented less than a thousandth of total observations. Inspection via View(StormData) showed numerous apparent unique codings, variant spellings, and so forth leading to many single- or few- member types.

On the other hand, the single type HAIL, with 288661 observations, represented 32.0% of all entries — almost one third.

The two possible choices here were to throw out all the data for which the EVTYPE was not clearly already standardized, or to recode EVTYPE into a cleaner, better-restricted variable. Recoding seemed like much the better choice, as reading through the EVTYPE variable, nearly all the nonstandard and less-common values clearly could be coded into one of the categories found in MacAloney, B. “Operations and Services Performance, NWSPD 10-16, Storm Data Preparation.” National Weather Service Instruction 10-1605.
Washington: 17 Aug 2007. pp. 18-94. Retrieved from https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf on 12 Nov 2015.

Recoding EVTYPE into StandardEventType

StandardEventType was a new variable intended to combine data from EVTYPE into more manageable and standardized form. Initially it began from the list of 48 event types found in MacAloney, B. “Operations and Services Performance, NWSPD 10-16, Storm Data Preparation.” National Weather Service Instruction 10-1605. Washington: 17 Aug 2007. pp. 18-94. Retrieved from https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

EVTYPE included a very sizable number of nonstandard entries that could have been coded in more than one type. For example, many thunderstorm reports also included mention of hail, lightning, funnel clouds, wind in general, thunderstorm winds in particular, and heavy rain. Furthermore, some types were highly specific, like FLASH FLOOD and COASTAL FLOOD, and others were more generic, like FLOOD. Should the recoded types reflect every category into which an observation might fall?

For this study, each observation is placed in just one event type, because:
• That is all that is needed to answer the analytic questions asked.
• Structures of multiple types are much harder to interpret.

Priorities for recoding were set as:
1. Each observation should be placed in the narrowest available type, e.g. prefer coding StandardEventType as FLASH FLOOD to coding it as FLOOD.
2. Each observation should be coded as a NOAA type if possible, and as one of the types invented for this study only if no NOAA type was applicable,e.g. prefer coding StandardEventType as DENSE FOG to coding it as FOG.
3. Coding must be based strictly on a parsing (with grepl) of EVYPE for keywords, to ensure replicability; e.g. although it’s quite clear to any human reader that EVTYPE = UNSEASONABLE HEAT should not be coded as the invented StandardEventType “Seas” (intended to code EVTYPEs like HEAVY SEAS and ROUGH SEAS), as a naive grepl-based recoding will try to do, this must be accomplished by code, not by human inspection and adjustment. Another example: some midwestern reporters call torndaoes “cyclones” so it was important to check for “tropical cyclone” (and assign it to “Hurricane/Typhoon”) before checking for “cyclone” (which was assigned to “Tornado.”

The priorities were enforced by a procedural method:
1. Keyword tests for narrow types were tested before keyword tests for broader types
2. Keyword tests for NOAA types were tested before keyword tests for created types
3. Both to speed up processing and to enforce Priorities 1 and 2, the first type that could be assigned would be recorded, and evaluation would move immediately to the next observation. This saved many million trips through loops over the course of the whole dataset.

How recoding comparison list was developed

Initially the comparison list for grepl matching was set up from the 48 NOAA categories and their common thesaurus synonyms for each StandardEventType. This first version was able to assign just over 2/3 of all the observations to types; the rest were marked as “Unassigned.”

In the second pass, skimming through the 900+ unassigned EVTYPES in the unique vector quickly showed that there were several recurring themes for which no NOAA type was appropriate, so 5 types were added for those. 4 more types were added as catchalls (e.g. Rain, Wind) to be placed at the bottom of the list. Another type was added for “Nonevent” — rows that were not actually observations of an event, such as monthly summaries.

This second pass left about 300 or so remaining Unassigned EVTYPES in the unique list. Most of them were misspellings, and some were oddly colorful ways of describing things. For the third pass, misspellings (most misspelled word: lightning) and more vivid language (e.g. “torrential cloudburst”) were simply pasted into the comparison list for the appropriate types. This final version of the selection process leaves only a few EVTYPESs with StandardEventType = Unassigned, and most of them were Unassigned because they were genuinely unsassignable based on EVTYPE: partial and poorly written reports.

Given that including poor quality observations was not desirable anyway, and the small number of them left, it was decided to just discard the few observations to which the unassignable EVTYPEs corresponded, and finalize the recoding standardized list and protocol in this code:

#### Recoding EVTYPE into StandardEventType

##Input list for EVTYPES Conversion to StandardEvTypes
StandardizedVectorList<-list(
as.character(c("Microburst","micro-burst","micro burst","micoburst")),
as.character(c("Turbulence")),
as.character(c("Nonevent","summary","monthly","yearly","Summary","RED FLAG CRITERIA")),
as.character(c("Debris Flow","landslide","land slide","land-slide","mudslide","mud slide","mud-slide","rockslide","rock slide","rock-slide","rockfall","rock fall","rock-fall","hillside slump","slope collapse","mud flow","mud-flow","mudflow","MUDSLIDES","Landslump")),
as.character(c("Heavy Snow","snowstorm","snow storm","snow fall","snowfall","excessive snow","snow load","HEAVY SNOW/SLEET")),
as.character(c("Heavy Rain","excessive rain","heavy precipitation","downpour","cloudburst","rainstorm","rain-storm","rain storm","extreme rain","abnormally wet","excessive precipitation","excessive wetness","extremely wet","downburst","HEAVY RAINFALL","Heavy rain","TSTM HEAVY RAIN","HEAVY PRECIPATATION","HEAVY SHOWER","Metro Storm","RECORD PRECIPITATION","torrential")),
as.character(c("Coastal Flood","Coastal")),
as.character(c("Blizzard","blowing snow")),
as.character(c("Ice Storm","freezing rain","freezing sleet","black ice","ice load","freezing drizzle","ICY ROADS","ICE ROADS","PATCHY ICE","ICE ON ROAD")),
as.character(c("Frost/Freeze","frost","freeze")),
as.character(c("Flash Flood","sudden flood","RAPIDLY RISING WATER","FLASH FLOOODING")),
as.character(c("Sleet","ICE PELLETS")),
as.character(c("Funnel Cloud","funnel","WALL CLOUD","gustnado","LANDSPOUT")),
as.character(c("Lightning","lightening","lighning","ligtning","lighting","LIGNTNING")),
as.character(c("Waterspout","water spout","water-spout","wayterspout")),
as.character(c("Hurricane (Typhoon)","hurricane","typhoon","Tropical Cyclone","typhon","REMNANTS OF FLOYD")),
as.character(c("Dense Fog","Fog")),
as.character(c("Drought","no rainfall","dry","lack of precipitation","no precipitation","low moisture","driest","BELOW NORMAL PRECIPITATION","RED FLAG FIRE WX")),
as.character(c("Wildfire","WILD FIRE","forest fire","range fire","prairie fire","brush fire","grass fire","uncontrolled fire","crown fire","scrub fire")),
as.character(c("Winter Storm","HEAVY MIX")),
as.character(c("High Surf","HEAVY SURF","Heavy Surf","ROUGH SURF","HAZARDOUS SURF","ROGUE WAVE")),
as.character(c("Tropical Storm")),
as.character(c("Astronomical Low Tide","astronomical","low water","low tide","low-tide","ebb")),
as.character(c("Excessive Heat","extreme high temp","RECORD HIGH TEMPERATURE","RECORD HIGH","RECORD TEMPERATURE","Temperature record","Record temperature","Record High")),
as.character(c("Extreme Cold/Wind Chill","Extreme Cold","extreme low temp","RECORD SNOW/COLD","RECORD LOW","RECORD COOL")),
as.character(c("Volcanic Ash","Volcan")),
as.character(c("Lake-Effect Snow","lake-effect"," lake effect")),
as.character(c("Dust Storm","blowing dust","blowing sand","dusty wind","windborne dust","wind-borne dust","dirt on wind")),
as.character(c("Rip Current","undertow","undercurrent","rip-tide","tide-rip","cross-current","cross current","rip tide","riptide","crosstide","rip-current")),
as.character(c("Dense Smoke","Smoke")),
as.character(c("Dust Devil","Dustdevil","Dust-devil","dust devel")),
as.character(c("Freezing Fog"," freezing mist","freezing spray","glaze","freezing drizzle")),
as.character(c("Storm Surge/Tide","storm surge","storm-surge","stormsurge","storm tide","storm-tide","stormtide","STORM SURGE/TIDE","BEACH EROSIN","Beach Erosion","BEACH EROSION")),
as.character(c("Avalanche","avalache","snow slide","snowslide","snow-slide","AVALANCE")),
as.character(c("Marine Thunderstorm Wind")),
as.character(c("Tropical Depression")),
as.character(c("Marine Strong Wind")),
as.character(c("Marine High Wind")),
as.character(c("Marine Hail")),
as.character(c("Seiche","standing wave")),
as.character(c("Lakeshore Flood","lake shore","lake-shore","lakeshore")),
as.character(c("Tsunami","tidalwave","TIDAL WAVE","tidal-wave")),
as.character(c("Flood","URBAN/SMALL","URBAN AND SMALL","SMALL STREAM AND","URBAN/SMALL STREAM","HIGH WATER","URBAN SMALL","SMALL STREAM","URBAN/SML STREAM FLD","Sml Stream Fld","URBAN/SML STREAM FLDG","URBAN/SMALL STRM FLDG")),
as.character(c("Thunderstorm Wind","tstm wind","thunder-storm wind","thunderstorm wind","TSTM WIND")),
as.character(c("Heat","heat","hot","high temp","warm","UNSEASONABLY WARM")),
as.character(c("High Wind")),
as.character(c("Hail")),
as.character(c("Cold/Wind Chill","wind chill","cold","hyperthermia","hypothermia","cool spell","low temp","COLD WAVE","UNSEASONABLY COOL")),
as.character(c("Winter Weather","winter","wintry","MIXED PRECIP","Mixed Precipitation","MIXED PRECIPITATION","winter mix")),
as.character(c("Tornado","Tornadoe","TORNDAO","twister","cyclone")),
as.character(c("Strong Wind")),
as.character(c("Wind","WND")),
as.character(c("Snow","FIRST SNOW")),
as.character(c("Thunder-storm","Thunder Storm","thunderstorm","tstm","Tstorm","T-storm")),
as.character(c("Rain","wet")),
as.character(c("Tide")),
as.character(c("Current")),
as.character(c("Seas")),
as.character(c("Swell")),
as.character(c("Waves")),
as.character(c("Unassigned"))
      )

Rather than use the algorithm above to compute each of the 900,000+ EVTYPES into StandardEventType individually, which would have been very slow, I created an assignStandardType function which looks up StandardEventType in a table; this turned out to be surprisingly fast. In the code below, procedural priority #3 is implemented by having assignStandardType assign the FIRST plausible StandardEventType to a value of EVTYPE and then return to the code that called it. EVTYPEs that might fit under more than one category are assigned only to the first workable category assignStandardType finds. E.g. when narrow types like Flash Flood, Heavy Rain, and Extreme Heat are captured and the assignment made, processing moves to the next EVTYPE withouth checking other possibilities; only if the observation cannot fall into any of the narrower types is it assigned to the later, broader ones.

The number of event types listed under StdEvtType had now grown to 57 (plus Unassigned and Nonevent, which would be eliminated and not analyzed further; there were fewer than 150 of either of those left).

For a more wieldy analysis, it seemed a good idea to group the Standard Event Types into a smaller group of Standard Event Type Classes; for example, EventStandardType= Flash Flood, Coastal Flood, etc. were all included in EventStandardTypeClass=Flood; Snow, Blizzard, Ice Storm, etc. in Winter Weather; and so on. An assigning table was created for that purpose as well.

#create and apply assigning tables to classify all possible observations#
#Sets up a function, assignStandardType(EVTYPE), which assigns the 985 different values of EVTYPE into 57 StandardTypes for further analysis.
#Also establishes a Standard Class Table to assign the StandardTypes into an even smaller set of broad categories called StandardTypeClasses. Then
##Creates StdCodeAssigner, an nrow(UsedEVTYPES) by 3 data table with
#     Column1=all EVTYPEs used in the original StormData.
#     Column2=StandardTypes, output of assignStandardType(EVTYPE),
#           sortable, processable, and non-redundant codes that
#           generally follow the structure of NOAA Document's 48 types,
#           plus 9 catch-all types for situations recorded in EVTYPE
#                 that did not correspond to any of the NOAA types,
#           plus "Nonevent" for monthly and annual summaries that are not
#                 weather event reports
#           plus "Unassigned" for the completely uncodable.
#     Column3=StandardTypeClasses. 17 much broader Standard Type
#            Classes (plus Unassigned).
#            These are assigned by applying the Standard Class Table to Column 2.

#     Function assignStandardType takes an EVTYPE and assigns a
#     Standard Type to it, using the Standardized Vector List
#     for reference. IMPORTANT: It assigns the FIRST plausible categorizati
#     on; EVTYPES that might fit under more than one category are assigned
#     solely to the first workable category this function finds.
#
assignStandardType<-function(raweventtype) {
      StdType<-"Unassigned"
      CompareString<-NULL
      for (i in 1:length(StandardizedVectorList)) {
            for (j in 1:length(StandardizedVectorList[[i]])) {
                  CompareString<-StandardizedVectorList[[i]][j]
                        if(grepl(CompareString,raweventtype,ignore.case=TRUE)){
                  StdType<-StandardizedVectorList[[i]][1]}
            if(StdType!="Unassigned") break
                                                            }
      if(StdType!="Unassigned") break

                                                }
      return(StdType)
                                          }

# Create StdCodeAssigner data table and and pre-load it with default values of
#  "Unassigned" for StandardType and StandardTypeClass
UsedEVTYPES<-as.character(unique(StormData$EVTYPE))
StdCodeAssigner<-data.table()
UnassignedLoader<-character(length(UsedEVTYPES))
UnassignedLoader<-"Unassigned"
StdCodeAssigner<-cbind(UsedEVTYPES,UnassignedLoader,UnassignedLoader)
colnames(StdCodeAssigner)<-c("EVTYPE","StandardType","StandardTypeClass")
#Assign Standard Types (Column 2 of StCodeAssigner)
for (m in 1:nrow(StdCodeAssigner)) StdCodeAssigner[m,2]<-assignStandardType(StdCodeAssigner[m,1])

#Create and Load Standard Class Table
#Standard Class Table is based as much as possible on the big, broad,
##generic catch-alls at the end of the Standardized Type Code List.
##It assigns the 57 relatively narrow Standard Types into 17 much
##bigger Standard Type Classes.
      StandardClassTable<-data.table() #creates data table for Type Class assignments

      #pulling standard types out of Standard Type Vector List.
      StandardType<-character(length(StandardizedVectorList))
            for (k in 1:length(StandardizedVectorList)) {
                        StandardType[k]<-StandardizedVectorList[[k]][1]
                        }

      #lining up Types and Type Classes against each other
      StandardTypeClass<-c("Microburst","Turbulence","Nonevent","Slide","Winter weather","Rain","Flood","Winter weather","Winter weather","Cold","Flood","Winter weather","Vortex storms","Thunderstorm","Vortex storms","Hurricane/Tropical Storm","Fog","Drought","Fire","Winter weather","Waves, currents, tides","Hurricane/Tropical Storm","Waves, currents, tides","Heat","Cold","Volcano","Winter weather","Dust storm","Waves, currents, tides","Fire","Vortex storms","Winter weather","Waves, currents, tides","Slide","Thunderstorm","Hurricane/Tropical Storm","Wind","Wind","Hail","Waves, currents, tides","Flood","Waves, currents, tides","Flood","Thunderstorm","Heat","Wind","Hail","Cold","Winter weather","Vortex storms","Wind","Wind","Winter weather","Thunderstorm","Rain","Waves, currents, tides","Waves, currents, tides","Waves, currents, tides","Waves, currents, tides","Waves, currents, tides","Unassigned")
            StandardClassTable<-as.data.table(cbind(StandardType,StandardTypeClass))

#Use Standard Class Table to assign values for
#column 3 of StdCodeAssigner
            for (k in 1:nrow(StdCodeAssigner)) {
                  TypeToLookUp<-StdCodeAssigner[k,2]
                  ClassSelector<-StandardClassTable$StandardType==TypeToLookUp
                  StdCodeAssigner[k,3]<-StandardClassTable$StandardTypeClass[ClassSelector]
            }

Finally, with the necessary functions and vectors created, the two new variables, StandardEventType and StandardEventTypeClass were created and attached to the dataset. Because the analysis would be driven by those types and classes, I created a new dataset called TypedStormData at this point (typed meaning “having been assigned a type,” not “entered with a keyboard.”)

#Using StdCodeAssigner to look up and fill in Standard Event Codes and Standard Event Code Classes in the StormData dataset.
TypedStormData<-data.table()
RawEventType<-StormData$EVTYPE
StandardEventType<-character(length(RawEventType))
StandardEventTypeClass<-character(length(RawEventType))
StandardEventType<-"Not here yet" # convenient check that assignment process is complete
StandardEventTypeClass<-"Not here yet" # convenient check that assignment process is complete
for (i in 1:nrow(StdCodeAssigner)) {
                 EventType<-StdCodeAssigner[i,1]
                 StdEvt<-StdCodeAssigner[i,2]
                 StdClass<-StdCodeAssigner[i,3]
                 Placer<-RawEventType==EventType
                 StandardEventType[Placer]<-StdEvt
                 StandardEventTypeClass[Placer]<-StdClass
                                    }
TypedStormData<-cbind(StormData,StandardEventType,StandardEventTypeClass)  
  
#establishes selector vectors that will be used to cut down the dataset later
HasStandardTypeCode<-TypedStormData$StandardEventType!="Unassigned"
ActuallyAnEvent<-TypedStormData$StandardEventType!="Nonevent"

Regularizing dates and places as identifiers

Regularizing date as a time identifier

Because BGN_DATE is used for all event types as the date on which it either occurred or began, but per page 1 of MacAloney, END_DATE is only used for a few categories of long-duration events and since duration would not matter for the analysis here, I decided to use BGN_DATE as the time identifier, and to convert it to date form.

#convert begin date to use as the chronological identifier
TypedStormData$BGN_DATE<-as.character(TypedStormData$BGN_DATE)
TypedStormData$BGN_DATE<-as.Date(TypedStormData$BGN_DATE,format="%m/%d/%Y")

Regularizing place as an identifier

For convenience in human readability, and with an eye to fewest missing values, I simply chose to assemble variable called Place, whose format was County, State. This still left a fairly large number of observations for which the place identifier was just “,STATE” but as the analysis developed it turned out that location was not important for anything anyway.

#Establish standard format place name as "COUNTY, STATE"
      TypedStormData$Place<-paste(TypedStormData$COUNTYNAME,TypedStormData$STATE,sep=", ")

converting dollar damages to a uniform form

On page 12, MacAloney specifies that dollar damages to crops and property (CROP and PROP) are to be reported as a coefficient in the field PROPEXP or CROPEXP, and that a set of letter codes for exponents for a multiplier will be given in the fields CROPDMGEXP and PROPDMAGEXP; K for thousands, M for millions, and B for billions. Some reporters seem to have improvised H for hundreds; around a third of reporters did not specify an exponent, leaving the -DMGEXP field blank. Instead, they simply put the complete number in the coefficient field. Some possibly confused reporters put in ?, probably meaning that they had no idea, + (perhaps “more than?”) and “-” (because it was a loss? or “somewhat lower than?”).

Most puzzlingly, some of the field reporters entered numbers in the -DMGEXP variables; few of their entries were interpretable, but fortunately there were not many of them. Only 314 PROPDMGEXP were numeric, out of over 900,000 observations. Of these a handful made any sort of sense at all; many of the PROPDMG coefficients were 0, whereas others were in the 10s or 100s). I didn’t find anything in the National Weather Service report to make any more sense out of them, so I discarded them.

It may be significant that all of the numeric exponents originated in 1993-5, suggesting there was an attempt at that time to change the recording system, and apparently it failed.

The recoding situation was even less productive with crop damage. Numeric exponents just didn’t contain any clues as to what they were about. Again, almost all from the 1990s, but there were a much smaller range than the one for property damage. It didn’t seem likely that they were orders of magnitude, either. Therefore, with only 27 of these uninterpretable observations for crop damage it was an easy decision to throw them out.

In converting to standard dollar amounts I found it convenient to place the whole value in a single field.

#recode monetary losses into a uniform format
#system of vectors for interpreting exponent codes
#NULL was very frequently associated with DMG fields where the
#exponent was not used. +, -, 0, and digits were
#used very unreliably; one or two field observers may have been
#using exponent digits as in scientific notation, but it was not easy, perhaps
#not possible, to sort their work out from the people who were entering guesses
#or misinterpretations.
#
#Since the numbers of non-letter, non-blank exponent entries are so small, and
#not concentrated regionally or on particular storm types or classes,
#I just threw away those observations.
#
#Establish vectors for property and crop damage which
# identify all exponent variables that CAN be interpreted,
# #including letter codes, irrelevant(because coefficient=0), and null (because
# damage is given in ordinary not scientific notation).
      # Identify and exponent values in letter codes
PropInHundreds<-TypedStormData$PROPDMGEXP=="h" | TypedStormData$PROPDMGEXP == "H"
CropInHundreds<-TypedStormData$CROPDMGEXP=="h" | TypedStormData$CROPDMGEXP == "H"
PropInThousands<-TypedStormData$PROPDMGEXP=="k" | TypedStormData$PROPDMGEXP == "K"
CropInThousands<-TypedStormData$CROPDMGEXP=="k" | TypedStormData$CROPDMGEXP == "K"
PropInMillions<-TypedStormData$PROPDMGEXP=="m" | TypedStormData$PROPDMGEXP == "M"
CropInMillions<-TypedStormData$CROPDMGEXP=="m" | TypedStormData$CROPDMGEXP == "M"
PropInBillions<-TypedStormData$PROPDMGEXP=="b" | TypedStormData$PROPDMGEXP == "B"
CropInBillions<-TypedStormData$CROPDMGEXP=="b" | TypedStormData$CROPDMGEXP == "B"
      #identify cases where the exponent is irrelevant because zero
PropDmgZero<-TypedStormData$PROPDMG==0
CropDmgZero<-TypedStormData$CROPDMG==0
      # Identify exponent value=0, multiplier=1, because null
PropInOnes<-is.null(TypedStormData$PROPDMGEXP)
CropInOnes<-is.null(TypedStormData$CROPDMGEXP)

#sum the identifying vectors to create a single selector for all the usable values
PropHasAcceptableCodes<-PropInOnes|PropInHundreds|PropInThousands|PropInMillions|PropInBillions|PropDmgZero
CropHasAcceptableCodes<-CropInOnes|CropInHundreds|CropInThousands|CropInMillions|CropInBillions|CropDmgZero

#Data note: NULLS were converted to exponent=0, multiplier=1, because
#it was quite clear that by far the most common reason for omitting
#the exponent was either that there was no damage (coefficient=0) or
#that damage was being reported in numeric rather than scientific
#notation (exponent=1). However a few recorders did not follow NOAA
#coding protocol, entering 0 through 8, question marks, plus signs,
#and minus signs, and examination of the very small fraction of such
#observations showed no consistent way to interpret them.  Since they
#were less than 0.03% of either the crop or property damage cases, it
# was better to just to remove them.

##NULLS were converted to exponent=0, multiplier=1. Wherever recorders did not follow NOAA
##coding protocol by using codes 0 through 8, question marks, and plus and minus signs,
##those observations have been flagged with NA.
TypedStormData$PROPDMGEXP[!PropHasAcceptableCodes]<-NA
TypedStormData$CROPDMGEXP[!CropHasAcceptableCodes]<-NA

                  #PropsHasAcceptableCodes and CropsHasAcceptableCodes
                  # will also be used in the later
                  #module to cut down the dataset to a clean/workable version

# compile exponent code vectors into one exponent vector each for property and crops
PropertyExponentVector<-0*PropInOnes+2*PropInHundreds+3*PropInThousands+6*PropInMillions+9*PropInBillions
CropsExponentVector<-0*CropInOnes+2*CropInHundreds+3*CropInThousands+6*CropInMillions+9*CropInBillions

#convert exponent vectors into multipliers
PropDmgMult<-10^PropertyExponentVector
CropDmgMult<-10^CropsExponentVector

convert health and economic damage into single, uniform numbers

which will be the dependent variables in the analysis

The question did not ask for an analysis of the different kinds of harm; since both property damage and crop damage occur in dollar amounts, it was simple to add them together. Fatalities seem like an occasional extreme outcome of injuries, i.e. sometimes injuries happen to be so severe that people die of them, so it also seemed sensible to combine the two.

TypedStormData$NumericPropertyDamage<-TypedStormData$PROPDMG*PropDmgMult
TypedStormData$NumericCropDamage<-TypedStormData$CROPDMG*CropDmgMult
TypedStormData$TotalEconomicDamage<-TypedStormData$NumericPropertyDamage+TypedStormData$NumericCropDamage
TypedStormData$Casualties<-TypedStormData$INJURIES+TypedStormData$FATALITIES

Set up logs of injuries, fatalities, casualties, property damage, crop damage, and total damage

EDA revealed that statistics of property/crop damage extend across 11 orders of magnitute, impossible to graph.As is quite typical with large dollar losses, I found that logs of the damages were more informative. Because our money is in the decimal system, it was particularly easy to use log10 functions, so that values beginning with 3 corresponded to thousands, those with 6 to ten thousands, and so on.

###   Created log fields for further analysis.
##Log 10 was used for easy comparison of orders of magnitude
#logs of property, crop, and cumulative.

TypedStormData$LOGPROPDMG<-log10(TypedStormData$NumericPropertyDamage+1)
TypedStormData$LOGCROPDMG<-log10(TypedStormData$NumericCropDamage+1)
TypedStormData$LOGTOTALCOST<-log10(TypedStormData$TotalEconomicDamage+1)

#sets up logs of injuries, fatalities, and casualties.
TypedStormData$LogFatalities<-log10(TypedStormData$FATALITIES+1)
TypedStormData$LogInjuries<-log10(TypedStormData$INJURIES+1)
TypedStormData$LogCasualties<-log10(TypedStormData$INJURIES+TypedStormData$FATALITIES+1)

Prepare TypedStormData for Analysis

Two small steps completed data processing and prepared TypedStormData for analysis.

Remove Nonevents and Unassigned Types/Classes

Now that TypedStormData has been created with needed new recoded variables and cleaned, this chunk eliminates the partial and unsuitable observations: the ones that could not be assigned to a standard type, were non-events, or had uncodable property or crop damage.

RetainForAnalysis<-HasStandardTypeCode &
      ActuallyAnEvent &
      PropHasAcceptableCodes &
      CropHasAcceptableCodes
TypedStormData<-TypedStormData[RetainForAnalysis,]

set StandardEventType and StandardEventTypeClass as factors

Up till now it was more convenient to work with them as character variables but the analytic functions need them to be factors.

TypedStormData$StandardEventType<-as.factor(TypedStormData$StandardEventType)
TypedStormData$StandardEventTypeClass<-as.factor(TypedStormData$StandardEventTypeClass)

Analysis

Defining “most harmful” and “greatest economic consequences.”

There are four possible meanings of the questions as asked in the assignment, and different statistics are needed to answer each. This seems like a pretty good mirror of the situation actual policy-makers face, because any of the meanings might be the most relevant one for a particular situation, and because there are ineveitably tradeoffs between one kind of “harmful” and another (as well as between physical harms to people and harms to the economy).

The four possible meanings can be considered as rephrasings of the questions from the assignment.

What is the likelihood that an event of a given type (or class) will cause any health or economic harm?
Cumulatively and historically, which event types (and classes) have caused the most overall health and economic harm?
If an event is harmful, given its type (or class), what is the mean health or economic harm an individual event does?
What is the maximum health or economic harm an event of a given type (or class) is likely to cause within a specified future?

Any of these interpretations may be of interest to legislators, administrators, or other policy makers. If the great majority of a weather event cause no harm at all, or historically the total damage tends to be small in aggregate, or any one event is either small on average or has little potential to be really big, it can be accorded lower priority. But if nearly every time an event happens it hurts someone or something, or if over a long period of time its combination of high frequency and moderate harms means that the toll of victims or losses mounts up, or if there is a high mean harm from the harmful ones, or if the theoretical maximum is very large, those weather events may need addressing. The decision about which of these criteria should be used to make policy is properly the realm of the policy-makers, and so I present answers for all four interpretations.

“no hurt no foul” — In most weather events, no harm is done

Since this report is supposed to be about harms, the first definition of the question, as noted above, is whether a weather event is likely to do any harm at all.

#marking events as harmless or harmful
TypedStormData$NoHarm<-TypedStormData$Casualties==0 &
                  TypedStormData$TotalEconomicDamage==0
TypedStormData$PersonalHarm<-TypedStormData$Casualties>0
TypedStormData$EconomicHarm<-TypedStormData$TotalEconomicDamage>0
TypedStormData$BothHarm<-TypedStormData$Casualties>0 &
                        TypedStormData$TotalEconomicDamage>0
TypedStormData$Harmful[TypedStormData$NoHarm]<-"Harmless"
TypedStormData$Harmful[TypedStormData$PersonalHarm]<-"Casualties"
TypedStormData$Harmful[TypedStormData$EconomicHarm]<-"Economic loss"
TypedStormData$Harmful[TypedStormData$BothHarm]<-"Casualties and economic loss"
TypedStormData$Harmful<-as.factor(TypedStormData$Harmful)
HarmlessEvents<-sum(TypedStormData$NoHarm)
PercentHarmless<-100*HarmlessEvents/nrow(TypedStormData)
HarmlessnessBreakout<-as.matrix(summary(TypedStormData$Harmful),ncol=1)
HarmlessnessPercent<-100*round(HarmlessnessBreakout[,1]/sum(HarmlessnessBreakout[,1]),digits=3)
HarmlessnessBreakout<-cbind(HarmlessnessBreakout,HarmlessnessPercent)
colnames(HarmlessnessBreakout)<-c("Frequency in Dataset","Percent of Dataset")
Table1<-xtable(HarmlessnessBreakout, caption = "Frequency of Kinds of Harm Without Regard for Severity")
print(Table1,type="html",caption.placement="top")

Frequency of Kinds of Harm Without Regard for Severity
	Frequency in Dataset	Percent of Dataset
Casualties	9593.00	1.10
Casualties and economic loss	12313.00	1.40
Economic loss	232322.00	25.80
Harmless	647455.00	71.80

Of the 901683 events in TypedStormData, 647455 involved no harm to people or property — 71.8% of the events in the database. Also, economic loss occurs about an order of magnitude more frequently than casualties do.

This may be because weather events that do not cause any harm are much more likely to go unreported in some types and classes than in others. Also, some classes contain several strongly harmful types (for example, Floods or Hurricane/Tropical Storms; in other classes (for example, Waves, currents, tides or Vortex storms) one single type does nearly all the damage. All this can be seen in detail in Figure 1.

Figure 1: Proportions of Harmless (Blue), Casualties Only (Red), Property Loss Only (Yellow), or Casualty-and-Property-Loss (Orange) for Event Classes, broken out by Event Type

EventClassSet<-data.table()
EventClassesToShow<-c("Drought","Fog","Fire","Heat","Slide", #graphs can be narrow, little detail
                      "Rain","Hail", "Hurricane/Tropical Storm","Cold","Thunderstorm", 
                      "Vortex storms","Winter weather","Waves, currents, tides",
                      "Flood","Wind" #graphs must be wide, much detail
                      )
                        ##Cut out: Turbulence, Volcanic Ash, Microbursts, Dust Storms.
                        ## All cut out classes had < 450 cases, i.e. less than 1/2000 of the total.
NumClasses<-length(EventClassesToShow)
layout(matrix(c(1:15), nrow=5, ncol=3, byrow = FALSE),c(widths=1,2,3),heights=1)
for (e in 1:NumClasses) {
EventClassSet<-TypedStormData[TypedStormData$StandardEventTypeClass==EventClassesToShow[e],]
HarmsSummaryMatrix<-xtabs(~Harmful+StandardEventType,data=EventClassSet,drop.unused.levels = TRUE)
ClassBreakoutPlot<-barplot(HarmsSummaryMatrix,beside=TRUE,col=c("red","orange","yellow","blue"))
MainTitle<-paste("Class=",EventClassesToShow[e]," by Event Type & Harm",sep="")
XLabel<-paste("Standardized Event Types Included In ",EventClassesToShow[e],sep="")
YLabel<-"Number of events 1950-2011"
title(main = MainTitle, xlab = XLabel, ylab = YLabel)
ClassBreakoutPlot
}

Observation: The blue (harmless) bar is nearly always the tallest by far; the exceptions are for the classes “Hurricane/Tropical Storm” and “Vortex storm”, and the types “Avalanche” and “Rip Current.” For those types in particular I suspect that the event is only rarely reported if no one is hurt and no property is lost. At a guess, this might also be true for thunderstorms.

Historical cumulative damage and mean damage per damaging event; an interesting comparison

mow down dataset to just observations involving harm

The probability of doing harm for different event classes and types, as figured above, is only one way of answering the question “Which types of weather events cause the greatest health and economic harms?” Two other ways are linked in an interesting way mathematically, and we can learn more by comparing them than by considering them separately. These are questions 2 and 3 from above:
2. the sum of all historical damage per event class, and
3. mean damage per reported harmful incident.

The sum of all historical damage per event class simply says, in the whole 1950-2011 period of the study, which classes of events hurt/kill the most people, and which cost the most dollars, in total and overall? Technically this can be figured either on the whole database or on just the incidents with harm (since incidents with harm would only contribute zeros to the total). It is a very straightforward measure of what the real-life history shows.

Mean damage per reported harmful incident is a much trickier statistic. Since most incidents of most weather events do not result in any harm, most of the observations in the less than one third of all observations that result in harm are to some extent anomalous, just by being there in the first place. Secondly, judging by the structures observable in Figure 1, it seems very likely that some classes of incidents are reported only when there’s actual harm (Turbulence and Volcano, for example); some are reported sporadically when there’s no harm, but fairly heavily when there is (Slides, Thunderstorms, and Waves, currents, and tides would be examples of this); and still others almost every time they occur, whether there is harm or not (Hurricanes/Tropical Storms and Heat). There are even cases where the reporting seems to be very different from type to type within a class (in the class Vortex Storms, it looks like almost all tornadoes and waterspouts are reported, funnel clouds are reported often but with a strong bias toward harm, and dust devils only when there’s harm). Thus to discuss how much harm an incident of a given type does (if it does any at all),the best basis of comparison is to use only the cases where harm is reported; otherwise most analyses and results will be deeply influenced by the systematic patterns in the proportion of reported harmless cases.

Since sum and mean are going to be closely compared in this analysis, it is preferable for them to be from exactly the same data. Using only the observations with reported harms should make no difference in sums, but as noted above it will make a huge difference in means. Therefore, the dataset is reduced to just the cases with actual harms before sum and mean are calculated and analyzed.

## Sets a contingency: from here on we only look at observations where harm occurred
HarmsOnlyDataset<-data.table()
HarmsOnlyDataset<-TypedStormData[!TypedStormData$NoHarm,]

Compute log10s of sums and means and store in HarmsByEventClass dataset

The sum of casualties stretches across four orders of magnitude (almost five); the sum of economic damage stretches across ten (almost eleven). For the graph to be readable, it’s highly desirable to use the logs of the sums. Mean economic losses also extend across seven, almost eight, orders of magnitude, so it makes sense to report that statistic as a log as well. Although mean casualties by event class would fit fairly well onto a single graph as an unprocessed ordinary number, since it will be very useful to compare it both to mean economic damage and to cumulative casualties, it is also computed on a log scale.

The easiest log scale for comparison of money is log10, where 0-3 is less than a thousand, 3-6 is thousands, 6-9 millions, and 9-12 billions. For casualties, one might imagine that as a headline: a first digit of 0’s headline would be “Local Family Hurt in Car Wreck, Weather Blamed.” A first digit of 1’s headline might be “Dozens” (or “Scores”) Hurt By Weather Incident in (Regional City)." A first digit of 2 would prompt “Hundreds Hurt and Killed by Weather Incident in (National City).” And a first digit of 3 would trigger a headline like “(World City) Hit by Weather Incident, Thousands Hurt, Many Feared Dead.” Thus, log10 has a huge advantage in interpretability compared to natural or binary log scales: it is an easily interpretable indicator of what media attention is likely to be, from 0 indicating a minor local story and 3 or higher a major global story. For many policy makers, this is extremely important.

The sum of logs is not equal to the log of the sum, nor the mean of logs to the log of the means. In both cases it is the latter we want, so two functions were created to ensure getting the order of precedence right for every calculation, and then the data on harmful events were grouped and the log10(sum(EventClass)) and log10(mean(EventClass)) were figured for all event classes, for both Casualties and Total Economic Damage.

## Set up two needed functions
LogTenOfSum<-function(x) log10(sum(x)+1)
LogTenOfMean<-function(x) log10(mean(x)+1)
## group HarmsOnlyDataset as needed
HarmsOnlyDataset<-HarmsOnlyDataset  %>% group_by(StandardEventTypeClass)
#set up data table for log10sums and log10means of harms
HarmsByEventClass<-data.table()
HarmsByEventClass<-HarmsOnlyDataset  %>%  summarise_each(funs(LogTenOfSum,
                                                              LogTenOfMean),
                                                         Casualties,
                                                         TotalEconomicDamage)

Figure 2: Plots of Log10Sum and Log10Mean of Casualties (Red) and Economic Loss (Yellow) by Standard Event Type Class

StatsToShow<-c("Log10 of Sum of Casualties",
               "Log10 of Sum of Economic Damage",
               "Log10 of Mean of Casualties per Harmful Incident",
               "Log10 of Mean of Economic Damage per Harmful Incident"
                  )
ColorRotation<-c("red","yellow","red","yellow")
NumStats<-length(StatsToShow)

layout(matrix(c(1:4), nrow=4, ncol=1, byrow = FALSE),c(widths=c(1,1,1,1)),heights=c(2,3,2,3))
for (f in 1:NumStats) {
      g<-f+1
      EventClassHarmsOnly<-as.matrix(HarmsByEventClass[,g],"numeric",drop.unused.levels = TRUE)
      EventClassHarmsOnlyPlot<-barplot(EventClassHarmsOnly,beside=TRUE,col=ColorRotation[f],names.arg=HarmsByEventClass$StandardEventTypeClass,horiz=FALSE)
      MainTitle<-paste(StatsToShow[f]," by Event Type Class",sep="")
      XLabel<-"Standardized Event Type Classes"
      YLabel<-StatsToShow[f]
      title(main = MainTitle, xlab = XLabel, ylab = YLabel)
      EventClassHarmsOnlyPlot
}

Features of interest, cumulative of all incidents v. mean per reported single incident

There are many features of interest in the above graph. In general the line of the bars for sums follows the corresponding line of bars for means, indicating that weather events that tend to hurt a large number of people in any one incident are also weather events that tend to hurt many people in aggregate over time. But there are some notable divergences:

Health — low mean high sum Vortex storms (harms, per Figure 1, are almost entirely from tornadoes), Flood, Hail, Winter Weather, and Wind have low mean casualties but high total casualties. This suggests that although any one incident is likely to harm few or no people, there are so many incidents that they are a major contributor to overall health harm. Hail and tornadoes, of course, are among the most frequent of all event types and classes.
Health — high mean low sum Conversely, Dust Storms, Fog, and Heat all have relatively high mean casualties (so that individual incidents inflict greater health harm), but low cumulative casualties. This may be due to the wide-area nature of the threat; most of the time fog just slows traffic, but when a wide area has thick fog, there are numerous accidents. Usually heat waves are just uncomfortable but when they reach the point of making people sick or killing them, they can do so over a large part of the continent. Dust storms are uncommon nowadays, but they are wide area phenomena when they do happen, and perhaps are either very apt to cause respiratory distress or strongly promote accidents.
Economic damage- high mean low sum The Log10 mean graph shows that Turbulence, Volcano, Microburst, and Fog incidents all tend to be quite expensive, but the cumulative economic loss (Log10 sum graph) for all those classes is quite low. This suggests that rare events that are highly destructive contribute a relatively small proportion to overall losses (though they probably receive much more media attention!)
Economic damage —- low mean high sum On the Log10 mean graph it can be seen that Flood, Vortex Storm, and especially Thunderstorm tend to do cheap or no damage, yet on the Log10 sum graph it is clear that they are cumulatively very costly. This is probably a matter of frequency in part, but it is also interesting that each of those classes include some highly populated event types, like funnel cloud, dust devil, coastal flood, general flood, lightning, and general thunderstorm that are very low mean cost, and one or two types like tornado, flash flood, and thunderstorm winds that are extremely high cost per incident. These results may thus be an artifact of my classification system.

The fourth question: how bad could the worst be?

Planning and policy sometimes needs to consider the worst-case scenario. One familiar way of doing this for floods is to describe various magnitudes of floods as “hundred year” , “500 year”, or more generally “n year” in which n=the interval of years within which one flood of that magnitude could be expected to occur. There is an excellent discussion of what this means statistically at Perlman, Howard. “Floods: Recurrence intervals and 100-yearfloods.” USGS: Washington 2015. Retrieved from http://water.usgs.gov/edu/100yearflood.html on 15 Nov 2015.

The calculation method I have preferred to use is the following: the mean recurrence of a given weather event type per year is multiplied by the number of years in the period n, which results in W, an estimated number weather events of that type to be expected in the period. The probability of any one weather event of that type being the worst one in the period is obviously 1/W. Since we know the mean and standard deviation for severity from historical data, and p=1/W, we then simply compute the quantile for 1-p/W, and that will be the largest weather event of that type that can be expected within the interval.

Exploratory data analysis quickly showed that:
1. For nearly all reasonably frequent weather event types, log10 of severity (measured in casualties or in total economic damage) followed a normal distribution, which gave a very satisfactory and easy to interpret mean and standard deviation.
2. Event type consistently gave much better results than Event Type Class in this case, probably because within any given Event Type Class there are usually a diversity of Event Types with drastically different distributions of severity.
3. Event types with relatively low populations (under 1000 in frequency) often had very skewed distributions, as did event types with relatively low rates of harm, so it was decided that a few of the most interesting event types would be chosen. This led to creating Top 10 lists, which in turn proved to be an excellent way to report statistics in other categories (see Results, below).
4. Following the overall pattern of using powers of ten throughout this report, I decided to calculate severities for 10-, 100-, and 1000-year weather events of each type.

The last of these decisions has a natural interpretability for a policy-maker: a 10-year Weather Event occurs several times in an employee’s career, a 100-year not usually more than once, and a 1000-year may be taken to be “the very worst that could happen if you’re really unlucky.” Assuming career bureaucrats work (or pretend to) for 50 years, a 1000-year weather event is one that has a 95% chance of not occuring on the individual bureaucrat’s watch. The thousand-year weather event thus stands in very well as the “worst of the worst case scenarios”. A policy maker reviewing the results could thus interpret the three bars in each graph from left to right as, “An estimate of what the worst 2-5 you are certain to see might be like; an estimate of the worst that you are likely to see; and an estimate of the very worst you could see if you and the people you serve are very unlucky.”

10-, 100-, and 1000-year severities for selected weather event types of interest are presented in Figure 3.

Since Casualties for these graphs fall into a quite narrow range between 10 and 500, I chose to represent them as direct values rather than logs. Economic damage on the other hand occurred across an enormous range from millions to multiple trillions of dollars, and so it was left on a log scale. For both Casualties and Total Economic Damage, because low levels of harms were by definition irrelevant to what was being investigated, lower limits were set above 0 (10 for Casualties and $1 million for economic losses). A common upper limit was specified for graphs of each kind to facilitate recognition of just how differen the scales involved really are.

Figure 3 10-, 100-, and 1000- year Worst Case Scenarios for Casualties and Economic Damage

HarmsOnlyDataset<-HarmsOnlyDataset %>% group_by(StandardEventType) #Regroup By Event Type

#New DF for computing and storing Max Losses
MaxLosses<-data.table()
MaxLosses<-HarmsOnlyDataset %>%summarise_each(funs(mean,sd),LogCasualties,LOGTOTALCOST)
#Compute and attach incident counts
NumberHarmfulPerType<-xtabs(~HarmsOnlyDataset$StandardEventType,drop.unused.levels = TRUE)
MaxLosses$IncidentCount<-NumberHarmfulPerType

#convenient function for computing days from earliest to latest report of a Standard Event Type
dayRange<-function(x) 1+max(x)-min(x)

#Compute and attach day ranges for each Standard Event Type
HarmsOnlyDayRanges<-data.table()
HarmsOnlyDayRanges<-HarmsOnlyDataset %>%summarise_each(funs(dayRange),BGN_DATE)
MaxLosses$DayRangeOfIncidents<-as.numeric(HarmsOnlyDayRanges$BGN_DATE)

#Compute and attach incidents per day ratio
MaxLosses$IncidentsPerDay<-MaxLosses$IncidentCount/MaxLosses$DayRangeOfIncidents

#Three useful coefficients for further data manipulation
DaysPerTenYears<-as.numeric(dayRange(as.Date(c("2016/1/1","2026/1/1"))))
DaysPerHundredYears<-as.numeric(dayRange(as.Date(c("2016/1/1","2116/1/1"))))
DaysPerThousandYears<-as.numeric(dayRange(as.Date(c("2016/1/1","3016/1/1"))))

#Compute p's for quantile estimates
MaxLosses$TenYearP<-1/(MaxLosses$IncidentsPerDay*DaysPerTenYears)
MaxLosses$HundredYearP<-1/(MaxLosses$IncidentsPerDay*DaysPerHundredYears)
MaxLosses$ThousandYearP<-1/(MaxLosses$IncidentsPerDay*DaysPerThousandYears)

#Compute ten, hundred, and thousand year max quantile estimates for Log 10 of Casualties
#and Total Economic Damage
MaxLosses$MaxTenYrCasualties<-round(10^(qnorm(p=1-MaxLosses$TenYearP,
                                       mean=MaxLosses$LogCasualties_mean,
                                       sd=MaxLosses$LogCasualties_sd,
                                       lower.tail=TRUE,
                                       log.p=FALSE)))
MaxLosses$MaxHundredYrCasualties<-round(10^(qnorm(p=1-MaxLosses$HundredYearP,
                                    mean=MaxLosses$LogCasualties_mean,
                                    sd=MaxLosses$LogCasualties_sd,
                                    lower.tail=TRUE,
                                    log.p=FALSE)))
MaxLosses$MaxThousandYrCasualties<-round(10^(qnorm(p=1-MaxLosses$ThousandYearP,
                                    mean=MaxLosses$LogCasualties_mean,
                                    sd=MaxLosses$LogCasualties_sd,
                                    lower.tail=TRUE,
                                    log.p=FALSE)))
MaxLosses$MaxTenYrEconomicLoss<-qnorm(p=1-MaxLosses$TenYearP,
                                      mean=MaxLosses$LOGTOTALCOST_mean,
                                      sd=MaxLosses$LOGTOTALCOST_sd,
                                      lower.tail=TRUE,
                                      log.p=FALSE)
MaxLosses$MaxHundredYrEconomicLoss<-qnorm(p=1-MaxLosses$HundredYearP,
                                          mean=MaxLosses$LOGTOTALCOST_mean,
                                          sd=MaxLosses$LOGTOTALCOST_sd,
                                          lower.tail=TRUE,
                                          log.p=FALSE)
MaxLosses$MaxThousandYrEconomicLoss<-qnorm(p=1-MaxLosses$ThousandYearP,
                                           mean=MaxLosses$LOGTOTALCOST_mean,
                                           sd=MaxLosses$LOGTOTALCOST_sd,
                                           lower.tail=TRUE,
                                           log.p=FALSE)

##Set up selector vectors for Figure 3
WorstTypesHealth <- c("Freezing Fog", "Excessive Heat", "Heat", "Dense Fog", "Dust Storm", "Tsunami", "Tornado", "Hurricane (Typhoon)", "Blizzard", "Wildfire")
WorstTypesEconomic <- c("Hurricane (Typhoon)", "Sleet", "Drought", "Extreme Cold/Wind Chill", "Wildfire", "Blizzard", "Lightning", "Dense Fog", "Ice Storm", "High Surf")

NumHealthTypes<-length(WorstTypesHealth)
NumEconTypes<-length(WorstTypesEconomic)
layout(matrix(c(1:20), nrow=5, ncol=4, byrow = FALSE))
for (h in 1:NumHealthTypes) {
      EventTypeMatrix<-as.matrix(MaxLosses[MaxLosses$StandardEventType==WorstTypesHealth[h],12:14])
      colnames(EventTypeMatrix)<-c("10 yr","100 yr","1000 yr")
      WorstHealthPlot<-barplot(EventTypeMatrix,
                               beside=TRUE,
                               col=c("lightpink","pink","red"),
                               ylim=c(10,500))
      MainTitle<-paste("WCS Casualties ",WorstTypesHealth[h],sep="")
      XLabel<-"Worst Case Scenarios"
      YLabel<-"Casualties in Single Worst Incident"
      title(main = MainTitle, xlab = XLabel, ylab = YLabel)
      WorstHealthPlot
}
for (i in 1:NumEconTypes) {
      EventTypeMatrix<-as.matrix(MaxLosses[MaxLosses$StandardEventType==WorstTypesEconomic[i],15:17])
      colnames(EventTypeMatrix)<-c("10 yr","100 yr","1000 yr")
      WorstEconPlot<-barplot(EventTypeMatrix,
                             beside=TRUE,
                             col=c("lightyellow","yellow","yellowgreen"),
                             ylim=c(6,14))
      MainTitle<-paste("WCS $ Loss ",WorstTypesEconomic[i],sep="")
      XLabel<-"Worst Case Scenarios"
      YLabel<-"Log 10 of Total Economic Loss in Single Worst Incident"
      title(main = MainTitle, xlab = XLabel, ylab = YLabel)
      WorstEconPlot
}

In interpreting the graphs,
a steep rise across the three bars indicates a substantial risk of being surprised by a difference in size so large that it becomes a difference in kind. These, in other words, are the kind of events where a seemingly unprecedented-in-scale event could occur. It would be highly advisable to insure or reinsure against these “black swan prone” weather event types. Among such types are Tsunami, Excessive Heat, Hail, and Dust Storm for casualties, and High Wind, Extreme Cold, and Blizzard for total economic damage.
a gentle rise across the three bars indicates a more regular and predictable behavior, though it may still be very costly. Such issues might well be addressed by careful budgeting and making sure that either a reserve fund or a low-cost source of loans is available; they can be budgeted and planned for much more easily. Among such types are Among such types are Dense Fog, Hurricane, and Blizzard for casualties, and Ice Storms and Lightning for total economic damage.

Results

most weather events are harmless

As found above in the Analysis, 71.8% of the observed events in the cleaned database result in neither injury nor dollar loss. The big hidden story here is the lack of story: most of the time that the weather is bad, it does nothing bad.

some weather events seem to be disproportionately reported mainly when they are harmful

This can be readily seen in Figure 1 of the analysis.

historically, high levels of harms can occur for multiple reasons

Explorations of Figures 1 and 2 in the analysis revealed that there are three ways in which a weather event type, or type class, becomes particularly deadly or destructive in its cumulative effects over decades:
1. Individual occurrences may be extremely likely to cause harm, so that even a few of them are enough to result in serious danger or cost.
2. Events with relatively low probability of harm may occur so frequently that the cumulative harm mounts up to high levels.
3. Even if events of a given type or class are low-probability and there are many that are harmless, if the event is for some reason highly destructive when it does occur and cause harm (high mean damage per single harmful incident) a small number of the events may mark the whole event type class as dangerous or expensive.

Of course, if those three reasons occur in combinations instead of singly — for example, heat waves have a high probability of harm, occur frequently, and can cause large numbers of deaths over a wide area — then a type or class becomes a highly significant threat.

By the numbers: top ten lists for different definitions of “Worst”

tornados have historically caused by far the most casualties

#Ten Event Classes Which Have Cumulatively Hurt the Most People, 1950-2011
CumMostPeopleHurt<-data.frame()
CumMostPeopleHurt<-HarmsByEventClass
CumMostPeopleHurt$CumulativeCasualties<-10^(HarmsByEventClass$Casualties_LogTenOfSum)
CumMostPeopleHurt<-CumMostPeopleHurt[,c(1,6)]
CumMostPeopleHurt<-arrange(CumMostPeopleHurt,desc(CumulativeCasualties))
colnames(CumMostPeopleHurt)<-c("Standard Event Type Class","Total Casualties 1950-2011") 
CumMostPeopleHurt<-CumMostPeopleHurt[1:10,]
HistoricalCasualtiesT10<-xtable(CumMostPeopleHurt,caption = "Ten Event Classes Accounting for Most Total Casualties, 1950-2010") 
print(HistoricalCasualtiesT10, type="html", caption.placement="top")

Ten Event Classes Accounting for Most Total Casualties, 1950-2010
	Standard Event Type Class	Total Casualties 1950-2011
1	Vortex storms	97097.00
2	Thunderstorm	16333.00
3	Heat	12372.00
4	Flood	10244.00
5	Winter weather	6872.00
6	Wind	2418.00
7	Hurricane/Tropical Storm	1918.00
8	Waves, currents, tides	1776.00
9	Fire	1699.00
10	Hail	1386.00

Note in the table above that casualties from the class Vortex storms are nearly six times those of the nearest compeititor, Thunderstorm, and more than all the rest combined. As a hypothesis: tornados are frequent through much of the continental United States. Most happen on empty land. However, when one does happen in a populated area, harm is nearly inevitable. The number of tornadoes is so large that it drives this statistic upward, even though any individual tornado is not especially likely to do damage.

hurricanes, floods, tornados, and storm surges are historically the most economically destructive

#Ten Event Classes Which Have Cumulatively Done The Most Dollar Damage, 1950-2011
CumMostDollarDamage<-data.frame()
CumMostDollarDamage<-HarmsByEventClass
CumMostDollarDamage$CumulativeDollarDamage<-HarmsByEventClass$TotalEconomicDamage_LogTenOfSum
CumMostDollarDamage<-CumMostDollarDamage[,c(1,6)]
CumMostDollarDamage<-arrange(CumMostDollarDamage,desc(CumulativeDollarDamage))
colnames(CumMostDollarDamage)<-c("Standard Event Type Class","Log 10 of Cumulative Dollar Damage 1950-2011")
CumMostDollarDamage<-CumMostDollarDamage[1:10,]
HistoricalCumCostsT10<-xtable(CumMostDollarDamage, caption = "Top 10 Event Classes By Cumulative Economic Damage 1950-2011, log 10 dollars") 
print(HistoricalCumCostsT10, type="html", caption.placement="top")

Top 10 Event Classes By Cumulative Economic Damage 1950-2011, log 10 dollars
	Standard Event Type Class	Log 10 of Cumulative Dollar Damage 1950-2011
1	Hurricane/Tropical Storm	11.00
2	Flood	10.81
3	Vortex storms	10.76
4	Waves, currents, tides	10.68
5	Hail	10.28
6	Winter weather	10.25
7	Drought	10.18
8	Thunderstorm	10.17
9	Fire	9.95
10	Wind	9.84

Possible explanations: The two most destructive classes are ones that are capable of destroying or damaging large fixed structures (i.e. expensive and hard to move objects) and naturally impact wide areas. The next two are slso capable of destruction of large fixed structures, but strike narrower areas. Finally, Hail, Winter weather, and Drought all occur across wide areas but are much more apt to damage crops than factories, warehouses, and other expensive structures, and crops have a much lower dollar value per square foot, so they’re somewhat farther down the scale.

mean casualities for an individual event is worst by far for heat waves

#Ten Event Classes in which Mean Single-Incident Casualties are Highest, 1950-2011
MeanPeopleHurt<-data.frame()
MeanPeopleHurt<-HarmsByEventClass
MeanPeopleHurt$MeanCasualties<-10^(HarmsByEventClass$Casualties_LogTenOfMean)
MeanPeopleHurt<-MeanPeopleHurt[,c(1,6)]
MeanPeopleHurt<-arrange(MeanPeopleHurt,desc(MeanCasualties))
colnames(MeanPeopleHurt)<-c("Standard Event Type Class","Mean Number of People Hurt")
MeanPeopleHurt<-MeanPeopleHurt[1:10,] 
MeanCasualtiesPerHarmfulIncidentT10<-xtable(MeanPeopleHurt,caption = "Top 10 Mean Casualties in a Single Event")
print(MeanCasualtiesPerHarmfulIncidentT10, type="html", caption.placement="top")

Top 10 Mean Casualties in a Single Event
	Standard Event Type Class	Mean Number of People Hurt
1	Heat	13.66
2	Fog	7.13
3	Dust storm	5.40
4	Hurricane/Tropical Storm	3.78
5	Vortex storms	3.42
6	Waves, currents, tides	2.55
7	Winter weather	2.36
8	Fire	2.35
9	Cold	2.24
10	Slide	2.02

There’s a big gap between heat waves and everything else; is that, perhaps, because while people can avoid driving and stay where they are to avoid fog and dust storm dangers, people in a heat wave have many fewer options for avoiding its negative consequences?

mean economic damage for an individual event is worst for hurricanes

MeanDollarDamage<-data.frame()
MeanDollarDamage<-HarmsByEventClass
MeanDollarDamage$MeanDollarDamage<-HarmsByEventClass$TotalEconomicDamage_LogTenOfMean
MeanDollarDamage<-MeanDollarDamage[,c(1,6)]
MeanDollarDamage<-arrange(MeanDollarDamage,desc(MeanDollarDamage))
colnames(MeanDollarDamage)<-c("Standard Event Type Class","Mean Damage in Single Event, Log 10 $") 
MeanDollarDamage<-MeanDollarDamage[1:10,]
MeanCostsPerHarmfulIncidentT10<-xtable(MeanDollarDamage, caption = "Top 10 Event Classes for Mean Economic Damage in a Single Event, log 10 dollars")
print(MeanCostsPerHarmfulIncidentT10, type="html", caption.placement="top")

Top 10 Event Classes for Mean Economic Damage in a Single Event, log 10 dollars
	Standard Event Type Class	Mean Damage in Single Event, Log 10 $
1	Hurricane/Tropical Storm	8.16
2	Drought	7.73
3	Waves, currents, tides	7.63
4	Fire	6.85
5	Cold	6.75
6	Rain	6.56
7	Winter weather	6.54
8	Flood	6.29
9	Vortex storms	6.16
10	Heat	5.98

Mean damage per single harmful event tracks total historical damage fairly well. (Note that zero-harm observations were removed for this part of the study. There are, however, some exceptions; Vortex storms, despite high mean damage for harmful events, do somewhat less damage in aggregate because they hit small areas, even though they are very frequent, and random small areas are not as likely to contain valuable structures and objects. Conversely, wildfires have a fairly high mean damage (when a fire burns into economically valuable areas it destroys many valuable things) but probably because they can often be contained, and certainly because they are rarer, they play much less of a role in overall economic damage. Hurricanes, droughts, both of which cause unavoidable damage over wide areas, are at the top of the list; the strong destructiveness of the Waves, currents, and tides class is almost entirely storm surges (see Figure 1 in the Analysis). Also note that because this is a logarithmic scale, the largest gap by far is between Waves, currents, tides and Fire.

the worst of the worst-case scenarios for casualties is freezing fog

##Ten Event Types with Worst Possible Single-Incident Health Outcomes
ThousandYearPeopleHurt<-data.frame()
ThousandYearPeopleHurt<-MaxLosses[,c(1,14)]
ThousandYearPeopleHurt<-arrange(ThousandYearPeopleHurt,desc(MaxThousandYrCasualties))
colnames(ThousandYearPeopleHurt)<-c("Standard Event Type","Expected Maximum Single Incident Casualties in 1000 Years")
ThousandYearPeopleHurt<-ThousandYearPeopleHurt[1:10,]
ThousandYrCasualtiesT10 <- xtable(ThousandYearPeopleHurt,caption="Top 10 Magnitudes of 1000-year Events, Casualties")
print(ThousandYrCasualtiesT10, type="html", caption.placement="top")

Top 10 Magnitudes of 1000-year Events, Casualties
	Standard Event Type	Expected Maximum Single Incident Casualties in 1000 Years
1	Freezing Fog	480.00
2	Excessive Heat	438.00
3	Heat	419.00
4	Dense Fog	246.00
5	Dust Storm	158.00
6	Tsunami	158.00
7	Tornado	59.00
8	Hurricane (Typhoon)	48.00
9	Blizzard	43.00
10	Wildfire	28.00

To produce a worst-case for a thousand-year event, a given type needs a very high variance. Freezing fog, excessive heat, Heat, and Dense fog all exhibit this, with historical values that vary a great deal (and histograms of their casualties (not shown) clearly show a moderately high peak with a great deal of spread). A planner should consider at least the possibility that, for example, that a big temperature drop (so that many drivers did not have a coat or sweater in the car) combined with a freezing fog across a several state area during a rush hour could result in large numbers of multi-car pileups, difficulty in reaching everyone with emergency assistance, and significant injuries due to exposure.

the worst of the worst-case scenarios for economic damage are hurricanes

#Ten Event Types with Worst Possible Single-Incident Economic Outcomes
ThousandYearDollarLosses<-data.frame()
ThousandYearDollarLosses<-MaxLosses[,c(1,17)]
ThousandYearDollarLosses<-arrange(ThousandYearDollarLosses,desc(MaxThousandYrEconomicLoss))
colnames(ThousandYearDollarLosses)<-c("Standard Event Type","Expected Maximum Single Incident Economic Loss in 1000 Years, log 10 $")
ThousandYearDollarLosses<-ThousandYearDollarLosses[1:10,]
ThousandYrCostsT10<-xtable(ThousandYearDollarLosses,caption="Top 10 Magnitudes of 1000-year Events, log 10 dollars")
print(ThousandYrCostsT10, type="html", caption.placement="top")

Top 10 Magnitudes of 1000-year Events, log 10 dollars
	Standard Event Type	Expected Maximum Single Incident Economic Loss in 1000 Years, log 10 $
1	Hurricane (Typhoon)	14.31
2	Sleet	13.16
3	Drought	13.00
4	Extreme Cold/Wind Chill	12.74
5	Wildfire	12.71
6	Blizzard	12.50
7	Lightning	12.20
8	Dense Fog	11.71
9	Ice Storm	11.51
10	High Surf	11.24

As with casualties, worst case scenarios for economic damage tend to go to the “long tail”, i.e. very wide variances. In this case the results, though indicative in relation to each other, are clearly not accurate; $log10(14.31) would be in excess of 100 trillion dollars, which is greater than current world GDP, so even if a hurricane somehow managed to strike the whole Earth, it simply couldn’t do this much damage.