This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database records major weather-related events in the United States, tracking their times and locations as well as estimates of fatalities, injuries, property and agricultural damages that may be associated with them.
We seek to identify which types of storm events have the greatest impact on the population, both in terms of health and of economic consequences. To address these questions, our analysis aggregates the data by storm event type to identify the event categories responsible for the largest impact as measured by fatalities, injuries, property damage and crop damage.
Looking at each harm/damage category separately, based on our analysis we can say that the most severe impact comes from the following types of events:
The scope and time for this investigation were limited, but there are clear directions worth further investigation, such as
The data structure and definition documentation is at the following URLs of the NOAA National Climatic Data Center (NCDC):
The raw data comprise 37 variables with 902,297 observations from the years 1950-2011.
The objective of this research is to address two main questions:
The main variables in the NOAA/NCDC Storm Event databases that we will consider are:
For storm event date and type:
- BGN_DATE: begin date of the event.
- EVTYPE: event type (see discussion later).

For human and economic damages associated with each storm event:
- FATALITIES: number of fatalities.
- INJURIES: number of injuries.
- PROPDMG: property damage assessment.
- CROPDMG: agricultural damage assessment.

The values reported for these latter two have to be converted into actual US dollar amounts by multiplying them by a scaling factor coded in two other variables (PROPDMGEXP, CROPDMGEXP) as the power of 10 of the multiplier (a small worked example follows below). These two auxiliary variables seem to suffer from some data-entry issues, which we will address below.
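As a small made-up illustration of this encoding (the numbers below are invented, not actual records from the database): a reported PROPDMG of 25 with a PROPDMGEXP code of "K" (kilo, i.e. exponent 3) corresponds to 25 x 10^3 = 25,000 US dollars.
# Made-up illustration of the DMG/DMGEXP encoding (not actual database records)
propdmg    <- 25    # damage assessment as reported (PROPDMG)
propdmgexp <- 3     # exponent implied by the "K" (kilo) code (PROPDMGEXP)
propdmg * 10^propdmgexp    # 25000, i.e. $25,000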
The data set comes as a compressed CSV file, which can be read directly with read.csv().
After reviewing the data we chose our preferred classes for the variables to be loaded into the data-frame, and we set them at load time via the colClasses option, using a character vector read from an auxiliary file.
NOTE: given the substantial size of the data set, we set the option cache=TRUE to keep the knitr processing time manageable.
col.cl <- read.csv("readin_classes.csv", stringsAsFactors=FALSE, strip.white=TRUE)
col.cl$newclass2
## [1] "numeric" "character" "character" "factor" "numeric"
## [6] "character" "factor" "character" "numeric" "character"
## [11] "character" "character" "character" "numeric" "character"
## [16] "numeric" "character" "character" "numeric" "numeric"
## [21] "factor" "numeric" "numeric" "numeric" "numeric"
## [26] "character" "numeric" "character" "character" "character"
## [31] "character" "numeric" "numeric" "numeric" "numeric"
## [36] "character" "numeric"
col.cl <- read.csv("readin_classes.csv", stringsAsFactors=FALSE, strip.white=TRUE)
# Reading the full compressed CSV directly (commented out here to save processing time):
# system.time( data <- read.csv("StormData_DL.csv.bz2", colClasses= col.cl$newclass2, nrows=1300000, strip.white=TRUE) )
# Instead, load a previously saved copy of the data-frame as read from the CSV:
data <- readRDS("df_rapanui_data_as_read.rds")
# str(data)
This is the structure of the data-frame as loaded.
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: chr "" "" "" "" ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : Factor w/ 7 levels "","0","1","2",..: 5 4 4 4 4 4 4 3 5 5 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
Due to the way this dataset was developed, several variables suffer from inconsistent formatting. In order to make our analysis as robust as possible, we cleaned and reformatted some of the potentially useful variables, and created new ones for convenience.
# trim leading/trailing spaces and collapse repeated spaces in REMARKS and EVTYPE
data$REMARKS <- gsub("^ *", "", data$REMARKS, perl=TRUE)
data$REMARKS <- gsub(" *$", "", data$REMARKS, perl=TRUE)
data$REMARKS <- gsub("[ ]{2,}", " ", data$REMARKS, perl=TRUE)
data$EVTYPE <- gsub("^ *", "", data$EVTYPE, perl=TRUE)
data$EVTYPE <- gsub(" *$", "", data$EVTYPE, perl=TRUE)
data$EVTYPE <- gsub("[ ]{2,}", " ", data$EVTYPE, perl=TRUE)
# normalize key character variables to upper case
data$COUNTYNAME <- toupper(data$COUNTYNAME)
data$EVTYPE <- toupper(data$EVTYPE)
data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)
# flag non-positive coordinates as missing
data$LATITUDE[data$LATITUDE <= 0] <- NA
data$LONGITUDE[data$LONGITUDE <= 0] <- NA
First, a small fix motivated by the discovery of the letter O instead of a 0 in the BGN_TIME variable:
data$BGN_TIME <- gsub("([0-9])O(.*)([^M])$","\\10\\2\\3", data$BGN_TIME, perl=TRUE)
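To show what this substitution does, here is a quick check on two made-up values (illustrative strings, not actual database entries): a letter O between digits becomes a 0, while strings ending in "M" do not match the pattern and are left untouched.
# made-up examples: "18O0" becomes "1800"; "013O PM" ends in "M" and is left unchanged
gsub("([0-9])O(.*)([^M])$", "\\10\\2\\3", c("18O0", "013O PM"), perl=TRUE)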
It is better to work with properly constructed time variables, hence we convert BGN_DATE and
END_DATE into POSIXlt class variables.
# making BGN and END dates dates
data$BGN_DATE.new <- strptime(as.character(data$BGN_DATE), "%m/%d/%Y %H:%M:%S")
data$END_DATE.new <- strptime(as.character(data$END_DATE), "%m/%d/%Y %H:%M:%S")
Finally, we define a factor variable YEAR.
# create a 4-digit year factor variable
data$YEAR <- as.factor(substr(as.character(data$BGN_DATE.new),0,4))
For ease of processing it is practical to work with a reduced set of variables.
Our choice was the following:
sel.columns.n <- c( 37, 7, 5, 40, 38, 39, 8, 23, 24, 25, 26, 27, 28)
colnames(data)[sel.columns.n]
## [1] "REFNUM" "STATE" "COUNTY" "YEAR"
## [5] "BGN_DATE.new" "END_DATE.new" "EVTYPE" "FATALITIES"
## [9] "INJURIES" "PROPDMG" "PROPDMGEXP" "CROPDMG"
## [13] "CROPDMGEXP"
After creating the new data-frame, we rename the newly created time variables to take
the original names, and sort the data-frame by BGN_DATE.
good <- data[, sel.columns.n]
colnames(good)[5] <- "BGN_DATE"
colnames(good)[6] <- "END_DATE"
# sorting data frame chronologically
good <- good[order(good$BGN_DATE), ]
str(good)
## 'data.frame': 902297 obs. of 13 variables:
## $ REFNUM : num 28479 28480 83316 114844 6348 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 20 20 37 48 5 20 37 63 63 63 ...
## $ COUNTY : num 119 135 189 161 113 91 93 47 39 201 ...
## $ YEAR : Factor w/ 62 levels "1950","1951",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : POSIXlt, format: "1950-01-03" "1950-01-03" ...
## $ END_DATE : POSIXlt, format: NA NA ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 1 0 0 0 0 1 ...
## $ INJURIES : num 0 3 3 1 1 0 5 2 0 12 ...
## $ PROPDMG : num 250 250 2.5 25 2.5 250 250 0 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "M" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
We perform some more cleaning steps on this leaner data-frame to regularize the EVTYPE variable.
For historical reasons, namely the heterogeneous sources of the data compiled into the Storm Event database, the EVTYPE variable requires a significant amount of work to tidy it up into a more usable form.
While NOAA has, in theory, codified since 1996 a set of 48 event types (see the NWS Directive) to be used in the classification of storm events, a quick review of the distinct EVTYPE values for each year shows that their number remained considerably higher until much more recently.
EVTYPE.by.year <- tapply(good$EVTYPE, good$YEAR, function(x) table(x), simplify=TRUE)
sapply(EVTYPE.by.year, function(x) length(names(x)))
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
## 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3
## 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
## 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
## 3 3 3 3 3 3 3 3 3 3 3 3 3 160 266
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## 384 199 135 122 118 110 120 99 51 38 46 50 46 46 46
## 2010 2011
## 46 46
Until 1993, only the "Tornado", "Thunderstorm Wind" and "Hail" event types were recorded.
The 1993 to 1995 data have been extracted from unformatted text files, which helps explain the high level of noise in EVTYPE during those years.
It is less clear why, after the 1996 Directive, the number of distinct EVTYPE values did not immediately shrink to the codified set of 48.
A quick census of the most frequent EVTYPE values can be extracted with table():
evt.census <- as.data.frame(table(data$EVTYPE))
head(evt.census[order(evt.census$Freq, decreasing=TRUE),],10)
## Var1 Freq
## 203 HAIL 288661
## 766 TSTM WIND 219946
## 672 THUNDERSTORM WIND 82564
## 745 TORNADO 60652
## 129 FLASH FLOOD 54278
## 145 FLOOD 25327
## 698 THUNDERSTORM WINDS 20850
## 309 HIGH WIND 20214
## 407 LIGHTNING 15755
## 265 HEAVY SNOW 15708
nrow(evt.census)
## [1] 883
The full database comprises 883 unique EVTYPE values!
The problem of inconsistent naming is immediately obvious: within the top ten alone there are three entries for the same storm event type, Thunderstorm Wind: TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS.
After an extensive analysis of the EVTYPE, we put together a lengthy set of substitutions
via gsub() and grepl() to consolidate as many related types as possible.
We run this by calling an external script to avoid clogging the document, but we include
the source of the script in the Appendix.
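The substitutions in the script are of two kinds, illustrated by the two lines below, taken verbatim from its source (the full script is reproduced in the Appendix): exact renames of single values via gsub(), and prefix-based consolidation of whole families of values via grepl(). TESTvec is the working copy of EVTYPE created at the top of the script.
# exact rename of a misspelled event type
TESTvec <- gsub("^TORNDAO$", "TORNADO", TESTvec, perl=TRUE)
# consolidation of all types starting with "FLASH" into "FLASH FLOOD"
TESTvec[grepl("^FLASH", TESTvec, perl=TRUE)] <- "FLASH FLOOD"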
The script creates a new character vector, TESTvec, which we assign back to the good data-frame as EVTYPE.
# Calling the 'evtype_regularization.R' script to clean the EVTYPE entries
source("./evtype_regularization.R")
## # Entering 'evtype_regularization' script
## # Leaving 'evtype_regularization' script
# colnames(good)[7] <- "old.EVTYPE"
good$EVTYPE <- TESTvec
str(good)
## 'data.frame': 902297 obs. of 13 variables:
## $ REFNUM : num 28479 28480 83316 114844 6348 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 20 20 37 48 5 20 37 63 63 63 ...
## $ COUNTY : num 119 135 189 161 113 91 93 47 39 201 ...
## $ YEAR : Factor w/ 62 levels "1950","1951",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : POSIXlt, format: "1950-01-03" "1950-01-03" ...
## $ END_DATE : POSIXlt, format: NA NA ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 1 0 0 0 0 1 ...
## $ INJURIES : num 0 3 3 1 1 0 5 2 0 12 ...
## $ PROPDMG : num 250 250 2.5 25 2.5 250 250 0 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "M" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
qq <- table(good$EVTYPE)
dim(qq)
## [1] 373
This regularization reduces the number of distinct EVTYPE values from 883 to 373.
The variables PROPDMGEXP and CROPDMGEXP are supposed to encode the exponents of the power-of-10 multipliers for their respective economic damage variables.
They too suffer from inconsistent data entry in the transition years around 1995, as can be seen from this quick analysis of the number of unique values for each year, accompanied by the list of values for the year with the highest count:
PROPDMGEXP.by.year <- tapply(good$PROPDMGEXP, good$YEAR, function(x) table(x), simplify=TRUE)
sapply(PROPDMGEXP.by.year, function(x) length(names(x)))
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
## 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
## 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
## 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
## 3 3 3 3 3 3 3 3 3 3 3 3 3 7 9
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## 17 3 4 4 4 4 4 3 4 4 4 4 2 3 2
## 2010 2011
## 3 4
PROPDMGEXP.by.year[which.max(sapply(PROPDMGEXP.by.year, function(x) length(names(x))))]
## $`1995`
## x
## + - 0 1 2 3 4 5 6 7 8
## 18331 4 1 186 25 13 4 3 26 3 5 1
## ? B H K M
## 5 5 6 8768 584
CROPDMGEXP.by.year <- tapply(good$CROPDMGEXP, good$YEAR, function(x) table(x), simplify=TRUE)
sapply(CROPDMGEXP.by.year, function(x) length(names(x)))
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
## 1 1 1 1 1 1 1 1 1 1 1 1 1 6 5
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## 7 3 3 3 3 3 3 3 3 3 4 4 2 2 2
## 2010 2011
## 2 3
CROPDMGEXP.by.year[which.max(sapply(CROPDMGEXP.by.year, function(x) length(names(x))))]
## $`1995`
## x
## 0 2 ? B K M
## 27019 8 1 5 3 800 134
We corrected their values according to the following interpretation:
- H, K, M, B are interpreted as: hecto (=2), kilo (=3), mega (=6), billion (=9); other symbols and empty strings are mapped to 0.
- The PROPDMGEXP and CROPDMGEXP variables are changed to numeric class.

good$PROPDMGEXP[good$PROPDMGEXP == "?" | good$PROPDMGEXP == "+" | good$PROPDMGEXP == "-" | good$PROPDMGEXP == ""] <- 0
good$PROPDMGEXP[good$PROPDMGEXP == "B"] <- 9
good$PROPDMGEXP[good$PROPDMGEXP == "M"] <- 6
good$PROPDMGEXP[good$PROPDMGEXP == "K"] <- 3
good$PROPDMGEXP[good$PROPDMGEXP == "H"] <- 2
good$CROPDMGEXP[good$CROPDMGEXP == "?" | good$CROPDMGEXP == "+" | good$CROPDMGEXP == "-" | good$CROPDMGEXP == ""] <- 0
good$CROPDMGEXP[good$CROPDMGEXP == "B"] <- 9
good$CROPDMGEXP[good$CROPDMGEXP == "M"] <- 6
good$CROPDMGEXP[good$CROPDMGEXP == "K"] <- 3
good$CROPDMGEXP[good$CROPDMGEXP == "H"] <- 2
good$PROPDMGEXP <- as.numeric(good$PROPDMGEXP)
good$CROPDMGEXP <- as.numeric(good$CROPDMGEXP)
str(good)
## 'data.frame': 902297 obs. of 13 variables:
## $ REFNUM : num 28479 28480 83316 114844 6348 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 20 20 37 48 5 20 37 63 63 63 ...
## $ COUNTY : num 119 135 189 161 113 91 93 47 39 201 ...
## $ YEAR : Factor w/ 62 levels "1950","1951",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : POSIXlt, format: "1950-01-03" "1950-01-03" ...
## $ END_DATE : POSIXlt, format: NA NA ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 1 0 0 0 0 1 ...
## $ INJURIES : num 0 3 3 1 1 0 5 2 0 12 ...
## $ PROPDMG : num 250 250 2.5 25 2.5 250 250 0 25 25 ...
## $ PROPDMGEXP: num 3 3 6 3 3 3 3 3 3 3 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: num 0 0 0 0 0 0 0 0 0 0 ...
After reviewing and cleaning the data, we choose to limit our analysis to the data since 1996-01-01. There are two main motivations for this choice:
- Only since 1996 have all event types been recorded: as shown above, earlier years cover essentially only tornado, thunderstorm wind and hail events, with very noisy event typing in 1993-1995.
- The more recent data are of much better quality, in particular for the DMGEXP and EVTYPE variables (but also the various date/time variables).

# subsetting 'good' to 'recent' taking only events since 1996
recent <- subset(good, BGN_DATE > as.POSIXlt("1996-01-01"))
row.names(recent) <- NULL
str(recent)
## 'data.frame': 653507 obs. of 13 variables:
## $ REFNUM : num 249653 251131 251416 252425 252426 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 7 8 9 10 10 13 13 13 13 13 ...
## $ COUNTY : num 12 11 4 1 2 105 19 19 19 109 ...
## $ YEAR : Factor w/ 62 levels "1950","1951",..: 47 47 47 47 47 47 47 47 47 47 ...
## $ BGN_DATE : POSIXlt, format: "1996-01-02" "1996-01-02" ...
## $ END_DATE : POSIXlt, format: "1996-01-02" "1996-01-02" ...
## $ EVTYPE : chr "HIGH WIND" "HIGH WIND" "HEAVY SNOW" "HEAVY SNOW" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 0 0 ...
## $ INJURIES : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PROPDMG : num 0 0 0 0 0 50 1.5 2 5 1.5 ...
## $ PROPDMGEXP: num 0 0 0 0 0 3 3 3 3 3 ...
## $ CROPDMG : num 0 1 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: num 0 6 0 0 0 0 3 3 3 3 ...
The final step of the data processing is the computation of actual US dollar amounts for the damages, combining the DMG and DMGEXP variables.
NOTE: before we compute them, we make an ad hoc fix to the value of PROPDMGEXP for
one specific entry, a flood in Napa in 2005. More details about this can be found in the Appendix.
# compute a provisional damage amount to locate the suspect entry (also used in the Appendix), then fix its exponent
tmp.PropDamage <- recent$PROPDMG * 10^recent$PROPDMGEXP
recent$PROPDMGEXP[which.max(tmp.PropDamage)] <- 6
recent$PropDamage <- recent$PROPDMG * 10^recent$PROPDMGEXP
recent$CropDamage <- recent$CROPDMG * 10^recent$CROPDMGEXP
str(recent)
## 'data.frame': 653507 obs. of 15 variables:
## $ REFNUM : num 249653 251131 251416 252425 252426 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 7 8 9 10 10 13 13 13 13 13 ...
## $ COUNTY : num 12 11 4 1 2 105 19 19 19 109 ...
## $ YEAR : Factor w/ 62 levels "1950","1951",..: 47 47 47 47 47 47 47 47 47 47 ...
## $ BGN_DATE : POSIXlt, format: "1996-01-02" "1996-01-02" ...
## $ END_DATE : POSIXlt, format: "1996-01-02" "1996-01-02" ...
## $ EVTYPE : chr "HIGH WIND" "HIGH WIND" "HEAVY SNOW" "HEAVY SNOW" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 0 0 ...
## $ INJURIES : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PROPDMG : num 0 0 0 0 0 50 1.5 2 5 1.5 ...
## $ PROPDMGEXP: num 0 0 0 0 0 3 3 3 3 3 ...
## $ CROPDMG : num 0 1 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: num 0 6 0 0 0 0 3 3 3 3 ...
## $ PropDamage: num 0 0 0 0 0 50000 1500 2000 5000 1500 ...
## $ CropDamage: num 0e+00 1e+06 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 0e+00 ...
Loading the libraries that will be used to prepare the plots illustrating the results.
library("plyr")
library("reshape2")
library("ggplot2")
library("grid")
We aim to answer two questions concerning the impact of different types of events across the United States:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
We work with a data-frame comprising a subset of variables and summarize the data for human
and economic impact in two final data-frames: human and economic.
In each of them, we aggregate data by EVTYPE and compute:
- the total number of events of that type (N.tot);
- the number and percentage of events with a non-zero (>0) impact;
- the total impact (e.g. total fatalities, total property damage);
- the average impact per event among the events with a non-zero impact.

We keep only EVTYPE values with at least 100 occurrences.

lean <- recent[c(1,2,4,7,8,9,14:15)]
human <- ddply(lean, .(EVTYPE), summarize,
N.tot=length(FATALITIES),
N.with.F=length(FATALITIES[FATALITIES>0]),
pct.with.F=length(FATALITIES[FATALITIES>0])/length(FATALITIES)*100.,
F.tot=sum(FATALITIES),
F.avrg=sum(FATALITIES)/length(FATALITIES[FATALITIES>0]),
N.with.I=length(INJURIES[INJURIES>0]),
pct.with.I=length(INJURIES[INJURIES>0])/length(FATALITIES)*100.,
I.tot=sum(INJURIES),
I.avrg=sum(INJURIES)/length(INJURIES[INJURIES>0]),
flag= (sum(FATALITIES) + sum(INJURIES))>0 )
human <- subset(human, N.tot >100)
economic <- ddply(lean, .(EVTYPE), summarize,
N.tot=length(PropDamage),
N.with.PrDmg=length(PropDamage[PropDamage>0]),
pct.with.PrDmg=length(PropDamage[PropDamage>0])/length(PropDamage)*100.,
PrDmg.tot=sum(PropDamage),
PrDmg.avrg=sum(PropDamage)/length(PropDamage[PropDamage>0]),
N.with.CrDmg=length(CropDamage[CropDamage>0]),
pct.with.CrDmg=length(CropDamage[CropDamage>0])/length(PropDamage)*100.,
CrDmg.tot=sum(CropDamage),
CrDmg.avrg=sum(CropDamage)/length(CropDamage[CropDamage>0]),
flag= (sum(PropDamage) + sum(CropDamage))>0 )
economic <- subset(economic, N.tot >100)
str(human)
## 'data.frame': 42 obs. of 11 variables:
## $ EVTYPE : chr "ASTRONOMICAL LOW TIDE" "AVALANCHE" "BLIZZARD" "COASTAL FLOOD" ...
## $ N.tot : int 277 378 2634 795 643 618 2479 173 559 1851 ...
## $ N.with.F : int 0 173 46 8 93 14 0 3 9 565 ...
## $ pct.with.F: num 0 45.77 1.75 1.01 14.46 ...
## $ F.tot : num 0 223 70 10 131 ...
## $ F.avrg : num NaN 1.29 1.52 1.25 1.41 ...
## $ N.with.I : int 0 105 41 6 4 14 1 9 50 163 ...
## $ pct.with.I: num 0 27.778 1.557 0.755 0.622 ...
## $ I.tot : num 0 156 385 10 24 ...
## $ I.avrg : num NaN 1.49 9.39 1.67 6 ...
## $ flag : logi FALSE TRUE TRUE TRUE TRUE TRUE ...
str(economic)
## 'data.frame': 42 obs. of 11 variables:
## $ EVTYPE : chr "ASTRONOMICAL LOW TIDE" "AVALANCHE" "BLIZZARD" "COASTAL FLOOD" ...
## $ N.tot : int 277 378 2634 795 643 618 2479 173 559 1851 ...
## $ N.with.PrDmg : int 10 52 193 197 19 193 53 66 150 23 ...
## $ pct.with.PrDmg: num 3.61 13.76 7.33 24.78 2.95 ...
## $ PrDmg.tot : num 9.74e+06 3.71e+06 5.26e+08 4.07e+08 2.54e+06 ...
## $ PrDmg.avrg : num 974500 71381 2723622 2067861 133895 ...
## $ N.with.CrDmg : int 0 0 3 0 6 2 243 6 4 2 ...
## $ pct.with.CrDmg: num 0 0 0.114 0 0.933 ...
## $ CrDmg.tot : num 0 0 7060000 0 30742500 ...
## $ CrDmg.avrg : num NaN NaN 2353333 NaN 5123750 ...
## $ flag : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
With the data summarized, we can easily slice it in different ways.
In the next two sections we review the evidence emerging from the data collected and
organized as described so far.
The first table shows the top ten event types by total cumulative number of fatalities.
print(human[order(human$F.tot, decreasing=TRUE),][1:10,c(1,5,2,3,4)], quote=FALSE, row.names=FALSE)
## EVTYPE F.tot N.tot N.with.F pct.with.F
## EXCESSIVE HEAT 1799 1851 565 30.5240411
## TORNADO 1511 23154 459 1.9823789
## FLASH FLOOD 887 51005 584 1.1449858
## LIGHTNING 650 13203 608 4.6050140
## RIP CURRENT 542 734 480 65.3950954
## FLOOD 444 27761 294 1.0590397
## THUNDERSTORM WIND 379 211197 327 0.1548317
## EXTREME COLD/WIND CHILL 265 1892 195 10.3065539
## HIGH WIND 254 20240 194 0.9584980
## HEAT 237 829 133 16.0434258
This instead ranks event types by the fraction (%) of events that caused fatalities.
print(human[order(human$pct.with.F, decreasing=TRUE),][1:10,c(1,4,2,3,5)], quote=FALSE, row.names=FALSE)
## EVTYPE pct.with.F N.tot N.with.F F.tot
## RIP CURRENT 65.395095 734 480 542
## AVALANCHE 45.767196 378 173 223
## EXCESSIVE HEAT 30.524041 1851 565 1799
## HEAT 16.043426 829 133 237
## HURRICANE 15.498155 271 42 125
## COLD/WIND CHILL 14.463453 643 93 131
## EXTREME COLD/WIND CHILL 10.306554 1892 195 265
## HIGH SURF 9.646609 1047 101 142
## LIGHTNING 4.605014 13203 608 650
## TROPICAL STORM 3.519062 682 24 57
Injuries can be looked at in the same two ways: ranked by total number of injuries and by the fraction of events that caused injuries.
print(human[order(human$I.tot, decreasing=TRUE),][1:10,c(1,9,2,7,8)], quote=FALSE, row.names=FALSE)
## EVTYPE I.tot N.tot N.with.I pct.with.I
## TORNADO 20667 23154 1877 8.1065907
## FLOOD 6838 27761 169 0.6087677
## EXCESSIVE HEAT 6461 1851 163 8.8060508
## THUNDERSTORM WIND 5129 211197 2108 0.9981202
## LIGHTNING 4140 13203 2250 17.0415815
## FLASH FLOOD 1674 51005 335 0.6567984
## WILDFIRE 1458 4176 315 7.5431034
## HURRICANE 1328 271 27 9.9630996
## WINTER STORM 1292 11315 144 1.2726469
## HEAT 1239 829 40 4.8250905
print(human[order(human$pct.with.I, decreasing=TRUE),][1:10,c(1,8,2,7,9)], quote=FALSE, row.names=FALSE)
## EVTYPE pct.with.I N.tot N.with.I I.tot
## AVALANCHE 27.777778 378 105 156
## RIP CURRENT 26.021798 734 191 503
## LIGHTNING 17.041581 13203 2250 4140
## HURRICANE 9.963100 271 27 1328
## DUST STORM 8.944544 559 50 415
## EXCESSIVE HEAT 8.806051 1851 163 6461
## TORNADO 8.106591 23154 1877 20667
## WILDFIRE 7.543103 4176 315 1458
## HIGH SURF 6.781280 1047 71 238
## FOG GENERAL 5.233540 1777 93 855
hum20ft <- human[order(human$F.tot, decreasing=TRUE),][1:20,]
hum20fp <- human[order(human$pct.with.F, decreasing=TRUE),][1:20,]
hum20it <- human[order(human$I.tot, decreasing=TRUE),][1:20,]
hum20ip <- human[order(human$pct.with.I, decreasing=TRUE),][1:20,]
ph.ft <- ggplot(data=hum20ft, aes(x=EVTYPE)) + theme_bw()
ph.fp <- ggplot(data=hum20fp, aes(x=EVTYPE)) + theme_bw()
hplot.f1 <- ph.ft + geom_bar(stat="identity", aes(y=F.tot), fill="red2") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(legend.position="none") +
xlab("Event Type") + ylab("Fatalities Count") +
ggtitle("Events Causing Fatalities")
hplot.f2 <- ph.fp + geom_bar(stat="identity", aes(y=pct.with.F), fill="red2") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(legend.position="none") +
xlab("Event Type") + ylab("Percentage") + ylim(c(-5,75)) +
ggtitle("Fraction of Events Causing Fatalities")
grid.draw(rbind(ggplotGrob(hplot.f1), ggplotGrob(hplot.f2), size="last"))
#----------------------------------------------------------------------
As in the previous section, we report events ranked by total damage and by fraction of events causing a recorded amount of damages.
print(economic[order(economic$pct.with.PrDmg, decreasing=TRUE),][1:10,c(1,4,2,3,5)], quote=FALSE, row.names=FALSE)
## EVTYPE pct.with.PrDmg N.tot N.with.PrDmg PrDmg.tot
## STRONG WIND 86.21607 3758 3240 176994240
## LIGHT SNOW 81.03448 174 141 2513000
## HURRICANE 72.32472 271 196 81718889010
## LIGHTNING 66.22737 13203 8744 743077080
## TROPICAL STORM 57.18475 682 390 7642475550
## STORM SURGE 52.86783 401 212 47834724000
## TORNADO 51.16611 23154 11847 24616905710
## THUNDERSTORM WIND 49.00306 211197 103493 7913540880
## DRY MICROBURST 38.15029 173 66 1732600
## FLASH FLOOD 36.56504 51005 18650 15222268910
print(economic[order(economic$PrDmg.tot, decreasing=TRUE),][1:10,c(1,5,2,3,4)], quote=FALSE, row.names=FALSE)
## EVTYPE PrDmg.tot N.tot N.with.PrDmg pct.with.PrDmg
## FLOOD 144129580200 27761 9838 35.438205
## HURRICANE 81718889010 271 196 72.324723
## STORM SURGE 47834724000 401 212 52.867830
## TORNADO 24616905710 23154 11847 51.166105
## FLASH FLOOD 15222268910 51005 18650 36.565043
## HAIL 14595213420 207767 20004 9.628093
## THUNDERSTORM WIND 7913540880 211197 103493 49.003063
## WILDFIRE 7760449500 4176 1026 24.568966
## TROPICAL STORM 7642475550 682 390 57.184751
## HIGH WIND 5250667860 20240 5258 25.978261
print(economic[order(economic$CrDmg.tot, decreasing=TRUE),][1:10,c(1,9,2,7,8)], quote=FALSE, row.names=FALSE)
## EVTYPE CrDmg.tot N.tot N.with.CrDmg pct.with.CrDmg
## DROUGHT 13367566000 2479 243 9.802340
## HURRICANE 5350107800 271 85 31.365314
## FLOOD 5013161500 27761 1722 6.202947
## HAIL 2496822450 207767 8109 3.902930
## FLASH FLOOD 1334901700 51005 1845 3.617292
## EXTREME COLD/WIND CHILL 1326023000 1892 50 2.642706
## FROST/FREEZE 1094186000 1343 103 7.669397
## THUNDERSTORM WIND 1016942600 211197 4453 2.108458
## HEAVY RAIN 729669800 11546 124 1.073965
## TROPICAL STORM 677711000 682 59 8.651026
print(economic[order(economic$pct.with.CrDmg, decreasing=TRUE),][1:10,c(1,8,2,7,9)], quote=FALSE, row.names=FALSE)
## EVTYPE pct.with.CrDmg N.tot N.with.CrDmg CrDmg.tot
## HURRICANE 31.365314 271 85 5350107800
## DROUGHT 9.802340 2479 243 13367566000
## TROPICAL STORM 8.651026 682 59 677711000
## FROST/FREEZE 7.669397 1343 103 1094186000
## FLOOD 6.202947 27761 1722 5013161500
## TORNADO 5.415911 23154 1254 283425010
## HAIL 3.902930 207767 8109 2496822450
## FLASH FLOOD 3.617292 51005 1845 1334901700
## DRY MICROBURST 3.468208 173 6 15000
## WILDFIRE 2.969349 4176 124 402255130
eco20plot <- economic[order(economic$PrDmg.tot, decreasing=TRUE),][1:20,]
order.pdt <- order(eco20plot$PrDmg.tot, decreasing=TRUE)
order.pdp <- order(eco20plot$pct.with.PrDmg, decreasing=TRUE)
order.cdt <- order(eco20plot$CrDmg.tot, decreasing=TRUE)
order.cdp <- order(eco20plot$pct.with.CrDmg, decreasing=TRUE)
eco20plot$PrDmg.tot.rank <- rep(0,20) ; eco20plot$PrDmg.tot.rank[order.pdt] <- 1:20
eco20plot$PrDmg.pct.rank <- rep(0,20) ; eco20plot$PrDmg.pct.rank[order.pdp] <- 1:20
eco20plot$CrDmg.tot.rank <- rep(0,20) ; eco20plot$CrDmg.tot.rank[order.cdt] <- 1:20
eco20plot$CrDmg.pct.rank <- rep(0,20) ; eco20plot$CrDmg.pct.rank[order.cdp] <- 1:20
eco20plot$PrDmg.tot.rank <- factor(eco20plot$PrDmg.tot.rank, levels=1:20, ordered=TRUE)
eco20plot$PrDmg.pct.rank <- factor(eco20plot$PrDmg.pct.rank, levels=1:20, ordered=TRUE)
eco20plot$CrDmg.tot.rank <- factor(eco20plot$CrDmg.tot.rank, levels=1:20, ordered=TRUE)
eco20plot$CrDmg.pct.rank <- factor(eco20plot$CrDmg.pct.rank, levels=1:20, ordered=TRUE)
#----------------------------------------------------------------------
p <- ggplot(data=eco20plot, aes(x=EVTYPE)) + theme_bw()
ecoplot.p1 <- p + geom_bar(stat="identity", aes(y=PrDmg.tot/1.0e6, fill=PrDmg.tot.rank)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(legend.position="none") +
scale_y_log10() + coord_cartesian(ylim=c(1e1, 3e5)) +
xlab("") + ylab("Damages (Million of Dollars)") +
ggtitle("Total Property Damages by Event Type")
ecoplot.p2 <- p + geom_bar(stat="identity", aes(y=pct.with.PrDmg, fill=PrDmg.pct.rank)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + theme(legend.position="none") +
xlab("Event Type") + ylab("Percentage") + ylim(c(-5,105)) +
ggtitle("Fraction of Events Causing Damages")
grid.draw(rbind(ggplotGrob(ecoplot.p1), ggplotGrob(ecoplot.p2), size="last"))
Next, we can apply the same logic to identify the events that cause the greatest economic impacts. First, we split the data-frame in order to sum the total property damages and crop damages per event type; then we sort the data and extract the top 10 event types that cause the most economic damage to properties and to crops.
…
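A minimal sketch of this aggregate-and-rank step, using base R on the recent data-frame (an illustration under the assumptions above, not the original implementation, whose code is elided here):
# sketch: total property and crop damage per event type, top 10 of each
prop.by.evtype <- tapply(recent$PropDamage, recent$EVTYPE, sum)
head(sort(prop.by.evtype, decreasing=TRUE), 10)
crop.by.evtype <- tapply(recent$CropDamage, recent$EVTYPE, sum)
head(sort(crop.by.evtype, decreasing=TRUE), 10)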
Going by the "raw" data, the event that caused the highest property damage was a flood event in Napa, California, at the end of 2005, reported at over $100 billion.
However, the REMARKS entry in the database itself raises doubts about this figure, and a USGS report assessing the impact of the late 2005 storms confirms that.
Our best guess is that the value of the EXP parameter for this event should have been M instead of B, as this would bring the damage amount in line with the narrative remarks and the USGS report.
We therefore adjusted the PROPDMGEXP for this specific event accordingly.
Here is the original data:
recent[which.max(tmp.PropDamage),]
## REFNUM STATE COUNTY YEAR BGN_DATE END_DATE EVTYPE FATALITIES
## 354392 605943 CA 55 2006 2006-01-01 2006-01-01 FLOOD 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PropDamage
## 354392 0 115 9 32.5 6 1.15e+11
## CropDamage TotDamage
## 354392 32500000 115032500000
data$REMARKS[data$REFNUM == recent$REFNUM[which.max(tmp.PropDamage)]]
## [1] "Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million."
After this fix, the single event that resulted in the highest property damage is the storm surge caused by Hurricane Katrina in Louisiana, at over $31 billion.
This is the source of the evtype_regularization.R script:
cat("# Entering 'evtype_regularization' script\n")
TESTvec <- good$EVTYPE
TESTvec <- gsub("^SNOW$", "HEAVY SNOW", TESTvec, perl=TRUE)
TESTvec <- gsub("^WIND$", "HIGH WIND", TESTvec, perl=TRUE)
TESTvec <- gsub("^WINDS$", "HIGH WIND", TESTvec, perl=TRUE)
TESTvec <- gsub("^ICE$", "ICE STORM", TESTvec, perl=TRUE)
TESTvec <- gsub("^UNSEASONABLY WARM$", "HEAT", TESTvec, perl=TRUE)
TESTvec <- gsub("^TORNDAO$", "TORNADO", TESTvec, perl=TRUE)
TESTvec <- gsub("^WAYTERSPOUT$", "WATERSPOUT", TESTvec, perl=TRUE)
TESTvec <- gsub("^EXTREME HEAT$", "EXCESSIVE HEAT", TESTvec, perl=TRUE)
TESTvec <- gsub("^LIGHT FREEZING RAIN$", "SLEET", TESTvec, perl=TRUE)
TESTvec <- gsub("^UNSEASONABLY DRY$", "DROUGHT", TESTvec, perl=TRUE)
TESTvec <- gsub("^TSTM HEAVY RAIN$", "HEAVY RAIN", TESTvec, perl=TRUE)
TESTvec <- gsub("^SMALL HAIL$", "HAIL", TESTvec, perl=TRUE)
TESTvec <- gsub("^$EXCESSIVE SNOW", "HEAVY SNOW", TESTvec, perl=TRUE)
TESTvec[grepl("^ASTRONOMICAL", TESTvec, perl=TRUE)] <- "ASTRONOMICAL LOW TIDE"
TESTvec[grepl("^AVALA", TESTvec, perl=TRUE)] <- "AVALANCHE"
TESTvec[grepl("^BEACH", TESTvec, perl=TRUE)] <- "COASTAL FLOOD"
TESTvec[grepl("^BITTER", TESTvec, perl=TRUE)] <- "EXTREME COLD/WIND CHILL"
TESTvec[grepl("^BLIZZARD", TESTvec, perl=TRUE)] <- "BLIZZARD"
TESTvec[grepl("^BRUSH FIRE", TESTvec, perl=TRUE)] <- "WILDFIRE"
TESTvec[grepl("^COASTAL", TESTvec, perl=TRUE)] <- "COASTAL FLOOD"
TESTvec[grepl("^DROUGHT", TESTvec, perl=TRUE)] <- "DROUGHT"
TESTvec[grepl("^DUST D", TESTvec, perl=TRUE)] <- "DUST DEVIL"
TESTvec[grepl("^EXCESSIVE H", TESTvec, perl=TRUE)] <- "EXCESSIVE HEAT"
TESTvec[grepl("^EXTREME [CW]", TESTvec, perl=TRUE)] <- "EXTREME COLD/WIND CHILL"
TESTvec[grepl("^FLASH", TESTvec, perl=TRUE)] <- "FLASH FLOOD"
TESTvec[grepl("^FLO.*FLA", TESTvec, perl=TRUE)] <- "FLASH FLOOD"
TESTvec[grepl("^HYP", TESTvec, perl=TRUE)] <- "EXTREME COLD/WIND CHILL"
TESTvec[grepl("^LIGHT[A-Z]", TESTvec, perl=TRUE)] <- "LIGHTNING"
TESTvec[grepl("^LANDSL", TESTvec, perl=TRUE)] <- "DEBRIS FLOW"
TESTvec[grepl("^HAIL", TESTvec, perl=TRUE)] <- "HAIL"
TESTvec[grepl("^HEAT WAVE", TESTvec, perl=TRUE)] <- "EXCESSIVE HEAT"
TESTvec[grepl("^HEAVY PR", TESTvec, perl=TRUE)] <- "HEAVY RAIN"
TESTvec[grepl("^HEAVY RAIN", TESTvec, perl=TRUE)] <- "HEAVY RAIN"
TESTvec[grepl("^HEAVY SNOW", TESTvec, perl=TRUE)] <- "HEAVY SNOW"
TESTvec[grepl("^HEAVY SURF", TESTvec, perl=TRUE)] <- "HIGH SURF"
TESTvec[grepl("^HIGH S[UW]", TESTvec, perl=TRUE)] <- "HIGH SURF"
TESTvec[grepl("^MARINE T", TESTvec, perl=TRUE)] <- "MARINE THUNDERSTORM WIND"
TESTvec[grepl("^MINOR FL", TESTvec, perl=TRUE)] <- "FLOOD"
TESTvec[grepl("^MIXED PR", TESTvec, perl=TRUE)] <- "SLEET"
TESTvec[grepl("^MUD", TESTvec, perl=TRUE)] <- "DEBRIS FLOW"
TESTvec[grepl("^RECORD C", TESTvec, perl=TRUE)] <- "EXTREME COLD/WIND CHILL"
TESTvec[grepl("^RECORD H", TESTvec, perl=TRUE)] <- "EXCESSIVE HEAT"
TESTvec[grepl("^RECORD WA", TESTvec, perl=TRUE)] <- "EXCESSIVE HEAT"
TESTvec[grepl("^RIP", TESTvec, perl=TRUE)] <- "RIP CURRENT"
TESTvec[grepl("^R.*FLOOD", TESTvec, perl=TRUE)] <- "FLOOD"
TESTvec[grepl("^SEVERE TH", TESTvec, perl=TRUE)] <- "THUNDERSTORM WIND"
TESTvec[grepl("^SLEET", TESTvec, perl=TRUE)] <- "SLEET"
TESTvec[grepl("^SMALL STR", TESTvec, perl=TRUE)] <- "FLOOD"
TESTvec[grepl("^STRONG W", TESTvec, perl=TRUE)] <- "STRONG WIND"
TESTvec[grepl("^TIDAL FL", TESTvec, perl=TRUE)] <- "COASTAL FLOOD"
TESTvec[grepl("^TORRENTIAL", TESTvec, perl=TRUE)] <- "HEAVY RAIN"
TESTvec[grepl("^TROPICAL STORM", TESTvec, perl=TRUE)] <- "TROPICAL STORM"
TESTvec[grepl("^TSTM W", TESTvec, perl=TRUE)] <- "THUNDERSTORM WIND"
TESTvec[grepl("^URBAN", TESTvec, perl=TRUE)] <- "FLOOD"
TESTvec[grepl("^UNSEASO.* CO", TESTvec, perl=TRUE)] <- "COLD/WIND CHILL"
TESTvec[grepl("^WINTER ST", TESTvec, perl=TRUE)] <- "WINTER STORM"
TESTvec[grepl("^WINTER W", TESTvec, perl=TRUE)] <- "WINTER WEATHER"
TESTvec[grepl("^WATER", TESTvec, perl=TRUE)] <- "WATERSPOUT"
TESTvec[grepl("^WILD", TESTvec, perl=TRUE)] <- "WILDFIRE"
TESTvec[grepl("^WIND CHILL", TESTvec, perl=TRUE)] <- "COLD/WIND CHILL"
TESTvec[grepl("^VOLCANIC", TESTvec, perl=TRUE)] <- "VOLCANIC ASH"
TESTvec[grepl("[VF]OG", TESTvec, perl=TRUE)] <- "FOG GENERAL"
TESTvec[grepl("TYPHOON", TESTvec, perl=TRUE)] <- "HURRICANE"
TESTvec[grepl("SMOKE", TESTvec, perl=TRUE)] <- "DENSE SMOKE"
TESTvec[grepl("FIRES", TESTvec, perl=TRUE)] <- "WILDFIRE"
TESTvec[grepl("SQUALL", TESTvec, perl=TRUE)] <- "HEAVY SNOW"
TESTvec[grepl("EROSION", TESTvec, perl=TRUE)] <- "COASTAL FLOOD"
TESTvec[grepl("^SNOW.*ICE", TESTvec, perl=TRUE)] <- "ICE STORM"
TESTvec[grepl("^SNOW.*SLEET", TESTvec, perl=TRUE)] <- "SLEET"
TESTvec[grepl("^SNOW.*RAIN", TESTvec, perl=TRUE)] <- "SLEET"
TESTvec[grepl("LAKE.*SNOW", TESTvec, perl=TRUE)] <- "LAKE-EFFECT SNOW"
TESTvec[grepl("LAKE.*FLOOD", TESTvec, perl=TRUE)] <- "LAKESHORE FLOOD"
TESTvec[grepl("HURRICANE-GENERATED SWELLS", TESTvec, perl=TRUE)] <- "STORM SURGE"
TESTvec[grepl("^HURRICANE", TESTvec, perl=TRUE)] <- "HURRICANE" # run after |HURRICANE-GENERATED SWELLS
TESTvec[grepl("^ICE[ /S][A-EO-T]", TESTvec, perl=TRUE)] <- "ICE STORM" # run after |[VF]OG
TESTvec[grepl("^FREEZING [DR]", TESTvec, perl=TRUE)] <- "SLEET" # run after |[VF]OG
TESTvec[grepl("^WINT.* MIX", TESTvec, perl=TRUE)] <- "SLEET" # run after |^WINTER W
TESTvec[grepl("SURGE", TESTvec, perl=TRUE)] <- "STORM SURGE" # run after |^COASTAL
TESTvec[grepl("FUNNEL", TESTvec, perl=TRUE)] <- "FUNNEL CLOUD" # run after |^WATER
TESTvec[grepl("TORNADO", TESTvec, perl=TRUE)] <- "TORNADO" # run after |^WATER
TESTvec[grepl("DUST", TESTvec, perl=TRUE)] <- "DUST STORM" # run after |^DUST D
TESTvec[grepl("^HIGH WIN", TESTvec, perl=TRUE)] <- "HIGH WIND" # run after |DUST
TESTvec[grepl("^FLOOD", TESTvec, perl=TRUE)] <- "FLOOD" # run after |^FLO.*FLA
TESTvec[grepl("^COLD", TESTvec, perl=TRUE)] <- "COLD/WIND CHILL" # run after |FUNNEL , |TORNADO
TESTvec[grepl("^THUNDERSTORM", TESTvec, perl=TRUE)] <- "THUNDERSTORM WIND" # run after |FUNNEL
TESTvec[grepl("^THU.*TOR.*WI", TESTvec, perl=TRUE)] <- "THUNDERSTORM WIND" # run after |FUNNEL
cat("# Leaving 'evtype_regularization' script\n")