Review of Health and Economic Impact of Weather Events in the United States from January 1996 to November 2011

Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This report explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property and crop damage.

This report will focus on those three areas: fatalities, injuries, and economic damage (crop and property damage). For each of these topics, this report shows which weather event types were the most fatal, most injuring, and most costly for the U.S. as a whole. Additionally, this report reviews the most fatal, most injuring, and most costly event type by state. Although the events in the database start in the year 1950, prior to 1996, the only event types that were available were “Tornado, Thunderstorm Wind and Hail” (1955-1996) or “Tornado” (1950-1954). From 1996 forward, 48 event types are recorded as defined in NWS Directive 10-1605, and so this report will focus on this period of time.

Data Processing

This initial code chunk reads in the data and records where the data were downloaded from.

setwd("~/Documents/Coursera/5_ReproducibleResearch/PeerAssessment2")

#This section downloads the data and writes the data to a csv. Because the datafile is quite large, 
#this section can be commented out after the first download and then the reviewer could start with 
#loading the csv that was previously created.
temp <- tempfile()
link <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(link, temp, method = "curl")
data <- read.csv(bzfile(temp))
unlink(temp)

write.csv(data, file = "StormData.csv", row.names = FALSE) 
data <- read.csv("StormData.csv", stringsAsFactors = FALSE)
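
Rather than commenting the download out by hand, the same effect can be had by checking whether a local copy already exists. The following is a minimal sketch, not the code used for this report; `fetch_once` and its `fetch` argument are hypothetical names introduced here for illustration.

```r
# Hypothetical helper (not part of the original analysis): download `url`
# to `dest` only when `dest` does not already exist, then return the path.
# `fetch` defaults to download.file and is a parameter so that the logic
# can be exercised without network access.
fetch_once <- function(url, dest, fetch = utils::download.file) {
    if (!file.exists(dest)) {
        fetch(url, dest, mode = "wb")
    }
    dest
}

# Example use (commented out because it needs network access):
# link <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# data <- read.csv(bzfile(fetch_once(link, "StormData.csv.bz2")),
#                  stringsAsFactors = FALSE)
```

With this pattern the chunk can stay in the document unchanged, and only the first run pays the cost of the download.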

This code chunk first subsets the data to keep only the columns used in this analysis. Next, the character variables are all converted to uppercase, and the column with the beginning date is converted to a date-time variable. Finally, the data are subset by date to include only observations since January 1996. A total of 248790 observations are dropped, leaving 653507.

#Libraries used:
library(stringdist)
library(maps)
library(mapdata)
library(scales)
library(knitr)
library(ggplot2)
library(RColorBrewer)

#Subset DF to columns of interest
ColsInt <- c("BGN_DATE", "STATE", "EVTYPE", "MAG", "FATALITIES", "INJURIES", "PROPDMG", 
             "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "LATITUDE", "LONGITUDE")

SubData <- data[,ColsInt] #902297 observations of 12 variables

#Ensure character variables are uppercase
SubData$STATE <- toupper(SubData$STATE)
SubData$EVTYPE <- toupper(SubData$EVTYPE)
SubData$PROPDMGEXP <- toupper(SubData$PROPDMGEXP)
SubData$CROPDMGEXP <- toupper(SubData$CROPDMGEXP)

#Convert BGN_DATE variable to date-time variable
SubData$BGN_DATE <- gsub(" 0:00:00", "", SubData$BGN_DATE)
SubData$BGN_DATE <- strptime(SubData$BGN_DATE, "%m/%d/%Y")

#Subset data by date. Note that the strict > comparison also excludes events dated exactly 1996-01-01.
MoreComplete <- as.POSIXlt("1996-01-01 00:00:00")
SubData <- SubData[(SubData$BGN_DATE > MoreComplete),]
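
The effect of the comparison operator in the date filter can be checked on a few made-up values. This is a small sketch on invented dates, assuming the same `%m/%d/%Y` layout as BGN_DATE; it uses `as.Date` rather than `strptime` only to keep the example short.

```r
# Three invented date strings in the same layout as BGN_DATE
raw <- c("12/31/1995 0:00:00", "1/1/1996 0:00:00", "1/2/1996 0:00:00")

# as.Date() parses the leading date and ignores the trailing time text
dates <- as.Date(raw, format = "%m/%d/%Y")
cutoff <- as.Date("1996-01-01")

kept_strict    <- sum(dates > cutoff)   # excludes the cutoff day itself
kept_inclusive <- sum(dates >= cutoff)  # includes the cutoff day
c(strict = kept_strict, inclusive = kept_inclusive)
```

If events that begin on 1 January 1996 should be part of the analysis, the inclusive form is the one to use.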

Data Processing: Property and Crop Damage

The following code chunk converts the PROPDMG and CROPDMG values back to dollars using their exponent columns (K, M, or B) and then adds them to create a Total Economic Damage column (TtlEcnDmg). The resulting column is then converted to billions of dollars and rounded to two decimal places.

##Processing for Property and Crop Damage variables

K <- 1000
M <- 1000000
B <- 1000000000

SubData[(SubData$PROPDMGEXP == "K"),]$PROPDMG <- SubData[(SubData$PROPDMGEXP == "K"),]$PROPDMG*K
SubData[(SubData$PROPDMGEXP == "M"),]$PROPDMG <- SubData[(SubData$PROPDMGEXP == "M"),]$PROPDMG*M
SubData[(SubData$PROPDMGEXP == "B"),]$PROPDMG <- SubData[(SubData$PROPDMGEXP == "B"),]$PROPDMG*B
SubData[(SubData$CROPDMGEXP == "K"),]$CROPDMG <- SubData[(SubData$CROPDMGEXP == "K"),]$CROPDMG*K
SubData[(SubData$CROPDMGEXP == "M"),]$CROPDMG <- SubData[(SubData$CROPDMGEXP == "M"),]$CROPDMG*M
SubData[(SubData$CROPDMGEXP == "B"),]$CROPDMG <- SubData[(SubData$CROPDMGEXP == "B"),]$CROPDMG*B

SubData$PROPDMG <- as.numeric(SubData$PROPDMG)
SubData$CROPDMG <- as.numeric(SubData$CROPDMG)

SubData$TtlEcnDmg <- SubData$PROPDMG + SubData$CROPDMG

#convert to Billions
SubData$TtlEcnDmg <- SubData$TtlEcnDmg/B
SubData$TtlEcnDmg <- round(SubData$TtlEcnDmg, digits = 2)
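
The exponent conversion can also be written as a single lookup instead of six separate assignments. The sketch below uses an invented four-row data frame (`toy`, with made-up column names `DMG` and `EXP`), and it assumes that any exponent other than K, M, or B leaves the value unchanged, which is how the code above behaves.

```r
# Invented example rows; DMG and EXP stand in for PROPDMG and PROPDMGEXP
toy <- data.frame(DMG = c(25, 2.5, 1, 10),
                  EXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Multipliers for the three exponent codes handled in this report
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each row's multiplier; unmatched codes fall back to 1 (assumption)
idx <- match(toy$EXP, names(multiplier))
mult <- ifelse(is.na(idx), 1, multiplier[idx])

toy$DOLLARS <- toy$DMG * mult
toy
```

The same lookup can be applied to the crop damage columns by swapping the column names.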

Data Processing: Event Types

This section of the document processes the EVTYPE column by using a csv file that contains the 48 event types as defined in NWS Directive 10-1605 (found at http://www.ncdc.noaa.gov/stormevents/pd01016005curr.pdf). A review of the unique event types, shown below, revealed 438 distinct values. There are five code chunks in this section. The first code chunk reads in the csv file and creates a data frame of the unique event types and their frequencies in the data.

#Create a vector of Event Types as they are classified in the National Weather Service Documentation
EventTypes <- read.csv("EventTypesNWS.csv", header = FALSE)
EventTypes <- toupper(EventTypes$V1)

#Review unique Event Types
UnqFreq <- as.data.frame(table(SubData$EVTYPE))
UnqFreq <- UnqFreq[order(UnqFreq$Var1),]
UnqFreq <- UnqFreq[order(UnqFreq$Freq, decreasing = TRUE),]

This code chunk includes processing that is based on the initial visual inspection of the unique event types and frequencies.

#Replace "\" with "/"
SubData$EVTYPE <- gsub("\\\\", "/", SubData$EVTYPE)

#Remove Parens and periods
SubData$EVTYPE <- gsub("\\(", "", SubData$EVTYPE)
SubData$EVTYPE <- gsub("\\)", "", SubData$EVTYPE)
SubData$EVTYPE <- gsub("\\.", "", SubData$EVTYPE)

#Remove MPH
SubData$EVTYPE <- gsub("MPH", "", SubData$EVTYPE)

#Remove all numbers
SubData$EVTYPE <- gsub("[0-9]", "", SubData$EVTYPE)

#TSTM to Thunderstorm 
SubData$EVTYPE <- gsub("TSTM", "THUNDERSTORM", SubData$EVTYPE)

#Rain and Snow - Rain to "Heavy Rain", Snow to "Heavy Snow"
SubData$EVTYPE[SubData$EVTYPE=="RAIN"]<-"HEAVY RAIN"
SubData$EVTYPE[SubData$EVTYPE=="SNOW"]<-"HEAVY SNOW"

#Any case of Microburst to Thunderstorm Winds
microburst <- function(evtypes) {
    tgtList <- unique((grep("MICROBURST", evtypes, value = TRUE)))
    #seq_along() skips the loop entirely when nothing matches
    for (i in seq_along(tgtList)) {
        evtypes[(evtypes == tgtList[i])] <- "THUNDERSTORM WINDS"
    }
    evtypes
}
SubData$EVTYPE <- microburst(SubData$EVTYPE)

#Any case of Hurricane or Typhoon to "HURRICANE/TYPHOON" (values already in that form are unchanged)
hurricane <- function(evtypes) {
    tgtList <- unique(grep("HURRICANE|TYPHOON", evtypes, value = TRUE))
    #seq_along() skips the loop entirely when nothing matches
    for (i in seq_along(tgtList)) {
        evtypes[(evtypes == tgtList[i])] <- "HURRICANE/TYPHOON"
    }
    evtypes
}
SubData$EVTYPE <- hurricane(SubData$EVTYPE)
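
The cleaning steps above can be collected into one vectorised function and tried on a few strings. This is a sketch only: `clean_evtype` is a hypothetical name, the input strings are invented, and the function repeats the uppercase conversion so that it can be run on its own.

```r
# Hypothetical wrapper around the same substitutions, applied in the same order
clean_evtype <- function(x) {
    x <- toupper(x)
    x <- gsub("\\\\", "/", x)          # backslash to forward slash
    x <- gsub("[().]", "", x)          # drop parentheses and periods
    x <- gsub("MPH", "", x)            # drop MPH
    x <- gsub("[0-9]", "", x)          # drop digits
    x <- gsub("TSTM", "THUNDERSTORM", x)
    # Logical indexing replaces the explicit loops over matching values
    x[grepl("MICROBURST", x)] <- "THUNDERSTORM WINDS"
    x[grepl("HURRICANE|TYPHOON", x)] <- "HURRICANE/TYPHOON"
    x
}

# Invented inputs that exercise each rule
cleaned <- clean_evtype(c("tstm wind (g45)", "Dry Microburst",
                          "Hurricane Erin", "FLOOD\\FLASH"))
cleaned
```

Logical indexing with `grepl` does the same job as the two helper functions without looping over the matched values.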

The goal of this section is to identify EVTYPE values that are very close to one of the official EventTypes. Each unique event type in the data set is compared with the event types listed in the NWS documentation using the function stringdist from the stringdist package. The default method for stringdist is optimal string alignment (osa). This distance counts the number of deletions, insertions, and substitutions needed to turn one string into the other, and it also allows transposition of adjacent characters. Each substring may be edited only once (for example, a character cannot be transposed twice to move it forward in the string).
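
The nearest-match idea can be illustrated without the stringdist package. The sketch below swaps in base R's `adist`, which computes a generalized Levenshtein distance (insertions, deletions, and substitutions, with no transpositions), so it is a stand-in for the osa distance rather than the same measure. The four official names and four observed names are a hand-picked subset for illustration.

```r
# A few official names and a few observed variants (hand-picked subset)
official <- c("RIP CURRENT", "THUNDERSTORM WIND", "DUST DEVIL", "FLOOD")
observed <- c("RIP CURRENTS", "THUNDERSTORM WND", "DUST DEVEL",
              "URBAN/SML STREAM FLD")

# adist() returns a matrix: one row per observed name, one column per official name
d <- adist(observed, official)

# For each observed name, keep the closest official name and its distance
matches <- data.frame(EVTYPE  = observed,
                      TXTDIST = apply(d, 1, min),
                      NEWEVT  = official[apply(d, 1, which.min)],
                      stringsAsFactors = FALSE)
matches
```

Names with a small distance are plausible spelling variants, while a name such as URBAN/SML STREAM FLD sits far from every official type and needs a manual decision.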

UniqueEvt <- unique(SubData$EVTYPE)
UnqEVtype <- c()
TxtDist <- c()
NewEvt <- c() 

for (i in seq_along(UniqueEvt)) {
    Distances <- stringdist(UniqueEvt[i], EventTypes)
    minDist <- min(Distances)
    minEventType <- EventTypes[which.min(Distances)]
    UnqEVtype <- c(UnqEVtype, UniqueEvt[i])
    TxtDist <- c(TxtDist, minDist)
    NewEvt <- c(NewEvt, minEventType)
}

UnqEvts <- data.frame(EVTYPE = UnqEVtype, TXTDIST = TxtDist, NEWEVT = NewEvt, 
                      stringsAsFactors = FALSE)
OffByUnder3 <- UnqEvts[(UnqEvts$TXTDIST < 3 & UnqEvts$TXTDIST > 0),] 
OffByUnder3 #these look good

##                  EVTYPE TXTDIST            NEWEVT
## 19         RIP CURRENTS       1       RIP CURRENT
## 33   THUNDERSTORM WINDS       1 THUNDERSTORM WIND
## 49         COASTALFLOOD       1     COASTAL FLOOD
## 80         STRONG WINDS       1       STRONG WIND
## 95        COASTAL FLOOD       1     COASTAL FLOOD
## 160 THUNDERSTORM WIND G       2 THUNDERSTORM WIND
## 170  THUNDERSTORM WIND        1 THUNDERSTORM WIND
## 175    THUNDERSTORM WND       1 THUNDERSTORM WIND
## 177   THUNDERSTORM WIND       1 THUNDERSTORM WIND
## 189    LAKE EFFECT SNOW       1  LAKE-EFFECT SNOW
## 200       FUNNEL CLOUDS       1      FUNNEL CLOUD
## 201         WATERSPOUTS       1        WATERSPOUT
## 217          HIGH WINDS       1         HIGH WIND
## 219           LIGHTNING       1         LIGHTNING
## 231         HIGH WIND G       2         HIGH WIND
## 314         FLASH FLOOD       1       FLASH FLOOD
## 327          WATERSPOUT       1        WATERSPOUT
## 338          DUST DEVEL       1        DUST DEVIL

DiffEvts3up <- UnqEvts[(UnqEvts$TXTDIST > 2),]

#Count the number of Event types in the SubData that have a stringdist of 3 or more.
sum(SubData$EVTYPE %in% DiffEvts3up$EVTYPE) #12584

## [1] 12584

#Checking the frequencies of the remaining unique values that have a stringdist greater than 2
UnqFreq <- as.data.frame(table(SubData$EVTYPE))
UnqFreq3plus <- UnqFreq[(UnqFreq$Var1 %in% DiffEvts3up$EVTYPE),]
UnqFreq3plus <- UnqFreq3plus[order(UnqFreq3plus$Freq, decreasing = TRUE),]
head(UnqFreq3plus, n=10)

##                       Var1 Freq
## 341   URBAN/SML STREAM FLD 3391
## 362       WILD/FOREST FIRE 1443
## 375     WINTER WEATHER/MIX 1104
## 309 THUNDERSTORM WIND/HAIL 1026
## 68            EXTREME COLD  617
## 158              LANDSLIDE  588
## 85                     FOG  532
## 363                   WIND  326
## 267            STORM SURGE  253
## 126   HEAVY SURF/HIGH SURF  228

Based on a review of the frequencies for unique event types with a stringdist of three or more, manually correcting the top ten accounts for 9508 of the 12584 observations (76%). The code chunk below replaces the EVTYPE values where appropriate and drops rows for which the appropriate event type is unclear.

#Top Ten based on the frequencies: URBAN/SML STREAM FLD (3391), WILD/FOREST FIRE (1443), WINTER 
#WEATHER/MIX (1104), THUNDERSTORM WIND/HAIL (1026), EXTREME COLD (617), LANDSLIDE (588), FOG (532), 
#WIND (326), STORM SURGE (253), HEAVY SURF/HIGH SURF (228) -- Manually edit these where appropriate

#"Heavy rain situations, resulting in urban and/or small stream flooding, should be classified as a 
#Heavy Rain event.." according to the NWS documentation
SubData[(SubData$EVTYPE == "URBAN/SML STREAM FLD"),]$EVTYPE <- "HEAVY RAIN"

#"Any significant forest fire, grassland fire, rangeland fire, or wildland-urban interface fire..." 
#according to the NWS documentation this should be "WILDFIRE"
SubData[(SubData$EVTYPE == "WILD/FOREST FIRE"),]$EVTYPE <- "WILDFIRE"

#"A Winter Weather event could result from one or more winter precipitation types (snow, or blowing/
#drifting snow, or freezing rain/drizzle), on a widespread or localized basis ..." 
#according to the NWS documentation this should be "WINTER WEATHER"
SubData[(SubData$EVTYPE == "WINTER WEATHER/MIX"),]$EVTYPE <- "WINTER WEATHER"

#THUNDERSTORM WIND/HAIL -- It is not clear whether this should be "THUNDERSTORM WIND" or "HAIL", so
#these rows will be dropped.
SubData <- SubData[!(SubData$EVTYPE == "THUNDERSTORM WIND/HAIL"),] #1026 rows

#EXTREME COLD to "EXTREME COLD/WIND CHILL"
SubData[(SubData$EVTYPE == "EXTREME COLD"),]$EVTYPE <- "EXTREME COLD/WIND CHILL"

#LANDSLIDE should be renamed to "DEBRIS FLOW"
SubData[(SubData$EVTYPE == "LANDSLIDE"),]$EVTYPE <- "DEBRIS FLOW"
SubData$EVTYPE <- gsub("LANDSLIDE", "DEBRIS FLOW", SubData$EVTYPE)

#FOG to DENSE FOG
SubData[(SubData$EVTYPE == "FOG"),]$EVTYPE <- "DENSE FOG"

#WIND to STRONG WIND or HIGH WIND based on MAG value from dataset - those with MAG values
#greater than 50 assigned to "HIGH WIND", those with values of 50 or less assigned to "STRONG WIND"
SubData[(SubData$EVTYPE == "WIND" & SubData$MAG > 50),]$EVTYPE <- "HIGH WIND"
SubData[(SubData$EVTYPE == "WIND" & SubData$MAG <= 50),]$EVTYPE <- "STRONG WIND"

#STORM SURGE 
SubData[(SubData$EVTYPE == "STORM SURGE"),]$EVTYPE <- "STORM SURGE/TIDE"

#HEAVY SURF/HIGH SURF
SubData[(SubData$EVTYPE == "HEAVY SURF/HIGH SURF"),]$EVTYPE <- "HIGH SURF"
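
The one-to-one renames above can also be kept in a single named vector, which makes the mapping easier to review. This is a sketch: `recode` and `recode_evtype` are hypothetical names, the mapping copies only the exact-match renames from the chunk above, and the test vector is invented. Unlike the gsub call used for LANDSLIDE, it replaces whole values only, not substrings.

```r
# Old value = new value, copied from the manual corrections above
recode <- c(
    "URBAN/SML STREAM FLD" = "HEAVY RAIN",
    "WILD/FOREST FIRE"     = "WILDFIRE",
    "WINTER WEATHER/MIX"   = "WINTER WEATHER",
    "EXTREME COLD"         = "EXTREME COLD/WIND CHILL",
    "LANDSLIDE"            = "DEBRIS FLOW",
    "FOG"                  = "DENSE FOG",
    "STORM SURGE"          = "STORM SURGE/TIDE",
    "HEAVY SURF/HIGH SURF" = "HIGH SURF"
)

# Hypothetical helper: replace values found in names(map), leave the rest alone
recode_evtype <- function(x, map) {
    hit <- x %in% names(map)
    x[hit] <- unname(map[x[hit]])
    x
}

# Invented input
recoded <- recode_evtype(c("FOG", "TORNADO", "LANDSLIDE", "FOG"), recode)
recoded
```

Rules that depend on another column, such as the WIND split on MAG, still need their own conditional assignments.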

In the final code chunk in the Event Types section, the string distances are recalculated. Visual inspection confirmed that all of the unique EVTYPE values with a stringdist of less than three were matched to the correct event from the NWS Event Type vector. The EVTYPE values in the data were then updated. The remaining rows that had a stringdist of three or more were dropped from the data.

#Re-calculate the string distances
UniqueEvt <- unique(SubData$EVTYPE)
UnqEVtype <- c()
TxtDist <- c()
NewEvt <- c() 

for (i in seq_along(UniqueEvt)) {
    Distances <- stringdist(UniqueEvt[i], EventTypes)
    minDist <- min(Distances)
    minEventType <- EventTypes[which.min(Distances)]
    UnqEVtype <- c(UnqEVtype, UniqueEvt[i])
    TxtDist <- c(TxtDist, minDist)
    NewEvt <- c(NewEvt, minEventType)
}

UnqEvts <- data.frame(EVTYPE = UnqEVtype, TXTDIST = TxtDist, NEWEVT = NewEvt, 
                      stringsAsFactors = FALSE)
DiffEvts3up <- UnqEvts[(UnqEvts$TXTDIST > 2),]

#off by less than three -- Check again after modifications to confirm that all of these are matched 
#to the correct event from the NWS Event Type vector.  Update EVTYPES in Subdata
OffByUnder3 <- UnqEvts[(UnqEvts$TXTDIST < 3 & UnqEvts$TXTDIST > 0),]
OffByUnder3

##                  EVTYPE TXTDIST            NEWEVT
## 17         RIP CURRENTS       1       RIP CURRENT
## 31   THUNDERSTORM WINDS       1 THUNDERSTORM WIND
## 46         COASTALFLOOD       1     COASTAL FLOOD
## 77         STRONG WINDS       1       STRONG WIND
## 92        COASTAL FLOOD       1     COASTAL FLOOD
## 157 THUNDERSTORM WIND G       2 THUNDERSTORM WIND
## 167  THUNDERSTORM WIND        1 THUNDERSTORM WIND
## 172    THUNDERSTORM WND       1 THUNDERSTORM WIND
## 173   THUNDERSTORM WIND       1 THUNDERSTORM WIND
## 185    LAKE EFFECT SNOW       1  LAKE-EFFECT SNOW
## 196       FUNNEL CLOUDS       1      FUNNEL CLOUD
## 197         WATERSPOUTS       1        WATERSPOUT
## 210        DEBRIS FLOWS       1       DEBRIS FLOW
## 213          HIGH WINDS       1         HIGH WIND
## 215           LIGHTNING       1         LIGHTNING
## 227         HIGH WIND G       2         HIGH WIND
## 310         FLASH FLOOD       1       FLASH FLOOD
## 323          WATERSPOUT       1        WATERSPOUT
## 334          DUST DEVEL       1        DUST DEVIL

for (i in seq_len(nrow(OffByUnder3))) {
    SubData[(SubData$EVTYPE == OffByUnder3$EVTYPE[i]),]$EVTYPE <- OffByUnder3$NEWEVT[i]
}

#Count the number of Event types in the SubData that have a stringdist of 3 or more.
sum(SubData$EVTYPE %in% DiffEvts3up$EVTYPE) #3074 rows will be dropped

## [1] 3074

#3074 out of the 652481 remaining observations in the dataset is less than 0.5%.  This small 
#percentage of observations will be dropped from the dataset as cleaning them will be excessively time consuming
SubData <- SubData[(!(SubData$EVTYPE %in% DiffEvts3up$EVTYPE)),]

UnqFreq <- as.data.frame(table(SubData$EVTYPE)) #48 Unique Event Types remain

Results

Results: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The following code chunk sums the total fatalities and the total injuries by event type. The top ten of each are shown in the table below. Excessive Heat (1797) and Tornadoes (1511) are the most fatal event types in the U.S. as a whole, well ahead of Flash Floods (887). The table also shows that Tornadoes cause more than three times as many injuries as the next event type (Flood).

Fatalities <- aggregate(FATALITIES~EVTYPE, data = SubData, sum)
Fatalities <- Fatalities[order(Fatalities$FATALITIES, decreasing = TRUE),]
Injuries <- aggregate(INJURIES~EVTYPE, data = SubData, sum)
Injuries <- Injuries[order(Injuries$INJURIES, decreasing = TRUE),]
TopTen <- cbind(Fatalities[c(1:10),], Injuries[c(1:10),])

kable(head(TopTen, n=10), format = "html", row.names = FALSE, 
      col.names = c("Event Type", "Total Fatalities", "Event Type", "Total Injuries"), 
      align = c("l", "c", "l", "c"), 
      caption = c("Top Ten Event Types that are Most Harmful to Human Health in the US"))

Top Ten Event Types that are Most Harmful to Human Health in the US
Event Type	Total Fatalities	Event Type	Total Injuries
EXCESSIVE HEAT	1797	TORNADO	20667
TORNADO	1511	FLOOD	6758
FLASH FLOOD	887	EXCESSIVE HEAT	6391
LIGHTNING	650	THUNDERSTORM WIND	5058
RIP CURRENT	542	LIGHTNING	4140
FLOOD	414	FLASH FLOOD	1674
THUNDERSTORM WIND	376	WILDFIRE	1456
EXTREME COLD/WIND CHILL	240	HURRICANE/TYPHOON	1328
HEAT	237	WINTER STORM	1292
HIGH WIND	235	HEAT	1222

Total Fatalities by Event Type for U.S. States and Territories

The map (Figure 1) below shows the most fatal event type for each of the contiguous 48 states, and the table provides the information for the states and territories that are not on the map but for which there is data. The map shows some patterns that we would expect to see - Tornadoes are most fatal in the areas that see large numbers of tornadoes. One interesting note is that Wildfires are not a most fatal event for any state, nor are Hurricane/Typhoons. It seems likely that this is due to effective evacuation efforts and some ability to provide warning.

StatesCensus <- read.table("state.txt", header = TRUE, sep = "|")
names(StatesCensus) <- c("STATE_CODE", "STATE", "STATE_NAME", "STATENS")

FatalState <- aggregate(FATALITIES~STATE + EVTYPE, data = SubData, sum)
FatalState.agg <- aggregate(FATALITIES~STATE, data=FatalState, max)
FatalState.max <- merge(FatalState.agg, FatalState)

#Merge the Maximum dataset to the US Census Bureau state data based on the State abbreviations 
FatalState.max <- merge(FatalState.max, StatesCensus, by = "STATE")
ForMapFtSt.max <- FatalState.max
ForMapFtSt.max$STATE_NAME <- tolower(ForMapFtSt.max$STATE_NAME)

if (require("maps")) {
    states <- map_data("state")
    names(states) <- c("long", "lat", "group", "order","STATE_NAME", "subregion")
    choro <- merge(ForMapFtSt.max, states, sort = FALSE, by = "STATE_NAME")
    choro <- choro[order(choro$order), ]
    colors <- rainbow(length(unique(choro$EVTYPE)))
    qplot(long, lat, data = choro, group = group, fill = EVTYPE, xlab = "", ylab = "", 
          geom = "polygon", main = "Figure 1: Most Fatal Event Type for U.S. States")
}

NonLower48F <- FatalState.max[!(ForMapFtSt.max$STATE_NAME %in% states$STATE_NAME),]
NonLower48F <- NonLower48F[,c(5, 3, 2)]
kable(head(NonLower48F, n=nrow(NonLower48F)), format = "html", row.names = FALSE, 
      col.names = c("State or Territory", "Event Type", "Total Fatalities"), 
      align = c("l", "l", "c"), 
      caption = c("Total Fatalities by Event Type for Regions not included in Map"))

Total Fatalities by Event Type for Regions not included in Map
State or Territory	Event Type	Total Fatalities
Alaska	AVALANCHE	33
American Samoa	TSUNAMI	32
Guam	RIP CURRENT	38
Hawaii	HIGH SURF	17
Puerto Rico	FLASH FLOOD	34
U.S. Virgin Islands	HIGH SURF	3
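
The aggregate-then-merge pattern used to find the top event type per state can be seen on a small invented data set. In this sketch the state codes "AA" and "BB" and all counts are made up. Because the merge joins on both STATE and the maximum value, a tie within a state keeps every tied event type.

```r
# Invented records: "AA" has a clear maximum, "BB" has a tie
toy <- data.frame(
    STATE      = c("AA", "AA", "AA", "BB", "BB"),
    EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "RIP CURRENT", "LIGHTNING"),
    FATALITIES = c(2, 5, 1, 1, 1),
    stringsAsFactors = FALSE
)

# Total per state and event type, then the largest total per state
by_type   <- aggregate(FATALITIES ~ STATE + EVTYPE, data = toy, sum)
state_max <- aggregate(FATALITIES ~ STATE, data = by_type, max)

# merge() joins on the shared columns STATE and FATALITIES,
# which brings back the event type(s) that reached the maximum
worst <- merge(state_max, by_type)
worst
```

A state with tied event types therefore appears on more than one row in the resulting table.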

Total Injuries by Event Type for U.S. States and Territories

This section maps the event type that caused the greatest number of injuries in each of the contiguous 48 states, and the table provides the information for the states and territories that are not on the map but for which there are data. As with the previous map, some expected patterns appear. However, some of the unexpected results (such as Heat in Michigan) may reflect a lack of preparation for unusual weather events.

InjState <- aggregate(INJURIES~STATE + EVTYPE, data = SubData, sum)
InjState.agg <- aggregate(INJURIES~STATE, data=InjState, max)
InjState.max <- merge(InjState.agg, InjState)

#Merge the Maximum dataset to the US Census Bureau state data based on the State abbreviations 
InjState.max <- merge(InjState.max, StatesCensus, by = "STATE")
ForMapIjSt.max <- InjState.max
ForMapIjSt.max$STATE_NAME <- tolower(ForMapIjSt.max$STATE_NAME)

if (require("maps")) {
    states <- map_data("state")
    names(states) <- c("long", "lat", "group", "order","STATE_NAME", "subregion")
    choro <- merge(ForMapIjSt.max, states, sort = FALSE, by = "STATE_NAME")
    choro <- choro[order(choro$order), ]
    colors <- rainbow(length(unique(choro$EVTYPE)))
    qplot(long, lat, data = choro, group = group, fill = EVTYPE, xlab = "", ylab = "", 
          geom = "polygon", main = "Figure 2: Weather Event Type with Most Injuries for U.S. States")
}

NonLower48I <- InjState.max[!(ForMapIjSt.max$STATE_NAME %in% states$STATE_NAME),]
NonLower48I <- NonLower48I[,c(5, 3, 2)]
kable(head(NonLower48I, n=nrow(NonLower48I)), format = "html", row.names = FALSE, 
      col.names = c("State or Territory", "Event Type", "Total Injuries"), 
      align = c("l", "l", "c"), 
      caption = c("Total Injuries by Event Type for Regions not included in Map"))

Total Injuries by Event Type for Regions not included in Map
State or Territory	Event Type	Total Injuries
Alaska	ICE STORM	34
American Samoa	TSUNAMI	129
Guam	HURRICANE/TYPHOON	339
Hawaii	HIGH SURF	28
Puerto Rico	HEAVY RAIN	10
U.S. Virgin Islands	RIP CURRENT	1
U.S. Virgin Islands	LIGHTNING	1

Results: Across the United States, which types of events have the greatest economic consequences?

This code chunk aggregates total economic damage by event type. The twenty most costly weather event types are listed in the table below. Given Americans' propensity for building near water, it is not surprising that the top three event types all occur on or near bodies of water. Limiting construction of homes and businesses on or very near water could help reduce the economic costs of Flood, Hurricane/Typhoon, and Storm Surge/Tide events.

TecnDMGst <- aggregate(TtlEcnDmg~EVTYPE, data = SubData, sum)
TecnDMGst <- TecnDMGst[order(TecnDMGst$TtlEcnDmg, decreasing = TRUE),]

TopTwenty <- TecnDMGst[1:20,]

kable(head(TopTwenty, n=20), format = "html", row.names = FALSE, 
      col.names = c("Event Type", "Approximate Total Economic Cost"), 
      align = c("l", "c"), 
      caption = c("Top Twenty Event Types by Approximate Total 
                  Economic Cost (in Billions) Across the U.S."))

Top Twenty Event Types by Approximate Total Economic Cost (in Billions) Across the U.S.
Event Type	Approximate Total Economic Cost
FLOOD	284.98
HURRICANE/TYPHOON	163.34
STORM SURGE/TIDE	95.61
TORNADO	45.30
HAIL	27.15
FLASH FLOOD	26.46
TROPICAL STORM	15.10
WILDFIRE	15.04
THUNDERSTORM WIND	10.36
HIGH WIND	9.73
ICE STORM	7.11
WINTER STORM	2.73
DROUGHT	2.08
HEAVY RAIN	1.06
HEAVY SNOW	1.03
BLIZZARD	0.97
DEBRIS FLOW	0.58
COASTAL FLOOD	0.46
TSUNAMI	0.29
LIGHTNING	0.26

Most Damaging Event Type and Total Cost for U.S. States and Territories

This code chunk calculates the most damaging event type per state. The table below shows each state, the most economically damaging event type for that state, and the total cost of that event type in billions for the state.

CostState <- aggregate(TtlEcnDmg~STATE + EVTYPE, data = SubData, sum)
CostState.agg <- aggregate(TtlEcnDmg~STATE, data=CostState, max)
CostState.max <- merge(CostState.agg, CostState)
CostState.max <- merge(CostState.max, StatesCensus, by = "STATE")

ForTable.max <- CostState.max[,c(5, 3, 2)]

kable(ForTable.max, format = "html", row.names = FALSE,
      col.names = c("State or Territory", "Event Type", "Total Cost"), 
      align = c("l", "l", "c"), 
      caption = c("Most Damaging Event Type and Total Cost in Billions by State"))

Most Damaging Event Type and Total Cost in Billions by State
State or Territory	Event Type	Total Cost
Alaska	FLOOD	0.24
Alabama	TORNADO	9.78
Arkansas	TORNADO	2.94
American Samoa	TSUNAMI	0.16
Arizona	HAIL	5.66
California	FLOOD	233.41
Colorado	HAIL	2.70
Connecticut	TROPICAL STORM	0.12
District of Columbia	TROPICAL STORM	0.25
Delaware	COASTAL FLOOD	0.08
Florida	HURRICANE/TYPHOON	56.88
Georgia	TORNADO	1.69
Guam	HURRICANE/TYPHOON	1.72
Hawaii	FLASH FLOOD	0.30
Iowa	FLOOD	2.44
Idaho	FLOOD	0.21
Illinois	FLASH FLOOD	1.49
Indiana	FLOOD	1.57
Kansas	TORNADO	1.32
Kentucky	HAIL	1.20
Louisiana	STORM SURGE/TIDE	63.65
Massachusetts	TORNADO	0.92
Maryland	TROPICAL STORM	1.06
Maine	ICE STORM	0.64
Michigan	TORNADO	0.60
Minnesota	FLOOD	2.47
Missouri	TORNADO	7.05
Mississippi	HURRICANE/TYPHOON	28.35
Montana	HAIL	0.18
North Carolina	HURRICANE/TYPHOON	11.01
North Dakota	FLOOD	7.72
Nebraska	HAIL	1.60
New Hampshire	ICE STORM	0.12
New Jersey	FLOOD	4.18
New Mexico	WILDFIRE	3.07
Nevada	FLOOD	1.34
New York	FLASH FLOOD	3.29
Ohio	FLASH FLOOD	2.21
Oklahoma	TORNADO	3.35
Oregon	FLOOD	1.42
Pennsylvania	FLASH FLOOD	2.67
Puerto Rico	HURRICANE/TYPHOON	3.65
Rhode Island	FLOOD	0.18
South Carolina	ICE STORM	0.30
South Dakota	BLIZZARD	0.12
Tennessee	FLOOD	8.44
Texas	TROPICAL STORM	10.95
Utah	FLOOD	0.65
Virginia	HURRICANE/TYPHOON	1.27
U.S. Virgin Islands	HURRICANE/TYPHOON	0.05
Vermont	FLOOD	2.12
Washington	FLOOD	0.39
Wisconsin	HAIL	1.82
West Virginia	FLASH FLOOD	0.78
Wyoming	HAIL	0.20

Most Economically Damaging Event Type for U.S. States and Territories

This section maps the event type that had the greatest economic cost for each of the contiguous 48 states. The table above includes the information for the states and territories that are not on the map but for which there are data. As in Figure 2 above, there are some patterns here that one would expect to see, such as Tornadoes in OK, MO, AR, and KS. However, given that water-related event types (Coastal Flood, Flash Flood, Flood, Hurricane/Typhoon, Storm Surge/Tide, Tropical Storm, and Tsunami) are the most economically damaging for 35 out of the 55 states and territories, planning for flood control and for Hurricane/Typhoon and Tsunami preparedness appears to be important.

ForMapCostSt.max <- CostState.max
ForMapCostSt.max$STATE_NAME <- tolower(ForMapCostSt.max$STATE_NAME)
if (require("maps")) {
    states <- map_data("state")
    names(states) <- c("long", "lat", "group", "order","STATE_NAME", "subregion")
    choro <- merge(ForMapCostSt.max, states, sort = FALSE, by = "STATE_NAME")
    choro <- choro[order(choro$order), ]
    colors <- rainbow(length(unique(choro$EVTYPE)))
    mapping <- qplot(long, lat, data = choro, group = group, fill = EVTYPE, geom = "polygon", 
                     xlab = "", ylab = "", 
                     main = "Figure 3: Weather Event Type with Greatest\nEconomic Cost for U.S. States")
    mapping + scale_fill_brewer(palette="Set3")
}

uniqueCostEVTYPES <- unique(CostState.max$EVTYPE)
uniqueCostEVTYPES

##  [1] "FLOOD"             "TORNADO"           "TSUNAMI"          
##  [4] "HAIL"              "TROPICAL STORM"    "COASTAL FLOOD"    
##  [7] "HURRICANE/TYPHOON" "FLASH FLOOD"       "STORM SURGE/TIDE" 
## [10] "ICE STORM"         "WILDFIRE"          "BLIZZARD"

Watery <- c("COASTAL FLOOD", "FLASH FLOOD", "FLOOD", "HURRICANE/TYPHOON", "STORM SURGE/TIDE",
            "TROPICAL STORM", "TSUNAMI")

WaterRelated <- ForMapCostSt.max[(ForMapCostSt.max$EVTYPE %in% Watery), c(1:3)]
nrow(WaterRelated) #35 States/Territories

## [1] 35