Synopsis

In this report we aim to discover which major weather events across the United States have the greatest economic and public health impact. We used the National Weather Service (NWS) database from 1996 to 2011, the most complete years available, which summarizes all major weather events, enumerating deaths, injuries, and property and crop damage. These factors were grouped by the official NWS weather event categories to find the types of events with the biggest impact. When looking at health factors (deaths and injuries), we found that ten weather event categories accounted for 85% of all deaths and injuries. In order of health impact, these are tornadoes, excessive heat, floods, thunderstorm winds, lightning, flash floods, wildfires, winter storms, heat, and hurricanes. Economic impact was measured by adding the cost of property damage and crop damage. The top ten weather event categories for monetary damage accounted for 93% of the total cost: hurricanes, storm surges, floods, tornadoes, hail, flash floods, drought, thunderstorm winds, tropical storms, and wildfires. Overall, coastal events have the largest economic impact. The economic data was also adjusted for inflation, but the adjustment did not change the order of these top ten categories.

Data Processing

Reading Weather Data

We first read in the complete storm data from 1950 to 2011, provided by the National Weather Service as a compressed raw text file. The dates were converted to POSIX format. A subset of the data was then selected covering only 1996 to 2011, the years for which, per NOAA, all event types were recorded. In addition, only the variables needed for this analysis were kept to reduce processing time: date, event type, number of fatalities, number of injuries, property and crop damage with their corresponding exponent codes, and the remarks, which help in cleaning up the categories. Finally, observations with no economic or health impact were dropped, since this analysis considers only those factors.

library(curl);library(ggplot2);library(XML);library(reshape2);library(xtable)

fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileURL,"StormData.csv.bz2")
raw_storms <- read.csv("StormData.csv.bz2", na.strings = "")
raw_storms$BGN_DATE <- strptime(as.character(raw_storms$BGN_DATE), "%m/%d/%Y %H:%M:%S")
initial_size <- dim(raw_storms)
initial_size
## [1] 902297     37
#per http://www.ncdc.noaa.gov/stormevents/details.jsp only 1996 and forward have all 48 factors
storms <- raw_storms[raw_storms$BGN_DATE>=as.POSIXlt("1996/1/1"), c("BGN_DATE", "EVTYPE","FATALITIES","INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP","REMARKS")]

#Eliminate records with no damage
storms <- storms[!(storms$FATALITIES == 0 & storms$INJURIES == 0 & storms$PROPDMG == 0 & storms$CROPDMG == 0),]
small_size <- dim(storms)
small_size
## [1] 201318      9

The original data frame had 902297 observations and 37 variables. The reduced data frame used for this analysis had 201318 observations and 9 variables. The storm data required extensive pre-processing as explained in the sections below.
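
The two filtering steps described above can be illustrated on a tiny invented data frame. This is only a sketch under stated assumptions: the three rows are made up, the dates are stored as POSIXct in UTC rather than the POSIXlt used in the report, and the impact test assumes that all four measures are non-negative.

```r
# Toy rows standing in for the raw storm data (values are invented)
toy <- data.frame(
  BGN_DATE   = c("12/31/1995 0:00:00", "1/6/1996 0:00:00", "4/27/2011 0:00:00"),
  FATALITIES = c(1, 0, 0),
  INJURIES   = c(0, 0, 3),
  PROPDMG    = c(0, 0, 10),
  CROPDMG    = c(0, 0, 0),
  stringsAsFactors = FALSE
)

# Parse the date strings, then keep events from 1996 onward
toy$BGN_DATE <- as.POSIXct(strptime(toy$BGN_DATE, "%m/%d/%Y %H:%M:%S", tz = "UTC"))
recent <- toy[toy$BGN_DATE >= as.POSIXct("1996-01-01", tz = "UTC"), ]

# Keep rows where at least one impact measure is non-zero
# (a row sum above zero is equivalent only because no measure is negative)
impact_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
has_impact <- rowSums(recent[, impact_cols]) > 0
kept <- recent[has_impact, ]
kept
```

Only the 2011 row survives both filters: the 1995 row is too early, and the 1996 row has no recorded impact.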

Calculating Damage

Damage to both property and crops was stored as two variables each: a base value and an exponent code. These variables were combined to arrive at a single variable for each type of damage. The columns in the data frame were reordered to keep the lengthy remarks at the end.

#Apply exponents to base damage numbers
storms$PROPTOTAL <- 
      ifelse(is.na(storms$PROPDMGEXP),storms$PROPDMG,
             ifelse(storms$PROPDMGEXP == "K",storms$PROPDMG*10^3,
                    ifelse(storms$PROPDMGEXP == "M",storms$PROPDMG*10^6,
                           ifelse(storms$PROPDMGEXP == "B",storms$PROPDMG*10^9,storms$PROPDMG))))


storms$CROPTOTAL <- 
      ifelse(is.na(storms$CROPDMGEXP),storms$CROPDMG,
            ifelse(storms$CROPDMGEXP == "K",storms$CROPDMG*10^3,
                  ifelse(storms$CROPDMGEXP == "M",storms$CROPDMG*10^6,
                        ifelse(storms$CROPDMGEXP == "B",storms$CROPDMG*10^9,storms$CROPDMG))))

#columns of storms reordered for easy viewing
storms <- storms[,c(1:8,10:11,9)]
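
The same conversion can also be written with a lookup vector instead of nested ifelse calls. The sketch below is an alternative formulation, not the code used in the report; the helper name apply_exponent is hypothetical, and it follows the same rule as above: K, M, and B scale the base value, while missing or unrecognized codes leave it unchanged.

```r
# Hypothetical helper: convert a base value and an exponent code to dollars
apply_exponent <- function(base, code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  # Look up each code by name; NA and unknown codes come back as NA
  mult <- unname(multipliers[as.character(code)])
  # Missing or unrecognized codes leave the base value unchanged
  mult[is.na(mult)] <- 1
  base * mult
}

# Illustrative inputs: a K code, a missing code, a B code, and an unknown code
converted <- apply_exponent(c(380, 100, 115, 5), c("K", NA, "B", "0"))
converted
```

With a lookup vector, supporting another code would mean adding one entry to multipliers rather than another level of nesting.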

Checking Data for Errors

The damage conversion was spot-checked, and summaries of the numeric data were run to check for errors or inconsistencies.

head(storms)
##          BGN_DATE       EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 248768 1996-01-06 WINTER STORM          0        0     380          K
## 248769 1996-01-11      TORNADO          0        0     100          K
## 248770 1996-01-11    TSTM WIND          0        0       3          K
## 248771 1996-01-11    TSTM WIND          0        0       5          K
## 248772 1996-01-11    TSTM WIND          0        0       2          K
## 248774 1996-01-18    HIGH WIND          0        0     400          K
##        CROPDMG CROPDMGEXP PROPTOTAL CROPTOTAL
## 248768      38          K    380000     38000
## 248769       0       <NA>    100000         0
## 248770       0       <NA>      3000         0
## 248771       0       <NA>      5000         0
## 248772       0       <NA>      2000         0
## 248774       0       <NA>    400000         0
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            REMARKS
## 248768 A winter storm brought a mixture of freezing rain, sleet, and snow to the northern two-thirds of Alabama.  Precipitation began as freezing rain and sleet but quickly changed to snow.  The precipitation coated roads and caused serious travel problems across the northern sections of thestate that lasted into Monday morning (the 8th).  Some higher elevations of the northeast corner of Alabama had travel problems into Tuesday.  Amounts were generally light with the highest snowfall reported at Huntsville International Airport with 2 inches.  Most other locations across North Alabama reported one-quarter of an inch to an inch and a half.  On Sunday the 7th, one fatality occurred in an automobile/train collision in Calhoun County that was attributed to icy roads.  The teenage driver of the car was not wearing a seat belt and was thrown from the vehicle.
## 248769                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 A tornado destroyed 4 house trailers that were unoccupied. Debris was scattered for about 1 mile, according to county emergency management.
## 248770                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         Several trees were blown down and two backyard sheds were destroyed according to newspaper reports and county emergency management.
## 248771                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      County emergency management confirmed that three sheds were destroyed, and several houses received superficial damage.
## 248772                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      County emergency management reported that a porch roof was lifted off a home and a shed was destroyed.
## 248774                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             High wind over the northern third of Alabama generally along a cold front caused widespread scattered damage.  Damage was primarily to downed trees, limbs, and power lines but some roof damage occurred in several locations.
#Check summary data for metrics
summary(storms$CROPTOTAL)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 0.000e+00 0.000e+00 1.726e+05 0.000e+00 1.510e+09
summary(storms$PROPTOTAL)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.000e+00 2.000e+03 9.000e+03 1.822e+06 3.000e+04 1.150e+11
summary(storms$FATALITIES)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
##   0.00000   0.00000   0.00000   0.04337   0.00000 158.00000
summary(storms$INJURIES)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    0.000    0.000    0.000    0.288    0.000 1150.000

The high value for property damage seemed out of proportion. When the remarks for that entry were checked, it turned out that the exponent for that event (a flood in Napa Valley) should have represented millions instead of billions of dollars. This error was fixed since it would affect the analysis. Once that outlier was fixed, the maximums in all four variables matched the remarks and represented the known catastrophic events of Hurricane Katrina and the 2011 tornadoes in Joplin, Missouri.

#Check high number for property damage
MAXPROPDMG <- which(storms$PROPTOTAL==max(storms$PROPTOTAL))
#Miscoded Napa Valley Flood
storms[MAXPROPDMG,]
##          BGN_DATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 605953 2006-01-01  FLOOD          0        0     115          B    32.5
##        CROPDMGEXP PROPTOTAL CROPTOTAL
## 605953          M  1.15e+11  32500000
##                                                                                                                                                                                                                                                                                                                                                                                               REMARKS
## 605953 Major flooding continued into the early hours of January 1st, before the Napa River finally fell below flood stage and the water receeded. Flooding was severe in Downtown Napa from the Napa Creek and the City and Parks Department was hit with $6 million in damage alone. The City of Napa had 600 homes with moderate damage, 150 damaged businesses with costs of at least $70 million.
#Remarks show that the Napa Valley flood damage should be "M" multiplier not "B"
storms[MAXPROPDMG,]$PROPDMGEXP <- "M"
storms$PROPTOTAL[MAXPROPDMG] <- storms$PROPDMG[MAXPROPDMG]*10^6
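
A check like this can be extended from the single maximum to the few largest records, so that each one can be compared against its remarks. The sketch below uses invented values and a hypothetical helper, top_n_rows; it is one possible way to run such a review, not the procedure used in the report.

```r
# Toy data standing in for the storms data frame (values are illustrative only)
toy <- data.frame(
  EVTYPE    = c("FLOOD", "TORNADO", "HAIL", "FLOOD"),
  PROPTOTAL = c(1.15e11, 2.8e9, 5e4, 7e7),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n rows with the largest value in a column
top_n_rows <- function(df, column, n = 3) {
  ranked <- order(df[[column]], decreasing = TRUE)
  df[ranked[seq_len(min(n, nrow(df)))], , drop = FALSE]
}

largest <- top_n_rows(toy, "PROPTOTAL", n = 2)
largest
```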

Reading Factors for Inflation

Since inflation may affect the damage analysis, we read in the Consumer Price Index (CPI) data so that the economic data could be adjusted for inflation by year. The data was read from an HTML table, and only the year and the average CPI for that year were kept. No other processing was required for this data. Note: Despite the URL link text, this inflation table is updated each month.

inflationURL = "http://www.usinflationcalculator.com/inflation/consumer-price-index-and-annual-percent-changes-from-1913-to-2008/"
doc.html = htmlTreeParse(inflationURL, useInternalNodes = TRUE)
inf <- readHTMLTable(doc.html, as.data.frame = TRUE, skip.rows = 2, stringsAsFactors = FALSE)[[1]][,c(1,14)]
colnames(inf) <- c("Year","CPI")
inf$CPI <- as.numeric(inf$CPI)
head(inf)
##   Year  CPI
## 1        NA
## 2 1913  9.9
## 3 1914 10.0
## 4 1915 10.1
## 5 1916 10.9
## 6 1917 12.8
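
The blank first row in the output above does no harm when the table is later joined to the storm data by year, because merge() keeps only matching keys by default. A minimal sketch, using the first CPI values printed above and an invented events table:

```r
# First rows of the CPI table as printed above, including the blank row
inf_toy <- data.frame(
  Year = c("", "1913", "1914"),
  CPI  = c(NA, 9.9, 10.0),
  stringsAsFactors = FALSE
)

# Invented events keyed by year (for illustration only)
events_toy <- data.frame(
  Year   = c("1914", "1913", "1914"),
  DAMAGE = c(100, 200, 300),
  stringsAsFactors = FALSE
)

# Default merge is an inner join: rows without a matching Year are dropped
merged <- merge(x = events_toy, y = inf_toy, by = "Year")
merged
```

The flip side of an inner join is that any event whose year is missing from the CPI table would be dropped silently, so comparing row counts before and after the merge is a cheap safeguard.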

Creating Inflation-adjusted Variables

Next, inflation-adjusted property and crop damage estimates were calculated to determine the effect of inflation on the analysis. The values were normalized to 2011 dollars, since that is the last year in the dataset. Columns were reordered again to keep the remarks at the end, and the changed data was reviewed.

#Add an inflation factor to the total damage to see if it makes a difference in economic rankings
storms$Year <- as.character(storms$BGN_DATE$year+1900)
storms <- merge(x=storms, y = inf, by.x = "Year", by.y = "Year")
CPI2011 <- inf[inf$Year == 2011,]$CPI
storms$PROPTOTAL2011dol <- storms$PROPTOTAL*CPI2011/storms$CPI
storms$CROPTOTAL2011dol <- storms$CROPTOTAL*CPI2011/storms$CPI
storms <- storms[,c(1:11,13:15,12)]
head(storms[6:14])
##   PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP PROPTOTAL CROPTOTAL   CPI
## 1     380          K      38          K    380000     38000 156.9
## 2     100          K       0       <NA>    100000         0 156.9
## 3       3          K       0       <NA>      3000         0 156.9
## 4       5          K       0       <NA>      5000         0 156.9
## 5       2          K       0       <NA>      2000         0 156.9
## 6     400          K       0       <NA>    400000         0 156.9
##   PROPTOTAL2011dol CROPTOTAL2011dol
## 1       544785.341         54478.53
## 2       143364.563             0.00
## 3         4300.937             0.00
## 4         7168.228             0.00
## 5         2867.291             0.00
## 6       573458.254             0.00
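
The adjustment is a simple ratio: a value from a given year is multiplied by the 2011 CPI and divided by that year's CPI. The sketch below reproduces the first two rows of the output above. The 1996 CPI of 156.9 is taken from that output; the 2011 CPI of 224.939 is an assumption inferred from the printed results, not read from the table, and the function name to_2011_dollars is hypothetical.

```r
# Hypothetical helper: express a dollar value in 2011 dollars
# cpi_2011 = 224.939 is an assumed value inferred from the report's output
to_2011_dollars <- function(value, cpi_year, cpi_2011 = 224.939) {
  value * cpi_2011 / cpi_year
}

# 1996 property damage totals from the first two rows shown above
adjusted <- to_2011_dollars(c(380000, 100000), cpi_year = 156.9)
round(adjusted, 3)
```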

Cleaning and Categorizing Weather Events

The biggest challenge in cleaning this data was getting consistent factors for the event types. The National Weather Service Directive 10-1605 lists 48 factors that should classify all weather events. For this analysis we attempted to classify all of the events in our data into the 48 event types listed below.

##Now turn to the 48 event type factors
#list of the 48 factors from NWS Directive 10-1605, plus OTHER for those unclassified
events_cap <- c("ASTRONOMICAL LOW TIDE","AVALANCHE","BLIZZARD","COASTAL FLOOD",
                "COLD/WIND CHILL","DEBRIS FLOW","DENSE FOG","DENSE SMOKE","DROUGHT",
                "DUST DEVIL","DUST STORM","EXCESSIVE HEAT","EXTREME COLD/WIND CHILL",
                "FLASH FLOOD","FLOOD","FROST/FREEZE","FUNNEL CLOUD","FREEZING FOG",
                "HAIL","HEAT","HEAVY RAIN","HEAVY SNOW","HIGH SURF","HIGH WIND","HURRICANE/TYPHOON",
                "ICE STORM","LAKE-EFFECT SNOW","LAKESHORE FLOOD","LIGHTNING","MARINE HAIL",
                "MARINE HIGH WIND","MARINE STRONG WIND","MARINE THUNDERSTORM WIND","RIP CURRENT",
                "SEICHE","SLEET","STORM SURGE/TIDE","STRONG WIND","THUNDERSTORM WIND","TORNADO",
                "TROPICAL DEPRESSION","TROPICAL STORM","TSUNAMI","VOLCANIC ASH","WATERSPOUT",
                "WILDFIRE","WINTER STORM","WINTER WEATHER")
events_cap   
##  [1] "ASTRONOMICAL LOW TIDE"    "AVALANCHE"               
##  [3] "BLIZZARD"                 "COASTAL FLOOD"           
##  [5] "COLD/WIND CHILL"          "DEBRIS FLOW"             
##  [7] "DENSE FOG"                "DENSE SMOKE"             
##  [9] "DROUGHT"                  "DUST DEVIL"              
## [11] "DUST STORM"               "EXCESSIVE HEAT"          
## [13] "EXTREME COLD/WIND CHILL"  "FLASH FLOOD"             
## [15] "FLOOD"                    "FROST/FREEZE"            
## [17] "FUNNEL CLOUD"             "FREEZING FOG"            
## [19] "HAIL"                     "HEAT"                    
## [21] "HEAVY RAIN"               "HEAVY SNOW"              
## [23] "HIGH SURF"                "HIGH WIND"               
## [25] "HURRICANE/TYPHOON"        "ICE STORM"               
## [27] "LAKE-EFFECT SNOW"         "LAKESHORE FLOOD"         
## [29] "LIGHTNING"                "MARINE HAIL"             
## [31] "MARINE HIGH WIND"         "MARINE STRONG WIND"      
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"             
## [35] "SEICHE"                   "SLEET"                   
## [37] "STORM SURGE/TIDE"         "STRONG WIND"             
## [39] "THUNDERSTORM WIND"        "TORNADO"                 
## [41] "TROPICAL DEPRESSION"      "TROPICAL STORM"          
## [43] "TSUNAMI"                  "VOLCANIC ASH"            
## [45] "WATERSPOUT"               "WILDFIRE"                
## [47] "WINTER STORM"             "WINTER WEATHER"
#Check how many unique event types we have now.
num_types_0 <- as.numeric(length(levels(storms$EVTYPE)))
#985 event types

At the beginning, before any editing, the raw data had 985 distinct event types. Removing rows with no damages, injuries, or deaths eliminated some of the variety. Next, the event type variable was changed from a factor variable to a character variable for manipulation. The first transformations involved changing all categories to the same case (upper), trimming white space from the ends, and collapsing multiple spaces within category names.

#There is a large variety of errors in this list, so it needed to be cleaned up before analysis

storms$EVTYPE <- as.character(storms$EVTYPE)     #Change factor event types to character to edit

#Clean up names by changing all to the same case, removing leading, trailing and extra spaces
storms$EVTYPE <- toupper(storms$EVTYPE)         
storms$EVTYPE <- trimws(storms$EVTYPE)           
storms$EVTYPE  <- gsub(" {2,}"," ", storms$EVTYPE) 

#Check how many unique event types we have now.
storms$EVTYPE <- as.factor(storms$EVTYPE)
num_types_1 <- as.numeric(length(levels(storms$EVTYPE)))
#181 event types
storms$EVTYPE <- as.character(storms$EVTYPE) 
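
The effect of these three normalization steps can be seen on a handful of invented names. The sketch below wraps them in a hypothetical helper, normalize_names, and counts distinct values with length(unique()), which gives the same answer whether the column is stored as a factor or as character. That matters because read.csv stopped converting strings to factors by default in R 4.0.0, after which levels() on the column returns NULL.

```r
# Hypothetical helper combining the three clean-up steps used above
normalize_names <- function(x) {
  x <- toupper(as.character(x))  # same case
  x <- trimws(x)                 # no leading or trailing spaces
  gsub(" {2,}", " ", x)          # collapse runs of spaces
}

# Invented raw names that differ only in case and spacing
raw <- c("Tstm Wind", " TSTM WIND", "TSTM  WIND", "Hail", "HAIL ")
clean <- normalize_names(raw)

length(unique(raw))    # distinct names before cleaning
length(unique(clean))  # distinct names after cleaning
```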

After reviewing the remaining 181 event types, we found many misspellings, alternate spellings, and abbreviations. These were altered to be consistent with the 48 official categories.

#Abbreviations or alternate/misspellings
storms$EVTYPE  <- gsub("TSTM" ,"THUNDERSTORM", storms$EVTYPE)
storms$EVTYPE  <- gsub("THUNDERSTORMS" ,"THUNDERSTORM", storms$EVTYPE)
storms$EVTYPE  <- gsub("WINDS" ,"WIND", storms$EVTYPE)
storms$EVTYPE  <- gsub("WND" ,"WIND", storms$EVTYPE)
storms$EVTYPE  <- gsub("VOG" ,"FOG", storms$EVTYPE)
storms$EVTYPE  <- gsub("CURRENTS" ,"CURRENT", storms$EVTYPE)
storms$EVTYPE  <- gsub("WATERSPOUTS" ,"WATERSPOUT", storms$EVTYPE)
storms$EVTYPE  <- gsub("FLOODS" ,"FLOOD", storms$EVTYPE)
storms$EVTYPE  <- gsub("FLOODING" ,"FLOOD", storms$EVTYPE)
storms$EVTYPE  <- gsub("CLOUDS" ,"CLOUD", storms$EVTYPE)
#Replace the longer abbreviation first so "FLDG" is not partially rewritten to "FLOODG"
storms$EVTYPE  <- gsub("FLDG" ,"FLOOD", storms$EVTYPE)
storms$EVTYPE  <- gsub("FLD" ,"FLOOD", storms$EVTYPE)
storms$EVTYPE  <- gsub("FLOODG" ,"FLOOD", storms$EVTYPE)
storms$EVTYPE  <- gsub("SML" ,"SMALL", storms$EVTYPE)
storms$EVTYPE  <- gsub("DEVEL" ,"DEVIL", storms$EVTYPE)
storms$EVTYPE  <- gsub("CSTL" ,"COASTAL", storms$EVTYPE)
storms$EVTYPE <- gsub("COASTALFLOOD", "COASTAL FLOOD",storms$EVTYPE)
storms$EVTYPE <- gsub("WINTRY", "WINTER",storms$EVTYPE)
storms$EVTYPE <- gsub("WINTERY", "WINTER",storms$EVTYPE)
storms$EVTYPE <- gsub("MICOBURST", "MICROBURST",storms$EVTYPE)
storms$EVTYPE <- gsub("LAKE EFFECT SNOW", "LAKE-EFFECT SNOW",storms$EVTYPE)
storms$EVTYPE  <- gsub("SLIDES" ,"SLIDE", storms$EVTYPE)
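
A long run of substitutions like this can also be driven by a named vector, which keeps the patterns in one place and makes their order explicit. Order matters when one pattern is contained in another, for example "FLD" inside "FLDG". The sketch below is an alternative formulation using a few of the replacements above on invented names; apply_replacements is a hypothetical helper, and fixed = TRUE treats each pattern as a literal string rather than a regular expression.

```r
# A few of the replacements above, longest overlapping pattern first
replacements <- c(
  "TSTM"  = "THUNDERSTORM",
  "FLDG"  = "FLOOD",
  "FLD"   = "FLOOD",
  "WINDS" = "WIND"
)

# Hypothetical helper: apply each literal replacement in the order given
apply_replacements <- function(x, map) {
  for (pattern in names(map)) {
    x <- gsub(pattern, map[[pattern]], x, fixed = TRUE)
  }
  x
}

# Invented event names for illustration
fixed_names <- apply_replacements(c("TSTM WINDS", "COASTAL FLDG", "URBAN FLD"), replacements)
fixed_names
```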

Next, more of the events were collected into the proper categories based on a review of the event type names alone. For example, “Hurricane” was put into the “Hurricane/Typhoon” category. To avoid overwriting valid categories, pattern matching (grepl) was used only where any name containing the pattern belonged in one category, and exact matches were used everywhere else.

#Collect events into the 48 official types by looking for strings or replacing names
#care is taken to not replace valid categories e.g. WIND CHILL replacing EXTREME WIND CHILL
storms$EVTYPE <- ifelse(storms$EVTYPE=="WILD/FOREST FIRE", "WILDFIRE",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("HURRICANE",storms$EVTYPE), "HURRICANE/TYPHOON",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("TYPHOON",storms$EVTYPE), "HURRICANE/TYPHOON",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("COASTAL FLOOD",storms$EVTYPE), "COASTAL FLOOD",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("WINTER WEATHER",storms$EVTYPE), "WINTER WEATHER",storms$EVTYPE)
storms$EVTYPE <- ifelse(storms$EVTYPE=="STORM SURGE", "STORM SURGE/TIDE",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("FROST",storms$EVTYPE), "FROST/FREEZE",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("FREEZE",storms$EVTYPE), "FROST/FREEZE",storms$EVTYPE)
storms$EVTYPE <- ifelse(storms$EVTYPE=="FOG", "DENSE FOG",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("THUNDERSTORM WIND \\(",storms$EVTYPE), "THUNDERSTORM WIND",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("THUNDERSTORM WIND G",storms$EVTYPE), "THUNDERSTORM WIND",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("THUNDERSTORM WIND 4",storms$EVTYPE), "THUNDERSTORM WIND",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("HIGH WIND \\(",storms$EVTYPE), "HIGH WIND",storms$EVTYPE)

#Check how many unique event types we have now.
storms$EVTYPE <- as.factor(storms$EVTYPE)
num_types_3 <- as.numeric(length(levels(storms$EVTYPE)))
#144 event types
storms$EVTYPE <- as.character(storms$EVTYPE)
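
The difference between the two matching styles is easiest to see side by side. In the sketch below, the event names are illustrative (the hurricane name is invented). Pattern matching is safe for "HURRICANE" because every name containing it belongs in one category, while "FOG" needs an exact match so that the valid category FREEZING FOG is left alone.

```r
# Illustrative event names
types <- c("HURRICANE EXAMPLE", "FOG", "FREEZING FOG")

# Pattern match: any name containing HURRICANE goes to the official category
by_pattern <- ifelse(grepl("HURRICANE", types), "HURRICANE/TYPHOON", types)

# Exact match: only the bare name FOG is recoded
by_exact <- ifelse(by_pattern == "FOG", "DENSE FOG", by_pattern)

# For contrast, a pattern match on FOG would overwrite a valid category
too_greedy <- ifelse(grepl("FOG", by_pattern), "DENSE FOG", by_pattern)

by_exact
too_greedy
```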

After the obvious edits, we were left with 144 categories, and more work was required to place the remaining weather events appropriately. Researching the remarks and NWS Directive 10-1605 suggested further transformations.

#After researching the codebook and remarks extensively, the following recodings were appropriate
storms$EVTYPE <- ifelse(storms$EVTYPE=="COLD", "COLD/WIND CHILL",storms$EVTYPE)
storms$EVTYPE <- ifelse(storms$EVTYPE=="COLD TEMPERATURE", "COLD/WIND CHILL",storms$EVTYPE)
storms$EVTYPE <- ifelse(storms$EVTYPE=="COLD WEATHER", "COLD/WIND CHILL",storms$EVTYPE)

storms$EVTYPE <- ifelse(grepl("LANDSLIDE",storms$EVTYPE), "DEBRIS FLOW",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("MUDSLIDE",storms$EVTYPE), "DEBRIS FLOW",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("MUD SLIDE",storms$EVTYPE), "DEBRIS FLOW",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("ROCK SLIDE",storms$EVTYPE), "DEBRIS FLOW",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("LANDSLUMP",storms$EVTYPE), "DEBRIS FLOW",storms$EVTYPE)

storms$EVTYPE <- ifelse(grepl("EXTREME COLD",storms$EVTYPE), "EXTREME COLD/WIND CHILL",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("EXTREME WIND",storms$EVTYPE), "EXTREME COLD/WIND CHILL",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("HYPOTHERMIA",storms$EVTYPE), "EXTREME COLD/WIND CHILL",storms$EVTYPE)

storms$EVTYPE <- ifelse(storms$EVTYPE=="ASTRONOMICAL HIGH TIDE", "COASTAL FLOOD",storms$EVTYPE)
storms$EVTYPE <- ifelse(storms$EVTYPE=="TIDAL FLOOD", "COASTAL FLOOD",storms$EVTYPE)

storms$EVTYPE <- ifelse(grepl("DOWNBURST",storms$EVTYPE), "THUNDERSTORM WIND",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("MICROBURST",storms$EVTYPE), "THUNDERSTORM WIND",storms$EVTYPE)

storms$EVTYPE <- ifelse(grepl("EXCESSIVE SNOW",storms$EVTYPE), "HEAVY SNOW",storms$EVTYPE)

storms$EVTYPE <- ifelse(grepl("TORRENTIAL RAINFALL",storms$EVTYPE), "HEAVY RAIN",storms$EVTYPE)

storms$EVTYPE <- ifelse(grepl("LANDSPOUT",storms$EVTYPE), "DUST DEVIL",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("WHIRLWIND",storms$EVTYPE), "DUST DEVIL",storms$EVTYPE)

storms$EVTYPE <- ifelse(grepl("LIGHT SNOW",storms$EVTYPE), "WINTER WEATHER",storms$EVTYPE)

#Check how many unique event types we have now.
storms$EVTYPE <- as.factor(storms$EVTYPE)
num_types_4 <- as.numeric(length(levels(storms$EVTYPE)))
#122 event types
storms$EVTYPE <- as.character(storms$EVTYPE)

The last set of transformations was used to classify as many of the remaining events as possible. Some assumptions were made here. Urban/small stream floods and river floods were classified as flash floods if the lowercase word “flash” appeared in the remarks, and as floods otherwise. If two or three valid event types were mentioned in a name, it was assumed that the first one was the most important, and the event was put in that category.

#These re-classifications are rougher: they give priority to the first item listed
#or use text in the remarks
storms$EVTYPE <- ifelse(storms$EVTYPE == "URBAN/SMALL STREAM FLOOD",
                        ifelse(grepl("flash",storms$REMARKS), "FLASH FLOOD","FLOOD"),storms$EVTYPE)
storms$EVTYPE <- ifelse(storms$EVTYPE == "RIVER FLOOD",
                        ifelse(grepl("flash",storms$REMARKS), "FLASH FLOOD","FLOOD"),storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("^THUNDERSTORM",storms$EVTYPE), "THUNDERSTORM WIND",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("^GUSTY",storms$EVTYPE), "HIGH WIND",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("^HEAVY RAIN",storms$EVTYPE), "HEAVY RAIN",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("^HEAVY SNOW",storms$EVTYPE), "HEAVY SNOW",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("^HIGH SURF",storms$EVTYPE), "HIGH SURF",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("^FLOOD",storms$EVTYPE), "FLOOD",storms$EVTYPE)
storms$EVTYPE <- ifelse(grepl("^FLASH FLOOD",storms$EVTYPE), "FLASH FLOOD",storms$EVTYPE)

#Check how many unique event types we have now.
storms$EVTYPE <- as.factor(storms$EVTYPE)
num_types_5 <- as.numeric(length(levels(storms$EVTYPE)))
#108 event types
unclassified_events <- storms[!(storms$EVTYPE %in% events_cap),]
num_unclassified <- nrow(unclassified_events)
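
Both assumptions can be illustrated on a few invented rows. In this sketch the remarks text is made up, and the third row has no remarks at all; grepl() returns FALSE for a missing value, so such events fall through to the default. The match on "flash" is case-sensitive as written; grepl() accepts ignore.case = TRUE if a looser match is wanted, although that would change the classification.

```r
# Invented rows for illustration; the remarks are not from the dataset
toy <- data.frame(
  EVTYPE  = c("RIVER FLOOD", "RIVER FLOOD", "HEAVY RAIN/HIGH SURF"),
  REMARKS = c("Heavy rain led to flash flooding along the creek.",
              "The river crested above flood stage.",
              NA),
  stringsAsFactors = FALSE
)

# Use the remarks to split river floods into flash floods and floods
toy$EVTYPE <- ifelse(toy$EVTYPE == "RIVER FLOOD",
                     ifelse(grepl("flash", toy$REMARKS), "FLASH FLOOD", "FLOOD"),
                     toy$EVTYPE)

# The ^ anchor gives priority to the first event type named
toy$EVTYPE <- ifelse(grepl("^HEAVY RAIN", toy$EVTYPE), "HEAVY RAIN", toy$EVTYPE)
toy$EVTYPE
```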

At this point, we had 108 categories, but only 389 events were left unclassified, so we opted to put these in an “Other” category rather than going through the remarks with a fine-tooth comb and risking introducing more errors.

#The groups that were still left either contained multiple events or were ambiguous
#These were collected into an OTHER category and included to make sure it didn't contain
#a bulk of damage to health or property
storms$EVTYPE <- as.character(storms$EVTYPE)
storms$EVTYPE <- ifelse(!(storms$EVTYPE %in% events_cap), "OTHER",storms$EVTYPE)
storms$EVTYPE <- as.factor(storms$EVTYPE)
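
One way to confirm that the catch-all category does not hide a large share of the impact is to compute its percentage of the total. The sketch below uses invented rows and a shortened list of official types; stringsAsFactors = FALSE matters here, because ifelse() applied to a factor would return integer codes instead of names.

```r
# Shortened stand-in for the 48 official types
official <- c("FLOOD", "TORNADO")

# Invented events; MARINE ACCIDENT stands in for an unclassifiable type
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "MARINE ACCIDENT", "FLOOD"),
  FATALITIES = c(2, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Anything outside the official list becomes OTHER
toy$EVTYPE <- ifelse(toy$EVTYPE %in% official, toy$EVTYPE, "OTHER")

# Percentage of all fatalities that ended up in OTHER
share_other <- 100 * sum(toy$FATALITIES[toy$EVTYPE == "OTHER"]) / sum(toy$FATALITIES)
share_other
```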

Aggregating Health Impact Data

For the impact on population health, we looked at both deaths and injuries aggregated by weather event type. Deaths and injuries affect the population differently, but both have a significant health impact on survivors as well as on victims. A health impact variable was calculated by adding deaths and injuries together to measure the number of individuals affected. This variable was used to rank the events by health impact. The health impact data was also melted to create a factor designating death or injury, which was used for the plot in the results.

#Aggregate health factors by event type and merge
deaths_by_event <- aggregate(list(FATALITIES = storms$FATALITIES), list(EVTYPE = storms$EVTYPE), sum)
injuries_by_event <- aggregate(list(INJURIES = storms$INJURIES), list(EVTYPE = storms$EVTYPE), sum)
health_by_event <- merge(deaths_by_event,injuries_by_event)
#Create new health impact variable to sort
health_by_event$HEALTHIMPACT <- health_by_event$FATALITIES+health_by_event$INJURIES
#Sort dataframe by health impact
health_by_event <- health_by_event[order(health_by_event$HEALTHIMPACT, decreasing = TRUE),]
#Sort factors (to select top few)
health_by_event$EVTYPE <- factor(health_by_event$EVTYPE, levels = health_by_event$EVTYPE)
#To show both types of impact on plot the data set is melted
health_by_event_melt<- melt(health_by_event, id.vars = c("EVTYPE","HEALTHIMPACT"), 
                        measure.vars = c("FATALITIES", "INJURIES"),
                        variable.name = "TYPE", 
                        value.name = "NUMBER")
#and sorted
health_by_event_melt <- health_by_event_melt[order(health_by_event_melt$HEALTHIMPACT, decreasing = TRUE),]
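
The aggregation and ranking can be checked by hand on a few invented rows. The sketch below uses the formula interface of aggregate() to sum both measures in a single call, which is an alternative to the two aggregate() calls followed by merge() above and should give the same totals.

```r
# Invented events for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(3, 2, 1, 0),
  INJURIES   = c(10, 0, 5, 4),
  stringsAsFactors = FALSE
)

# Sum deaths and injuries per event type in one call
by_event <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Health impact = deaths + injuries, then rank from largest to smallest
by_event$HEALTHIMPACT <- by_event$FATALITIES + by_event$INJURIES
by_event <- by_event[order(by_event$HEALTHIMPACT, decreasing = TRUE), ]
by_event
```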

Aggregating Economic Impact Data

For the impact on the economy, we added the property and crop damage together to determine the total monetary impact. All of the total values were so high that we divided by 10^6 to have results in millions of dollars. The total damage value was used to sort the weather event categories according to most damage. This economic impact data was also melted to create a factor that designated property or crop damage to use in the plots.

The same transformations were applied to the inflation-adjusted numbers so we could see whether the ranking of event categories changed once the damage figures were adjusted by year.

#Aggregate property damage by event type, change to millions of dollars, and merge
prop_damage_by_event <- aggregate(list(PROPTOTAL = storms$PROPTOTAL/10^6), list(EVTYPE = storms$EVTYPE), sum)
crop_damage_by_event <- aggregate(list(CROPTOTAL = storms$CROPTOTAL/10^6), list(EVTYPE = storms$EVTYPE), sum)
damage_by_event <- merge(prop_damage_by_event,crop_damage_by_event)
#Add a total damage category also in millions of dollars
damage_by_event$DMGTOTAL <- damage_by_event$PROPTOTAL + damage_by_event$CROPTOTAL
#Sort data frame by economic impact
damage_by_event <- damage_by_event[order(damage_by_event$DMGTOTAL, decreasing = TRUE),]
#Sort factors (to select top few)
damage_by_event$EVTYPE <- factor(damage_by_event$EVTYPE, levels = damage_by_event$EVTYPE)
#To show both types of damage in plots melt the damage types
damage_by_event_melt <- melt(damage_by_event, id.vars = c("EVTYPE","DMGTOTAL"), 
                        measure.vars = c("PROPTOTAL", "CROPTOTAL"),
                        variable.name = "TYPE", 
                        value.name = "DOLLARS")
#and sort again
damage_by_event_melt <- damage_by_event_melt[order(damage_by_event_melt$DMGTOTAL, decreasing = TRUE),]

#Same transformations for inflation adjusted data
prop_damage2011dol_by_event <- aggregate(list(PROPTOTAL2011dol = storms$PROPTOTAL2011dol/10^6), list(EVTYPE = storms$EVTYPE), sum)
crop_damage2011dol_by_event <- aggregate(list(CROPTOTAL2011dol = storms$CROPTOTAL2011dol/10^6), list(EVTYPE = storms$EVTYPE), sum)
damage2011dol_by_event <- merge(prop_damage2011dol_by_event,crop_damage2011dol_by_event)
damage2011dol_by_event$DMGTOTAL <- damage2011dol_by_event$PROPTOTAL2011dol + damage2011dol_by_event$CROPTOTAL2011dol
damage2011dol_by_event <- damage2011dol_by_event[order(damage2011dol_by_event$DMGTOTAL, decreasing = TRUE),]
damage2011dol_by_event$EVTYPE <- factor(damage2011dol_by_event$EVTYPE, levels = damage2011dol_by_event$EVTYPE)
damage2011dol_by_event_melt <- melt(damage2011dol_by_event, id.vars = c("EVTYPE","DMGTOTAL"), 
                        measure.vars = c("PROPTOTAL2011dol", "CROPTOTAL2011dol"),
                        variable.name = "TYPE", 
                        value.name = "DOLLARS")
damage2011dol_by_event_melt <- damage2011dol_by_event_melt[order(damage2011dol_by_event_melt$DMGTOTAL, decreasing = TRUE),]
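
Because the nominal and inflation-adjusted pipelines above repeat the same steps, they could be wrapped in one function. The following is only a sketch under stated assumptions: `summarise_damage` is a hypothetical helper that is not part of the original analysis, it uses base R only, and the toy data frame contains made-up values rather than NWS records.

```r
# Hypothetical helper (not in the original analysis): total property and crop
# damage per event type, in millions of dollars, sorted by total damage.
# `prop_col` and `crop_col` name the dollar columns to sum in `df`.
summarise_damage <- function(df, prop_col, crop_col) {
  out <- aggregate(df[, c(prop_col, crop_col)] / 10^6,
                   by = list(EVTYPE = df$EVTYPE), FUN = sum)
  out$DMGTOTAL <- out[[prop_col]] + out[[crop_col]]
  out <- out[order(out$DMGTOTAL, decreasing = TRUE), ]
  # Fix the factor levels in ranked order so plots keep the ranking
  out$EVTYPE <- factor(out$EVTYPE, levels = out$EVTYPE)
  rownames(out) <- NULL
  out
}

# Toy data (made-up values, not from the NWS database)
toy <- data.frame(
  EVTYPE    = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
  PROPTOTAL = c(2e6, 1e6, 3e6, 0.5e6),
  CROPTOTAL = c(1e6, 0.5e6, 0, 4e6),
  stringsAsFactors = FALSE
)
toy_damage <- summarise_damage(toy, "PROPTOTAL", "CROPTOTAL")
print(toy_damage)
```

The same call with the 2011-dollar column names would give the adjusted table, and either result could then be passed to melt as in the code above.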

Results

Impact on Population Health

Once the data was tidy, we were able to study the effect of different weather events on health. We chose the total number of individuals affected (deaths plus injuries) as the metric for ranking the weather events by population health impact. The table below shows the top twenty NWS weather event categories ranked by this metric.

#grab the top 20 health impact event categories to print in table
health_table <- health_by_event[1:20,]
percent_health_explained_1 <- 100*sum(health_by_event$HEALTHIMPACT[1])/sum(health_by_event$HEALTHIMPACT)
percent_health_explained_10<- 100*sum(health_by_event$HEALTHIMPACT[1:10])/sum(health_by_event$HEALTHIMPACT)
percent_health_explained_20 <- 100*sum(health_by_event$HEALTHIMPACT[1:20])/sum(health_by_event$HEALTHIMPACT)
#neaten the names
colnames(health_table) <- c("Event_Type", "Fatalities", "Injuries", "Number_Individuals_Impacted")
health_table <- xtable(health_table, digits = 0, display = c("d","s","d", "d", "d"),
                       caption = '<b> Top 20 Weather Event Types in Population Health Impact</b> ')
print(health_table, type = "html", include.rownames=FALSE, caption.placement = 'top')
Top 20 Weather Event Types in Population Health Impact
Event_Type         Fatalities  Injuries  Number_Individuals_Impacted
TORNADO                  1511     20667                        22178
EXCESSIVE HEAT           1797      6391                         8188
FLOOD                     444      6838                         7282
THUNDERSTORM WIND         382      5154                         5536
LIGHTNING                 651      4141                         4792
FLASH FLOOD               887      1674                         2561
WILDFIRE                   87      1456                         1543
WINTER STORM              191      1292                         1483
HEAT                      237      1222                         1459
HURRICANE/TYPHOON         125      1328                         1453
HIGH WIND                 239      1095                         1334
RIP CURRENT               542       503                         1045
DENSE FOG                  69       855                          924
OTHER                     141       731                          872
HEAVY SNOW                107       702                          809
HAIL                        7       713                          720
WINTER WEATHER             62       485                          547
BLIZZARD                   70       385                          455
STRONG WIND               110       299                          409
ICE STORM                  82       318                          400


Tornadoes caused by far the highest number of deaths and injuries across the United States, accounting on their own for 33% of the total. In addition to their destructive power, their impact is probably due in part to the fact that they occur over widespread areas and are frequent compared to hurricanes, which strike the United States only a few times a year. Excessive heat ranks second in total health impact but actually causes more deaths than tornadoes (1,797 versus 1,511). The top twenty events in the table above account for 96% of deaths and injuries, and the top ten account for 85%. The graph below shows how deaths and injuries vary for the top ten weather events.
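
The percentages quoted above are shares of the overall total explained by the top few categories. As a minimal sketch of that calculation, the hypothetical helper `top_share` below (not part of the original analysis) sorts a vector of impact counts and returns the percentage covered by the k largest. The toy counts are made up and are not NWS figures.

```r
# Hypothetical helper: percentage of the total explained by the k largest values
top_share <- function(x, k) {
  x <- sort(x, decreasing = TRUE)
  100 * sum(head(x, k)) / sum(x)
}

# Toy impact counts (made-up values, not from the NWS database)
toy_impact <- c(TORNADO = 50, HEAT = 25, FLOOD = 15, LIGHTNING = 10)
top_share(toy_impact, 1)  # 50
top_share(toy_impact, 2)  # 75
```

Sorting inside the helper means the result does not depend on the input already being ranked, which the positional indexing in the report's code does assume.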

#grab the top 10 health impact event categories to plot (20 lines since this data is melted)
health_top_events <- health_by_event_melt[1:20,]
ggplot(health_top_events, aes(x = EVTYPE,y= NUMBER, fill = TYPE)) + geom_bar(stat = "identity") + 
      labs(title="Health Impact by Top Weather Factors\n 1996-2011", 
           x="Type of Weather Event", y="Health Impact (Fatalities + Injuries)") +
      scale_fill_manual("Type",labels = c("Fatalities", "Injuries"), values=c("#CD5C5C","#5C5CCD")) +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))

This graph clearly shows that, for every one of the top ten events, injuries outnumber fatalities, so most of the population health impact takes the form of injuries. Collecting data on the types of injuries involved in different events would be a helpful extension of this study, so that medical facilities and government relief agencies could be better prepared.

Impact on the Economy

The top events in economic impact were different from those with the most health impact. The top twenty are shown in the table below.

#grab the top 20 categories in economic impact
damage_table <- damage_by_event[1:20,]
percent_damage_explained_1 <- 100*sum(damage_by_event$DMGTOTAL[1])/sum(damage_by_event$DMGTOTAL)
percent_damage_explained_10 <- 100*sum(damage_by_event$DMGTOTAL[1:10])/sum(damage_by_event$DMGTOTAL)
percent_damage_explained_20 <- 100*sum(damage_by_event$DMGTOTAL[1:20])/sum(damage_by_event$DMGTOTAL)
coastal_top_event_damage <- 100*sum(damage_by_event$DMGTOTAL[c(1,2,9,21)])/sum(damage_by_event$DMGTOTAL)
colnames(damage_table) <- c("Event_Type", "Property_Damage", "Crop_Damage", "Total_Damage")
damage_table <- xtable(damage_table, digits = 0, display = c("d","s","d", "d", "d"),
                       caption = '<b> Top 20 Weather Event Types in Economic Impact (in Millions of Dollars)</b> ')
print(damage_table, type = "html", include.rownames=FALSE, caption.placement = 'top')
Top 20 Weather Event Types in Economic Impact (in Millions of Dollars)
Event_Type               Property_Damage  Crop_Damage  Total_Damage
HURRICANE/TYPHOON                  81718         5350         87068
STORM SURGE/TIDE                   47834            0         47835
FLOOD                              29137         4985         34122
TORNADO                            24616          283         24900
HAIL                               14595         2476         17071
FLASH FLOOD                        15329         1363         16692
DROUGHT                             1046        13367         14413
THUNDERSTORM WIND                   7915         1016          8932
TROPICAL STORM                      7642          677          8320
WILDFIRE                            7760          402          8162
HIGH WIND                           5250          633          5883
ICE STORM                           3642           15          3657
WINTER STORM                        1532           11          1544
FROST/FREEZE                          18         1368          1387
EXTREME COLD/WIND CHILL               29         1326          1355
HEAVY RAIN                           598          729          1328
LIGHTNING                            743            6           749
HEAVY SNOW                           636           71           707
BLIZZARD                             525            7           532
EXCESSIVE HEAT                         7          492           500


Hurricanes topped the economic impact list, accounting for 30% of the total damage. These powerful storms cause massive damage and typically hit densely populated, developed areas where property values are high. Their health impact is likely lower because landfall can often be predicted and some evacuation is possible. Storm surge/tide damage occurs in the same types of areas. In fact, a large portion of the damage comes from events in coastal regions, with hurricanes, storm surges, tropical storms, and coastal floods accounting for 50% of the total damage.
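
The coastal share in the code above is computed by row position (rows 1, 2, 9, and 21), which would silently pick the wrong categories if the ranking ever changed. One possible alternative, sketched here on made-up totals, is to select the categories by name. The list `coastal_types` is an assumption about which event type labels count as coastal, and "COASTAL FLOOD" in particular is assumed to be the label behind row 21.

```r
# Toy damage totals in millions of dollars (made-up values, not NWS figures)
toy_damage <- data.frame(
  EVTYPE   = c("HURRICANE/TYPHOON", "STORM SURGE/TIDE", "FLOOD",
               "TROPICAL STORM", "COASTAL FLOOD", "HAIL"),
  DMGTOTAL = c(40, 20, 20, 5, 5, 10)
)

# Assumed set of coastal categories, matched by name rather than row position
coastal_types <- c("HURRICANE/TYPHOON", "STORM SURGE/TIDE",
                   "TROPICAL STORM", "COASTAL FLOOD")
is_coastal <- toy_damage$EVTYPE %in% coastal_types
coastal_share <- 100 * sum(toy_damage$DMGTOTAL[is_coastal]) / sum(toy_damage$DMGTOTAL)
coastal_share  # 70 for this toy data
```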

The top twenty events in the table above account for 99% of monetary loss. The top ten graphed below account for 93%. The graph below shows how property and crop damage vary for the top ten weather events.

#grab the top 10 economic impact event categories to plot (20 lines since this data is melted)
damage_top_events <- damage_by_event_melt[1:20,]
ggplot(damage_top_events, aes(x = EVTYPE,y= DOLLARS, fill = TYPE)) + geom_bar(stat = "identity") + 
      labs(title="Economic Impact by Top Weather Factors\n 1996-2011", 
           x="Type of Weather Event", y="Damage (in millions of dollars)") +
      scale_fill_manual("Type of Damage",labels = c("Property", "Crops"), values=c("#236B8E","#6B8E23")) +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))

The graph shows that the bulk of the cost of weather events comes from property loss rather than crop loss. The major exception is drought, which causes far more crop damage than property damage and in fact caused more crop damage than any other weather category. This factor would be important to consider in farming areas.

We also examined inflation-adjusted values. Although the totals are higher overall in 2011 dollars, the adjustment made no difference to the ranking of the top events, as the graph below shows. This is likely because every type of event is spread across the years, so no category is concentrated in years with a particular dollar value.

damage2011dol_top_events <- damage2011dol_by_event_melt[1:20,]
ggplot(damage2011dol_top_events, aes(x = EVTYPE,y= DOLLARS, fill = TYPE)) + geom_bar(stat = "identity") + 
      labs(title="Economic Impact by Top Weather Factors\n 1996-2011 (adjusted for inflation)", 
           x="Type of Weather Event", y="Damage (in millions of 2011 dollars)") +
      scale_fill_manual("Type of Damage",labels = c("Property", "Crops"), values=c("#236B8E","#6B8E23")) +
      theme(axis.text.x = element_text(angle = 90, hjust = 1))
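
The claim that the adjustment leaves the ranking unchanged can also be checked directly rather than read off the two graphs. The sketch below uses two small made-up tables in place of the nominal and 2011-dollar results, each already sorted by total damage. `same_top_ranking` is a hypothetical helper that is not part of the original analysis.

```r
# Toy totals (made-up values): nominal dollars and 2011 dollars, each already sorted
nominal  <- data.frame(EVTYPE = c("HURRICANE", "FLOOD", "TORNADO"),
                       DMGTOTAL = c(90, 35, 25))
adjusted <- data.frame(EVTYPE = c("HURRICANE", "FLOOD", "TORNADO"),
                       DMGTOTAL = c(105, 41, 30))

# Hypothetical helper: do the top n event types appear in the same order?
same_top_ranking <- function(a, b, n) {
  identical(as.character(head(a$EVTYPE, n)), as.character(head(b$EVTYPE, n)))
}
same_top_ranking(nominal, adjusted, 3)
```

Comparing the event names as character vectors avoids a false mismatch when the two tables carry factors with different level sets.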

One other economic factor that is not addressed in the NWS database is health care costs. The vast number of injuries in these weather events, combined with the high cost of medical care in the United States, suggests that those injuries carry a large economic impact in addition to the human suffering. Though the data collection required might be difficult, this would be an interesting facet to explore when considering the full economic impact of weather events.

Conclusion

The weather events that most impact human health in the United States were found to be tornadoes, excessive heat, and flooding. In terms of economic value, coastal events like hurricanes and storm surges/tides had the greatest impact. Since the United States is so vast and varied, different weather events are experienced in different areas. To make the best use of this data, individual municipalities would need to look at both economic and health data and focus on the types of events most likely to affect their constituents. This analysis could be re-run subsetting by locale (from the original data) to give local governments more precise information.
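
As a rough illustration of what such a per-locale re-run might look like, the sketch below aggregates health impact by state and event type and keeps the highest-impact event in each state. It rests on several assumptions: a `STATE` column would have to be retained from the raw data (this report's subset dropped it), the column names are illustrative, and the records are made up.

```r
# Toy records (made-up values); STATE is an assumed column kept from the raw data
toy <- data.frame(
  STATE      = c("KS", "KS", "FL", "FL", "FL"),
  EVTYPE     = c("TORNADO", "HAIL", "HURRICANE/TYPHOON", "TORNADO", "HURRICANE/TYPHOON"),
  FATALITIES = c(3, 0, 2, 1, 4),
  INJURIES   = c(20, 1, 10, 2, 30)
)
toy$HEALTHIMPACT <- toy$FATALITIES + toy$INJURIES

# Sum health impact for each state and event type
by_state <- aggregate(HEALTHIMPACT ~ STATE + EVTYPE, data = toy, FUN = sum)
by_state <- by_state[order(by_state$STATE, -by_state$HEALTHIMPACT), ]

# Highest-impact event type in each state (first row per state after sorting)
top_by_state <- by_state[!duplicated(by_state$STATE), ]
print(top_by_state)
```

The same pattern would apply to damage totals, or to a finer locale such as county if that field were kept.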
