Synopsis

Extreme weather routinely plays havoc with national infrastructure, and with climate change continuing to pose a challenge for world leaders, the odds are strong that extreme weather events will become more frequent in the years ahead. As the United States prepares for this eventuality, it makes sense to ask which types of extreme weather events are most harmful with respect to population health. In this study I examine data collected by the National Climatic Data Center at the National Oceanic and Atmospheric Administration (NOAA). After examining the total damage done to life and property during the period of 1950-2011, I conclude that tornadoes are the most harmful to both the population and the economy of the United States.

Data Processing

For data I use the Storm Events Database maintained by NOAA. The data set contains 197,706 observations collected during the period of 1950-2011, with each observation corresponding to one extreme weather event. The documentation for this data set is freely available online, with additional information available via the FAQ. The code used for accessing and reading in this data is as follows:

rm(list=ls())
if (!file.exists("repdata_data_StormData.csv.bz2")) {
  fileUrl <- paste("https://d396qusza40orc.cloudfront.net",
                   "/repdata%2Fdata%2FStormData.csv.bz2", sep="")
  download.file(fileUrl, destfile = "./repdata_data_StormData.csv.bz2")
  rm(fileUrl)}
library("data.table")
original_data <- data.table(read.csv("repdata_data_StormData.csv.bz2"))

The key independent variable of interest is the type of weather event, denoted by the variable label “EVTYPE.” The dependent variables of interest are “FATALITIES” (the number of fatalities), “INJURIES” (the number of people injured), “PROPDMG” (an estimate of damage to property in U.S. dollars), and “CROPDMG” (an estimate of damage to crops in U.S. dollars). In addition, the variables “PROPDMGEXP” and “CROPDMGEXP” denote the magnitude of “PROPDMG” (property damage) and “CROPDMG” (crop damage) respectively (e.g. “B” corresponds to damage in the billions of dollars). From the Storm Data Documentation:

      “Estimates should be rounded to three significant digits, followed by an alphabetical character signifying the magnitude of
       the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands,
       “M” for millions, and “B” for billions. If additional precision is available, it may be provided in the narrative part of the entry.”

       - NWSI 10-1605 AUGUST 17, 2007, p.12

The initial .bz2 file is 49.2 MB and, when read in, produces a data frame of 902,297 observations by 37 variables. The analysis was performed using R 3.2.2 on a computer with a 3.06 GHz Intel Core 2 Duo processor and 4 GB of memory. To conserve resources, I drop all variables not directly relevant to our analysis. Thus, I retain “EVTYPE” (event type), “FATALITIES”, “INJURIES”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG”, and “CROPDMGEXP.”

library(dplyr)
working_data <- select(original_data, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
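Selecting columns after the fact still requires the full 37-variable frame to be read into memory first. As a hedged alternative, the unused columns can be skipped at read time through the `colClasses` argument of `read.csv`. The sketch below is not the approach used in this report; `read_columns` is a hypothetical helper and the temporary file holds made-up values for illustration.

```r
# Hypothetical helper: read only the named columns of a CSV file.
read_columns <- function(path, keep) {
  # Read a single row just to learn the column names.
  header <- names(read.csv(path, nrows = 1))
  # "NULL" tells read.csv to skip a column; NA lets it infer the type.
  classes <- ifelse(header %in% keep, NA_character_, "NULL")
  read.csv(path, colClasses = classes, stringsAsFactors = FALSE)
}

# Toy file for illustration only; the values are made up.
toy_path <- tempfile(fileext = ".csv")
write.csv(data.frame(STATE = "AL", EVTYPE = "TORNADO", FATALITIES = 0),
          toy_path, row.names = FALSE)

slim <- read_columns(toy_path, c("EVTYPE", "FATALITIES"))
names(slim)
```

The same helper should accept the compressed storm file directly, since `read.csv` decompresses .bz2 files on the fly.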

To prepare the data for analysis, I begin by constructing the numeric weights by which property damage “PROPDMG” and crop damage “CROPDMG” will be multiplied. These weights are derived from the categorical magnitude codes “PROPDMGEXP” and “CROPDMGEXP.” This requires explicitly assigning missing values, as well as converting “PROPDMGEXP” and “CROPDMGEXP” from factor (i.e. categorical) variables to numeric ones.

# Treat empty magnitude codes as missing.
working_data$PROPDMGEXP[working_data$PROPDMGEXP == ""] <- NA
working_data$CROPDMGEXP[working_data$CROPDMGEXP == ""] <- NA

# Property damage: missing or unrecognised codes get a weight of 1.
working_data <- working_data %>%
        mutate(PROPDMGcost = ifelse(is.na(PROPDMGEXP) |
                         PROPDMGEXP %in% c("m", "+", 0, 5, 6, "?", 4,
                         2, 3, "h", 7, "-", 1, 8), 1, as.character(PROPDMGEXP)))
# Recognised codes are replaced by their multipliers.
working_data$PROPDMGcost[working_data$PROPDMGEXP == "H"] <- 100
working_data$PROPDMGcost[working_data$PROPDMGEXP == "K"] <- 1000
working_data$PROPDMGcost[working_data$PROPDMGEXP == "M"] <- 1000000
working_data$PROPDMGcost[working_data$PROPDMGEXP == "B"] <- 1000000000
working_data$PROPDMGcost <- as.numeric(working_data$PROPDMGcost)

# Crop damage: same treatment.
working_data <- working_data %>%
        mutate(CROPDMGcost = ifelse(is.na(CROPDMGEXP) |
                         CROPDMGEXP %in% c("m", "+", 0, 5, 6, "?", 4, "k",
                         2, 3, "h", 7, "-", 1, 8), 1, as.character(CROPDMGEXP)))
working_data$CROPDMGcost[working_data$CROPDMGEXP == "H"] <- 100
working_data$CROPDMGcost[working_data$CROPDMGEXP == "K"] <- 1000
working_data$CROPDMGcost[working_data$CROPDMGEXP == "M"] <- 1000000
working_data$CROPDMGcost[working_data$CROPDMGEXP == "B"] <- 1000000000
working_data$CROPDMGcost <- as.numeric(working_data$CROPDMGcost)

working_data <- select(working_data, EVTYPE, FATALITIES, INJURIES,
                       PROPDMG, PROPDMGcost, CROPDMG, CROPDMGcost)
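The conversion above amounts to a lookup from magnitude code to multiplier. A minimal sketch of the same idea as a named-vector lookup follows. The function name `exp_to_multiplier` is hypothetical, and the rule that every code other than upper-case “H”, “K”, “M”, and “B” counts as 1 (including lower-case letters, digits, and symbols) is an assumption chosen to mirror the code above.

```r
# Hypothetical helper: map NOAA magnitude codes to numeric multipliers.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 100, K = 1e3, M = 1e6, B = 1e9)
  # Codes that are not in the table come back as NA.
  mult <- unname(lookup[as.character(code)])
  # Assumption: missing or unrecognised codes get a weight of 1.
  mult[is.na(mult)] <- 1
  mult
}

# Recognised codes, an empty string, a missing value, a symbol, and a lower-case "m".
weights <- exp_to_multiplier(c("K", "M", "B", "", NA, "+", "m"))
weights
```

Keeping the multipliers in one table makes it easier to revisit a decision later, for example whether a lower-case “m” ought to count as millions.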

Now we multiply the completed weights “PROPDMGcost” and “CROPDMGcost” by “PROPDMG” and “CROPDMG” respectively. We are interested in the total economic cost of these extreme weather events irrespective of whether the damage affected property or agriculture. For this reason, we sum together the two products.

working_data$Total_Damage <- (working_data$PROPDMG * working_data$PROPDMGcost) +
                                (working_data$CROPDMG * working_data$CROPDMGcost)
working_data <- select(working_data, EVTYPE, FATALITIES, INJURIES, Total_Damage)

Having constructed the dependent variable, we proceed to tidy the data necessary for the independent variable. The variable “EVTYPE” has 224 unique values, making it difficult to interpret. To simplify this measure, I consolidated these 224 unique values into the official 48 weather event types as described in the NOAA documentation.

working_data$EVTYPE <- as.character(working_data$EVTYPE)
working_data <- as.data.frame(working_data)
working_data[grep("WINTER STORM", working_data$EVTYPE), 1]             <- "Winter Storm"# 7.47
Winter_Weather_text <- c("WINTER WEATHER", "FREEZING DRIZZLE")                              
working_data[(grep(paste(Winter_Weather_text,
        collapse="|"), working_data$EVTYPE)), 1]                     <- "Winter Weather"# 7.48
working_data[grep("THUNDERSTORM WIND", working_data$EVTYPE), 1]   <- "Thunderstorm Wind"# 7.39
Strong_Wind_text <- c("STRONG WIND", "TSTM WIND", "WIND",                               
                      "GUSTY WINDS", "GUSTNADO AND", 
                      "SEVERE TURBULENCE", "THUNDERSTORM WINS")                
working_data[(grep(paste(Strong_Wind_text,
        collapse="|"), working_data$EVTYPE)), 1]                        <- "Strong Wind"# 7.38
working_data[grep("SURGE", working_data$EVTYPE), 1]                <- "Storm Surge Tide"# 7.37
working_data[grep("SEICHE", working_data$EVTYPE), 1]                         <- "Seiche"# 7.35
working_data[grep("MARINE STRONG WIND", working_data$EVTYPE), 1] <- "Marine Strong Wind"# 7.32
Marine_High_Wind_text <- c("MARINE HIGH WIND", "MARINE MISHAP")                              
working_data[(grep(paste(Marine_High_Wind_text,
        collapse="|"), working_data$EVTYPE)), 1]                   <- "Marine High Wind"# 7.31
working_data[grep("MARINE HAIL", working_data$EVTYPE), 1]               <- "Marine Hail"# 7.30
working_data[grep("LAKE-EFFECT SNOW", working_data$EVTYPE), 1]     <- "Lake-Effect Snow"# 7.28
working_data[grep("LAKESHORE FLOOD", working_data$EVTYPE), 1]       <- "Lakeshore Flood"# 7.27
working_data[grep("TIDE", working_data$EVTYPE), 1]            <- "Astronomical Low Tide"# 7.1
working_data[grep("AVALANCHE", working_data$EVTYPE), 1]                   <- "Avalanche"# 7.2
Blizzard_text <- c("BLIZZARD", "WIND/BLIZZARD/FREEZING",                                
                   "WIND/BLIZZARD", "BLOWING SNOW", "SNOW")               
working_data[(grep(paste(Blizzard_text,
        collapse="|"), working_data$EVTYPE)), 1]                           <- "Blizzard"# 7.3
working_data[grep("FLOOD", working_data$EVTYPE), 1]                   <- "Coastal Flood"# 7.4
working_data[grep("COLD", working_data$EVTYPE), 1]                  <- "Cold/Wind Chill"# 7.5
working_data[grep("DEBRIS", working_data$EVTYPE), 1]                    <- "Debris Flow"# 7.6
working_data[grep("FOG", working_data$EVTYPE), 1]                         <- "Dense Fog"# 7.7
working_data[grep("SMOKE", working_data$EVTYPE), 1]                     <- "Dense Smoke"# 7.8
Drought_text <- c("DROUGHT", "DRY")
working_data[(grep(paste(Drought_text,
        collapse="|"), working_data$EVTYPE)), 1]                            <- "Drought"# 7.9
working_data[grep("DUST DEVIL", working_data$EVTYPE), 1]                 <- "Dust Devil"# 7.10
Dust_Storm_text <- c("BLOWING DUST", "DUST STORM")                  
working_data[(grep(paste(Dust_Storm_text,
        collapse="|"), working_data$EVTYPE)), 1]                         <- "Dust Storm"# 7.11
Excessive_Heat_text <- c("HEAT WAVE", "RECORD HEAT", "EXCESSIVE HEAT",                  
                         "RECORD WARMTH", "UNSEASONABLY WARM", "HIGH") 
working_data[(grep(paste(Excessive_Heat_text,
        collapse="|"), working_data$EVTYPE)), 1]                     <- "Excessive Heat"# 7.12
working_data[grep("COLD", working_data$EVTYPE), 1]          <- "Extreme Cold/Wind Chill"# 7.13
Flash_Flood_text <- c("FLASH FLOOD", "WET MICROBURST", 
                      "UNSEASONABLY WET", "MUDSLIDES")              
working_data[(grep(paste(Flash_Flood_text,
        collapse="|"), working_data$EVTYPE)), 1]                        <- "Flash Flood"# 7.14
working_data[grep("FLOOD", working_data$EVTYPE), 1]                           <- "Flood"# 7.15
working_data[grep("FREEZING FOG", working_data$EVTYPE), 1]             <- "Freezing Fog"# 7.16
Frost_Freeze_text <- c("FROST", "DAMAGING FREEZE", "WINTRY MIX", 
                       "GLAZE", "FREEZE", "RECORD LOW", "LOW TEMPERATURE RECORD")              
working_data[(grep(paste(Frost_Freeze_text,
        collapse="|"), working_data$EVTYPE)), 1]                       <- "Frost Freeze"# 7.17
Funnel_Cloud_text <- c("FUNNEL CLOUD", "FUNNEL", "WALL CLOUD")             
working_data[(grep(paste(Funnel_Cloud_text,
        collapse="|"), working_data$EVTYPE)), 1]                       <- "Funnel Cloud"# 7.18
working_data[grep("HAIL", working_data$EVTYPE), 1]                             <- "Hail"# 7.19
working_data[grep("HEAT", working_data$EVTYPE), 1]                             <- "Heat"# 7.20
Heavy_Rain_text <- c("HEAVY RAIN", "MICROBURST", "RAINSTORM", 
                     "HEAVY PRECIPATATION", "RECORD RAINFALL", 
                     "NORMAL PRECIPITATION", "DOWNBURST")              
working_data[(grep(paste(Heavy_Rain_text,
        collapse="|"), working_data$EVTYPE)), 1]                         <- "Heavy Rain"# 7.21
working_data[grep("HEAVY SNOW", working_data$EVTYPE), 1]                 <- "Heavy Snow"# 7.22
High_Surf_text <- c("SURF", "STORM SURGE")                                              
working_data[(grep(paste(High_Surf_text,
        collapse="|"), working_data$EVTYPE)), 1]                          <- "High Surf"# 7.23
working_data[grep("HIGH WIND", working_data$EVTYPE), 1]                   <- "High Wind"# 7.24
Hurricane_Typhoon_text <- c("HURRICANE OPAL", "HURRICANE ERIN",                         
                            "SEVERE THUNDERSTORMS", "THUNDERSTORMS", 
                            "SEVERE THUNDERSTORM", "THUNDERSTORM")                         
working_data[(grep(paste(Hurricane_Typhoon_text,
        collapse="|"), working_data$EVTYPE)), 1]                  <- "Hurricane/Typhoon"# 7.25
working_data[grep("ICE", working_data$EVTYPE), 1]                         <- "Ice Storm"# 7.26
Lightning_text <- c("LIGHTNING", "LIGHTING")                                              
working_data[(grep(paste(Lightning_text,
        collapse="|"), working_data$EVTYPE)), 1]                          <- "Lightning"# 7.29
working_data[grep("THUNDERSTROM", working_data$EVTYPE), 1] <- "Marine Thunderstorm Wind"# 7.33
working_data[grep("CURRENT", working_data$EVTYPE), 1]                   <- "Rip Current"# 7.34
Sleet_text <- c("SLEET", "FREEZING RAIN")                        
working_data[(grep(paste(Sleet_text,
        collapse="|"), working_data$EVTYPE)), 1]                              <- "Sleet"# 7.36
working_data[grep("TORNADO", working_data$EVTYPE), 1]                       <- "Tornado"# 7.40
working_data[grep("DEPRESSION", working_data$EVTYPE), 1]        <- "Tropical Depression"# 7.41
Tropical_Storm_text <- c("TROPICAL STORM", "TROPICAL STORM ALBERTO",               
                              "TROPICAL STORM", "TROPICAL STORM GORDON", 
                              "TROPICAL STORM JERRY")  
working_data[(grep(paste(Tropical_Storm_text,
        collapse="|"), working_data$EVTYPE)), 1]                     <- "Tropical Storm"# 7.42
working_data[grep("TSUNAMI", working_data$EVTYPE), 1]                       <- "Tsunami"# 7.43
working_data[grep("VOLCANIC", working_data$EVTYPE), 1]                 <- "Volcanic Ash"# 7.44
Waterspout_text <- c("WATERSPOUT", "WATERSPOUT", "WAYTERSPOUT",
                     "WATER SPOUT")                         
working_data[(grep(paste(Waterspout_text,
        collapse="|"), working_data$EVTYPE)), 1]                         <- "Waterspout"# 7.45 
Wildfire_text <- c("FIRE", "WILDFIRE")                                                  
working_data[(grep(paste(Wildfire_text ,
        collapse="|"), working_data$EVTYPE)), 1]                           <- "Wildfire"# 7.46 
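Each statement above overwrites “EVTYPE” in place, and the replacement labels are in mixed case, so a value that has already been recoded no longer matches the later upper-case patterns. The order of the statements therefore decides the outcome: a broad pattern such as "FLOOD" or "WIND" that runs early claims every value containing that word before a narrower pattern such as "FLASH FLOOD" is reached. One way to make that ordering explicit is an ordered rule table in which the first matching pattern wins. The sketch below is illustrative only: `recode_evtype`, the three rules, and the sample values are hypothetical and are not the mapping used in this report.

```r
# Hypothetical rule table: most specific patterns first, first match wins.
rules <- data.frame(
  pattern = c("FLASH FLOOD", "COASTAL FLOOD", "FLOOD"),
  label   = c("Flash Flood", "Coastal Flood", "Flood"),
  stringsAsFactors = FALSE
)

recode_evtype <- function(x, rules) {
  out <- rep(NA_character_, length(x))
  for (i in seq_len(nrow(rules))) {
    # Only values that no earlier rule has claimed are eligible.
    hit <- is.na(out) & grepl(rules$pattern[i], x, fixed = TRUE)
    out[hit] <- rules$label[i]
  }
  out  # values that match no rule stay NA for later inspection
}

recoded <- recode_evtype(
  c("FLASH FLOOD", "COASTAL FLOODING", "RIVER FLOOD", "HAIL"), rules)
recoded
```

Because matching is done against the untouched input vector and results are written to a separate vector, reordering the rules is the only thing that changes which label a value receives.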

The following event codes could not be interpreted and were removed from the data set.

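# Named "uninterpretable" rather than "remove" to avoid masking base::remove().
uninterpretable <- c("APACHE COUNTY", "URBAN/SMALL", "URBAN AND SMALL", "URBAN AND SMALL STREAM")
working_data <- working_data[!(working_data$EVTYPE %in% uninterpretable), ]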

These events only represent 5 out of the 197,706 available observations, and so there is little threat of inducing bias. After consolidating “EVTYPE”, we are left with 32 possible extreme weather events.

Analysis

The analysis is relatively straightforward. I collapse the number of fatalities, the number of injuries, and the total amount of damage (property damage + crop damage) by each event category. To facilitate interpretation, I limit my subsequent analysis to the ten highest ranking weather events for each of the 3 dependent variables.

library(dplyr)
Fatalities <- summarise(group_by(working_data, EVTYPE), sum(FATALITIES))
names(Fatalities)[names(Fatalities) == 'sum(FATALITIES)'] <- 'Fatalities'
Fatalities <- arrange(Fatalities, -Fatalities)
Fatalities_10 <- as.data.frame(Fatalities[1:10,]) 
rownames(Fatalities_10) <- Fatalities_10[,1]
Fatalities_10$EVTYPE <- NULL

Injuries <- summarise(group_by(working_data, EVTYPE), sum(INJURIES))
names(Injuries)[names(Injuries) == 'sum(INJURIES)'] <- 'Injuries'
Injuries <- arrange(Injuries, -Injuries)
Injuries_10 <- as.data.frame(Injuries[1:10,])
rownames(Injuries_10) <- Injuries_10[,1]
Injuries_10$EVTYPE <- NULL

Damage <- summarise(group_by(working_data, EVTYPE), sum(Total_Damage))
names(Damage)[names(Damage) == 'sum(Total_Damage)'] <- 'Damage'
Damage <- arrange(Damage, -Damage)
Damage_10 <- as.data.frame(Damage[1:10,])
rownames(Damage_10) <- Damage_10[,1]
Damage_10$EVTYPE <- NULL
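The three blocks above follow the same pattern: total one measure by event type, sort in descending order, and keep the first ten rows. A minimal sketch of that pattern as a single base R function is given below. The name `top_events` and the toy data frame are hypothetical and serve only as illustration; the function returns a named vector rather than the data frames built above.

```r
# Hypothetical helper: total one measure by event type and keep the n largest.
top_events <- function(data, measure, n = 10) {
  # One sum per event type, returned as a named numeric vector.
  totals <- vapply(split(data[[measure]], data$EVTYPE), sum, numeric(1))
  head(sort(totals, decreasing = TRUE), n)
}

# Toy data for illustration only; not taken from the Storm Events Database.
toy <- data.frame(
  EVTYPE     = c("Tornado", "Hail", "Tornado", "Lightning"),
  FATALITIES = c(3, 0, 2, 1),
  stringsAsFactors = FALSE
)

top_two <- top_events(toy, "FATALITIES", n = 2)
top_two
```

Applied to the working data, calls such as `top_events(working_data, "FATALITIES")` would be expected to give the same ranking as the dplyr code, although that has not been checked here.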

Results

In this section I discuss the results of my analysis. Tornadoes are unambiguously the most deadly of all severe weather events recorded. Similarly, tornadoes account for an even greater share of injuries relative to the other kinds of extreme weather events. Looking at the economic consequences of these extreme weather events, tornadoes are again the chief perpetrator of economic damage. In context, the total amount of economic damage (property damage + crop damage) caused by tornadoes is more than 6 times that caused by the second most expensive extreme weather event - winter storms.

Fatalities

Here I focus on the 10 extreme weather events with the greatest number of fatalities. Figure 1 shows the total number of fatalities recorded during the period of 1950-2011. Working our way down the vertical axis, first note that there are separate entries for “Heat” and “Excessive Heat.” The former category can be thought of as high but not unusual temperatures, whereas “Excessive Heat” is rarer and more severe. “Heat” and “Excessive Heat” caused 17 and 22 fatalities, respectively, during the period of 1950-2011.

“Winter Storm” and “Blizzard” also represent similar categories. While these 2 weather event categories are similar, not all “Winter Storm” events meet the technical definition of a “Blizzard.” [1] I leave it to the reader to decide whether these two categories ought to be combined.

Moving to the very bottom of the vertical axis, we can see that “Strong Wind” and “Tornado” also resemble one another, the former category being less severe. [2] In total, tornadoes were responsible for 4,063 deaths during the period of 1950-2011. Strong winds are an order of magnitude less deadly, accounting for 313 deaths during the same period. In terms of deaths, tornadoes are a clear outlier.

par(oma=c(3,4,1,0)) # c(bottom margin, left margin, top margin, right margin)
barplot(Fatalities_10$Fatalities, sub = "(Figure 1)",
        main = "Number of Fatalities by Extreme Weather Event (1950 - 2011)",
        xlab = "Number of Fatalities",
        names.arg = c("Tornado", "Strong Wind", "Coastal Flood", "Lightning",
                      "Rip Current", "Thunderstorm Wind", "Excessive Heat",
                      "Blizzard", "Winter Storm", "Heat"),
        las = 2, cex.names = 0.8, cex.axis = 0.8, col = "blue", border = "black",
        horiz = TRUE)

Injuries

Relative to the number of fatalities, the number of injuries is even more skewed towards tornadoes. Specifically, there are 74,510 recorded weather-related injuries during the period of 1950-2011.

sum(original_data$INJURIES) 
## [1] 140528

Of these, 69,138 were associated with a tornado. In other words, approximately 92.8% of all weather-related injuries during the period of study were caused by tornadoes. As with fatalities, “Strong Wind” came in second place as the most injury-inducing weather event, causing a total of 3,480 injuries during the period of study.
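The share quoted above follows directly from the two counts given in the text. As a quick check, using only those figures (the variable names are illustrative):

```r
# Figures quoted in the text above.
tornado_injuries <- 69138
total_injuries   <- 74510

# Tornado injuries as a percentage of all recorded injuries.
share <- 100 * tornado_injuries / total_injuries
round(share, 1)  # 92.8
```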

par(oma=c(3,4,1,0)) # c(bottom margin, left margin, top margin, right margin)
barplot(Injuries_10$Injuries, sub = "(Figure 2)",
        main = "Number of Injuries by Extreme Weather Event (1950 - 2011)",
        xlab = "Number of Injuries",
        names.arg = c("Tornado", "Strong Wind", "Blizzard", "Hail", "Lightning",
                      "Dense Fog", "Thunderstorm Wind", "Wildfire", "Dust Storm",
                      "Tropical Storm"),
        las = 2, cex.names = 0.8, cex.axis = 0.8, col = "blue", border = "black",
        horiz = TRUE)

Economic Impact

Recall that in the section on Data Processing, I constructed a new variable representing total damage by combining the variables “PROPDMG” and “CROPDMG.” To facilitate interpretation, I re-scale the composite variable “Damage” by dividing it by one million. The ten most expensive weather events in terms of total damages are represented in Figure 3.

Damage_10$Damage_mil <- Damage_10$Damage/1000000

par(oma=c(3,4,1,0)) # c(bottom margin, left margin, top margin, right margin)
barplot(Damage_10$Damage_mil, sub = "(Figure 3)",
        main = "Total Weather-Induced Damages (1950 - 2011)",
        xlab = "Damages in Millions $",
        names.arg = c("Tornado", "Winter Storm", "Hurricane/Typhoon", "Strong Wind",
                      "Coastal Flood", "Wildfire", "Heat", "Thunderstorm Wind",
                      "Blizzard", "Hail"),
        las = 2, cex.names = 0.8, cex.axis = 0.7, col = "blue", border = "black",
        horiz = TRUE)

Following the pattern from earlier, tornadoes have caused the most economic damage to the United States of all weather events - some 31.17 billion U.S. dollars worth of devastation during the period of 1950-2011. Diverging from the pattern established by “Fatalities” and “Injuries,” winter storms rank second, having caused approximately 5.15 billion dollars worth of damage. Among the remaining top-five most expensive weather events, hurricanes/typhoons, strong winds, and coastal flooding caused a total of 3.57, 2.33, and 1.04 billion dollars worth of damage respectively.

Conclusion

In the beginning we set out to establish which types of weather events are most harmful with respect to population health, and which types of events have the greatest economic consequences. In one respect, the findings are clear. Tornadoes have caused the most fatalities, injuries, and economic damage in the United States by a wide margin. Of all the types of weather events monitored by NOAA, tornadoes are the most harmful to both the population and the economy of the United States.


  1. “Blizzard: A winter storm which produces the following conditions for 3 hours or longer: (1) sustained winds or frequent gusts 30 knots (35 mph) or greater, and (2) falling and/or blowing snow reducing visibility frequently to less than 1/4 mile, on a widespread or localized basis.” (National Weather Service Instruction (NWSI) 10-1605, p.20)

  2. “Strong Wind: Non-convective winds gusting less than 50 knots (58 mph), or sustained winds less than 35 knots (40 mph), resulting in a fatality, injury, or damage. Consistent with regional guidelines, mountain states may have higher criteria.” (National Weather Service Instruction (NWSI) 10-1605, p.69)