Human and economic impact of storms and other severe weather events in the US from 1950 to 2011

Julien COHEN SOLAL

2015, January

Synopsis

This report aims to identify the severe weather events which have been the most harmful to the health of the US population between 1950 and 2011, and the ones which have had the greatest economic consequences. It also aims to put numbers on these impacts and damages. It doesn’t aim to differentiate these impacts geographically within the country.

The analysis has been conducted using data from the NOAA storm database. There were several transformations from the raw data to the final data used in the figures, which will be explained in greater detail in the Data Processing section, along with implementation choices.

In the Results section, it will be shown that tornadoes have by far been the biggest cause of human deaths and injuries. It will also be shown that the economic damage of severe weather events mostly affects property, with crop damage being less significant, and that floods are the biggest cause of economic damage.

Data Processing

Loading the data

This study starts by installing and loading the R libraries that will be needed later on. All the necessary data is then downloaded to a local hard drive.

# Install dependencies if necessary
if (!require(dplyr)) {install.packages("dplyr")}
if (!require(reshape2)) {install.packages("reshape2")}
if (!require(ggplot2)) {install.packages("ggplot2")}
if (!require(grid)) {install.packages("grid")}
require(dplyr)
require(reshape2)
require(ggplot2)
require(grid)

# Download storm data
if(!dir.exists("data"))    {dir.create("data")}    
if(!file.exists("data/stormData.csv.bz2"))
{
    download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
                  destfile = "data/stormData.csv.bz2")
}
# File is large, so this could take some time
allData <- read.table("data/stormData.csv.bz2", 
                    header = TRUE, 
                    sep = ",", 
                    stringsAsFactors = FALSE, 
                    row.names = NULL)

Cleaning the data

Since the raw data is so big (more than 900K observations of 37 variables), some cleaning is needed to speed up operations, so the data will be reduced to what is relevant to the present study. This study is about harm to human health and economic damage, so a good way to start is to remove all observations which contain neither.

# Events which had 0 fatalities, injuries, property or crop damage are not 
# relevant in the current study
lightData <- allData[(allData$FATALITIES != 0) | 
                         (allData$INJURIES != 0) | 
                         (allData$PROPDMG != 0.0) | 
                         (allData$CROPDMG != 0.00), ]

This study’s aim is not to compare what happens between different states or localities; we just want global numbers representing impacts on the whole US territory. This is why we only keep the non-geographical data.

# Since the study isn't about comparing what happens between locations, we 
# keep only the non-geographical data (all of it is in the US), which also 
# makes the relevant data easier to read
colKeep <- c("BGN_DATE", "BGN_TIME", "TIME_ZONE", "EVTYPE", "END_DATE", 
             "END_TIME", "LENGTH", "WIDTH", "F", "MAG", "FATALITIES", 
             "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", 
             "REMARKS")
lightData <- lightData[,names(lightData) %in% colKeep]

According to the database documentation, there are 48 official types of severe weather events:

National Weather Service Storm Data Documentation

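The EVTYPE variable of the database, which contains the type of each observation, has been poorly maintained: there are a lot of typos, several designations for the same type, etc. All in all, there are almost 1000 unique entries where supposedly only 48 are possible. A lot of work has been put into harmonizing all the types in the database, using common sense and/or Wikipedia most of the time. In the code below, entries are first changed to lower case and every regrouped entry receives an upper-case label; since the patterns are lower case and matching is case-sensitive, an entry which has already been regrouped is not matched again, so the order of the rules matters. This is the part of the study where different individuals could harmonize the entries in slightly different ways, but it shouldn’t change the gist of the study.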

# Change all event types to lower case to simplify the study
lightData$EVTYPE <- tolower(lightData$EVTYPE)

# There are 48 event types according to the "Storm Data Preparation" document
# Regroup all events into these 48 main types
lightData[which(grepl("astronomical", lightData$EVTYPE)),
          "EVTYPE"] <- "ASTRONOMICAL H/L TIDE"    
lightData[which(grepl("avalanc[h]?e", lightData$EVTYPE)),
          "EVTYPE"] <- "AVALANCHE"    
lightData[which(grepl("blizzard", lightData$EVTYPE)),
          "EVTYPE"] <- "BLIZZARD"    
lightData[which(grepl("coastal", lightData$EVTYPE)),
          "EVTYPE"] <- "COASTAL FLOOD"    
lightData[which(grepl("cold", lightData$EVTYPE) & 
                    !grepl("extreme", lightData$EVTYPE)),
          "EVTYPE"] <- "COLD/WIND CHILL"    
lightData[which(grepl("mud", lightData$EVTYPE) | 
                    grepl("lands", lightData$EVTYPE) | 
                    grepl("erosion", lightData$EVTYPE) |
                    grepl("slide", lightData$EVTYPE)),
          "EVTYPE"] <- "DEBRIS FLOW"    
lightData[which(grepl("dense smoke", lightData$EVTYPE)),
          "EVTYPE"] <- "DENSE SMOKE"    
lightData[which(grepl("drought", lightData$EVTYPE)),
          "EVTYPE"] <- "DROUGHT"    
lightData[which(grepl("devil", lightData$EVTYPE)),
          "EVTYPE"] <- "DUST DEVIL"    
lightData[which(grepl("dust", lightData$EVTYPE) & 
                    !grepl("devil", lightData$EVTYPE)),
          "EVTYPE"] <- "DUST STORM"    
lightData[which(grepl("heat", lightData$EVTYPE) & 
                    grepl("excessive", lightData$EVTYPE)),
          "EVTYPE"] <- "EXCESSIVE HEAT"    
lightData[which((grepl("extreme", lightData$EVTYPE) & 
                     (grepl("cold", lightData$EVTYPE) | 
                          grepl("chill", lightData$EVTYPE))) |
                    grepl("thermia", lightData$EVTYPE) |
                    grepl("low temperature", lightData$EVTYPE)),
          "EVTYPE"] <- "EXTREME COLD/WIND CHILL"    
lightData[which(grepl("flood", lightData$EVTYPE) & 
                    grepl("flash", lightData$EVTYPE)),
          "EVTYPE"] <- "FLASH FLOOD"    
lightData[which((grepl("flood", lightData$EVTYPE) & 
                     !grepl("flash", lightData$EVTYPE)) | 
                    grepl("drowning", lightData$EVTYPE) |
                    grepl("urban", lightData$EVTYPE)),
          "EVTYPE"] <- "FLOOD"    
lightData[which(grepl("fog", lightData$EVTYPE)),
          "EVTYPE"] <- "FREEZING FOG"    
lightData[which(grepl("frost", lightData$EVTYPE) |
                    grepl("glaze", lightData$EVTYPE) | 
                    grepl("freeze", lightData$EVTYPE) | 
                    grepl("ic[ey]", lightData$EVTYPE) |
                    (grepl("freezing", lightData$EVTYPE) & 
                         !grepl("fog", lightData$EVTYPE))),
          "EVTYPE"] <- "FROST/FREEZE"    
lightData[which(grepl("hail", lightData$EVTYPE) & 
                    !grepl("marine", lightData$EVTYPE)),
          "EVTYPE"] <- "HAIL"    
lightData[which(grepl("funnel", lightData$EVTYPE)),
          "EVTYPE"] <- "FUNNEL CLOUD"    
lightData[which((grepl("heat", lightData$EVTYPE) & 
                     !grepl("excessive", lightData$EVTYPE)) |
                    grepl("warm", lightData$EVTYPE)), 
          "EVTYPE"] <- "HEAT"    
lightData[which(grepl("snow", lightData$EVTYPE) & 
                    !grepl("lake", lightData$EVTYPE)),
          "EVTYPE"] <- "HEAVY SNOW"    
lightData[which((grepl("heavy", lightData$EVTYPE) & 
                     grepl("rain", lightData$EVTYPE)) |
                    grepl("rainfall", lightData$EVTYPE) |
                    grepl("precipitation", lightData$EVTYPE) |
                    grepl("shower", lightData$EVTYPE) | 
                    grepl("mix", lightData$EVTYPE) | 
                    grepl("hvy rain", lightData$EVTYPE) | 
                    grepl("^rain", lightData$EVTYPE)),
          "EVTYPE"] <- "HEAVY RAIN"    
lightData[which(grepl("surf", lightData$EVTYPE)),
          "EVTYPE"] <- "HIGH SURF"    
lightData[which((grepl("high", lightData$EVTYPE) & 
                     grepl("wind", lightData$EVTYPE) & 
                     !grepl("marine", lightData$EVTYPE)) |
                    grepl("gradient", lightData$EVTYPE)),
          "EVTYPE"] <- "HIGH WIND"    
lightData[which(grepl("hurricane", lightData$EVTYPE) | 
                    grepl("typhoon", lightData$EVTYPE)),
          "EVTYPE"] <- "HURRICANE/TYPHOON"    
lightData[which(grepl("ice", lightData$EVTYPE) & 
                    grepl("storm", lightData$EVTYPE)),
          "EVTYPE"] <- "ICE STORM"    
lightData[which(grepl("lake", lightData$EVTYPE) & 
                    grepl("flood", lightData$EVTYPE)),
          "EVTYPE"] <- "LAKESHORE FLOOD"    
lightData[which(grepl("lake", lightData$EVTYPE) & 
                    grepl("snow", lightData$EVTYPE)),
          "EVTYPE"] <- "LAKE-EFFECT SNOW"    
lightData[which(grepl("lig.*ing", lightData$EVTYPE)),
          "EVTYPE"] <- "LIGHTNING"    
lightData[which(grepl("marine hail", lightData$EVTYPE)),
          "EVTYPE"] <- "MARINE HAIL"    
lightData[which(grepl("marine high wind", lightData$EVTYPE)),
          "EVTYPE"] <- "MARINE HIGH WIND"    
lightData[which(grepl("marine strong wind", lightData$EVTYPE)),
          "EVTYPE"] <- "MARINE STRONG WIND"    
lightData[which(grepl("marine thunderstorm wind", lightData$EVTYPE)),
          "EVTYPE"] <- "MARINE THUNDERSTORM WIND"    
lightData[which(grepl("rip", lightData$EVTYPE) & 
                    grepl("current", lightData$EVTYPE)),
          "EVTYPE"] <- "RIP CURRENT"    
lightData[which(grepl("seiche", lightData$EVTYPE)),
          "EVTYPE"] <- "SEICHE"    
lightData[which(grepl("sleet", lightData$EVTYPE)),
          "EVTYPE"] <- "SLEET"    
lightData[which(grepl("tide", lightData$EVTYPE) | 
                    grepl("wave", lightData$EVTYPE) |
                    grepl("water", lightData$EVTYPE) |
                    grepl("seas", lightData$EVTYPE) | 
                    grepl("swells", lightData$EVTYPE)),
          "EVTYPE"] <- "STORM TIDE"    
lightData[which((grepl("strong", lightData$EVTYPE) & 
                     grepl("wind", lightData$EVTYPE) & 
                     !grepl("marine", lightData$EVTYPE)) | 
                    grepl("wind damage", lightData$EVTYPE) |
                    grepl("turbulence", lightData$EVTYPE) | 
                    grepl("^wind", lightData$EVTYPE) | 
                    grepl("force.*wind", lightData$EVTYPE)),
          "EVTYPE"] <- "STRONG WIND"    
lightData[which((grepl("thu.*m", lightData$EVTYPE) & 
                     !grepl("marine", lightData$EVTYPE)) | 
                    grepl("understorm", lightData$EVTYPE) | 
                    grepl("tstm", lightData$EVTYPE) | 
                    grepl("downburst", lightData$EVTYPE) | 
                    grepl("surge", lightData$EVTYPE)),
          "EVTYPE"] <- "THUNDERSTORM WIND"    
lightData[which(grepl("nado", lightData$EVTYPE) | 
                    grepl("torn", lightData$EVTYPE) | 
                    grepl("gust", lightData$EVTYPE) | 
                    grepl("mi.*oburst", lightData$EVTYPE) | 
                    grepl("whirlwind", lightData$EVTYPE)),
          "EVTYPE"] <- "TORNADO"    
lightData[which(grepl("tropical", lightData$EVTYPE) & 
                    grepl("depression", lightData$EVTYPE)),
          "EVTYPE"] <- "TROPICAL DEPRESSION"    
lightData[which(grepl("tropical", lightData$EVTYPE) & 
                    !grepl("depression", lightData$EVTYPE)),
          "EVTYPE"] <- "TROPICAL STORM"    
lightData[which(grepl("tsunami", lightData$EVTYPE)),
          "EVTYPE"] <- "TSUNAMI"    
lightData[which(grepl("volcanic ash", lightData$EVTYPE)),
          "EVTYPE"] <- "VOLCANIC ASH"    
lightData[which(grepl("waterspout", lightData$EVTYPE) | 
                    grepl("wet", lightData$EVTYPE)),
          "EVTYPE"] <- "WATERSPOUT WIND"    
lightData[which(grepl("fire", lightData$EVTYPE)),
          "EVTYPE"] <- "WILDFIRE"    
lightData[which(grepl("winter", lightData$EVTYPE) & 
                    grepl("storm", lightData$EVTYPE)),
          "EVTYPE"] <- "WINTER STORM"    
lightData[which(grepl("winter", lightData$EVTYPE) & 
                    grepl("weather", lightData$EVTYPE)),
          "EVTYPE"] <- "WINTER WEATHER"    
lightData[which(grepl("apache", lightData$EVTYPE) | 
                    grepl("[?]", lightData$EVTYPE) | 
                    grepl("^high$", lightData$EVTYPE) | 
                    grepl("dam break", lightData$EVTYPE) | 
                    grepl("marine accident", lightData$EVTYPE) | 
                    grepl("marine mishap", lightData$EVTYPE) |
                    grepl("other", lightData$EVTYPE)),
          "EVTYPE"] <- "OTHERS"    

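Because the regrouping depends on the order of the rules, it can be worth checking its outcome before going further. The following is only a minimal sketch and not part of the original analysis: `checkRegrouping` is a hypothetical helper, `toyData` is a small made-up data frame, and the check relies solely on the convention used above that regrouped labels are upper case while untouched entries remain lower case.

```r
# Hypothetical helper: report entries that no rule matched (still lower case)
# and expected labels that no row ended up with
checkRegrouping <- function(evtype, expectedLabels) {
    isRegrouped <- evtype == toupper(evtype)
    list(unmatched = sort(unique(evtype[!isRegrouped])),
         unusedLabels = setdiff(expectedLabels, unique(evtype)))
}

# Made-up example values, for illustration only
toyData <- data.frame(EVTYPE = c("TORNADO", "FLOOD", "FROST/FREEZE", 
                                 "unclassified entry"),
                      stringsAsFactors = FALSE)
report <- checkRegrouping(toyData$EVTYPE, 
                          c("TORNADO", "FLOOD", "FROST/FREEZE", "ICE STORM"))
print(report$unmatched)     # entries still to be classified
print(report$unusedLabels)  # labels which no row received
```

On the real data, the same call would take `lightData$EVTYPE` and the list of labels used in the rules. A label showing up in `unusedLabels` may indicate that an earlier, broader pattern captured its entries first.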
Not all kinds of severe weather events started being recorded at the same time. In fact, for about the first 30 years, only tornadoes were recorded in the database. In the next 10 years, hail and thunderstorms were added, and only after that, in 1993, were all the other types recorded as well. Since we want to compare the kinds of events against each other, it would be misleading to compare events which weren’t recorded during the same period, and it would also be a loss to completely ignore all the years where only a few kinds of events were recorded. That is why the numbers are split into 3 time frames (before 1983, 1983 to 1992, 1993 and after), so as to have a better basis of comparison.

# Create new column which we'll need to group by later on
# Not all kinds of events started being recorded the same year, so for fair 
# comparisons we divide the study into different timeframes:
# 1 = before 1983, 2 = 1983 to 1992, 3 = 1993 and after
eventYear <- as.integer(format(as.Date(lightData$BGN_DATE, 
                                       format = "%m/%d/%Y"), 
                               "%Y"))

# findInterval returns 0, 1 or 2 depending on where the year falls relative 
# to the break points, hence the + 1
lightData$timeframe <- findInterval(eventYear, c(1983, 1993)) + 1

Property and crop damage are encoded in the database with an added column representing a multiplier to apply to the damage value. We need a ‘clean’ column to be able to compare those values.

# Update values for damages columns
# Translate an exponent code (h, k, m or b, in either case) into its 
# multiplier; any other code leaves the value unchanged
expMultiplier <- function(expCode) {
    multipliers <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)
    mult <- unname(multipliers[tolower(expCode)])
    mult[is.na(mult)] <- 1
    mult
}
lightData$PROPDMG <- lightData$PROPDMG * expMultiplier(lightData$PROPDMGEXP)
lightData$CROPDMG <- lightData$CROPDMG * expMultiplier(lightData$CROPDMGEXP)

Preparing the data for plotting

Finally, we prepare the data in a way that lets us plot it in a nice, clean and comprehensible way. There will be 3 panels, one for each time period, and we will show a maximum of 8 kinds of severe weather events per panel (the 8 causing the most damage in that period).

# Reshape data for plotting
populationData <- group_by(lightData, EVTYPE, timeframe)
populationData <- summarize(populationData, 
                            totalFatalities = sum(FATALITIES), 
                            totalInjuries = sum(INJURIES))
populationData$totalDamage <- populationData$totalFatalities + populationData$totalInjuries
populationData <- populationData[order(populationData$timeframe, populationData$totalDamage, decreasing = TRUE), ]
rowIndices <- c(1:8, 47:50)
populationData <- populationData[rowIndices,]
populationData<- melt(populationData, 
                      id.vars = c("EVTYPE", "timeframe"), 
                      measure.vars = c("totalFatalities", "totalInjuries")) 

ecoData <- group_by(lightData, EVTYPE, timeframe)
ecoData <- summarize(ecoData, 
                     totalPropertyDamage = sum(PROPDMG), 
                     totalCropDamage = sum(CROPDMG))
ecoData$totalDamage <- ecoData$totalPropertyDamage + ecoData$totalCropDamage
ecoData <- ecoData[order(ecoData$timeframe, ecoData$totalDamage, decreasing = TRUE), ]
ecoData <- ecoData[rowIndices,]
ecoData<- melt(ecoData, 
               id.vars = c("EVTYPE", "timeframe"), 
               measure.vars = c("totalPropertyDamage", "totalCropDamage")) 
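The hard-coded `rowIndices` above depend on how many event types each time period happens to contain. As a hedged alternative, here is a sketch in base R that keeps the top n rows per timeframe without that dependency. `topPerTimeframe` is a hypothetical helper, `toySummary` holds made-up totals, and the sketch assumes a plain data frame with `timeframe` and `totalDamage` columns like the summaries built above.

```r
# Hypothetical helper: keep the n rows with the largest totalDamage 
# within each timeframe
topPerTimeframe <- function(df, n = 8) {
    # Rank rows inside each timeframe, 1 = largest totalDamage
    rankInFrame <- ave(-df$totalDamage, df$timeframe, 
                       FUN = function(x) rank(x, ties.method = "first"))
    top <- df[rankInFrame <= n, ]
    # Sort by timeframe, then by decreasing damage, for easier reading
    top[order(top$timeframe, -top$totalDamage), ]
}

# Made-up totals, for illustration only
toySummary <- data.frame(
    EVTYPE = c("TORNADO", "TORNADO", "HAIL", "FLOOD", "HEAT", "TORNADO"),
    timeframe = c(1, 2, 2, 3, 3, 3),
    totalDamage = c(50, 30, 5, 40, 20, 60),
    stringsAsFactors = FALSE)
topTwo <- topPerTimeframe(toySummary, n = 2)
print(topTwo)
```

With n = 2, the toy example keeps every row of the first two periods and drops only the smallest entry of the third one.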

Results

Impact on human health (deaths and injuries)

# Prepare facet labels (timeframe 1 covers every year before 1983)
# as_labeller() requires ggplot2 2.0.0 or later
facet_labeller <- as_labeller(c("1" = "1950-1982",
                                "2" = "1983-1992",
                                "3" = "1993 and after"))

# Plot it
plotTitle <- 'Impact on human health'
plotSubtitle <- '(deaths and injuries)'
popuPlot <- ggplot(populationData, aes(x = EVTYPE, 
                                       y = value, 
                                       fill = variable)) + 
    geom_bar(stat = "identity") + 
    ggtitle(bquote(atop(.(plotTitle), atop(italic(.(plotSubtitle)), "")))) + 
    labs(x = "Event type", 
         y = "Number of harmed people")  + 
    scale_fill_brewer(palette = "Set1", 
                      labels = c("Deaths", "Injuries")) + 
    guides(fill = guide_legend(title = NULL)) + 
    facet_grid(. ~ timeframe, 
               labeller = facet_labeller, 
               scales = "free_x", 
               space = "free_x") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7), 
          axis.text.y = element_text(size = 7), 
          axis.title = element_text(size = 10), 
          strip.text.x = element_text(size = 6))
print(popuPlot)

As illustrated in the figure above, tornadoes have been, by a wide margin, the most harmful weather event with respect to human health (deaths + injuries) in the US, in every time period where there have been measurements. But if we look at deaths only, we can see that excessive heat has been about as harmful as tornadoes since it began being recorded in 1993, even more so if we add the heat type of event to excessive heat events (it could be argued that these two types of events could be merged into one). Tornadoes caused many more injuries, however. Floods seem to be the third most harmful weather event with respect to human health.
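The deaths-only comparison is hard to read precisely from stacked bars, so a small table can help. The sketch below is illustrative only: `toyEvents` contains made-up numbers (not values from the database), and the `group` column is a hypothetical regrouping that counts HEAT together with EXCESSIVE HEAT.

```r
# Made-up records, for illustration only (not values from the database)
toyEvents <- data.frame(
    EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "HEAT", "TORNADO", "FLOOD"),
    timeframe = c(3, 3, 3, 3, 3),
    FATALITIES = c(10, 8, 3, 2, 4),
    stringsAsFactors = FALSE)

# Hypothetical regrouping: count HEAT together with EXCESSIVE HEAT
toyEvents$group <- ifelse(toyEvents$EVTYPE %in% c("HEAT", "EXCESSIVE HEAT"), 
                          "HEAT (ALL)", toyEvents$EVTYPE)

# Total deaths per group for the most recent timeframe, largest first
deathsByGroup <- aggregate(FATALITIES ~ group, 
                           data = toyEvents[toyEvents$timeframe == 3, ], 
                           FUN = sum)
deathsByGroup <- deathsByGroup[order(-deathsByGroup$FATALITIES), ]
print(deathsByGroup)
```

Applied to `lightData` instead of the toy data frame, the same steps would give the actual death totals per event type for the period starting in 1993.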

Economic consequences

# Prepare facet labels (timeframe 1 covers every year before 1983)
# as_labeller() requires ggplot2 2.0.0 or later
facet_labeller <- as_labeller(c("1" = "1950-1982",
                                "2" = "1983-1992",
                                "3" = "1993 and after"))

# Plot it
ecoPlot <- ggplot(ecoData, aes(x = EVTYPE, 
                              y = value / 1e9, 
                              fill = variable)) + 
    geom_bar(stat = "identity") + 
    ggtitle("Economic consequences") + 
    labs(x = "Event type", 
         y = "Cost of damage (in billions of USD)")  + 
    scale_fill_brewer(palette = "Set1", 
                      labels = c("Property damage", "Crop damage")) + 
    guides(fill = guide_legend(title = NULL)) + 
    facet_grid(. ~ timeframe, 
               labeller = facet_labeller, 
               scales = "free_x", 
               space = "free_x") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 7), 
          axis.text.y = element_text(size = 7), 
          axis.title = element_text(size = 10), 
          strip.text.x = element_text(size = 6))
print(ecoPlot)

In the figure above, we can first see that even though hail and thunderstorms began being recorded in 1983 (as seen in the Impact on human health figure), the economic damage resulting from these events seemingly only started being recorded in 1993, along with all the other types of weather events.

Also, it is clear from this figure that the weather events which cause the most economic harm tend to be different from the ones which cause the most human harm. Here tornadoes are only ranked #4, while excessive heat doesn’t appear at all in the top 8. Floods are by far the biggest cause of economic damage, followed by hurricanes/typhoons and thunderstorms.

One last thing illustrated in this figure is that property damage from severe weather events represents a much bigger dollar total than crop damage. Droughts are by far the biggest cause of crop damage.