Tornadoes and Floods Across the United States are the Leading Cause of Fatalities and Injuries, and Crop and Property Damage

An Analysis of the Economic and Population Health Condition as a Consequence of Storm Event Types

Synopsis:

This study makes use of publicly available Storm Data from the National Climactic Data Center. The data includes information from year 1950 until November 2011.

In the analysis, we cleaned up the event types from the dataset to fix misspellings, abbreviations and other inconsistencies and standardized each of them to one of the 48 Storm Event Types identified in the National Weather Service Intruction 10-1605 dated August 17, 2007. To calculate the economic effects of each event type, the crop damage values and the property damage values as specified in the dataset were combined to represent a total damage value. To calculate the effects of event types to the human health condition, the fatalities and injuries were summed together by event type.

The result of this analysis shows that flood, hurricanes, tornadoes, storm surge and hail caused the most damage to property and crops and therefore, the highest economic consequence. The results also show that tornadoes, thunderstorms, excessive heat, flood and lightning and the biggest cause of human fatality and injury and are therefore the most harmful to the population.

Data Processing

Data Preparation / Cleaning

  1. Read the data into our working data frame w and remove unnecessary fields from our working data frame. Remove all rows with 0 fatalities, 0 injuries, 0 property damage and 0 crop damage.
w <- read.csv(bzfile("repdata-data-StormData.csv.bz2"));
w <- w[,c("EVTYPE", "FATALITIES", "INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")];
w <- w[w$FATALITIES > 0 | w$INJURIES > 0 | w$PROPDMG > 0 | w$CROPDMG > 0,];
  1. Cleanup the values found in EVTYPE column.

2.1. Convert all values in EVTYPE column to uppercase.

  w$EVTYPE <- toupper(w$EVTYPE)

2.2. Remove all punctuations, multiple spaces and spaces at the start and at the end of the EVTYPE values.

  trim <- function (x) gsub("^\\s+|\\s+$", "", x)
  w$EVTYPE <- gsub("[[:punct:]]"," ",w$EVTYPE)
  w$EVTYPE <- gsub("\\s+"," ",w$EVTYPE)
  w$EVTYPE <- trim(w$EVTYPE)

2.3. Fix misspellings, convert plural to singular, use appropriate replacement words, and expand abbreviations in EVTYPE values.

    fromthis <- c("STORMS", "FIRES", "FLOODING","FLOODS","TUNDERSTORM", "TSTM", "WINDS", "WINDSS", "SLIDES", "THUDERSTORM", "THUNDEERSTORM", "THUNDERESTORM", "LIGHTING","LANDSLIDES","WINDS","LIGNTNING","WILD FIRE", "WAYTERSPOUT", "SML", "FLD","RAINS","SHOWERS", "FUNNELS","CLOUDS","TORNADOES","WINTERY","WINTRY","WND","FLOODG","FLOODIN","WATER SPOUT", "TORNDAO","THUNERSTORM","W INDS","THUNDERSTORMWIND","EROSIN","AVALANCE", "FLOOD FLOOD", "FLASHFLOOD","HVY", "THUNDERSTORMW","CURRENTS", "FLOOD FLASH","MUD SLIDE","THUNDERTORM", "WINS", "THUNDERSTROM", "HAIL STORM", "RAINTORM","COOL AND WET", "UNSEASONABLY", "UNSEASONAL", "WINDCHILL","CSTL", "SNOW FALL", "SQUALLS", "ICY");

    tothis <- c("STORM","FIRE","FLOOD","FLOOD","THUNDERSTORM","THUNDERSTORM","WIND","WIND","SLIDE","THUNDERSTORM", "THUNDERSTORM","THUNDERSTORM","LIGHTNING","LANDSLIDE","WIND","LIGHTNING","WILDFIRE","WATERSPOUT", "SMALL", "FLOOD","RAIN","SHOWER","FUNNEL","CLOUD", "TORNADO","WINTER","WINTER", "WIND","FLOOD","FLOOD","WATERSPOUT", "TORNADO","THUNDERSTORM","WINDS","THUNDERSTORM WIND","EROSION","AVALANCHE","FLOOD","FLASH FLOOD", "HEAVY", "THUNDERSTORM WIND","CURRENT", "FLASH FLOOD", "MUDSLIDE", "THUNDERSTORM","WIND","THUNDERSTORM","HAILSTORM", "RAINSTORM","COLD AND WET", "UNSEASONABLE", "UNSEASONABLE", "WIND CHILL", "COASTAL", "SNOWFALL","SQUALL","ICE" )

    for ( i in 1:length(fromthis)) {
      w$EVTYPE <- gsub(fromthis[i], tothis[i], w$EVTYPE)
    }

2.4. Simplify EVTYPE values by removing supplemental descriptions that come after certain words. i.e. “THUNDERSTORM WIND 50”, “THUNDERSTORM WIND/HAIL”, and “THUNDERSTORM WIND G51” will all be simplified and be replaced with name “THUNDERSTORM WIND”

  keeppattern <- c("FLASH FLOOD", "COASTAL FLOOD", "BLIZZARD", "FLOOD", "HAIL", "HEAVY RAIN", "HEAVY SNOW",
  "HIGH SURF","HIGH WIND","HURRICANE", "LIGHTNING", "MUD SLIDE", "THUNDERSTORM", "TORNADO", "TROPICAL STORM", 
  "WINTER WEATHER", "WINTER STORM", "DUST DEVIL", "DROUGHT", "HEAT WAVE", "WATERSPOUT","STORM SURGE", 
  "GUSTY WIND","HAIL","RAIN", "FREEZING RAIN", "COLD AND WET","DUST STORM", "HEAVY SURF")

  for (i in 1:length(keeppattern)) {
    w$EVTYPE <- gsub(paste0(keeppattern[i],"\\s+.*"), keeppattern[i], w$EVTYPE)
  }
  1. Assign the appropriate standardized Storm Event Type to the EVTYPE values.

3.1. Create a reference vector that lists all official storm events. List is based on Table 1 of Section 2.1.1. of the document National Weather Service Instruction 10-1605 published Aug 17, 2007.

  stormevent <- c("ASTRONOMICAL LOW TIDE", "EXTREME COLD/WIND CHILL", "LAKE-EFFECT SNOW", "MARINE HIGH WIND", 
  "MARINE STRONG WIND", "MARINE THUNDERSTORM WIND", "COASTAL FLOOD", "COLD/WIND CHILL", "DEBRIS FLOW", 
  "DENSE FOG", "DENSE SMOKE", "DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT", "FLASH FLOOD", "FREEZING FOG", 
  "FUNNEL CLOUD", "HEAVY RAIN", "HEAVY SNOW", "HIGH SURF", "HIGH WIND", "ICE STORM", "LAKESHORE FLOOD", 
  "MARINE HAIL", "RIP CURRENT", "STORM SURGE/TIDE", "STRONG WIND", "THUNDERSTORM WIND", "TROPICAL DEPRESSION",
  "TROPICAL STORM", "VOLCANIC ASH", "WINTER STORM", "WINTER WEATHER", "AVALANCHE", "BLIZZARD", "DROUGHT", "FLOOD",
  "FROST/FREEZE", "HAIL", "HEAT", "HURRICANE/TYPHOON", "LIGHTNING", "OTHER", "SEICHE", "SLEET", "TORNADO", 
  "TSUNAMI", "WATERSPOUT", "WILDFIRE")

3.2. Prepare the storm event list for matching

Create extra records for standardized storm events with an alternate name or contains a “/” in the name.

  stormevent <- append(stormevent, stormevent[grepl("/",stormevent)])
  stormevent <- as.vector(stormevent)
  stormevent <- stormevent[order(stormevent)]

Identify which events have only 1 name and assign that name in the matchname variable.

  senamecnt <- as.data.frame(table(stormevent))
  stormevent <- as.data.frame(cbind(stormevent, rep(NA, length(stormevent))))
  names(stormevent) <- c("stormevent","matchname")
  sematch <- stormevent$stormevent %in% senamecnt[senamecnt$Freq == 1,1]
  stormevent$stormevent <- as.character(stormevent$stormevent)
  stormevent$matchname <- as.character(stormevent$matchname)
  stormevent$matchname[sematch] <- stormevent$stormevent[sematch]

Create matchname values for the remaining storm events. These are storm events that have tagged as having an alternate name.

  for (v in unique(stormevent$stormevent[is.na(stormevent$matchname)])){
    for (j in grep(v, stormevent$stormevent)) {
      stormevent$matchname[j] <- gsub("/\\w+","", stormevent$stormevent[j])
      stormevent$matchname[j+1] <- gsub("\\w+/","", stormevent$stormevent[j+1])
      break
    }
  }

Sort the storm event data frame by number of words in the matchname variable. We are going to use the matchname variable to match storm events with the raw EVTYPEs in our data set.

  countwords <- function(x) length(strsplit(gsub("[[:punct:]]"," ",x)," ")[[1]])
  wordcount <- as.vector(sapply(stormevent$matchname, countwords))
  stormevent <- cbind(stormevent, wordcount)
  stormevent <- stormevent[order(-stormevent$wordcount, stormevent$matchname),]
  1. Start matching EVTYPES to the official Storm Event Types.

Use the stormevent data frame to match standardized event types to the EVTYPEs.

  w$STANDARDIZED <- NA
  for (i in 1:nrow(stormevent)) {
    matchthis <- stormevent$matchname[i]
    usethis <- stormevent$stormevent[i]
    matched <- (grepl(matchthis, w$EVTYPE) & is.na(w$STANDARDIZED))
    w$EVTYPE[matched] <- usethis
    w$STANDARDIZED[matched] <- TRUE
  }

For the remaining EVTYPEs, create data frame to standardize each EVTYPE. The dataframe will be a mapping between different EVTYPEs and the official Storm Event Type list. The mapping is based on hours of research and conforms with the specifications of the National Weather Service Instruction 10-1605 dated August 17, 2007.

  rawevtype <- c("BEACH EROSION", "COASTAL EROSION", "LATE SEASON SNOW", "COASTAL STORM", "COASTAL SURGE", 
  "COLD", "COLD AND WET", "COLD COASTAL", "SNOW COLD", "UNSEASONABLE COLD", "DAM BREAK", "LANDSLIDE", 
  "LANDSLUMP", "LANDSPOUT", "MUDSLIDE", "ROCK SLIDE", "FOG", "BLOWING DUST", "EXTENDED COLD", 
  "EXTREME COASTAL", "EXTREME COLD", "EXTREME COLD COASTAL", "EXTREME COASTAL", "RECORD COLD", 
  "SNOW BITTER COLD", "URBAN AND SMALL", "URBAN SMALL", "URBAN SMALL STREAM", "FREEZING DRIZZLE", 
  "FREEZING RAIN", "GLAZE", "GLAZE ICE", "ICE", "ICE AND SNOW", "ICE FLOES", "ICE JAM", "ICE ROADS", 
  "LIGHT FREEZING RAIN", "SNOW FREEZING RAIN", "UNSEASONABLE WARM", "EXCESSIVE WETNESS", "HEAVY PRECIPITATION", 
  "HEAVY SHOWER", "RAIN", "RAINSTORM", "RECORD RAINFALL", "COASTAL RAIN", "EXCESSIVE SNOW", "HEAVY LAKE SNOW", 
  "LIGHT SNOW", "LIGHT SNOWFALL", "RECORD SNOW", "SNOW", "SNOW ACCUMULATION", "SNOW AND ICE", "SNOW BLOWING SNOW", 
  "SNOW SQUALL", "ASTRONOMICAL HIGH TIDE", "HEAVY SURF", "HEAVY SWELLS", "HIGH SEAS", "HIGH SWELLS", "HIGH TIDES", 
  "HIGH WATER", "ROUGH SURF", "DOWNBURST", "DRY MICROBURST", "GRADIENT WIND", "GUSTY WIND", "MICROBURST", 
  "MICROBURST WIND", "WET MICROBURST", "SNOW ICE","LAKE EFFECT SNOW", "MARINE THUNDERSTORM", "APACHE COUNTY", 
  "MARINE ACCIDENT", "NON SEVERE WIND DAMAGE", "NON THUNDERSTORM", "WHIRLWIND", "WIND", "WIND AND WAVE", 
  "WIND DAMAGE", "GUSTNADO", "SEVERE THUNDERSTORM", "SEVERE TURBULENCE", "STORM FORCE WIND", "THUNDERSNOW", 
  "THUNDERSTORM", "WIND STORM", "BRUSH FIRE", "FOREST FIRE","GRASS FIRE", "WILD FOREST FIRE", "BLOWING SNOW", 
  "HEAVY MIX", "MIXED PRECIPITATION", "WINTER MIX", "MARINE MISHAP", "HIGH", "DRY MIRCOBURST WIND", "COLD WAVE", 
  "FOG AND COLD TEMPERATURES", "UNSEASONABLE WARM AND DRY", "HIGH WAVES", "LOW TEMPERATURE", "COLD WIND", 
  "COLD WEATHER", "RAPIDLY RISING WATER", "EXCESSIVE RAINFALL", "HEAVY SEAS", "TORRENTIAL RAINFALL", "MIXED PRECIP", 
  "FREEZING SPRAY", "COLD TEMPERATURE", "COLD AND SNOW", "BLACK ICE", "COASTALSTORM", "UNSEASONABLE RAIN", 
  "HYPERTHERMIA EXPOSURE", "ROUGH SEAS", "WARM WEATHER", "ROGUE WAVE", "FALLING SNOW ICE", "HAZARDOUS SURF", 
  "ICE ON ROAD", "DROWNING")

  stormeventtype <- c("ASTRONOMICAL LOW TIDE", "ASTRONOMICAL LOW TIDE", "WINTER WEATHER", "COASTAL FLOOD", 
  "COASTAL FLOOD", "COLD/WIND CHILL", "COLD/WIND CHILL", "COLD/WIND CHILL", "COLD/WIND CHILL", "COLD/WIND CHILL",
  "DEBRIS FLOW", "DEBRIS FLOW", "DEBRIS FLOW", "DEBRIS FLOW", "DEBRIS FLOW", "DEBRIS FLOW", "DENSE FOG", 
  "DUST DEVIL", "EXTREME COLD/WIND CHILL", "EXTREME COLD/WIND CHILL", "EXTREME COLD/WIND CHILL", 
  "EXTREME COLD/WIND CHILL", "EXTREME COLD/WIND CHILL", "EXTREME COLD/WIND CHILL", "EXTREME COLD/WIND CHILL", 
  "FLOOD", "FLOOD", "FLOOD", "FROST/FREEZE", "FROST/FREEZE", "FROST/FREEZE", "FROST/FREEZE", "FROST/FREEZE", 
  "FROST/FREEZE", "FROST/FREEZE", "FROST/FREEZE", "FROST/FREEZE", "FROST/FREEZE", "FROST/FREEZE", "HEAT", 
  "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN", "HEAVY RAIN", "HEAVY SNOW",
  "HEAVY SNOW", "HEAVY SNOW", "HEAVY SNOW", "HEAVY SNOW", "HEAVY SNOW", "HEAVY SNOW", "HEAVY SNOW", "HEAVY SNOW",
  "HEAVY SNOW", "HIGH SURF", "HIGH SURF", "HIGH SURF", "HIGH SURF", "HIGH SURF", "HIGH SURF", "HIGH SURF", 
  "HIGH SURF", "HIGH WIND", "HIGH WIND", "HIGH WIND", "HIGH WIND", "HIGH WIND", "HIGH WIND", "HIGH WIND", 
  "ICE STORM", "LAKE-EFFECT SNOW", "MARINE THUNDERSTORM WIND", "OTHER", "OTHER", "STRONG WIND", "STRONG WIND", 
  "STRONG WIND", "STRONG WIND", "STRONG WIND", "STRONG WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", 
  "THUNDERSTORM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", 
  "WILDFIRE", "WILDFIRE", "WILDFIRE", "WILDFIRE", "WINTER WEATHER", "WINTER WEATHER", "WINTER WEATHER", 
  "WINTER WEATHER", "OTHER", "OTHER", "HIGH WIND", "COLD/WIND CHILL", "COLD/WIND CHILL", "HEAT", "HIGH SURF", 
  "COLD/WIND CHILL", "COLD/WIND CHILL", "COLD/WIND CHILL", "FLOOD", "HEAVY RAIN", "HIGH SURF", "HEAVY RAIN", 
  "WINTER WEATHER", "FROST/FREEZE", "COLD/WIND CHILL", "COLD/WIND CHILL", "SLEET", "STORM SURGE/TIDE", 
  "HEAVY RAIN", "COLD/WIND CHILL", "HIGH SURF", "HEAT", "HIGH SURF", "ICE STORM", "HIGH SURF", "SLEET", "OTHER")

  map <- as.data.frame(cbind(rawevtype, stormeventtype))
  map$rawevtype <- as.character(map$rawevtype)
  map$stormeventtype <- as.character(map$stormeventtype)

Use the mapping to assign the standardized storm event type directly into the working data frame w.

  for (i in 1:nrow(map)) {
    matched <- ((map$rawevtype[i] == w$EVTYPE) & is.na(w$STANDARDIZED))
    w$EVTYPE[matched] <- map$stormeventtype[i]
    w$STANDARDIZED[matched] <- TRUE
  }
  x <- w
  w$EVTYPE[is.na(w$STANDARDIZED)] <- "OTHER"
  1. Create a new working dataframe called health to contain only the sum of fatalities and injuries by evtype. Remove all rows with 0 total count.
    health <- aggregate(w$FATALITIES+w$INJURIES, by=list(w$EVTYPE), sum, na.rm=TRUE)
    names(health) <- c("eventtype","totalinjurydeath")
    health <- health[health$totalinjurydeath > 0,]
    health <- health[order(health$totalinjurydeath),]
    health$eventtype <- factor(health$eventtype, levels=as.character(health$eventtype))
    health$scaledvalue <- log(health$totalinjurydeath)^2
  1. Prepare the data to create a new dataframe called damage to contain the sum of property and crop damage by evtype. Remove all rows with 0 total count.
      cropmultiplier <- rep(NA, nrow(w))
      cropmultiplier[which(w$CROPDMGEXP == "M")] <- 1000000
      cropmultiplier[which(w$CROPDMGEXP == "m")] <- 1000000
      cropmultiplier[which(w$CROPDMGEXP == "K")] <- 1000
      cropmultiplier[which(w$CROPDMGEXP == "B")] <- 1000000000
      cropmultiplier[which(w$CROPDMGEXP == "k")] <- 1000
      cropmultiplier[is.na(cropmultiplier)] <- 1

      propmultiplier <- rep(NA, nrow(w))  
      propmultiplier[which(w$PROPDMGEXP =="M")] <- 1000000
      propmultiplier[which(w$PROPDMGEXP =="m")] <- 1000000
      propmultiplier[which(w$PROPDMGEXP =="k")] <- 1000
      propmultiplier[which(w$PROPDMGEXP =="K")] <- 1000
      propmultiplier[which(w$PROPDMGEXP =="B")] <- 1000000000
      propmultiplier[which(w$PROPDMGEXP =="b")] <- 1000000000
      propmultiplier[which(w$PROPDMGEXP =="h")] <- 100
      propmultiplier[which(w$PROPDMGEXP =="H")] <- 100
      propmultiplier[is.na(propmultiplier)] <- 1

      CROPDMGVAL <- w$CROPDMG * cropmultiplier
      PROPDMGVAL <- w$PROPDMG * propmultiplier
      w <- cbind(w, CROPDMGVAL, PROPDMGVAL)

      damage <- aggregate(w$CROPDMGVAL+w$PROPDMGVAL, by=list(w$EVTYPE), sum, na.rm=TRUE)
      names(damage) <- c("eventtype","totaldamage")
      damage <- damage[damage$totaldamage > 0,]

      hrnum <- function(x) {
        if (x < 1000) {
          x
        } else if (x >= 1000 && x < 1000000) {
          paste(round(x/1000,2),"K")
        } else if (x >= 1000000 && x < 1000000000) {
          paste(round(x/1000000,2),"M")
        } else if (x >= 1000000000) {
          paste(round(x/1000000000, 2),"B")
        }
      }

      damage$scaledvalue <- (log(damage$totaldamage))^4
      damage$readablevalue <- NA
      for (i in 1:nrow(damage)) {
        damage$readablevalue[i] <- hrnum(damage$totaldamage[i])
      }

      damage <- damage[order(damage$totaldamage),]
      damage$eventtype <- factor(damage$eventtype, levels=as.character(damage$eventtype))

Results

Q1. “Across the United States, which types of events are most harmful with respect to population health?”

  library(ggplot2)
  g <- ggplot(health, aes(y=scaledvalue, x=eventtype)) 
  g <- g + geom_bar(stat="identity", aes(fill=eventtype))
  g <- g + ggtitle("Population Health Effect of Storm Event Types Across the United States\n(Total Fatalities and Injuries by Event Type)")
  g <- g + ylab("Total Fatalities and Injuries (x=log(total)^2)")
  g <- g + xlab("Standardized Event Types")
  g <- g + coord_flip()
  g <- g + geom_text(aes(label=paste0(format(totalinjurydeath, format="d", big.mark=",")," ")), size=3, hjust=1)
  g <- g + theme_bw()
  g <- g + theme(axis.text.y=element_text(size=8))
  g <- g + theme(axis.text.x=element_blank())
  g <- g + theme(axis.title=element_text(size=10))
  g <- g + theme(plot.title=element_text(size=12))
  g <- g + theme(legend.position="none")
  print(g)

Figure 1. The above figure shows the effect on population health for every type of event. Numbers are based on total fatalities and injuries. X-axis in graph was scaled using the formula log(total)^2 to allow the entire range of values to fit in the plot. Actual values are labelled directly on the bar graph of the corresponding event type.

Q2. Across the United States, which types of events have the greatest economic consequences?

  g <- ggplot(damage, aes(y=scaledvalue, x=eventtype)) 
  g <- g + geom_bar(stat="identity", aes(fill=eventtype))
  g <- g + ggtitle("Economic Effects of Storm Event Types Across The United States\n(Total Crop and Property Damage by Event Type)")
  g <- g + ylab("Total Property and Crop Damage Value (x = log(total value)^4)")
  g <- g + xlab("Standardized Event Types")
  g <- g + coord_flip()
  g <- g + geom_text(aes(label=paste("$",readablevalue," ")), size=3, hjust=1)
  g <- g + theme_bw()
  g <- g + theme(axis.text.y=element_text(size=8))
  g <- g + theme(axis.text.x=element_blank())
  g <- g + theme(axis.title=element_text(size=10))
  g <- g + theme(plot.title=element_text(size=12))
  g <- g + theme(legend.position="none")
  print(g)

Figure 2. The above figure shows the different storm event types and their economic impact in US Dollars. Dollar amounts are based on total property and crop damage. X-axis in graph was scaled using the formula log(total)^4 to allow the entire range of values to fit in the plot. Actual values are labelled directly on the bar graph of the corresponding event type.