Abstract

The following analysis uses data on the aftermath of disasters collected by the NOAA. After binning the records into common event types, the total impact of each event type on human health (fatalities and injuries) and on the economy (property and crop damage) is calculated and ranked. Tornadoes are the most harmful disaster type to human health, and floods cause the most combined damage to property and crops.

Motivation

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Acquisition

Data was acquired from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2, and is described by the NOAA at the following locations:


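The code below reuses a local copy of the NOAA storm data if one exists (either the extracted .csv or the compressed .csv.bz2) and otherwise downloads the compressed file from the URL above. In every case the file is read with read.csv, which can read the .bz2 file directly.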

if(!file.exists("./data")){dir.create("./data")}

if(file.exists("./data/repdata-data-StormData.csv")) {
      print("Data file is present")
      dataRaw <- read.csv("./data/repdata-data-StormData.csv", na.strings=c(""," ","NA"))
} else if (file.exists("./data/repdata-data-StormData.csv.bz2")) {
      print("Data file is present")
      dataRaw <- read.csv("./data/repdata-data-StormData.csv.bz2", na.strings=c(""," ","NA"))   
} else {
      print("Data file is downloading from internet")
      download.file(url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="./data/repdata-data-StormData.csv.bz2", method="curl")
      dataRaw <- read.csv("./data/repdata-data-StormData.csv.bz2", na.strings=c(""," ","NA"))
}
## [1] "Data file is present"
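
As a small illustration of why the compressed file can be passed straight to read.csv, the sketch below writes a tiny made-up .bz2 file to a temporary location and reads it back with the same na.strings setting. The file contents and the names tmpFile and toyRaw are invented for the example and are not part of the NOAA data.

```r
# Write a tiny made-up CSV through a bzip2 connection (illustrative data only)
tmpFile <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmpFile, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,0", ",1"), con)
close(con)

# read.csv opens the path with file(), which detects bzip2 compression;
# the empty EVTYPE field becomes NA because "" is listed in na.strings
toyRaw <- read.csv(tmpFile, na.strings = c("", " ", "NA"))
print(toyRaw)

# Remove the temporary file again
unlink(tmpFile)
```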


A quick look at the raw data shows the layout of the columns of interest, and that EVTYPE is a factor with 985 levels:

head(dataRaw[,c(1:8, 23:28)])
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1          0       15    25.0          K       0       <NA>
## 2          0        0     2.5          K       0       <NA>
## 3          0        2    25.0          K       0       <NA>
## 4          0        2     2.5          K       0       <NA>
## 5          0        2     2.5          K       0       <NA>
## 6          0        6     2.5          K       0       <NA>
str(dataRaw$EVTYPE)
##  Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...

Data Processing

In order to accurately count the impact of each type of event, two things need to happen. The entries in EVTYPE need to be converted to uniform names corresponding to the event types provided in the NOAA documentation, and each damage amount needs to be computed by combining its value (PROPDMG, CROPDMG) with its exponent (PROPDMGEXP, CROPDMGEXP).

Subsetting

Since this study focuses on health and economic impact, only the columns relevant to those questions are kept.

impact <- dataRaw[, c("EVTYPE","FATALITIES","INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]


Damage Computation

The following conversion puts all of the DMGEXP values in uppercase, replaces each non-numeric code with the corresponding power of ten (for example, B for billion becomes 9), converts the result to numbers, and uses it to compute the full damage value for property and crops. Missing values and the symbols "-", "?", and "+" are treated as an exponent of 0.

impact$PROPDMGEXP <- toupper(as.character(impact$PROPDMGEXP))
impact$PROPDMGEXP[is.na(impact$PROPDMGEXP)] <- "0"
impact$PROPDMGEXP[impact$PROPDMGEXP=="-"] <- "0"
impact$PROPDMGEXP[impact$PROPDMGEXP=="?"] <- "0"
impact$PROPDMGEXP[impact$PROPDMGEXP=="+"] <- "0"
impact$PROPDMGEXP[impact$PROPDMGEXP=="B"] <- "9"
impact$PROPDMGEXP[impact$PROPDMGEXP=="M"] <- "6"
impact$PROPDMGEXP[impact$PROPDMGEXP=="K"] <- "3"
impact$PROPDMGEXP[impact$PROPDMGEXP=="H"] <- "2"
impact$PROPDMGEXP <- as.numeric(impact$PROPDMGEXP)
impact$PROPDMGFULL <- impact$PROPDMG*(10^impact$PROPDMGEXP)

impact$CROPDMGEXP <- toupper(as.character(impact$CROPDMGEXP))
impact$CROPDMGEXP[is.na(impact$CROPDMGEXP)] <- "0"
impact$CROPDMGEXP[impact$CROPDMGEXP=="?"] <- "0"
impact$CROPDMGEXP[impact$CROPDMGEXP=="B"] <- "9"
impact$CROPDMGEXP[impact$CROPDMGEXP=="M"] <- "6"
impact$CROPDMGEXP[impact$CROPDMGEXP=="K"] <- "3"
impact$CROPDMGEXP <- as.numeric(impact$CROPDMGEXP)
impact$CROPDMGFULL <- impact$CROPDMG*(10^impact$CROPDMGEXP)
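
The same mapping can be expressed as a lookup table, which makes the code-to-exponent rule easier to check on a few values. This is a minimal sketch, not the code used for the results in this report; the helper name expToPower and the sample inputs are hypothetical.

```r
# Hypothetical helper: convert exponent codes to powers of ten,
# following the same rules as the step-by-step replacements above
expToPower <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[code])            # NA for anything not in the table
  isDigit <- grepl("^[0-9]$", code)        # single digits are used as-is
  power[isDigit] <- as.numeric(code[isDigit])
  power[is.na(power)] <- 0                 # NA, "-", "?", "+" count as 10^0
  power
}

# Made-up sample values: 25 with code K is 25 * 10^3
sampleDmg <- c(25, 2.5, 1, 3)
sampleExp <- c("K", "m", NA, "?")
sampleFull <- sampleDmg * 10^expToPower(sampleExp)
print(sampleFull)
```

With these inputs the four values come out as 25000, 2500000, 1, and 3.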

Event Type Filtering

The event types are mainly entered in a uniform manner, but there are many that contain variations such as:

  • Misspellings
  • Multiple Descriptors
  • Changes in whitespace and separators
  • Abbreviations

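In order to describe the data more accurately, the EVTYPE values were grouped into uniform categories based on common character strings. The replacements are order-dependent: more specific patterns (such as MARINETHUN) are applied before more general ones (such as THUN), and because the replacement labels are written in mixed case, an entry that has already been relabeled is not matched again by a later uppercase pattern.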

#Convert EVTYPEs to uniform codes
impact$EVTYPE <- toupper(impact$EVTYPE)
# Remove all spaces so that whitespace variations collapse to one form
impact$EVTYPE <- gsub(" ", "", impact$EVTYPE, fixed = TRUE)
impact$EVTYPE <- gsub("TSTM", "THUNDERSTORM", impact$EVTYPE)
impact$EVTYPE[grepl("LIGHTNING", x=impact$EVTYPE)] <- "Lightning"
impact$EVTYPE[grepl("LIGHTING", x=impact$EVTYPE)] <- "Lightning"
impact$EVTYPE[grepl("MARINETHUN", x=impact$EVTYPE)] <- "Marine Thunderstorm Winds"
impact$EVTYPE[grepl("THUN", x=impact$EVTYPE)] <- "Thunderstorm Winds"
impact$EVTYPE[grepl("TUNDER", x=impact$EVTYPE)] <- "Thunderstorm Winds"
impact$EVTYPE[grepl("THUD", x=impact$EVTYPE)] <- "Thunderstorm Winds"
impact$EVTYPE[grepl("FLASH", x=impact$EVTYPE)] <- "Flash Flood"
impact$EVTYPE[grepl("FIRE", x=impact$EVTYPE)] <- "Wildfire"
impact$EVTYPE[grepl("VOLC", x=impact$EVTYPE)] <- "Volcanic Ash"
impact$EVTYPE[grepl("MARINEHAIL", x=impact$EVTYPE)] <- "Marine Hail"
impact$EVTYPE[grepl("MARINESTRONG", x=impact$EVTYPE)] <- "Marine Strong Wind"
impact$EVTYPE[grepl("MARINEHIGH", x=impact$EVTYPE)] <- "Marine High Wind"
impact$EVTYPE[grepl("HAIL", x=impact$EVTYPE)] <- "Hail"
impact$EVTYPE[grepl("WATERSP", x=impact$EVTYPE)] <- "Waterspout"
impact$EVTYPE[grepl("SPOUT", x=impact$EVTYPE)] <- "Waterspout"
impact$EVTYPE[grepl("SLIDE", x=impact$EVTYPE)] <- "Debris Flow"
impact$EVTYPE[grepl("STREAM", x=impact$EVTYPE)] <- "Flood"
impact$EVTYPE[grepl("URBAN", x=impact$EVTYPE)] <- "Flood"
impact$EVTYPE[grepl("DRY", x=impact$EVTYPE)] <- "Drought"
impact$EVTYPE[grepl("DROUGHT", x=impact$EVTYPE)] <- "Drought"
impact$EVTYPE[grepl("DUST", x=impact$EVTYPE)] <- "Dust Storm"
impact$EVTYPE[grepl("RIP", x=impact$EVTYPE)] <- "Rip Current"
impact$EVTYPE[grepl("AVA", x=impact$EVTYPE)] <- "Avalanche"
impact$EVTYPE[grepl("EXCESSIVEHEAT", x=impact$EVTYPE)] <- "Excessive Heat"
impact$EVTYPE[grepl("HEAT", x=impact$EVTYPE)] <- "Heat"
impact$EVTYPE[grepl("LOWTI", x=impact$EVTYPE)] <- "Astronomical Low Tide"
impact$EVTYPE[grepl("EXT", x=impact$EVTYPE)] <- "Extreme Cold/Wind Chill"
impact$EVTYPE[grepl("EXCESSIVECOLD", x=impact$EVTYPE)] <- "Extreme Cold/Wind Chill"
impact$EVTYPE[grepl("COLD", x=impact$EVTYPE)] <- "Cold/Wind Chill"
impact$EVTYPE[grepl("CHILL", x=impact$EVTYPE)] <- "Cold/Wind Chill"
impact$EVTYPE[grepl("TORN", x=impact$EVTYPE)] <- "Tornado"
impact$EVTYPE[grepl("TROPICALSTORM", x=impact$EVTYPE)] <- "Tropical Storm"
impact$EVTYPE[grepl("TROPICALDEP", x=impact$EVTYPE)] <- "Tropical Depression"
impact$EVTYPE[grepl("SLEET", x=impact$EVTYPE)] <- "Sleet"
impact$EVTYPE[grepl("HURRICANE", x=impact$EVTYPE)] <- "Hurricane (Typhoon)"
impact$EVTYPE[grepl("TYPH", x=impact$EVTYPE)] <- "Hurricane (Typhoon)"
impact$EVTYPE[grepl("BLIZ", x=impact$EVTYPE)] <- "Blizzard"
impact$EVTYPE[grepl("COASTALFL", x=impact$EVTYPE)] <- "Coastal Flood"
impact$EVTYPE[grepl("CSTL", x=impact$EVTYPE)] <- "Coastal Flood"
impact$EVTYPE[grepl("SURGE", x=impact$EVTYPE)] <- "Storm Surge/Tide"
impact$EVTYPE[grepl("TIDAL", x=impact$EVTYPE)] <- "Storm Surge/Tide"
impact$EVTYPE[grepl("HIGHTIDE", x=impact$EVTYPE)] <- "Storm Surge/Tide"
impact$EVTYPE[grepl("LAKESHOREFLOOD", x=impact$EVTYPE)] <- "Lakeshore Flood"
impact$EVTYPE[grepl("LAKEFLOOD", x=impact$EVTYPE)] <- "Lakeshore Flood"
impact$EVTYPE[grepl("HEAVYRAIN", x=impact$EVTYPE)] <- "Heavy Rain"
impact$EVTYPE[grepl("ICESTO", x=impact$EVTYPE)] <- "Ice Storm"
impact$EVTYPE[grepl("FLOOD", x=impact$EVTYPE)] <- "Flood"
impact$EVTYPE[grepl("ICE", x=impact$EVTYPE)] <- "Ice Storm"
impact$EVTYPE[grepl("WINTERSTO", x=impact$EVTYPE)] <- "Winter Storm"
impact$EVTYPE[grepl("WINTERWEAT", x=impact$EVTYPE)] <- "Winter Weather"
impact$EVTYPE[grepl("TSUNA", x=impact$EVTYPE)] <- "Tsunami"
impact$EVTYPE[grepl("LAKE", x=impact$EVTYPE)] <- "Lake-Effect Snow"
impact$EVTYPE[grepl("WIND", x=impact$EVTYPE)] <- "Wind"
impact$EVTYPE[grepl("HEAVYSNOW", x=impact$EVTYPE)] <- "Heavy Snow"
impact$EVTYPE[grepl("FREEZING", x=impact$EVTYPE)] <- "Sleet"
impact$EVTYPE[grepl("FREEZ", x=impact$EVTYPE)] <- "Frost/Freeze"
impact$EVTYPE[grepl("FROST", x=impact$EVTYPE)] <- "Frost/Freeze"
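
The order-dependence of these replacements is easiest to see on a handful of values. The sketch below applies a few of the same rules to a short made-up vector (the vector name evtype and its entries are illustrative only) and shows that a mixed-case label assigned by an earlier rule is left alone by later uppercase patterns.

```r
# Made-up sample of raw event type strings
evtype <- c("TSTM WIND", "Marine Tstm Wind", "EXCESSIVE HEAT",
            "HEAT WAVE", "HIGH WIND")

# Normalise case, drop spaces, expand the TSTM abbreviation
evtype <- toupper(evtype)
evtype <- gsub(" ", "", evtype, fixed = TRUE)
evtype <- gsub("TSTM", "THUNDERSTORM", evtype, fixed = TRUE)

# Specific patterns first, general patterns afterwards
evtype[grepl("MARINETHUN", evtype)] <- "Marine Thunderstorm Winds"
evtype[grepl("THUN", evtype)] <- "Thunderstorm Winds"
evtype[grepl("EXCESSIVEHEAT", evtype)] <- "Excessive Heat"
evtype[grepl("HEAT", evtype)] <- "Heat"
evtype[grepl("WIND", evtype)] <- "Wind"

print(evtype)
```

Here "Marine Thunderstorm Winds" is not overwritten by the THUN or WIND rules, and "Excessive Heat" is not overwritten by the HEAT rule, because grepl is case-sensitive by default. Swapping the order of a specific and a general rule would change the result.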

Analysis

The first analysis looks at the harm inflicted on people as injuries or fatalities. Injuries and fatalities are summed for each event type and stored in a new data frame, and the 10 event types with the most fatalities are kept. The three columns for human impact are then melted into a single value column, with the kind of harm (fatality, injury, total) recorded as a factor.

eventHarm <- aggregate(FATALITIES~EVTYPE,impact, FUN=sum)
eventHarm <- merge(eventHarm, aggregate(INJURIES~EVTYPE,impact, FUN=sum))
eventHarm$TotalHarmed <- eventHarm$FATALITIES + eventHarm$INJURIES
eventHarm <- eventHarm[order(-eventHarm$FATALITIES)[1:10],]
library(reshape2)
Harm <- melt(eventHarm, id.vars="EVTYPE", variable.name = "harmType")

eventHarm
##                      EVTYPE FATALITIES INJURIES TotalHarmed
## 249                 Tornado       5633    91364       96997
## 36           Excessive Heat       1920     6525        8445
## 58                     Heat       1212     2684        3896
## 44              Flash Flood       1035     1802        2837
## 99                Lightning        817     5232        6049
## 248      Thunderstorm Winds        737     9510       10247
## 159             Rip Current        577      529        1106
## 45                    Flood        511     6873        7384
## 283                    Wind        441     1910        2351
## 42  Extreme Cold/Wind Chill        305      260         565
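
The sum-by-group and ranking steps can be tried on a toy data frame. The sketch below is an alternative formulation, not the code that produced the table above: it sums both columns in one aggregate call using cbind, and it uses head instead of indexing with [1:10]. The data frame toy and its values are made up.

```r
# Made-up records: two tornado entries, one flood, one heat event
toy <- data.frame(EVTYPE     = c("Tornado", "Flood", "Tornado", "Heat"),
                  FATALITIES = c(2, 1, 3, 4),
                  INJURIES   = c(10, 0, 5, 1))

# Sum both measures per event type in a single call
toyHarm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toyHarm$TotalHarmed <- toyHarm$FATALITIES + toyHarm$INJURIES

# Rank by fatalities, largest first, and keep at most 10 rows
toyHarm <- head(toyHarm[order(-toyHarm$FATALITIES), ], 10)
print(toyHarm)
```

Using head avoids the rows of NA that order(...)[1:10] would produce if fewer than 10 event types were present, which does not matter for the full data set but does for small samples like this one.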


The second analysis looks at the damage inflicted on property and crops in monetary terms. Property and crop damages are summed for each event type and stored in a new data frame, and the 10 event types with the largest property damage are kept. The three damage columns are then melted into a single value column, with the kind of damage (property, crop, total) recorded as a factor.

eventDamage <- aggregate(PROPDMGFULL~EVTYPE,impact, FUN=sum)
eventDamage <- merge(eventDamage, aggregate(CROPDMGFULL~EVTYPE,impact, FUN=sum))
eventDamage$TotalDamage <- eventDamage$PROPDMGFULL + eventDamage$CROPDMGFULL
eventDamage <- eventDamage[order(-eventDamage$PROPDMGFULL)[1:10],]
Damage <- melt(eventDamage, id.vars="EVTYPE", variable.name = "damageType")

eventDamage
##                  EVTYPE  PROPDMGFULL CROPDMGFULL  TotalDamage
## 45                Flood 150208839377 10855961050 161064800427
## 83  Hurricane (Typhoon)  85356410010  5516117800  90872527810
## 249             Tornado  56952152376   414961470  57367113846
## 180    Storm Surge/Tide  47974662150      855000  47975517150
## 44          Flash Flood  17589312096  1532197150  19121509246
## 56                 Hail  15977560513  3046887623  19024448136
## 248  Thunderstorm Winds  12779403800  1274158988  14053562788
## 282            Wildfire   8496628500   403281630   8899910130
## 253      Tropical Storm   7714390550   694896000   8409286550
## 284        Winter Storm   6749497251    32444000   6781941251
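
Totals of this size are easier to compare when expressed in billions of dollars. A minimal sketch, using the three largest property damage totals copied from the table above (the variable names are illustrative):

```r
# Property damage totals copied from the table (top three event types)
propDamage <- c(Flood = 150208839377,
                "Hurricane (Typhoon)" = 85356410010,
                Tornado = 56952152376)

# Convert dollars to billions of dollars, rounded to one decimal place
propBillions <- round(propDamage / 1e9, 1)
print(propBillions)
```

This gives roughly 150.2, 85.4, and 57.0 billion dollars for floods, hurricanes, and tornadoes respectively.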

The column names stored in harmType and damageType are replaced with readable labels so that the plot legends look better.

Harm$harmType <- gsub("FATALITIES", "Fatalities", Harm$harmType)
Harm$harmType <- gsub("INJURIES", "Injuries", Harm$harmType)
Harm$harmType <- gsub("TotalHarm", "Total Harm", Harm$harmType)


Damage$damageType <- gsub("PROPDMGFULL", "Property Damage", Damage$damageType)
Damage$damageType <- gsub("CROPDMGFULL", "Crop Damage", Damage$damageType)
Damage$damageType <- gsub("TotalDamage", "Total Damage", Damage$damageType)
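
An alternative to repeated gsub calls is to relabel through factor, which also fixes the order in which the categories appear in a legend. This is a sketch under the assumption that the column holds exactly the three names shown; the vector harmType below is a made-up stand-in for the real column.

```r
# Stand-in for the melted harmType column
harmType <- c("FATALITIES", "INJURIES", "TotalHarmed", "FATALITIES")

# Map each raw name to a display label and fix the level order
harmLabels <- factor(harmType,
                     levels = c("FATALITIES", "INJURIES", "TotalHarmed"),
                     labels = c("Fatalities", "Injuries", "Total Harmed"))
print(levels(harmLabels))
print(as.character(harmLabels))
```

Any value not listed in levels would become NA, so this approach only suits columns whose set of values is known in advance.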

Results

Health Impact

The plot below shows the impact on human health based on the number of fatalities, injuries, and the sum of both. Tornadoes cause by far the most harm to people, with 5,633 fatalities and 91,364 injuries.

library(ggplot2)
ggplot(Harm, aes(x=reorder(EVTYPE, -value), y=value)) + 
   geom_bar(stat="identity", aes(fill=harmType), position="dodge") +
   xlab("Disaster Type") +
   ylab("Number of People") +
   ggtitle("Top 10 Disaster Types with the Most Fatalities") + 
   theme(axis.text.x = element_text(angle=45, hjust=1))
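
One detail of the plotting code is worth spelling out. Each event type has three rows in the melted data (fatalities, injuries, total), and reorder summarises a group with the mean by default, so reorder(EVTYPE, -value) orders the bars by the mean of those three values rather than by fatalities alone. The sketch below shows the effect on a made-up two-group example using only base R; the data frame toy is illustrative.

```r
# Made-up data: group A has the largest single value, group B the larger mean
toy <- data.frame(EVTYPE = rep(c("A", "B"), each = 2),
                  value  = c(1, 100, 60, 70))

# Default summary (mean of -value): B comes first because its mean is larger
byMean <- levels(reorder(factor(toy$EVTYPE), -toy$value))

# Using min of -value orders by the largest value instead: A comes first
byMax <- levels(reorder(factor(toy$EVTYPE), -toy$value, FUN = min))

print(byMean)
print(byMax)
```

Passing a different FUN to reorder is one way to control which summary drives the bar order.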

Financial Impact

The plot below shows the economic impact based on the cost of damages to property, crops, and the sum of both. Floods cause the most property damage (about $150 billion), followed by hurricanes (about $85 billion).

ggplot(Damage, aes(x=reorder(EVTYPE, -value), y=value)) + 
   geom_bar(stat="identity", aes(fill=damageType), position="dodge") +
   xlab("Disaster Type") +
   ylab("Damage Caused ($)") +
   ggtitle("Top 10 Disaster Types with the Most Property Damage") + 
   theme(axis.text.x = element_text(angle=45, hjust=1))