Natural Disasters and Their Impact on Public Health and the Economy of the US

Synopsis

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which contains data on severe weather events in the United States from 1950 through 2011, in order to address the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

The analysis below gives the following answers:
1. Across the United States, tornadoes are the most hazardous weather events, with more than 5,600 deaths and 91,400 injuries.
2. Across the United States, floods have the greatest economic consequences, with more than 157 billion USD in damage.

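knitr::opts_chunk$set(echo = TRUE)  # show the R code in the rendered report
options(scipen = 1)                 # penalize scientific notation when printing numbers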
library(grid)
library(ggplot2)
library(plyr)
require(gridExtra)
## Loading required package: gridExtra

Data Processing

The data come as a comma-separated-value file compressed with the bzip2 algorithm to reduce its size. Documentation for the database is also available; it describes how some of the variables are constructed and defined.

First, we read the downloaded CSV file.

stormData <- read.csv("repdata_data_StormData.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)

dim(stormData)
## [1] 902297     37
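Since read.csv() opens its file through R's file() connection, which recognizes bzip2 compression when reading, the compressed archive can also be read without unpacking it first. A minimal, self-contained sketch using a hypothetical temporary file with made-up rows:

```r
# Hypothetical demo: write a tiny bzip2-compressed CSV, then read it back
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() detects the bzip2 compression itself; no manual decompression step
demo <- read.csv(tmp, stringsAsFactors = FALSE)
print(demo)
```

With the real data, the same call would point at the downloaded .bz2 archive instead of the unpacked CSV.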

The documentation lists 48 event types (sections 7.1 through 7.48):

evts <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill", "Flash Flood", "Flood", "Freezing", "Frost/Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")  

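Because some event names combine two events separated by a slash (e.g. ‘Hurricane/Typhoon’), a regular expression is used for each event to match either the combined name (Hurricane/Typhoon) or any of its parts (Hurricane or Typhoon). The patterns also cover the abbreviation ‘tstm’ for thunderstorm, which appears in EVTYPE.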

evts_regex <- c("Astronomical Low Tide|Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill|Extreme Cold|Wind Chill", "Flash Flood", "Flood", "Freezing", "Frost/Freeze|Frost|Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon|Hurricane|Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind|Marine tstm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind|tstm wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")  
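To see how these patterns behave, here is a small sketch on made-up EVTYPE values. grepl() with ignore.case = TRUE matches a pattern anywhere in the string, so alternatives such as "Hurricane|Typhoon" pick up free-form names, but a short pattern like "Flood" also matches "FLASH FLOOD".

```r
# Made-up EVTYPE values for illustration
evtype <- c("HURRICANE OPAL", "TYPHOON", "TSTM WIND", "FLASH FLOOD", "TORNADO")

# Each pattern matches anywhere in the string, ignoring case
isHurricane <- grepl("Hurricane/Typhoon|Hurricane|Typhoon", evtype, ignore.case = TRUE)
isThunder   <- grepl("Thunderstorm Wind|tstm wind", evtype, ignore.case = TRUE)
isFlood     <- grepl("Flood", evtype, ignore.case = TRUE)

print(evtype[isHurricane])  # "HURRICANE OPAL" "TYPHOON"
print(evtype[isThunder])    # "TSTM WIND"
print(evtype[isFlood])      # "FLASH FLOOD"
```

The last match means a flash-flood row is counted under both Flash Flood and Flood when rows are extracted per event; similar overlaps exist for Heat/Excessive Heat and Hail/Marine Hail.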

Next, for each documented event, the matching rows are extracted and labelled with the event name. The relevant columns are:
* EVTYPE -> Type of event
* FATALITIES -> Number of fatalities
* INJURIES -> Number of injuries
* PROPDMG -> Amount of property damage, in the units given by PROPDMGEXP
* PROPDMGEXP -> Order of magnitude for property damage (e.g. K for thousands)
* CROPDMG -> Amount of crop damage, in the units given by CROPDMGEXP
* CROPDMGEXP -> Order of magnitude for crop damage (e.g. M for millions)

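# Empty data frame to collect the matching rows (rbind() drops zero-row arguments)
cleanedData <- data.frame(EVTYPE = character(0), FATALITIES = numeric(0), INJURIES = numeric(0), PROPDMG = numeric(0), PROPDMGEXP = character(0), CROPDMG = numeric(0), CROPDMGEXP = character(0))
for (i in seq_along(evts)) {
    # Rows whose EVTYPE matches the i-th event pattern, ignoring case; a row matching
    # several patterns (e.g. "FLASH FLOOD" also matches "Flood") is added once per event
    rows <- stormData[grep(evts_regex[i], stormData$EVTYPE, ignore.case = TRUE), ]
    rows <- rows[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
    # Label each row with the documented event name (renamed to cleanedEvent later)
    rows <- cbind(rows, c(rep(evts[i], nrow(rows))))
    cleanedData <- rbind(cleanedData, rows)
}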
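The extraction loop can also be written without growing a data frame inside the loop. The sketch below is an alternative formulation on made-up data (stormToy and evtsToy are hypothetical stand-ins for stormData and evts), using lapply() and a single do.call(rbind, ...). For brevity each event name doubles as its pattern; with the real data the pattern would come from evts_regex and the label from evts. It also makes the double counting visible.

```r
# Hypothetical toy data standing in for stormData / evts
stormToy <- data.frame(EVTYPE = c("FLASH FLOOD", "TORNADO", "FOG"),
                       FATALITIES = c(2, 5, 0), stringsAsFactors = FALSE)
evtsToy <- c("Flash Flood", "Flood", "Tornado")

# One data frame of matching rows per event, labelled with the event name
pieces <- lapply(evtsToy, function(evt) {
    rows <- stormToy[grepl(evt, stormToy$EVTYPE, ignore.case = TRUE), , drop = FALSE]
    if (nrow(rows) == 0) return(NULL)   # events with no match contribute nothing
    rows$cleanedEvent <- evt
    rows
})
labelled <- do.call(rbind, pieces)      # rbind() ignores any NULL entries

print(labelled)
# The raw total is 2 + 5 = 7, but the FLASH FLOOD row is counted twice here
sum(labelled$FATALITIES)                # 9
```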

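Convert the letter exponents to powers of ten (K = thousands, M = millions, B = billions), ignoring case. Other codes are left unchanged: numeric codes are used as exponents directly, while anything else (for example H for hundreds, an empty string, or a symbol such as ‘+’) becomes NA when converted to a number in the next step.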

# Replace the letter codes (either case) with the corresponding power of ten
for (col in c("PROPDMGEXP", "CROPDMGEXP")) {
    code <- toupper(cleanedData[[col]])
    cleanedData[[col]][code == "K"] <- 3
    cleanedData[[col]][code == "M"] <- 6
    cleanedData[[col]][code == "B"] <- 9
}
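The same conversion can be expressed as a lookup table. The sketch below uses a hypothetical helper, toExponent(); unlike the code above, it also maps H to 2 (hundreds) and returns NA directly for codes it does not recognize, so its results would not match the tables in this report exactly.

```r
# Hypothetical helper: map damage exponent codes to powers of ten
toExponent <- function(code) {
    codeMap <- c(H = 2, K = 3, M = 6, B = 9)   # letter codes, matched case-insensitively
    code <- toupper(trimws(code))
    out <- unname(codeMap[code])               # NA for codes not in codeMap (including "")
    isDigit <- grepl("^[0-9]$", code)
    out[isDigit] <- as.numeric(code[isDigit])  # a single digit is taken as the exponent itself
    out
}

exps <- toExponent(c("K", "m", "B", "h", "5", "", "+"))
print(exps)            # 3 6 9 2 5 NA NA
print(2.5 * 10^exps)   # e.g. an amount of 2.5 with code "K" becomes 2500
```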

Property and crop damage are then multiplied by 10 raised to the power of their exponents, and the combined economic damage (property damage + crop damage) is computed. The PROPDMGEXP and CROPDMGEXP columns, which are no longer needed after the conversion, are dropped.

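# Scale each amount by 10^exponent; non-numeric exponents become NA and the
# resulting coercion warnings are suppressed
suppressWarnings(cleanedData$PROPDMG <- cleanedData$PROPDMG * 10^as.numeric(cleanedData$PROPDMGEXP))
suppressWarnings(cleanedData$CROPDMG <- cleanedData$CROPDMG * 10^as.numeric(cleanedData$CROPDMGEXP))
# Combined economic damage; NA if either component is NA
TOTECODMG <- cleanedData$PROPDMG + cleanedData$CROPDMG
cleanedData <- cbind(cleanedData, TOTECODMG)
# Give the event-label column added in the loop a readable name
cleanedData["cleanedEvent"] <- cleanedData["c(rep(evts[i], nrow(rows)))"]
# Keep only the columns used in the analysis (this drops the exponent columns)
cleanedData <- cleanedData[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG", "cleanedEvent", "TOTECODMG")]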
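Because TOTECODMG is the plain sum of the two columns, a row with an NA property or crop amount has an NA total, and the formula interface of aggregate() drops rows with NA values by default (na.action = na.omit). A small sketch with made-up numbers shows the effect, together with one possible alternative that treats a missing component as zero via rowSums(na.rm = TRUE). That alternative is an assumption about how missing amounts should be handled, not the method used in this report, and its totals would differ from the tables below.

```r
# Made-up damage amounts (USD) for three Flood rows; NA marks an unconvertible exponent
toy <- data.frame(cleanedEvent = c("Flood", "Flood", "Flood"),
                  PROPDMG = c(5e6, 2e3, NA),
                  CROPDMG = c(NA, 1e3, 4e5),
                  stringsAsFactors = FALSE)

# As in this report: the sum is NA whenever either component is NA
toy$TOTECODMG <- toy$PROPDMG + toy$CROPDMG
naTotal <- aggregate(TOTECODMG ~ cleanedEvent, data = toy, FUN = sum)
print(naTotal)     # only the second row survives na.omit: 3000

# Alternative: missing components count as zero (a row with both missing sums to 0)
toy$TOTAL_ALT <- rowSums(toy[, c("PROPDMG", "CROPDMG")], na.rm = TRUE)
zeroTotal <- aggregate(TOTAL_ALT ~ cleanedEvent, data = toy, FUN = sum)
print(zeroTotal)   # 5e6 + 3000 + 4e5 = 5403000
```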

Results

1. Across the United States, which types of events are most harmful with respect to population health?

The top 10 events by total fatalities are:

totalFatalities <- aggregate(FATALITIES ~ cleanedEvent, data = cleanedData, FUN = sum)
totalFatalities <- totalFatalities[order(totalFatalities$FATALITIES, decreasing = TRUE), ]
print(totalFatalities[1:10, ])  
##               cleanedEvent FATALITIES
## 38                 Tornado       5661
## 19                    Heat       3138
## 11          Excessive Heat       1922
## 14                   Flood       1525
## 13             Flash Flood       1035
## 28               Lightning        817
## 37       Thunderstorm Wind        753
## 33             Rip Current        577
## 12 Extreme cold/Wind Chill        382
## 23               High Wind        299

The top 10 events by total injuries are:

totalInjuries <- aggregate(INJURIES ~ cleanedEvent, data = cleanedData, FUN = sum)
totalInjuries <- totalInjuries[order(totalInjuries$INJURIES, decreasing = TRUE), ]
print(totalInjuries[1:10, ])  
##         cleanedEvent INJURIES
## 38           Tornado    91407
## 37 Thunderstorm Wind     9493
## 19              Heat     9224
## 14             Flood     8604
## 11    Excessive Heat     6525
## 28         Lightning     5232
## 25         Ice Storm     1992
## 13       Flash Flood     1802
## 23         High Wind     1523
## 18              Hail     1467
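Both health totals can also be computed in a single aggregate() call by putting cbind() on the left-hand side of the formula. head() is a slightly safer way to take the top rows, since x[1:10, ] pads the result with NA rows when fewer than 10 events exist. A sketch on made-up data:

```r
# Made-up per-row health data, labelled as in cleanedData
health <- data.frame(cleanedEvent = c("Tornado", "Heat", "Tornado"),
                     FATALITIES = c(3, 1, 2),
                     INJURIES = c(10, 4, 0),
                     stringsAsFactors = FALSE)

# Sum both columns per event in one pass
both <- aggregate(cbind(FATALITIES, INJURIES) ~ cleanedEvent, data = health, FUN = sum)

# Rank by fatalities and keep at most the top 10 events
both <- both[order(both$FATALITIES, decreasing = TRUE), ]
print(head(both, 10))
```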

The following pair of bar charts shows the total fatalities and total injuries caused by these severe weather events.

par(mfrow = c(1, 2), mar = c(15, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(totalFatalities[1:10, ]$FATALITIES, las = 3, names.arg = totalFatalities[1:10, ]$cleanedEvent, main = "Weather Events With\n The Top 10 Highest Fatalities", ylab = "Number of Fatalities", col = "grey")
barplot(totalInjuries[1:10, ]$INJURIES, las = 3, names.arg = totalInjuries[1:10, ]$cleanedEvent, main = "Weather Events With\n The Top 10 Highest Injuries", ylab = "Number of Injuries", col = "grey")

Tornadoes caused both the most fatalities (5,661) and by far the most injuries (91,407) in the United States.

2. Across the United States, which types of events have the greatest economic consequences?

The top 10 events by total property damage (USD) are:

totalPropDmg <- aggregate(PROPDMG ~ cleanedEvent, data = cleanedData, FUN = sum)
totalPropDmg <- totalPropDmg[order(totalPropDmg$PROPDMG, decreasing = TRUE), ]
print(totalPropDmg[1:10, ])
##         cleanedEvent      PROPDMG
## 14             Flood 168212215588
## 24 Hurricane/Typhoon  85356410010
## 38           Tornado  58603317864
## 18              Hail  17622990956
## 13       Flash Flood  17588791878
## 37 Thunderstorm Wind  11575228673
## 40    Tropical Storm   7714390550
## 45      Winter Storm   6749997251
## 23         High Wind   6166300000
## 44          Wildfire   4865614000

The top 10 events by total crop damage (USD) are:

totalCropDmg <- aggregate(CROPDMG ~ cleanedEvent, data = cleanedData, FUN = sum)
totalCropDmg <- totalCropDmg[order(totalCropDmg$CROPDMG, decreasing = TRUE), ]
print(totalCropDmg[1:10, ])
##               cleanedEvent     CROPDMG
## 8                  Drought 13972621780
## 14                   Flood 12380109100
## 24       Hurricane/Typhoon  5516117800
## 25               Ice Storm  5022113500
## 18                    Hail  3114212870
## 16            Frost/Freeze  1997061000
## 13             Flash Flood  1532197150
## 12 Extreme cold/Wind Chill  1313623000
## 37       Thunderstorm Wind  1255947980
## 19                    Heat   904469280

The top 10 events by total economic damage (property + crop damage, USD) are:

totalEcoDmg <- aggregate(TOTECODMG ~ cleanedEvent, data = cleanedData,  FUN = sum)
totalEcoDmg <- totalEcoDmg[order(totalEcoDmg$TOTECODMG, decreasing = TRUE), ]
print(totalEcoDmg[1:10, ])
##         cleanedEvent    TOTECODMG
## 14             Flood 157764680787
## 24 Hurricane/Typhoon  44330000800
## 38           Tornado  18172843863
## 18              Hail  11681050140
## 13       Flash Flood   9224527227
## 37 Thunderstorm Wind   7098296330
## 25         Ice Storm   5925150850
## 44          Wildfire   3685468370
## 23         High Wind   3472442200
## 8            Drought   1886667000

The following bar charts show the total property damage, total crop damage and total economic damage (in billions of USD) caused by these severe weather events.

par(mfrow = c(1, 3), mar = c(15, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(totalPropDmg[1:10, ]$PROPDMG/(10^9), las = 3, names.arg = totalPropDmg[1:10, ]$cleanedEvent, main = "Top 10 Events with\n Greatest Property Damages", ylab = "Cost of damages ($ billions)", col = "grey")
barplot(totalCropDmg[1:10, ]$CROPDMG/(10^9), las = 3, names.arg = totalCropDmg[1:10, ]$cleanedEvent, main = "Top 10 Events With\n Greatest Crop Damages", ylab = "Cost of damages ($ billions)", col = "grey")
barplot(totalEcoDmg[1:10, ]$TOTECODMG/(10^9), las = 3, names.arg = totalEcoDmg[1:10, ]$cleanedEvent, main = "Top 10 Events With\n Greatest Economic Damages", ylab = "Cost of damages ($ billions)", col = "grey")
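The plotting calls above repeat the same arguments for each chart. One way to factor them out is a small helper; plotTop() below is a hypothetical name, and the sketch assumes a summary table already sorted in decreasing order with a cleanedEvent column, like totalPropDmg.

```r
# Hypothetical helper: bar chart of the first n rows of a sorted summary table
plotTop <- function(tbl, valueCol, title, ylab, n = 10, scale = 1) {
    top <- head(tbl, n)
    barplot(top[[valueCol]] / scale, names.arg = top$cleanedEvent, las = 3,
            main = title, ylab = ylab, col = "grey")   # returns the bar midpoints invisibly
}

# Made-up, already sorted damage table (USD)
dmg <- data.frame(cleanedEvent = c("Flood", "Tornado", "Hail"),
                  PROPDMG = c(3e9, 2e9, 1e9),
                  stringsAsFactors = FALSE)

pdf(NULL)   # draw to a null device so the sketch runs without a display
mids <- plotTop(dmg, "PROPDMG", "Top Events With\n Greatest Property Damages",
                "Cost of damages ($ billions)", scale = 10^9)
dev.off()
```

With the report's data, a call such as plotTop(totalPropDmg, "PROPDMG", ..., scale = 10^9) inside the same par(mfrow = c(1, 3), ...) layout would draw the property-damage panel.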

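Thus, floods cause the greatest economic consequences: they top both the property-damage and the total-damage rankings, while drought causes the most crop damage. Note that the total-damage figures are lower than the property-damage figures for several events (e.g. Flood and Hurricane/Typhoon). A row whose property or crop exponent could not be converted has an NA total, and aggregate() drops rows with NA values by default, so the total-damage ranking is based on fewer rows than the property-damage ranking.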

Conclusion

Across the United States, tornadoes cause the greatest harm to population health, and floods cause the greatest economic damage.