This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which contains data on extreme climate events from 1950 through 2011, to address the following questions:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?
The analysis finds:
1. Across the United States, tornadoes are the most hazardous climate event for population health, with more than 5600 deaths and 91400 injuries.
2. Across the United States, floods have the greatest economic consequences, at more than 157 billion USD.
echo = TRUE
options(scipen = 1)
library(grid)
library(ggplot2)
library(plyr)
require(gridExtra)
## Loading required package: gridExtra
The data is from a comma-separated-value file compressed via the bzip2 algorithm to reduce its size, available here.
Some documentation of the database is also available. It describes how some of the variables are constructed and defined, here and here.
First, we read the downloaded csv file.
stormData <- read.csv("repdata_data_StormData.csv", sep = ",", header = TRUE, stringsAsFactors = FALSE)
dim(stormData)
## [1] 902297 37
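Because R connections can decompress bzip2 on the fly, the archive can usually be read without unpacking it first. The following is a minimal sketch of that idea, assuming a small temporary file in place of the real download; the toy columns and the `readBack` name are illustrative only.

```r
# Write a tiny bzip2-compressed CSV to stand in for the real archive
# (hypothetical toy data, not the NOAA file).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() accepts a connection; bzfile() decompresses while reading
readBack <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
dim(readBack)
```

The same call pattern should apply to the compressed storm data file, at the cost of a slower read.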
There are 48 event types, as listed in paragraphs 7.1 through 7.48 of the documentation.
evts <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill", "Flash Flood", "Flood", "Freezing", "Frost/Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
Because some events are combined events separated by a slash (e.g. ‘Hurricane/Typhoon’), regular expressions are used to match either the combined event (Hurricane/Typhoon) or any part of it (Hurricane or Typhoon).
evts_regex <- c("Astronomical Low Tide|Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme cold/Wind Chill|Extreme Cold|Wind Chill", "Flash Flood", "Flood", "Freezing", "Frost/Freeze|Frost|Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon|Hurricane|Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind|Marine tstm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind|tstm wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
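To see how these patterns behave, here is a small sketch on made-up EVTYPE strings (the `evtype` vector is an assumption for illustration, not taken from the data). Each pattern is matched case-insensitively anywhere in the string, so a short pattern such as "Flood" also matches longer names such as "FLASH FLOOD".

```r
# Toy EVTYPE values as they might appear in the raw data (illustrative only)
evtype <- c("TORNADO", "TSTM WIND", "HURRICANE OPAL", "FLASH FLOOD", "Typhoon")

# Alternation lets one pattern catch the combined name or either part
isHurricane <- grepl("Hurricane/Typhoon|Hurricane|Typhoon", evtype, ignore.case = TRUE)

# Abbreviations are handled by listing them as alternatives
isTstm <- grepl("Thunderstorm Wind|tstm wind", evtype, ignore.case = TRUE)

# A short pattern matches any event name that contains it,
# so one raw row can be picked up by more than one category
isFlood <- grepl("Flood", evtype, ignore.case = TRUE)

data.frame(evtype, isHurricane, isTstm, isFlood)
```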
Next, the rows whose EVTYPE matches one of the documented events are extracted. The relevant columns are:
* EVTYPE -> Type of event
* FATALITIES -> Number of fatalities
* INJURIES -> Number of injuries
* PROPDMG -> Amount of property damage, to be scaled by PROPDMGEXP
* PROPDMGEXP -> Order of magnitude for property damage (e.g. K for thousands)
* CROPDMG -> Amount of crop damage, to be scaled by CROPDMGEXP
* CROPDMGEXP -> Order of magnitude for crop damage (e.g. M for millions)
cleanedData <- data.frame(EVTYPE = character(0), FATALITIES = numeric(0), INJURIES = numeric(0), PROPDMG = numeric(0), PROPDMGEXP = character(0), CROPDMG = numeric(0), CROPDMGEXP = character(0))
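for (i in seq_along(evts)) {
  # Rows whose EVTYPE matches the i-th pattern (case-insensitive)
  rows <- stormData[grep(evts_regex[i], ignore.case = TRUE, stormData$EVTYPE), ]
  rows <- rows[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
  # Append the documented event name; this unnamed column is renamed to cleanedEvent later
  rows <- cbind(rows, c(rep(evts[i], nrow(rows))))
  cleanedData <- rbind(cleanedData, rows)
}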
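The extraction loop can also be written so that the label column gets its name when it is created. The sketch below is a variation on toy data, not the code used for the results; `raw`, `labels`, `patterns`, and `labelled` are hypothetical names.

```r
# Toy stand-in for stormData (illustrative values only)
raw <- data.frame(EVTYPE = c("TORNADO F1", "HAIL", "tornado"),
                  FATALITIES = c(2, 0, 1),
                  stringsAsFactors = FALSE)
labels   <- c("Tornado", "Hail")   # documented event names
patterns <- c("Tornado", "Hail")   # matching regular expressions

# For each pattern, keep the matching rows and tag them with the label
matched <- lapply(seq_along(labels), function(i) {
  hits <- raw[grepl(patterns[i], raw$EVTYPE, ignore.case = TRUE), , drop = FALSE]
  hits$cleanedEvent <- rep(labels[i], nrow(hits))
  hits
})

# Stack the per-event pieces into one data frame
labelled <- do.call(rbind, matched)
labelled
```

Naming the column directly avoids relying on the name that `cbind()` generates from the expression.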
Convert the letter exponents to integer powers of ten (K = thousands, M = millions, B = billions). Other codes, such as H for hundreds, are not converted by the code below.
cleanedData[(cleanedData$PROPDMGEXP == "K" | cleanedData$PROPDMGEXP == "k"), ]$PROPDMGEXP <- 3
cleanedData[(cleanedData$PROPDMGEXP == "M" | cleanedData$PROPDMGEXP == "m"), ]$PROPDMGEXP <- 6
cleanedData[(cleanedData$PROPDMGEXP == "B" | cleanedData$PROPDMGEXP == "b"), ]$PROPDMGEXP <- 9
cleanedData[(cleanedData$CROPDMGEXP == "K" | cleanedData$CROPDMGEXP == "k"), ]$CROPDMGEXP <- 3
cleanedData[(cleanedData$CROPDMGEXP == "M" | cleanedData$CROPDMGEXP == "m"), ]$CROPDMGEXP <- 6
cleanedData[(cleanedData$CROPDMGEXP == "B" | cleanedData$CROPDMGEXP == "b"), ]$CROPDMGEXP <- 9
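The same mapping can be expressed as a lookup table. This is a sketch of an alternative, assuming only the three letters handled above; `expToPower` is a hypothetical helper, and any other code (including a blank) comes back as NA.

```r
# Hypothetical helper: map exponent letters to powers of ten
expToPower <- function(code) {
  lookup <- c(K = 3, M = 6, B = 9)
  # toupper() makes the lookup case-insensitive; unknown codes yield NA
  unname(lookup[toupper(code)])
}

expToPower(c("K", "m", "B", ""))
```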
The property and crop damage amounts are multiplied by 10 raised to the power of the exponent, and the combined economic damage (property damage + crop damage) is computed. The ‘PROPDMGEXP’ and ‘CROPDMGEXP’ columns, which are no longer needed after conversion, are dropped.
suppressWarnings(cleanedData$PROPDMG <- cleanedData$PROPDMG * 10^as.numeric(cleanedData$PROPDMGEXP))
suppressWarnings(cleanedData$CROPDMG <- cleanedData$CROPDMG * 10^as.numeric(cleanedData$CROPDMGEXP))
suppressWarnings(TOTECODMG <- cleanedData$PROPDMG + cleanedData$CROPDMG)
cleanedData <- cbind(cleanedData, TOTECODMG)
cleanedData["cleanedEvent"]<-cleanedData["c(rep(evts[i], nrow(rows)))"]
cleanedData <- cleanedData[, c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG", "cleanedEvent", "TOTECODMG")]
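As a worked example of this arithmetic, the sketch below applies the same formula to three made-up rows (all values are assumptions for illustration). It also shows that a blank exponent converts to NA, which then carries through to the combined total for that row.

```r
# Toy damage figures; exponents are already converted to digit strings
propDmg <- c(25, 2.5, 10)
propExp <- c("3", "6", "")   # "" stands for a missing exponent
cropDmg <- c(0, 1, 5)
cropExp <- c("3", "9", "3")

# amount * 10^exponent gives the damage in USD
propUsd <- propDmg * 10^as.numeric(propExp)
cropUsd <- cropDmg * 10^as.numeric(cropExp)

# The sum is NA wherever either component is NA
totalUsd <- propUsd + cropUsd
data.frame(propUsd, cropUsd, totalUsd)
```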
The top 10 events by fatalities are:
totalFatalities <- aggregate(FATALITIES ~ cleanedEvent, data = cleanedData, FUN = sum)
totalFatalities <- totalFatalities[order(totalFatalities$FATALITIES, decreasing = TRUE), ]
print(totalFatalities[1:10, ])
##               cleanedEvent FATALITIES
## 38                 Tornado       5661
## 19                    Heat       3138
## 11          Excessive Heat       1922
## 14                   Flood       1525
## 13             Flash Flood       1035
## 28               Lightning        817
## 37       Thunderstorm Wind        753
## 33             Rip Current        577
## 12 Extreme cold/Wind Chill        382
## 23               High Wind        299
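The aggregate-then-sort pattern used for each ranking can be seen on a small example. This is a sketch on invented data (the `toy` and `totals` names are hypothetical); it sums per event, sorts in decreasing order, and takes the leading rows.

```r
# Toy records: two tornado rows, one heat row, one flood row
toy <- data.frame(cleanedEvent = c("Tornado", "Heat", "Tornado", "Flood"),
                  FATALITIES = c(5, 3, 2, 1))

# Sum fatalities within each event
totals <- aggregate(FATALITIES ~ cleanedEvent, data = toy, FUN = sum)

# Sort so the largest total comes first
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# head() returns at most n rows, so it is safe with fewer than n events
head(totals, 2)
```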
The top 10 events by injuries are:
totalInjuries <- aggregate(INJURIES ~ cleanedEvent, data = cleanedData, FUN = sum)
totalInjuries <- totalInjuries[order(totalInjuries$INJURIES, decreasing = TRUE), ]
print(totalInjuries[1:10, ])
##         cleanedEvent INJURIES
## 38           Tornado    91407
## 37 Thunderstorm Wind     9493
## 19              Heat     9224
## 14             Flood     8604
## 11    Excessive Heat     6525
## 28         Lightning     5232
## 25         Ice Storm     1992
## 13       Flash Flood     1802
## 23         High Wind     1523
## 18              Hail     1467
The following pair of graphs shows the total fatalities and total injuries caused by these severe weather events.
par(mfrow = c(1, 2), mar = c(15, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(totalFatalities[1:10, ]$FATALITIES, las = 3, names.arg = totalFatalities[1:10, ]$cleanedEvent, main = "Weather Events With\n The Top 10 Highest Fatalities", ylab = "Number of Fatalities", col = "grey")
barplot(totalInjuries[1:10, ]$INJURIES, las = 3, names.arg = totalInjuries[1:10, ]$cleanedEvent, main = "Weather Events With\n The Top 10 Highest Injuries", ylab = "Number of Injuries", col = "grey")
We can see that tornadoes caused both the most fatalities and the most injuries in the United States.
The top 10 events by property damage are:
totalPropDmg <- aggregate(PROPDMG ~ cleanedEvent, data = cleanedData, FUN = sum)
totalPropDmg <- totalPropDmg[order(totalPropDmg$PROPDMG, decreasing = TRUE), ]
print(totalPropDmg[1:10, ])
##         cleanedEvent      PROPDMG
## 14             Flood 168212215588
## 24 Hurricane/Typhoon  85356410010
## 38           Tornado  58603317864
## 18              Hail  17622990956
## 13       Flash Flood  17588791878
## 37 Thunderstorm Wind  11575228673
## 40    Tropical Storm   7714390550
## 45      Winter Storm   6749997251
## 23         High Wind   6166300000
## 44          Wildfire   4865614000
The top 10 events by crop damage are:
totalCropDmg <- aggregate(CROPDMG ~ cleanedEvent, data = cleanedData, FUN = sum)
totalCropDmg <- totalCropDmg[order(totalCropDmg$CROPDMG, decreasing = TRUE), ]
print(totalCropDmg[1:10, ])
##               cleanedEvent     CROPDMG
## 8                  Drought 13972621780
## 14                   Flood 12380109100
## 24       Hurricane/Typhoon  5516117800
## 25               Ice Storm  5022113500
## 18                    Hail  3114212870
## 16            Frost/Freeze  1997061000
## 13             Flash Flood  1532197150
## 12 Extreme cold/Wind Chill  1313623000
## 37       Thunderstorm Wind  1255947980
## 19                    Heat   904469280
The top 10 events by total economic damage are:
totalEcoDmg <- aggregate(TOTECODMG ~ cleanedEvent, data = cleanedData, FUN = sum)
totalEcoDmg <- totalEcoDmg[order(totalEcoDmg$TOTECODMG, decreasing = TRUE), ]
print(totalEcoDmg[1:10, ])
##         cleanedEvent    TOTECODMG
## 14             Flood 157764680787
## 24 Hurricane/Typhoon  44330000800
## 38           Tornado  18172843863
## 18              Hail  11681050140
## 13       Flash Flood   9224527227
## 37 Thunderstorm Wind   7098296330
## 25         Ice Storm   5925150850
## 44          Wildfire   3685468370
## 23         High Wind   3472442200
## 8            Drought   1886667000
The following graphs show the total property damage, total crop damage, and total economic damage caused by these severe weather events.
par(mfrow = c(1, 3), mar = c(15, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(totalPropDmg[1:10, ]$PROPDMG/(10^9), las = 3, names.arg = totalPropDmg[1:10, ]$cleanedEvent, main = "Top 10 Events with\n Greatest Property Damages", ylab = "Cost of damages ($ billions)", col = "grey")
barplot(totalCropDmg[1:10, ]$CROPDMG/(10^9), las = 3, names.arg = totalCropDmg[1:10, ]$cleanedEvent, main = "Top 10 Events With\n Greatest Crop Damages", ylab = "Cost of damages ($ billions)", col = "grey")
barplot(totalEcoDmg[1:10, ]$TOTECODMG/(10^9), las = 3, names.arg = totalEcoDmg[1:10, ]$cleanedEvent, main = "Top 10 Events With\n Greatest Economic Damages", ylab = "Cost of damages ($ billions)", col = "grey")
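The plots divide each total by 10^9 so that the axis reads in billions of dollars. The same conversion can be sketched as a small helper for quoting figures in text; `toBillions` is a hypothetical name, and the input is the flood total from the economic damage table.

```r
# Hypothetical helper: express a USD amount in billions, rounded for display
toBillions <- function(usd, digits = 1) {
  round(usd / 1e9, digits)
}

toBillions(157764680787)  # flood total from the economic damage table
```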
Floods therefore have the greatest economic consequences.
In summary, tornadoes cause the greatest harm to population health, and floods cause the greatest damage to the economy.