Synopsis

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database was analysed for the years 1996 to 2010 to determine which events were most harmful to human health and which had the greatest economic consequences. There was a large degree of overlap in the events causing injuries and fatalities, with excessive heat causing the most fatalities and tornadoes the most injuries. The leading causes of property damage and crop damage differed. Hurricanes and storm surges caused the most property damage, with the storms of 2005 accounting for much of it, whereas drought and hurricanes caused the most crop damage.

Data Processing

The dplyr package was used to analyse the data, tidyr to reformat it, lubridate for dates, xtable for tables and ggplot2 for figures.

# Load the packages quietly
suppressMessages({
    library(dplyr)
    library(tidyr)
    library(ggplot2)
    library(lubridate)
})
suppressWarnings(library(xtable))

Data processing steps

  1. If the data file does not exist in the project directory it is downloaded.
  2. The data is then read and stored in a data frame.
  3. A Date column is added by converting BGN_DATE.
  4. A Year column is added by extracting the year from Date.
  5. The subset of 15 years from 1996 to 2010 inclusive is taken. This period was chosen because 2010 is the last complete year and records for earlier years are less complete.
  6. The EVTYPE “TSTM WIND”, used up to 2006, was changed to “THUNDERSTORM WIND”, which is used from 2006 onwards. Likewise, “HURRICANE” was changed to “HURRICANE/TYPHOON”.
  7. An incorrect PROPDMGEXP (REFNUM 605943) was corrected: the description of the event confirms that the damage was in millions (M), not billions (B).
  8. The PROPDMG and PROPDMGEXP (exponent) columns were combined to give a PropertyDamage column.
  9. The CROPDMG and CROPDMGEXP (exponent) columns were combined to give a CropDamage column.
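# Download the compressed CSV once; mode = "wb" keeps the binary file intact on Windows
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest <- "StormData.csv.bz2"
if (!file.exists(dest)) {
    download.file(url = url, destfile = dest, mode = "wb")
}
# read.csv decompresses the .bz2 file transparently
stormdata <- read.csv(dest, header = TRUE, stringsAsFactors = FALSE)
# Parse the begin date (month/day/year h:m:s) and extract the year
stormdata$Date <- mdy_hms(stormdata$BGN_DATE)
stormdata$Year <- year(stormdata$Date)
# Keep the 15 complete years 1996-2010
stormdata <- subset(stormdata, Year >= 1996 & Year <= 2010)
# Harmonise event type names
stormdata$EVTYPE[stormdata$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
stormdata$EVTYPE[stormdata$EVTYPE == "HURRICANE"] <- "HURRICANE/TYPHOON"
# Correct one mis-coded exponent (millions, not billions)
stormdata$PROPDMGEXP[stormdata$REFNUM == 605943] <- "M"
# Multipliers for the exponent codes; an empty code means units of $1.
# "" never matches a name when subsetting, so it is mapped to " " first.
EXP <- c(" " = 1, "K" = 1e3, "M" = 1e6, "B" = 1e9)
stormdata$PROPDMGEXP[stormdata$PROPDMGEXP == ""] <- " "
stormdata$PropertyDamage <- unname(EXP[stormdata$PROPDMGEXP]) * stormdata$PROPDMG
stormdata$CROPDMGEXP[stormdata$CROPDMGEXP == ""] <- " "
stormdata$CropDamage <- unname(EXP[stormdata$CROPDMGEXP]) * stormdata$CROPDMG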
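A minimal base R sketch of the exponent conversion in steps 8 and 9, wrapped in a hypothetical `to_dollars()` helper. It uses the same multipliers as `EXP`. The `toupper()` call is an extra safeguard, not part of the processing code, in case lower-case codes such as "k" occur.

```r
# Hypothetical helper: convert value/exponent-code pairs to dollars
to_dollars <- function(value, exponent,
                       multipliers = c(" " = 1, "K" = 1e3, "M" = 1e6, "B" = 1e9)) {
  exponent <- toupper(exponent)           # tolerate lower-case codes (assumption)
  exponent[exponent == ""] <- " "         # "" never matches a name, so use " "
  mult <- unname(multipliers[exponent])   # codes not in the table give NA
  value * mult
}

to_dollars(c(25, 2.5, 1.5, 10), c("K", "M", "", "B"))
```

Codes outside the table, such as "?", come back as NA, so a later `sum()` over the column would also be NA unless such codes are handled first.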

Results

Which events are most harmful to population health?

The top 5 events causing fatalities and the top 5 events causing injuries were calculated over the entire 15-year period, 1996-2010.

# Total fatalities per event type, largest first
topfatal <- stormdata %>%
    group_by(EVTYPE) %>%
    summarise(Fatalities = sum(FATALITIES)) %>%
    arrange(desc(Fatalities))

topfatalEVTYPE <- topfatal$EVTYPE[1:5]

# Total injuries per event type, largest first
topinjury <- stormdata %>%
    group_by(EVTYPE) %>%
    summarise(Injuries = sum(INJURIES)) %>%
    arrange(desc(Injuries))

topinjuryEVTYPE <- topinjury$EVTYPE[1:5]

# Events appearing in either top-5 list
topEVTYPE <- unique(c(topfatalEVTYPE, topinjuryEVTYPE))

# Yearly fatalities and injuries for those events
harmpopwide <- stormdata %>%
    filter(EVTYPE %in% topEVTYPE) %>%
    group_by(EVTYPE, Year) %>%
    summarise(Fatalities = sum(FATALITIES), Injuries = sum(INJURIES))

Top Five Events causing fatalities

print(xtable(topfatal[1:5,], digits = 0), type = "html")
  EVTYPE            Fatalities
1 EXCESSIVE HEAT          1761
2 TORNADO                  924
3 FLASH FLOOD              819
4 LIGHTNING                625
5 FLOOD                    356

Top Five Events causing injuries

print(xtable(topinjury[1:5,], digits = 0), type = "html")
  EVTYPE              Injuries
1 TORNADO                14504
2 FLOOD                   6748
3 EXCESSIVE HEAT          6253
4 THUNDERSTORM WIND       4656
5 LIGHTNING               3947

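As might be expected, there is a large degree of overlap between the events causing fatalities and those causing injuries: four of the five events (EXCESSIVE HEAT, TORNADO, FLOOD and LIGHTNING) appear in both lists, although in a different order.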

For comparison, the six events appearing in either top-five list were used to plot total fatalities and injuries by year for each event.

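# Reshape to long format (one row per event, year and type of harm) for faceting;
# gather() is superseded by pivot_longer() in newer tidyr versions
harmpop <- harmpopwide %>%
    gather(harm, Total, -EVTYPE, -Year)

ggplot(harmpop, aes(x = Year, y = Total, col = EVTYPE)) +
    geom_line() + facet_grid(harm ~ ., scales = "free_y") +
    labs(title = "Total casualties vs. Year for Weather Phenomena")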
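For readers unfamiliar with `gather()`, the wide-to-long step can be sketched in base R. The hypothetical `to_long()` helper below stacks every non-identifier column under a key column (`harm`) and a value column (`Total`), which is what `gather(harm, Total, -EVTYPE, -Year)` does. The two-row input table is made up for illustration.

```r
# Hypothetical helper: stack the measure columns under a key/value pair
to_long <- function(df, id_cols, key = "harm", value = "Total") {
  measures <- setdiff(names(df), id_cols)
  pieces <- lapply(measures, function(m) {
    out <- df[id_cols]          # keep the identifier columns
    out[[key]] <- m             # name of the stacked column
    out[[value]] <- df[[m]]     # its values
    out
  })
  do.call(rbind, pieces)        # one block of rows per measure column
}

# Made-up wide table: one row per event and year
wide <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  Year = c(2000, 2000),
  Fatalities = c(40, 5),
  Injuries = c(800, 60),
  stringsAsFactors = FALSE
)

long <- to_long(wide, c("EVTYPE", "Year"))
long
```

The long form has one row per event, year and type of harm, which is what lets `facet_grid(harm ~ .)` draw fatalities and injuries in separate panels.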

Which events have the greatest economic consequences?

The top 5 events causing property damage and the top 5 events causing crop damage were calculated over the entire 15-year period, 1996-2010.

topprop <- stormdata %>%
    group_by(EVTYPE) %>%
    summarise(Property = sum(PropertyDamage)) %>%
    arrange(desc(Property))

toppropEVTYPE <- topprop$EVTYPE[1:5]

topcrop <- stormdata %>%
    group_by(EVTYPE) %>%
    summarise(Crop = sum(CropDamage)) %>%
    arrange(desc(Crop))

topcropEVTYPE <- topcrop$EVTYPE[1:5]    
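The four summaries (fatalities, injuries, property damage and crop damage) repeat the same group, sum and sort pattern. A base R sketch of a hypothetical `top_events()` helper that factors it out, using `aggregate()` and `order()` in place of dplyr, run on made-up toy data:

```r
# Hypothetical helper: total `value_col` per EVTYPE and return the n largest
top_events <- function(df, value_col, n = 5) {
  totals <- aggregate(df[[value_col]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- value_col                      # give the sum column a meaningful name
  totals <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  rownames(totals) <- NULL
  head(totals, n)
}

# Made-up toy data, not taken from the NOAA database
toy <- data.frame(
  EVTYPE = c("HAIL", "FLOOD", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  Crop   = c(5, 20, 5, 1, 10, 2),
  stringsAsFactors = FALSE
)

top_events(toy, "Crop", n = 2)
```

With such a helper, each ranking reduces to a single call, for example `top_events(stormdata, "CropDamage")`.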

Top Five Events causing property damage

print(xtable(topprop[1:5,], digits = 0), type = "html")
  EVTYPE                 Property
1 HURRICANE/TYPHOON   81108159010
2 STORM SURGE         43193536000
3 FLOOD               21342156100
4 TORNADO             14797345010
5 HAIL                14143813870

Top Five Events causing crop damage

print(xtable(topcrop[1:5,], digits = 0), type = "html")
  EVTYPE                     Crop
1 DROUGHT             13336292000
2 HURRICANE/TYPHOON    5338782800
3 FLOOD                4819906400
4 HAIL                 2393695450
5 EXTREME COLD         1288973000

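Unlike fatalities and injuries, where four of the five events are shared, only three events (HURRICANE/TYPHOON, FLOOD and HAIL) appear in both damage lists, and crop damage is much smaller in scale, so property and crop damage were considered separately.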
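The overlap between the rankings can be checked with base R set operations. The vectors below are transcribed from the four top-five tables in this report; `intersect()` keeps the order of its first argument.

```r
# Top-5 event types transcribed from the tables in this report
fatal  <- c("EXCESSIVE HEAT", "TORNADO", "FLASH FLOOD", "LIGHTNING", "FLOOD")
injury <- c("TORNADO", "FLOOD", "EXCESSIVE HEAT", "THUNDERSTORM WIND", "LIGHTNING")
prop   <- c("HURRICANE/TYPHOON", "STORM SURGE", "FLOOD", "TORNADO", "HAIL")
crop   <- c("DROUGHT", "HURRICANE/TYPHOON", "FLOOD", "HAIL", "EXTREME COLD")

intersect(fatal, injury)  # events shared by the two health rankings
intersect(prop, crop)     # events shared by the two damage rankings
union(fatal, injury)      # the events plotted in the casualties figure
```

`union()` returns the same six events as `unique(c(topfatalEVTYPE, topinjuryEVTYPE))` in the analysis code.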

Katrina

The devastating storms of 2005, including Hurricane Katrina, caused a huge amount of damage. So that the variation in the other event types can be seen, events with property damage of more than $7 billion are listed separately and were excluded from the property damage plot.

# Row indices of events with property damage above $7 billion
katerina <- which(stormdata$PropertyDamage > 7e9)
print(xtable(stormdata[katerina, c("Year", "EVTYPE", "PropertyDamage")], digits = 0), type = "html")
       Year EVTYPE               PropertyDamage
569308 2005 HURRICANE/TYPHOON       10000000000
577675 2005 HURRICANE/TYPHOON       16930000000
577676 2005 STORM SURGE             31300000000
581533 2005 HURRICANE/TYPHOON        7350000000
581535 2005 STORM SURGE             11260000000
damagepropwide <- stormdata[-katerina,] %>%
    filter(EVTYPE %in% toppropEVTYPE) %>%
    group_by(EVTYPE, Year) %>%
    summarise(Property = sum(PropertyDamage))
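Excluding rows with negative `which()` indices has a well-known pitfall in R: when nothing matches, `which()` returns `integer(0)`, and `x[-integer(0)]` selects nothing, so every row is dropped. A logical index avoids this. The sketch below uses made-up damage values and a hypothetical `drop_above()` helper; the same indexing works for data frames as `df[keep, ]`.

```r
# Made-up damage values; none exceeds the threshold
damage <- c(2e6, NA, 3e9)

big <- which(damage > 7e9)   # integer(0): no matches
damage[-big]                 # empty: every element is dropped

# Hypothetical helper: drop values above a threshold using a logical index.
# Like which(), it treats NA comparisons as "no match", so NA values are kept.
drop_above <- function(x, threshold) {
  keep <- !(x > threshold & !is.na(x))
  x[keep]
}

drop_above(damage, 7e9)            # all three values kept
drop_above(c(2e6, 1e10), 7e9)      # only 2e6 kept
```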

ggplot(damagepropwide, aes(x = Year, y = Property, col = EVTYPE)) +
    geom_line() +
    labs(title = "Total Property Damage vs. Year for Weather Phenomena",
         subtitle = "excluding damage more than $7 billion",
         y = "Total Damage ($)")

damagecropwide <- stormdata %>%
    filter(EVTYPE %in% topcropEVTYPE) %>%
    group_by(EVTYPE, Year) %>%
    summarise(Crop = sum(CropDamage))

ggplot(damagecropwide, aes(x = Year, y = Crop, col = EVTYPE)) +
    geom_line() +
    labs(title = "Total Crop Damage vs. Year for Weather Phenomena",
         y = "Total Damage ($)")