Synopsis

This is an analysis of sample data from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. In this brief analysis I present the weather events with the greatest impact from a health and an economic perspective. One two-panel plot is used for each perspective. Both the source code and the results are shown in this document.

  library(knitr)
  library(dplyr)
  library(plyr)
  opts_chunk$set(echo=TRUE)

Data Processing

Pre-Processing

This block of code is used to ensure full automation. The dataset is downloaded from the online source only if it does not already exist in the current working directory. The data is then loaded into an R data frame, unless it has already been loaded.

  if (! file.exists('stormData.csv.bz2')){
    download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2',destfile =
                    'stormData.csv.bz2')
  }

  if (!exists('storm')) {  # check the same name that is assigned below
    storm <- read.csv(bzfile("stormData.csv.bz2"))
  }

Population Health

This block of code creates the data frames that will be used to analyze the fatalities and injuries caused by the weather events. The first step is to sum fatalities and injuries by event type; the totals are then sorted and stored in two data frames. I have only kept the top 5 of each so we can focus on the most dangerous events.

  casualties <- ddply(storm, .(EVTYPE), summarize,
                      fatalities = sum(FATALITIES),
                      injuries = sum(INJURIES))
  
  # Find events that caused most death and injury
  fatal_events <- head(casualties[order(casualties$fatalities, decreasing = T), ], 5)
  injury_events <- head(casualties[order(casualties$injuries, decreasing = T), ], 5)
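The sum-then-rank pattern used above can be tried on a small scale. The following is a minimal sketch on made-up toy data, using base R's `aggregate()` in place of `ddply()`; the `toy` data frame and its values are assumptions for illustration only, not figures from the storm database.

```r
# Toy data standing in for the storm data frame (values are invented)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0),
  INJURIES   = c(10, 2, 5, 1, 0, 3)
)

# Sum both columns within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by fatalities, largest first, and keep the top 2 rows
top_fatal <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], 2)
print(top_fatal)
```

Here `top_fatal` holds the two event types with the highest summed fatalities; the same call with `totals$INJURIES` ranks by injuries instead.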

Economic Consequences

  1. Standardize the values for analysis
  2. Sum by event
  3. Store the top 5 in data frames
#1
exp_transform <- function(e) {
  # h -> hundred, k -> thousand, m -> million, b -> billion
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (grepl('^[0-9]$', e)) # a single digit is used directly as the power of ten
    return(as.numeric(as.character(e))) # as.character() guards against factor codes
  else if (e %in% c('', '-', '?', '+'))
    return(0)
  else {
    stop("Invalid exponent value.")
  }
}

prop_dmg_exp <- sapply(storm$PROPDMGEXP, FUN=exp_transform)
storm$prop_dmg <- storm$PROPDMG * (10 ** prop_dmg_exp)
crop_dmg_exp <- sapply(storm$CROPDMGEXP, FUN=exp_transform)
storm$crop_dmg <- storm$CROPDMG * (10 ** crop_dmg_exp)


#2
econ_loss <- ddply(storm, .(EVTYPE), summarize,
                   prop_dmg = sum(prop_dmg),
                   crop_dmg = sum(crop_dmg))

#3
econ_loss <- econ_loss[(econ_loss$prop_dmg > 0 | econ_loss$crop_dmg > 0), ]
prop_dmg_events <- head(econ_loss[order(econ_loss$prop_dmg, decreasing = T), ], 5)
crop_dmg_events <- head(econ_loss[order(econ_loss$crop_dmg, decreasing = T),  ], 5)
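Because `exp_transform` is called once per row through `sapply`, it can be slow on a large data frame. As an alternative, here is a minimal vectorized sketch that applies the same mapping with a named lookup vector. The names `exp_lookup` and `exp_to_power` and the sample inputs are hypothetical; they are not part of the original analysis.

```r
# Letter codes mapped to powers of ten: hundred, thousand, million, billion
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

exp_to_power <- function(codes) {
  codes <- tolower(as.character(codes))
  # Look up letter codes; anything that is not a known letter becomes NA
  power <- unname(exp_lookup[codes])
  # Single digits are used directly as the power of ten
  is_digit <- grepl("^[0-9]$", codes)
  power[is_digit] <- as.numeric(codes[is_digit])
  # Blank and placeholder codes mean "no multiplier"
  power[codes %in% c("", "-", "?", "+")] <- 0
  # Mirror the original function: fail loudly on anything unrecognized
  if (anyNA(power)) stop("Invalid exponent value.")
  power
}

# Invented sample values, for illustration only
codes   <- c("K", "m", "", "5", "B", "?")
amounts <- c(25, 2.5, 10, 1, 0.1, 7)
damage  <- amounts * 10^exp_to_power(codes)
print(damage)
```

Under these assumptions, `storm$PROPDMG * 10^exp_to_power(storm$PROPDMGEXP)` would be a drop-in replacement for the two-step `sapply` version.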

Results

Population Health

The top 5 events for fatalities and for injuries are shown as pie charts.

# Generate the plot
par(mfrow = c(2, 1), mar = c(0, 0, 2, 0), oma = c(0, 0, 0, 0))
pie(fatal_events$fatalities, main="Top 5 Most Fatal Events", labels=fatal_events$EVTYPE
      , col=c(2:6))
pie(injury_events$injuries, main="Top 5 Events Resulting in Injury", labels=injury_events$EVTYPE
     ,col=c(7:12))
box(lty = '1373', which="outer")

Figure 1 - Top 5 fatalities and injuries

The results show that tornadoes are by far the most dangerous weather event. Tornadoes have caused at least 50% more fatalities and injuries than any other event type.

Economic Consequences

The top 5 events for property damage and for crop damage are shown as horizontal bar plots.

#most_econ_dmg <- rbind(prop_dmg_events, crop_dmg_events)
par(mfrow = c(2, 1), mar = c(4.5, 11, 2, 0.5), oma = c(0, 0, 2, 0))
barplot(log10(prop_dmg_events$prop_dmg), names.arg=prop_dmg_events$EVTYPE
        ,main="Property Damage"
        ,col="purple"
        ,las=1
        ,horiz=T)
barplot(crop_dmg_events$crop_dmg, names.arg=crop_dmg_events$EVTYPE
        ,main="Crop Damage"
        ,las=2
        ,col="green"
        , horiz=T)
title(main="Events with the greatest economic consequences", outer=T)
box(lty = '1373', which="outer")

Figure 2 - Top 5 property and crop damages

Property damages are plotted on a logarithmic (base 10) scale because of the large range of values; crop damages are plotted on a linear scale. The property damages are perhaps the least skewed figures in this analysis. Flash floods have done the most damage, but there is no major difference in the losses caused by the top 5 events. Unsurprisingly, drought has caused the most damage to crops.
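To illustrate why a logarithmic scale helps when values span several orders of magnitude, here is a small sketch. The `damage` values are hypothetical and are not taken from the storm data.

```r
# Hypothetical damage totals in dollars, spanning several orders of magnitude
damage <- c(small = 2e6, medium = 5e8, large = 1.4e11)

# On a log10 scale each value becomes its order of magnitude
log_damage <- log10(damage)
print(round(log_damage, 2))

# Ratio of the largest to the smallest value on each scale
ratio_raw <- max(damage) / min(damage)
ratio_log <- max(log_damage) / min(log_damage)
print(c(raw = ratio_raw, log = ratio_log))
```

On the raw scale the smallest bar would be invisible next to the largest; on the log scale all bars are of comparable length. The trade-off is that bars which look similar can still differ by large factors.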

Conclusion

If you see a tornado coming, RUN! Analysis has shown that tornadoes are the most dangerous natural disaster in the US.