Synopsis

This document aims to support decision making when preparing for severe weather events and when prioritizing resources across different types of events. In short, it answers two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

This analysis is based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

# Load required libraries
library(ggplot2)
library(dplyr)
library(reshape2)
library(lubridate)

The data for this analysis was obtained from the Coursera.org website as part of the Reproducible Research course:

Storm Data [47Mb]

First we need to load the data into R.

# Download the compressed data file if it does not already exist
if (!file.exists("data.csv.bz2")) {
    # mode = "wb" keeps the binary .bz2 file intact on Windows
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "data.csv.bz2", mode = "wb")
}

# read.csv() reads the bzip2-compressed file directly into a data frame
data <- read.csv("data.csv.bz2", stringsAsFactors = FALSE)

A preliminary look at the dimensions of the data:

dim(data)
## [1] 902297     37

The dataset contains 902,297 observations (rows) of 37 variables (columns).

The key variables used in this analysis are EVTYPE (the event type), FATALITIES, INJURIES, PROPDMG (property damage) and CROPDMG (crop damage).

The analysis is deliberately simple: it finds the event types with the highest health consequences (fatalities and injuries) and the highest economic consequences (property and crop damage) reported in the data.

First, the EVTYPE values need to be grouped so that, for example, all hurricanes are counted together instead of each hurricane being listed under its own name. The event names also need to be standardized so that spelling variants of the same event (such as "TSTM WIND" and "THUNDERSTORM WINDS") are grouped together.

cleanData <- data %>%
    # Start from the raw event type
    mutate(EVENT = as.character(EVTYPE)) %>%
    # Group every hurricane, including those recorded by name, under one label
    mutate(EVENT = ifelse(grepl("Hurricane|HURRICANE", EVTYPE), "HURRICANE", EVENT)) %>%
    # Merge abbreviations and spelling variants into one canonical event name
    mutate(EVENT = ifelse(grepl("TSTM WIND", EVTYPE), "THUNDERSTORM WIND", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "THUNDERSTORM WINDS", "THUNDERSTORM WIND", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "HIGH WINDS", "HIGH WIND", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "WILD/FOREST FIRE", "WILDFIRE", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "FLASH FLOODING", "FLASH FLOOD", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "URBAN/SML STREAM FLD", "FLOOD", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "EXTREME COLD", "EXTREME COLD/WIND CHILL", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "FLOODING", "FLOOD", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "RIP CURRENTS", "RIP CURRENT", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "HEAT WAVE", "EXCESSIVE HEAT", EVENT)) %>%
    mutate(EVENT = ifelse(EVTYPE == "EXTREME HEAT", "EXCESSIVE HEAT", EVENT)) %>%
    # Keep only the columns needed for the analysis
    select(EVENT, FATALITIES, INJURIES, PROPDMG, CROPDMG)
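The chain of ifelse() calls above can also be written as a lookup table, which keeps all the mappings in one place and makes new spelling variants easy to add. Below is a minimal base R sketch of that alternative, not the method used in this report: `eventMap` and `normalizeEvent()` are hypothetical names, only a few of the mappings are included, and everything except the hurricane pattern is matched exactly (the code above also matches "TSTM WIND" as a pattern).

```r
# Hypothetical lookup table: raw EVTYPE spelling -> canonical event name
# (only a subset of the mappings used in this report)
eventMap <- c(
    "TSTM WIND"          = "THUNDERSTORM WIND",
    "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
    "HIGH WINDS"         = "HIGH WIND",
    "FLOODING"           = "FLOOD",
    "HEAT WAVE"          = "EXCESSIVE HEAT"
)

normalizeEvent <- function(evtype) {
    event <- as.character(evtype)
    # Exact spellings found in the table are replaced by their canonical name
    hit <- event %in% names(eventMap)
    event[hit] <- eventMap[event[hit]]
    # Pattern-based groups, such as named hurricanes, are handled separately
    event[grepl("Hurricane|HURRICANE", evtype)] <- "HURRICANE"
    event
}

normalizeEvent(c("TSTM WIND", "HURRICANE OPAL", "HAIL", "HEAT WAVE"))
# returns c("THUNDERSTORM WIND", "HURRICANE", "HAIL", "EXCESSIVE HEAT")
```

Applied to the full dataset this would look like `cleanData$EVENT <- normalizeEvent(data$EVTYPE)`, with the table extended to cover all the variants listed above.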

Results

With the event names standardized, we can now total the human and economic cost of each event type.

# Total fatalities and injuries per event type
casualties <- aggregate(cleanData$FATALITIES, by = list(cleanData$EVENT), sum)
injuries <- aggregate(cleanData$INJURIES, by = list(cleanData$EVENT), sum)
# Both aggregate() results list the event types in the same sorted order, so the columns line up
casualties[,3] <- injuries[,2]
casualties[,4] <- casualties[,2] + casualties[,3]
colnames(casualties) <- c("Event", "Fatalities", "Injuries", "Casualties")
# Rank by fatalities + injuries and keep the ten most harmful event types
casualties <- casualties[sort.list(casualties[,4], decreasing = TRUE),]
topHuman <- head(casualties[,1:3], 10)

# Same procedure for property and crop damage
economic <- aggregate(cleanData$PROPDMG, by = list(cleanData$EVENT), sum)
crop <- aggregate(cleanData$CROPDMG, by = list(cleanData$EVENT), sum)
economic[,3] <- crop[,2]
economic[,4] <- economic[,2] + economic[,3]
colnames(economic) <- c("Event", "Property", "Crop", "Total")
economic <- economic[sort.list(economic[,4], decreasing = TRUE),]
topEconomic <- head(economic[,1:3], 10)
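Each table above is built from two separate aggregate() calls whose results must list the event types in the same order. A minimal sketch of an alternative, using aggregate()'s formula interface to sum several columns in a single call, run here on a small made-up data frame standing in for cleanData:

```r
# Made-up rows standing in for cleanData (illustration only)
toy <- data.frame(
    EVENT      = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
    FATALITIES = c(2, 0, 1, 3),
    INJURIES   = c(10, 1, 5, 0),
    stringsAsFactors = FALSE
)

# Sum both columns per event type in one call, so they stay aligned by construction
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVENT, data = toy, FUN = sum)
totals$TOTAL <- totals$FATALITIES + totals$INJURIES

# Rank event types by the combined total, highest first
totals <- totals[order(totals$TOTAL, decreasing = TRUE), ]
totals
```

Here TORNADO ranks first with a combined total of 18. One difference to keep in mind: the formula interface drops rows containing NA in any listed column by default (`na.action = na.omit`), whereas the `by =` form used above keeps them.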

Human Cost

The top ten storm events with the highest human cost are:

print(topHuman)
##                 Event Fatalities Injuries
## 812           TORNADO       5633    91346
## 739 THUNDERSTORM WIND        741     9461
## 128    EXCESSIVE HEAT       2171     6989
## 165             FLOOD        504     6870
## 447         LIGHTNING        816     5230
## 269              HEAT        937     2100
## 149       FLASH FLOOD        997     1785
## 410         ICE STORM         89     1975
## 352         HIGH WIND        283     1439
## 913          WILDFIRE         87     1456
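# Reshape to long format: one row per (Event, Fatalities/Injuries) pair
topHuman2 <- melt(topHuman, id.vars = "Event")
# Keep the bars in the same order as the table (highest total first)
topHuman2$Event <- factor(topHuman2$Event, levels = topHuman$Event)

ggplot(topHuman2, aes(x = Event, y = value, fill = variable)) +
    geom_bar(stat = "identity") +
    labs(x = "Event type", y = "Number of people", fill = "") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))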

Economic Cost

The top ten storm events with the highest economic cost are:

topEconomic
##                 Event  Property      Crop
## 812           TORNADO 3212258.2 100018.52
## 739 THUNDERSTORM WIND 2670571.7 199038.13
## 149       FLASH FLOOD 1448621.7 184326.51
## 238              HAIL  688693.4 579596.28
## 165             FLOOD  931454.3 174192.68
## 447         LIGHTNING  603351.8   3580.61
## 352         HIGH WIND  380356.6  19042.81
## 928      WINTER STORM  132720.6   1978.99
## 913          WILDFIRE  123804.3   8553.74
## 303        HEAVY SNOW  122252.0   2165.72
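# Reshape to long format: one row per (Event, Property/Crop) pair
topEconomic2 <- melt(topEconomic, id.vars = "Event")
# Keep the bars in the same order as the table (highest total first)
topEconomic2$Event <- factor(topEconomic2$Event, levels = topEconomic$Event)

ggplot(topEconomic2, aes(x = Event, y = value, fill = variable)) +
    geom_bar(stat = "identity") +
    labs(x = "Event type", y = "Reported damage (sum of PROPDMG and CROPDMG)", fill = "") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))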
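The Property and Crop totals above are sums of the raw PROPDMG and CROPDMG values. In this dataset those values come with magnitude codes in the PROPDMGEXP and CROPDMGEXP columns, where "K", "M" and "B" denote thousands, millions and billions of dollars. A minimal sketch of a helper that applies those codes, assuming as a simplification that any other code (blank, missing, or anything besides K/M/B) means a multiplier of 1; `damageInDollars()` is a hypothetical name:

```r
# Convert damage values to dollars using their magnitude codes.
# Assumption: only K/M/B (case-insensitive) are multipliers; every other
# code, including blanks and NA, is treated as a multiplier of 1.
damageInDollars <- function(value, magnitude) {
    code <- toupper(trimws(as.character(magnitude)))
    code[is.na(code)] <- ""
    multiplier <- ifelse(code == "K", 1e3,
                  ifelse(code == "M", 1e6,
                  ifelse(code == "B", 1e9, 1)))
    value * multiplier
}

damageInDollars(c(25, 2.5, 1.2, 50), c("K", "m", "B", ""))
# returns c(25000, 2500000, 1200000000, 50)
```

On the full dataset this could be applied as `data$PROPDOLLARS <- damageInDollars(data$PROPDMG, data$PROPDMGEXP)` (PROPDOLLARS being a hypothetical column name), after which the economic totals would need to be recomputed from the converted columns.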

Since tornadoes are clearly the most damaging event type in both human and economic cost, we now look at the states most affected by tornadoes since 2000.

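# Extract the year in which each event began (BGN_DATE is month/day/year hh:mm:ss text)
data <- mutate(data, YEAR = year(strptime(BGN_DATE, "%m/%d/%Y %H:%M:%S")))
# Keep only tornado reports from 2000 onwards, then count them per state, stacked by year
tornadoes <- filter(data, YEAR >= 2000, EVTYPE == "TORNADO")
ggplot(tornadoes, aes(x = STATE, fill = factor(YEAR))) +
    geom_bar() +
    labs(title = "Tornadoes by State since 2000", x = "State", y = "Tornadoes", fill = "Year") +
    theme(legend.position = "bottom", axis.text.x = element_text(angle = 90, vjust = 0.5))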

Texas and Kansas are the states most affected by tornadoes.
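The chart shows this ranking visually. To read the same ranking off as numbers, the records can be counted per state with table() and sorted. A minimal base R sketch, using a small made-up data frame with a STATE column in place of the filtered tornado records:

```r
# Made-up rows standing in for the filtered tornado records (illustration only)
tornadoes <- data.frame(STATE = c("TX", "KS", "TX", "OK", "KS", "TX"),
                        stringsAsFactors = FALSE)

# Count reports per state and sort so the most affected states come first
stateCounts <- sort(table(tornadoes$STATE), decreasing = TRUE)
head(stateCounts, 2)
```

With the real data, the same two lines applied to the tornado subset give exact per-state counts to accompany the chart.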