This document aims to support decision-making when preparing for severe weather events and prioritizing resources across different types of events. In a nutshell, we are looking to answer these questions:
This analysis is based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
# Load required libraries
library(ggplot2)
library(dplyr)
library(reshape2)
library(lubridate)
The data for this analysis were obtained from the Coursera.org website as part of the Reproducible Research course.
Storm Data [47Mb]
First we need to load the data into R.
# Download the data if the file does not already exist, then load it into a data frame
if (!file.exists("data.csv.bz2")) {
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile="data.csv.bz2")
}
data <- read.table("data.csv.bz2", header = TRUE, sep = ",", stringsAsFactors = FALSE, na.strings="NA")
Preliminary look at the data available:
dim(data)
## [1] 902297 37
The data set has 902297 observations of 37 variables, one variable per column.
The key variables we will use for this analysis are EVTYPE, FATALITIES, INJURIES, PROPDMG and CROPDMG.
As a simple analysis, we will find the top events by health consequences (fatalities and injuries) and by economic consequences (property and crop damage) reported in the data.
First we need to aggregate the EVTYPE values so that events such as hurricanes are treated as one group rather than by each hurricane's name. We also standardize the event names so that variants of the same event are grouped together properly.
data %>%
mutate (EVENT = as.character(EVTYPE)) %>%
mutate (EVENT = ifelse (grepl("Hurricane|HURRICANE", EVTYPE), "HURRICANE", EVENT)) %>%
mutate (EVENT = ifelse (grepl("TSTM WIND", EVTYPE), "THUNDERSTORM WIND", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "THUNDERSTORM WINDS", "THUNDERSTORM WIND", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "HIGH WINDS", "HIGH WIND", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "WILD/FOREST FIRE", "WILDFIRE", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "FLASH FLOODING", "FLASH FLOOD", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "URBAN/SML STREAM FLD", "FLOOD", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "EXTREME COLD", "EXTREME COLD/WIND CHILL", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "FLOODING", "FLOOD", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "RIP CURRENTS", "RIP CURRENT", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "HEAT WAVE", "EXCESSIVE HEAT", EVENT)) %>%
mutate (EVENT = ifelse (EVTYPE == "EXTREME HEAT", "EXCESSIVE HEAT", EVENT)) %>%
select (EVENT,FATALITIES,INJURIES,PROPDMG,CROPDMG) -> cleanData
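The same kind of name standardization can also be driven by a small lookup table instead of a chain of `mutate` calls. The following is only a minimal sketch in base R: the sample `evtype` values, the `rules` vector and the `normalize_event` helper are hypothetical names introduced for illustration, and the "first matching rule wins" behaviour is a design choice of this sketch, not something taken from the analysis above.

```r
# Hypothetical sample of raw event names (not read from the NOAA file).
evtype <- c("TSTM WIND", "HURRICANE OPAL", "Hurricane Edouard",
            "HEAT WAVE", "FLOODING", "TORNADO")

# Regex pattern -> canonical name. Rules are tried in order; first match wins.
rules <- c("HURRICANE"   = "HURRICANE",
           "TSTM WIND"   = "THUNDERSTORM WIND",
           "^HEAT WAVE$" = "EXCESSIVE HEAT",
           "^FLOODING$"  = "FLOOD")

normalize_event <- function(x, rules) {
  out <- x
  matched <- rep(FALSE, length(x))
  for (i in seq_along(rules)) {
    # Only rewrite entries that no earlier rule has already claimed.
    hit <- !matched & grepl(names(rules)[i], x, ignore.case = TRUE)
    out[hit] <- rules[[i]]
    matched <- matched | hit
  }
  out
}

event <- normalize_event(evtype, rules)
print(event)
```

Names that match no rule, such as "TORNADO" here, pass through unchanged. Keeping the rules in one vector makes it easier to review which spellings are merged.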
With the event names standardized, we can now total the human and economic cost for each event.
casualties <- aggregate(cleanData$FATALITIES, by = list(cleanData$EVENT), sum)
injuries <- aggregate(cleanData$INJURIES, by = list(cleanData$EVENT), sum)
casualties[,3] <- injuries[,2]
casualties[,4] <- (casualties[,2] + casualties[,3])
colnames(casualties) <- c("Event", "Fatalities", "Injuries", "Casualties")
casualties <- casualties[sort.list(casualties[,4], decreasing = TRUE),]
topHuman <- head(casualties[,1:3], 10)
economic <- aggregate(cleanData$PROPDMG, by = list(cleanData$EVENT), sum)
crop <- aggregate(cleanData$CROPDMG, by = list(cleanData$EVENT), sum)
economic[,3] <- crop[,2]
economic[,4] <- economic[,2] + economic[,3]
colnames(economic) <- c("Event", "Property", "Crop", "Total")
economic <- economic[sort.list(economic[,4], decreasing = TRUE),]
topEconomic <- head(economic[,1:3], 10)
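The sum, combine and sort steps above can also be written as a single `aggregate` call using the formula interface. This is a sketch on a made-up `toy` data frame, not the NOAA data. One thing to be aware of is that the formula interface drops rows containing `NA` by default, which the separate `aggregate` calls above do not do.

```r
# Hypothetical records, used only to illustrate the aggregation pattern.
toy <- data.frame(
  EVENT      = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both measures per event in one call.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVENT, data = toy, FUN = sum)

# Combined casualties, then rank events from most to least harmful.
totals$CASUALTIES <- totals$FATALITIES + totals$INJURIES
totals <- totals[order(totals$CASUALTIES, decreasing = TRUE), ]

print(head(totals, 2))
```

Summing both columns in the same call avoids relying on two separate results coming back in the same row order.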
The top ten storm events with the highest human cost are:
print(topHuman)
##                 Event Fatalities Injuries
## 812           TORNADO       5633    91346
## 739 THUNDERSTORM WIND        741     9461
## 128    EXCESSIVE HEAT       2171     6989
## 165             FLOOD        504     6870
## 447         LIGHTNING        816     5230
## 269              HEAT        937     2100
## 149       FLASH FLOOD        997     1785
## 410         ICE STORM         89     1975
## 352         HIGH WIND        283     1439
## 913          WILDFIRE         87     1456
topHuman2 <- melt(topHuman, id.var="Event")
ggplot(topHuman2, aes(x = Event, y = value, fill = variable)) + geom_bar(stat = "identity")
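ggplot2 draws a discrete axis in factor-level order, and a plain character column is ordered alphabetically, so the bars in a chart like this do not follow the ranking in the table. One possible way to keep the ranking is to convert the event column to a factor whose levels are sorted by total. The sketch below uses a made-up `top` data frame and a hypothetical `total` helper vector.

```r
# Hypothetical totals (not the NOAA figures).
top <- data.frame(
  Event      = c("FLOOD", "TORNADO", "HEAT"),
  Fatalities = c(5, 50, 20),
  Injuries   = c(60, 900, 30),
  stringsAsFactors = FALSE
)

# Rank events by combined fatalities and injuries.
total <- top$Fatalities + top$Injuries

# Store the ranking in the factor levels so a plot axis can follow it.
top$Event <- factor(top$Event,
                    levels = top$Event[order(total, decreasing = TRUE)])

print(levels(top$Event))
```

Applying the same conversion before reshaping the data for plotting should give bars ordered from the largest total to the smallest.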
The top ten storm events with the highest economic cost, measured as the raw sums of the PROPDMG and CROPDMG columns, are:
topEconomic
##                 Event  Property      Crop
## 812           TORNADO 3212258.2 100018.52
## 739 THUNDERSTORM WIND 2670571.7 199038.13
## 149       FLASH FLOOD 1448621.7 184326.51
## 238              HAIL  688693.4 579596.28
## 165             FLOOD  931454.3 174192.68
## 447         LIGHTNING  603351.8   3580.61
## 352         HIGH WIND  380356.6  19042.81
## 928      WINTER STORM  132720.6   1978.99
## 913          WILDFIRE  123804.3   8553.74
## 303        HEAVY SNOW  122252.0   2165.72
topEconomic2 <- melt(topEconomic, id.var="Event")
ggplot(topEconomic2, aes(x = Event, y = value, fill = variable)) + geom_bar(stat = "identity")
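The economic totals above add up PROPDMG and CROPDMG exactly as recorded. In the NOAA file these values are understood to be accompanied by magnitude codes in separate columns (PROPDMGEXP and CROPDMGEXP), which this analysis does not use. The sketch below shows one way such codes might be applied before summing. The `toy` data, the `multiplier` mapping (K = thousands, M = millions, B = billions) and the `to_dollars` helper are assumptions made for illustration, and treating blank or unrecognised codes as a multiplier of 1 is a simplification.

```r
# Hypothetical rows; PROPDMGEXP is assumed to hold the magnitude code.
toy <- data.frame(
  EVENT      = c("TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 500, 10),
  PROPDMGEXP = c("K", "B", "K", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping from magnitude code to multiplier.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(value, exp_code) {
  m <- multiplier[toupper(exp_code)]
  m[is.na(m)] <- 1  # assumption: blank or unknown codes leave the value unscaled
  value * unname(m)
}

toy$PROP_USD <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)

# Sum the scaled values per event and rank them.
damage <- aggregate(PROP_USD ~ EVENT, data = toy, FUN = sum)
damage <- damage[order(damage$PROP_USD, decreasing = TRUE), ]
print(damage)
```

In this toy example a single flood recorded in billions outweighs every other row, which shows how scaling by magnitude can change the ranking of event types.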
Since tornadoes rank highest in both human and economic cost, we will look at the states most affected by tornadoes since 2000.
par (pin=c(15,10))
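# Extract the year in which each event began
data <- mutate(data, YEAR = year(strptime(BGN_DATE, "%m/%d/%Y %H:%M:%S")))
# Keep only tornado records from 2000 onward so the plot matches its title
k <- ggplot(data = filter(data, YEAR >= 2000, EVTYPE == "TORNADO"), aes(STATE, fill = factor(YEAR)))
k + labs(title = "Tornadoes by State since 2000", y = "Tornadoes") + theme(legend.position = "bottom") + geom_bar()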
Texas and Kansas are the states most affected by tornadoes.
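A ranking like this can also be checked numerically rather than read off the chart, by counting tornado records per state. The sketch below uses base R on a made-up `toy` data frame that borrows the STATE, EVTYPE and BGN_DATE column names; the rows and counts are invented and are not results from the NOAA data.

```r
# Hypothetical records in the same shape as the storm data.
toy <- data.frame(
  STATE    = c("TX", "KS", "TX", "OK", "TX", "KS", "TX"),
  EVTYPE   = c("TORNADO", "TORNADO", "HAIL", "TORNADO",
               "TORNADO", "TORNADO", "TORNADO"),
  BGN_DATE = c("5/3/2003 0:00:00", "6/12/2005 0:00:00", "4/1/2004 0:00:00",
               "5/3/1999 0:00:00", "4/27/2011 0:00:00", "5/4/2007 0:00:00",
               "3/9/2008 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date part of BGN_DATE and pull out the year.
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Keep tornado records from 2000 onward, then count them per state.
recent <- toy[toy$EVTYPE == "TORNADO" & toy$YEAR >= 2000, ]
counts <- sort(table(recent$STATE), decreasing = TRUE)
print(counts)
```

A count of records measures how often tornadoes were reported, not how much harm they caused. Ranking states by fatalities or damage would need the corresponding columns to be summed instead.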