This document aims to provide decision makers with an analysis of the impacts of different types of natural disasters. In particular, it aims to answer two questions:
1. Which types of events are most harmful with respect to population health?
2. Which types of events have the greatest economic consequences?
The analysis is based on the NOAA Storm Database, which contains detailed information on fatalities, injuries and economic damage caused by various events between 1950 and 2011. The raw data can be accessed at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. The document contains two sections:
1. Data processing, i.e. how the data has been accessed and transformed to make it suitable for analysis
2. Results, i.e. how we reach our conclusions
The first block of code checks whether the required packages (downloader, ggplot2, timeDate, gridExtra, iotools and dplyr) are installed, installs any that are missing, and then loads them. The installation output is deliberately hidden, as it is long, uninformative and would take up half of the document.
# Packages used in this analysis
list.of.packages <- c("downloader", "ggplot2", "timeDate", "gridExtra", "iotools", "dplyr")
# Install only the packages that are not installed yet
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos="https://cran.rstudio.com/")
library(downloader)
library(ggplot2)
library(timeDate)
library(gridExtra)
library(iotools)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
##
## combine
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
In this section we download the data from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and read it into R.
destfile <- "./Sdata.csv.bz2"
# Download the compressed data only if it is not already present locally
if (!file.exists(destfile)){
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile, method="curl")
}
# read.csv() decompresses .bz2 files transparently and already returns a data frame
data <- read.csv(destfile)
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
The first question (which types of events are most harmful with respect to population health) breaks into three sub-questions: 1) Which types of events cause the most fatalities? 2) Which types of events cause the most injuries? 3) Which types of events happen most often?
The results below make it clear that tornadoes are the most harmful events with respect to population health. They cause by far the most fatalities (5,633) and the most injuries (91,346), and they are also among the most frequently recorded events. The second most dangerous event type is excessive heat, which ranks second in fatalities (1,903) and fourth in injuries (6,525).
# 1) Which types of events cause the most fatalities?
fatalitiesData<-aggregate(data$FATALITIES, by=list(EVTYPE=data$EVTYPE), sum)
fatalitiesGraphData<-head(arrange(fatalitiesData, desc(x)))
fatalitiesGraphData
## EVTYPE x
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
# 2) Which types of events cause the most injuries?
injuriesData<-aggregate(data$INJURIES, by=list(EVTYPE=data$EVTYPE), sum)
injuriesGraphData<-head(arrange(injuriesData, desc(x)))
injuriesGraphData
## EVTYPE x
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
#3) What types of events happen most often?
eventCount<-aggregate(data$EVTYPE, by=list(EVTYPE=data$EVTYPE), length)
head(arrange(eventCount, desc(x)))
## EVTYPE x
## 1 HAIL 288661
## 2 TSTM WIND 219940
## 3 THUNDERSTORM WIND 82563
## 4 TORNADO 60652
## 5 FLASH FLOOD 54277
## 6 FLOOD 25326
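The three `aggregate()` calls above each scan the data separately. Since dplyr is already loaded, the same three measures can also be computed in a single grouped pass. The sketch below is a hypothetical alternative, not the method used above. It runs on a small made-up data frame (`toyStorms`) so that it is self-contained; with the real data you would pipe `data` instead.

```r
# Minimal sketch: one dplyr pipeline answering all three sub-questions at once.
# toyStorms is a hypothetical toy data frame used only for illustration.
library(dplyr)

toyStorms <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "EXCESSIVE HEAT", "HAIL"),
  FATALITIES = c(2, 0, 0, 3, 0),
  INJURIES   = c(15, 4, 1, 6, 0)
)

healthSummary <- toyStorms %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities = sum(FATALITIES),  # sub-question 1: total fatalities
    injuries   = sum(INJURIES),    # sub-question 2: total injuries
    events     = n()               # sub-question 3: number of recorded events
  ) %>%
  arrange(desc(fatalities))        # rank event types by fatalities

print(healthSummary)
```

Keeping all three measures in one table makes it easy to re-sort by injuries or event count with a different `arrange()` call.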
Let’s illustrate these findings with two bar charts, one stacked above the other:
library(gridExtra)
fatalitiesGraph<-ggplot(fatalitiesGraphData, aes(EVTYPE, x)) + ggtitle("Total Fatalities: 1950-2011") + geom_bar(stat="identity") +
xlab("event type") + ylab("total fatalities")
injuriesGraph<-ggplot(injuriesGraphData, aes(EVTYPE, x)) + ggtitle("Total Injuries: 1950-2011") + geom_bar(stat="identity") +
xlab("event type") + ylab("total injuries")
grid.arrange(fatalitiesGraph, injuriesGraph, ncol=1, nrow=2)
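By default ggplot2 orders a discrete x axis alphabetically, so the bars above are not sorted by magnitude. The following is a minimal sketch of one possible refinement, not the chart used in this report. It orders the bars by value with `reorder()` and tilts the axis labels so they do not overlap. To keep it self-contained, it rebuilds the top-six fatality totals printed earlier in this document as a small data frame, `fatalitiesTop`, which is a name introduced only for this illustration.

```r
# Minimal sketch: bar chart with bars ordered by value instead of alphabetically.
# fatalitiesTop holds the top-six fatality totals printed earlier in the document.
library(ggplot2)

fatalitiesTop <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING", "TSTM WIND"),
  x      = c(5633, 1903, 978, 937, 816, 504)
)

sortedGraph <- ggplot(fatalitiesTop, aes(reorder(EVTYPE, -x), x)) +
  geom_col() +                                     # equivalent to geom_bar(stat = "identity")
  ggtitle("Total Fatalities: 1950-2011") +
  xlab("event type") + ylab("total fatalities") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))  # avoid overlapping labels

print(sortedGraph)
```

The same `reorder(EVTYPE, -x)` change can be applied to the injuries chart and to the economic damage chart.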
Between 1950 and 2011, floods caused damage totaling roughly 150 billion US dollars, so we conclude that floods have had the greatest economic consequences. They are followed, at a considerable distance, by hurricanes/typhoons (about 72 billion), tornadoes (about 57 billion), storm surges (about 43 billion), hail (about 19 billion) and flash floods (about 18 billion). Beyond the top six shown below, droughts, hurricanes and river floods have also each caused damage of over 10 billion US dollars.
To answer this question, we must first answer two sub-questions: 1) Which types of events cause the most property damage (PROPDMG)? 2) Which types of events cause the most crop damage (CROPDMG)?
It would be easy to extract this information from the database if it did not use symbolic indicators of magnitude, stored in the PROPDMGEXP and CROPDMGEXP columns. The following symbols are used:
- K: thousands (1,000)
- M: millions (1,000,000)
- B: billions (1,000,000,000)
So here we go:
# Let's first create a function which transforms the damage-scale indicator into a numeric multiplier.
# It is named expToMultiplier rather than transform, so that base R's transform() is not masked.
expToMultiplier<-function(x){
if(x=="K") y<-1000 else
if(x=="M") y<-1000000 else
if(x=="B") y<-1000000000 else y<-1
return(y)
}
# Let's convert the columns to the correct data types
data$PROPDMGEXP<-as.character(data$PROPDMGEXP)
data$CROPDMGEXP<-as.character(data$CROPDMGEXP)
data$PROPDMG<-as.numeric(data$PROPDMG)
data$CROPDMG<-as.numeric(data$CROPDMG)
# And let's then create two extra columns holding the numeric multipliers
# (sapply returns a numeric vector directly; USE.NAMES=FALSE avoids attaching names)
data$PROPDMG_NUMEXP<-sapply(data$PROPDMGEXP, expToMultiplier, USE.NAMES=FALSE)
data$CROPDMG_NUMEXP<-sapply(data$CROPDMGEXP, expToMultiplier, USE.NAMES=FALSE)
#Let's then calculate the total damage by summing the property damage and crop damage:
data$DMG_DOL<-(data$PROPDMG*data$PROPDMG_NUMEXP)+(data$CROPDMG*data$CROPDMG_NUMEXP)
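The element-wise scale-indicator function above calls a function once per row. A vectorized lookup through a named vector gives the same multipliers in one step. The sketch below is a swapped-in technique, not the approach used in this report. It also makes one labeled assumption that the original function does not: indicators are matched case-insensitively, so `k`, `m` and `b` count as thousands, millions and billions. If the raw columns also contain lowercase codes (worth checking with `table(data$PROPDMGEXP)`), the original function maps them to 1, and this variant would change the totals slightly. The names `multipliers`, `expToMultiplierVec` and `toyDamage` are hypothetical.

```r
# Minimal sketch: vectorized conversion of magnitude codes into dollar multipliers.
# ASSUMPTION: codes are matched case-insensitively (the original only accepts "K", "M", "B").
# Any other code (blank, digits, "+", "?", NA, ...) becomes 1, as in the original function.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

expToMultiplierVec <- function(exp) {
  m <- unname(multipliers[toupper(as.character(exp))])  # NA where no code matches
  m[is.na(m)] <- 1
  m
}

# Hypothetical toy rows showing the arithmetic: 25.0 with "K" means 25,000 dollars.
toyDamage <- data.frame(
  PROPDMG    = c(25.0, 2.5, 1.2, 3),
  PROPDMGEXP = c("K", "M", "m", ""),
  CROPDMG    = c(0, 10, 0, 5),
  CROPDMGEXP = c("", "K", "", "B")
)

# Total damage = property damage * multiplier + crop damage * multiplier
toyDamage$DMG_DOL <- toyDamage$PROPDMG * expToMultiplierVec(toyDamage$PROPDMGEXP) +
  toyDamage$CROPDMG * expToMultiplierVec(toyDamage$CROPDMGEXP)

print(toyDamage$DMG_DOL)
```

On the full data set, one vectorized lookup avoids calling a function once for each of the several hundred thousand rows.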
#And now we group and sort the values to see what causes the most economic damage
dmgData<-aggregate(data$DMG_DOL, by=list(EVTYPE=data$EVTYPE), sum)
economicDamageGraphData<-head(arrange(dmgData, desc(x)))
economicDamageGraphData
## EVTYPE x
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57340614060
## 4 STORM SURGE 43323541000
## 5 HAIL 18752904943
## 6 FLASH FLOOD 17562129167
The bar chart below summarizes our findings:
ggplot(economicDamageGraphData, aes(EVTYPE, x)) + ggtitle("Total economic consequences: 1950-2011") + geom_bar(stat="identity") +
xlab("event type") + ylab("total economic damage ($)")