Synopsis

This document aims to give decision makers analyzed information on the impact of different natural disasters. In particular, it aims to answer two questions: 1. Which types of events are most harmful with respect to population health? 2. Which types of events have the greatest economic consequences?

The analysis is based on the NOAA Storm Database, which contains detailed information on the fatalities, injuries and economic damage caused by various events between 1950 and 2011. The raw data can be accessed at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. This document contains two sections: 1. Data processing, i.e. how the data has been accessed and transformed to be suitable for analysis 2. Results, i.e. how we come to our conclusions

Data Processing

Install packages, if needed

The first block of code makes sure that the required packages are installed, installs any that are missing, and then loads them. Note that the installation output of this code chunk is deliberately hidden, as it is long and uninformative.

list.of.packages <- c("downloader", "ggplot2", "timeDate", "gridExtra", "iotools", "dplyr")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, repos="http://cran.rstudio.com/")
library(downloader)
library(ggplot2)
library(timeDate)
library(gridExtra)
library(iotools)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Download the data

In this section we download the data from the internet address (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2), unless the file is already present locally.

destfile <- "./Sdata.csv.bz2"
# Download only once; skip if the file already exists
if (!file.exists(destfile)) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile, method = "curl")
}

Load the data

data <- read.csv("Sdata.csv.bz2")

Transform the data

data<-as.data.frame(data)
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6
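
Only a handful of the 37 columns shown above are used in the rest of the analysis. As a minimal sketch, narrowing the data frame to those columns could look like the following. It runs on a tiny made-up data frame, and the names `keep`, `toy` and `toySmall` are illustrative and not part of the original analysis.

```r
# Columns the later sections rely on
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Toy stand-in for the storm data (values are made up for illustration)
toy <- data.frame(
  STATE      = c("AL", "AL"),
  EVTYPE     = c("TORNADO", "HAIL"),
  FATALITIES = c(0, 1),
  INJURIES   = c(15, 0),
  PROPDMG    = c(25.0, 2.5),
  PROPDMGEXP = c("K", "K"),
  CROPDMG    = c(0, 0),
  CROPDMGEXP = c("", ""),
  stringsAsFactors = FALSE
)

# Stop early if an expected column is missing, then subset
stopifnot(all(keep %in% names(toy)))
toySmall <- toy[, keep]
str(toySmall)
```

Applied to the real data, the same subsetting step would mainly reduce memory use; it does not change any of the results below.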

Results

Which types of events are most harmful with respect to population health?

This question breaks down into several sub-questions: 1) Which types of events cause the most fatalities? 2) Which types of events cause the most injuries? 3) Which types of events happen most often?

From the results below, it is clear that tornados are the most harmful events with respect to population health, as they cause the most fatalities and, by a wide margin, the most injuries. (They also happen often, ranking fourth by number of recorded events.) The second most dangerous event type is excessive heat, which ranks second in fatalities and fourth in injuries, even though it is not among the six most frequently recorded event types.

#1) Which types of events cause the most fatalities?
fatalitiesData<-aggregate(data$FATALITIES, by=list(EVTYPE=data$EVTYPE), sum)
fatalitiesGraphData<-head(arrange(fatalitiesData, desc(x)))
fatalitiesGraphData
##           EVTYPE    x
## 1        TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3    FLASH FLOOD  978
## 4           HEAT  937
## 5      LIGHTNING  816
## 6      TSTM WIND  504
#2) Which types of events cause the most injuries?
injuriesData<-aggregate(data$INJURIES, by=list(EVTYPE=data$EVTYPE), sum)
injuriesGraphData<-head(arrange(injuriesData, desc(x)))
injuriesGraphData
##           EVTYPE     x
## 1        TORNADO 91346
## 2      TSTM WIND  6957
## 3          FLOOD  6789
## 4 EXCESSIVE HEAT  6525
## 5      LIGHTNING  5230
## 6           HEAT  2100
#3) What types of events happen most often?
eventCount<-aggregate(data$EVTYPE, by=list(EVTYPE=data$EVTYPE), length)
head(arrange(eventCount, desc(x)))
##              EVTYPE      x
## 1              HAIL 288661
## 2         TSTM WIND 219940
## 3 THUNDERSTORM WIND  82563
## 4           TORNADO  60652
## 5       FLASH FLOOD  54277
## 6             FLOOD  25326
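
The three tables above can also be combined into a single ranking, which makes it easier to compare harm against frequency. The following is a rough sketch in base R on a made-up data frame; `toy`, `health` and `counts` are illustrative names, and the numbers are not taken from the storm data.

```r
# Toy stand-in for the storm data (values are made up for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HAIL", "HAIL", "HAIL"),
  FATALITIES = c(3, 1, 2, 0, 0, 0),
  INJURIES   = c(10, 5, 1, 0, 1, 0)
)

# Sum fatalities and injuries per event type in one call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Count how many records each event type has and merge the counts in
counts <- as.data.frame(table(EVTYPE = toy$EVTYPE),
                        responseName = "EVENTS", stringsAsFactors = FALSE)
health <- merge(health, counts, by = "EVTYPE")

# Rank by fatalities, breaking ties by injuries
health <- health[order(-health$FATALITIES, -health$INJURIES), ]
health
```

In this toy example, hail is the most frequent event type but ranks last for harm, which mirrors the pattern in the real tables above.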

Let’s illustrate our findings with a double bar chart:

library(gridExtra)
fatalitiesGraph<-ggplot(fatalitiesGraphData, aes(EVTYPE, x)) + ggtitle("Total Fatalities: 1950-2011") + geom_bar(stat="identity") +
   xlab("event type") + ylab("total fatalities")
injuriesGraph<-ggplot(injuriesGraphData, aes(EVTYPE, x)) + ggtitle("Total Injuries: 1950-2011") + geom_bar(stat="identity") +
   xlab("event type") + ylab("total injuries")
grid.arrange(fatalitiesGraph, injuriesGraph, ncol=1, nrow=2)
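
By default, ggplot2 places the categories of a discrete x axis in factor-level (alphabetical) order, so the bars in the charts above are not sorted by size. One possible way to sort them, sketched here on the top three rows of the fatalities table, is to reorder the factor levels with `reorder()` before plotting. The name `fatalitiesToy` is illustrative, and the plot is only built if ggplot2 is available.

```r
# Toy copy of the fatalities table shown earlier (top three rows only)
fatalitiesToy <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
  x      = c(5633, 1903, 978)
)

# reorder() sets the factor levels by a statistic of its second argument;
# negating x puts the event type with the largest total first
fatalitiesToy$EVTYPE <- reorder(fatalitiesToy$EVTYPE, -fatalitiesToy$x)
levels(fatalitiesToy$EVTYPE)

# Build the plot only if ggplot2 is installed, so the sketch runs either way
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(fatalitiesToy, aes(EVTYPE, x)) +
    geom_col() +
    xlab("event type") + ylab("total fatalities")
  # print(p) would draw the chart
}
```

`geom_col()` is equivalent to `geom_bar(stat="identity")` as used above.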

Which types of events have the greatest economic consequences?

Historically, floods have caused damage totaling about 150 billion US dollars, so we conclude that floods have had the greatest economic consequences. They are followed by hurricanes/typhoons, tornados, storm surges, hail, flash floods, droughts, hurricanes and river floods, each of which has caused more than 10 billion US dollars in damage over the period covered.

To answer the question, we must first answer two sub-questions: 1) Which types of events cause the most property damage (PROPDMG)? 2) Which types of events cause the most crop damage (CROPDMG)?

It would be easy to extract that information from the database if it did not use symbolic indicators of magnitude. The following symbols are used: K: thousands (1,000); M: millions (1,000,000); B: billions (1,000,000,000). Any other value is treated as a multiplier of 1.

So here we go:

#Let's first create a function which transforms the damage magnitude symbol into a numeric multiplier
#(it is named expToMultiplier so that it does not mask base::transform)
expToMultiplier<-function(x){
                if(x=="K") 1000 else
                if(x=="M") 1000000 else
                if(x=="B") 1000000000 else 1
}

#Let's transform the datatypes into the correct ones
data$PROPDMGEXP<-as.character(data$PROPDMGEXP)
data$CROPDMGEXP<-as.character(data$CROPDMGEXP)
data$PROPDMG<-as.numeric(data$PROPDMG)
data$CROPDMG<-as.numeric(data$CROPDMG)

#And let's then create two extra columns holding the numeric multipliers
#(vapply returns a numeric vector directly, so no further conversion is needed)
data$PROPDMG_NUMEXP<-vapply(data$PROPDMGEXP, expToMultiplier, numeric(1), USE.NAMES=FALSE)
data$CROPDMG_NUMEXP<-vapply(data$CROPDMGEXP, expToMultiplier, numeric(1), USE.NAMES=FALSE)

#Let's then calculate the total damage by summing the property damage and crop damage:
data$DMG_DOL<-(data$PROPDMG*data$PROPDMG_NUMEXP)+(data$CROPDMG*data$CROPDMG_NUMEXP)

#And now we group and sort the values to see what causes the most economic damage
dmgData<-aggregate(data$DMG_DOL, by=list(EVTYPE=data$EVTYPE), sum)
economicDamageGraphData<-head(arrange(dmgData, desc(x)))
economicDamageGraphData
##              EVTYPE            x
## 1             FLOOD 150319678257
## 2 HURRICANE/TYPHOON  71913712800
## 3           TORNADO  57340614060
## 4       STORM SURGE  43323541000
## 5              HAIL  18752904943
## 6       FLASH FLOOD  17562129167
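
The symbol-to-multiplier step above calls a function once per row. As an alternative sketch, the same mapping can be done with a named lookup vector, which is vectorized. The names `multipliers` and `symbolToMultiplier` are hypothetical. Unlike the function above, this sketch upper-cases the symbols first, on the assumption that a lowercase symbol such as "m" means the same as "M"; if such symbols occur in the data, the totals would differ slightly from those reported here.

```r
# Lookup table from magnitude symbol to multiplier
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: vectorized conversion of symbols to multipliers.
# Symbols are upper-cased first (an assumption, see the text above);
# anything not in the table, including the empty string, falls back to 1.
symbolToMultiplier <- function(symbols) {
  m <- unname(multipliers[toupper(as.character(symbols))])
  m[is.na(m)] <- 1
  m
}

# Toy values (made up for illustration)
dmg    <- c(25.0, 2.5, 1.2, 10)
dmgExp <- c("K", "m", "B", "")

# Damage in dollars: 25 thousand, 2.5 million, 1.2 billion, and 10
dmg * symbolToMultiplier(dmgExp)
```

Running `table(data$PROPDMGEXP)` on the real data would show which symbols actually occur, and therefore whether the upper-casing assumption matters.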

A bar chart summarizes our findings:

ggplot(economicDamageGraphData, aes(EVTYPE, x)) + ggtitle("Total economic consequences: 1950-2011") + geom_bar(stat="identity") +
   xlab("event type") + ylab("total economic damage ($)")