Analysis

R libraries

First we load the libraries that we are going to use throughout the analysis:

library(proto) # Required dependency
library(bitops) # Required dependency
library(RCurl) # Get file by HTTP
library(ggplot2) # For visualizations
library(plyr) # Data manipulation
library(gsubfn) # Find and sub strings - cleaning data

## Loading required namespace: tcltk

The analysis uses the following files: the storm data set (StormData.csv.bz2) and two documentation PDFs (01016005curr.pdf and FAQ_Page.pdf).

Before starting, remember to set your working directory with the following command:

setwd("/Path/Where/You/Execute/Code")

Now we download those files to our workspace:

destfile="StormData.csv.bz2"
fileURL="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists(destfile)){download.file(url=fileURL, destfile=destfile, method='curl')}

destfile="01016005curr.pdf"
fileURL="https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf"
if(!file.exists(destfile)){download.file(url=fileURL,destfile=destfile,method='curl')}

destfile="FAQ_Page.pdf"
fileURL="https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf"
if(!file.exists(destfile)){download.file(url=fileURL,destfile=destfile,method='curl')}
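The three download steps above repeat the same check-then-download pattern. As a minimal sketch, the pattern could be wrapped in a small helper. The name `download_if_missing` is hypothetical and not part of the original analysis, and extra arguments such as `method = "curl"` are simply passed through to `download.file`.

```r
# Hypothetical helper: download `url` to `destfile` only when the file is absent.
# Returns TRUE if a download was attempted, FALSE if the file was already there.
download_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    return(FALSE)
  }
  download.file(url = url, destfile = destfile, ...)
  TRUE
}

# Example call for the storm data file (commented out to avoid network access):
# download_if_missing(fileURL, "StormData.csv.bz2", method = "curl")
```

Returning a logical makes it easy to log which files were fetched in a given run.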

Data processing and cleaning

First, load the data into the workspace:

events <- read.csv2(bzfile("StormData.csv.bz2"), sep = ",", na.strings = "NA")
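One thing worth checking after this step is how the numeric columns were parsed. `read.csv2` keeps its default `dec = ","` even when `sep` is overridden, so values written with a decimal point may be read as factor or character columns, and `as.numeric()` on a factor returns level codes rather than the original values. The sketch below illustrates the difference on a tiny inline sample; the values are hypothetical and only stand in for the real file.

```r
# Tiny inline CSV standing in for the real file (hypothetical values)
csv_text <- "EVTYPE,FATALITIES,INJURIES\nTORNADO,0.00,15.00\nHAIL,1.00,0.00"

# read.csv2 assumes "," as the decimal mark, so "15.00" is not parsed as a number
with_csv2 <- read.csv2(text = csv_text, sep = ",")

# read.csv assumes "." as the decimal mark, so the same columns come back numeric
with_csv <- read.csv(text = csv_text)

# Compare how each reader typed the two count columns
sapply(with_csv2[c("FATALITIES", "INJURIES")], is.numeric)
sapply(with_csv[c("FATALITIES", "INJURIES")], is.numeric)
```

Running `str(events)` on the loaded data gives the same information for the real columns.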

The raw event types are hard to aggregate as they stand, so we clean them first with two main actions:

Remove spaces at the beginning and end of the fields
Convert every letter except the first of each word to lowercase

trim <- function(x) {
    gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}

events <- transform(events, EVTYPE = trim(EVTYPE) )
events <- transform(events, EVTYPE = gsubfn("\\B.", tolower, EVTYPE, perl = TRUE ) )
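To make the effect of the two cleaning steps concrete, here is a small sketch on sample strings. It uses a base R `gsub` call with the Perl `\\L` replacement as a stand-in for the `gsubfn` call above, which is an assumption made so the example runs without extra packages. The sample strings are illustrative only.

```r
# Same trimming helper as in the analysis
trim <- function(x) {
  gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}

# Base R stand-in for the gsubfn step: lowercase every character
# that does not sit at a word boundary (i.e. all but word-initial letters)
lower_inner <- function(x) {
  gsub("\\B(.)", "\\L\\1", x, perl = TRUE)
}

samples <- c("  TSTM WIND ", "TORNADO", "flash flood")
cleaned <- lower_inner(trim(samples))
cleaned
# "Tstm Wind" "Tornado" "flash flood"
```

Note that word-initial letters are left untouched, so an entry that starts in lowercase stays in lowercase.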

The next step is to summarize the data so that the results are easy to present. First we calculate the total number of fatalities and injuries per event type:

injuries_summary <- ddply(events, .(EVTYPE), summarise, tot_fatalities = sum(as.numeric(FATALITIES)), tot_injuries = sum(as.numeric(INJURIES)))
injuries_summary <- transform(injuries_summary, total = tot_fatalities + tot_injuries)

And then the total economic damage per event type:

economic_summary <- ddply(events, .(EVTYPE), summarise, tot_properties_dmg = sum(as.numeric(PROPDMG)), tot_crop_dmg = sum(as.numeric(CROPDMG)))
economic_summary <- transform(economic_summary, total = tot_properties_dmg + tot_crop_dmg )
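The grouping logic of both summaries can be illustrated on a toy data frame. The sketch below uses base R `aggregate` in place of `ddply`, which is a substitution made here so the example needs no extra packages, and the values are hypothetical rather than taken from the storm data.

```r
# Toy data (hypothetical values, not taken from the storm data set)
toy <- data.frame(
  EVTYPE     = c("Tornado", "Tornado", "Hail", "Hail", "Flood"),
  FATALITIES = c(2, 1, 0, 0, 3),
  INJURIES   = c(10, 5, 1, 0, 4)
)

# Sum both columns within each event type
toy_summary <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                         data = toy, FUN = sum)

# Add the combined total, mirroring the `total` column in the analysis
toy_summary$total <- toy_summary$FATALITIES + toy_summary$INJURIES
toy_summary
```

Each event type ends up as one row holding its per-column sums and their combined total.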

Now we are ready to move on to the results.

Results

For the results we focus on the impact of these events in two main areas:

The impact on public health
The economic impact

Public Health Impact

The 10 most harmful events are the following:

most_harmful <- injuries_summary[sort(injuries_summary$total, decreasing = TRUE, index.return=TRUE)$ix,][1:10,]
most_harmful

##                 EVTYPE tot_fatalities tot_injuries  total
## 767            Tornado          79987       586102 666089
## 213               Hail         288719       301257 589976
## 788          Tstm Wind         221439       334853 556292
## 694  Thunderstorm Wind          82944       103332 186276
## 134        Flash Flood          58316        70707 129023
## 421          Lightning          17273        96160 113433
## 150              Flood          27283        34159  61442
## 322          High Wind          21123        38413  59536
## 720 Thunderstorm Winds          21021        34499  55520
## 276         Heavy Snow          16196        24309  40505
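The top-10 selection sorts the summary by `total` and keeps the first rows. An equivalent and arguably more common idiom uses `order()` with `head()`. The sketch below shows it on toy data; `top_n_rows` is a hypothetical helper name and the values are illustrative.

```r
# Toy summary (hypothetical values)
toy <- data.frame(
  EVTYPE = c("Flood", "Hail", "Tornado", "Lightning"),
  total  = c(7, 1, 18, 4)
)

# Hypothetical helper: return the n rows with the largest values in `column`
top_n_rows <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE), , drop = FALSE]
  head(ranked, n)
}

top2 <- top_n_rows(toy, "total", n = 2)
top2
```

Unlike indexing with `[1:10, ]`, `head()` does not pad the result with `NA` rows when fewer than `n` rows are available.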

Let's plot the most harmful events:

p_health <- ggplot(data = most_harmful, aes(x = EVTYPE)) +
    geom_point(aes(y = total/1000, color = "Total"), size = 5) +
    geom_point(aes(y = tot_injuries/1000, color = "Injuries"), size = 5, shape = 24) +
    geom_point(aes(y = tot_fatalities/1000, color = "Fatalities"), size = 5, shape = 8) +
    ylab("Thousands of People") +
    xlab("Event Type") +
    theme_bw(base_family = "Times") +
    ggtitle("Top 10 most harmful events for health")
plot(p_health)

Economic Impact

The 10 events with the greatest economic impact are:

most_economic <- economic_summary[sort(economic_summary$total, decreasing = TRUE, index.return=TRUE)$ix,][1:10,]
most_economic

##                 EVTYPE tot_properties_dmg tot_crop_dmg    total
## 788          Tstm Wind           30962589       831227 31793816
## 694  Thunderstorm Wind           21693662       248867 21942529
## 767            Tornado           20043516       318943 20362459
## 213               Hail           11965969      2135137 14101106
## 134        Flash Flood           11759910       456151 12216061
## 720 Thunderstorm Winds            7928514       181908  8110422
## 150              Flood            5727502       359155  6086657
## 421          Lightning            5960289        33401  5993690
## 322          High Wind            3045928        58634  3104562
## 612        Strong Wind            1400798        16728  1417526

Let's represent those events in a graph:

p_economy <- ggplot(data = most_economic, aes(x = EVTYPE)) +
    geom_point(aes(y = total/10^6, color = "Total Damages"), size = 5) +
    geom_point(aes(y = tot_properties_dmg/10^6, color = "Properties Damages"), size = 5, shape = 24) +
    geom_point(aes(y = tot_crop_dmg/10^6, color = "Crop Damages"), size = 8, shape = 4) +
    ylab("Event Damages [Millions of $]") +
    xlab("Event Type") +
    theme_bw(base_family = "Times") +
    ggtitle("The 10 events with more economic impact")
plot(p_economy)

Conclusions

We found that Tstm Wind, Thunderstorm Wind, Tornado, Hail and Flash Flood are the top 5 most harmful events for people and also the five with the greatest economic impact. Although they do not occupy the same positions in both rankings, the overlap suggests a correlation between the economic damage and the harm to people caused by these events. Our findings suggest that we should focus our efforts on improving the response when these phenomena occur.

RepData_PeerAssessment2

Jose A. Ruiperez Valiente

23/10/2015

Synopsis