NOAA Storm database analysis

Synopsis

This report provides a brief analysis of the US NOAA Storm Database, focusing on the event types with the greatest impact on population health and the economy. The data cover the period from 1950 to 2011 and are available at this link.

Data processing

The first step is to download the source file (if it is not already available locally) and to store the raw data in a dataframe.

library(dplyr)

# machine-specific path: uncomment and adapt it, or simply run from the folder holding the data
# setwd("C:/Users/agustin.izquierdo/Documents/R/coursera/Reproducible Research/week4")
# set the locale to English ("English" is a Windows locale name)
Sys.setlocale("LC_ALL","English")

## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

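# the source is a bzip2-compressed CSV; read.csv() decompresses it transparently
file <- "repdata_data_StormData.csv.bz2"
if (!file.exists(file))
{
  # mode = "wb" keeps the binary download intact on Windows
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = file, mode = "wb")
}

rawdata <- read.csv(file, header = TRUE, stringsAsFactors = FALSE)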

From the raw data we observe that we only need some of the variables (columns) for the analysis:
- EVTYPE: event type (type of storm)
- FATALITIES: number of deaths
- INJURIES: number of injuries
- PROPDMG: property damage (base value)
- PROPDMGEXP: property damage (exponent)
- CROPDMG: crop damage (base value)
- CROPDMGEXP: crop damage (exponent)

Therefore we create the base dataframe with the columns we need:

data<-rawdata[,c("EVTYPE", "FATALITIES", "INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

Let’s run a sanity check for missing values (NAs):

# check for missing values
sum(is.na(data$FATALITIES),is.na(data$INJURIES),is.na(data$PROPDMG),is.na(data$CROPDMG))

## [1] 0

So there are no NAs.
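A slightly more informative variant of the same check counts missing values per column, which shows immediately which variable is affected if NAs do appear. This is a minimal sketch on a small hypothetical data frame (`toy`) standing in for `data`:

```r
# hypothetical stand-in for the `data` frame built above
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, NA),
  INJURIES   = c(10, 1, 0),
  PROPDMG    = c(25, 5, 0),
  CROPDMG    = c(0, 3, NA),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix; colSums() counts the TRUEs per column
na_per_column <- colSums(is.na(toy))
print(na_per_column)
```

On the real data, `colSums(is.na(data))` would return one count per selected column.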

The PROPDMGEXP and CROPDMGEXP columns hold the exponent (order of magnitude) of PROPDMG and CROPDMG, so the damage values must be converted to obtain the actual amounts, according to the documentation available at this link. These are the distinct exponent codes:

unique(data$PROPDMGEXP)

##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

unique(data$CROPDMGEXP)

## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

Therefore, we convert the exponent codes into multipliers: H to 10^2, K to 10^3, M to 10^6 and B to 10^9, regardless of case. When the exponent is a single digit, we apply it directly as a power of ten. Any other code (+, -, ? or an empty string) leaves the value unscaled. After this transformation, PROPDMG and CROPDMG contain the damage amounts needed for the analysis.

# scale PROPDMG by its exponent code (upper or lower case)
for (i in seq_along(data$PROPDMGEXP))
{
  e <- toupper(data$PROPDMGEXP[i])
  if (e == "H") {data$PROPDMG[i] <- data$PROPDMG[i] * 10^2}
  if (e == "K") {data$PROPDMG[i] <- data$PROPDMG[i] * 10^3}
  if (e == "M") {data$PROPDMG[i] <- data$PROPDMG[i] * 10^6}
  if (e == "B") {data$PROPDMG[i] <- data$PROPDMG[i] * 10^9}
  # the column is character, so is.numeric() is never TRUE here: test for a digit instead
  if (grepl("^[0-9]$", e)) {data$PROPDMG[i] <- data$PROPDMG[i] * 10^as.numeric(e)}
}
# same conversion for CROPDMG
for (i in seq_along(data$CROPDMGEXP))
{
  e <- toupper(data$CROPDMGEXP[i])
  if (e == "H") {data$CROPDMG[i] <- data$CROPDMG[i] * 10^2}
  if (e == "K") {data$CROPDMG[i] <- data$CROPDMG[i] * 10^3}
  if (e == "M") {data$CROPDMG[i] <- data$CROPDMG[i] * 10^6}
  if (e == "B") {data$CROPDMG[i] <- data$CROPDMG[i] * 10^9}
  if (grepl("^[0-9]$", e)) {data$CROPDMG[i] <- data$CROPDMG[i] * 10^as.numeric(e)}
}
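The same mapping can also be expressed without a row-by-row loop, which is considerably faster on a dataset of this size. The following is a minimal sketch; `exp_to_multiplier()` is a hypothetical helper, not part of the original analysis, and it assumes the code-to-multiplier rules described above (letters H/K/M/B, single digits as powers of ten, anything else unscaled):

```r
# Hypothetical helper: turn a vector of exponent codes into multipliers in one pass
exp_to_multiplier <- function(code) {
  code <- toupper(code)
  mult <- rep(1, length(code))                      # "", "+", "-", "?" stay unscaled
  unit_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_unit <- code %in% names(unit_map)
  mult[is_unit] <- unit_map[code[is_unit]]          # look up letter codes by name
  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])   # single digit = power of ten
  mult
}

# usage on a few of the codes seen above
damage <- c(25, 2.5, 1.1, 3, 7, 4)
codes  <- c("K", "m", "B", "", "5", "?")
scaled <- damage * exp_to_multiplier(codes)
print(scaled)
```

Applied to the real columns this would be `data$PROPDMG <- data$PROPDMG * exp_to_multiplier(data$PROPDMGEXP)` (and likewise for CROPDMG), run on the unconverted values, that is, in place of the loops rather than after them.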

Results

We focus our analysis on the top 10 event types only (the most dangerous or the most economically damaging).

Storm effect on population health

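Let’s first aggregate fatalities and injuries by event type and keep only the top 10 event types, sorted in descending order. We then reorder each top 10 in ascending order for plotting, so the most harmful event type ends up in the last (10th) row.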

health_fatalities <- aggregate(FATALITIES ~ EVTYPE, data, sum)
top_health_effect_fatalities <- arrange(health_fatalities, desc(FATALITIES))[1:10, ]
top_health_effect_fatalities <- top_health_effect_fatalities[order(top_health_effect_fatalities$FATALITIES, decreasing = FALSE), ]

health_injuries <- aggregate(INJURIES ~ EVTYPE, data, sum)
top_health_effect_injuries <- arrange(health_injuries, desc(INJURIES))[1:10, ]
top_health_effect_injuries <- top_health_effect_injuries[order(top_health_effect_injuries$INJURIES, decreasing = FALSE), ]
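The three steps used above (aggregate, keep the largest totals, reorder ascending) are repeated for every variable in this report, so they could be wrapped in a single function. This is a minimal base-R sketch; `top_events()` and the `toy` rows are hypothetical and only illustrate the pattern:

```r
# Hypothetical helper: aggregate a column by EVTYPE, keep the n largest totals,
# then reorder ascending so the largest value ends up in the last row.
top_events <- function(df, column, n = 10) {
  totals <- aggregate(df[[column]], by = list(EVTYPE = df$EVTYPE), FUN = sum)
  names(totals)[2] <- column                       # aggregate() names the value column "x"
  top <- head(totals[order(totals[[column]], decreasing = TRUE), ], n)
  top <- top[order(top[[column]]), ]
  rownames(top) <- NULL
  top
}

toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "HEAT"),
  FATALITIES = c(3, 1, 4, 0, 2),
  stringsAsFactors = FALSE
)
top_fat <- top_events(toy, "FATALITIES", n = 3)
print(top_fat)
```

Using `head()` instead of `[1:10, ]` also avoids rows of NAs when there are fewer than `n` event types.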

And plot the result:
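The plotting code is not included in this report. One way to draw such a chart is a horizontal bar chart with base R's `barplot()`; the ascending order matters here because, with `horiz = TRUE`, the first row is drawn at the bottom. A minimal sketch, assuming a hypothetical three-row table with made-up values:

```r
# hypothetical table shaped like the top-10 tables above (ascending, largest last)
top <- data.frame(
  EVTYPE     = c("EVENT_A", "EVENT_B", "EVENT_C"),
  FATALITIES = c(10, 25, 60),
  stringsAsFactors = FALSE
)

pdf(NULL)                          # null device so the sketch runs headless; use png() to save a file
op <- par(mar = c(4, 8, 2, 1))     # wider left margin for the event-type labels
# with horiz = TRUE the first row is drawn at the bottom, so the largest (last) row ends up on top
bar_mids <- barplot(top$FATALITIES, names.arg = top$EVTYPE, horiz = TRUE, las = 1,
                    xlab = "Fatalities", main = "Top event types by fatalities")
par(op)
invisible(dev.off())
```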

This shows that tornadoes are, by far, the most harmful event type: they caused 5,633 fatalities and 91,346 injuries over the period.

top_health_effect_fatalities[10,]

##    EVTYPE FATALITIES
## 1 TORNADO       5633

top_health_effect_injuries[10,]

##    EVTYPE INJURIES
## 1 TORNADO    91346

Storm effect on economics

Following the same rationale, we aggregate crop and property damage by event type, keep only the top 10 event types (sorted in descending order) and then reorder them in ascending order for plotting.

eco_crop <- aggregate(CROPDMG ~ EVTYPE, data, sum)
top_eco_effect_crop <- arrange(eco_crop, desc(CROPDMG))[1:10, ]
top_eco_effect_crop <- top_eco_effect_crop[order(top_eco_effect_crop$CROPDMG, decreasing = FALSE), ]

eco_prop <- aggregate(PROPDMG ~ EVTYPE, data, sum)
top_eco_effect_prop <- arrange(eco_prop, desc(PROPDMG))[1:10, ]
top_eco_effect_prop <- top_eco_effect_prop[order(top_eco_effect_prop$PROPDMG, decreasing = FALSE), ]

And plot the result:

This shows that floods have the largest overall economic impact (150,319,678,257 USD over the period). Droughts cause more crop damage than floods, but floods account for the largest total economic damage.
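The code above ranks property and crop damage separately. One way to obtain a combined per-event total, like the overall figure quoted here, is to sum both columns in a single `aggregate()` call. A minimal sketch on hypothetical rows (`toy`, values in USD after the exponent conversion):

```r
# hypothetical rows standing in for `data` after the exponent conversion
toy <- data.frame(
  EVTYPE  = c("FLOOD", "DROUGHT", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMG = c(5e6, 0, 2e6, 1e5, 3e5),
  CROPDMG = c(1e5, 4e6, 0, 2e6, 1e5),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side of the formula sums both columns per event type
eco_total <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, data = toy, FUN = sum)
eco_total$TOTALDMG <- eco_total$PROPDMG + eco_total$CROPDMG
eco_total <- eco_total[order(eco_total$TOTALDMG, decreasing = TRUE), ]
print(eco_total)
```

In this toy example one event type leads in crop damage while another leads in the combined total, which is the kind of pattern the combined ranking makes visible.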

Summary

From the data analyzed we can conclude that, across the US in the period analyzed:
- Tornadoes are the most harmful storm event, with 96,979 fatalities and injuries combined.
- Floods are the storm event with the biggest economic impact, with 150,319,678,257 USD in damage.