Peer Graded Assignment: Course Project 2

Synopsis

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The goal of the assignment is to answer the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

This section describes how the raw data was loaded and transformed into the analysis data set.

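The data used in the analysis was downloaded from the following link:

Storm Data (http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2)

The compressed file was downloaded, decompressed and read from the working directory, and then processed as follows: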

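  1. The data set was reduced to the columns of interest: event type (EVTYPE), fatalities, injuries, property damage and crop damage, together with the two damage exponent columns.
  2. Only complete cases were used.
  3. The fatality, injury, property damage and crop damage columns were coerced to numeric (event type was left unchanged) so that their totals per event type could be computed.
  4. The property and crop damage values were then computed by converting the exponent codes (PROPDMGEXP and CROPDMGEXP) into multipliers, following the National Weather Service storm data documentation (K = thousands, M = millions, B = billions). Numeric codes were treated as powers of ten, H as hundreds (upper or lower case), and the invalid codes +, - and ? were assigned zero.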
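The exponent conversion in step 4 can also be written as a single lookup rather than one assignment per code. The sketch below is an illustrative alternative, not the code used in this report; the names `exp_codes`, `exp_mult` and `damage_value()` are hypothetical, and the code-to-multiplier mapping is assumed to be the one applied in the processing code of this report (invalid codes `+`, `-` and `?` map to 0).

```r
# Hypothetical lookup table: exponent code -> multiplier (sketch).
# Mirrors the mapping used in this report; "+", "-" and "?" are invalid -> 0.
exp_codes <- c("", "0", "1", "2", "3", "4", "5", "6", "7", "8",
               "h", "H", "k", "K", "m", "M", "B", "+", "-", "?")
exp_mult  <- c(1, 1, 1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8,
               1e2, 1e2, 1e3, 1e3, 1e6, 1e6, 1e9, 0, 0, 0)

# Convert damage amounts plus exponent codes into dollar values.
# match() is used instead of indexing by name, because an empty
# name ("") never matches when subsetting by name; unknown codes give NA.
damage_value <- function(amount, code) {
  amount * exp_mult[match(as.character(code), exp_codes)]
}

# gives 25000, 2500000, 10 and 0
damage_value(c(25, 2.5, 10, 3), c("K", "M", "", "?"))
```

Keeping the mapping in one table makes it easy to review and to reuse for both PROPDMGEXP and CROPDMGEXP, at the cost of treating the two columns identically.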
library(R.utils)
library(dplyr)
setwd("~/Data Science Coursera/Reporducible research/Peer Assignment 2")
# download the compressed file in binary mode, then decompress it (keeping the .bz2)
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              destfile = "stormData.csv.bz2", mode = "wb")
bunzip2("stormData.csv.bz2", overwrite = TRUE, remove = FALSE)
df <- read.csv("stormData.csv")

# keep only the event type, health and damage columns
z <- df[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
# drop rows with missing values
z <- z[complete.cases(z),]

# map the property damage exponent codes to numeric multipliers
z$PROPEXP[z$PROPDMGEXP == "K"] <- 1000
z$PROPEXP[z$PROPDMGEXP == "M"] <- 1e+06
z$PROPEXP[z$PROPDMGEXP == ""] <- 1
z$PROPEXP[z$PROPDMGEXP == "B"] <- 1e+09
z$PROPEXP[z$PROPDMGEXP == "m"] <- 1e+06
z$PROPEXP[z$PROPDMGEXP == "0"] <- 1
z$PROPEXP[z$PROPDMGEXP == "5"] <- 1e+05
z$PROPEXP[z$PROPDMGEXP == "6"] <- 1e+06
z$PROPEXP[z$PROPDMGEXP == "4"] <- 10000
z$PROPEXP[z$PROPDMGEXP == "2"] <- 100
z$PROPEXP[z$PROPDMGEXP == "3"] <- 1000
z$PROPEXP[z$PROPDMGEXP == "h"] <- 100
z$PROPEXP[z$PROPDMGEXP == "7"] <- 1e+07
z$PROPEXP[z$PROPDMGEXP == "H"] <- 100
z$PROPEXP[z$PROPDMGEXP == "1"] <- 10
z$PROPEXP[z$PROPDMGEXP == "8"] <- 1e+08
#assign zero to invalid exponent data
z$PROPEXP[z$PROPDMGEXP == "+"] <- 0
z$PROPEXP[z$PROPDMGEXP == "-"] <- 0
z$PROPEXP[z$PROPDMGEXP == "?"] <- 0
#calculate the property damage value
z$PROPDMG <- as.numeric(z$PROPDMG)
z$PROPDMGVAL <- z$PROPDMG * z$PROPEXP

# Transform the crop damage data

# map the crop damage exponent codes to numeric multipliers
z$CROPEXP[z$CROPDMGEXP == "M"] <- 1e+06
z$CROPEXP[z$CROPDMGEXP == "K"] <- 1000
z$CROPEXP[z$CROPDMGEXP == "m"] <- 1e+06
z$CROPEXP[z$CROPDMGEXP == "B"] <- 1e+09
z$CROPEXP[z$CROPDMGEXP == "0"] <- 1
z$CROPEXP[z$CROPDMGEXP == "k"] <- 1000
z$CROPEXP[z$CROPDMGEXP == "2"] <- 100
z$CROPEXP[z$CROPDMGEXP == ""] <- 1
# assign zero to invalid exponent data
z$CROPEXP[z$CROPDMGEXP == "?"] <- 0
# compute the crop damage value
z$CROPDMG <-as.numeric(z$CROPDMG)
z$CROPDMGVAL <- z$CROPDMG * z$CROPEXP

z$FATALITIES <- as.numeric(z$FATALITIES)
z$INJURIES <- as.numeric(z$INJURIES)
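Any exponent code that is not listed in the assignments above leaves its multiplier as NA, and a single NA makes the corresponding per-event sum() return NA. A quick check along the following lines can confirm that every code was mapped before summarising. It runs on a tiny made-up data frame (`chk`), and the code "X" is a hypothetical unmapped value, not one taken from the NOAA data.

```r
# Tiny made-up example; "X" stands in for an exponent code with no mapping.
chk <- data.frame(PROPDMGEXP = c("K", "M", "X"),
                  PROPDMG    = c(2, 1, 5),
                  stringsAsFactors = FALSE)

# Same assignment pattern as the report: unmapped codes stay NA.
chk$PROPEXP <- NA_real_
chk$PROPEXP[chk$PROPDMGEXP == "K"] <- 1000
chk$PROPEXP[chk$PROPDMGEXP == "M"] <- 1e+06
chk$PROPDMGVAL <- chk$PROPDMG * chk$PROPEXP

# Codes that never received a multiplier
unmapped <- unique(chk$PROPDMGEXP[is.na(chk$PROPEXP)])
print(unmapped)                           # "X"

# One NA is enough to turn the total into NA
print(sum(chk$PROPDMGVAL))                # NA
print(sum(chk$PROPDMGVAL, na.rm = TRUE))  # 1002000
```

On the analysis data, the equivalent check would be `unique(z$PROPDMGEXP[is.na(z$PROPEXP)])` (and likewise for CROPDMGEXP and CROPEXP); an empty result means every code was mapped.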

Data Analysis

Assessing the most harmful event types with respect to population health

Using the analysis data we aim to answer the following question:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To answer this, the total number of fatalities and the total number of injuries are computed for each event type. The event types are then ranked by each total separately; the event type with the highest totals is the one most harmful with respect to population health.
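The per-event totals described above can be illustrated on a handful of made-up records. This base-R sketch uses aggregate() rather than the dplyr pipeline used in this report, and the event counts are invented purely to show that the fatality and injury rankings need not agree.

```r
# Made-up records (not NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 6, 0),
  INJURIES   = c(10, 2, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Event type with the largest total for each measure
top_fatal  <- totals$EVTYPE[which.max(totals$FATALITIES)]
top_injury <- totals$EVTYPE[which.max(totals$INJURIES)]

print(totals)
cat("Most fatalities:", as.character(top_fatal),
    "\nMost injuries:", as.character(top_injury), "\n")
```

With these invented values HEAT leads on fatalities (6) while TORNADO leads on injuries (15), which is why the two totals are ranked separately rather than assumed to agree.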

# total fatalities per event type, sorted from highest to lowest
t <- z %>% group_by(EVTYPE) %>% summarise(SUM_FATALITIES = sum(FATALITIES)) %>%
  arrange(desc(SUM_FATALITIES), EVTYPE)
t[which.max(t$SUM_FATALITIES),]
## # A tibble: 1 x 2
##    EVTYPE SUM_FATALITIES
##    <fctr>          <dbl>
## 1 TORNADO           5633
# keep the top 10 event types for plotting
g <- t[1:10,]

# total injuries per event type, sorted from highest to lowest
m <- z %>% group_by(EVTYPE) %>% summarise(INJURIES = sum(INJURIES)) %>%
  arrange(desc(INJURIES), EVTYPE)
m[which.max(m$INJURIES),]
## # A tibble: 1 x 2
##    EVTYPE INJURIES
##    <fctr>    <dbl>
## 1 TORNADO    91346
# keep the top 10 event types for plotting
p <- m[1:10,]

# plot the top 10 event types by fatalities and by injuries side by side
par(mfrow=c(1,2))
barplot(g$SUM_FATALITIES, names.arg = g$EVTYPE,  cex.axis=1,  cex.names=0.3,main = "Top 10 Fatalities", ylab = "Sum fatalities", col = "red", las=3)
barplot(p$INJURIES, names.arg = p$EVTYPE,  cex.axis=1,  cex.names=0.5,main = "Top 10 Injuries", ylab = "Sum Injuries", col = "red",las = 3)

From the figures above it is evident that tornadoes account for the highest total number of fatalities (5,633) as well as the highest total number of injuries (91,346).

Assessing the event types with the greatest economic consequences

Using the analysis data we aim to answer the following question:

  1. Across the United States, which types of events have the greatest economic consequences?

To answer this, the total property damage and the total crop damage (in US dollars) are computed for each event type. The event types are then ranked by each total; the event type with the highest total is the one with the greatest economic consequences in that category.
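As a hedged extension of this approach, the two damage totals can also be added into a single economic figure per event type. The sketch below does this on made-up values with base R's aggregate(); the TOTALDMG column and the event values are hypothetical and not part of the original analysis. They show that the leading event type can change depending on whether property, crop or combined damage is ranked.

```r
# Made-up damage values in dollars (not NOAA data)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "FROST", "FLOOD"),
  PROPDMGVAL = c(5e6, 1e6, 0, 2e6),
  CROPDMGVAL = c(1e5, 2e6, 9e6, 0),
  stringsAsFactors = FALSE
)

# Per-event totals of property and crop damage
econ <- aggregate(cbind(PROPDMGVAL, CROPDMGVAL) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical combined measure, ranked from highest to lowest
econ$TOTALDMG <- econ$PROPDMGVAL + econ$CROPDMGVAL
econ <- econ[order(econ$TOTALDMG, decreasing = TRUE), ]
print(econ)
```

Here FLOOD leads on property damage, while FROST leads on both crop and combined damage. Note that the formula interface of aggregate() drops rows containing NA by default (na.action = na.omit), whereas sum() in the dplyr pipeline would return NA for an event type with a missing value.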

# Economic consequences
# total property damage (US dollars) per event type, sorted from highest to lowest
n <- z %>% group_by(EVTYPE) %>% summarise(PROPDMGVAL = sum(PROPDMGVAL)) %>%
  arrange(desc(PROPDMGVAL), EVTYPE)
n[which.max(n$PROPDMGVAL),]
## # A tibble: 1 x 2
##   EVTYPE   PROPDMGVAL
##   <fctr>        <dbl>
## 1  FLOOD 144657709807
# keep the top 10 event types for plotting
q <- n[1:10,]

# total crop damage (US dollars) per event type, sorted from highest to lowest
h <- z %>% group_by(EVTYPE) %>% summarise(CROPDMGVAL = sum(CROPDMGVAL)) %>%
  arrange(desc(CROPDMGVAL), EVTYPE)
h[which.max(h$CROPDMGVAL),]
# keep the top 10 event types for plotting
l <- h[1:10,]
# plot the top 10 event types by property damage and by crop damage side by side
par(mfrow=c(1,2))
barplot(q$PROPDMGVAL, names.arg = q$EVTYPE,  cex.axis=1,  cex.names=0.5,main = "Top 10 Property Damage", ylab = "Sum Property Damage", col = "blue",las = 3)
barplot(l$CROPDMGVAL, names.arg = l$EVTYPE,  cex.axis=1,  cex.names=0.5,main = "Top 10 Crop Damage", ylab = "Sum Crop Damage", col = "blue",las = 3)

From the figures it is evident that floods cause the highest total property damage, about $144.7 billion.

Discussion of Results

From the analysis, tornadoes are the most harmful event type with respect to population health, as they have caused the highest number of both fatalities and injuries.

Floods are the most harmful event type with respect to property damage.