Synopsis

The aim of this assignment is to analyze the NOAA Storm database with respect to severe weather events across the United States. The analysis is split into two parts.
Part 1 explores which types of weather events have the greatest impact on population health, based on the “EVTYPE”, “INJURIES” and “FATALITIES” columns of the Storm database. Part 2 explores which types of weather events cause the greatest damage to property and crops, based on the “EVTYPE”, “PROPDMG”, “PROPDMGEXP”, “CROPDMG” and “CROPDMGEXP” columns of the Storm database.

Loading and processing data

This section describes how the data was loaded into R and processed for analysis. The analysis uses data from the NOAA Storm database.

##Loading libraries used in processing of data
suppressMessages(library(R.utils))
suppressMessages(library(dplyr))
library(car)
##Downloading data and unzipping 
#download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile="stormdata.csv.bz2")
#bunzip2("stormdata.csv.bz2", overwrite=T, remove=F)

Reading raw data

data<-read.csv("C:\\Users\\shenoy\\Desktop\\John Hopkins Data Science Course\\5 Reproducible Research\\repdata-data-StormData.csv\\stormdata.csv")

Processing Data for analysis

Processing data to calculate the total injuries and fatalities for the top ten weather event types

##Calculating top ten total injuries

totalinjury<-tapply(data$INJURIES,data$EVTYPE,sum)
injury<-sort.int(totalinjury,decreasing=TRUE)
topinjury<-injury[1:10]

##Calculating top ten total fatalities
totalfatality<-tapply(data$FATALITIES,data$EVTYPE,sum)
fatality<-sort.int(totalfatality,decreasing=TRUE)
topfatality<-fatality[1:10]
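
The aggregation pattern used above (sum per event type, sort in decreasing order, keep the first ten) can be illustrated on a small made-up data frame. The `toy` data and its values are invented for illustration and are not taken from the Storm database.

```r
# Hypothetical toy data, not from the NOAA Storm database
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES = c(10, 2, 5, 0, 3)
)

# Sum injuries per event type, then sort in decreasing order
toytotal  <- tapply(toy$INJURIES, toy$EVTYPE, sum)
toysorted <- sort(toytotal, decreasing = TRUE)

# head() returns at most ten entries and does not pad with NA
# when fewer than ten event types exist
toytop <- head(toysorted, 10)
print(toytop)
```

Using `head(x, 10)` instead of `x[1:10]` is a small design choice: both give the same result on the full dataset, but `head()` does not introduce missing entries when fewer than ten groups are present.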

Processing data to calculate the total cost of damage for both property and crops. A clean subset of the original dataset is created by removing rows with blank and unknown values. The exponent codes in the columns named “PROPDMGEXP” and “CROPDMGEXP” are then replaced with appropriate numeric values.

## cleaning data for calculating economic damage
## Creating a cleaned subset, by selecting the required columns

ecodata<-select(data,EVTYPE,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)

## removing rows with blanks and unknown values

propdata<-ecodata[!(ecodata$PROPDMGEXP %in% c("","-","+","?")),1:3]
cropdata<-ecodata[!(ecodata$CROPDMGEXP %in% c("","-","+","?")),c(1,4,5)]
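
A note on the row filter: dropping rows by negating the result of `which()` behaves unexpectedly when no row matches, because negating an empty index selects zero rows. The sketch below shows this on a made-up data frame (`toy` and `bad` are illustrative names, not part of the original analysis) and compares it with a logical mask.

```r
# Hypothetical toy data containing no blank or unknown exponent codes
toy <- data.frame(PROPDMG = c(1, 2, 3), PROPDMGEXP = c("K", "M", "B"))
bad <- c("", "-", "+", "?")

# which() returns integer(0) here, and toy[-integer(0), ] selects zero rows
dropped <- toy[-which(toy$PROPDMGEXP %in% bad), ]
print(nrow(dropped))

# A logical mask keeps every row whose code is not in the unwanted set
kept <- toy[!(toy$PROPDMGEXP %in% bad), ]
print(nrow(kept))
```

On the Storm data the unwanted codes do occur, so both forms give the same rows there; the logical mask is simply the safer habit.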


## Mapping letter codes to powers of ten (h=2, k=3, m=6, b=9); digit codes are kept as exponents
propdata$PROPDMGEXP<-car::recode(as.character(propdata$PROPDMGEXP),"c('h','H')=2;c('k','K')=3;c('m','M')=6;c('b','B')=9")
cropdata$CROPDMGEXP<-car::recode(as.character(cropdata$CROPDMGEXP),"c('h','H')=2;c('k','K')=3;c('m','M')=6;c('b','B')=9")

Calculating cost of Property and Crop damage of top ten events

##Calculating cost of Property Damage & Crop Damage

propdata<-mutate(propdata,Propcost=PROPDMG*10^as.numeric(PROPDMGEXP))
cropdata<-mutate(cropdata,Cropcost=CROPDMG*10^as.numeric(CROPDMGEXP))
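
To make the cost formula concrete, the sketch below converts a few made-up damage records to dollar amounts in base R. It assumes the common reading of the exponent column, in which H, K, M and B stand for 10^2, 10^3, 10^6 and 10^9 and a digit code is itself the power of ten. The lookup vector `expo` and the toy records are illustrative and not part of the original analysis.

```r
# Hypothetical toy records, not from the NOAA Storm database
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 3),
  PROPDMGEXP = c("K", "M", "b", "2"),
  stringsAsFactors = FALSE
)

# Lookup from letter code to power of ten (assumed interpretation)
expo <- c(H = 2, K = 3, M = 6, B = 9)

# Letter codes go through the lookup; digit codes are used as exponents directly
code  <- toupper(toy$PROPDMGEXP)
power <- ifelse(code %in% names(expo),
                expo[code],
                suppressWarnings(as.numeric(code)))

# Cost in dollars = reported amount times ten to the power of the exponent
toy$Propcost <- toy$PROPDMG * 10^power
print(toy)
```

Under these assumptions, 25 with code K becomes 25,000 and 2.5 with code M becomes 2,500,000.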

##Total Cost of damage

totalprop<-tapply(propdata$Propcost,propdata$EVTYPE,sum)
propsorted<-sort.int(totalprop,decreasing=TRUE)
topprop<-head(propsorted,10)
totalcrop<-tapply(cropdata$Cropcost,cropdata$EVTYPE,sum)
## named cropsorted rather than c, to avoid shadowing the base function c()
cropsorted<-sort.int(totalcrop,decreasing=TRUE)
topcrop<-head(cropsorted,10)

Results

Plots for top ten Event Type Vs Population Health

inj <- barplot(topinjury,axisnames = FALSE ,ylab= "Total Injuries",main="Effect of Event type based on Injuries",col="#AFB927")
text(inj, par("usr")[3], labels = names(topinjury), srt = 45, adj = c(1.1,1.1), xpd = TRUE, cex=.6)

fat <- barplot(topfatality,axisnames = FALSE ,ylab= "Total Fatalities",main="Effect of Event type based on Fatalities",col="#85A8F9")
text(fat, par("usr")[3], labels = names(topfatality), srt = 45, adj = c(1.1,1.1), xpd = TRUE, cex=.6)

Inference

From the above plots it is seen that “TORNADO” causes the most injuries and the most fatalities.

Plots for Event Type Vs Economic Damage

##Plot for Property and Crop damage
par(mfrow=c(2,1))
propplot <- barplot(topprop,axisnames = FALSE ,ylab= "Total Damage Cost",main="Total cost of Property damage",col="#F7265E")
text(propplot, par("usr")[3], labels = c(names(topprop)), srt = 45, adj = c(1.1,1.1), xpd = TRUE, cex=.6)

cropplot <- barplot(topcrop,axisnames = FALSE ,ylab= "Total Damage Cost",main="Total cost of Crop damage",col="#8249E4")
text(cropplot, par("usr")[3], labels = c(names(topcrop)), srt = 45, adj = c(1.1,1.1), xpd = TRUE, cex=.6)
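
The plots in this report all follow the same pattern: draw the bars without axis names, then add rotated labels at the bar midpoints. As a sketch, this could be wrapped in a small helper; `plottop` is a hypothetical function name and the totals below are made up for illustration.

```r
# Hypothetical helper, not part of the original analysis
plottop <- function(values, ylab, main, col) {
  # barplot() returns the x positions of the bar midpoints
  mids <- barplot(values, axisnames = FALSE, ylab = ylab, main = main, col = col)
  # par("usr")[3] is the bottom edge of the plot region;
  # xpd = TRUE allows the rotated labels to be drawn below it
  text(mids, par("usr")[3], labels = names(values), srt = 45,
       adj = c(1.1, 1.1), xpd = TRUE, cex = 0.6)
  invisible(mids)
}

# Made-up totals for illustration only
toy <- c(TORNADO = 15, FLOOD = 5, HAIL = 1)
mids <- plottop(toy, "Total Injuries", "Toy example", "#AFB927")
```

Because the labels are always taken from `names(values)`, each plot is labelled from the same vector that supplies its bars.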

Inference

From the above plots, it is seen that FLASH FLOOD causes the most property damage and HAIL causes the most crop damage.