Author: Praveen K Sharma
Date: Jan 25, 2015
The U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database maintains information about major storms and weather events in the US. Such events cause fatalities and injuries along with economic loss. As part of this assignment, we will dive into the dataset published by NOAA and study the effect of weather events on population health and economic loss.
(Describes how the data was loaded into R and processed for analysis)
library(R.utils)  # provides bunzip2()
library(dplyr)    # provides mutate(), group_by(), summarise(), arrange(), desc()
One-time download of the data:
setwd("C:/apk/Education/Coursera/Rwork/RepData_PeerAssessment2")
if (!file.exists("data")) {
  dir.create("data")
}
fileurl <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Download the compressed file only if it is not already present
if (!file.exists("./data/StormData.csv.bz2")) {
  download.file(fileurl, "./data/StormData.csv.bz2", mode = "wb")
}
# Decompress next to the archive; by default bunzip2() strips the .bz2 extension
if (!file.exists("./data/StormData.csv")) {
  bunzip2("./data/StormData.csv.bz2", overwrite = FALSE, remove = FALSE)
}
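As a side note, base R's `read.csv` can read a bzip2-compressed file directly, so the explicit decompression step is optional. The sketch below shows this on a small temporary file. The toy data frame and the temporary file name are made up for illustration and are not part of the original analysis.

```r
# Toy data standing in for the storm file (illustrative values only)
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write a bzip2-compressed CSV to a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv detects the compression and reads the file without unpacking it
toy_back <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy_back)
```

Reading the archive directly avoids keeping a second, uncompressed copy on disk, at the cost of decompressing on every read.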
Read the file and cache it for future use
stormdat <- read.csv("./data/StormData.csv")
Tidy the data for further analysis and draw a histogram to get an idea of how the observations are distributed over time
stormdat <- mutate(stormdat, BGN_YEAR = as.numeric(format(as.Date(BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y")))
hist(stormdat$BGN_YEAR, main = "Number of observations by year", xlab = "Calendar Year", ylab = "Number of Observations", breaks = 60)
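To make the year extraction concrete, here is a minimal sketch of the same parse-then-format step on two sample strings. The sample values are illustrative and only assume the `%m/%d/%Y %H:%M:%S` layout used for `BGN_DATE`.

```r
# Sample values in the same "%m/%d/%Y %H:%M:%S" layout as BGN_DATE (illustrative)
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse to Date (the time part is matched by the format and then dropped)
parsed <- as.Date(bgn_date, format = "%m/%d/%Y %H:%M:%S")

# Pull out the four-digit year and convert it to a number
bgn_year <- as.numeric(format(parsed, "%Y"))
print(bgn_year)
```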
As the histogram above shows, the earlier years contain relatively few observations even though data collection started in 1950. Hence, restricting the analysis to 1975 onwards would be a reasonable choice.
In order to find the effect of events on population health, I will consider FATALITIES and INJURIES variables.
# Filter to 1975 onwards is left disabled, so the totals below use all years
# datset <- stormdat[stormdat$BGN_YEAR >= 1975, ]
# Combined harm per record: fatalities plus injuries
stormdat <- mutate(stormdat, POP_HEALTH = FATALITIES + INJURIES)
by_evtype <- group_by(stormdat, EVTYPE)
ResultPopHealth <- summarise(by_evtype, totalhealth = sum(POP_HEALTH))
ResultPopHealth <- arrange(ResultPopHealth, desc(totalhealth))
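The group-and-sum step can be illustrated on a toy data frame. The sketch below uses base R (`split` and `vapply`) rather than dplyr so that it runs without extra packages. It is an equivalent illustration of the idea, not the code used in this analysis, and all numbers are made up.

```r
# Toy records (made-up numbers, not taken from the NOAA data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 4, 1),
  INJURIES   = c(10, 3, 5, 0, 2),
  stringsAsFactors = FALSE
)

# Combined harm per record
toy$POP_HEALTH <- toy$FATALITIES + toy$INJURIES

# Sum the combined harm within each event type, then sort descending
totals <- vapply(split(toy$POP_HEALTH, toy$EVTYPE), sum, numeric(1))
totals <- sort(totals, decreasing = TRUE)
print(totals)
```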
In order to find the effect of events on economy, I will consider PROPDMG and CROPDMG variables adjusted according to PROPDMGEXP and CROPDMGEXP respectively.
# Take Property Exponents into account
stormdat$PROPEXP[stormdat$PROPDMGEXP == "B"] <- 1e+09
stormdat$PROPEXP[stormdat$PROPDMGEXP == "8"] <- 1e+08
stormdat$PROPEXP[stormdat$PROPDMGEXP == "7"] <- 1e+07
stormdat$PROPEXP[stormdat$PROPDMGEXP == "M"] <- 1e+06
stormdat$PROPEXP[stormdat$PROPDMGEXP == "m"] <- 1e+06
stormdat$PROPEXP[stormdat$PROPDMGEXP == "6"] <- 1e+06
stormdat$PROPEXP[stormdat$PROPDMGEXP == "5"] <- 1e+05
stormdat$PROPEXP[stormdat$PROPDMGEXP == "4"] <- 10000
stormdat$PROPEXP[stormdat$PROPDMGEXP == "2"] <- 100
stormdat$PROPEXP[stormdat$PROPDMGEXP == "3"] <- 1000
stormdat$PROPEXP[stormdat$PROPDMGEXP == "K"] <- 1000
stormdat$PROPEXP[stormdat$PROPDMGEXP == "h"] <- 100
stormdat$PROPEXP[stormdat$PROPDMGEXP == "H"] <- 100
stormdat$PROPEXP[stormdat$PROPDMGEXP == "1"] <- 10
stormdat$PROPEXP[stormdat$PROPDMGEXP == ""] <- 1
stormdat$PROPEXP[stormdat$PROPDMGEXP == "0"] <- 1
stormdat$PROPEXP[stormdat$PROPDMGEXP == "+"] <- 0
stormdat$PROPEXP[stormdat$PROPDMGEXP == "-"] <- 0
stormdat$PROPEXP[stormdat$PROPDMGEXP == "?"] <- 0
# Take Crop Exponents into account
stormdat$CROPEXP[stormdat$CROPDMGEXP == "B"] <- 1e+09
stormdat$CROPEXP[stormdat$CROPDMGEXP == "M"] <- 1e+06
stormdat$CROPEXP[stormdat$CROPDMGEXP == "m"] <- 1e+06
stormdat$CROPEXP[stormdat$CROPDMGEXP == "K"] <- 1000
stormdat$CROPEXP[stormdat$CROPDMGEXP == "k"] <- 1000
stormdat$CROPEXP[stormdat$CROPDMGEXP == "2"] <- 100
stormdat$CROPEXP[stormdat$CROPDMGEXP == "0"] <- 1
stormdat$CROPEXP[stormdat$CROPDMGEXP == ""] <- 1
stormdat$CROPEXP[stormdat$CROPDMGEXP == "?"] <- 0
# Economic loss per record in USD: damage amounts scaled by their multipliers
stormdat <- mutate(stormdat, ECONLOSS = PROPDMG * PROPEXP + CROPDMG * CROPEXP)
by_evtype <- group_by(stormdat, EVTYPE)
# Total loss per event type, in millions of USD
ResultEcon <- summarise(by_evtype, totaleconloss = sum(ECONLOSS / 1e6))
ResultEcon <- arrange(ResultEcon, desc(totaleconloss))
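The exponent handling above can also be expressed as a single lookup function. The sketch below is one possible compact alternative, not the code used in this analysis. The helper name `exp_multiplier` is hypothetical, and it follows the same mapping as the assignments above: letters map to powers of ten, a digit n maps to 10^n, an empty code maps to 1, and the symbols `+`, `-` and `?` map to 0.

```r
# Hypothetical helper (not in the original analysis): map an exponent code
# to its numeric multiplier, following the same table as the assignments above.
exp_multiplier <- function(code) {
  code <- as.character(code)
  letters_map <- c(B = 1e9, M = 1e6, m = 1e6, K = 1e3, k = 1e3, H = 1e2, h = 1e2)
  out <- rep(NA_real_, length(code))      # codes not listed anywhere stay NA

  is_letter <- code %in% names(letters_map)
  out[is_letter] <- letters_map[code[is_letter]]

  is_digit <- grepl("^[0-9]$", code)      # a single digit n means 10^n
  out[is_digit] <- 10^as.numeric(code[is_digit])

  out[code == ""] <- 1                    # no exponent: take the amount as-is
  out[code %in% c("+", "-", "?")] <- 0    # unusable codes contribute nothing
  out
}

# Worked example with made-up damage amounts
dmg   <- c(25, 2.5, 10, 7)
codes <- c("K", "B", "", "?")
loss_usd <- dmg * exp_multiplier(codes)
print(loss_usd)
```

In the worked example, an amount of 25 with code `K` becomes 25,000 USD and 2.5 with code `B` becomes 2.5 billion USD, which is the same scaling that `ECONLOSS` applies to each record.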
PopHealthTop10 <- head(ResultPopHealth, n = 10)
PopHealthBot10 <- tail(ResultPopHealth, n = 10)
par(mfrow = c(2, 1))
barplot(PopHealthTop10$totalhealth, names.arg = PopHealthTop10$EVTYPE, main = "Top 10 events affecting Population Health", cex.names = 0.80, las = 2)
barplot(PopHealthBot10$totalhealth, names.arg = PopHealthBot10$EVTYPE, main = "Bottom 10 events affecting Population Health", cex.names = 0.80, las = 2)
As we can see from the above graphs, tornadoes have been the leading cause of harm to population health. In fact, tornadoes are responsible for more fatalities and injuries than the next nine events in the top 10 list combined.
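A comparison of the leading event against the next nine can be checked with a one-line sum. The sketch below uses made-up totals purely to show the arithmetic. With the objects from this analysis, the vector would presumably be `PopHealthTop10$totalhealth`.

```r
# Made-up totals sorted in descending order (not the real results)
totalhealth <- c(100, 20, 15, 12, 10, 8, 6, 5, 4, 3)

# Combined total of ranks 2 through 10
next_nine <- sum(totalhealth[2:10])

# TRUE when the top event alone exceeds the next nine combined
leader_dominates <- totalhealth[1] > next_nine
print(c(leader = totalhealth[1], next_nine = next_nine))
print(leader_dominates)
```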
At the other end of the spectrum, it is interesting to note (from the results shown above) that the wind and winter event types in the bottom 10 have virtually no harmful effect on population health.
The following plot displays the ten events that cause the most economic loss, in millions of USD.
ResultEconTop10 <- head(ResultEcon, n=10)
barplot(ResultEconTop10$totaleconloss, names.arg = ResultEconTop10$EVTYPE, main = "Top 10 events for economic loss", ylab = "Total loss in USD (millions)", cex.names = 0.80, las = 2)
As we can see, floods cause the most economic damage.