Exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database

Author: Praveen K Sharma
Date: Jan 25, 2015

Synopsis

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database records information about major storms and weather events in the US. Such events cause fatalities and injuries as well as economic loss. As part of this assignment, we will dive into the dataset published by NOAA and study the effect of weather events on population health and on the economy.

Data Processing

This section describes how the data was loaded into R and processed for analysis.

library(dplyr)    # mutate, group_by, summarise, arrange
library(R.utils)  # bunzip2

One-time download of the data

setwd("C:/apk/Education/Coursera/Rwork/RepData_PeerAssessment2")
if (!file.exists("data")) {
  dir.create("data")
}

fileurl="http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("./data/StormData.csv.bz2")) {
  download.file(fileurl,"./data/StormData.csv.bz2", mode="wb")
}

if (!file.exists("./data/StormData.csv")) {
    bunzip2("./data/StormData.csv.bz2", destname = "./data/StormData.csv", overwrite=FALSE, remove=FALSE)
}

Read the file and cache it for future use

stormdat <- read.csv("./data/StormData.csv")
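The read.csv call above does not cache anything by itself; presumably the caching was handled by the report tooling. One explicit alternative is sketched below, under that assumption. The helper name read_cached, the .rds suffix, and the throwaway sample values are all hypothetical and not part of the original analysis.

```r
# Hypothetical helper: read a CSV once, then reuse a cached .rds copy
read_cached <- function(csv_path, rds_path = paste0(csv_path, ".rds")) {
  if (file.exists(rds_path)) {
    return(readRDS(rds_path))
  }
  dat <- read.csv(csv_path)
  saveRDS(dat, rds_path)
  dat
}

# Demonstration on a throwaway file with invented values
tmp_csv <- tempfile(fileext = ".csv")
write.csv(data.frame(EVTYPE = c("HAIL", "FLOOD"), FATALITIES = c(0, 2)),
          tmp_csv, row.names = FALSE)
first  <- read_cached(tmp_csv)   # parses the CSV and writes the cache
second <- read_cached(tmp_csv)   # served from the .rds file
```

With the real data, the same helper could be pointed at "./data/StormData.csv" so that later runs skip the slow CSV parse.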

Tidy the data for further analysis and draw a histogram to get an idea of how the observations are distributed over time

stormdat <- mutate(stormdat, BGN_YEAR = as.numeric(format(as.Date(stormdat$BGN_DATE, format = "%m/%d/%Y %H:%M:%S"), "%Y")))
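As a small illustration of the year extraction, the sketch below applies the same parsing steps to a few made-up date strings. It assumes the BGN_DATE values follow the "%m/%d/%Y %H:%M:%S" pattern used above; the name sample_dates and its values are illustrative and not taken from the dataset.

```r
# Invented date strings in the month/day/year format assumed above
sample_dates <- c("4/18/1950 0:00:00", "11/15/1975 0:00:00", "6/1/2011 0:00:00")

# Parse to Date, then keep only the four-digit year as a number
parsed <- as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S")
years  <- as.numeric(format(parsed, "%Y"))
print(years)
```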

hist(stormdat$BGN_YEAR, main="Number of observations by year", xlab="Calendar Year", ylab="Number of Observations", breaks=60)

As the histogram above shows, the earlier years contain relatively few data points even though data collection started in 1950. Restricting the analysis to the data from 1975 onwards would therefore give a more reasonable basis for conclusions.

To find the effect of events on population health, I will consider the FATALITIES and INJURIES variables.

# Filter to 1975 onwards (left disabled here, so the totals below cover all years)
# datset <- stormdat[stormdat$BGN_YEAR >= 1975, ]
stormdat <- mutate(stormdat, POP_HEALTH = (stormdat$FATALITIES + stormdat$INJURIES))
by_evtype <- group_by(stormdat, EVTYPE)
ResultPopHealth <- summarise(by_evtype, totalhealth=sum(POP_HEALTH))

ResultPopHealth <- arrange(ResultPopHealth, desc(totalhealth))
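The 1975 filter is commented out in the code above. The sketch below shows how the filter and the per-event aggregation could fit together on a toy data frame. It uses base R's aggregate instead of dplyr only to stay self-contained, and every value in toy is invented for illustration.

```r
# Toy data: values are invented purely for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  BGN_YEAR   = c(1960, 1980, 1990, 1995, 1970),
  FATALITIES = c(5, 2, 1, 4, 3),
  INJURIES   = c(50, 20, 0, 10, 2),
  stringsAsFactors = FALSE
)

# Keep only observations from 1975 onwards
recent <- toy[toy$BGN_YEAR >= 1975, ]

# Combined health impact per observation
recent$POP_HEALTH <- recent$FATALITIES + recent$INJURIES

# Sum by event type and sort from most to least harmful
totals <- aggregate(POP_HEALTH ~ EVTYPE, data = recent, FUN = sum)
totals <- totals[order(-totals$POP_HEALTH), ]
print(totals)
```

Whether the filter changes the ranking on the real data is not something this toy example can show; it only demonstrates the mechanics.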

To find the effect of events on the economy, I will consider the PROPDMG and CROPDMG variables, scaled by the multipliers encoded in PROPDMGEXP and CROPDMGEXP respectively.

# Take Property Exponents into account
stormdat$PROPEXP[stormdat$PROPDMGEXP == "B"] <- 1e+09
stormdat$PROPEXP[stormdat$PROPDMGEXP == "8"] <- 1e+08
stormdat$PROPEXP[stormdat$PROPDMGEXP == "7"] <- 1e+07
stormdat$PROPEXP[stormdat$PROPDMGEXP == "M"] <- 1e+06
stormdat$PROPEXP[stormdat$PROPDMGEXP == "m"] <- 1e+06
stormdat$PROPEXP[stormdat$PROPDMGEXP == "6"] <- 1e+06
stormdat$PROPEXP[stormdat$PROPDMGEXP == "5"] <- 1e+05
stormdat$PROPEXP[stormdat$PROPDMGEXP == "4"] <- 10000
stormdat$PROPEXP[stormdat$PROPDMGEXP == "2"] <- 100
stormdat$PROPEXP[stormdat$PROPDMGEXP == "3"] <- 1000
stormdat$PROPEXP[stormdat$PROPDMGEXP == "K"] <- 1000
stormdat$PROPEXP[stormdat$PROPDMGEXP == "h"] <- 100
stormdat$PROPEXP[stormdat$PROPDMGEXP == "H"] <- 100
stormdat$PROPEXP[stormdat$PROPDMGEXP == "1"] <- 10
stormdat$PROPEXP[stormdat$PROPDMGEXP == ""] <- 1
stormdat$PROPEXP[stormdat$PROPDMGEXP == "0"] <- 1
stormdat$PROPEXP[stormdat$PROPDMGEXP == "+"] <- 0
stormdat$PROPEXP[stormdat$PROPDMGEXP == "-"] <- 0
stormdat$PROPEXP[stormdat$PROPDMGEXP == "?"] <- 0

# Take Crop Exponents into account
stormdat$CROPEXP[stormdat$CROPDMGEXP == "B"] <- 1e+09
stormdat$CROPEXP[stormdat$CROPDMGEXP == "M"] <- 1e+06
stormdat$CROPEXP[stormdat$CROPDMGEXP == "m"] <- 1e+06
stormdat$CROPEXP[stormdat$CROPDMGEXP == "K"] <- 1000
stormdat$CROPEXP[stormdat$CROPDMGEXP == "k"] <- 1000
stormdat$CROPEXP[stormdat$CROPDMGEXP == "2"] <- 100
stormdat$CROPEXP[stormdat$CROPDMGEXP == "0"] <- 1
stormdat$CROPEXP[stormdat$CROPDMGEXP == ""] <- 1
stormdat$CROPEXP[stormdat$CROPDMGEXP == "?"] <- 0

stormdat <- mutate(stormdat, ECONLOSS = (stormdat$PROPDMG*stormdat$PROPEXP + stormdat$CROPDMG*stormdat$CROPEXP))
by_evtype <- group_by(stormdat, EVTYPE)
ResultEcon <- summarise(by_evtype, totaleconloss=sum(ECONLOSS/1000000))

ResultEcon <- arrange(ResultEcon, desc(totaleconloss))
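The exponent handling above can also be expressed as a single lookup table. The following is a minimal sketch, not the code used for the results: exp_lookup and exp_to_multiplier are hypothetical names, the multipliers mirror the assignments above, and one table is shared by both columns (so it includes "k", which appears only in the crop assignments). Codes missing from the table come back as NA so that they are not silently ignored.

```r
# Multiplier for each exponent code, mirroring the assignments above
exp_lookup <- c(
  "B" = 1e9, "8" = 1e8, "7" = 1e7,
  "M" = 1e6, "m" = 1e6, "6" = 1e6,
  "5" = 1e5, "4" = 1e4,
  "3" = 1e3, "K" = 1e3, "k" = 1e3,
  "2" = 1e2, "H" = 1e2, "h" = 1e2,
  "1" = 10,  "0" = 1,
  "+" = 0,   "-" = 0,   "?" = 0
)

# Hypothetical helper: convert a vector of exponent codes to multipliers
exp_to_multiplier <- function(codes) {
  codes <- as.character(codes)
  mult <- unname(exp_lookup[codes])   # unknown codes become NA
  mult[which(codes == "")] <- 1       # blank exponent: value is used as-is
  mult
}

# Usage on invented damage figures
dmg  <- c(25, 2.5, 10, 3)
code <- c("K", "M", "", "?")
loss <- dmg * exp_to_multiplier(code)
print(loss)
```

A table like this keeps the mapping in one place, which makes it easier to review the treatment of ambiguous codes such as "+", "-" and "?".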

Results

Q1: Weather events that affect population health most adversely.

PopHealthTop10 <- head(ResultPopHealth, n=10)
PopHealthBot10 <-tail(ResultPopHealth, n=10)

par(mfrow = c(2, 1))

barplot(PopHealthTop10$totalhealth, names.arg=PopHealthTop10$EVTYPE, main="Top 10 events affecting Population Health", cex.names = 0.80, las=2)

barplot(PopHealthBot10$totalhealth, names.arg=PopHealthBot10$EVTYPE, main="Bottom 10 events affecting Population Health", cex.names = 0.80, las=2)

As the graphs above show, tornadoes have been the leading cause of harm to population health. In fact, tornadoes are responsible for more fatalities and injuries than the next nine events in the top 10 list combined.
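A comparison of the leading event against the rest of a top-ten list can be checked numerically. The sketch below is only an illustration: top_share is a hypothetical helper and toy_totals holds invented numbers, not values from the storm data.

```r
# Invented totals, sorted from largest to smallest, for illustration only
toy_totals <- c(TORNADO = 900, HEAT = 120, FLOOD = 90, WIND = 60, ICE = 30)

# Hypothetical helper: share of the top-n total contributed by the first entry
top_share <- function(sorted_totals, n = 10) {
  top_n <- head(sorted_totals, n)
  unname(top_n[1] / sum(top_n))
}

share <- top_share(toy_totals)
share > 0.5   # TRUE means the leader exceeds the rest of the list combined
```

With the real results, the same helper could be applied to PopHealthTop10$totalhealth.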

At the other end of the spectrum, it is interesting to note (from the results shown above) that the wind and winter event types at the bottom of the ranking have virtually no recorded harmful effect on population health.

Q2: Weather Events that cause most Economic Loss

The following plot displays the ten events that cause the most economic loss, in millions of USD.

ResultEconTop10 <- head(ResultEcon, n=10)
barplot(ResultEconTop10$totaleconloss, names.arg=ResultEconTop10$EVTYPE, main="Top 10 events for economic loss", ylab="Total Loss in USD(million)", cex.names = 0.80, las=2)

As the plot shows, floods cause the most economic damage.