Synopsis:

The purpose of this document is to explore the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database and answer some basic questions about severe weather events.

Libraries:

We need a few libraries for the data exploration:

library(knitr)
library(plyr)
library(ggplot2)
library(bitops)
library(RCurl)

Data Processing:

The data (a 47 MB bzip2-compressed CSV file) can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. Assuming you have downloaded the file and uncompressed it with bunzip2 or a similar tool, load it as follows.
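If you would rather let R fetch the file, the following is a minimal sketch. The helper name load_storm_data and the default local file name are hypothetical. It relies on read.csv() being able to read a bzip2-compressed file directly, which makes the separate bunzip2 step optional.

```r
# Hypothetical helper: fetch the compressed Storm Data file once, then load it.
storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

load_storm_data <- function(path = "repdata-data-StormData.csv.bz2", url = storm_url) {
  if (!file.exists(path)) {
    # mode = "wb" keeps the binary archive intact on Windows
    download.file(url, destfile = path, mode = "wb")
  }
  # read.csv() opens the path with file(), which detects bzip2
  # compression and decompresses on the fly
  read.csv(path)
}

# data <- load_storm_data()
```

Downloading only when the file is missing means re-knitting the report does not fetch 47 MB every time.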

Loading and pre-processing the data:

data <- read.csv("repdata-data-StormData.csv")

Data summary and exploration:

A summary of the loaded data:

#summary(data)

The summary output is suppressed to keep the report short; run the line yourself to see it. The call is left in because it was part of the exploration. (knitr's out.height option controls figure size, not text output, which is why setting out.height=10 did not collapse this output.)
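A narrower look at just the columns used below is a reasonable compromise when the full summary is too long. The following sketch (the function name explore_columns is hypothetical) summarises the count and damage columns and tabulates the magnitude codes, which is how you would notice codes other than '', 'K', and 'M'. The small data frame in the usage lines is made up for illustration.

```r
# Hypothetical helper: compact summary of the columns this report uses
explore_columns <- function(df) {
  cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
  print(summary(df[, cols]))
  # Tabulate the magnitude codes so unexpected values stand out
  list(prop_codes = table(df$PROPDMGEXP, useNA = "ifany"),
       crop_codes = table(df$CROPDMGEXP, useNA = "ifany"))
}

# Made-up example rows; on the real data call explore_columns(data)
toy <- data.frame(FATALITIES = c(0, 1), INJURIES = c(3, 0),
                  PROPDMG = c(25, 2.5), CROPDMG = c(0, 1),
                  PROPDMGEXP = c("K", "B"), CROPDMGEXP = c("", "M"))
codes <- explore_columns(toy)
codes$prop_codes
```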

We concentrate on two measures: total human harm (fatalities and injuries) and economic impact (property and crop damage).

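First, put the property and crop damage estimates in a usable form. PROPDMG and CROPDMG hold a number whose magnitude is given by a code in PROPDMGEXP and CROPDMGEXP ('K' for thousands, 'M' for millions), so multiply each number by the value its code stands for: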

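# Convert PROPDMG and CROPDMG to dollar amounts using their magnitude codes:
# '' = x1, 'K' = x1,000, 'M' = x1,000,000.
# Any other code multiplies by 0, so those rows add nothing to the totals.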
data$real_propdmg<-data$PROPDMG*(1*(data$PROPDMGEXP=='') + 1000*(data$PROPDMGEXP=='K') + 1000000*(data$PROPDMGEXP=='M'))
data$real_cropdmg<-data$CROPDMG*(1*(data$CROPDMGEXP=='') + 1000*(data$CROPDMGEXP=='K') + 1000000*(data$CROPDMGEXP=='M'))
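The conversion above recognises only three codes. If the magnitude-code table shows others, a more general mapping could look like the following minimal sketch. It assumes that H, K, M, and B mean hundreds, thousands, millions, and billions in either upper or lower case; the helper name exp_multiplier is hypothetical. Codes outside that map still count as zero. Applying it to the real data would change the totals and rankings reported below, so it is shown only as an alternative.

```r
# Hypothetical helper: turn magnitude codes into dollar multipliers.
# Assumption: H/K/M/B = 1e2/1e3/1e6/1e9 in either case; "" means units.
# Unknown or missing codes map to 0, like the original conversion.
exp_multiplier <- function(code) {
  codes <- c("", "H", "K", "M", "B")
  mults <- c(1, 1e2, 1e3, 1e6, 1e9)
  # match() (unlike name indexing) does match the empty string
  idx <- match(toupper(trimws(as.character(code))), codes)
  ifelse(is.na(idx), 0, mults[idx])
}

# data$real_propdmg <- data$PROPDMG * exp_multiplier(data$PROPDMGEXP)
# data$real_cropdmg <- data$CROPDMG * exp_multiplier(data$CROPDMGEXP)
exp_multiplier(c("", "k", "M", "b", "?"))
```

match() is used rather than a named lookup vector because R never matches an empty string against names, so `x[""]` would return NA for the blank code.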

Next, aggregate the measures we care about by event type (EVTYPE): fatalities and injuries for human harm, and property and crop damage in dollars for economic impact:

sumdata <- ddply(data, .(EVTYPE), summarise,
                 sum_fatal   = sum(FATALITIES, na.rm = TRUE),
                 sum_injury  = sum(INJURIES, na.rm = TRUE),
                 prop_damage = sum(real_propdmg, na.rm = TRUE),
                 crop_damage = sum(real_cropdmg, na.rm = TRUE))
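For readers without plyr, a roughly equivalent aggregation in base R is sketched below on a made-up data frame (the values are illustrative, not from the Storm Database). Setting na.action = na.pass keeps rows with missing values, so na.rm = TRUE, passed through to sum(), handles them the way the ddply call does.

```r
# Made-up rows with the same column names as the processed Storm Data
toy <- data.frame(
  EVTYPE       = c("TORNADO", "HAIL", "TORNADO"),
  FATALITIES   = c(2, 0, 1),
  INJURIES     = c(10, 1, NA),
  real_propdmg = c(5e4, 2e3, 1e6),
  real_cropdmg = c(0, 5e2, 0)
)

# Sum each measure within each event type; na.pass keeps the NA row
# so that sum(..., na.rm = TRUE) drops just the missing value
sums <- aggregate(
  cbind(FATALITIES, INJURIES, real_propdmg, real_cropdmg) ~ EVTYPE,
  data = toy, FUN = sum, na.rm = TRUE, na.action = na.pass
)
names(sums) <- c("EVTYPE", "sum_fatal", "sum_injury", "prop_damage", "crop_damage")
sums
```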

Results:

Across the United States, which types of events (as indicated by the EVTYPE variable) are most harmful with respect to population health?

# Sort by fatalities (descending), breaking ties by injuries, to rank harm to human life
orderHuman <- sumdata[with(sumdata, order(-sum_fatal, -sum_injury)), ]

# Keep and print the top 5 event types
smallSubset <- head(orderHuman, 5)
smallSubset
##             EVTYPE sum_fatal sum_injury prop_damage crop_damage
## 834        TORNADO      5633      91346 51625660483   414953110
## 130 EXCESSIVE HEAT      1903       6525     7753700   492402000
## 153    FLASH FLOOD       978       1777 15140811717  1421317100
## 275           HEAT       937       2100     1797000     1461500
## 464      LIGHTNING       816       5230   928659283    12092090

Let’s take a look at that visually:

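# Plot the numeric totals (wrapping them in factor() would make the axis discrete)
ggplot(smallSubset, aes(x = reorder(EVTYPE, -(sum_injury + sum_fatal)),
                        y = sum_injury + sum_fatal, fill = EVTYPE)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Event Type") + ylab("Fatalities + injuries")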

Ranked by fatalities, the event types most harmful to human life and health are, in order, tornadoes, excessive heat, flash floods, heat, and lightning. Tornadoes dominate both measures, with 5,633 deaths and 91,346 injuries.
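The ranking above sorts by fatalities first. Adding fatalities and injuries, which is what the bar chart shows, orders the same five events differently. A quick check using the counts copied from the table:

```r
# Fatality and injury counts copied from the top-5 table above
harm <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  sum_fatal  = c(5633, 1903, 978, 937, 816),
  sum_injury = c(91346, 6525, 1777, 2100, 5230)
)
harm$total <- harm$sum_fatal + harm$sum_injury

# Rank by the combined count instead of fatalities alone
harm[order(-harm$total), c("EVTYPE", "total")]
```

By this measure lightning (6,046) moves up to third place and flash floods (2,755) drop to fifth.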

Similarly, across the United States, which types of events have the greatest economic consequences?

# Sort by property damage (descending), breaking ties by crop damage, to rank economic harm
orderEconomy <- sumdata[with(sumdata, order(-prop_damage, -crop_damage)), ]

# Keep and print the top 5 event types
smallSubset2 <- head(orderEconomy, 5)
smallSubset2
##          EVTYPE sum_fatal sum_injury prop_damage crop_damage
## 834     TORNADO      5633      91346 51625660483   414953110
## 170       FLOOD       470       6789 22157709807  5661968450
## 153 FLASH FLOOD       978       1777 15140811717  1421317100
## 244        HAIL        15       1361 13927366777  3025537453
## 402   HURRICANE        61         46  6168319010  2741910000

And to see that visually:

# Plot property damage as a number, in billions of dollars
ggplot(smallSubset2, aes(x = reorder(EVTYPE, -prop_damage),
                         y = prop_damage / 1e9, fill = EVTYPE)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Event Type") + ylab("Property damage (billion USD)")

Economically, TORNADO, FLOOD, FLASH FLOOD, HAIL, and HURRICANE cause the most property damage, in that order. Among these five, FLOOD causes by far the most crop damage (about $5.7 billion).
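The table is sorted by property damage, with crop damage used only to break ties. Adding the two, which is the economic-impact measure defined earlier, swaps HAIL and FLASH FLOOD among these five rows. The check below uses the dollar figures copied from the table. An event type outside this top five could still rank higher on the combined total, so the commented last line sketches how the ranking would be recomputed from sumdata.

```r
# Dollar figures copied from the top-5 economic table above
econ <- data.frame(
  EVTYPE      = c("TORNADO", "FLOOD", "FLASH FLOOD", "HAIL", "HURRICANE"),
  prop_damage = c(51625660483, 22157709807, 15140811717, 13927366777, 6168319010),
  crop_damage = c(414953110, 5661968450, 1421317100, 3025537453, 2741910000)
)
# Combined property + crop damage, in billions of dollars
econ$total_bn <- (econ$prop_damage + econ$crop_damage) / 1e9
econ[order(-econ$total_bn), c("EVTYPE", "total_bn")]

# On the full data:
# head(sumdata[order(-(sumdata$prop_damage + sumdata$crop_damage)), ], 5)
```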

Given the exploratory purpose of this report, it makes no specific recommendations.

Conclusion:

TORNADO and EXCESSIVE HEAT are the most damaging to human life, and TORNADO and FLOOD are the most damaging to property and crops. The dollar totals include only damage recorded with the magnitude codes '', 'K', or 'M'.