Synopsis

This document summarizes the findings of an analysis of the NOAA storm database, which lists severe weather events in the United States from 1950 to 2011. Tornadoes are the most harmful events for the population, with over 5,600 fatalities and 91,000 injuries recorded over that period. Tornadoes also have the biggest economic impact, with over $3 billion of property damage over the same period. Data for the earlier years is partial, however, because good records are lacking.

Data Processing

The data was downloaded from this link (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2), then decompressed and loaded into R.

The following code was used:

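# Download the compressed file once. read.csv() reads .bz2 files directly,
# so no separate decompression step is needed (unz() only handles zip archives).
if (!file.exists("StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2", mode = "wb")
}
stormData <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)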

The data was then processed with this code to create a summary table with impact per event type.

library(reshape2)
names(stormData)<-tolower(names(stormData))
print(names(stormData))
##  [1] "state__"    "bgn_date"   "bgn_time"   "time_zone"  "county"    
##  [6] "countyname" "state"      "evtype"     "bgn_range"  "bgn_azi"   
## [11] "bgn_locati" "end_date"   "end_time"   "county_end" "countyendn"
## [16] "end_range"  "end_azi"    "end_locati" "length"     "width"     
## [21] "f"          "mag"        "fatalities" "injuries"   "propdmg"   
## [26] "propdmgexp" "cropdmg"    "cropdmgexp" "wfo"        "stateoffic"
## [31] "zonenames"  "latitude"   "longitude"  "latitude_e" "longitude_"
## [36] "remarks"    "refnum"
# evtype contains duplicate names that differ only in case; lowercasing merges them.
stormData$evtype <- tolower(stormData$evtype)
# TODO: further cleanup of the event types would be necessary, but there was no time for it.
stormData <- transform(stormData,
                       evtype = factor(evtype), county = factor(county),
                       countyname = factor(countyname), state = factor(state),
                       propdmgexp = factor(propdmgexp), cropdmgexp = factor(cropdmgexp))
# Reshape to long form, then sum each measure per event type.
stormMelt <- melt(stormData, id = c("evtype"),
                  measure.vars = c("length", "width", "fatalities", "injuries", "propdmg", "cropdmg"))
stormCast <- dcast(stormMelt, evtype ~ variable, sum)
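
The event type cleanup mentioned in the TODO above could look roughly like the sketch below. It is only an illustration: clean_evtype is a hypothetical helper, the single pattern it applies (merging "tstm wind" spellings into "thunderstorm wind") is an assumption rather than an official NOAA mapping, and the toy data frame uses made-up fatality counts.

```r
# Hypothetical helper: collapse a few obvious spelling variants of event types.
# The pattern is an illustrative assumption, not an official NOAA mapping.
clean_evtype <- function(x) {
  x <- tolower(trimws(x))                 # normalize case and outer whitespace
  x <- gsub("[[:space:]]+", " ", x)       # collapse repeated inner whitespace
  sub("^(tstm|thunderstorm) winds?$", "thunderstorm wind", x)
}

# Toy data with made-up counts, only to show the effect of the cleanup.
toy <- data.frame(
  evtype = c("TSTM WIND", "Thunderstorm  Winds", "thunderstorm wind", " TORNADO"),
  fatalities = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)
toy$evtype <- clean_evtype(toy$evtype)

# Sum fatalities per cleaned event type.
cleaned <- aggregate(fatalities ~ evtype, data = toy, FUN = sum)
print(cleaned)
```

On the real data, the same function would be applied to stormData$evtype before the melt and dcast steps, so that variants of one event are summed together.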

Results

Across the United States, which types of events are most harmful with respect to population health?

Tornadoes are the biggest cause of fatalities (5,633 recorded deaths, along with 91,346 injuries) and are therefore the most harmful events for the population.

library(ggplot2)
# Sort by fatalities and plot the 20 deadliest event types.
stormCast <- stormCast[order(stormCast$fatalities, decreasing = TRUE), ]
p <- ggplot(stormCast[1:20, ], aes(x = evtype, y = fatalities)) +
  geom_bar(stat = "identity") +
  ggtitle("Number of Fatalities, Top 20 Severe Weather Event Types") +
  coord_flip() +
  labs(x = "Event Type", y = "Fatalities")
print(p)
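
The ranking above uses fatalities only. As a rough cross-check, injuries can be ranked the same way. The sketch below assumes nothing beyond the six event types and values shown in the summary table printed later in this report, so it compares only those six rows and is not a ranking over the full database; harm and by_injuries are names introduced here for illustration.

```r
# Values copied from the six-row summary table printed later in this report.
harm <- data.frame(
  evtype = c("tornado", "flash flood", "tstm wind", "flood",
             "thunderstorm wind", "hail"),
  fatalities = c(5633, 978, 504, 470, 133, 15),
  injuries = c(91346, 1777, 6957, 6789, 1488, 1361),
  stringsAsFactors = FALSE
)

# Rank these six event types by injuries instead of fatalities.
by_injuries <- harm[order(harm$injuries, decreasing = TRUE), ]
print(by_injuries[, c("evtype", "injuries")])
```

Among these six event types, tornadoes also rank first by injuries.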

Across the United States, which types of events have the greatest economic consequences?

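Tornadoes come first again, based on property damage. Note that the ranking uses the raw sum of the propdmg column; the magnitude codes in propdmgexp are not applied, so the totals are not expressed in a single dollar unit.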

stormCast<-stormCast[order(stormCast$propdmg,decreasing=TRUE),]
head(stormCast)
##                evtype    length   width fatalities injuries   propdmg   cropdmg
## 758           tornado 198984.23 6764780       5633    91346 3212258.2 100018.52
## 138       flash flood      1.00       0        978     1777 1420124.6 179200.46
## 779         tstm wind      1.40     125        504     6957 1335995.6 109202.60
## 154             flood      0.00       0        470     6789  899938.5 168037.88
## 685 thunderstorm wind      0.00       0        133     1488  876844.2  66791.45
## 212              hail   4457.05     205         15     1361  688693.4 579596.28
# Plot the 20 event types with the largest summed property damage.
p <- ggplot(stormCast[1:20, ], aes(x = evtype, y = propdmg)) +
  geom_bar(stat = "identity") +
  ggtitle("Property Damage, Top 20 Severe Weather Event Types") +
  coord_flip() +
  labs(x = "Event Type", y = "Property Damage")
print(p)
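
A more thorough cost analysis would scale each propdmg value by its propdmgexp code before summing. The sketch below shows one possible way to do that. It assumes that the codes K, M and B stand for thousands, millions and billions of dollars, and it leaves values with any other code unscaled, which is a simplification. to_dollars is a hypothetical helper and the toy values are made up.

```r
# Hypothetical helper: convert a damage value and its exponent code to dollars.
# Assumed mapping: K = 1e3, M = 1e6, B = 1e9; any other code is left unscaled.
to_dollars <- function(value, exp_code) {
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multiplier[toupper(as.character(exp_code))]
  m[is.na(m)] <- 1                      # unknown or empty codes: no scaling
  value * unname(m)
}

# Toy rows with made-up values, only to show the conversion.
toy <- data.frame(
  propdmg = c(25, 2.5, 1, 10),
  propdmgexp = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)
toy$propdmg_usd <- to_dollars(toy$propdmg, toy$propdmgexp)
print(toy)
```

Summing a column like propdmg_usd per event type, instead of the raw propdmg, would give totals in a single unit; the ranking could change as a result.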

Post Scriptum

I did not have nearly enough time to dedicate to this assignment. Cleaning up the event types and analyzing the costs more thoroughly would have been valuable. However, I wanted to keep the study simple.