This document summarizes findings from an analysis of the NOAA storm database, which lists severe weather events in the US from 1950 to 2011. Tornadoes are the most harmful event type for the population, with over 5,600 fatalities and 91,000 injuries recorded over that period. Tornadoes also have the biggest economic impact, with over $3 billion of property damage over the same period (the totals sum the raw `propdmg` column and do not apply the `propdmgexp` multipliers). Data from the earlier years is incomplete, however, due to the lack of good records.
The data was downloaded from this link (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2), then unzipped and loaded into R.
The following code was used:
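# Download the compressed file (commented out once a local copy exists).
#download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","StormData.csv.bz2", mode="wb")
# Note: unz() only handles zip archives, so it would not decompress this .bz2 file.
#unz("StormData.csv.bz2", filename="StormData.csv")
# Load the decompressed CSV, keeping text columns as character vectors.
stormData<-read.csv("repdata_data_StormData.csv", stringsAsFactors = FALSE)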
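R's `read.csv` can read bzip2-compressed files directly, so a separate decompression step may not be needed. The following is a minimal sketch of that behaviour; it uses a small made-up table written to a temporary file in place of the real download, and the names `toy`, `path`, and `toyBack` are illustrative only.

```r
# Toy stand-in for the storm data; the real file is StormData.csv.bz2.
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"),
                  FATALITIES = c(3, 0),
                  stringsAsFactors = FALSE)

# Write the toy table as a bzip2-compressed CSV in a temporary location.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv detects the compression when it opens the file,
# so no explicit decompression step is required.
toyBack <- read.csv(path, stringsAsFactors = FALSE)
print(toyBack)
```

Applied to the real data, this would amount to passing the path of the downloaded `.bz2` file to `read.csv`.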
The data was then processed with this code to create a summary table with impact per event type.
library(reshape2)
names(stormData)<-tolower(names(stormData))
print(names(stormData))
## [1] "state__" "bgn_date" "bgn_time" "time_zone" "county"
## [6] "countyname" "state" "evtype" "bgn_range" "bgn_azi"
## [11] "bgn_locati" "end_date" "end_time" "county_end" "countyendn"
## [16] "end_range" "end_azi" "end_locati" "length" "width"
## [21] "f" "mag" "fatalities" "injuries" "propdmg"
## [26] "propdmgexp" "cropdmg" "cropdmgexp" "wfo" "stateoffic"
## [31] "zonenames" "latitude" "longitude" "latitude_e" "longitude_"
## [36] "remarks" "refnum"
# There are duplicate names in EVTYPE that differ only by case. Converting them all to lowercase merges those.
stormData$evtype<-tolower(stormData$evtype)
# TODO: further cleanup of the event types is still needed, but there was no time for it.
stormData = transform(stormData, evtype = factor(evtype), county = factor(county), countyname = factor(countyname), state = factor(state), propdmgexp = factor(propdmgexp), cropdmgexp = factor(cropdmgexp))
stormMelt<-melt(stormData,id=c("evtype"),measure.vars=c("length","width","fatalities","injuries","propdmg","cropdmg"))
stormCast<-dcast(stormMelt, evtype~variable, sum)
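The summary above adds up the raw `propdmg` and `cropdmg` figures without using the `propdmgexp` and `cropdmgexp` columns. Assuming those columns hold magnitude codes such as K, M, and B (thousands, millions, billions), which is an assumption about the coding and is not verified here, a more thorough cost analysis could scale each figure before summing. The sketch below uses toy rows and a hypothetical helper named `toDollars`.

```r
# Hypothetical helper: scale a damage figure by its magnitude code.
# Assumes "K" = 1e3, "M" = 1e6, "B" = 1e9; any other code
# (blank, digits, symbols) is treated as a multiplier of 1.
toDollars <- function(value, exp) {
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[toupper(as.character(exp))]
  mult[is.na(mult)] <- 1
  value * unname(mult)
}

# Toy rows, not taken from the real data.
toy <- data.frame(evtype = c("tornado", "tornado", "hail"),
                  propdmg = c(2.5, 10, 5),
                  propdmgexp = c("M", "K", ""),
                  stringsAsFactors = FALSE)
toy$propdollars <- toDollars(toy$propdmg, toy$propdmgexp)

# Total scaled damage per event type.
toyTotals <- aggregate(propdollars ~ evtype, data = toy, FUN = sum)
print(toyTotals)
```

Treating unknown codes as a multiplier of 1 is a simplification; the real data may need a more careful mapping.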
Tornadoes are the biggest cause of fatalities, and are therefore the most harmful to the population.
library(ggplot2)
stormCast<-stormCast[order(stormCast$fatalities,decreasing=TRUE),]
p<-ggplot(stormCast[1:20,],aes(x=evtype, y=fatalities))
p<-p+geom_bar(stat = "identity")
p<-p+ggtitle("Number of fatalities, Top 20 Severe Weather Event Types")
p<-p+coord_flip()
update_labels(p, list(x = "Event Type", y = "Fatalities"))
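Because `evtype` is a factor, the bars in this plot follow the alphabetical order of its levels rather than the fatality counts. One possible refinement, sketched here on made-up numbers, is to reorder the levels by the plotted value with base R's `reorder()`.

```r
# Toy summary table; the counts are illustrative only.
toy <- data.frame(evtype = factor(c("flood", "heat", "tornado")),
                  fatalities = c(40, 500, 90))

# reorder() sorts the factor levels by the value of the second argument,
# so a bar chart would draw the bars in order of increasing fatalities.
toy$evtype <- reorder(toy$evtype, toy$fatalities)
print(levels(toy$evtype))

# In the plot this could be written as
#   aes(x = reorder(evtype, fatalities), y = fatalities)
```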
Tornadoes also rank first in property damage.
stormCast<-stormCast[order(stormCast$propdmg,decreasing=TRUE),]
head(stormCast)
## evtype length width fatalities injuries propdmg
## 758 tornado 198984.23 6764780 5633 91346 3212258.2
## 138 flash flood 1.00 0 978 1777 1420124.6
## 779 tstm wind 1.40 125 504 6957 1335995.6
## 154 flood 0.00 0 470 6789 899938.5
## 685 thunderstorm wind 0.00 0 133 1488 876844.2
## 212 hail 4457.05 205 15 1361 688693.4
## cropdmg
## 758 100018.52
## 138 179200.46
## 779 109202.60
## 154 168037.88
## 685 66791.45
## 212 579596.28
p<-ggplot(stormCast[1:20,], aes(x=evtype, y=propdmg))
p<-p+geom_bar(stat = "identity")
p<-p+ggtitle("Property Damage, Top 20 Severe Weather Event Types")
p<-p+coord_flip()
update_labels(p, list(x = "Event Type", y = "Property Damage"))
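The table above lists `tstm wind` and `thunderstorm wind` as separate event types, although they appear to describe the same phenomenon. A minimal sketch of one possible cleanup follows; the `aliases` lookup and the toy values are hypothetical, and a real cleanup would need a longer list of aliases.

```r
# Hypothetical alias table: each name is rewritten to its value.
aliases <- c("tstm wind" = "thunderstorm wind")

# Toy event types and damage values, not taken from the real data.
toy <- data.frame(evtype = c("tstm wind", "thunderstorm wind", "hail"),
                  propdmg = c(10, 5, 2),
                  stringsAsFactors = FALSE)

# Replace each alias with its canonical name; other labels are left as-is.
hit <- toy$evtype %in% names(aliases)
toy$evtype[hit] <- unname(aliases[toy$evtype[hit]])

# Summing after the cleanup merges the two wind categories.
toyMerged <- aggregate(propdmg ~ evtype, data = toy, FUN = sum)
print(toyMerged)
```

In the real analysis this replacement would have to run before `evtype` is converted to a factor.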
I did not have nearly enough time to dedicate to this assignment. A cleanup of the event types and a more thorough analysis of the costs would have been valuable. I nevertheless wanted to keep the study simple.