The purpose of this report is to study the NOAA Storm Database to discover the effects of various types of severe weather on health and economic wellbeing in the United States. Specifically, the following two questions will be addressed:
Accross the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Data is loaded from the coursera website in the form of a bzip2 csv file
#first accessed on 6/16/2015 at 8:41am central time (US)
temp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp)
rawData <- read.csv(temp)
unlink(temp)
And we should at least somewhat care about whether there are NAs?
And there are none:
length(complete.cases(rawData))/length(rawData[[1]])
## [1] 1
Of course, we might expect that the best predictor of future events is past events (that’s all we have!), but our data quality decreases with age. So maybe just the recent history is necessary. As such we will truncate the history at the year 2000. Further data transformations are unnecessary as the data is already in a clean (no NA) format:
#we loaded things as factor so fix that...
rawData$BGN_DATE<-as.Date(rawData$BGN_DATE,"%m/%d/%Y")
#this is maybe bad practice, since these data frames are so big... but we have enough ram so it's ok
data<-subset(rawData,rawData$BGN_DATE>as.Date("20000101","%Y%m%d"))
A simple summation across the various event types within the applicable date range presents the ordering of event types by either health effects or property damage below. When handling health effects in total, some sort of “indexing” is necessary to handle both injuries and fatalities together. While it is simplistic, below we are simply weighting fatalities at twice the impact of injuries. Artificial, yes, but convenient, and not necessarily too damaging to the ordering of events’ impact on health when dealing with such limited data.
#the boring tables... wait, do tables count as charts?
library(data.table)
## Warning: package 'data.table' was built under R version 3.0.3
dt<-data.table(data) #more useless replication/taking up memory
MEDICAL_RESULT<-dt[,list(sum_fatalities=sum(FATALITIES),sum_injuries=sum(INJURIES)),by=EVTYPE]
#I'd hate to have to weight these... maybe we should just sum them... i dunno though, medical
#science is getting pretty cool... maybe injuries don't matter... Let's assume they matter half
#as much.
MEDICAL_RESULT$TOTALBADTHINGSHEALTHWISE<-MEDICAL_RESULT$sum_fatalities+0.5*MEDICAL_RESULT$sum_injuries
MEDICAL_RESULT[order(-MEDICAL_RESULT$TOTALBADTHINGSHEALTHWISE),]
## EVTYPE sum_fatalities sum_injuries
## 1: TORNADO 1193 15213
## 2: EXCESSIVE HEAT 1013 3708
## 3: LIGHTNING 466 2993
## 4: FLASH FLOOD 600 812
## 5: TSTM WIND 116 1753
## ---
## 192: LAKE-EFFECT SNOW 0 0
## 193: DENSE SMOKE 0 0
## 194: LAKESHORE FLOOD 0 0
## 195: ASTRONOMICAL LOW TIDE 0 0
## 196: VOLCANIC ASHFALL 0 0
## TOTALBADTHINGSHEALTHWISE
## 1: 8799.5
## 2: 2867.0
## 3: 1962.5
## 4: 1006.0
## 5: 992.5
## ---
## 192: 0.0
## 193: 0.0
## 194: 0.0
## 195: 0.0
## 196: 0.0
Property and crop damage, both being presented in dollars can simply be summed. Presumably markets have normalized the values in dollars well enough that summing the various damage can be thought of (simply, at the level of this analysis/data certainly) as comparing apples to apples.
#should look up the EXP fields for damage... probably not "experience". Dood that tornado is lvl 12!
PROPERTY_RESULT<-dt[,list(sum_grassroofedcottages=sum(PROPDMG),sum_cornandstuff=sum(CROPDMG)),by=EVTYPE]
#as an economist I'm going to note that crops and property damage are already rated in terms of
#dollars therefore no weighting is needed... assumptions are useful.
PROPERTY_RESULT$TOTALBADTHINGSOWNINGSTUFFWISE<-PROPERTY_RESULT$sum_grassroofedcottages+PROPERTY_RESULT$sum_cornandstuff
PROPERTY_RESULT[with(PROPERTY_RESULT,order(-PROPERTY_RESULT$TOTALBADTHINGSOWNINGSTUFFWISE)),]
## EVTYPE sum_grassroofedcottages sum_cornandstuff
## 1: FLASH FLOOD 999333.4 132381.63
## 2: TORNADO 907111.7 73634.91
## 3: THUNDERSTORM WIND 862257.4 66663.00
## 4: TSTM WIND 811528.2 53758.70
## 5: HAIL 452533.5 363279.18
## ---
## 192: GUSTY THUNDERSTORM WIND 0.0 0.00
## 193: HIGH SURF ADVISORIES 0.0 0.00
## 194: SLEET STORM 0.0 0.00
## 195: COLD WIND CHILL TEMPERATURES 0.0 0.00
## 196: VOLCANIC ASHFALL 0.0 0.00
## TOTALBADTHINGSOWNINGSTUFFWISE
## 1: 1131715.1
## 2: 980746.6
## 3: 928920.4
## 4: 865286.9
## 5: 815812.6
## ---
## 192: 0.0
## 193: 0.0
## 194: 0.0
## 195: 0.0
## 196: 0.0
As can be clearly seen, the results of this study imply quite strongly that people shouldn’t live in oklahoma. It may also be useful to examine the geographic distribution of the data. As such we will filter on the most damaging event types and plot.
library(googleVis)
## Warning: package 'googleVis' was built under R version 3.0.3
##
## Welcome to googleVis version 0.5.8
##
## Please read the Google API Terms of Use
## before you start using the package:
## https://developers.google.com/terms/
##
## Note, the plot method of googleVis will by default use
## the standard browser to display its output.
##
## See the googleVis package vignettes for more details,
## or visit http://github.com/mages/googleVis.
##
## To suppress this message use:
## suppressPackageStartupMessages(library(googleVis))
op <- options(gvis.plot.tag="chart")
tornado<-subset(dt,dt$EVTYPE=="TORNADO")
TORNADOINJURIES<-dt[,list(sum_fatalities=sum(FATALITIES),sum_injuries=sum(INJURIES)),by=STATE]
TORNADOINJURIES$TOTALBADTHINGSHEALTHWISE<-TORNADOINJURIES$sum_fatalities+0.5*TORNADOINJURIES$sum_injuries
GeoStates <- gvisGeoChart(TORNADOINJURIES, "STATE", "TOTALBADTHINGSHEALTHWISE",
options=list(region="US",
displayMode="regions",
resolution="provinces",
width=600, height=400))
plot(GeoStates)
Plot of tornado fatalities+0.5*injuries:
library(googleVis)
flood<-subset(dt,dt$EVTYPE=="FLASH FLOOD")
floodDamage<-dt[,list(sum_grassroofedcottages=sum(PROPDMG),sum_cornandstuff=sum(CROPDMG)),by=STATE]
floodDamage$DollarDamages<-floodDamage$sum_grassroofedcottages+0.5*floodDamage$sum_cornandstuff
GeoStates <- gvisGeoChart(floodDamage, "STATE", "DollarDamages",
options=list(region="US",
displayMode="regions",
resolution="provinces",
width=600, height=400))
plot(GeoStates)
Plot of flood Property Damage+Crop Damage
options(op)