Synopsis

The purpose of this report is to study the NOAA Storm Database to discover the effects of various types of severe weather on health and economic wellbeing in the United States. Specifically, the following two questions will be addressed:

  1. Accross the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

Data is loaded from the coursera website in the form of a bzip2 csv file

#first accessed on 6/16/2015 at 8:41am central time (US)
temp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp)
rawData <- read.csv(temp)
unlink(temp)

And we should at least somewhat care about whether there are NAs?

And there are none:

length(complete.cases(rawData))/length(rawData[[1]])
## [1] 1

Of course, we might expect that the best predictor of future events is past events (that’s all we have!), but our data quality decreases with age. So maybe just the recent history is necessary. As such we will truncate the history at the year 2000. Further data transformations are unnecessary as the data is already in a clean (no NA) format:

#we loaded things as factor so fix that...
rawData$BGN_DATE<-as.Date(rawData$BGN_DATE,"%m/%d/%Y")

#this is maybe bad practice, since these data frames are so big... but we have enough ram so it's ok
data<-subset(rawData,rawData$BGN_DATE>as.Date("20000101","%Y%m%d"))

Results (surely we can do more than this?)

A simple summation across the various event types within the applicable date range presents the ordering of event types by either health effects or property damage below. When handling health effects in total, some sort of “indexing” is necessary to handle both injuries and fatalities together. While it is simplistic, below we are simply weighting fatalities at twice the impact of injuries. Artificial, yes, but convenient, and not necessarily too damaging to the ordering of events’ impact on health when dealing with such limited data.

#the boring tables... wait, do tables count as charts?
library(data.table)
## Warning: package 'data.table' was built under R version 3.0.3
dt<-data.table(data) #more useless replication/taking up memory
MEDICAL_RESULT<-dt[,list(sum_fatalities=sum(FATALITIES),sum_injuries=sum(INJURIES)),by=EVTYPE]

#I'd hate to have to weight these... maybe we should just sum them... i dunno though, medical
#science is getting pretty cool... maybe injuries don't matter... Let's assume they matter half
#as much.

MEDICAL_RESULT$TOTALBADTHINGSHEALTHWISE<-MEDICAL_RESULT$sum_fatalities+0.5*MEDICAL_RESULT$sum_injuries

MEDICAL_RESULT[order(-MEDICAL_RESULT$TOTALBADTHINGSHEALTHWISE),]
##                     EVTYPE sum_fatalities sum_injuries
##   1:               TORNADO           1193        15213
##   2:        EXCESSIVE HEAT           1013         3708
##   3:             LIGHTNING            466         2993
##   4:           FLASH FLOOD            600          812
##   5:             TSTM WIND            116         1753
##  ---                                                  
## 192:      LAKE-EFFECT SNOW              0            0
## 193:           DENSE SMOKE              0            0
## 194:       LAKESHORE FLOOD              0            0
## 195: ASTRONOMICAL LOW TIDE              0            0
## 196:      VOLCANIC ASHFALL              0            0
##      TOTALBADTHINGSHEALTHWISE
##   1:                   8799.5
##   2:                   2867.0
##   3:                   1962.5
##   4:                   1006.0
##   5:                    992.5
##  ---                         
## 192:                      0.0
## 193:                      0.0
## 194:                      0.0
## 195:                      0.0
## 196:                      0.0

Property and crop damage, both being presented in dollars can simply be summed. Presumably markets have normalized the values in dollars well enough that summing the various damage can be thought of (simply, at the level of this analysis/data certainly) as comparing apples to apples.

#should look up the EXP fields for damage... probably not "experience". Dood that tornado is lvl 12!
PROPERTY_RESULT<-dt[,list(sum_grassroofedcottages=sum(PROPDMG),sum_cornandstuff=sum(CROPDMG)),by=EVTYPE]

#as an economist I'm going to note that crops and property damage are already rated in terms of
#dollars therefore no weighting is needed... assumptions are useful.
PROPERTY_RESULT$TOTALBADTHINGSOWNINGSTUFFWISE<-PROPERTY_RESULT$sum_grassroofedcottages+PROPERTY_RESULT$sum_cornandstuff

PROPERTY_RESULT[with(PROPERTY_RESULT,order(-PROPERTY_RESULT$TOTALBADTHINGSOWNINGSTUFFWISE)),]
##                            EVTYPE sum_grassroofedcottages sum_cornandstuff
##   1:                  FLASH FLOOD                999333.4        132381.63
##   2:                      TORNADO                907111.7         73634.91
##   3:            THUNDERSTORM WIND                862257.4         66663.00
##   4:                    TSTM WIND                811528.2         53758.70
##   5:                         HAIL                452533.5        363279.18
##  ---                                                                      
## 192:      GUSTY THUNDERSTORM WIND                     0.0             0.00
## 193:         HIGH SURF ADVISORIES                     0.0             0.00
## 194:                  SLEET STORM                     0.0             0.00
## 195: COLD WIND CHILL TEMPERATURES                     0.0             0.00
## 196:             VOLCANIC ASHFALL                     0.0             0.00
##      TOTALBADTHINGSOWNINGSTUFFWISE
##   1:                     1131715.1
##   2:                      980746.6
##   3:                      928920.4
##   4:                      865286.9
##   5:                      815812.6
##  ---                              
## 192:                           0.0
## 193:                           0.0
## 194:                           0.0
## 195:                           0.0
## 196:                           0.0

As can be clearly seen, the results of this study imply quite strongly that people shouldn’t live in oklahoma. It may also be useful to examine the geographic distribution of the data. As such we will filter on the most damaging event types and plot.

library(googleVis)
## Warning: package 'googleVis' was built under R version 3.0.3
## 
## Welcome to googleVis version 0.5.8
## 
## Please read the Google API Terms of Use
## before you start using the package:
## https://developers.google.com/terms/
## 
## Note, the plot method of googleVis will by default use
## the standard browser to display its output.
## 
## See the googleVis package vignettes for more details,
## or visit http://github.com/mages/googleVis.
## 
## To suppress this message use:
## suppressPackageStartupMessages(library(googleVis))
op <- options(gvis.plot.tag="chart")
tornado<-subset(dt,dt$EVTYPE=="TORNADO")
TORNADOINJURIES<-dt[,list(sum_fatalities=sum(FATALITIES),sum_injuries=sum(INJURIES)),by=STATE]

TORNADOINJURIES$TOTALBADTHINGSHEALTHWISE<-TORNADOINJURIES$sum_fatalities+0.5*TORNADOINJURIES$sum_injuries


GeoStates <- gvisGeoChart(TORNADOINJURIES, "STATE", "TOTALBADTHINGSHEALTHWISE",
                          options=list(region="US", 
                                       displayMode="regions", 
                                       resolution="provinces",
                                       width=600, height=400))
plot(GeoStates)

Plot of tornado fatalities+0.5*injuries:

library(googleVis)

flood<-subset(dt,dt$EVTYPE=="FLASH FLOOD")
floodDamage<-dt[,list(sum_grassroofedcottages=sum(PROPDMG),sum_cornandstuff=sum(CROPDMG)),by=STATE]

floodDamage$DollarDamages<-floodDamage$sum_grassroofedcottages+0.5*floodDamage$sum_cornandstuff


GeoStates <- gvisGeoChart(floodDamage, "STATE", "DollarDamages",
                          options=list(region="US", 
                                       displayMode="regions", 
                                       resolution="provinces",
                                       width=600, height=400))
plot(GeoStates)

Plot of flood Property Damage+Crop Damage

options(op)