The purpose of this report is to study the NOAA Storm Database to discover the effects of various types of severe weather on health and economic wellbeing in the United States. Specifically, the following two questions will be addressed:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Data is loaded from the Coursera website in the form of a bzip2-compressed CSV file:
#first accessed on 6/16/2015 at 8:41am central time (US)
temp <- tempfile()
#mode="wb" keeps the compressed file from being corrupted on Windows
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp,mode="wb")
rawData <- read.csv(temp)
unlink(temp)
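The call read.csv(temp) works on the compressed download because R's file connections detect bzip2 compression from the file contents, not from the extension. The sketch below illustrates this with a made-up two-row data frame instead of the real download; the names toy, path and roundtrip are illustrative only.

```r
# Toy stand-in for the storm data (values are made up for illustration).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0))

# Write it as a bzip2-compressed CSV; the temp path has no .bz2 extension on purpose.
path <- tempfile()
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the compression by itself.
roundtrip <- read.csv(path, stringsAsFactors = FALSE)
unlink(path)
```

If the round trip succeeds, roundtrip holds the same two rows as toy.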
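And we should at least somewhat care about whether there are NAs.
The check below can’t actually tell us: complete.cases() returns one value per row, so its length always equals the number of rows and the ratio is 1 no matter how many NAs there are. The mean of complete.cases(rawData) would give the real share of complete rows.
#always 1 by construction; mean(complete.cases(rawData)) is the check that counts NAs
length(complete.cases(rawData))/length(rawData[[1]])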
## [1] 1
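A ratio built from length(complete.cases(...)) comes out as 1 for any data frame, so it may help to contrast it with checks that do count NAs. A minimal sketch on a made-up data frame (toy and the result names are illustrative, not part of the report's data):

```r
# Made-up data with one NA in each column, in different rows.
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c("x", "y", NA, "z"),
                  stringsAsFactors = FALSE)

# The ratio used in the report: the length of the logical vector is the row count.
length_ratio <- length(complete.cases(toy)) / length(toy[[1]])

# Share of rows with no NA at all.
complete_share <- mean(complete.cases(toy))

# Number of NAs per column, useful for seeing which fields are affected.
na_per_column <- colSums(is.na(toy))
```

Here length_ratio is 1 even though half the rows contain an NA, while complete_share is 0.5 and na_per_column reports one NA in each column.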
Of course, we might expect that the best predictor of future events is past events (that’s all we have!), but our data quality decreases with age. So maybe just the recent history, here everything after January 1, 2000, is necessary:
#we loaded things as factor so fix that...
rawData$BGN_DATE<-as.Date(rawData$BGN_DATE,"%m/%d/%Y")
#this is maybe bad practice, since these data frames are so big... but we have enough ram so it's ok
data<-subset(rawData,BGN_DATE>as.Date("2000-01-01"))
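To make the date handling concrete, here is a small sketch on three made-up date strings. It assumes BGN_DATE values carry a trailing time component such as "0:00:00"; as.Date with an explicit format parses the leading date and ignores the rest. The vectors raw, dates and recent are illustrative names.

```r
# Made-up strings in the month/day/year layout, with a trailing time part.
raw <- c("4/18/1950 0:00:00", "1/1/2000 0:00:00", "5/20/2011 0:00:00")

# Parse only the date portion; the trailing time is ignored.
dates <- as.Date(raw, format = "%m/%d/%Y")

# Strict ">" drops January 1, 2000 itself, matching the filter in the report.
recent <- dates[dates > as.Date("2000-01-01")]
```

Only the 2011 date survives the filter; using ">=" instead would also keep events that began on January 1, 2000.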
I probably shouldn’t write all this jazz the way I write comments in my code, since this is for a govt bigwig. And since this is for a bigwig, why show something in a table that you could show in a pretty graph?
#the boring tables... wait, do tables count as charts?
library(data.table)
## Warning: package 'data.table' was built under R version 3.1.3
dt<-data.table(data) #more useless replication/taking up memory
MEDICAL_RESULT<-dt[,list(sum_fatalities=sum(FATALITIES),sum_injuries=sum(INJURIES)),by=EVTYPE]
#I'd hate to have to weight these... maybe we should just sum them... i dunno though, medical
#science is getting pretty cool... maybe injuries don't matter... Let's assume they matter half
#as much.
MEDICAL_RESULT$TOTALBADTHINGSHEALTHWISE<-MEDICAL_RESULT$sum_fatalities+0.5*MEDICAL_RESULT$sum_injuries
MEDICAL_RESULT[order(-MEDICAL_RESULT$TOTALBADTHINGSHEALTHWISE)]
##                      EVTYPE sum_fatalities sum_injuries TOTALBADTHINGSHEALTHWISE
##   1:               TORNADO           1193        15213                   8799.5
##   2:        EXCESSIVE HEAT           1013         3708                   2867.0
##   3:             LIGHTNING            466         2993                   1962.5
##   4:           FLASH FLOOD            600          812                   1006.0
##   5:             TSTM WIND            116         1753                    992.5
##  ---
## 192:      LAKE-EFFECT SNOW              0            0                      0.0
## 193:           DENSE SMOKE              0            0                      0.0
## 194:       LAKESHORE FLOOD              0            0                      0.0
## 195: ASTRONOMICAL LOW TIDE              0            0                      0.0
## 196:      VOLCANIC ASHFALL              0            0                      0.0
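The 0.5 weight on injuries is a guess, so it is worth checking how much the ordering depends on it. The sketch below recomputes the score for the five top rows printed above under three weights. rank_by_weight is a hypothetical helper written for this illustration, and only those five rows are included, so it says nothing about events further down the table.

```r
# Top five rows copied from the printed table above.
top5 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "LIGHTNING", "FLASH FLOOD", "TSTM WIND"),
  sum_fatalities = c(1193, 1013, 466, 600, 116),
  sum_injuries = c(15213, 3708, 2993, 812, 1753),
  stringsAsFactors = FALSE
)

# Hypothetical helper: order events by fatalities + w * injuries,
# where w is the assumed weight of one injury relative to one fatality.
rank_by_weight <- function(df, w) {
  score <- df$sum_fatalities + w * df$sum_injuries
  df$EVTYPE[order(-score)]
}

rank_half <- rank_by_weight(top5, 0.5)  # the weighting used in the report
rank_zero <- rank_by_weight(top5, 0)    # fatalities only
rank_one  <- rank_by_weight(top5, 1)    # an injury counts as much as a fatality
```

Among these five rows, tornadoes and excessive heat stay first and second under all three weights; only the order of lightning, flash floods and thunderstorm wind changes.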
It’s not fair to say this should be for a govt bigwig and also that I should let all my code print out. Like I’d ever show a bigwig my code… they might get bored. Or worse, they might read my comments. That’d be no good, I’m sure.
In any case, we should probably look at property and crop damage too (wait, crops aren’t property?)
#should look up the EXP fields for damage... probably not "experience". Dood that tornado is lvl 12!
PROPERTY_RESULT<-dt[,list(sum_grassroofedcottages=sum(PROPDMG),sum_cornandstuff=sum(CROPDMG)),by=EVTYPE]
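#as an economist I'm going to note that crop and property damage are both rated in terms of
#dollars, so no weighting between them is needed... but the PROPDMGEXP/CROPDMGEXP magnitude
#codes are ignored here, so these sums are not dollar totals yet. assumptions are useful.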
PROPERTY_RESULT$TOTALBADTHINGSOWNINGSTUFFWISE<-PROPERTY_RESULT$sum_grassroofedcottages+PROPERTY_RESULT$sum_cornandstuff
PROPERTY_RESULT[order(-PROPERTY_RESULT$TOTALBADTHINGSOWNINGSTUFFWISE)]
##                             EVTYPE sum_grassroofedcottages sum_cornandstuff TOTALBADTHINGSOWNINGSTUFFWISE
##   1:                  FLASH FLOOD                999333.4        132381.63                     1131715.1
##   2:                      TORNADO                907111.7         73634.91                      980746.6
##   3:            THUNDERSTORM WIND                862257.4         66663.00                      928920.4
##   4:                    TSTM WIND                811528.2         53758.70                      865286.9
##   5:                         HAIL                452533.5        363279.18                      815812.6
##  ---
## 192:      GUSTY THUNDERSTORM WIND                     0.0             0.00                           0.0
## 193:         HIGH SURF ADVISORIES                     0.0             0.00                           0.0
## 194:                  SLEET STORM                     0.0             0.00                           0.0
## 195: COLD WIND CHILL TEMPERATURES                     0.0             0.00                           0.0
## 196:             VOLCANIC ASHFALL                     0.0             0.00                           0.0
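The sums above add PROPDMG and CROPDMG without the EXP fields, so a value of 2.5 recorded in billions counts for less than a value of 500 recorded in thousands. The sketch below shows one way the magnitude codes could be applied before summing. It assumes the common reading K = thousands, M = millions, B = billions, and it treats any other code as a multiplier of 1, which is a simplification. The data are made up, and the names toy, multiplier, raw_sum and dollar_sum are illustrative.

```r
# Made-up records: a number plus a magnitude code, as in PROPDMG / PROPDMGEXP.
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
  PROPDMG = c(2.5, 10, 500, 1),
  PROPDMGEXP = c("B", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Assumed mapping from magnitude code to multiplier.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes without an entry come back as NA and fall back to 1 (assumption).
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1
toy$PROPDMG_DOLLARS <- toy$PROPDMG * mult

# Compare the raw sums with the dollar sums per event type.
raw_sum <- tapply(toy$PROPDMG, toy$EVTYPE, sum)
dollar_sum <- tapply(toy$PROPDMG_DOLLARS, toy$EVTYPE, sum)
```

In this toy example the raw sums rank HAIL above FLOOD, while the dollar sums rank FLOOD far above HAIL, so the ordering in the table above may change once the multipliers are applied to the real data.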
As you can quite clearly see, the results of this study imply quite strongly that people shouldn’t live in Oklahoma. I’ll add pretty graphs later.