The goal of this report is to explore the NOAA Storm Database (events from 1950 through November 2011) and answer some basic questions about the impact of severe weather events on public health and the economy. Based on these data, we conclude that, in total across the U.S., tornadoes are the event type most harmful to population health (as measured by the sum of fatalities and injuries). By the measure used here (the sum of the property damage and crop damage values), tornadoes are also the event type with the greatest economic impact.
The data come from the NOAA Storm Database as a comma-separated-value (CSV) file compressed with the bzip2 algorithm: Storm Data [47 MB]. The documentation of the database is located at
library(bitops)
library(RCurl)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Fetch the compressed file as raw bytes (note: SSL peer verification is disabled here)
bin <- getBinaryURL(url, ssl.verifypeer = FALSE)
# Alternative: download.file(url, 'repdata-data-StormData.csv.bz2', method = 'curl', mode = 'wb')
# Write the raw bytes to disk in binary mode
con <- file("repdata-data-StormData.csv.bz2", open = "wb")
writeBin(bin, con)
close(con)
library(R.utils)
bunzip2("repdata-data-StormData.csv.bz2", destname = "repdata-data-StormData.csv",
    overwrite = FALSE, remove = TRUE, BFR.SIZE = 1e+07)
StormData <- read.csv("repdata-data-StormData.csv")
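As a side note on the decompression step: base R's `file()` connection checks for compression when a file is opened for reading, so `read.csv` can usually read a `.bz2` file directly, without a separate `bunzip2` call. A minimal sketch on a small temporary file (the toy data frame and the temporary path are illustrative assumptions, not part of the storm data):

```r
# Toy data standing in for the storm CSV (illustrative values only)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(3, 0))

# Write a bzip2-compressed CSV to a temporary path
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the bzip2 compression
toy_back <- read.csv(path, stringsAsFactors = FALSE)
print(toy_back)
```

Applied to this report, that would mean calling `read.csv` on "repdata-data-StormData.csv.bz2" itself; whether that is faster than decompressing first has not been measured here.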
To determine which types of events (as indicated by the EVTYPE variable) are most harmful to population health, we sum fatalities and injuries per event type, summarize the result, and sort it in descending order.
library(data.table)
storm <- data.table(StormData)
HealthImpactPerEvent <- storm[, sum(FATALITIES) + sum(INJURIES), by = EVTYPE]
# names(HealthImpactPerEvent) <- c('event_type','health_impact')
setnames(HealthImpactPerEvent, "EVTYPE", "event_type")
setnames(HealthImpactPerEvent, "V1", "health_impact")
summary(HealthImpactPerEvent)
##               event_type  health_impact
##  HIGH SURF ADVISORY:  1   Min.   :    0
##  COASTAL FLOOD     :  1   1st Qu.:    0
##  FLASH FLOOD       :  1   Median :    0
##  LIGHTNING         :  1   Mean   :  158
##  TSTM WIND         :  1   3rd Qu.:    0
##  TSTM WIND (G45)   :  1   Max.   :96979
##  (Other)           :979
library(plyr)
TopHealthImpactPerEvent <- arrange(HealthImpactPerEvent, desc(health_impact))
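The aggregate-and-sort step can also be written entirely in data.table, which avoids the default `V1` column name and the extra plyr dependency. The following is a minimal sketch on toy records (the values are made up for illustration, and keeping fatalities and injuries as separate columns is a design choice of this sketch, not something the analysis above does):

```r
library(data.table)

# Toy records (illustrative values, not taken from the storm data)
toy <- data.table(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 0, 1, 1),
  INJURIES   = c(10, 1, 5, 0)
)

# Named aggregates avoid the default V1 column and later setnames() calls
health <- toy[, .(fatalities = sum(FATALITIES),
                  injuries   = sum(INJURIES)),
              by = .(event_type = EVTYPE)]
health[, health_impact := fatalities + injuries]

# Sort in descending order of combined impact with data.table's setorder()
setorder(health, -health_impact)
print(health)
```

Keeping the two components separate makes it easy to see whether a high combined figure is driven by fatalities or by injuries.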
To determine which types of events have the greatest economic consequences, we sum the property damage (PROPDMG) and crop damage (CROPDMG) values per event type, summarize the result, and sort it in descending order.
EconomyImpactPerEvent <- storm[, sum(PROPDMG) + sum(CROPDMG), by = EVTYPE]
setnames(EconomyImpactPerEvent, "EVTYPE", "event_type")
setnames(EconomyImpactPerEvent, "V1", "economy_impact")
summary(EconomyImpactPerEvent)
##               event_type  economy_impact
##  HIGH SURF ADVISORY:  1   Min.   :      0
##  COASTAL FLOOD     :  1   1st Qu.:      0
##  FLASH FLOOD       :  1   Median :      0
##  LIGHTNING         :  1   Mean   :  12449
##  TSTM WIND         :  1   3rd Qu.:     50
##  TSTM WIND (G45)   :  1   Max.   :3312277
##  (Other)           :979
TopEconomyImpactPerEvent <- arrange(EconomyImpactPerEvent, desc(economy_impact))
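One caveat about the sums above: in the Storm Database, PROPDMG and CROPDMG hold the numeric part of each damage estimate, while the companion columns PROPDMGEXP and CROPDMGEXP hold a magnitude code (K for thousands, M for millions, B for billions). The aggregation above does not use those columns, so its totals mix different magnitudes. A hedged sketch of one way to scale values before summing, on toy data; the helper `scale_damage` is hypothetical, and treating blank or unrecognized codes as a multiplier of 1 is an assumption that should be checked against the database documentation:

```r
# Hypothetical helper: convert a damage value and its magnitude code to dollars.
# Assumption: any code other than K, M, or B (including blanks) means "no scaling".
scale_damage <- function(value, exp_code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(as.character(exp_code))]
  m[is.na(m)] <- 1
  value * unname(m)
}

# Toy records (illustrative values, not taken from the storm data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(25, 1.5, 250, 10),
  PROPDMGEXP = c("K", "B", "K", ""),
  stringsAsFactors = FALSE
)
toy$prop_dollars <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)

# Sum the scaled values per event type and sort in descending order
totals <- tapply(toy$prop_dollars, toy$EVTYPE, sum)
print(sort(totals, decreasing = TRUE))
```

On the toy data, the unscaled sums would rank TORNADO first (275 versus 1.5 for FLOOD), while the scaled sums rank FLOOD first, which is why the magnitude codes matter when interpreting the totals.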
The top 10 event types per impact category are:
head(TopHealthImpactPerEvent, 10)
##            event_type health_impact
##  1:           TORNADO         96979
##  2:    EXCESSIVE HEAT          8428
##  3:         TSTM WIND          7461
##  4:             FLOOD          7259
##  5:         LIGHTNING          6046
##  6:              HEAT          3037
##  7:       FLASH FLOOD          2755
##  8:         ICE STORM          2064
##  9: THUNDERSTORM WIND          1621
## 10:      WINTER STORM          1527
head(TopEconomyImpactPerEvent, 10)
##             event_type economy_impact
##  1:            TORNADO        3312277
##  2:        FLASH FLOOD        1599325
##  3:          TSTM WIND        1445168
##  4:               HAIL        1268290
##  5:              FLOOD        1067976
##  6:  THUNDERSTORM WIND         943636
##  7:          LIGHTNING         606932
##  8: THUNDERSTORM WINDS         464978
##  9:          HIGH WIND         342015
## 10:       WINTER STORM         134700
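The top-10 lists above contain TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS as separate event types, which suggests that EVTYPE labels are not fully standardized and that per-type totals are split across variants. A minimal sketch of label normalization before aggregating, on toy data; the helper `normalize_evtype` is hypothetical and its rewrite rules are illustrative assumptions, not an official mapping:

```r
# Hypothetical helper: collapse label variants into one canonical event type.
# The rewrite rules below are illustrative assumptions, not an official mapping.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- gsub("[[:space:]]+", " ", x)  # squeeze repeated whitespace
  x <- sub("^TSTM WIND.*$", "THUNDERSTORM WIND", x)
  x <- sub("^THUNDERSTORM WINDS?.*$", "THUNDERSTORM WIND", x)
  x
}

# Toy records (illustrative values, not taken from the storm data)
toy <- data.frame(
  EVTYPE   = c("TSTM WIND", "THUNDERSTORM WINDS", " Thunderstorm Wind",
               "TSTM WIND (G45)", "HAIL"),
  INJURIES = c(4, 2, 1, 3, 5),
  stringsAsFactors = FALSE
)
toy$event_type <- normalize_evtype(toy$EVTYPE)

# Aggregate on the normalized label instead of the raw EVTYPE
by_type <- tapply(toy$INJURIES, toy$event_type, sum)
print(by_type)
```

Whether variants such as "TSTM WIND (G45)" should be merged into the general thunderstorm wind category is a judgment call that depends on the question being asked.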
The figures below give an overview of the health and economic impact of the storm event types that appear in both top-10 lists:
TopHealthEconomy <- merge(TopHealthImpactPerEvent[1:10, ], TopEconomyImpactPerEvent[1:10, ],
    by = "event_type")
TopHealthEconomy
##           event_type health_impact economy_impact
## 1:       FLASH FLOOD          2755        1599325
## 2:             FLOOD          7259        1067976
## 3:         LIGHTNING          6046         606932
## 4: THUNDERSTORM WIND          1621         943636
## 5:           TORNADO         96979        3312277
## 6:         TSTM WIND          7461        1445168
## 7:      WINTER STORM          1527         134700
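library(lattice)
# par(mfrow) does not affect lattice plots; place them side by side via print(split = ...)
p1 <- barchart(event_type ~ health_impact, data = TopHealthEconomy, ylab = "Event Types",
    xlab = "Fatalities and Injuries")
p2 <- barchart(event_type ~ (economy_impact)/1000, data = TopHealthEconomy, ylab = "Event Types",
    xlab = "Damages (in thousand $)")
print(p1, split = c(1, 1, 2, 1), more = TRUE)
print(p2, split = c(2, 1, 2, 1))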