This report analyzes NOAA Storm Data from 1950 to 2011 to determine the impact of various types of weather events. It looks at the impact on population health as well as the economic impact of each event type.
The program automatically downloads the data file and unzips it into a CSV file.
However, this can also be done by hand with these steps:
1. Download the data from the URL assigned to url below to create the local file repdata-data-StormData.csv.bz2.
2. Unzip repdata-data-StormData.csv.bz2 to create repdata-data-StormData.csv.
url <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
zipfile <- 'repdata-data-StormData.csv.bz2'
csvfile <- 'repdata-data-StormData.csv'
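if (!file.exists(csvfile)) {
    # Download the compressed archive only if it is not already present.
    if (!file.exists(zipfile)) {
        # mode="wb" keeps the binary archive intact on Windows.
        download.file(url, destfile=zipfile, mode="wb")
    }
    library(R.utils, quietly=TRUE)
    # Decompress the archive into the CSV file.
    bunzip2(zipfile, csvfile)
}
if (!file.exists(csvfile)) {
    stop("Error in downloading/unzipping data file")
}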
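As an aside, the explicit unzip step may be optional: base R's read.csv() opens files through file(), which detects bzip2 compression automatically when reading. The following is a minimal sketch under that assumption. It uses a small temporary file with invented sample rows rather than the real dataset, and the names tmp and sample_df are placeholders.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data only).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() reads the compressed file directly; no bunzip2() call is needed.
sample_df <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
print(sample_df)

# Clean up the temporary file.
unlink(tmp)
```

Reading the archive directly avoids the R.utils dependency, at the cost of decompressing the file again on every run.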
Records are more complete beginning in 1995, so all records before 1995 are filtered out:
library(lubridate)
noaa <- read.csv(csvfile, header=TRUE)
# Extract the year in which each event began, then keep 1995 onward.
noaa$year <- year(mdy_hms(noaa$BGN_DATE))
noaa <- noaa[noaa$year >= 1995,]
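The year filter above can also be sketched in base R without lubridate. This example assumes BGN_DATE values are strings in month/day/year hour:minute:second form, which is the format that mdy_hms implies. The sample values and the names bgn, parsed, yr, and kept are invented for illustration.

```r
# Illustrative BGN_DATE values in the assumed month/day/year h:m:s format.
bgn <- c("4/18/1950 0:00:00", "1/6/1995 0:00:00", "11/30/2011 0:00:00")

# Parse with base R and pull out the calendar year as an integer.
parsed <- as.POSIXct(bgn, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
yr <- as.integer(format(parsed, "%Y"))

# Keep only the entries from 1995 onward, mirroring the filter above.
kept <- bgn[yr >= 1995]
print(yr)
print(kept)
```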
More information is available in the National Weather Service Storm Data documentation and the FAQ on storm events.
Population health impact includes injuries and fatalities. The following figure shows the combined injuries and fatalities caused by the five most harmful event types:
noaa$popHealth <- noaa$FATALITIES + noaa$INJURIES
popHealth.event <- aggregate(popHealth ~ EVTYPE, noaa, sum)
popHealth.event <- popHealth.event[order(-popHealth.event$popHealth),]
popHealth.event.top <- popHealth.event[1:5,]
with(popHealth.event.top,
     barplot(popHealth, names.arg=EVTYPE,
             xlab="Event Type",
             ylab="Injuries + Fatalities",
             main="Top 5 Population Health vs Event Types",
             ylim=c(0,30000)
     ))
popHealth.event.top
##             EVTYPE popHealth
## 666        TORNADO     23310
## 112 EXCESSIVE HEAT      8428
## 144          FLOOD      7192
## 358      LIGHTNING      5360
## 683      TSTM WIND      3871
The top 5 events as a proportion of total population health impact:
top.five <- sum(popHealth.event.top$popHealth)
total <- sum(popHealth.event$popHealth)
top.five / total
## [1] 0.6626627
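The aggregate, sort, and share steps used above can be illustrated on a small example. The data frame toy below is invented for illustration and is not taken from the NOAA file. The names by.event and top.share are likewise placeholders.

```r
# Toy data, invented for illustration; not taken from the NOAA file.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(2, 1, 0, 0, 3),
  INJURIES   = c(10, 4, 6, 1, 0)
)
toy$popHealth <- toy$FATALITIES + toy$INJURIES

# Sum per event type, then sort from largest to smallest.
by.event <- aggregate(popHealth ~ EVTYPE, toy, sum)
by.event <- by.event[order(-by.event$popHealth), ]

# Share of the total carried by the top 2 event types.
top.share <- sum(head(by.event$popHealth, 2)) / sum(by.event$popHealth)
print(by.event)
print(top.share)
```

Here TORNADO sums to 18, FLOOD to 8, and HAIL to 1, so the top two event types account for 26 of 27, the same kind of ratio the report computes for its top five.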
Economic impact includes property damage and crop damage. The following figure shows the combined damage caused by the five most costly event types:
# Convert damage figures to plain numbers using their exponent codes.
normalizeExp <- function(nums, exps) {
    f <- function(n, e) {
        if (n == 0 || is.na(e)) {
            n
        } else if (e == "M") {
            n * 1000 * 1000   # millions
        } else if (e == "K") {
            n * 1000          # thousands
        } else {
            n                 # any other code is left unscaled
        }
    }
    mapply(f, nums, exps)
}
noaa$econImp <- normalizeExp(noaa$PROPDMG, noaa$PROPDMGEXP) +
    normalizeExp(noaa$CROPDMG, noaa$CROPDMGEXP)
econImp.event <- aggregate(econImp ~ EVTYPE, noaa, sum)
econImp.event <- econImp.event[order(-econImp.event$econImp),]
econImp.event.top <- econImp.event[1:5,]
with(econImp.event.top,
     barplot(econImp / (1000 * 1000 * 1000), names.arg=EVTYPE,
             xlab="Event Type",
             ylab="Property Damage + Crop Damage in Billions",
             main="Top 5 Economic Impact vs Event Types",
             ylim=c(0, 30)))
econImp.event.top
##          EVTYPE     econImp
## 144       FLOOD 26944847580
## 666     TORNADO 19911815433
## 206        HAIL 15854599064
## 134 FLASH FLOOD 15709847661
## 84      DROUGHT 13468172002
The top 5 events as a proportion of total economic impact:
top.five <- sum(econImp.event.top$econImp)
total <- sum(econImp.event$econImp)
top.five / total
## [1] 0.6278942
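The normalizeExp helper used in this section scales only the "K" and "M" codes and calls its inner function once per row through mapply. A vectorized variant built on a named lookup vector might look like the sketch below. The function name scaleDamage is hypothetical. Treating "B" as billions and accepting lower-case codes are assumptions that go beyond the original code, so totals computed this way would not necessarily match the figures reported above.

```r
# Hypothetical vectorized alternative to normalizeExp (not the original method).
scaleDamage <- function(nums, exps) {
  # Multipliers per exponent code; "B" = billions is an assumption added here.
  mult <- c(K = 1e3, M = 1e6, B = 1e9)
  # Upper-case the codes so "k" and "m" are treated like "K" and "M" (assumption).
  m <- mult[toupper(as.character(exps))]
  # Unknown, empty, or missing codes leave the value unscaled, as in the original.
  m[is.na(m)] <- 1
  nums * unname(m)
}

# Usage on invented values: thousands, millions, empty code, missing code.
scaled <- scaleDamage(c(2.5, 10, 3, 7), c("K", "M", "", NA))
print(scaled)
```

The lookup avoids a per-row function call, which can matter on a data frame with several hundred thousand rows.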
Of all event types recorded since 1995, tornadoes have the highest cumulative population health impact (23,310 injuries and fatalities), and floods have the highest economic impact (about $26.9 billion in combined property and crop damage).