Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. In this report we describe severe weather events in the US from 1950 to November 2011 in order to understand their implications for population health and the economy. Specifically, we answer two questions: (1) which types of events are most harmful to population health in the US, and (2) which types of events have the greatest economic consequences in the US? We obtained the data from the US National Oceanic and Atmospheric Administration’s (NOAA) storm database. From these data, we found that tornadoes are by far the most damaging weather events for population health, and that floods are the most damaging weather events in terms of total economic damage.
if(!require(ggplot2)){install.packages("ggplot2")}
## Loading required package: ggplot2
if(!require(knitr)){install.packages("knitr")}
## Loading required package: knitr
if(!require(car)){install.packages("car")}
## Loading required package: car
if(!require(reshape)){install.packages("reshape")}
## Loading required package: reshape
From NOAA’s storm database, we obtained data on the characteristics of major weather events in the US, including where they occur and estimates of fatalities, injuries, and property damage. The data start in 1950 and end in November 2011. They come in the form of a comma-separated-value file compressed with the bzip2 algorithm to reduce its size.
Downloading and reading in the data:
fileurl = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filepath <- "/Users/hopewang/Coursera/05 Reproducible Research/RepData_PeerAssessment2/repdata-data-StormData.csv.bz2"
#download.file(url = fileurl, destfile = filepath, method = 'wget')
dat <- read.csv(bzfile(filepath))
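As a hedged sketch of how the download step could be made repeatable, the snippet below fetches the file only when it is not already on disk. The helper name `fetch_if_missing` and the relative file name `StormData.csv.bz2` are assumptions for illustration and are not part of the original analysis. The download call is left commented out so the block runs offline, and the last lines use a tiny made-up table to show that `read.csv` reads bzip2-compressed CSV through `bzfile()`.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url = url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (commented out so the sketch runs without network access):
# fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# dat <- read.csv(bzfile(fetch_if_missing(fileurl, "StormData.csv.bz2")))

# Round trip on a tiny made-up table: write a bzip2-compressed CSV, then read it back.
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)
toy_back <- read.csv(bzfile(tmp))
```

Using a path relative to the working directory, rather than an absolute path on one machine, is one way to let others rerun the report unchanged.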
After reading in the data, we check its dimensions, its column names, and its first few rows (there are 902,297 rows in total):
dim(dat)
## [1] 902297 37
names(dat)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
head(dat)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
A large amount of data is available. For our purposes, we are only interested in the small subset of the data related to human health and economic consequences.
To understand the health consequences of weather events, we include only those events (i.e. rows) that resulted in at least one fatality or injury, and we keep only the columns relevant to human health (EVTYPE, FATALITIES, and INJURIES):
dat.hea <- dat[dat$FATALITIES+dat$INJURIES>0,c("EVTYPE", "FATALITIES", "INJURIES")]
Similarly, to understand the economic consequences of weather events, we include only those events (i.e. rows) with a nonzero recorded property or crop damage amount, and we keep only the columns relevant to economic damage (EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP):
dat.eco <- dat[dat$PROPDMG+dat$CROPDMG >0,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Damage amounts are recorded in different units, so we need to convert them all to a consistent unit (US dollars). The unit codes present in the data are:
unique(dat.eco$PROPDMGEXP)
## [1] K M B m + 0 5 6 4 h 2 7 3 H -
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(dat.eco$CROPDMGEXP)
## [1] M K m B ? 0 k
## Levels: ? 0 2 B k K m M
Conversion to a consistent dollar unit (blank, symbolic, and otherwise unrecognized codes are treated as a multiplier of 1):
dat.eco$PROPDMGUNIT <- recode(dat.eco$PROPDMGEXP,"
''=1;
'+'=1;
'-'=1;
'0'=1;
'2'=100;
'3'=1000;
'4'=10000;
'5'=100000;
'6'=1000000;
'7'=10000000;
'h'=100;
'H'=100;
'K'=1000;
'M'=1000000;
'm'=1000000;
'B'=1000000000;
else =1 ", as.factor.result = FALSE)
dat.eco$CROPDMGUNIT <- recode(dat.eco$CROPDMGEXP, "
''=1;
'?'=1;
'0'=1;
'k'=1000;
'K'=1000;
'M'=1000000;
'm'=1000000;
'B'=1000000000;
else =1 ", as.factor.result = FALSE)
dat.eco$PROPDMG <- dat.eco$PROPDMGUNIT * dat.eco$PROPDMG
dat.eco$CROPDMG <- dat.eco$CROPDMGUNIT * dat.eco$CROPDMG
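The same conversion can also be sketched in base R with a named lookup vector, which avoids the dependency on `car` and keeps one mapping for both columns. This is a minimal illustration on made-up rows, not the code used for the results in this report. The names `exp_multiplier` and `to_multiplier` are hypothetical. Using one table for both columns is an assumption and means that a lowercase `k` is also treated as 1,000 for property damage.

```r
# Multipliers keyed by exponent code; codes not listed here fall back to 1.
exp_multiplier <- c(
  "h" = 1e2, "H" = 1e2,
  "k" = 1e3, "K" = 1e3,
  "m" = 1e6, "M" = 1e6,
  "B" = 1e9,
  "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5, "6" = 1e6, "7" = 1e7
)

# Hypothetical helper: translate a vector of exponent codes into multipliers.
to_multiplier <- function(codes) {
  mult <- unname(exp_multiplier[as.character(codes)])
  mult[is.na(mult)] <- 1  # "", "+", "-", "?", "0" and any other unlisted code
  mult
}

# Made-up rows for illustration only.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3), PROPDMGEXP = c("K", "M", "", "B"))
toy$PROPDMG_USD <- toy$PROPDMG * to_multiplier(toy$PROPDMGEXP)
```

Calling `as.character()` before the lookup matters when the column is a factor, because indexing by a factor would otherwise use its integer codes rather than its labels.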
We first sum the fatalities and injuries by event type for the graph:
inj <- aggregate(INJURIES ~ EVTYPE, data = dat.hea, FUN = sum)
fat <- aggregate(FATALITIES ~ EVTYPE, data = dat.hea, FUN = sum)
Next we create a table with fatalities and injuries and total counts:
hea <- data.frame(evtype=fat$EVTYPE, fatality = fat$FATALITIES, injuries = inj$INJURIES, total = fat$FATALITIES+inj$INJURIES)
We take the top 10 weather events by their total count of fatalities and injuries:
hea.10 <- hea[order(hea$total, decreasing=TRUE)[1:10],]
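To make the summarize, combine, and rank steps concrete, here is a small sketch on made-up numbers. It uses a single `aggregate()` call with `cbind()` so that fatalities and injuries are summed in the same pass and cannot end up in a different row order. The data and the names `toy`, `hea_toy`, and `top2` are illustrative assumptions.

```r
# Made-up event records for illustration only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(10, 0, 5, 1)
)

# Sum both measures per event type in one call, then add the combined total.
hea_toy <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
hea_toy$total <- hea_toy$FATALITIES + hea_toy$INJURIES

# Rank by total (largest first) and keep the top 2 event types.
top2 <- head(hea_toy[order(hea_toy$total, decreasing = TRUE), ], 2)
```

In this toy table TORNADO has a total of 20 and HEAT a total of 5, so those two rows are kept in that order.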
Reshape the data into long format (one row per event type and measure) for the graph:
hea.10.re <- melt(hea.10, id=c("evtype"))
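For readers unfamiliar with reshaping, the sketch below builds the same kind of long table by hand in base R on a two-row example: each measure column is stacked into a single `value` column, and a `variable` column records which measure each value came from. The data and the names `wide`, `measures`, and `long` are made up for illustration. This is only meant to show the shape of the result, not to replace `melt`.

```r
# Made-up wide table: one row per event type, one column per measure.
wide <- data.frame(
  evtype   = c("TORNADO", "HEAT"),
  fatality = c(5, 4),
  injuries = c(15, 1)
)

# Every column except the id column is treated as a measure.
measures <- setdiff(names(wide), "evtype")

# Stack the measure columns: ids repeat once per measure, values are concatenated.
long <- data.frame(
  evtype   = rep(wide$evtype, times = length(measures)),
  variable = factor(rep(measures, each = nrow(wide)), levels = measures),
  value    = unlist(wide[measures], use.names = FALSE)
)
```

The long layout is what lets the plot facet on `variable`, with one panel per measure.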
The counts are strongly right-skewed: a few event types have far larger values than the rest. To make the fatalities and injuries of all event types comparable on the y-axis, the y-axis is transformed by log base 2. Graphing the top 10 weather event impacts on human health:
#hea.10.re$value <- factor(hea.10.re$values,levels=hea.10.re$values, ordered=T)
# log2(value) applies the base-2 log transform; fill=I("red") sets a constant fill colour
qplot(evtype, log2(value), data=hea.10.re, facets=.~variable, geom="bar", stat="identity", fill=I("red"), main="Top 10 Weather Event Impacts on Human Health \n (based on data from 1950 to November 2011)", xlab="weather events", ylab="number of cases (log2 scale)") + theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))
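As a small arithmetic check of what a base-2 log transform does to counts, the snippet below compares three expressions on made-up values. The base-2 log of x equals `log(x) / log(2)`. Dividing x itself by `log(2)` only rescales the values linearly and does not compress the skew. The vector `counts` is illustrative.

```r
# Made-up counts spanning several orders of magnitude.
counts <- c(8, 1024, 65536)

log2_direct  <- log2(counts)            # base-2 log: 3, 10, 16
log2_by_hand <- log(counts) / log(2)    # same values via change of base
rescaled     <- counts / log(2)         # linear rescaling only, not a log transform
```

On the log2 scale the largest count is about five times the smallest (16 versus 3), whereas on the original scale it is 8,192 times larger.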
Again, we first sum the property damage and crop damage by event type:
pro <- aggregate(PROPDMG ~ EVTYPE, data = dat.eco, FUN = sum)
cro <- aggregate(CROPDMG ~ EVTYPE, data = dat.eco, FUN = sum)
Next we create a table with property damage, crop damage, and total damage:
eco <- data.frame(evtype=pro$EVTYPE, prop_dmg = pro$PROPDMG, crop_dmg = cro$CROPDMG, total_dmg = pro$PROPDMG+cro$CROPDMG)
We take the top 10 weather events by their total damage in dollars:
eco.10 <- eco[order(eco$total_dmg, decreasing=TRUE)[1:10],]
Reshape the data into long format for the graph:
eco.10.re <- melt(eco.10, id=c("evtype"))
Graphing the top 10 weather event impacts in terms of economic cost:
#eco.10$evtype <- factor(eco.10$evtype,levels=eco.10$evtype, ordered=T)
qplot(evtype, value/10^9, data=eco.10.re, facets=.~variable, geom="bar", stat="identity", fill=I("red"), main="Top 10 Weather Events with Greatest Economic Consequences \n (based on data from 1950 to November 2011)", xlab="weather events", ylab="total damage (in billions of dollars)") + theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))