Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. In this report we describe severe weather events in the US from 1950 to November 2011 in order to understand their implications for population health and the economy. Specifically, we answer two questions: 1. which types of events are most harmful to population health in the US? and 2. which types of events have the greatest economic consequences in the US? We obtained the data from the US National Oceanic and Atmospheric Administration’s (NOAA) storm database. From these data, we found that tornadoes are by far the most damaging weather events for population health, and that floods are the most damaging in terms of total economic damage.

Data Processing

Loading Required Packages

if(!require(ggplot2)){install.packages("ggplot2")}
## Loading required package: ggplot2
if(!require(knitr)){install.packages("knitr")}
## Loading required package: knitr
if(!require(car)){install.packages("car")}
## Loading required package: car
if(!require(reshape)){install.packages("reshape")}
## Loading required package: reshape
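
Note that the pattern above installs a missing package but does not attach it in the same run. A minimal sketch of a helper that does both is shown here; the name load_or_install is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (an assumption, not from the original report):
# install a package only if it is missing, then attach it.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

# Demonstrated on a package that ships with R, so nothing is downloaded.
load_or_install("stats")
```

The same call could then be applied to each package the report needs, for example with lapply(c("ggplot2", "knitr", "car", "reshape"), load_or_install).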

Loading and Processing the Raw Data

From NOAA’s storm database, we obtained data on the characteristics of major weather events in the US (i.e. where they occur, along with estimates of fatalities, injuries and property damage). The data start in the year 1950 and end in November 2011. They come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size.

Downloading and reading in the data:

fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
filepath <- "/Users/hopewang/Coursera/05 Reproducible Research/RepData_PeerAssessment2/repdata-data-StormData.csv.bz2"
# Uncomment to download the file (the url argument must match the fileurl variable above)
#download.file(url = fileurl, destfile = filepath, method = 'wget')
dat <- read.csv(bzfile(filepath))
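
The read step above relies on read.csv() accepting a bzfile() connection, so the archive never has to be unpacked by hand. A minimal sketch of the same round trip on a small made-up file (the three rows below are invented for illustration, not taken from the storm database):

```r
# Write a tiny bzip2-compressed CSV to a temporary file.
# The rows are made-up stand-ins, not real storm records.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "FLOOD,1,0",
             "HAIL,0,0"), con)
close(con)

# read.csv() reads through the connection and decompresses on the fly.
toy <- read.csv(bzfile(tmp))
dim(toy)
```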

After reading in the data, we check its dimensions, column names, and first few rows (there are 902,297 rows in total):

dim(dat)
## [1] 902297     37
names(dat)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
head(dat)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Subsetting Required Data Only

There is a huge amount of data available. For our purposes, we are only interested in the small subset of the data that relates to human health and economic consequences.

To understand the health consequences of weather events, we keep only those events (i.e. rows) that resulted in at least one fatality or injury. We also keep only the columns relevant to human health:

  • EVTYPE: Event type (e.g. tornado, blizzard)
  • FATALITIES: Number of fatalities resulting from an event
  • INJURIES: Number of people “that require treatment by a first responder or a subsequent treatment at a medical facility”
dat.hea <- dat[dat$FATALITIES+dat$INJURIES>0,c("EVTYPE", "FATALITIES", "INJURIES")]

Similarly, to understand the economic consequences of weather events, we keep only those events (i.e. rows) with a non-zero recorded property or crop damage amount. We also keep only the columns relevant to economic damage:

  • EVTYPE: Event type (e.g. tornado, blizzard)
  • PROPDMG: Property damage amount resulting from an event, in the units given by PROPDMGEXP
  • CROPDMG: Crop damage amount resulting from an event, in the units given by CROPDMGEXP
  • PROPDMGEXP: Property damage dollar amount units (e.g. “K” for thousands, “M” for millions, and “B” for billions)
  • CROPDMGEXP: Crop damage dollar amount units (e.g. “K” for thousands, “M” for millions, and “B” for billions)
dat.eco <- dat[dat$PROPDMG+dat$CROPDMG >0,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
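
The two subsetting steps above can be illustrated on a small made-up data frame; the values below are invented for the sketch and the toy names are hypothetical.

```r
# Made-up rows standing in for the storm data.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(0, 0, 1, 2),
  INJURIES   = c(15, 0, 0, 0),
  PROPDMG    = c(25, 0, 0, 0),
  CROPDMG    = c(0, 0, 5, 0)
)

# Keep rows with at least one fatality or injury, and the health columns only.
toy.hea <- toy[toy$FATALITIES + toy$INJURIES > 0,
               c("EVTYPE", "FATALITIES", "INJURIES")]

# Keep rows with any recorded damage, and the damage columns only.
toy.eco <- toy[toy$PROPDMG + toy$CROPDMG > 0,
               c("EVTYPE", "PROPDMG", "CROPDMG")]
```

HAIL drops out of both subsets because it has no casualties and no damage, while HEAT appears only in the health subset.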

Damage amounts are currently recorded in different units, so we need to convert them all to a consistent unit (dollars). The unit codes present in the data are:

unique(dat.eco$PROPDMGEXP)
##  [1] K M B m   + 0 5 6 4 h 2 7 3 H -
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(dat.eco$CROPDMGEXP)
## [1]   M K m B ? 0 k
## Levels:  ? 0 2 B k K m M

Conversion to a consistent dollar unit:

dat.eco$PROPDMGUNIT <- recode(dat.eco$PROPDMGEXP,"
                                ''=1;
                                '+'=1;
                                '-'=1;
                                '0'=1;
                                '2'=100;
                                '3'=1000;
                                '4'=10000;
                                '5'=100000;
                                '6'=1000000;
                                '7'=10000000;
                                'h'=100;
                                'H'=100;
                                'K'=1000;
                                'M'=1000000;
                                'm'=1000000;
                                'B'=1000000000;
                              else =1 ", as.factor.result = FALSE)

dat.eco$CROPDMGUNIT <- recode(dat.eco$CROPDMGEXP, "
                                ''=1;
                                '?'=1;
                                '0'=1;
                                'k'=1000;
                                'K'=1000;
                                'M'=1000000;
                                'm'=1000000;
                                'B'=1000000000;
                              else =1 ", as.factor.result = FALSE)

dat.eco$PROPDMG <- dat.eco$PROPDMGUNIT * dat.eco$PROPDMG
dat.eco$CROPDMG <- dat.eco$CROPDMGUNIT * dat.eco$CROPDMG
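
As an alternative to recode(), the same kind of mapping can be sketched in base R with a named lookup vector. The helper name exp_to_multiplier is hypothetical, and the sketch assumes one shared table for property and crop codes, which differs slightly from the two separate mappings above (for example, it treats a crop code of '2' as 100 rather than 1).

```r
# Hypothetical helper (not part of the original analysis): translate a
# damage-exponent code into a numeric multiplier via a named lookup vector.
exp_to_multiplier <- function(code) {
  lookup <- c(h = 1e2, H = 1e2, k = 1e3, K = 1e3, m = 1e6, M = 1e6, B = 1e9,
              "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5, "6" = 1e6, "7" = 1e7)
  mult <- unname(lookup[as.character(code)])
  # Anything not in the table ('', '+', '-', '?', '0', ...) falls back to 1,
  # mirroring the "else = 1" rule used above.
  mult[is.na(mult)] <- 1
  mult
}

codes   <- c("K", "M", "", "B", "+", "5")
amounts <- c(25, 2.5, 10, 1, 3, 2)
amounts * exp_to_multiplier(codes)
```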

Results

Human Health Consequences from Weather Events

We will first summarize fatalities and injuries data for the graph:

inj <- aggregate(INJURIES ~ EVTYPE, data = dat.hea, FUN = sum)
fat <- aggregate(FATALITIES ~ EVTYPE, data = dat.hea, FUN = sum)

Next we create a table with fatalities and injuries and total counts:

hea <- data.frame(evtype=fat$EVTYPE, fatality = fat$FATALITIES, injuries = inj$INJURIES, total = fat$FATALITIES+inj$INJURIES)

We will take the top 10 weather events by total count of fatalities and injuries:

hea.10 <- hea[order(hea$total,decreasing=T)[1:10],]
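
The summarize, combine, and rank steps above can also be sketched as a single aggregate() call with cbind() on the left-hand side of the formula, which keeps both sums in the same row for each event type. The data below are made up for illustration.

```r
# Made-up casualty records (assumption: not real storm data).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 1, 3),
  INJURIES   = c(10, 5, 0, 2)
)

# Sum both columns per event type in one pass.
toy.sum <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
toy.sum$total <- toy.sum$FATALITIES + toy.sum$INJURIES

# Rank by total, largest first; head() stands in for the top-10 cut.
toy.top <- head(toy.sum[order(toy.sum$total, decreasing = TRUE), ], 2)
toy.top
```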

Reshape the data from wide to long format for graphing:

hea.10.re <- melt(hea.10, id=c("evtype"))
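
To make the reshaping step concrete, here is a minimal base-R sketch of the wide-to-long layout that melt() produces: one row per event type and measure, with the columns evtype, variable, and value. The input values are made up, and this is an illustration of the layout rather than the reshape package's own implementation.

```r
# Made-up wide table: one row per event type, one column per measure.
wide <- data.frame(
  evtype   = c("TORNADO", "FLOOD"),
  fatality = c(3, 1),
  injuries = c(15, 0)
)

# Stack the measure columns on top of each other, labelling each block.
measure <- setdiff(names(wide), "evtype")
long <- data.frame(
  evtype   = rep(wide$evtype, times = length(measure)),
  variable = rep(measure, each = nrow(wide)),
  value    = unlist(wide[measure], use.names = FALSE)
)
long
```

The long layout is what allows the plot to facet on the variable column.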

The fatality and injury counts are heavily skewed toward a few very large values. To make the counts for all weather events comparable on the y-axis, the values are transformed by log base 2. Graphing the top 10 weather event impacts on human health:

#hea.10.re$value <- factor(hea.10.re$values,levels=hea.10.re$values, ordered=T)
qplot(evtype, log2(value), data=hea.10.re, facets=.~variable, geom="bar",stat="identity", fill="red", main="Top 10 Weather Event Impacts to Human Health \n (based on data from 1950 to November 2011)",xlab="weather events",ylab="log2(number of cases)")+theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))

[Figure: Top 10 Weather Event Impacts to Human Health, with panels for fatality, injuries, and total]
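
A short numeric check of what a log base 2 transform does to skewed counts is sketched below, using made-up values. Note that log2(x) equals log(x) / log(2), whereas dividing the raw value by log(2) only rescales it and leaves the skew unchanged.

```r
# Made-up counts spanning several orders of magnitude.
counts <- c(1, 8, 1024, 65536)

# Log base 2 compresses the range to 0, 3, 10, 16.
log_scaled <- log2(counts)

# Same result via the change-of-base identity.
via_identity <- log(counts) / log(2)

# Dividing the raw counts by log(2) is only a constant rescaling:
# the largest value is still 65536 times the smallest.
rescaled <- counts / log(2)
max(rescaled) / min(rescaled)
```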

Economic Consequences from Weather Events

Again, we will first summarize the property damage and crop damage data:

pro <- aggregate(PROPDMG ~ EVTYPE, data = dat.eco, FUN = sum)
cro <- aggregate(CROPDMG ~ EVTYPE, data = dat.eco, FUN = sum)

Next we create a table with property damage, crop damage, and total damage:

eco <- data.frame(evtype=pro$EVTYPE, prop_dmg = pro$PROPDMG, crop_dmg = cro$CROPDMG, total_dmg = pro$PROPDMG+cro$CROPDMG)

We will take the top 10 weather events by total damage in dollars:

eco.10 <- eco[order(eco$total_dmg,decreasing=T)[1:10],]

Reshape the data from wide to long format for graphing:

eco.10.re <- melt(eco.10, id=c("evtype"))

Graphing the top 10 weather event impacts in terms of economic cost:

#eco.10$evtype <- factor(eco.10$evtype,levels=eco.10$evtype, ordered=T)
qplot(evtype, value/10^9, data=eco.10.re, facets=.~variable, geom="bar",stat="identity", fill="red", main="Top 10 Weather Events with Greatest Economic Consequences \n (based on data from 1950 to November 2011)",xlab="weather events",ylab="total damage (in billions of dollars)")+theme(legend.position = "none", axis.text.x = element_text(angle = 45, hjust = 1))

[Figure: Top 10 Weather Events with Greatest Economic Consequences, with panels for prop_dmg, crop_dmg, and total_dmg]