Relative Comparison of High Impact Weather Effects: Population & Economic Consequences

Prepared by: Ben Apple

Date prepared: September 14, 2014

Prepared in fulfillment of Coursera Reproducible Research: Analysis of NOAA Storm Database for Assignment 2 (RepData_PeerAssessment2)

Synopsis

In this brief study, we undertake an analysis of the NOAA Storm Data in an attempt to answer the following questions:

1. Across the United States, which types of events are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

Our overall hypothesis is that certain weather events harm the health of the population of the United States and also have an adverse impact on its economy. We obtained the weather data used in this study from the NOAA Storm Database.

Summary of Findings

1. The health impact analysis implies that tornadoes are the weather events that have had the greatest impact on the population of the United States, with Excessive Heat as the second leading cause of harm. One possible anomaly needs explaining. While Excessive Heat appears to have a higher fatality rate than Tornadoes, Tornadoes have a far greater health impact on the population when injuries and fatalities are taken in aggregate. This may well be a source of confusion and contention among researchers as to the relative magnitude of the “Health Impact” of Tornadoes and Excessive Heat. We choose to look at the Health Impact in aggregate.

2. The economic consequences analysis implies that Floods cause the greatest economic damage, and that extreme water events in general (Floods, Hurricanes/Typhoons, and Storm Surges) cause significant economic damage.

It is worth noting that these conclusions are derived from totals summed over all recorded events of each type, and may therefore be distorted by outlier events.

Data Processing

In this step we load the libraries and download the data from the identified URL. We then read in the CSV file and prepare the pertinent variables.

## House keeping
proc_date <- date()
library(markdown)
library(Hmisc)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: survival
## Loading required package: splines
## Loading required package: Formula
## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, round.POSIXt, trunc.POSIXt, units
library(Rcmdr)
## Loading required package: RcmdrMisc
## Loading required package: car
## Loading required package: sandwich
## The Commander GUI is launched only in interactive sessions
library(ggplot2)
library(reshape2)

# Create the 'strmdata' directory the first time the data is downloaded
if (!file.exists("strmdata")) {
        dir.create("strmdata")
}
# Download the file into the 'strmdata' directory if it is not already there
if (!file.exists("strmdata/repdata-data-StormData.csv.bz2")) {
        fileURL <- "http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
        download.file(fileURL, destfile = "./strmdata/repdata-data-StormData.csv.bz2")
}

## Decompress and read the dataset into the storm_dat table
storm_dat <- read.csv(bzfile("./strmdata/repdata-data-StormData.csv.bz2"), stringsAsFactors = FALSE)

## Normalize the capitalization of EVTYPE so that case variants of an event type are grouped together
storm_dat$EVTYPE <- capitalize(tolower(storm_dat$EVTYPE))
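
The effect of this normalization step can be illustrated on a handful of strings. The sketch below uses base R only; `normalize_evtype` is a hypothetical helper rather than part of the original analysis, and the whitespace trimming is an added assumption that the line above does not perform.

```r
# Hypothetical helper (not part of the original analysis): trim whitespace,
# lower-case the label, then upper-case its first character.
normalize_evtype <- function(x) {
  x <- tolower(trimws(x))
  paste0(toupper(substring(x, 1, 1)), substring(x, 2))
}

# Made-up labels showing typical case and spacing variants
raw <- c("TORNADO", "tornado", " Tornado ", "EXCESSIVE HEAT")
clean <- normalize_evtype(raw)

# Count how many raw labels collapse onto each normalized label
print(table(clean))
```

Only case and surrounding whitespace are unified here; differently worded labels for the same kind of event would still be counted separately.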

After reading in the data set, we review a few of its attributes. The data set has 902,297 rows and 37 columns. We then review the structure of the data.

dim(storm_dat)
## [1] 902297     37
head(storm_dat[,1:8])
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE
## 1 Tornado
## 2 Tornado
## 3 Tornado
## 4 Tornado
## 5 Tornado
## 6 Tornado
str(storm_dat)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "Tornado" "Tornado" "Tornado" "Tornado" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...

Results

In this section we present our results with supporting graphics.

Analysis of Harmful Health Impacts of Weather on the Population of the United States

To assess the population health impact we examine FATALITIES and INJURIES. First, we sum both by event type and treat them together as casualties. Next, we sort the totals, keep the top ten event types, and generate a bar chart.

damages <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, storm_dat, sum)
hum_dam <- melt(head(damages[order(-damages$FATALITIES, -damages$INJURIES), ], 10))
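
The ranking of event types can depend on whether fatalities alone or combined casualties are used as the sort key. The sketch below shows this on a toy data frame; the event names and counts are made up for illustration and are not NOAA figures, and the `CASUALTIES` column is an assumption added here to make the combined measure explicit.

```r
# Toy values for illustration only; these are NOT NOAA figures.
toy <- data.frame(
  EVTYPE     = c("Event A", "Event A", "Event B", "Event B", "Event C"),
  FATALITIES = c(2, 1, 5, 4, 0),
  INJURIES   = c(40, 60, 3, 2, 10),
  stringsAsFactors = FALSE
)

# Sum both measures by event type, as in the aggregation above
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, toy, sum)

# Assumed combined measure: fatalities plus injuries
totals$CASUALTIES <- totals$FATALITIES + totals$INJURIES

# Rank the event types under each criterion
by_fatalities <- totals$EVTYPE[order(-totals$FATALITIES)]
by_casualties <- totals$EVTYPE[order(-totals$CASUALTIES)]

print(by_fatalities)
print(by_casualties)
```

In this toy example "Event B" leads on fatalities while "Event A" leads on combined casualties, so the choice of criterion should be stated alongside any ranking.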

The bar chart below supports the inference that the Tornado event type poses the greatest threat to human health (as an aggregation of fatalities and injuries) in the population of the United States.

## Graph of human impacts
ggplot(hum_dam, aes(x = EVTYPE, y = value, fill = variable)) + geom_bar(stat = "identity") + 
        coord_flip() + ggtitle("Harmful events") + labs(x = "", y = "number of people impacted") + 
        scale_fill_manual(values = c("red", "orange"), labels = c("Deaths", "Injuries"))

[Figure: stacked bar chart "Harmful events", showing deaths and injuries by event type; value axis: number of people impacted]

Analysis of the Economic Impact of Weather Events on the United States

To estimate the ten events with the greatest economic impact, we use the same approach as above, applied to the property damage (PROPDMG) and crop damage (CROPDMG) variables. Each value is first multiplied by the factor encoded in its exponent column (PROPDMGEXP or CROPDMGEXP), so in the supporting graphic these variables are expressed in dollars.

## Economic impact analysis
## Convert the exponent codes to multipliers. PROPDMGEXP also maps ''=0, as CROPDMGEXP does;
## otherwise blank codes become NA and aggregate() drops those rows.
storm_dat$PROPDMG <- storm_dat$PROPDMG * as.numeric(Recode(storm_dat$PROPDMGEXP, "'0'=1;'1'=10;'2'=100;'3'=1000;'4'=10000;'5'=100000;'6'=1000000;'7'=10000000;'8'=100000000;'B'=1000000000;'h'=100;'H'=100;'K'=1000;'m'=1000000;'M'=1000000;''=0;'-'=0;'?'=0;'+'=0", as.factor.result = FALSE))
storm_dat$CROPDMG <- storm_dat$CROPDMG * as.numeric(Recode(storm_dat$CROPDMGEXP, "'0'=1;'2'=100;'B'=1000000000;'k'=1000;'K'=1000;'m'=1000000;'M'=1000000;''=0;'?'=0", as.factor.result = FALSE))
ecofact <- aggregate(cbind(PROPDMG, CROPDMG) ~ EVTYPE, storm_dat, sum)
eco_dam <- melt(head(ecofact[order(-ecofact$PROPDMG, -ecofact$CROPDMG), ], 10))
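
The exponent conversion can also be written as a named lookup vector, which makes the mapping easier to inspect than a long recode string. This is a minimal sketch in base R, not the method used above: `exp_multiplier` and `to_dollars` are hypothetical names, and treating blank, symbol, and unrecognized codes as a multiplier of 0 is an assumption that follows the crop-damage mapping.

```r
# Hypothetical lookup table: digit codes 0-8 map to powers of ten,
# letter codes map to hundreds, thousands, millions and billions.
exp_multiplier <- c(
  setNames(10^(0:8), as.character(0:8)),
  H = 1e2, K = 1e3, M = 1e6, B = 1e9
)

# Hypothetical helper: convert a damage value and its exponent code to dollars.
to_dollars <- function(value, code) {
  mult <- exp_multiplier[toupper(code)]
  # Assumption: blank, "-", "?", "+" and unknown codes count as 0
  mult[is.na(mult)] <- 0
  unname(value * mult)
}

# Made-up inputs covering upper case, lower case and blank codes
dollars <- to_dollars(c(25, 2.5, 1.2, 10), c("K", "m", "B", ""))
print(dollars)
```

Whether a blank code should mean a multiplier of 0 or 1 is an analyst's choice and is worth stating explicitly, since it affects every record without an exponent.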

The bar chart below supports the inference that the Flood event type appears to have had the greatest economic impact on the United States.

ggplot(eco_dam, aes(x = EVTYPE, y = value, fill = variable)) + geom_bar(stat = "identity") + 
        coord_flip() + ggtitle("Economic consequences") + labs(x = "", y = "cost of damages in dollars") + 
        scale_fill_manual(values = c("orange", "red"), labels = c("Property Damage", 
                                                                    "Crop Damage"))

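[Figure: stacked bar chart "Economic consequences", showing property and crop damage by event type; value axis: cost of damages in dollars]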
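
The summary cautions that outlier events may distort these rankings. One way to check this, sketched below on made-up numbers rather than NOAA figures, is to compare each event type's total with its per-event median and with the share of the total contributed by its single largest event. The `max_share` measure is an assumption introduced here for illustration.

```r
# Toy values for illustration only; these are NOT NOAA figures.
toy <- data.frame(
  EVTYPE  = c(rep("Event A", 5), rep("Event B", 5)),
  PROPDMG = c(10, 12, 9, 11, 10, 1, 1, 2, 1, 500),
  stringsAsFactors = FALSE
)

# Total damage, per-event median, and share of the total from the largest event
total <- tapply(toy$PROPDMG, toy$EVTYPE, sum)
med   <- tapply(toy$PROPDMG, toy$EVTYPE, median)
share <- tapply(toy$PROPDMG, toy$EVTYPE, function(x) max(x) / sum(x))

print(round(cbind(total, median = med, max_share = share), 2))
```

Here "Event B" has the larger total but the smaller median, and nearly all of its total comes from a single event. A ranking based on totals alone would hide that.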