Synopsis

This work examines the health and economic costs of extreme weather events in the United States. The results presented are based on data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The analysis of the NOAA storm database addresses two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

In addressing the first question, the sum of total injuries and total fatalities is adopted as the measure of the cost of a weather event to population health. The economic cost used in addressing the second question is the sum of the property and agricultural (crop) damage resulting from a weather event. The NOAA database contains a large number of uniquely coded weather events; for brevity, only the five most costly event types are presented in the following analysis. The costs of these five event types are then considered both in aggregate and by year of event.

Data Processing

As a first stage in the data processing, the data is downloaded from the class repository.

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,"repdata_data_StormData.csv.bz2",method="curl")
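
As written, the download is repeated on every run. A minimal sketch of one way to skip it when the archive is already on disk follows. The helper `fetchOnce` and its `fetch` argument are hypothetical names introduced here for illustration and are not part of the original analysis; passing `mode = "wb"` is likewise an assumption, intended to keep the binary archive intact on Windows.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` is missing.
# `fetch` defaults to download.file and can be replaced by a stub for testing.
fetchOnce <- function(url, dest, fetch = download.file, ...) {
    if (!file.exists(dest)) {
        fetch(url, dest, ...)
    }
    invisible(dest)
}

# Intended use (not run here, since it needs network access):
# fetchOnce(fileUrl, "repdata_data_StormData.csv.bz2", mode = "wb")
```

Because the check is on the destination file only, a partially downloaded archive would have to be deleted by hand before re-running.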

Since the NOAA storm database includes variables that are not relevant to the questions considered in this work, only the relevant columns are read from the data set. Once the data is loaded, the year of each event is extracted as a separate variable.

usecols <- c("NULL","character",rep("NULL",5),"character",rep("NULL",14),rep("numeric",2),
             rep(c("numeric","character"),2),rep("NULL",9))
datIn <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),colClasses=usecols)
datIn$YEAR <- as.POSIXlt(datIn$BGN_DATE,format="%m/%d/%Y %H:%M:%S")$year+1900
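
Selecting columns by position depends on the exact column order of the file. The sketch below shows one possible alternative that selects them by name and then extracts the year. It runs on a small made-up CSV string rather than the storm file; `csvText`, `keep`, and `toy` are illustrative names, and fixing the time zone with `tz = "UTC"` is an assumption made so that parsing does not depend on the local clock settings.

```r
# Made-up rows standing in for the storm file; the real file has more columns.
csvText <- paste(
    "STATE__,BGN_DATE,STATE,EVTYPE,FATALITIES",
    "1,4/18/1950 0:00:00,AL,TORNADO,0",
    "29,5/22/2011 0:00:00,MO,TORNADO,3",
    sep = "\n")

# Columns to keep, with the class each should be read as.
keep <- c(BGN_DATE = "character", EVTYPE = "character", FATALITIES = "numeric")

# Read only the first row to learn the column names and their order.
header <- names(read.csv(text = csvText, nrows = 1))

# "NULL" tells read.csv to skip a column entirely.
classes <- ifelse(header %in% names(keep), keep[header], "NULL")

toy <- read.csv(text = csvText, colClasses = classes)

# Parse the timestamp in a fixed time zone and pull out the calendar year.
toy$YEAR <- as.POSIXlt(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S",
                       tz = "UTC")$year + 1900
toy$YEAR   # 1950 2011
```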

Because each cost is stored as a numeric value together with a separate magnitude code, the costs must be converted to a common unit (dollars) to allow inter-event comparisons and calculations.

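# Convert a damage figure and its magnitude code into dollars.
# "B" is billions and "M"/"m" is millions; every other code,
# including a blank, is treated as thousands.
numCost <- function(value,multiplier){
        out <- numeric(length(value))
        for (i in seq_along(value)) {
                if(multiplier[i] =="B"){
                        out[i] <- value[i]*10^9
                }else if((multiplier[i] =="M") || (multiplier[i] == "m")){
                        out[i] <- value[i]*10^6
                }else{
                        out[i] <- value[i]*10^3
                }
        }
        out
}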
datIn$PROPDMG <- numCost(datIn$PROPDMG,datIn$PROPDMGEXP) 
datIn$CROPDMG <- numCost(datIn$CROPDMG,datIn$CROPDMGEXP) 
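
The same scaling rule can also be written without an explicit loop. The following is a sketch only: `scaleCost` is a hypothetical name, and the lookup table simply restates the rule used above (B for billions, M or m for millions, anything else treated as thousands), applied here to a few made-up values.

```r
# Hypothetical vectorized variant of the scaling rule.
scaleCost <- function(value, code) {
    # Multipliers for the recognised magnitude codes.
    mult <- c(B = 1e9, M = 1e6, m = 1e6)
    scale <- unname(mult[code])
    # Codes missing from the lookup (including blanks) fall back to thousands.
    scale[is.na(scale)] <- 1e3
    value * scale
}

# Made-up inputs: 25 K, 2.5 M, 1.2 B, and 10 with a blank code.
scaleCost(c(25, 2.5, 1.2, 10), c("K", "M", "B", ""))
# 25000 2500000 1200000000 10000
```

Keeping the multipliers in one named vector makes the fallback explicit, which matters because every unrecognised code is silently read as thousands.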

Event types in the NOAA database are coded in a non-standardized manner. A single type of event may be encoded in multiple ways across the database, and the formatting of these encodings also varies from entry to entry. Much of the effort in this project goes into data cleaning.

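library(stringr)
# Replace non-word characters with spaces, drop digits, and lower-case the labels
datIn$EVTYPE <- str_trim(gsub("\\W"," ",datIn$EVTYPE))
datIn$EVTYPE <- str_trim(gsub("\\d","",datIn$EVTYPE))
datIn$EVTYPE <- tolower(datIn$EVTYPE)
# Collapse runs of spaces and label entries that are now empty
datIn$EVTYPE <- gsub(" {2,}"," ",datIn$EVTYPE)
datIn[datIn$EVTYPE =="" ,"EVTYPE"] <- "NA"
# Keep at most the first two words, then strip a trailing plural "s"
datIn$EVTYPE <- sapply(datIn$EVTYPE, function(x) word(x,1,min(length(strsplit(x," ")[[1]]),2)))
datIn$EVTYPE <- sub("s$","",datIn$EVTYPE)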
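
To make the effect of this first pass concrete, the sketch below applies the same sequence of steps to a few made-up labels. It is an approximation written in base R only (so it does not use `stringr`), and `cleanEvtype` is a hypothetical name introduced for illustration.

```r
# Hypothetical base-R rendering of the first cleaning pass.
cleanEvtype <- function(x) {
    x <- gsub("\\W", " ", x)          # non-word characters become spaces
    x <- gsub("\\d", "", x)           # digits are dropped
    x <- tolower(trimws(x))           # trim and lower-case
    x <- gsub(" {2,}", " ", x)        # collapse runs of spaces
    x[x == ""] <- "NA"                # label entries that are now empty
    # Keep at most the first two words of each label.
    x <- vapply(strsplit(x, " "),
                function(w) paste(head(w, 2), collapse = " "),
                character(1))
    sub("s$", "", x)                  # strip a trailing plural "s"
}

cleanEvtype(c("TSTM WIND/HAIL", "  High Winds 45", "FLASH FLOODS", "?"))
# "tstm wind" "high wind" "flash flood" "NA"
```

The example shows both the intended merging (plural and punctuation variants collapse) and its cost: the third word of a label, such as "hail" in the first entry, is discarded.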

After this first pass over the event type labels, a number of specific event types are merged using targeted substitutions.

datIn$EVTYPE <- sub("hurricane [a-z]{1,}","hurricane", datIn$EVTYPE)
datIn$EVTYPE <- sub("wind [a-z]{1,}","wind", datIn$EVTYPE)
datIn$EVTYPE <- sub("flood [a-z]{1,}","flood", datIn$EVTYPE)
datIn$EVTYPE <- sub("thunderstorm [a-z]{1,}","thunderstorm", datIn$EVTYPE)
datIn$EVTYPE <- sub("tornado [a-z]{1,}","tornado", datIn$EVTYPE)
datIn$EVTYPE <- sub("snow [a-z]{1,}","snow", datIn$EVTYPE)
datIn$EVTYPE <- sub("sleet [a-z]{1,}","sleet", datIn$EVTYPE)
datIn$EVTYPE <- sub("lightning [a-z]{1,}","lightning", datIn$EVTYPE)
datIn$EVTYPE <- sub("hail [a-z]{1,}","hail", datIn$EVTYPE)
datIn$EVTYPE <- sub("cold [a-z]{1,}","cold", datIn$EVTYPE)
datIn$EVTYPE <- sub("ice [a-z]{1,}","ice", datIn$EVTYPE)
datIn$EVTYPE <- sub("blizzard [a-z]{1,}","blizzard", datIn$EVTYPE)
datIn$EVTYPE <- sub("tstm[ ]*[a-z]{1,}","tstm", datIn$EVTYPE)
datIn$EVTYPE <- sub("tstm","thunderstorm",datIn$EVTYPE)
datIn$EVTYPE <- sub("excessive heat", "heat", datIn$EVTYPE)
datIn$EVTYPE <- sub("flash flood", "flood", datIn$EVTYPE)
datIn$EVTYPE <- sub("high wind", "wind", datIn$EVTYPE)
datIn$EVTYPE <- factor(datIn$EVTYPE)
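
Most of the substitutions above follow one pattern: a keyword followed by a second word is reduced to the keyword alone. A sketch of how that pattern could be driven by a vector of keywords is given here; `collapseSuffix` is a hypothetical helper, the labels are made up, and the sketch covers only the keyword-plus-word rules, not the `tstm`, `excessive heat`, `flash flood`, or `high wind` rewrites, whose order matters.

```r
# Hypothetical helper: reduce "<keyword> <word>" to "<keyword>" for each keyword.
collapseSuffix <- function(x, keywords) {
    for (k in keywords) {
        x <- sub(paste0(k, " [a-z]{1,}"), k, x)
    }
    x
}

labels <- c("hurricane opal", "flood damage", "hail storm", "winter storm")
collapseSuffix(labels, c("hurricane", "flood", "hail"))
# "hurricane" "flood" "hail" "winter storm"
```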

Once the data is cleaned, it is processed to find the five costliest event types, both in health and economic terms.

library(plyr)
colNames <- c('FATALITIES','INJURIES','PROPDMG','CROPDMG')
datEVT <- ddply(datIn,.(EVTYPE), function(x) colSums(x[colNames]))
datEVT$HEALTH <- datEVT$INJURIES + datEVT$FATALITIES
datEVT$ECON <- datEVT$PROPDMG + datEVT$CROPDMG
maxHealth <- head(datEVT[with(datEVT,order(-HEALTH)),],5)
maxEcon <- head(datEVT[with(datEVT,order(-ECON)),],5)
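
The sum-then-rank step can be illustrated on a small made-up data frame. The sketch uses base R's `aggregate` instead of `plyr` so that it runs without extra packages; `toy`, `totals`, and `top` are illustrative names, and the counts are invented.

```r
# Made-up event records.
toy <- data.frame(
    EVTYPE     = c("tornado", "tornado", "heat", "flood", "flood", "hail"),
    FATALITIES = c(2, 1, 4, 0, 1, 0),
    INJURIES   = c(10, 5, 3, 2, 0, 1))

# Sum both columns within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Health cost is fatalities plus injuries; rank in descending order.
totals$HEALTH <- totals$FATALITIES + totals$INJURIES
top <- head(totals[order(-totals$HEALTH), ], 2)
top$HEALTH   # 18 7 (tornado, then heat)
```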

Next, the health and economic costs of each event type are broken down by year, and the yearly rows for the five costliest event types in each category are selected.

datOut <- ddply(datIn,.(EVTYPE,YEAR), function(x) colSums(x[colNames]))
# One logical value per row: TRUE when the row's event type is in the top five
idxHealth <- datOut$EVTYPE %in% maxHealth$EVTYPE
idxEcon <- datOut$EVTYPE %in% maxEcon$EVTYPE
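
A row filter of this kind needs exactly one logical value per row of the yearly table. The sketch below shows the breakdown and the filter together on made-up data, again using base R's `aggregate`; `toy`, `byYear`, `topTypes`, and `keepRow` are illustrative names.

```r
# Made-up event records with a year column.
toy <- data.frame(
    EVTYPE   = c("tornado", "tornado", "tornado", "heat", "hail"),
    YEAR     = c(1990, 1990, 1991, 1990, 1991),
    INJURIES = c(4, 6, 2, 3, 1))

# Total injuries for each combination of event type and year.
byYear <- aggregate(INJURIES ~ EVTYPE + YEAR, data = toy, FUN = sum)

# Keep only the rows whose event type is among the selected types.
topTypes <- c("tornado", "heat")
keepRow <- byYear$EVTYPE %in% topTypes
byYear[keepRow, ]
```

Here `keepRow` has the same length as `nrow(byYear)`, so it can be used directly as a row index.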

Results

The table below contains the five event categories most costly to public health.

library(knitr)
plotHealth <- maxHealth[,c("EVTYPE","FATALITIES","INJURIES","HEALTH")]
colnames(plotHealth) <- c("Weather","Fatalities","Injuries","Total")
kable(plotHealth,format="markdown",row.names=FALSE)

|Weather      | Fatalities| Injuries| Total|
|:------------|----------:|--------:|-----:|
|tornado      |       5633|    91364| 96997|
|heat         |       2840|     8625| 11465|
|thunderstorm |        710|     9480| 10190|
|flood        |       1483|     8581| 10064|
|lightning    |        817|     5232|  6049|

The second table lists the five costliest event types in economic terms. All damage figures are in dollars.

plotEcon <- maxEcon[,c("EVTYPE","PROPDMG","CROPDMG","ECON")]
colnames(plotEcon) <- c("Weather","Property Damage", "Crop Damage", "Total")
kable(plotEcon,format="markdown",row.names=FALSE)

|Weather      | Property Damage| Crop Damage|     Total|
|:------------|---------------:|-----------:|---------:|
|tornado      |       3.215e+09|   100025720| 3.315e+09|
|thunderstorm |       2.673e+09|   199288180| 2.872e+09|
|flood        |       2.345e+09|   349383840| 2.694e+09|
|hail         |       6.895e+08|   579736430| 1.269e+09|
|lightning    |       6.034e+08|     3580610| 6.070e+08|

Finally, it is instructive to examine how the health and economic costs of the costliest event types break down by year.

library(ggplot2)
qplot(YEAR,INJURIES+FATALITIES,data=datOut[idxHealth,],geom="line",color=EVTYPE,
      xlab="Year",ylab="Total Injuries and Fatalities",main="Health Costs by Year")

[Figure: Health Costs by Year. Total injuries and fatalities per year, one line per event type.]

library(ggplot2)
qplot(YEAR,(PROPDMG+CROPDMG)/(10^6),data=datOut[idxEcon,],geom="line",color=EVTYPE,
      xlab="Year",ylab="Property and Agricultural Losses (Millions of Dollars)",
      main="Economic Costs by Year")

[Figure: Economic Costs by Year. Property and agricultural losses per year in millions of dollars, one line per event type.]