This work examines the health and economic costs of extreme weather events in the United States. The results presented are based on data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. In analyzing the NOAA storm database, two questions were addressed:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
In addressing the first question, the sum of total injuries and total fatalities is adopted as the measure of the cost of a weather event to population health. The economic cost used in addressing the second question is taken to be the sum of the property and agricultural damage resulting from a weather event. The NOAA database contains a large number of uniquely coded weather events; for brevity, only the five most costly types of events are presented in the following analysis. The costs of these five event types are then considered both in aggregate and by year of event.
As a first stage in the data processing, the data is downloaded from the class repository.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl,"repdata_data_StormData.csv.bz2",method="curl")
Since the NOAA storm database includes variables that are not relevant to the questions considered in this work, only the relevant columns are read from the data set. Once the data is loaded, the year of the event is extracted as a separate variable.
usecols <- c("NULL","character",rep("NULL",5),"character",rep("NULL",14),rep("numeric",2),
rep(c("numeric","character"),2),rep("NULL",9))
datIn <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),colClasses=usecols)
datIn$YEAR <- as.POSIXlt(datIn$BGN_DATE,format="%m/%d/%Y %H:%M:%S")$year+1900
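The column selection and year extraction can be tried on a small scale. The sketch below is only an illustration: the two-row CSV text, its column layout, and the name `toyIn` are made up, and only the `colClasses` mechanism and the date format string are taken from the code above. Columns marked `"NULL"` are skipped entirely when the file is read.

```r
# Made-up two-row CSV; the real file has many more columns
csvText <- paste(
  "ID,BGN_DATE,STATE,EVTYPE,FATALITIES",
  "1,4/18/1950 0:00:00,AL,TORNADO,0",
  "2,11/28/2011 0:00:00,TX,HAIL,1",
  sep = "\n"
)

# "NULL" drops a column; the other entries set the type of the columns kept
cols <- c("NULL", "character", "NULL", "character", "numeric")
toyIn <- read.csv(text = csvText, colClasses = cols)

# POSIXlt stores the year as an offset from 1900, hence the + 1900
toyIn$YEAR <- as.POSIXlt(toyIn$BGN_DATE, format = "%m/%d/%Y %H:%M:%S",
                         tz = "UTC")$year + 1900
print(toyIn)
```

Only `BGN_DATE`, `EVTYPE`, and `FATALITIES` survive the read, and `YEAR` is added as a numeric column.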
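Due to the way costs are coded for each weather event, they must be converted to a standardized format to allow inter-event comparisons and calculations. Each damage figure is stored as a numeric value together with a magnitude code: "B" denotes billions and "M" or "m" denotes millions, while all other codes are treated here as thousands.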
numCost <- function(factor,multiplier){
  # Convert each damage value to dollars using its magnitude code
  out <- numeric(length(factor))
  for (i in seq_along(factor)) {
    if (multiplier[i] == "B") {
      out[i] <- factor[i]*10^9   # billions
    } else if (multiplier[i] %in% c("M","m")) {
      out[i] <- factor[i]*10^6   # millions
    } else {
      out[i] <- factor[i]*10^3   # all other codes are treated as thousands
    }
  }
  out
}
datIn$PROPDMG <- numCost(datIn$PROPDMG,datIn$PROPDMGEXP)
datIn$CROPDMG <- numCost(datIn$CROPDMG,datIn$CROPDMGEXP)
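The same conversion rule can also be written without an explicit loop. The sketch below is a possible vectorized counterpart to `numCost`, not part of the original analysis; the name `numCostVec` and the toy inputs are hypothetical. It keeps the same assumption as the loop, namely that any code other than "B", "M", or "m" means thousands.

```r
# Hypothetical vectorized counterpart to numCost (name chosen for illustration)
numCostVec <- function(value, code) {
  # "B" is billions, "M"/"m" is millions, every other code is treated as thousands
  scale <- ifelse(code == "B", 1e9,
                  ifelse(code %in% c("M", "m"), 1e6, 1e3))
  value * scale
}

# Toy damage figures and magnitude codes (illustrative values only)
dollars <- numCostVec(c(2.5, 10, 5, 1), c("B", "m", "K", ""))
print(dollars)
```

Because `ifelse` works on whole vectors, this form avoids the element-by-element loop, which may matter on a data set with many rows.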
Event types in the NOAA database are coded in a non-standardized manner. A single type of event may be encoded in multiple ways across the database, and the formatting of these encodings also varies from entry to entry. As a result, much of the effort in this project goes into data cleaning.
library(stringr)
datIn$EVTYPE <- str_trim(gsub("\\W"," ",datIn$EVTYPE))
datIn$EVTYPE <- str_trim(gsub("\\d","",datIn$EVTYPE))
datIn$EVTYPE <- tolower(datIn$EVTYPE)
datIn$EVTYPE <- gsub(" {2,}"," ",datIn$EVTYPE)
datIn[datIn$EVTYPE =="" ,"EVTYPE"] <- "NA"
datIn$EVTYPE <- sapply(datIn$EVTYPE, function(x) word(x,1,min(length(strsplit(x," ")[[1]]),2)))
datIn$EVTYPE <- sub("s$","",datIn$EVTYPE)
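To see what this first pass does, the sketch below applies the same sequence of steps to a few made-up labels. It is an approximation under stated assumptions: base R's `trimws` stands in for `str_trim`, and a regular expression stands in for `word` when keeping at most the first two words, so that the snippet runs without stringr. The labels in `raw` are illustrative, not quoted from the database.

```r
# Made-up labels showing the kinds of inconsistency the first pass targets
raw <- c("  TSTM WIND/HAIL", "Flash Floods", "HIGH WINDS 55",
         "Thunderstorm Winds Lightning")

x <- trimws(gsub("\\W", " ", raw))  # non-word characters become spaces
x <- trimws(gsub("\\d", "", x))     # drop digits
x <- tolower(x)                     # unify case
x <- gsub(" {2,}", " ", x)          # collapse runs of spaces
# Keep at most the first two words (base R stand-in for stringr::word)
x <- sub("^([^ ]+( [^ ]+)?).*$", "\\1", x, perl = TRUE)
x <- sub("s$", "", x)               # strip a trailing plural "s"
print(x)
```

After these steps the labels are lower case, free of digits and punctuation, limited to two words, and singular, which leaves far fewer distinct spellings for the later rules to handle.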
After a first pass at cleaning the event type strings, a number of event types must be cleaned using more specific rules.
datIn$EVTYPE <- sub("hurricane [a-z]{1,}","hurricane", datIn$EVTYPE)
datIn$EVTYPE <- sub("wind [a-z]{1,}","wind", datIn$EVTYPE)
datIn$EVTYPE <- sub("flood [a-z]{1,}","flood", datIn$EVTYPE)
datIn$EVTYPE <- sub("thunderstorm [a-z]{1,}","thunderstorm", datIn$EVTYPE)
datIn$EVTYPE <- sub("tornado [a-z]{1,}","tornado", datIn$EVTYPE)
datIn$EVTYPE <- sub("snow [a-z]{1,}","snow", datIn$EVTYPE)
datIn$EVTYPE <- sub("sleet [a-z]{1,}","sleet", datIn$EVTYPE)
datIn$EVTYPE <- sub("lightning [a-z]{1,}","lightning", datIn$EVTYPE)
datIn$EVTYPE <- sub("hail [a-z]{1,}","hail", datIn$EVTYPE)
datIn$EVTYPE <- sub("cold [a-z]{1,}","cold", datIn$EVTYPE)
datIn$EVTYPE <- sub("ice [a-z]{1,}","ice", datIn$EVTYPE)
datIn$EVTYPE <- sub("blizzard [a-z]{1,}","blizzard", datIn$EVTYPE)
datIn$EVTYPE <- sub("tstm[ ]*[a-z]{1,}","tstm", datIn$EVTYPE)
datIn$EVTYPE <- sub("tstm","thunderstorm",datIn$EVTYPE)
datIn$EVTYPE <- sub("excessive heat", "heat", datIn$EVTYPE)
datIn$EVTYPE <- sub("flash flood", "flood", datIn$EVTYPE)
datIn$EVTYPE <- sub("high wind", "wind", datIn$EVTYPE)
datIn$EVTYPE <- factor(datIn$EVTYPE)
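The effect of these rules, and the fact that their order matters, can be checked on a handful of labels. The sketch below applies a subset of the substitutions above, in the same relative order, to made-up inputs; the `rules` list and the sample labels are illustrative only.

```r
# Made-up labels as they might look after the first cleaning pass
labels <- c("tstm wind", "thunderstorm wind", "flash flood",
            "high wind", "excessive heat", "hurricane opal")

# Ordered (pattern, replacement) pairs, a subset of the rules above
rules <- list(
  c("hurricane [a-z]{1,}",    "hurricane"),
  c("thunderstorm [a-z]{1,}", "thunderstorm"),
  c("tstm[ ]*[a-z]{1,}",      "tstm"),         # trim the qualifier first
  c("tstm",                   "thunderstorm"), # then expand the abbreviation
  c("excessive heat",         "heat"),
  c("flash flood",            "flood"),
  c("high wind",              "wind")
)

# Apply each rule in turn; sub() replaces the first match in each label
for (r in rules) labels <- sub(r[1], r[2], labels)
print(labels)
```

Both "tstm wind" and "thunderstorm wind" end up as "thunderstorm", which is why the abbreviation is trimmed before it is expanded.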
Once the data is cleaned, it is processed to find the five costliest event types, both in health and economic terms.
library(plyr)
colNames <- c('FATALITIES','INJURIES','PROPDMG','CROPDMG')
datEVT <- ddply(datIn,.(EVTYPE), function(x) colSums(x[colNames]))
datEVT$HEALTH <- datEVT$INJURIES + datEVT$FATALITIES
datEVT$ECON <- datEVT$PROPDMG + datEVT$CROPDMG
maxHealth <- head(datEVT[with(datEVT,order(-HEALTH)),],5)
maxEcon <- head(datEVT[with(datEVT,order(-ECON)),],5)
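The sum-then-rank pattern used here can be illustrated on a toy data frame. The sketch below uses base R's `aggregate` in place of `ddply`, purely so that it runs without plyr; the data values and the names `toy`, `byType`, and `top` are made up.

```r
# Toy event records (illustrative numbers only)
toy <- data.frame(
  EVTYPE     = c("tornado", "tornado", "heat", "flood", "flood"),
  FATALITIES = c(2, 1, 4, 0, 1),
  INJURIES   = c(10, 5, 3, 2, 0),
  stringsAsFactors = FALSE
)

# Sum each cost column within an event type
byType <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined health cost, as defined in the analysis
byType$HEALTH <- byType$FATALITIES + byType$INJURIES

# Rank by combined cost, largest first, and keep the top two
top <- head(byType[order(-byType$HEALTH), ], 2)
print(top)
```

Ordering on the negated total sorts in descending order, so `head` returns the costliest event types first.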
Next, the costs of each event type are broken down by year, and the yearly rows are selected for the five costliest categories under each measure.
datOut <- ddply(datIn,.(EVTYPE,YEAR), function(x) colSums(x[colNames]))
# Logical row indices: TRUE where the event type is among the top five
idxHealth <- datOut$EVTYPE %in% maxHealth$EVTYPE
idxEcon <- datOut$EVTYPE %in% maxEcon$EVTYPE
The table below contains the five event categories most costly to public health.
library(knitr)
plotHealth <- maxHealth[,c("EVTYPE","FATALITIES","INJURIES","HEALTH")]
colnames(plotHealth) <- c("Weather","Fatalities","Injuries","Total")
kable(plotHealth,format="markdown",row.names=FALSE)
| Weather | Fatalities | Injuries | Total |
|---|---|---|---|
| tornado | 5633 | 91364 | 96997 |
| heat | 2840 | 8625 | 11465 |
| thunderstorm | 710 | 9480 | 10190 |
| flood | 1483 | 8581 | 10064 |
| lightning | 817 | 5232 | 6049 |
The second table lists the five event types with the greatest economic cost, in dollars.
plotEcon <- maxEcon[,c("EVTYPE","PROPDMG","CROPDMG","ECON")]
colnames(plotEcon) <- c("Weather","Property Damage", "Crop Damage", "Total")
kable(plotEcon,format="markdown",row.names=FALSE)
| Weather | Property Damage | Crop Damage | Total |
|---|---|---|---|
| tornado | 3.215e+09 | 100025720 | 3.315e+09 |
| thunderstorm | 2.673e+09 | 199288180 | 2.872e+09 |
| flood | 2.345e+09 | 349383840 | 2.694e+09 |
| hail | 6.895e+08 | 579736430 | 1.269e+09 |
| lightning | 6.034e+08 | 3580610 | 6.070e+08 |
Finally, it is instructive to examine how the health and economic costs of the costliest event types break down by year.
library(ggplot2)
qplot(YEAR,INJURIES+FATALITIES,data=datOut[idxHealth,],geom="line",color=EVTYPE,
xlab="Year",ylab="Total Injuries and Fatalities",main="Health Costs by Year")
qplot(YEAR,(PROPDMG+CROPDMG)/(10^6),data=datOut[idxEcon,],geom="line",color=EVTYPE,
xlab="Year",ylab="Property and Agricultural Losses (Millions of Dollars)",
main="Economic Costs by Year")