Preliminary analysis of population health and economic impacts of weather events using the NOAA storm event database (Peer Assessment 2)

Synopsis:

A preliminary analysis of storm event data since 1996 suggests that the weather events with the largest impact on population health are tornado and excessive heat events, closely followed by flood and flash flood events. Property damage is mainly driven by hurricane, flood and storm surge events. Crop damage, however, is overwhelmingly due to drought events, followed by hurricane, flood and hail events. In summary, the top events in the categories of fatalities, injuries, property damage and crop damage are excessive heat, tornado, flood and drought respectively.

Data processing: retrieve data from online source

The NOAA storm event data file is retrieved from the online location https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. A description of this raw data can be found at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.

# download raw data file
setwd("~/kim/hopkinsdata")

fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
tmp <- "Repdata-data-StormData.csv.bz2"

# download only if the file is not already present; mode = "wb" keeps the binary archive intact
if (!file.exists(tmp)) {
  download.file(fileUrl, destfile = tmp, method = "curl", mode = "wb")
}

Data processing: read in the raw data file

This file is then read into memory. The file is 49.2MB and this can take some time.

if(!exists("d")) {
d <- read.csv("Repdata-data-StormData.csv.bz2")
}

Data processing: date range and event type

As indicated by NOAA (http://www.ncdc.noaa.gov/stormevents/details.jsp?type=eventtype), the types of storm events recorded have changed through time. From 1950 to 1954, only TORNADO events were recorded. From 1955 to 1992, only TORNADO, THUNDERSTORM WIND and HAIL events were recorded, and these were translated from paper publications to digital form. From 1993 to 1995, only TORNADO, THUNDERSTORM WIND and HAIL events were extracted from the unformatted text files.

As a result, summaries based on older data will underestimate the frequency, population health and economic impacts of storm events. Therefore, only data from 1996 to present will be used in this analysis.
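The change in recording practice can be checked by counting the distinct event types recorded in each year. The following is a minimal sketch on a small invented data frame (toy is hypothetical and only mimics the BGN_DATE and EVTYPE columns of the raw file). Applied to the real data, the same steps would be expected to show far fewer event types in the early years.

```r
# Toy stand-in for the raw data; the values are invented for illustration only
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "7/2/1994 0:00:00", "1/6/1996 0:00:00",
               "2/9/1996 0:00:00", "3/1/1996 0:00:00"),
  EVTYPE = c("TORNADO", "TORNADO", "HAIL", "FLOOD", "TORNADO", "DROUGHT"),
  stringsAsFactors = FALSE
)

# Extract the year from the date string (the trailing time part is ignored)
toy$year <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Number of distinct event types recorded in each year
types_per_year <- tapply(toy$EVTYPE, toy$year, function(x) length(unique(x)))
print(types_per_year)
```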

In this preliminary analysis, no attempt was made to initially recode the 985 different event codings listed in EVTYPE from 1996 forward into the 48 official event types as defined in NWS Directive 10-1605, even though some of these codings were obviously inappropriate (e.g., “Summary of March 23”). Event types were recoded only where this had a bearing on the analysis. Among the most important event types for population health and economic damage, four were thought to be duplicate codings. THUNDERSTORM WIND events coded as TSTM WIND were recoded as THUNDERSTORM WIND. HEAT was merged into the category EXCESSIVE HEAT. RIP CURRENTS was recoded as RIP CURRENT, and HURRICANE/TYPHOON as HURRICANE.

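# keep only events that began in 1996 or later
dm <- d
dm$BGN_DATE <- as.POSIXct(strptime(dm$BGN_DATE, "%m/%d/%Y %H:%M:%S", tz = "UTC"))
tdate <- as.POSIXct(strptime("1/1/1996", "%m/%d/%Y", tz = "UTC"))
# use >= so that events starting at midnight on 1 January 1996 are included
inc <- dm$BGN_DATE >= tdate
dm <- dm[inc, ]

# merge duplicate codings of the most important event types
dm$EVTYPE <- as.character(dm$EVTYPE)
env <- (dm$EVTYPE == "TSTM WIND")
dm$EVTYPE[env] <- "THUNDERSTORM WIND"
enh <- (dm$EVTYPE == "HEAT")
dm$EVTYPE[enh] <- "EXCESSIVE HEAT"
enr <- (dm$EVTYPE == "RIP CURRENTS")
dm$EVTYPE[enr] <- "RIP CURRENT"
enu <- (dm$EVTYPE == "HURRICANE/TYPHOON")
dm$EVTYPE[enu] <- "HURRICANE"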
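The four recodings can also be written as a single lookup table, which keeps the mapping in one place and makes it easier to extend. This is a minimal sketch, not the method used for the results reported here; recode_map and recode_evtype are hypothetical names, and the example vector ev is invented.

```r
# Named vector: names are the codings found in the data, values are the targets
recode_map <- c(
  "TSTM WIND"         = "THUNDERSTORM WIND",
  "HEAT"              = "EXCESSIVE HEAT",
  "RIP CURRENTS"      = "RIP CURRENT",
  "HURRICANE/TYPHOON" = "HURRICANE"
)

# Replace every coding that appears in the map; leave all others unchanged
recode_evtype <- function(evtype, map = recode_map) {
  evtype <- as.character(evtype)
  hit <- evtype %in% names(map)
  evtype[hit] <- unname(map[evtype[hit]])
  evtype
}

ev <- c("TSTM WIND", "FLOOD", "HEAT", "RIP CURRENTS", "HURRICANE/TYPHOON")
recoded <- recode_evtype(ev)
print(recoded)
```

With the real data this would be applied as dm$EVTYPE <- recode_evtype(dm$EVTYPE).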

Data processing: economic impact estimates

Indicators of property damage and crop damage were coded in two columns each in the raw data file: PROPDMG, PROPDMGEXP and CROPDMG, CROPDMGEXP, where the second column indicated the order of magnitude of the damage estimate with either a numeric code (e.g. 2 = a multiplier of 10^2) or an alpha code (e.g. “k” = a multiplier of 10^3). Alpha codes of h or H, k or K, m or M, and b or B were replaced with exponents of 2, 3, 6 and 9 respectively. All other non-numeric codes (blank, -, ?, +) were replaced with 0 (i.e., 10^0 = 1), and two new data columns for property and crop damage, TPROP and TCROP, were created by multiplying PROPDMG by 10^PROPDMGEXP and CROPDMG by 10^CROPDMGEXP.

# convert property damage exponent codes to numeric exponents
dm$PROPDMGEXP <- as.character(dm$PROPDMGEXP)
inh <- (dm$PROPDMGEXP %in% c("h", "H"))
dm$PROPDMGEXP[inh] <- "2"
ink <- (dm$PROPDMGEXP %in% c("k", "K"))
dm$PROPDMGEXP[ink] <- "3"
inm <- (dm$PROPDMGEXP %in% c("m", "M"))
dm$PROPDMGEXP[inm] <- "6"
inB <- (dm$PROPDMGEXP %in% c("b", "B"))
dm$PROPDMGEXP[inB] <- "9"
ino <- (dm$PROPDMGEXP %in% c("", "-", "?", "+"))
dm$PROPDMGEXP[ino] <- "0"

# total property damage in dollars
dm$TPROP <- dm$PROPDMG * 10^(as.numeric(dm$PROPDMGEXP))

# convert crop damage exponent codes to numeric exponents
dm$CROPDMGEXP <- as.character(dm$CROPDMGEXP)
cink <- (dm$CROPDMGEXP %in% c("k", "K"))
dm$CROPDMGEXP[cink] <- "3"
cinm <- (dm$CROPDMGEXP %in% c("m", "M"))
dm$CROPDMGEXP[cinm] <- "6"
cinB <- (dm$CROPDMGEXP %in% c("b", "B"))
dm$CROPDMGEXP[cinB] <- "9"
cino <- (dm$CROPDMGEXP %in% c("", "?"))
dm$CROPDMGEXP[cino] <- "0"
cna <- (is.na(dm$CROPDMGEXP))
dm$CROPDMGEXP[cna] <- "0"

# total crop damage in dollars
dm$TCROP <- dm$CROPDMG * 10^(as.numeric(dm$CROPDMGEXP))
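The same conversion can be expressed as one reusable function that serves both the property and the crop columns. The sketch below is an illustration under the coding rules described above, not the code used to produce the results; exp_to_power is a hypothetical name and the example values are invented.

```r
# Map a damage exponent code to a power of ten.
# Letters H, K, M, B (either case) map to 2, 3, 6, 9; single digits are used
# as given; anything else (blank, "-", "?", "+", NA) maps to 0, so 10^0 = 1.
exp_to_power <- function(code) {
  code <- toupper(as.character(code))
  power <- rep(0, length(code))
  alpha <- c(H = 2, K = 3, M = 6, B = 9)
  is_alpha <- code %in% names(alpha)
  power[is_alpha] <- alpha[code[is_alpha]]
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  power
}

# Invented example: damage figures and their exponent codes
dmg  <- c(25, 2.5, 1, 10, 3)
code <- c("K", "m", "B", "", "?")
total <- dmg * 10^exp_to_power(code)
print(total)
```

Because unknown codes fall back to an exponent of 0, no NA values are introduced by the conversion itself.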

Results: events most harmful to population health

I summarize the effect of storm events on population health by presenting only the top ten event types for injuries. Event types with no recorded fatalities or injuries were excluded. This selection includes the top five event types for fatalities.

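# total injuries per event type, largest first, dropping types with none
inj<-(aggregate(dm$INJURIES, list(dm$EVTYPE),  sum))
inj<-inj[inj$x>0,]
inj<-inj[with(inj, order(-x)), ]

# largest number of fatalities recorded for a single event
dmax<-which.max(dm$FATALITIES)
ndeaths<-dm$FATALITIES[dmax]
# total fatalities per event type, largest first, dropping types with none
deaths<-(aggregate(dm$FATALITIES, list(dm$EVTYPE),  sum))
deaths<-deaths[deaths$x>0,]
deaths<-deaths[with(deaths, order(-x)), ]
lb<-as.character(deaths[1:10,1])
# combine both measures (event types with both fatalities and injuries) and sort by injuries
totalhuman <- merge(deaths, inj,by="Group.1")
names(totalhuman)<-c("Event", "Fatalities", "Injuries")
totalhuman<-totalhuman[with(totalhuman, order(-Injuries)), ]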
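The two totals can also be computed in a single aggregate call using the formula interface with cbind. The following is a minimal sketch on an invented data frame (toy and health are hypothetical names), not the code behind the figures reported here.

```r
# Invented example data with the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "EXCESSIVE HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 5, 1, 0),
  INJURIES   = c(30, 10, 8, 4, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in one step
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Drop event types with no health impact at all, then sort by injuries
health <- health[health$FATALITIES > 0 | health$INJURIES > 0, ]
health <- health[order(-health$INJURIES), ]
print(health)
```

Unlike the merge above, this keeps event types that have only fatalities or only injuries. Such rows would contain a zero and would need to be handled before plotting on a log scale.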

These health impacts are plotted below for the top ten event types only, with the number of impacts shown on a log scale.

# two rows (fatalities, injuries) by ten columns (event types)
sinj<- t(data.frame(totalhuman$Fatalities[1:10], totalhuman$Injuries[1:10]))
x<-barplot(sinj, log="y",beside=TRUE, main="Weather events with largest health impacts", xlab="", ylab="Number of health impacts (log scale)", xaxt="n", col=c("black", "pink"))
legend("topright", legend =c("Fatalities", "Injuries"), fill=c("black", "pink"))
# take the labels from the plotted table so they always match the bars
lb<-as.character(totalhuman$Event[1:10])
text(cex=.6, x=colMeans(x), y=50, lb, xpd=TRUE, srt=45, adj=1)

Figure: Weather events with largest health impacts

The event type with the most fatalities is excessive heat, closely followed by tornadoes and then flash floods. The most injuries, however, are caused by tornadoes, followed by excessive heat and then regular floods. This suggests that tornadoes, excessive heat and floods are the most important event types for health impacts.

Results: events with the greatest economic consequences

I summarize the economic effect of storm events by presenting only the top ten event types for property damage and for crop damage. Event types with no recorded damage were not included in the analysis.

prp<-(aggregate(dm$TPROP, list(dm$EVTYPE),  sum))
prp<-prp[prp$x>0.e0,]
prp<-prp[!is.na(prp$x),]
prp<-prp[with(prp, order(-x)), ]

lb<-as.character(prp[1:10,1])
x<-barplot(prp$x[1:10],  main="Weather events with largest property damage", xlab="", ylab="Cost ($)", xaxt="n", col="blue")
text(cex=.6, x=x, y=-5e9, lb, xpd=TRUE, srt=45, adj=1)

Figure: Weather events with largest property damage

Examination of the events with the largest total property damage shows that FLOOD, HURRICANE, STORM SURGE and TORNADO are by far the largest contributors.

crp<-(aggregate(dm$TCROP, list(dm$EVTYPE),  sum))
crp<-crp[crp$x>0.e0,]
crp<-crp[!is.na(crp$x),]
crp<-crp[with(crp, order(-x)), ]

lb<-as.character(crp[1:10,1])
x<-barplot(crp$x[1:10],  main="Weather events with largest crop damage", xlab="", ylab="Cost ($)", xaxt="n", col="green")
text(cex=.6, x=x, y=-5e8, lb, xpd=TRUE, srt=45, adj=1)

Figure: Weather events with largest crop damage

For crop damage, however, DROUGHT is overwhelmingly the largest contributor. The second most important contributor is HURRICANE, followed by FLOOD and HAIL events.
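The property and crop summaries follow the same pattern: aggregate by event type, drop zero totals, sort, and keep the top ten. A shared helper makes that pattern explicit. This is a sketch only; top_events is a hypothetical name, the example data are invented, and na.rm = TRUE is an added assumption so that a single missing value does not remove an entire event type.

```r
# Total a damage column by event type and return the n largest positive totals
top_events <- function(value, evtype, n = 10) {
  tot <- aggregate(value, list(Event = evtype), sum, na.rm = TRUE)
  names(tot)[2] <- "Total"
  tot <- tot[tot$Total > 0, ]
  tot <- tot[order(-tot$Total), ]
  head(tot, n)
}

# Invented example data
toy <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE", "FLOOD", "HAIL", "DROUGHT", "HAIL"),
  TPROP  = c(5e6, 2e7, 1e6, NA, 0, 3e5),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy$TPROP, toy$EVTYPE, n = 2)
print(top2)
```

With the real data the calls would be top_events(dm$TPROP, dm$EVTYPE) and top_events(dm$TCROP, dm$EVTYPE).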