Fatalities, Injuries, Property Damage and Crop Damage by Event Type in the NOAA Storm Database

Synopsis

Weather events can have important economic and human impacts on a community. Events such as tonados, hurricanes, droughts and floods can cause devastating losses of both economic capital and lives. Thus, preparedness for weather-related events is important for federal, state and municipal goverments. It is the purpose of this document to explore the storm database of the National Oceanogrphic and Atmospheric Administration (NOAA) in order to determine which types of weather events have had the largest, and smallest, impacts in terms of fatalities, injuries, property damage and crop damage.This analysis demonstrates that between 1950 and 2011, constrained by the events included in this data, that tornados have caused the most injuries and deaths, floods/surges have caused the most property damage, and droughts have caused the most crop damage.

Data Processing

The data were downloaded from the coursera web site here. The raw data file was then loaded into R using the command:

x<-read.csv("repdata-data-StormData.csv.bz2")

Preliminary examination of the data revealed that there were only a few of the 37 variables which would be of interest, namely FATALITIES, INJURIES, EVTYPE (event type) and indicators of economic damage, PROPDMG (property damage), PROPDMGEXP (property damage exponent), CROPDMG (crop damage) and CROPDMGEXP (crop damage exponent). Examination of these variables revealed several issues which needed to be addressed before meaningful analysis could be carried out.
With regard to the economic variables, both property damage and crop damage were broken up in to a coefficient and an exponent. The coefficient was fairly self-explanatory, but the exponent was not.

table(x$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330

This code reveals that the exponent field is usually blank, or it may contain a number, a letter, or a special character. The numbers(n) were taken to mean 10^n. The letter were taken to mean 10^2 (h/H), 10^3 (K), 10^6 (m/M) or 10^9 (B). The entries with 7, 8 and B in the exponent field were looked at individually to determine if these high levels of economic damage made sense. This could be most easily determined by examining the REMARKS field.

x[,"REMARKS"]<-as.character(x[,"REMARKS"])
x[which(x$PROPDMGEXP=="8"),"REMARKS"]
## [1] "Several homes and several hundered vehicle were damaged. "

This remark would seem to justify the $100 million figure for damages.

The special characters, blanks and zeroes were all taken to mean 10^0=1. To evaluate if this made sense, the coefficients were totaled, considering that if there were only a small number of coefficients, then these odd exponents were unlikely to meaningfully impact the overall results.

tapply(x$PROPDMG,x$PROPDMGEXP,FUN=sum)
##                   -         ?         +         0         1         2 
## 5.274e+02 1.500e+01 0.000e+00 1.170e+02 7.108e+03 0.000e+00 1.200e+01 
##         3         4         5         6         7         8         B 
## 2.000e+01 1.450e+01 2.105e+02 6.500e+01 8.200e+01 0.000e+00 2.759e+02 
##         h         H         K         m         M 
## 2.000e+00 2.500e+01 1.074e+07 3.890e+01 1.407e+05

Note that this table reveals another issue. The coefficient for the one case with 8 for an exponent is 0, yet that storm did a lot of damage. So, the data was explored for other instances where a coefficient of zero was accompanied by a non-blank exponent.

#where property damage is zero but exponent is not
oddexp<-x[which(x$PROPDMG==0 & x$PROPDMGEXP!=""),]
table(oddexp$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
##      0      0      8      0      7     25     12      3      0     10 
##      6      7      8      B      h      H      K      m      M 
##      1      3      1      0      0      0 197184      0     11

There were over 197,000 instances. And once again, the REMARKS determined that these storms did, in fact, do considerable damage.

oddexp[which(oddexp$PROPDMGEXP==7),"REMARKS"]
## [1] "Over 40 building suffered structural damage to them.  A few trees were knocked down.  A man was injured in Liberty Hill as he sought shelter in his camp trailer.  Eight power poles were blown down. "
## [2] "Several houses had windows broken and numerous cars were damaged. "                                                                                                                                    
## [3] "A home was destroyed by a fire started by lightning. "

These same issues held true for the CROPDMGEXP variable. The final strategy for dealing with this was to impute a coefficient of 1 for those rows with a zero coefficient and a blank exponent, impute numerical exponents as described above, and create two new variables “TotalPD” (total property damage) and “TotalCD” (total crop damage).

x$PDExpNum<-0
x$PDExpNum[which(x$PROPDMGEXP=="B")]<-9
x$PDExpNum[which(x$PROPDMGEXP=="h" | x$PROPDMGEXP=="H")]<-2
x$PDExpNum[which(x$PROPDMGEXP=="K")]<-3
x$PDExpNum[which(x$PROPDMGEXP=="m" | x$PROPDMGEXP=="M")]<-6
x$PDExpNum[which(x$PROPDMGEXP=="" | x$PROPDMGEXP=="?" |x$PROPDMGEXP=="-" | 
             x$PROPDMGEXP=="+")]<-0
x$PDExpNum[which(x$PROPDMGEXP=="2")]<-2
x$PDExpNum[which(x$PROPDMGEXP=="3")]<-3
x$PDExpNum[which(x$PROPDMGEXP=="4")]<-4
x$PDExpNum[which(x$PROPDMGEXP=="5")]<-5
x$PDExpNum[which(x$PROPDMGEXP=="6")]<-6
x$PDExpNum[which(x$PROPDMGEXP=="7")]<-7
x$PDExpNum[which(x$PROPDMGEXP=="8")]<-8

#Same for crop damage
x$CDExpNum<-0
x$CDExpNum[which(x$CROPDMGEXP=="B")]<-9
x$CDExpNum[which(x$CROPDMGEXP=="h" | x$CROPDMGEXP=="H")]<-2
x$CDExpNum[which(x$CROPDMGEXP=="K")]<-3
x$CDExpNum[which(x$CROPDMGEXP=="m" | x$CROPDMGEXP=="M")]<-6
x$CDExpNum[which(x$CROPDMGEXP=="" | x$CROPDMGEXP=="?" |x$CROPDMGEXP=="-" | 
                   x$CROPDMGEXP=="+")]<-0
x$CDExpNum[which(x$CROPDMGEXP=="2")]<-2
x$CDExpNum[which(x$CROPDMGEXP=="3")]<-3
x$CDExpNum[which(x$CROPDMGEXP=="4")]<-4
x$CDExpNum[which(x$CROPDMGEXP=="5")]<-5
x$CDExpNum[which(x$CROPDMGEXP=="6")]<-6
x$CDExpNum[which(x$CROPDMGEXP=="7")]<-7
x$CDExpNum[which(x$CROPDMGEXP=="8")]<-8

#imput a 1 for the coefficient for those with an exp but a 0 coef
x$PDCoef<-x$PROPDMG
x$PDCoef[which(x$PROPDMG==0 & x$PROPDMGEXP!="")]<-1
x$CDCoef<-x$CROPDMG
x$CDCoef[which(x$CROPDMG==0 & x$CROPDMGEXP!="")]<-1



#Calculate total damage in whole numbers
x$TotalPD<-x$PDCoef * 10^(x$PDExpNum)
x$TotalCD<-x$CDCoef * 10^(x$CDExpNum)

The next issue was the EVTYPE variable. Initial evaluation of the most common event types demonstrated considerable redundancies, inconsistent cases, misspellings and inconsisent abbreviations. For example “Hurricane” and “Hurricane/Tyhpoon” existed as separate types, as did “heat”, “excessive heat”, etc. To resolve this several steps were taken. First, the EVTYPE data was converted to all lower case. Then, after examining the most commonly used types, a heirarchy was constructed where more specific types of events took precedence over more general types. For instance, “hurricane” is more specific than “wind”, so “hurricane wind” and “hurricane rain” are both considered “hurricane” instead of “wind” or “rain”. This heirachy was enacted by creating a new variable “evcat” (event category) and using sequential grep statements on the EVTYPE variable. This heirachy was iteratively refined be examining the uncategorized EVTYPEs and incorporating them into the heirarchy. The end result was a 25-category factor which included the “Other” type. The most numerous type of event in the “Other” category was monthly event summaries - which could not be categorized by a specific weather event.

x$EVTYPE<-tolower(x$EVTYPE)
x$evcat<-"Other"

x$evcat[grep("dust",x$EVTYPE)]<-"dust"
x$evcat[grep("fog",x$EVTYPE)]<-"fog"
x$evcat[grep("landslide|mudslide",x$EVTYPE)]<-"landslide"
x$evcat[grep("drought|dry",x$EVTYPE)]<-"drought/dry"
x$evcat[grep("fire",x$EVTYPE)]<-"fire"
x$evcat[grep("heat|hot|warm",x$EVTYPE)]<-"heat"
x$evcat[grep("cold",x$EVTYPE)]<-"cold"
x$evcat[grep("ice|freez|frost|glaze|icy ",x$EVTYPE)]<-"ice/freeze"
x$evcat[grep("rain",x$EVTYPE)]<-"rain"
x$evcat[grep("flood|surge|surf|tide|fld",x$EVTYPE)]<-"flood/surge"
x$evcat[grep("tsunami",x$EVTYPE)]<-"tsunami"
x$evcat[grep("rip",x$EVTYPE)]<-"rip currents"
x$evcat[grep("hail",x$EVTYPE)]<-"hail"
x$evcat[grep("wind",x$EVTYPE)]<-"wind"
x$evcat[grep("lightning",x$EVTYPE)]<-"lightning"
x$evcat[grep("thunderstorm|tstm",x$EVTYPE)]<-"thunderstorm"
x$evcat[grep("spout|funnel",x$EVTYPE)]<-"funnel cloud/waterspout"
x$evcat[grep("microburst",x$EVTYPE)]<-"microburst"
x$evcat[grep("snow|blizzard|winter|wintry|sleet",x$EVTYPE)]<-"snow/blizzard"
x$evcat[grep("tornado",x$EVTYPE)]<-"tornado"
x$evcat[grep("tropical",x$EVTYPE)]<-"tropical storm/depression"
x$evcat[grep("hurricane|typhoon",x$EVTYPE)]<-"hurricane/typhoon"
x$evcat[grep("volcan",x$EVTYPE)]<-"volcano"
x$evcat[grep("avalanch",x$EVTYPE)]<-"avalanche"

uncat<-x[x$evcat=="Other",]

x$evcat<-factor(x$evcat)

Results

The processing now complete, the totals of deaths, injuries, crop damage and property damage were computed.

finalDeaths<-data.frame(tapply(x$FATALITIES,x$evcat,FUN=sum,na.rm=TRUE))
names(finalDeaths)[1]<-"deaths"

finalInj<-data.frame(tapply(x$INJURIES,x$evcat,FUN=sum,na.rm=TRUE))
names(finalInj)[1]<-"injuries"

finalProp<-data.frame(tapply(x$TotalPD,x$evcat,FUN=sum,na.rm=TRUE))
names(finalProp)[1]<-"PropDamage"

finalCrop<-data.frame(tapply(x$TotalCD,x$evcat,FUN=sum,na.rm=TRUE))
names(finalCrop)[1]<-"CropDamage"

display.results<-cbind(finalDeaths,finalInj,prettyNum(finalProp,scientific=TRUE, digits=3),prettyNum(finalCrop,scientific=TRUE,digits=3))

display.results
##                           deaths injuries PropDamage CropDamage
## avalanche                    224      171   8.86e+06   1.54e+05
## cold                         215      280   1.24e+08   1.41e+09
## drought/dry                    0        4   1.05e+09   1.60e+10
## dust                          24      483   6.47e+06   3.35e+06
## fire                          90     1608   8.50e+09   4.05e+08
## flood/surge                 1735     8972   2.16e+11   1.23e+10
## fog                           80     1076   2.37e+07   9.21e+05
## funnel cloud/waterspout        3       32   1.32e+07   3.44e+06
## hail                          15     1371   1.62e+10   3.12e+09
## heat                        3178     9243   2.16e+07   9.06e+08
## hurricane/typhoon            135     1333   8.54e+10   5.52e+09
## ice/freeze                   114     2417   3.98e+09   7.03e+09
## landslide                     44       55   3.26e+08   2.03e+07
## lightning                    817     5231   9.52e+08   1.72e+07
## microburst                     3       28   7.30e+06   1.50e+04
## Other                         54       55   7.81e+06   1.48e+08
## rain                         107      303   3.23e+09   8.12e+08
## rip currents                 577      529   4.42e+05   2.77e+05
## snow/blizzard                551     3938   8.49e+09   3.18e+08
## thunderstorm                 729     9544   1.12e+10   1.39e+09
## tornado                     5661    91407   5.86e+10   4.40e+08
## tropical storm/depression     66      383   7.72e+09   6.95e+08
## tsunami                       33      129   1.44e+08   3.80e+04
## volcano                        0        0   5.03e+05   3.00e+03
## wind                         690     1936   6.22e+09   9.08e+08

The following 3 plots demonstrate the human and economic impact of the most harmful 10 categories of events.

results<-cbind(finalDeaths,finalInj,finalProp,finalCrop)
barplot(sort(results$deaths,decreasing=TRUE)[1:10], las=2, cex.names=0.7,
        main="Deaths by Event Category")

plot of chunk unnamed-chunk-10

Plot 1: Total number of deaths in each category - only the 10 deadliest categories are displayed.

barplot(sort(results$injuries,decreasing=TRUE)[1:10], las=2, cex.names=0.7,
        main="Injuries by Event Category")

plot of chunk unnamed-chunk-11

Plot 2: Total number of injuries in each category - only the 10 most injurious categories are displayed.

barplot(sort(results$PropDamage+results$CropDamage,decreasing=TRUE)[1:10],
        las=2, cex.names=0.7, main = "Crop + Property Damage by Event Category")

plot of chunk unnamed-chunk-12

Plot 3: Total damage (property plus crop) in each category, in US Dollars - only the 10 most damaging categories are displayed.