An Analysis of US Severe Weather Events 1950-2011 - Their impact on public health and the financial cost of damage caused.

Synopsis

This report examines a set of raw data sourced from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database covering over 900,000 events from 1950 to 2011. Analysis of this data shows the most frequently occurring type of event has been Thunderstorm Wind, accounting for more than a third of the records (36%). However, these incidents are less severe in terms of impact on public health and economic consequences. Tornado events have caused more injuries (91,346) and fatalities (5,633) than any other event. The most intense events in terms of impact on health have been those categorised as Excessive Heat with each incident averaging 3.56 injuries. It is the cold that kills though, as measured by the ratio of deaths to injuries with Cold/Wind Chill events resulting in 7.92 fatalities for every non-fatal injury recorded. The most damaging events from a financial perspective were Floods, costing an estimated $160.6bn (consisting of $149.9bn in damage to property and $10.7bn through crop loss). Although realtively rare, individual Storm Surge/Tide events were the most expensive with each averaging $117m in damages.

Data Processing

The source of the data used in the analysis is a link provided in the Coursera Reproducible Research Assignment Storm Data. It is stated that this is a version of the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. It has not been possible to independently validate this.

# necessary libraries
library(plyr)
library(ggplot2)
library(RColorBrewer)

# download data and read few rows
temp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",temp)
data <- read.table(bzfile(temp, "repdata-data-StormData.csv"),sep=",",header=T,nrows=100)
names(data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

An early look at this data shows not all of the 37 variables are required. A filter is applied before loading the complete dataset. Only the following variables are included:

  • BGN_DATE date the event began

  • STATE US state in which the event occurred

  • EVTYPE event type

  • FATALITIES number of fatalities attributed to the event

  • INJURIES number of non-fatal injuries attributed to the event

  • PROPDMG estimated $ value of damages done to property

  • PROPDMGEXP unit in which PROPDMG is expressed (e.g. K=1000s, M=1000000s)

  • CROPDMG estimated $ value of damages done to crops

  • CROPDMGEXP unit in which CROPDMG is expressed (e.g. K=1000s, M=1000000s)

req_f <- c("NULL","character",rep("NULL",4),"factor","character",rep("NULL",14),
           rep("numeric",3),"character","numeric","character",rep("NULL",9))
df <- read.table(bzfile(temp, "repdata-data-StormData.csv"),sep=",",colClasses=req_f,header=T)
unlink(temp)
df$BGN_DATE <- as.POSIXct(strptime(df$BGN_DATE,"%m/%d/%Y %H:%M:%S"))
df$yr <- as.factor(format(df$BGN_DATE,"%Y"))

Tidying up the Event Type

The data for Event Type is very “raw” with over 800 unique values, Many of these are minor variations (e.g. spelling mistakes, spacing, capitalization). To simplify the analysis, all spaces and the / character have been removed and everything has been capitalized. A bespoke table has been created (igr_evtype.csv) to map the most popular and obvious Event Types to the list given in the National Weather Service documentation. This mapping is reproduced at the end of this report in Appendix A.

df$EVTYPE <- toupper(df$EVTYPE)
df$EVTYPE <- gsub("/","",df$EVTYPE)
df$EVTYPE <- gsub(" ","",df$EVTYPE)
new_type_labels <- read.csv("igr_evtype.csv",header=T)
df <- join(df,new_type_labels,type="left")
## Joining by: EVTYPE

Tidying up the Economic Data

The columns (PROPDMGEXP and CROPDMGEXP) representing the scale of the estimated costs of the damage done by the events contain a relatively small number of unexplained values. Again after capitalizing everything, the following rules have been used:

  • 0 or blank -> U-> x1

  • H -> x100

  • K -> x1000

  • M -> x1000000

  • B -> x1000000000

Anything else is treated as NA,

units <- c("U","H","K","M","B")
df$PROPDMGEXP <- toupper(df$PROPDMGEXP)
df$PROPDMGEXP <- ifelse(df$PROPDMGEXP=="0"|df$PROPDMGEXP=="","U",df$PROPDMGEXP)
df$PROPDMGEXP <- ifelse(!(df$PROPDMGEXP %in% units),"NA",df$PROPDMGEXP)
df$CROPDMGEXP <- toupper(df$CROPDMGEXP)
df$CROPDMGEXP <- ifelse(df$CROPDMGEXP=="0"|df$CROPDMGEXP=="","U",df$CROPDMGEXP)
df$CROPDMGEXP <- ifelse(!(df$CROPDMGEXP %in% units),"NA",df$CROPDMGEXP)
unit_values <- c(1,100,1000,1000000,1000000000)
df$prop_damages <- df$PROPDMG * unit_values[match(df$PROPDMGEXP,units)]
df$crop_damages <- df$CROPDMG * unit_values[match(df$CROPDMGEXP,units)]

Results

Fatalities and Injuries

grpd_health <- ddply(df,"NWS_Category", function(x) c(incidents=nrow(x),
                            fatalities= sum(x$FATALITIES),injuries=sum(x$INJURIES)))
grpd_health$injperinc <- grpd_health$injuries / grpd_health$incidents
grpd_health$deathratio <- grpd_health$fatalities / grpd_health$injuries

Event Categories - Top 5 by Number of Injuries

head(grpd_health[order(-grpd_health$injuries),],5)
##          NWS_Category incidents fatalities injuries injperinc deathratio
## 34           Tornado      60652       5633    91346   1.50607    0.06167
## 33 Thunderstorm Wind     324390        706     9458   0.02916    0.07465
## 14             Flood      25871        478     6793   0.26257    0.07037
## 11    Excessive Heat       1832       1903     6525   3.56168    0.29165
## 26         Lightning      15755        816     5230   0.33196    0.15602

The most frequent category explains 35.9516% of the total incidents

Event Categories - Top 5 by Injuries per Incident

head(grpd_health[order(-grpd_health$injperinc),],5)
##       NWS_Category incidents fatalities injuries injperinc deathratio
## 11 Excessive Heat       1832       1903     6525    3.5617    0.29165
## 18           Heat        893        948     2117    2.3707    0.44780
## 34        Tornado      60652       5633    91346    1.5061    0.06167
## 10     Dust Storm        428         22      440    1.0280    0.05000
## 24      Ice Storm       2006         89     1975    0.9845    0.04506

Event Categories - Top 5 by Fatality:Injury ratio

head(grpd_health[order(-grpd_health$deathratio),],5)
##                NWS_Category incidents fatalities injuries injperinc
## 5          Cold/Wind Chill        539         95       12  0.022263
## 4            Coastal Flood        841          6        2  0.002378
## 23     Hurricane (Typhoon)        174         61       46  0.264368
## 2                Avalanche        386        224      170  0.440415
## 12 Extreme Cold/Wind Chill       1869        304      260  0.139112
##    deathratio
## 5       7.917
## 4       3.000
## 23      1.326
## 2       1.318
## 12      1.169

Chart showing severity of events

This chart plots each category

cutpoints <- quantile(grpd_health$deathratio, seq(0, 1, length = 6), na.rm = TRUE)
grpd_health$quintiles <- cut(grpd_health$deathratio, cutpoints)
gh <- ggplot(grpd_health,aes(log(injuries),injperinc))
gh <- gh + geom_point(aes(color=quintiles))
gh <- gh + scale_color_manual("Fatality:Injury ratio",values=brewer.pal(5, "Set2"))
gh <- gh + xlab("log(Total Number of Injuries)")
gh <- gh + ylab("Average Injuries per Incident")
gh <- gh + ggtitle("US Severe Weather Event Categories 1950-2011")
print(gh)

plot of chunk unnamed-chunk-9

Economic Consequences

grpd_cost <- ddply(df[complete.cases(df),],"NWS_Category", function(x) c(incidents=nrow(x),prop_damage= sum(x$prop_damages),crop_damage=sum(x$crop_damages)))
grpd_cost$total_damage <- grpd_cost$prop_damage + grpd_cost$crop_damage
grpd_cost$dmgperinc <- grpd_cost$total_damage / grpd_cost$incidents

Event Categories - Top 5 by Total Cost of Damage Caused

head(grpd_cost[order(-grpd_cost$total_damage),],5)
##         NWS_Category incidents prop_damage crop_damage total_damage
## 14            Flood      25868   1.499e+11   1.070e+10    1.606e+11
## 36      Unclassified      8684   8.106e+10   4.263e+09    8.532e+10
## 34          Tornado      60641   5.694e+10   4.150e+08    5.735e+10
## 31 Storm Surge/Tide        409   4.796e+10   8.550e+05    4.797e+10
## 17             Hail     288641   1.573e+10   3.001e+09    1.873e+10
##    dmgperinc
## 14   6208613
## 36   9824707
## 34    945765
## 31 117275254
## 17     64901

Event Categories - Top 5 by Cost of Damage Caused to Property

head(grpd_cost[order(-grpd_cost$prop_damage),],5)
##         NWS_Category incidents prop_damage crop_damage total_damage
## 14            Flood      25868   1.499e+11   1.070e+10    1.606e+11
## 36      Unclassified      8684   8.106e+10   4.263e+09    8.532e+10
## 34          Tornado      60641   5.694e+10   4.150e+08    5.735e+10
## 31 Storm Surge/Tide        409   4.796e+10   8.550e+05    4.797e+10
## 13      Flash Flood      55577   1.662e+10   1.531e+09    1.815e+10
##    dmgperinc
## 14   6208613
## 36   9824707
## 34    945765
## 31 117275254
## 13    326634

Event Categories - Top 5 by Cost of Damage Caused to Crops

head(grpd_cost[order(-grpd_cost$crop_damage),],5)
##    NWS_Category incidents prop_damage crop_damage total_damage dmgperinc
## 8      Drought       2488   1.046e+09   1.397e+10    1.502e+10   6036444
## 14       Flood      25868   1.499e+11   1.070e+10    1.606e+11   6208613
## 24   Ice Storm       2006   3.945e+09   5.022e+09    8.967e+09   4470110
## 36 Unclassified      8684   8.106e+10   4.263e+09    8.532e+10   9824707
## 17        Hail     288641   1.573e+10   3.001e+09    1.873e+10     64901

Event Categories - Top 5 by Total Cost of Damage per Incident

head(grpd_cost[order(-grpd_cost$dmgperinc),],5)
##            NWS_Category incidents prop_damage crop_damage total_damage
## 31    Storm Surge/Tide        409   4.796e+10   8.550e+05    4.797e+10
## 23 Hurricane (Typhoon)        174   1.187e+10   2.742e+09    1.461e+10
## 35      Tropical Storm        690   7.704e+09   6.783e+08    8.382e+09
## 36         Unclassified      8684   8.106e+10   4.263e+09    8.532e+10
## 14               Flood      25868   1.499e+11   1.070e+10    1.606e+11
##    dmgperinc
## 31 117275254
## 23  83966833
## 35  12148169
## 36   9824707
## 14   6208613

Appendix A

Mapping of Event Type entries in the source data to the categories in this analysis

After capitalizing and removing spaces and the / character…

new_type_labels[new_type_labels$NWS_Category!="Unclassified",]
##                     EVTYPE              NWS_Category
## 9      ASTRONOMICALLOWTIDE    Astronomical Low Tide 
## 11               AVALANCHE                Avalanche 
## 20                BLIZZARD                 Blizzard 
## 40            COASTALFLOOD            Coastal Flood 
## 41         COASTALFLOODING            Coastal Flood 
## 57           COLDWINDCHILL          Cold/Wind Chill 
## 67                DENSEFOG                Dense Fog 
## 73                 DROUGHT                  Drought 
## 92               DUSTDEVIL               Dust Devil 
## 94               DUSTSTORM               Dust Storm 
## 104          EXCESSIVEHEAT           Excessive Heat 
## 113            EXTREMECOLD  Extreme Cold/Wind Chill 
## 114   EXTREMECOLDWINDCHILL  Extreme Cold/Wind Chill 
## 118       EXTREMEWINDCHILL  Extreme Cold/Wind Chill 
## 125             FLASHFLOOD              Flash Flood 
## 130          FLASHFLOODING              Flash Flood 
## 139                  FLOOD                    Flood 
## 142        FLOODFLASHFLOOD              Flash Flood 
## 145               FLOODING                    Flood 
## 153                    FOG                Dense Fog 
## 170            FROSTFREEZE             Frost/Freeze 
## 172            FUNNELCLOUD             Funnel Cloud 
## 194                   HAIL                     Hail 
## 224                   HEAT                     Heat 
## 234              HEAVYRAIN               Heavy Rain 
## 254              HEAVYSNOW               Heavy Snow 
## 283      HEAVYSURFHIGHSURF                High Surf 
## 288               HIGHSURF                High Surf 
## 297               HIGHWIND                High Wind 
## 310              HIGHWINDS                High Wind 
## 337              HURRICANE      Hurricane (Typhoon) 
## 362               ICESTORM                Ice Storm 
## 370        LAKE-EFFECTSNOW         Lake-Effect Snow 
## 373              LANDSLIDE              Debris Flow 
## 387              LIGHTNING                Lightning 
## 413             MARINEHAIL              Marine Hail 
## 414         MARINEHIGHWIND         Marine High Wind 
## 417 MARINETHUNDERSTORMWIND Marine Thunderstorm Wind 
## 418         MARINETSTMWIND Marine Thunderstorm Wind 
## 490           RECORDWARMTH           Excessive Heat 
## 495             RIPCURRENT              Rip Current 
## 496            RIPCURRENTS              Rip Current 
## 499             RIVERFLOOD                    Flood 
## 532                   SNOW               Heavy Snow 
## 565             STORMSURGE         Storm Surge/Tide 
## 566         STORMSURGETIDE         Storm Surge/Tide 
## 570             STRONGWIND              Strong Wind 
## 572            STRONGWINDS              Strong Wind 
## 654       THUNDERSTORMWIND        Thunderstorm Wind 
## 676      THUNDERSTORMWINDS        Thunderstorm Wind 
## 713                TORNADO                  Tornado 
## 727          TROPICALSTORM           Tropical Storm 
## 735               TSTMWIND        Thunderstorm Wind 
## 751           TSTMWINDHAIL        Thunderstorm Wind 
## 763       UNSEASONABLYWARM                     Heat 
## 780             URBANFLOOD                    Flood 
## 805             WATERSPOUT               Waterspout 
## 819               WILDFIRE                 Wildfire 
## 821         WILDFORESTFIRE                 Wildfire 
## 823                   WIND              Strong Wind 
## 834            WINTERSTORM             Winter Storm 
## 838          WINTERWEATHER           Winter Weather 
## 839       WINTERWEATHERMIX           Winter Weather

All other entries were bracketed as Unclassifed