The National Weather Service collects a variety of narrative and quantitative data about significant weather events in its Storm Data product. This has been published for some event types since 1950. The NWS records both fatalities caused by the event as well as provides an estimate of property damage.
From 1993 to 2011 in the United States, the most harmful recorded weather events to population health were heat, tornadoes, floods, and lightning. In that time period, the most economically damaging event types are floods. Beyond these storm surge or tide and tornadoes caused large amounts of damage. Hurricanes or tropical storms, hail, and various wind events are also very damaging.
The United States National Weather Service (NWS) publishes the Storm Data publication and database. It is a record of severe weather and natural hazard events with measurements and narratives. The Coursera web site provides this data from 1950 to 2011. The first step of data processing is to unzip and read in the data. Given the summary nature of the questions at hand, only selected fields are carefully cleaned for further analysis. The documentation of this dataset describes the use of the property damage exponent (PROPDMGEXP), which has important ramifications for the interpretation of property damage estimates.
storm.data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"),colClasses="character")
nrow(storm.data)
## [1] 902297
storm.data$BeginDate <- as.POSIXct(storm.data$BGN_DATE, format="%m/%d/%Y %H:%M:%S")
storm.data$STATE <- as.factor(storm.data$STATE)
storm.data$FATALITIES <- as.numeric(storm.data$FATALITIES)
storm.data$INJURIES <- as.numeric(storm.data$INJURIES)
storm.data$PROPDMG <- as.numeric(storm.data$PROPDMG)
storm.data$PROPDMGEXP <- as.factor(storm.data$PROPDMGEXP)
exponents <- data.frame(exp=c("B","h","H","K","m","M"),
mult=c(1e9,1e2,1e2,1e3,1e6,1e6))
storm.data <- merge(storm.data,exponents,by.x="PROPDMGEXP", by.y="exp",all.x=T)
storm.data$PropertyDamage <- storm.data$PROPDMG * storm.data$mult
storm.data$EVTYPE <- as.factor(storm.data$EVTYPE)
head(levels(storm.data$EVTYPE),20)
## [1] " HIGH SURF ADVISORY" " COASTAL FLOOD"
## [3] " FLASH FLOOD" " LIGHTNING"
## [5] " TSTM WIND" " TSTM WIND (G45)"
## [7] " WATERSPOUT" " WIND"
## [9] "?" "ABNORMAL WARMTH"
## [11] "ABNORMALLY DRY" "ABNORMALLY WET"
## [13] "ACCUMULATED SNOWFALL" "AGRICULTURAL FREEZE"
## [15] "APACHE COUNTY" "ASTRONOMICAL HIGH TIDE"
## [17] "ASTRONOMICAL LOW TIDE" "AVALANCE"
## [19] "AVALANCHE" "BEACH EROSIN"
The next area of particular concern is the event type coding (EVTYPE), as the goal is to compare the consequences among event types. The selected strategy is to favor the exact event coding given in NWS Instruction 10-1605 and rectify the most common EVTYPE values to one of those. There are 48 event types specified in the documentation. There are 985 unique entries in the EVTYPE variable. As the sample above displays, there are misspellings, nonstandard entries, placeholders, excess information and other issues. The next code block shows that there are about 70 EVTYPE values with over 100 events recorded. These collectively cover well over 99% of the events. The code.EVTYPE data frame maps each of these most common EVTYPE values to a single NWSI value. It also contains each NSWI value even if that does not have over 100 events. Once this coding decision is implemented, the data is ready for the analysis questions at hand.
ett <- table(storm.data$EVTYPE)
sum(ett[ett>100]/nrow(storm.data))
## [1] 0.9947756
sum(ett>100)
## [1] 69
NWSI.event <- c("Astronomical Low Tide","Avalanche","Blizzard","Coastal Flood","Cold/Wind Chill","Debris Flow","Dense Fog","Dense Smoke","Drought","Dust Devil","Dust Storm","Excessive Heat","Extreme Cold/Wind Chill","Flash Flood","Flood","Frost/Freeze","Funnel Cloud","Freezing Fog","Hail","Heat","Heavy Rain","Heavy Snow","High Surf","High Wind","Hurricane (Typhoon)","Ice Storm","Lake-Effect Snow","Lakeshore Flood","Lightning","Marine Hail","Marine High Wind","Marine Strong Wind","Marine Thunderstorm Wind","Rip Current","Seiche","Sleet","Storm Surge/Tide","Strong Wind","Thunderstorm Wind","Tornado","Tropical Depression","Tropical Storm","Tsunami","Volcanic Ash","Waterspout","Wildfire","Winter Storm","Winter Weather" )
NWSI.event <- toupper(NWSI.event)
code.EVTYPE <- data.frame(EVTYPE = c("ASTRONOMICAL HIGH TIDE", "ASTRONOMICAL LOW TIDE", "AVALANCHE", "BLIZZARD", "COASTAL FLOOD", "COASTAL FLOODING", "COLD/WIND CHILL", "LANDSLIDE", "DEBRIS FLOW", "DENSE FOG", "FOG", "DENSE SMOKE", "DROUGHT", "DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT", "EXTREME COLD", "EXTREME COLD/WIND CHILL", "EXTREME WINDCHILL", "FLASH FLOOD", "FLASH FLOODING", "FLOOD", "FLOOD/FLASH FLOOD", "FLOODING", "RIVER FLOOD", "URBAN FLOOD", "URBAN/SML STREAM FLD", "FREEZING FOG", "FROST/FREEZE", "FUNNEL CLOUD", "HAIL", "HEAT", "RECORD WARMTH", "UNSEASONABLY WARM", "HEAVY RAIN", "HEAVY SNOW", "LIGHT SNOW", "MODERATE SNOWFALL", "SNOW", "HEAVY SURF/HIGH SURF", "HIGH SURF", "HIGH WIND", "HIGH WINDS", "WIND", "HURRICANE", "HURRICANE (TYPHOON)", "FREEZING RAIN", "ICE STORM", "LAKE-EFFECT SNOW", "LAKESHORE FLOOD", "LIGHTNING", "MARINE HAIL", "MARINE HIGH WIND", "MARINE STRONG WIND", "MARINE THUNDERSTORM WIND", "MARINE TSTM WIND", "RIP CURRENT", "RIP CURRENTS", "SEICHE", "SLEET", "STORM SURGE", "STORM SURGE/TIDE", "STRONG WIND", "STRONG WINDS", "DRY MICROBURST", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "TSTM WIND", "TSTM WIND/HAIL", "TORNADO", "TROPICAL DEPRESSION", "TROPICAL STORM", "TSUNAMI", "VOLCANIC ASH", "WATERSPOUT", "WILD/FOREST FIRE", "WILDFIRE", "WINTER STORM", "WINTER WEATHER", "WINTER WEATHER/MIX"),
NWSI.EV = c("ASTRONOMICAL LOW TIDE", "ASTRONOMICAL LOW TIDE", "AVALANCHE", "BLIZZARD", "COASTAL FLOOD", "COASTAL FLOOD", "COLD/WIND CHILL", "DEBRIS FLOW", "DEBRIS FLOW", "DENSE FOG", "DENSE FOG", "DENSE SMOKE", "DROUGHT", "DUST DEVIL", "DUST STORM", "EXCESSIVE HEAT", "EXTREME COLD/WIND CHILL", "EXTREME COLD/WIND CHILL", "EXTREME COLD/WIND CHILL", "FLASH FLOOD", "FLASH FLOOD", "FLOOD", "FLOOD", "FLOOD", "FLOOD", "FLOOD", "FLOOD", "FREEZING FOG", "FROST/FREEZE", "FUNNEL CLOUD", "HAIL", "HEAT", "HEAT", "HEAT", "HEAVY RAIN", "HEAVY SNOW", "HEAVY SNOW", "HEAVY SNOW", "HEAVY SNOW", "HIGH SURF", "HIGH SURF", "HIGH WIND", "HIGH WIND", "HIGH WIND", "HURRICANE (TYPHOON)", "HURRICANE (TYPHOON)", "ICE STORM", "ICE STORM", "LAKE-EFFECT SNOW", "LAKESHORE FLOOD", "LIGHTNING", "MARINE HAIL", "MARINE HIGH WIND", "MARINE STRONG WIND", "MARINE THUNDERSTORM WIND", "MARINE THUNDERSTORM WIND", "RIP CURRENT", "RIP CURRENT", "SEICHE", "SLEET", "STORM SURGE/TIDE", "STORM SURGE/TIDE", "STRONG WIND", "STRONG WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WIND", "TORNADO", "TROPICAL DEPRESSION", "TROPICAL STORM", "TSUNAMI", "VOLCANIC ASH", "WATERSPOUT", "WILDFIRE", "WILDFIRE", "WINTER STORM", "WINTER WEATHER", "WINTER WEATHER"), stringsAsFactors = FALSE)
code.EVTYPE
## EVTYPE NWSI.EV
## 1 ASTRONOMICAL HIGH TIDE ASTRONOMICAL LOW TIDE
## 2 ASTRONOMICAL LOW TIDE ASTRONOMICAL LOW TIDE
## 3 AVALANCHE AVALANCHE
## 4 BLIZZARD BLIZZARD
## 5 COASTAL FLOOD COASTAL FLOOD
## 6 COASTAL FLOODING COASTAL FLOOD
## 7 COLD/WIND CHILL COLD/WIND CHILL
## 8 LANDSLIDE DEBRIS FLOW
## 9 DEBRIS FLOW DEBRIS FLOW
## 10 DENSE FOG DENSE FOG
## 11 FOG DENSE FOG
## 12 DENSE SMOKE DENSE SMOKE
## 13 DROUGHT DROUGHT
## 14 DUST DEVIL DUST DEVIL
## 15 DUST STORM DUST STORM
## 16 EXCESSIVE HEAT EXCESSIVE HEAT
## 17 EXTREME COLD EXTREME COLD/WIND CHILL
## 18 EXTREME COLD/WIND CHILL EXTREME COLD/WIND CHILL
## 19 EXTREME WINDCHILL EXTREME COLD/WIND CHILL
## 20 FLASH FLOOD FLASH FLOOD
## 21 FLASH FLOODING FLASH FLOOD
## 22 FLOOD FLOOD
## 23 FLOOD/FLASH FLOOD FLOOD
## 24 FLOODING FLOOD
## 25 RIVER FLOOD FLOOD
## 26 URBAN FLOOD FLOOD
## 27 URBAN/SML STREAM FLD FLOOD
## 28 FREEZING FOG FREEZING FOG
## 29 FROST/FREEZE FROST/FREEZE
## 30 FUNNEL CLOUD FUNNEL CLOUD
## 31 HAIL HAIL
## 32 HEAT HEAT
## 33 RECORD WARMTH HEAT
## 34 UNSEASONABLY WARM HEAT
## 35 HEAVY RAIN HEAVY RAIN
## 36 HEAVY SNOW HEAVY SNOW
## 37 LIGHT SNOW HEAVY SNOW
## 38 MODERATE SNOWFALL HEAVY SNOW
## 39 SNOW HEAVY SNOW
## 40 HEAVY SURF/HIGH SURF HIGH SURF
## 41 HIGH SURF HIGH SURF
## 42 HIGH WIND HIGH WIND
## 43 HIGH WINDS HIGH WIND
## 44 WIND HIGH WIND
## 45 HURRICANE HURRICANE (TYPHOON)
## 46 HURRICANE (TYPHOON) HURRICANE (TYPHOON)
## 47 FREEZING RAIN ICE STORM
## 48 ICE STORM ICE STORM
## 49 LAKE-EFFECT SNOW LAKE-EFFECT SNOW
## 50 LAKESHORE FLOOD LAKESHORE FLOOD
## 51 LIGHTNING LIGHTNING
## 52 MARINE HAIL MARINE HAIL
## 53 MARINE HIGH WIND MARINE HIGH WIND
## 54 MARINE STRONG WIND MARINE STRONG WIND
## 55 MARINE THUNDERSTORM WIND MARINE THUNDERSTORM WIND
## 56 MARINE TSTM WIND MARINE THUNDERSTORM WIND
## 57 RIP CURRENT RIP CURRENT
## 58 RIP CURRENTS RIP CURRENT
## 59 SEICHE SEICHE
## 60 SLEET SLEET
## 61 STORM SURGE STORM SURGE/TIDE
## 62 STORM SURGE/TIDE STORM SURGE/TIDE
## 63 STRONG WIND STRONG WIND
## 64 STRONG WINDS STRONG WIND
## 65 DRY MICROBURST THUNDERSTORM WIND
## 66 THUNDERSTORM WIND THUNDERSTORM WIND
## 67 THUNDERSTORM WINDS THUNDERSTORM WIND
## 68 TSTM WIND THUNDERSTORM WIND
## 69 TSTM WIND/HAIL THUNDERSTORM WIND
## 70 TORNADO TORNADO
## 71 TROPICAL DEPRESSION TROPICAL DEPRESSION
## 72 TROPICAL STORM TROPICAL STORM
## 73 TSUNAMI TSUNAMI
## 74 VOLCANIC ASH VOLCANIC ASH
## 75 WATERSPOUT WATERSPOUT
## 76 WILD/FOREST FIRE WILDFIRE
## 77 WILDFIRE WILDFIRE
## 78 WINTER STORM WINTER STORM
## 79 WINTER WEATHER WINTER WEATHER
## 80 WINTER WEATHER/MIX WINTER WEATHER
storm.data$EVTYPE.char <- as.character(storm.data$EVTYPE)
storm.data$EVTYPE.char <- toupper(storm.data$EVTYPE.char)
storm.data <- merge(storm.data, code.EVTYPE, by.x = "EVTYPE.char", by.y="EVTYPE", all.x=T)
storm.data$EVTYPE.char <- NULL
storm.data$NWSI.EV <- as.factor(storm.data$NWSI.EV)
This event type coding covers a considerable proportion, 0.9953463, of the events in the data set.
Results are described in terms of two separate measures of consequence.
The chosen measure of public health outcomes is the total of direct fatalities over the period. As stated in section 2.6 of the National Weather Service Instruction 10-1605: “The determination of direct versus indirect causes of weather-related fatalities or injuries is one of the most difficult aspects of Storm Data preparation.” This is interpreted to mean that direct fatalities should overall be a more reliable than indirect, as these casualties are more closely linked to the storm event. Also consider that reporting of injuries may be less reliable than reporting of fatalities, which is a standard public health record. Further, comparison of injuries is dubious. An injury may result in a wide range of outcomes from temporary pain, costly medical intervention, or permanent disability.
Tornado seems to dominate the fatalities over the whole time period, because it has been collected the longest. Starting in 1993, the set of event types recorded in Storm Data expanded significantly. Observe the plot of direct fatalities over time for the various event types. For this reason the comparisons made here are from 1993 onward. The most harmful events to public health, as measured by total direct fatalities over the period, are summarized in the table below.
library(reshape2)
library(ggplot2)
library(plyr)
melted.storm <- melt(storm.data, measure.vars=c("FATALITIES","PropertyDamage"))
event.summary <- dcast(melted.storm, as.POSIXlt(BeginDate)$year + NWSI.EV ~ variable , sum, na.rm=T)
names(event.summary)[1] <- "year"
event.summary$year <- 1900 + event.summary$year
p <- ggplot(event.summary, aes(year,FATALITIES, group=NWSI.EV))
p + geom_line(aes(col=NWSI.EV),) + guides(color=FALSE)
Figure 1. Fatalities over time by type
startDate <- as.POSIXct("1993-01-01",format="%Y-%m-%d")
fatalities <- dcast(melted.storm, NWSI.EV ~ variable, sum, subset=.(BeginDate>=startDate), na.rm=T)
head(fatalities[order(fatalities$FATALITIES,decreasing=T),1:2],15)
## NWSI.EV FATALITIES
## 12 EXCESSIVE HEAT 1903
## 40 TORNADO 1621
## 14 FLASH FLOOD 997
## 20 HEAT 948
## 29 LIGHTNING 816
## 49 <NA> 698
## 34 RIP CURRENT 572
## 15 FLOOD 523
## 39 THUNDERSTORM WIND 446
## 24 HIGH WIND 306
## 13 EXTREME COLD/WIND CHILL 304
## 2 AVALANCHE 224
## 47 WINTER STORM 206
## 23 HIGH SURF 146
## 22 HEAVY SNOW 133
Note that Heat and Excessive Heat are both in the top five types of events for fatalities. This is reasonable because heat is dangerous for vulnerable populations. Thunderstorm Wind and High Wind would be just behind Lightning if combined. The nearly 700 fatalities caused by events with an incorrect coding (<NA>) demonstrate that more work could be done to resolve the event types provided.
Property damage estimates vary by method of preparation, preparer and data sources. This is clearly described in the National Weather Service Instruction. These estimates are characterized by the NWS as a best guess. However these values will be used as the best available measure of economic consequence. The table below shows the average annual property damage by event type in millions of dollars.
loss <- dcast(melted.storm, NWSI.EV ~ variable, sum, subset=.(BeginDate>=startDate), na.rm=T)
loss$FATALITIES<-NULL
loss$PropertyDamage <- loss$PropertyDamage / 19 / 1e6
head(loss[order(loss$PropertyDamage,decreasing=T),],15)
## NWSI.EV PropertyDamage
## 15 FLOOD 7901.87089
## 49 <NA> 4253.74864
## 37 STORM SURGE/TIDE 2524.45916
## 40 TORNADO 1386.26115
## 14 FLASH FLOOD 865.71448
## 19 HAIL 828.01406
## 25 HURRICANE (TYPHOON) 624.64837
## 39 THUNDERSTORM WIND 513.42645
## 46 WILDFIRE 408.78650
## 42 TROPICAL STORM 405.46792
## 47 WINTER STORM 352.02617
## 24 HIGH WIND 309.84550
## 26 ICE STORM 208.05654
## 9 DROUGHT 55.05821
## 22 HEAVY SNOW 49.99630
Missing event types are a bigger issue for this measure than for fatality data.
Flood and flash floods seem to be the most significant hazard to property. Of interest, though Storm Data describes some seismic events, it does not have an event type for earthquakes.
Storm surge and tide are also a great concern. The combination of hurricane and tropical storm damage results in an interesting group of hazards which would be in the top five. Consider that storm surge may sometimes be related to hurricane landing events, and there is perhaps cause to more deeply investigate the coding of these events.
Tornadoes place fairly high on both of these lists. They are sudden onset and violent events, making it difficult to mitigate their impacts on either life or property.
This open source analysis was prepared by @vpipkt. Contribute on github.