Over the 19 years from 1993-2011, National Weather Service records show that the top five most harmful categories of weather event types, for both population health and the economy, included floods, tornadoes and winds. However, the other two top-five categories differed. Heat and lightning were also among the top five most harmful categories for population health, while hail and hurricanes were among the top five for the economy. Although some categories most harmful to health also tended to be among the most harmful to the economy, others like heat, lightning, hurricanes and hail were most harmful only to one or the other, not both.
We start with the compressed file repdata_data_StormData.csv.bz2 from the National Weather Service Storm Database, which contains records from 124 Forecast Offices. Its 0.523 gigabytes of data are read in directly, since this format need not be unzipped first.
library(readr)
library(dplyr)
library(stringr)
library(ggplot2)
library(ggthemes)
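library(lubridate)  # provides year(), used below to extract Year
options(scipen = 9)  # prefer fixed over scientific notation when printing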
stormData <- read_csv("./repdata_data_StormData.csv.bz2", na = "?")
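As a quick check that a bzip2-compressed CSV can be read without a separate decompression step, the following base-R sketch writes a tiny table to a temporary `.csv.bz2` file and reads it straight back. The toy data frame and the temporary file are assumptions made for illustration; the report itself uses `read_csv` from readr on the real file.

```r
# Toy stand-in for the storm data (hypothetical values, not from the database)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
                  FATALITIES = c(3, 0, 1),
                  stringsAsFactors = FALSE)

# Write the table through a bzip2 connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects bzip2 compression on read
roundTrip <- read.csv(tmp, stringsAsFactors = FALSE)
print(roundTrip)
unlink(tmp)
```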
From the 37 available fields we select the eight covering date (BGN_DATE), weather event type (EVTYPE), fatalities, injuries, dollar amount of damage to property and crops (PROPDMG, CROPDMG), as well as the parallel “exponent” fields (PROPDMGEXP, CROPDMGEXP). Then we extract the variable Year from the date field and start reducing the large number of incomplete and inconsistent records.
While records of weather events in all 62 years from 1950-2011 include data for fatalities, injuries and damage to property, only the 19 most recent years from 1993-2011 also include damage to crops. In order to capture the most complete measure of damage to the economy, this analysis focuses on those 19 years from 1993-2011, which turn out to include the vast majority of records: 714,738 of the original 902,297.
stormDamage <- select(stormData, Year = BGN_DATE, EVTYPE, FATALITIES:CROPDMGEXP)
stormDamage$Year <- year(as.Date(stormDamage$Year, "%m/%d/%Y"))
stormDamage <- filter(stormDamage, Year > 1992)
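The year extraction and cutoff can be sketched in base R as follows. The date strings are hypothetical examples in a month/day/year layout with a trailing time of day, which is an assumption about how BGN_DATE is stored; the report itself relies on `year()` for this step.

```r
# Date strings in a month/day/year layout (toy values, assumed format)
rawDates <- c("4/18/1950 0:00:00", "1/5/1993 0:00:00", "5/22/2011 0:00:00")

# as.Date() reads only as much of each string as the format needs,
# so the trailing time of day is ignored
parsed <- as.Date(rawDates, format = "%m/%d/%Y")
years <- as.integer(format(parsed, "%Y"))

# Keep only 1993 onward, mirroring filter(stormDamage, Year > 1992)
recent <- years[years > 1992]
print(recent)
```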
For those 714,738 records from 1993-2011, we transform into factors the event type field, as well as the “exponent” fields indicating whether property and crop damage figures are recorded in thousands, millions or billions of dollars.
stormDamage$EVTYPE <- as.factor(str_to_lower(str_trim(stormDamage$EVTYPE, side = "both")))
stormDamage$PROPDMGEXP <- as.factor(str_to_upper(stormDamage$PROPDMGEXP))
stormDamage$CROPDMGEXP <- as.factor(str_to_upper(stormDamage$CROPDMGEXP))
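To see why trimming and case-folding matter before converting to factors, consider this small base-R sketch. The raw labels are hypothetical, and `trimws()` and `tolower()` stand in for the stringr functions used in the report.

```r
# Raw labels that differ only in case or surrounding whitespace (toy values)
rawTypes <- c("TSTM WIND", " tstm wind", "Tstm Wind ", "HAIL", "hail")

# Trim both ends, then lower-case, as str_trim() and str_to_lower() do in the report
cleanTypes <- tolower(trimws(rawTypes, which = "both"))

# The five raw spellings collapse to two distinct event types
print(unique(cleanTypes))
print(levels(factor(cleanTypes)))
```

Without this step each spelling would become its own factor level and be totaled separately.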
Unfortunately, some of those records from 1993-2011 with complete economic damage figures do not consistently record the unit size. Since the distinction between thousands, millions and billions is crucial to measuring economic damage, this study further focuses only on those complete records that report either a fatality, an injury, or a property or crop damage figure whose “exponent” is recorded as K, M or B in the PROPDMGEXP or CROPDMGEXP field.
Although all records of fatalities and injuries were retained, 327 records from 1994-1995 that reported damages without the official exponent letter codes (K, M and B) were dropped as unreliable. So, in the results (below), damage totals for those two years may be slightly understated.
stormDamage <- filter(stormDamage, FATALITIES > 0 | INJURIES > 0 | PROPDMGEXP ==
c("B", "K", "M") | CROPDMGEXP == c("B", "K", "M"))
stormDamage[7542, ]$CROPDMGEXP <- 0
stormDamage[9143, ]$CROPDMGEXP <- 0
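The filter described above is meant to keep any record whose exponent is one of K, M or B. In R, a set-membership test of that kind is normally written with `%in%`; comparing a column to a length-three vector with `==` instead recycles the vector position by position. The sketch below shows the difference on a hypothetical six-element vector, and is offered as an illustration rather than as a re-run of the report's filter.

```r
# Hypothetical exponent codes, one per record
expCodes <- c("K", "M", "B", "K", "X", "M")
valid <- c("B", "K", "M")

# == recycles `valid` along the vector: K vs B, M vs K, B vs M, K vs B, ...
recycled <- expCodes == valid

# %in% asks, for each record, whether its code is any of the valid ones
member <- expCodes %in% valid

print(recycled)
print(member)
```

On this toy vector the recycled comparison keeps one record while the membership test keeps five, so it is worth checking which form a filter uses before relying on its row count.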
For the remaining 148,081 complete and clean records reporting fatalities, injuries or damage, we use the “exponent” codes to normalize the twin damage figures into billions of dollars in the new adjPropDmg and adjCropDmg fields, which add up to the new adjTotalDmg field.
stormDamage <- mutate(stormDamage, adjPropDmg = if_else(PROPDMGEXP == "B", PROPDMG,
if_else(PROPDMGEXP == "M", PROPDMG * 0.001, if_else(PROPDMGEXP == "K", PROPDMG *
1e-06, 0))))
stormDamage <- mutate(stormDamage, adjCropDmg = if_else(CROPDMGEXP == "B", CROPDMG,
if_else(CROPDMGEXP == "M", CROPDMG * 0.001, if_else(CROPDMGEXP == "K", CROPDMG *
1e-06, 0))))
stormDamage <- mutate(stormDamage, adjTotalDmg = adjPropDmg + adjCropDmg)
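The same conversion to billions of dollars can be expressed as a named lookup vector instead of nested `if_else()` calls. This is a minimal base-R sketch under the report's convention (B as-is, M times 0.001, K times 0.000001, anything else zero); the names `toBillions`, `dmg` and `dmgExp` and the toy figures are hypothetical.

```r
# Multipliers that convert each exponent code to billions of dollars
toBillions <- c(K = 1e-06, M = 1e-03, B = 1)

# Toy damage figures and their exponent codes (hypothetical values)
dmg <- c(25, 2.5, 1.2, 10)
dmgExp <- c("K", "M", "B", "?")

# Index by character name; as.character() guards against factor columns,
# which would otherwise be indexed by their integer codes
multiplier <- unname(toBillions[as.character(dmgExp)])
multiplier[is.na(multiplier)] <- 0  # unrecognized codes contribute nothing

adjDmg <- dmg * multiplier
print(adjDmg)
```

A lookup keeps the unit conventions in one place, which makes them easier to audit than a chain of conditionals.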
After adding fatalities and injuries together in the new Casualties field, we spot check annual totals for Casualties and adjTotalDmg, which vary plausibly.
annualDamage <- summarize(group_by(stormDamage, Year), Casualties = sum(FATALITIES) +
sum(INJURIES), Damage = round(sum(adjTotalDmg), 1))
annualDamage
## # A tibble: 19 × 3
## Year Casualties Damage
## <int> <dbl> <dbl>
## 1 1993 2447 10.3
## 2 1994 4505 7.8
## 3 1995 5971 5.1
## 4 1996 3259 4.8
## 5 1997 4401 4.6
## 6 1998 11864 7.6
## 7 1999 6056 5.2
## 8 2000 3280 3.8
## 9 2001 3190 7.9
## 10 2002 3653 2.9
## 11 2003 3374 7.9
## 12 2004 2796 20.8
## 13 2005 2303 40.2
## 14 2006 3967 5.2
## 15 2007 2612 5.9
## 16 2008 3191 13.3
## 17 2009 1687 3.9
## 18 2010 2280 7.5
## 19 2011 8794 15.3
Even after converting the original event type field to all lower case above, there are an unwieldy and overlapping 889 event types. So we spot check the top 10 most harmful, to health and to the economy, by Casualties and by Damage, which suggests the need to group overlapping and related event types into a few broader categories. Clear candidates for combining include “excessive heat” and “heat,” “flood” and “flash flood,” and “tstm wind” and “thunderstorm wind” in the Casualties table, for example, as well as “hurricane/typhoon,” “hurricane” and “tropical storm” in the Damage table.
eventDamage <- summarize(group_by(stormDamage, EVTYPE), Casualties = sum(FATALITIES) +
sum(INJURIES), Damage = round(sum(adjTotalDmg), 1))
eventDamage[order(eventDamage$Casualties, decreasing = TRUE), c(1, 2)]
## # A tibble: 326 × 2
## EVTYPE Casualties
## <fctr> <dbl>
## 1 tornado 24931
## 2 excessive heat 8428
## 3 flood 7259
## 4 lightning 6046
## 5 tstm wind 3872
## 6 heat 3037
## 7 flash flood 2755
## 8 ice storm 2064
## 9 thunderstorm wind 1621
## 10 winter storm 1527
## # ... with 316 more rows
eventDamage[order(eventDamage$Damage, decreasing = TRUE), c(1, 3)]
## # A tibble: 326 × 2
## EVTYPE Damage
## <fctr> <dbl>
## 1 hurricane/typhoon 42.1
## 2 tornado 24.2
## 3 flood 17.1
## 4 storm surge 11.4
## 5 flash flood 10.0
## 6 hail 9.0
## 7 tropical storm 7.3
## 8 ice storm 7.2
## 9 hurricane 6.4
## 10 winter storm 6.3
## # ... with 316 more rows
To check how much the scale of health harm caused by event types correlates with the scale of harm to the economy, we fit a simple linear regression line to the data points in a scatterplot (Fig. 1). Since health and economic harm are clearly correlated, we anticipate that the final results will show some overlap among the most harmful categories of weather event types.
ggplot(eventDamage) + aes(Casualties, Damage) + geom_point(color = "purple2",
alpha = 0.3) + geom_smooth(method = lm, se = FALSE, color = "grey2") + theme_few() +
labs(x = "Casualties (fatalities+injuries)", y = "Damage in $ billions",
title = "Fig. 1. Correlation of health & economic costs", subtitle = " of weather events (1993-2011)",
caption = "Caption: As the number of casualties increases--both fatalities and injuries--
the dollar amount of damages also increases--to both property and crops--
according to records from the National Weather Service Storm Database.") +
theme(plot.title = element_text(face = "bold", size = 16))
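The fitted line in Fig. 1 can be complemented by a numerical check. The base-R sketch below computes a Pearson correlation, a rank-based Spearman correlation (less sensitive to a few extreme event types) and the least-squares slope. It uses only eight rows copied from the top-10 tables above, those for which both Casualties and Damage are listed, so it illustrates the calculation rather than reproducing the result for all event types; the data frame name `topEvents` is hypothetical.

```r
# Eight event types whose Casualties and Damage both appear in the tables above
topEvents <- data.frame(
  EVTYPE = c("tornado", "excessive heat", "flood", "lightning",
             "heat", "flash flood", "ice storm", "winter storm"),
  Casualties = c(24931, 8428, 7259, 6046, 3037, 2755, 2064, 1527),
  Damage = c(24.2, 0.5, 17.1, 0.4, 0.0, 10.0, 7.2, 6.3)  # $ billions
)

# Linear (Pearson) and rank-based (Spearman) correlation
pearson <- cor(topEvents$Casualties, topEvents$Damage)
spearman <- cor(topEvents$Casualties, topEvents$Damage, method = "spearman")

# Slope of the least-squares line, the same model geom_smooth(method = lm) draws
fit <- lm(Damage ~ Casualties, data = topEvents)
slope <- unname(coef(fit)["Casualties"])

print(c(pearson = pearson, spearman = spearman, slope = slope))
```

Comparing the two coefficients gives a rough sense of how much a single extreme point, such as tornadoes, drives the linear fit.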
To cope with the large amount of overlap of inconsistently named event types, we compile a list of ten mostly non-overlapping keywords: “blizzard/snow/winter”, “flood,” “fog,” “hail,” “heat,” “wind,” “lightning,” “tornado,” “tropical/hurricane/tsunami,” and “fire”. Grouping together the 233 (of 889) event types containing a variation on one of those keywords produces a list of ten plausibly broad categories of most harmful weather event types with the number of types noted in parentheses:
Snow (45)
Floods (41)
Fog (4)
Hail (17)
Heat (9)
Winds (79)
Lightning (7)
Tornadoes (10)
Hurricanes+ (15)
Fires (6)
As we create each broad new weather event type category, we spot check its plausibility by listing some of its event types to make sure they meet common sense expectations, which all ten do. The amount of “double counting” when one event type contains keywords from two categories is insignificant.
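One way to quantify the double counting mentioned above is to test every event type against every keyword pattern and count how many categories each one matches. The sketch below does this in base R with the report's ten patterns and a handful of event types taken from its output tables; the object names `patterns`, `eventTypes` and `nCategories` are hypothetical.

```r
# Keyword patterns as used in the report's grep() calls
patterns <- c(Snow = "^blizzard|snow|^winter", Floods = "flood", Fog = "fog",
              Hail = "hail", Heat = "heat", Winds = "wind",
              Lightning = "lightning", Tornadoes = "tornado",
              "Hurricanes+" = "tropical|hurricane|tsunami", Fires = "fire")

# A few event types taken from the report's output tables
eventTypes <- c("flash flood", "hail", "tornadoes, tstm wind, hail",
                "hurricane opal/high winds", "dense fog", "wildfire")

# One logical column per category: TRUE where the pattern matches
matches <- sapply(patterns, function(p) grepl(p, eventTypes))
rownames(matches) <- eventTypes

# Number of categories each event type falls into; values above 1 are double counted
nCategories <- rowSums(matches)
print(nCategories[nCategories > 1])
```

Summing Casualties and Damage over the rows with more than one match would then show how much of each category total is shared.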
snow <- mutate(eventDamage[grep("^blizzard|snow|^winter", eventDamage$EVTYPE),
], Category = "Snow")
snow
## # A tibble: 45 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 blizzard 906 0.7 Snow
## 2 blizzard/winter storm 0 0.0 Snow
## 3 blowing snow 16 0.0 Snow
## 4 cold and snow 14 0.0 Snow
## 5 excessive snow 2 0.0 Snow
## 6 falling snow/ice 2 0.0 Snow
## 7 freezing rain/snow 1 0.0 Snow
## 8 heavy rain/snow 0 0.0 Snow
## 9 heavy snow 1148 0.5 Snow
## 10 heavy snow and high winds 2 0.0 Snow
## # ... with 35 more rows
floods <- mutate(eventDamage[grep("flood", eventDamage$EVTYPE), ], Category = "Floods")
floods
## # A tibble: 41 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 coastal flood 5 0.1 Floods
## 2 coastal flooding 3 0.0 Floods
## 3 coastal flooding/erosion 5 0.0 Floods
## 4 flash flood 2755 10.0 Floods
## 5 flash flood - heavy rain 0 0.0 Floods
## 6 flash flood from ice jams 0 0.0 Floods
## 7 flash flood/flood 14 0.2 Floods
## 8 flash flood/landslide 0 0.0 Floods
## 9 flash flooding 27 0.2 Floods
## 10 flash flooding/flood 5 0.0 Floods
## # ... with 31 more rows
fog <- mutate(eventDamage[grep("fog", eventDamage$EVTYPE), ], Category = "Fog")
fog
## # A tibble: 4 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 dense fog 360 0 Fog
## 2 fog 796 0 Fog
## 3 fog and cold temperatures 2 0 Fog
## 4 freezing fog 0 0 Fog
hail <- mutate(eventDamage[grep("hail", eventDamage$EVTYPE), ], Category = "Hail")
hail
## # A tibble: 17 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 hail 970 9.0 Hail
## 2 hail 150 0 0.0 Hail
## 3 hail 200 0 0.0 Hail
## 4 hail 275 0 0.0 Hail
## 5 hail 75 0 0.0 Hail
## 6 hail damage 0 0.0 Hail
## 7 hail/wind 0 0.0 Hail
## 8 hail/winds 0 0.0 Hail
## 9 hailstorm 0 0.2 Hail
## 10 marine hail 0 0.0 Hail
## 11 small hail 10 0.0 Hail
## 12 thunderstorm wind/hail 0 0.0 Hail
## 13 thunderstorm winds hail 0 0.0 Hail
## 14 thunderstorm winds/hail 1 0.0 Hail
## 15 tornadoes, tstm wind, hail 25 1.6 Hail
## 16 tstm wind/hail 100 0.0 Hail
## 17 wind/hail 0 0.0 Hail
heat <- mutate(eventDamage[grep("heat", eventDamage$EVTYPE), ], Category = "Heat")
heat
## # A tibble: 9 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 drought/excessive heat 2 0.0 Heat
## 2 excessive heat 8428 0.5 Heat
## 3 extreme heat 251 0.0 Heat
## 4 heat 3037 0.0 Heat
## 5 heat wave 551 0.0 Heat
## 6 heat wave drought 19 0.0 Heat
## 7 heat waves 5 0.0 Heat
## 8 record heat 52 0.0 Heat
## 9 record/excessive heat 17 0.0 Heat
winds <- mutate(eventDamage[grep("wind", eventDamage$EVTYPE), ], Category = "Winds")
winds
## # A tibble: 79 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 cold/wind chill 107 0 Winds
## 2 cold/winds 1 0 Winds
## 3 dry mircoburst winds 1 0 Winds
## 4 extreme cold/wind chill 149 0 Winds
## 5 extreme windchill 22 0 Winds
## 6 flood/rain/winds 0 0 Winds
## 7 gradient wind 0 0 Winds
## 8 gusty wind 2 0 Winds
## 9 gusty wind/hvy rain 0 0 Winds
## 10 gusty winds 15 0 Winds
## # ... with 69 more rows
lightning <- mutate(eventDamage[grep("lightning", eventDamage$EVTYPE), ], Category = "Lightning")
lightning
## # A tibble: 7 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 lightning 6046 0.4 Lightning
## 2 lightning and thunderstorm win 1 0.0 Lightning
## 3 lightning injury 1 0.0 Lightning
## 4 lightning thunderstorm winds 0 0.0 Lightning
## 5 lightning. 1 0.0 Lightning
## 6 thunderstorm wind/lightning 0 0.0 Lightning
## 7 thunderstorm winds lightning 0 0.0 Lightning
tornadoes <- mutate(eventDamage[grep("tornado", eventDamage$EVTYPE), ], Category = "Tornadoes")
tornadoes
## # A tibble: 10 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 tornado 24931 24.2 Tornadoes
## 2 tornado f0 0 0.0 Tornadoes
## 3 tornado f1 0 0.0 Tornadoes
## 4 tornado f2 16 0.0 Tornadoes
## 5 tornado f3 2 0.0 Tornadoes
## 6 tornadoes 0 0.0 Tornadoes
## 7 tornadoes, tstm wind, hail 25 1.6 Tornadoes
## 8 waterspout tornado 1 0.0 Tornadoes
## 9 waterspout/ tornado 0 0.0 Tornadoes
## 10 waterspout/tornado 45 0.1 Tornadoes
hurricanesEtc <- mutate(eventDamage[grep("tropical|hurricane|tsunami", eventDamage$EVTYPE),
], Category = "Hurricanes+")
hurricanesEtc
## # A tibble: 15 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 hurricane 107 6.4 Hurricanes+
## 2 hurricane edouard 2 0.0 Hurricanes+
## 3 hurricane emily 1 0.0 Hurricanes+
## 4 hurricane erin 7 0.4 Hurricanes+
## 5 hurricane felix 1 0.0 Hurricanes+
## 6 hurricane opal 2 2.2 Hurricanes+
## 7 hurricane opal/high winds 2 0.1 Hurricanes+
## 8 hurricane-generated swells 2 0.0 Hurricanes+
## 9 hurricane/typhoon 1339 42.1 Hurricanes+
## 10 tropical depression 0 0.0 Hurricanes+
## 11 tropical storm 398 7.3 Hurricanes+
## 12 tropical storm alberto 0 0.0 Hurricanes+
## 13 tropical storm gordon 51 0.0 Hurricanes+
## 14 tropical storm jerry 0 0.0 Hurricanes+
## 15 tsunami 162 0.1 Hurricanes+
fires <- mutate(eventDamage[grep("fire", eventDamage$EVTYPE), ], Category = "Fires")
fires
## # A tibble: 6 × 4
## EVTYPE Casualties Damage Category
## <fctr> <dbl> <dbl> <chr>
## 1 brush fire 2 0.0 Fires
## 2 wild fires 153 0.6 Fires
## 3 wild/forest fire 557 1.2 Fires
## 4 wild/forest fires 0 0.0 Fires
## 5 wildfire 986 4.7 Fires
## 6 wildfires 0 0.1 Fires
categoryDamage <- rbind(snow, floods, fog, hail, heat, winds, lightning, tornadoes,
hurricanesEtc, fires)
categoryDamage$Category <- as.factor(categoryDamage$Category)
Armed with these ten plausibly broad weather event type categories, we rank the harm each one caused to either population health or the economy.
Over the 19 years from 1993-2011, National Weather Service records show that the top five most harmful categories of weather event types to both population health and the economy included Floods, Tornadoes and Winds.
finalDamage <- summarize(group_by(categoryDamage, Category), Casualties = sum(Casualties),
Damage = round(sum(Damage), 1))
finalDamage[order(finalDamage$Casualties, decreasing = TRUE), c(1, 2)]
## # A tibble: 10 × 2
## Category Casualties
## <fctr> <dbl>
## 1 Tornadoes 25020
## 2 Heat 12362
## 3 Floods 10129
## 4 Winds 9360
## 5 Lightning 6049
## 6 Snow 4410
## 7 Hurricanes+ 2074
## 8 Fires 1698
## 9 Fog 1158
## 10 Hail 1106
finalDamage[order(finalDamage$Damage, decreasing = TRUE), c(1, 3)]
## # A tibble: 10 × 2
## Category Damage
## <fctr> <dbl>
## 1 Hurricanes+ 58.6
## 2 Floods 27.9
## 3 Tornadoes 25.9
## 4 Winds 14.1
## 5 Hail 10.8
## 6 Snow 7.6
## 7 Fires 6.6
## 8 Heat 0.5
## 9 Lightning 0.4
## 10 Fog 0.0
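The two rankings above can be compared directly with set operations. This base-R sketch re-enters the ten category totals from the tables above into a data frame (named `ranked` here, a hypothetical name) and derives which categories are in the top five of both lists and which appear in only one.

```r
# Category totals copied from the two tables above (Damage in $ billions)
ranked <- data.frame(
  Category = c("Tornadoes", "Heat", "Floods", "Winds", "Lightning",
               "Snow", "Hurricanes+", "Fires", "Fog", "Hail"),
  Casualties = c(25020, 12362, 10129, 9360, 6049, 4410, 2074, 1698, 1158, 1106),
  Damage = c(25.9, 0.5, 27.9, 14.1, 0.4, 7.6, 58.6, 6.6, 0.0, 10.8),
  stringsAsFactors = FALSE
)

# Top five categories by each measure
topHealth <- head(ranked$Category[order(ranked$Casualties, decreasing = TRUE)], 5)
topEconomy <- head(ranked$Category[order(ranked$Damage, decreasing = TRUE)], 5)

# Categories in both lists, and those in only one
both <- intersect(topHealth, topEconomy)
healthOnly <- setdiff(topHealth, topEconomy)
economyOnly <- setdiff(topEconomy, topHealth)

print(both)
print(healthOnly)
print(economyOnly)
```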
Tornadoes caused the most Casualties at 25,020, with Floods and Winds also in the top five at 10,129 and 9,360 Casualties, respectively. Those same three categories were also among the top five for economic Damage: Floods caused the second-most Damage at $27.9 billion, with Tornadoes and Winds right behind at $25.9 billion and $14.1 billion, respectively. So the three categories most harmful to both health and the economy were Tornadoes, Floods and Winds, as two bar charts (Figs. 2 & 3) make clear with top-five thresholds of 5,000 Casualties and $9 billion in Damage.
ggplot(finalDamage) + aes(Category, Casualties) + geom_bar(stat = "identity",
fill = "red2") + geom_hline(yintercept = 5000) + theme_few() + labs(x = "Event Category",
y = "Casualties (fatalities+injuries)", title = "Fig. 2. Weather events most harmful to health by category",
subtitle = " (1993-2011)", caption = "Caption: As the horizontal reference line highlights, only five event categories caused
more than 5,000 casualties: floods, heat, lightning, tornadoes and winds,
according to records from the National Weather Service Storm Database.") +
theme(plot.title = element_text(face = "bold", size = 16), axis.text.x = element_text(angle = 45))
ggplot(finalDamage) + aes(Category, Damage) + geom_bar(stat = "identity", fill = "blue2") +
geom_hline(yintercept = 9) + theme_few() + labs(x = "Event Category", y = "Damage in $ billions",
title = "Fig. 3. Weather events most harmful to economy by category", subtitle = " (1993-2011)",
caption = "Caption: As the horizontal reference line highlights, only five event categories caused
more than $9 billion in damage: floods, hail, hurricanes, tornadoes and winds,
according to records from the National Weather Service Storm Database.") +
theme(plot.title = element_text(face = "bold", size = 16), axis.text.x = element_text(angle = 45))
Different categories filled out the top five most harmful lists for health and the economy, though. For population health, Heat and Lightning were also among the top five most harmful categories of weather event types, with Heat causing the second-most Casualties at 12,362 and Lightning fifth at 6,049. For the economy, in contrast, Hurricanes+ and Hail were among the top five most harmful categories, with Hurricanes+ causing the most Damage at $58.6 billion and Hail fifth at $10.8 billion.
The weather events most harmful to health, like Floods, Tornadoes and Winds, also tended to be the most harmful to the economy. Some of the most harmful categories, however, were lopsided: Heat and Lightning were far more harmful to health than to the economy, while Hurricanes+ and Hail were far more harmful to the economy than to health.