Floods, tornadoes and winds were the most harmful to both health and economy, 1993-2011

Synopsis

Over the 19 years from 1993-2011, National Weather Service records show that the top five most harmful categories of weather event types, for both population health and the economy, included floods, tornadoes and winds. The other two top-five categories differed, however: heat and lightning were also among the five most harmful categories for population health, while hail and hurricanes were among the five most harmful for the economy. Although some categories most harmful to health were also among the most harmful to the economy, others, like heat, lightning, hurricanes and hail, were among the most harmful to only one or the other, not both.

Data Processing

We start with the compressed file repdata_data_StormData.csv.bz2 from the National Weather Service Storm Database, which contains records from 124 Forecast Offices: 0.523 gigabytes of data that can be read in directly, since this compressed format need not be unzipped first.
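Reading the .bz2 file directly works because readr recognises the extension and decompresses on the fly, while na = "?" turns the stray "?" codes in the data into NA. The sketch below demonstrates both on a tiny invented file (the temporary path and values are illustrative only). Note that passing na replaces readr's default of c("", "NA") rather than adding to it.

```r
library(readr)

# Write a tiny bzip2-compressed CSV (columns mimic the storm data; values invented)
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, "w")
writeLines(c("EVTYPE,PROPDMG,PROPDMGEXP",
             "TORNADO,25,K",
             "HAIL,2.5,?"), con)
close(con)

# read_csv() decompresses the file itself; na = "?" marks "?" as missing
toy <- read_csv(path, na = "?", col_types = cols())
print(toy)
```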

library(readr)
library(dplyr)
library(stringr)
library(lubridate)  # provides year(), used below
library(ggplot2)
library(ggthemes)
options(scipen = 9)  # discourage scientific notation in printed output
stormData <- read_csv("./repdata_data_StormData.csv.bz2", na = "?")

From the 37 available fields we select the eight covering date (BGN_DATE), weather event type (EVTYPE), fatalities, injuries, dollar amounts of damage to property and crops (PROPDMG, CROPDMG), as well as the parallel “exponent” fields (PROPDMGEXP, CROPDMGEXP). Then we extract the variable Year from the date field and start reducing the large number of incomplete and inconsistent records.

While records of weather events in all 62 years from 1950-2011 include data for fatalities, injuries and damage to property, only the 19 most recent years from 1993-2011 include damage to crops as well as to property. To capture the most complete measure of damage to the economy, this analysis focuses on those 19 years, which turn out to include the vast majority of records: 714,738 of the original 902,297.

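# Keep the date (renamed Year), the event type, and the six casualty and damage columns
stormDamage <- select(stormData, Year = BGN_DATE, EVTYPE, FATALITIES:CROPDMGEXP)
# BGN_DATE looks like "4/18/1950 0:00:00"; keep only the year
stormDamage$Year <- year(as.Date(stormDamage$Year, "%m/%d/%Y"))
stormDamage <- filter(stormDamage, Year > 1992)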
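The Year step relies on lubridate's year(). A base-R equivalent, sketched here on a few invented BGN_DATE strings, makes the parsing explicit: as.Date() with an explicit format ignores the trailing time, and format(..., "%Y") extracts the year.

```r
# Invented BGN_DATE strings in the storm data's "m/d/Y H:MM:SS" layout
bgn <- c("4/18/1950 0:00:00", "3/13/1993 0:00:00", "11/30/2011 0:00:00")

# Parse the date part (trailing time is ignored) and pull out the year as an integer
years <- as.integer(format(as.Date(bgn, "%m/%d/%Y"), "%Y"))
print(years)

# Keep only 1993 onward, mirroring the Year > 1992 filter
recent <- years[years > 1992]
print(recent)
```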

For those 714,738 records from 1993-2011, we trim and lower-case the event type field and convert it to a factor, and likewise upper-case and convert to factors the “exponent” fields indicating whether property and crop damage figures are recorded in thousands, millions or billions of dollars.

stormDamage$EVTYPE <- as.factor(str_to_lower(str_trim(stormDamage$EVTYPE, side = "both")))
stormDamage$PROPDMGEXP <- as.factor(str_to_upper(stormDamage$PROPDMGEXP))
stormDamage$CROPDMGEXP <- as.factor(str_to_upper(stormDamage$CROPDMGEXP))
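Trimming and lower-casing before building the factor matters because factor levels are exact strings. The sketch below, on invented spelling variants, shows four raw labels collapsing into two levels with the same stringr calls used above.

```r
library(stringr)

# Invented spelling variants of the same event type, plus one other type
raw <- c(" TSTM WIND", "tstm wind ", "Tstm Wind", "HAIL")

# Without cleaning, every variant becomes its own level
rawLevels <- nlevels(factor(raw))

# Trim surrounding whitespace and lower-case before building the factor
clean <- as.factor(str_to_lower(str_trim(raw, side = "both")))
print(levels(clean))
```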

Unfortunately, some of those records from 1993-2011 do not consistently record the unit size of their damage figures. Since the distinction between thousands, millions and billions is crucial to measuring economic damage, this study further focuses only on those records that report a fatality, an injury, or a property or crop damage figure whose “exponent” is K, M or B in the PROPDMGEXP or CROPDMGEXP field.

WARNING:

Although all records of fatalities and injuries were retained, 327 records from 1994-1995 that reported damage without one of the official exponent letter codes (K, M or B) were dropped as unreliable. As a result, damage totals for those two years in the results below may be slightly understated.
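A count like the 327 dropped records can be tallied by flagging records that have no casualties and a positive damage amount whose code is not K, M or B. The sketch below uses invented records and checks only the property columns for brevity; the crop columns would be handled the same way. It is an illustration, not the exact query behind the 327.

```r
# Invented records: year, casualties, and a property-damage amount with its code
toy <- data.frame(Year = c(1994, 1994, 1995, 1995, 2000),
                  FATALITIES = c(0, 0, 1, 0, 0),
                  INJURIES = c(0, 0, 0, 0, 0),
                  PROPDMG = c(5, 10, 3, 7, 2),
                  PROPDMGEXP = c("K", "5", "0", "+", "M"))

# Dropped: no casualties, some damage, and a code outside K/M/B
dropped <- with(toy, FATALITIES == 0 & INJURIES == 0 &
                  PROPDMG > 0 & !(PROPDMGEXP %in% c("K", "M", "B")))
droppedByYear <- table(toy$Year[dropped])
print(droppedByYear)
```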

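# Keep records with any casualty, or with a damage exponent of B, K or M;
# %in% tests membership, whereas == would recycle c("B", "K", "M") row by row
stormDamage <- filter(stormDamage, FATALITIES > 0 | INJURIES > 0 |
    PROPDMGEXP %in% c("B", "K", "M") | CROPDMGEXP %in% c("B", "K", "M"))
# Recode missing ("?") exponents as "0" so they add no damage rather than NA
stormDamage$PROPDMGEXP <- factor(replace(as.character(stormDamage$PROPDMGEXP), is.na(stormDamage$PROPDMGEXP), "0"))
stormDamage$CROPDMGEXP <- factor(replace(as.character(stormDamage$CROPDMGEXP), is.na(stormDamage$CROPDMGEXP), "0"))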
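The filter's criterion, an exponent that is any one of K, M or B, is a set-membership test. The sketch below, on invented codes, contrasts R's %in% with == against a three-element vector: == recycles the shorter vector and compares position by position, so each record is checked against only one letter. With 714,738 rows, a multiple of three, R would not even warn about the recycling.

```r
# Invented exponent codes
exps <- c("K", "K", "K", "M", "B", "K")
codes <- c("B", "K", "M")

# == pairs exps[i] with codes[(i - 1) %% 3 + 1]: only one TRUE here
elementwise <- exps == codes
print(elementwise)

# %in% asks whether each code is any of the three letters: all TRUE
membership <- exps %in% codes
print(membership)

# Missing codes are simply not members (FALSE), never NA
withMissing <- c(NA, "K") %in% codes
print(withMissing)
```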

For the remaining 148,081 complete and clean records reporting fatalities, injuries or damage, we use the “exponent” codes to convert both damage figures to billions of dollars in the new adjPropDmg and adjCropDmg fields, which add up to the new adjTotalDmg field.

# Convert each damage figure to $ billions from its exponent code:
# B = billions, M = millions (x 0.001), K = thousands (x 1e-06), any other code = 0
stormDamage <- mutate(stormDamage,
    adjPropDmg = if_else(PROPDMGEXP == "B", PROPDMG,
                 if_else(PROPDMGEXP == "M", PROPDMG * 0.001,
                 if_else(PROPDMGEXP == "K", PROPDMG * 1e-06, 0))),
    adjCropDmg = if_else(CROPDMGEXP == "B", CROPDMG,
                 if_else(CROPDMGEXP == "M", CROPDMG * 0.001,
                 if_else(CROPDMGEXP == "K", CROPDMG * 1e-06, 0))),
    adjTotalDmg = adjPropDmg + adjCropDmg)
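The nested if_else() calls implement a mapping from exponent code to a scale factor. One alternative, sketched here with a hypothetical toBillions() helper on invented amounts, is a named lookup vector: codes missing from the table, including NA, fall back to a factor of zero. (dplyr's if_else() instead returns NA when its condition is NA.)

```r
# Billions of dollars per recorded unit for each official exponent code
multiplier <- c(B = 1, M = 1e-3, K = 1e-6)

# Hypothetical helper: scale amounts to $ billions; other or missing codes count as 0
toBillions <- function(amount, exp) {
  m <- unname(multiplier[as.character(exp)])  # NA for codes not in the table
  m[is.na(m)] <- 0
  amount * m
}

scaled <- toBillions(c(2.5, 300, 1500, 8, 4), c("B", "M", "K", "2", NA))
print(scaled)
```

Because as.character() is applied first, the helper accepts either character or factor exponent columns.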

After adding fatalities and injuries together as Casualties, we spot-check annual totals of Casualties and of Damage (adjTotalDmg, in $ billions), which vary plausibly.

annualDamage <- summarize(group_by(stormDamage, Year), Casualties = sum(FATALITIES) + 
    sum(INJURIES), Damage = round(sum(adjTotalDmg), 1))
annualDamage
## # A tibble: 19 × 3
##     Year Casualties Damage
##    <int>      <dbl>  <dbl>
## 1   1993       2447   10.3
## 2   1994       4505    7.8
## 3   1995       5971    5.1
## 4   1996       3259    4.8
## 5   1997       4401    4.6
## 6   1998      11864    7.6
## 7   1999       6056    5.2
## 8   2000       3280    3.8
## 9   2001       3190    7.9
## 10  2002       3653    2.9
## 11  2003       3374    7.9
## 12  2004       2796   20.8
## 13  2005       2303   40.2
## 14  2006       3967    5.2
## 15  2007       2612    5.9
## 16  2008       3191   13.3
## 17  2009       1687    3.9
## 18  2010       2280    7.5
## 19  2011       8794   15.3

Even after trimming and lower-casing the original event type field above, there remain an unwieldy 889 overlapping event types (326 of them among the retained records). Spot-checking our top 10 most harmful event types, to health by Casualties and to the economy by Damage, suggests the need to group overlapping and related event types into a few broader categories. Clear candidates for combining include “excessive heat” and “heat,” “flood” and “flash flood,” and “tstm wind” and “thunderstorm wind” in the Casualties table, as well as “hurricane/typhoon,” “hurricane” and “tropical storm” in the Damage table.

eventDamage <- summarize(group_by(stormDamage, EVTYPE), Casualties = sum(FATALITIES) + 
    sum(INJURIES), Damage = round(sum(adjTotalDmg), 1))
eventDamage[order(eventDamage$Casualties, decreasing = TRUE), c(1, 2)]
## # A tibble: 326 × 2
##               EVTYPE Casualties
##               <fctr>      <dbl>
## 1            tornado      24931
## 2     excessive heat       8428
## 3              flood       7259
## 4          lightning       6046
## 5          tstm wind       3872
## 6               heat       3037
## 7        flash flood       2755
## 8          ice storm       2064
## 9  thunderstorm wind       1621
## 10      winter storm       1527
## # ... with 316 more rows
eventDamage[order(eventDamage$Damage, decreasing = TRUE), c(1, 3)]
## # A tibble: 326 × 2
##               EVTYPE Damage
##               <fctr>  <dbl>
## 1  hurricane/typhoon   42.1
## 2            tornado   24.2
## 3              flood   17.1
## 4        storm surge   11.4
## 5        flash flood   10.0
## 6               hail    9.0
## 7     tropical storm    7.3
## 8          ice storm    7.2
## 9          hurricane    6.4
## 10      winter storm    6.3
## # ... with 316 more rows

To check how closely the scale of harm to health caused by event types tracks the scale of harm to the economy, we fit a simple linear regression line to the data points in a scatterplot (Fig. 1). Since there is a clear positive correlation between harmful health and economic effects, we anticipate that the final results will show some overlap between the categories most harmful to each.

ggplot(eventDamage) + aes(Casualties, Damage) +
    geom_point(color = "purple2", alpha = 0.3) +
    # Ordinary least-squares line, without a confidence band
    geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "grey2") +
    theme_few() +
    labs(x = "Casualties (fatalities+injuries)", y = "Damage in $ billions",
        title = "Fig. 1. Correlation of health & economic costs",
        subtitle = "                    of weather events (1993-2011)",
        caption = paste("Caption: As the number of casualties (fatalities and injuries) increases,",
            "the dollar amount of damage to property and crops also increases,",
            "according to records from the National Weather Service Storm Database.", sep = "\n")) +
    theme(plot.title = element_text(face = "bold", size = 16))
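Fig. 1 shows the relationship visually; cor() puts a number on it. Because a few event types (tornado, excessive heat) dominate the casualty counts, the rank-based Spearman coefficient is a useful cross-check on Pearson's. The sketch uses invented values; applied to the report, one would pass eventDamage$Casualties and eventDamage$Damage instead.

```r
# Invented per-event-type totals: casualties and damage in $ billions
casualties <- c(1, 2, 3, 4, 5)
damage     <- c(1, 3, 2, 5, 4)

r   <- cor(casualties, damage)                       # Pearson correlation
rho <- cor(casualties, damage, method = "spearman")  # rank-based, resists outliers
fit <- lm(damage ~ casualties)                       # the OLS line geom_smooth(method = lm) draws
slope <- unname(coef(fit)["casualties"])

print(c(pearson = r, spearman = rho, slope = slope))
```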

To cope with the large number of overlapping, inconsistently named event types, we compile a list of ten mostly non-overlapping keyword groups: “blizzard/snow/winter,” “flood,” “fog,” “hail,” “heat,” “wind,” “lightning,” “tornado,” “tropical/hurricane/tsunami,” and “fire.” Grouping together the 233 (of the 326 retained) event types containing a variation on one of those keywords produces ten plausibly broad categories of most harmful weather event types, with the number of types noted in parentheses:

  • Snow (45)

  • Floods (41)

  • Fog (4)

  • Hail (17)

  • Heat (9)

  • Winds (79)

  • Lightning (7)

  • Tornadoes (10)

  • Hurricanes+ (15)

  • Fires (6)

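As we create each broad new weather event type category, we spot-check its plausibility by listing some of its event types to make sure they meet common-sense expectations, which all ten do. The amount of “double counting,” when one event type contains keywords from two or more categories, is relatively small: for example, “tstm wind/hail” (100 casualties) falls under both Hail and Winds, and “tornadoes, tstm wind, hail” (25 casualties, $1.6 billion) under Hail, Tornadoes and Winds.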
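The ten category blocks that follow repeat the same grep-and-mutate step with a different pattern. A table-driven version, sketched here with a hypothetical categorize() helper and a toy stand-in for eventDamage (invented values), keeps the ten regular expressions in one named vector and stacks the matches with bind_rows(). Like the original blocks, it places an event type in every category whose pattern it matches.

```r
library(dplyr)

# Toy stand-in for eventDamage (values invented for illustration)
toyEvents <- tibble(EVTYPE = c("heavy snow", "flash flood", "tstm wind/hail", "tornado", "dense fog"),
                    Casualties = c(10, 20, 5, 100, 3),
                    Damage = c(0.5, 1.0, 0.2, 2.0, 0.0))

# One regular expression per category, mirroring the keyword groups above
patterns <- c(Snow = "^blizzard|snow|^winter", Floods = "flood", Fog = "fog",
              Hail = "hail", Heat = "heat", Winds = "wind", Lightning = "lightning",
              Tornadoes = "tornado", "Hurricanes+" = "tropical|hurricane|tsunami",
              Fires = "fire")

# Hypothetical helper: label the matching rows for each category, then stack them
categorize <- function(events, patterns) {
  bind_rows(lapply(names(patterns), function(category) {
    mutate(events[grepl(patterns[[category]], events$EVTYPE), ], Category = category)
  }))
}

toyCategories <- categorize(toyEvents, patterns)
print(toyCategories)
```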

snow <- mutate(eventDamage[grep("^blizzard|snow|^winter", eventDamage$EVTYPE), 
    ], Category = "Snow")
snow
## # A tibble: 45 × 4
##                       EVTYPE Casualties Damage Category
##                       <fctr>      <dbl>  <dbl>    <chr>
## 1                   blizzard        906    0.7     Snow
## 2      blizzard/winter storm          0    0.0     Snow
## 3               blowing snow         16    0.0     Snow
## 4              cold and snow         14    0.0     Snow
## 5             excessive snow          2    0.0     Snow
## 6           falling snow/ice          2    0.0     Snow
## 7         freezing rain/snow          1    0.0     Snow
## 8            heavy rain/snow          0    0.0     Snow
## 9                 heavy snow       1148    0.5     Snow
## 10 heavy snow and high winds          2    0.0     Snow
## # ... with 35 more rows
floods <- mutate(eventDamage[grep("flood", eventDamage$EVTYPE), ], Category = "Floods")
floods
## # A tibble: 41 × 4
##                       EVTYPE Casualties Damage Category
##                       <fctr>      <dbl>  <dbl>    <chr>
## 1              coastal flood          5    0.1   Floods
## 2           coastal flooding          3    0.0   Floods
## 3   coastal flooding/erosion          5    0.0   Floods
## 4                flash flood       2755   10.0   Floods
## 5   flash flood - heavy rain          0    0.0   Floods
## 6  flash flood from ice jams          0    0.0   Floods
## 7          flash flood/flood         14    0.2   Floods
## 8      flash flood/landslide          0    0.0   Floods
## 9             flash flooding         27    0.2   Floods
## 10      flash flooding/flood          5    0.0   Floods
## # ... with 31 more rows
fog <- mutate(eventDamage[grep("fog", eventDamage$EVTYPE), ], Category = "Fog")
fog
## # A tibble: 4 × 4
##                      EVTYPE Casualties Damage Category
##                      <fctr>      <dbl>  <dbl>    <chr>
## 1                 dense fog        360      0      Fog
## 2                       fog        796      0      Fog
## 3 fog and cold temperatures          2      0      Fog
## 4              freezing fog          0      0      Fog
hail <- mutate(eventDamage[grep("hail", eventDamage$EVTYPE), ], Category = "Hail")
hail
## # A tibble: 17 × 4
##                        EVTYPE Casualties Damage Category
##                        <fctr>      <dbl>  <dbl>    <chr>
## 1                        hail        970    9.0     Hail
## 2                    hail 150          0    0.0     Hail
## 3                    hail 200          0    0.0     Hail
## 4                    hail 275          0    0.0     Hail
## 5                     hail 75          0    0.0     Hail
## 6                 hail damage          0    0.0     Hail
## 7                   hail/wind          0    0.0     Hail
## 8                  hail/winds          0    0.0     Hail
## 9                   hailstorm          0    0.2     Hail
## 10                marine hail          0    0.0     Hail
## 11                 small hail         10    0.0     Hail
## 12     thunderstorm wind/hail          0    0.0     Hail
## 13    thunderstorm winds hail          0    0.0     Hail
## 14    thunderstorm winds/hail          1    0.0     Hail
## 15 tornadoes, tstm wind, hail         25    1.6     Hail
## 16             tstm wind/hail        100    0.0     Hail
## 17                  wind/hail          0    0.0     Hail
heat <- mutate(eventDamage[grep("heat", eventDamage$EVTYPE), ], Category = "Heat")
heat
## # A tibble: 9 × 4
##                   EVTYPE Casualties Damage Category
##                   <fctr>      <dbl>  <dbl>    <chr>
## 1 drought/excessive heat          2    0.0     Heat
## 2         excessive heat       8428    0.5     Heat
## 3           extreme heat        251    0.0     Heat
## 4                   heat       3037    0.0     Heat
## 5              heat wave        551    0.0     Heat
## 6      heat wave drought         19    0.0     Heat
## 7             heat waves          5    0.0     Heat
## 8            record heat         52    0.0     Heat
## 9  record/excessive heat         17    0.0     Heat
winds <- mutate(eventDamage[grep("wind", eventDamage$EVTYPE), ], Category = "Winds")
winds
## # A tibble: 79 × 4
##                     EVTYPE Casualties Damage Category
##                     <fctr>      <dbl>  <dbl>    <chr>
## 1          cold/wind chill        107      0    Winds
## 2               cold/winds          1      0    Winds
## 3     dry mircoburst winds          1      0    Winds
## 4  extreme cold/wind chill        149      0    Winds
## 5        extreme windchill         22      0    Winds
## 6         flood/rain/winds          0      0    Winds
## 7            gradient wind          0      0    Winds
## 8               gusty wind          2      0    Winds
## 9      gusty wind/hvy rain          0      0    Winds
## 10             gusty winds         15      0    Winds
## # ... with 69 more rows
lightning <- mutate(eventDamage[grep("lightning", eventDamage$EVTYPE), ], Category = "Lightning")
lightning
## # A tibble: 7 × 4
##                           EVTYPE Casualties Damage  Category
##                           <fctr>      <dbl>  <dbl>     <chr>
## 1                      lightning       6046    0.4 Lightning
## 2 lightning and thunderstorm win          1    0.0 Lightning
## 3               lightning injury          1    0.0 Lightning
## 4   lightning thunderstorm winds          0    0.0 Lightning
## 5                     lightning.          1    0.0 Lightning
## 6    thunderstorm wind/lightning          0    0.0 Lightning
## 7   thunderstorm winds lightning          0    0.0 Lightning
tornadoes <- mutate(eventDamage[grep("tornado", eventDamage$EVTYPE), ], Category = "Tornadoes")
tornadoes
## # A tibble: 10 × 4
##                        EVTYPE Casualties Damage  Category
##                        <fctr>      <dbl>  <dbl>     <chr>
## 1                     tornado      24931   24.2 Tornadoes
## 2                  tornado f0          0    0.0 Tornadoes
## 3                  tornado f1          0    0.0 Tornadoes
## 4                  tornado f2         16    0.0 Tornadoes
## 5                  tornado f3          2    0.0 Tornadoes
## 6                   tornadoes          0    0.0 Tornadoes
## 7  tornadoes, tstm wind, hail         25    1.6 Tornadoes
## 8          waterspout tornado          1    0.0 Tornadoes
## 9         waterspout/ tornado          0    0.0 Tornadoes
## 10         waterspout/tornado         45    0.1 Tornadoes
hurricanesEtc <- mutate(eventDamage[grep("tropical|hurricane|tsunami", eventDamage$EVTYPE), 
    ], Category = "Hurricanes+")
hurricanesEtc
## # A tibble: 15 × 4
##                        EVTYPE Casualties Damage    Category
##                        <fctr>      <dbl>  <dbl>       <chr>
## 1                   hurricane        107    6.4 Hurricanes+
## 2           hurricane edouard          2    0.0 Hurricanes+
## 3             hurricane emily          1    0.0 Hurricanes+
## 4              hurricane erin          7    0.4 Hurricanes+
## 5             hurricane felix          1    0.0 Hurricanes+
## 6              hurricane opal          2    2.2 Hurricanes+
## 7   hurricane opal/high winds          2    0.1 Hurricanes+
## 8  hurricane-generated swells          2    0.0 Hurricanes+
## 9           hurricane/typhoon       1339   42.1 Hurricanes+
## 10        tropical depression          0    0.0 Hurricanes+
## 11             tropical storm        398    7.3 Hurricanes+
## 12     tropical storm alberto          0    0.0 Hurricanes+
## 13      tropical storm gordon         51    0.0 Hurricanes+
## 14       tropical storm jerry          0    0.0 Hurricanes+
## 15                    tsunami        162    0.1 Hurricanes+
fires <- mutate(eventDamage[grep("fire", eventDamage$EVTYPE), ], Category = "Fires")
fires
## # A tibble: 6 × 4
##              EVTYPE Casualties Damage Category
##              <fctr>      <dbl>  <dbl>    <chr>
## 1        brush fire          2    0.0    Fires
## 2        wild fires        153    0.6    Fires
## 3  wild/forest fire        557    1.2    Fires
## 4 wild/forest fires          0    0.0    Fires
## 5          wildfire        986    4.7    Fires
## 6         wildfires          0    0.1    Fires
categoryDamage <- rbind(snow, floods, fog, hail, heat, winds, lightning, tornadoes, 
    hurricanesEtc, fires)
categoryDamage$Category <- as.factor(categoryDamage$Category)
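The double counting mentioned above can be measured by flagging event types that match more than one category's pattern; in the report the same test could run on eventDamage$EVTYPE with the ten regular expressions used above. The sketch uses a handful of event-type labels like those listed and a subset of the patterns.

```r
# A few event-type labels and the category regexes they are matched against
evtypes  <- c("tornado", "tstm wind/hail", "hail", "flood/rain/winds", "heat")
patterns <- c(Floods = "flood", Hail = "hail", Winds = "wind",
              Tornadoes = "tornado", Heat = "heat")

# One logical column per category: does each event type match that pattern?
hits <- sapply(patterns, grepl, x = evtypes)

# Event types landing in more than one category are counted more than once
multiCounted <- evtypes[rowSums(hits) > 1]
print(multiCounted)
```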

Armed with these ten plausibly broad weather event type categories, we rank the harm each one caused to population health and to the economy.

Results

Over the 19 years from 1993-2011, National Weather Service records show that the top five most harmful categories of weather event types, for both population health and the economy, included Floods, Tornadoes and Winds.

finalDamage <- summarize(group_by(categoryDamage, Category), Casualties = sum(Casualties), 
    Damage = round(sum(Damage), 1))
finalDamage[order(finalDamage$Casualties, decreasing = TRUE), c(1, 2)]
## # A tibble: 10 × 2
##       Category Casualties
##         <fctr>      <dbl>
## 1    Tornadoes      25020
## 2         Heat      12362
## 3       Floods      10129
## 4        Winds       9360
## 5    Lightning       6049
## 6         Snow       4410
## 7  Hurricanes+       2074
## 8        Fires       1698
## 9          Fog       1158
## 10        Hail       1106
finalDamage[order(finalDamage$Damage, decreasing = TRUE), c(1, 3)]
## # A tibble: 10 × 2
##       Category Damage
##         <fctr>  <dbl>
## 1  Hurricanes+   58.6
## 2       Floods   27.9
## 3    Tornadoes   25.9
## 4        Winds   14.1
## 5         Hail   10.8
## 6         Snow    7.6
## 7        Fires    6.6
## 8         Heat    0.5
## 9    Lightning    0.4
## 10         Fog    0.0

Tornadoes caused the most Casualties at 25,020, with Floods and Winds also in the top five at 10,129 and 9,360 Casualties, respectively. Those same three categories were also among the top five for economic Damage: Floods caused the second-most Damage at $27.9 billion, followed by Tornadoes at $25.9 billion, with Winds fourth at $14.1 billion. So the three categories most harmful to both health and the economy were Tornadoes, Floods and Winds, as the two bar charts (Figs. 2 and 3) make clear with top-five thresholds of 5,000 Casualties and $9 billion in Damage.

ggplot(finalDamage) + aes(Category, Casualties) +
    geom_col(fill = "red2") +  # bar heights are the Casualties values
    geom_hline(yintercept = 5000) +  # top-five threshold
    theme_few() +
    labs(x = "Event Category", y = "Casualties (fatalities+injuries)",
        title = "Fig. 2. Weather events most harmful to health by category",
        subtitle = "                    (1993-2011)",
        caption = paste("Caption: As the horizontal reference line highlights, only five event categories caused",
            "more than 5,000 casualties: floods, heat, lightning, tornadoes and winds,",
            "according to records from the National Weather Service Storm Database.", sep = "\n")) +
    theme(plot.title = element_text(face = "bold", size = 16),
        axis.text.x = element_text(angle = 45, hjust = 1))

ggplot(finalDamage) + aes(Category, Damage) +
    geom_col(fill = "blue2") +  # bar heights are the Damage values
    geom_hline(yintercept = 9) +  # top-five threshold, $9 billion
    theme_few() +
    labs(x = "Event Category", y = "Damage in $ billions",
        title = "Fig. 3. Weather events most harmful to economy by category",
        subtitle = "                    (1993-2011)",
        caption = paste("Caption: As the horizontal reference line highlights, only five event categories caused",
            "more than $9 billion in damage: floods, hail, hurricanes, tornadoes and winds,",
            "according to records from the National Weather Service Storm Database.", sep = "\n")) +
    theme(plot.title = element_text(face = "bold", size = 16),
        axis.text.x = element_text(angle = 45, hjust = 1))

Different categories filled out the two top-five lists, though. For population health, Heat and Lightning were also among the top five most harmful categories, with Heat causing the second-most Casualties at 12,362 and Lightning fifth at 6,049. For the economy, in contrast, Hurricanes+ and Hail were among the top five, with Hurricanes+ causing the most Damage at $58.6 billion and Hail fifth at $10.8 billion.

Some categories, like Floods, Tornadoes and Winds, tended to be among the most harmful to both health and the economy. Others, however, were much more harmful to one than the other: Heat and Lightning to health, and Hurricanes+ and Hail to the economy.
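The overlap and differences between the two top-five lists are set operations. The sketch below copies the category totals from the two ranked tables above into plain vectors and recovers the three shared categories and the two unique to each list.

```r
# Category totals copied from the two ranked tables above
category   <- c("Tornadoes", "Heat", "Floods", "Winds", "Lightning", "Snow",
                "Hurricanes+", "Fires", "Fog", "Hail")
casualties <- c(25020, 12362, 10129, 9360, 6049, 4410, 2074, 1698, 1158, 1106)
damage     <- c(25.9, 0.5, 27.9, 14.1, 0.4, 7.6, 58.6, 6.6, 0.0, 10.8)

# Five highest-ranked categories by each measure
topHealth  <- head(category[order(casualties, decreasing = TRUE)], 5)
topEconomy <- head(category[order(damage, decreasing = TRUE)], 5)

both         <- intersect(topHealth, topEconomy)  # harmful to both
healthOnly   <- setdiff(topHealth, topEconomy)    # mainly harmful to health
economyOnly  <- setdiff(topEconomy, topHealth)    # mainly harmful to the economy
print(list(both = both, healthOnly = healthOnly, economyOnly = economyOnly))
```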