Impact of weather events on population health and their economic consequences

Synopsis

Weather events are responsible for substantial damage to people and the economy. In this report we identify the weather events in the United States that caused the highest amount of damage. Events from the year 1950 to November 2011 are included in our analysis. According to this data, tornadoes are the most dangerous weather events as measured by number of fatalities and injuries. Most economic damage was caused by hurricanes, as measured by the sum of damage to property and crops. Floods were the most damaging events for crops, while damage to property was again dominated by hurricanes.

Data Processing

This analysis is based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The data was downloaded from here. It is stored as a bzip2-compressed CSV file, which read.csv can read directly. Missing data is coded as empty fields, so we set the argument na.strings to the empty string. Since there are no comments in the data, we also set the argument comment.char to the empty string, which read.csv already uses by default.

data.full <- read.csv("repdata_data_StormData.csv.bz2",
                      na.strings = "",
                      comment.char = "")
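To illustrate what na.strings = "" does, here is a minimal sketch on a made-up three-row CSV. The values in csv.text are hypothetical and only mimic the layout of the real file; the point is that an empty field is read as NA instead of an empty string.

```r
# Toy CSV text standing in for the real file (hypothetical values).
csv.text <- "EVTYPE,PROPDMG,PROPDMGEXP
TORNADO,25,K
HAIL,0,
FLOOD,1.5,M"

toy <- read.csv(text = csv.text,
                na.strings = "",
                comment.char = "",
                stringsAsFactors = FALSE)

# The empty PROPDMGEXP field of the HAIL row is read as NA.
is.na(toy$PROPDMGEXP)
```

Without na.strings = "", the empty field in a character column would be kept as the empty string "", and later checks with is.na would not catch it.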

Let’s get a feeling for the dataset by looking at its dimensions and the feature names:

data.nrow <- nrow(data.full)
data.ncol <- ncol(data.full)
names(data.full)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

There are 902297 data entries and 37 features in the dataset.

We’re mostly interested in the event type, EVTYPE, the impact on public health measured by the number of fatalities, FATALITIES, and the number of injuries, INJURIES, and the economic impact measured by the property damage, PROPDMG, and the crop damage, CROPDMG.

The damage variables are measured in USD. However, the units of these values are stored in additional columns, PROPDMGEXP and CROPDMGEXP. If the unit is one of the letters K, M or B (in upper or lower case), it is interpreted as thousands, millions or billions, respectively. A missing unit is interpreted as a multiplier of one, and any other code is set to NA. Luckily, most of the data entries adhere to this code (note that table omits missing entries by default):

table(data.full$PROPDMGEXP)
## 
##      -      ?      +      0      1      2      3      4      5      6 
##      1      8      5    216     25     13      4      4     28      4 
##      7      8      B      h      H      K      m      M 
##      5      1     40      1      6 424665      7  11330
table(data.full$CROPDMGEXP)
## 
##      ?      0      2      B      k      K      m      M 
##      7     19      1      9     21 281832      1   1994
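The tables above do not show how many unit codes are missing, because table drops NA values unless asked otherwise. A small sketch on a made-up vector (codes is hypothetical, not taken from the dataset) shows how the useNA argument makes them visible:

```r
# Hypothetical unit codes, including two missing entries.
codes <- c("K", "K", NA, "M", NA, "h")

counts.default <- table(codes)                  # NA entries are dropped
counts.with.na <- table(codes, useNA = "ifany") # adds a count for NA

sum(counts.default)   # counts only the non-missing codes
sum(counts.with.na)   # counts all entries
```

Applied to the real columns, the same argument would reveal how many entries fall into the "missing unit" case that the helper function treats as a multiplier of one.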

The conversion from unit code to multiplier for the damage columns is done in a helper function:

unit.to.multiplier <- function(unit) {
    if (is.na(unit)) {
        1  # missing entry is interpreted as no exponent -> multiply by one
    } else if (unit %in% c("k", "K")) {
        1e3
    } else if (unit %in% c("m", "M")) {
        1e6
    } else if (unit %in% c("b", "B")) {
        1e9
    } else {
        NA  # any unrecognized factor level is interpreted as NA
    }
}
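The helper above handles one value at a time and therefore has to be applied with sapply. As an alternative sketch (not the approach used in this report), the same mapping can be written as a vectorized lookup in a named vector. The names unit.multipliers and units.to.multipliers are made up for this illustration.

```r
# Lookup table from unit code to multiplier (same rules as the helper above).
unit.multipliers <- c(k = 1e3, K = 1e3,
                      m = 1e6, M = 1e6,
                      b = 1e9, B = 1e9)

units.to.multipliers <- function(units) {
    units <- as.character(units)                    # also accepts factor columns
    multipliers <- unname(unit.multipliers[units])  # unrecognized codes -> NA
    multipliers[is.na(units)] <- 1                  # missing code -> multiply by one
    multipliers
}

units.to.multipliers(c("K", "m", NA, "B", "?"))
```

On a column with roughly 900000 entries, a lookup like this avoids calling an R function once per row, which is usually noticeably faster than sapply.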

After converting the damage columns to USD, we add them to create a variable for the total damage in USD, TOTALDMG.USD. In addition, we select only the columns we’re interested in and have a look at the first few rows:

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
data <- data.full %>%     
        select(EVTYPE, FATALITIES, INJURIES, PROPDMG:CROPDMGEXP) %>%
        mutate(PROPDMGEXP = sapply(PROPDMGEXP, unit.to.multiplier),
               PROPDMG.USD = PROPDMG * PROPDMGEXP,               
               CROPDMGEXP = sapply(CROPDMGEXP, unit.to.multiplier),
               CROPDMG.USD = CROPDMG * CROPDMGEXP,
               TOTALDMG.USD = PROPDMG.USD + CROPDMG.USD)
rm(data.full)
head(data)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0       1000       0          1
## 2 TORNADO          0        0     2.5       1000       0          1
## 3 TORNADO          0        2    25.0       1000       0          1
## 4 TORNADO          0        2     2.5       1000       0          1
## 5 TORNADO          0        2     2.5       1000       0          1
## 6 TORNADO          0        6     2.5       1000       0          1
##   PROPDMG.USD CROPDMG.USD TOTALDMG.USD
## 1       25000           0        25000
## 2        2500           0         2500
## 3       25000           0        25000
## 4        2500           0         2500
## 5        2500           0         2500
## 6        2500           0         2500

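We expect only a few missing values after processing the data, since only entries with an unrecognized unit code become NA. The following gives the fraction of missing values per column: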

sapply(data, function(x) mean(is.na(x)))
##       EVTYPE   FATALITIES     INJURIES      PROPDMG   PROPDMGEXP 
## 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 3.557587e-04 
##      CROPDMG   CROPDMGEXP  PROPDMG.USD  CROPDMG.USD TOTALDMG.USD 
## 0.000000e+00 2.992363e-05 3.557587e-04 2.992363e-05 3.856823e-04
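TOTALDMG.USD is missing whenever either of its two summands is missing, because NA propagates through addition. The following sketch with made-up numbers (prop, crop and total are hypothetical vectors) shows the effect. In the output above, the missing fraction of TOTALDMG.USD equals the sum of the two component fractions, which suggests that no row has both values missing.

```r
# Hypothetical damage values with one NA in each component, in different rows.
prop <- c(2500, NA, 100, 0)
crop <- c(0, 50, NA, 10)

total <- prop + crop   # NA + x is NA, so rows 2 and 3 become NA

mean(is.na(prop))      # fraction missing in prop
mean(is.na(crop))      # fraction missing in crop
mean(is.na(total))     # fraction missing in the sum
```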

Results

The impact of weather events on public health can be measured by the total number of fatalities and injuries attributed to a certain event.

data.pop.health <- data %>%
    group_by(EVTYPE) %>%
    summarize(SUMMED.FATALITIES = sum(FATALITIES),
              SUMMED.INJURIES = sum(INJURIES))

head(arrange(data.pop.health, desc(SUMMED.FATALITIES), desc(SUMMED.INJURIES)))
## # A tibble: 6 × 3
##           EVTYPE SUMMED.FATALITIES SUMMED.INJURIES
##           <fctr>             <dbl>           <dbl>
## 1        TORNADO              5633           91346
## 2 EXCESSIVE HEAT              1903            6525
## 3    FLASH FLOOD               978            1777
## 4           HEAT               937            2100
## 5      LIGHTNING               816            5230
## 6      TSTM WIND               504            6957
head(arrange(data.pop.health, desc(SUMMED.INJURIES), desc(SUMMED.FATALITIES)))
## # A tibble: 6 × 3
##           EVTYPE SUMMED.FATALITIES SUMMED.INJURIES
##           <fctr>             <dbl>           <dbl>
## 1        TORNADO              5633           91346
## 2      TSTM WIND               504            6957
## 3          FLOOD               470            6789
## 4 EXCESSIVE HEAT              1903            6525
## 5      LIGHTNING               816            5230
## 6           HEAT               937            2100

So we see that tornadoes are by far the most harmful weather events, both by fatalities and by injuries. We can also compare the distribution of injuries per event for the 10 event types with the highest total number of injuries. Because the plot uses a log10 scale, events without any injuries are dropped (see the warning below the plotting code), so the boxplots describe only events with at least one injury. Interestingly, heat causes on average more injuries per event than tornadoes. The total number of injuries is dominated by a few catastrophic tornadoes. Other events, such as lightning, almost never cause any injuries. The distributions are even more skewed for fatalities.

top.pop.health.events <- arrange(data.pop.health, desc(SUMMED.INJURIES))$EVTYPE[1:10]
data.top.pop.health.events <- data %>% filter(EVTYPE %in% top.pop.health.events)

library(ggplot2)
qplot(EVTYPE, log10(INJURIES),
      data = data.top.pop.health.events,
      geom = "boxplot",
      xlab = "Event type",
      ylab = "log10 of injuries",
      main = "Injuries per event for the 10 event types with the most injuries")
## Warning: Removed 736740 rows containing non-finite values (stat_boxplot).
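The warning comes from the log10 transform: log10(0) is -Inf, and the boxplot statistic removes non-finite values. A minimal sketch on made-up counts (injuries is hypothetical) shows the effect, together with one common alternative, log10(x + 1), that keeps zero-injury events in the picture. This alternative is not what the plot above uses.

```r
# Hypothetical injury counts per event; most events cause no injuries.
injuries <- c(0, 0, 1, 10, 150)

log.injuries <- log10(injuries)        # zeros become -Inf
n.dropped <- sum(!is.finite(log.injuries))
n.dropped                              # number of values a boxplot would drop

log.injuries.shifted <- log10(injuries + 1)  # zeros map to 0 and stay finite
all(is.finite(log.injuries.shifted))
```

Whether to shift the values or to drop the zeros depends on the question: the plot above describes events that caused at least one injury, while the shifted version would describe all events.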

Economic damage is calculated as the sum of damage to property and to crops.

data.economy <- data %>%
  group_by(EVTYPE) %>%
  summarize(SUMMED.TOTALDMG.USD = sum(TOTALDMG.USD),
            SUMMED.PROPDMG.USD = sum(PROPDMG.USD),
            SUMMED.CROPDMG.USD = sum(CROPDMG.USD))

head(arrange(data.economy, desc(SUMMED.TOTALDMG.USD)))
## # A tibble: 6 × 4
##              EVTYPE SUMMED.TOTALDMG.USD SUMMED.PROPDMG.USD
##              <fctr>               <dbl>              <dbl>
## 1 HURRICANE/TYPHOON         71913712800        69305840000
## 2       STORM SURGE         43323541000        43323536000
## 3         HURRICANE         14610229010        11868319010
## 4       RIVER FLOOD         10148404500         5118945500
## 5    TROPICAL STORM          8382236550         7703890550
## 6          WILDFIRE          5060586800         4765114000
## # ... with 1 more variables: SUMMED.CROPDMG.USD <dbl>
head(arrange(data.economy, desc(SUMMED.PROPDMG.USD)))
## # A tibble: 6 × 4
##              EVTYPE SUMMED.TOTALDMG.USD SUMMED.PROPDMG.USD
##              <fctr>               <dbl>              <dbl>
## 1 HURRICANE/TYPHOON         71913712800        69305840000
## 2       STORM SURGE         43323541000        43323536000
## 3         HURRICANE         14610229010        11868319010
## 4    TROPICAL STORM          8382236550         7703890550
## 5       RIVER FLOOD         10148404500         5118945500
## 6          WILDFIRE          5060586800         4765114000
## # ... with 1 more variables: SUMMED.CROPDMG.USD <dbl>
head(arrange(data.economy, desc(SUMMED.CROPDMG.USD)))
## # A tibble: 6 × 4
##              EVTYPE SUMMED.TOTALDMG.USD SUMMED.PROPDMG.USD
##              <fctr>               <dbl>              <dbl>
## 1             FLOOD                  NA                 NA
## 2       RIVER FLOOD         10148404500         5118945500
## 3         ICE STORM                  NA                 NA
## 4         HURRICANE         14610229010        11868319010
## 5 HURRICANE/TYPHOON         71913712800        69305840000
## 6       FLASH FLOOD                  NA                 NA
## # ... with 1 more variables: SUMMED.CROPDMG.USD <dbl>
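The NA entries for FLOOD, ICE STORM and FLASH FLOOD in the last table arise because sum returns NA as soon as one summand is NA. The sketch below reproduces this on a made-up data frame (toy and its values are hypothetical) using base R's tapply, and shows how na.rm = TRUE would skip the missing entries instead. The sums reported in this document were computed without na.rm.

```r
# Hypothetical damage records; one FLOOD entry has an unknown damage value.
toy <- data.frame(
    EVTYPE = c("FLOOD", "FLOOD", "HAIL", "HAIL"),
    DMG.USD = c(1e6, NA, 2e3, 3e3)
)

# Plain sum: a single NA makes the whole group sum NA.
sums.strict <- tapply(toy$DMG.USD, toy$EVTYPE, sum)

# na.rm = TRUE is passed on to sum and ignores missing entries.
sums.na.rm <- tapply(toy$DMG.USD, toy$EVTYPE, sum, na.rm = TRUE)

sums.strict
sums.na.rm
```

In the dplyr pipeline used here, the equivalent change would be to write sum(TOTALDMG.USD, na.rm = TRUE) inside summarize. Note that this treats entries with an unrecognized unit code as zero damage, which is itself an assumption.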

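Measured by total and property damage, hurricanes/typhoons are the most devastating weather events, followed by storm surge. Combining the event types HURRICANE/TYPHOON and HURRICANE would make the result even clearer. However, judging by crop damage alone, floods (FLOOD and RIVER FLOOD) lead to the largest economic losses, followed by ice storms. Note that the total and property sums are NA for FLOOD, ICE STORM and FLASH FLOOD, because at least one of their entries has an unrecognized unit code; these event types are therefore absent from the total and property rankings.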
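Combining related event types could be sketched as follows. The vectors evtype and damage hold made-up values, and the merged label "HURRICANE (ALL)" is a hypothetical name chosen for this illustration; the report itself does not perform this step.

```r
# Hypothetical event labels and damage values.
evtype <- c("HURRICANE/TYPHOON", "HURRICANE", "STORM SURGE", "HURRICANE")
damage <- c(100, 20, 50, 5)

# Map both hurricane labels to one merged label, keep all others unchanged.
hurricane.labels <- c("HURRICANE", "HURRICANE/TYPHOON")
evtype.merged <- ifelse(evtype %in% hurricane.labels, "HURRICANE (ALL)", evtype)

# Sum the damage per merged event type and rank the result.
merged.sums <- sort(tapply(damage, evtype.merged, sum), decreasing = TRUE)
merged.sums
```

The same idea extends to other near-duplicate labels in EVTYPE, such as TSTM WIND, although deciding which labels belong together requires domain judgment.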