Synopsis

We wish to determine which storm event type a) is the most harmful to population health, measured here as the number of injuries and fatalities, and b) has the greatest economic consequences, measured here as the estimated USD value of property and crop damage. This is done by examining the NOAA Storm Database, a data set of storm events that have occurred across the United States of America, with data collected from 1950 to 2011. Tornadoes are the event type that causes the greatest number of fatalities and injuries (96,979) and also costs the most in combined estimated property and crop damage (USD 3,312,277).

Data Processing

We have used the NOAA Storm database, which can be downloaded as a compressed CSV file from here. The data is described in this document, and this FAQ.

Downloading the data

Our first step is to make sure that we’ve downloaded the data and then read it in.

if (!file.exists('StormData.csv.bz2')) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
                  destfile = "StormData.csv.bz2",
                  method="curl")
}

data <- read.csv('StormData.csv.bz2')

Brief description of the data

Our analysis uses the following five variables:

  • EVTYPE, being the type of storm event.
  • FATALITIES and INJURIES, being the number of deaths and injuries associated with a storm event.
  • PROPDMG and CROPDMG, being the estimated dollar value of the damage caused to property and crops.

Cleaning

The EVTYPE field is very dirty. It consists of strings that sometimes differ from each other only by case. We clean the data only by converting every value to upper case and replacing the one unknown value (represented by ‘?’) with NA.

library(dplyr)

data <- data %>%
    mutate(EVTYPE = factor(toupper(EVTYPE)))

data$EVTYPE[which(data$EVTYPE == '?')] <- NA
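As a small illustration of this cleaning step, the sketch below applies the same upper-casing and ‘?’ replacement to a made-up vector. The values in raw are invented for this example and do not come from the data set, and the call to droplevels is an addition not used in the analysis; it only removes the now-empty ‘?’ level.

```r
# Toy input (invented values): three spellings of one event type, one unknown
raw <- c("Tornado", "TORNADO", "tornado", "?", "Flood")

# Same steps as above: a consistent case, then '?' becomes NA
clean <- factor(toupper(raw))
clean[which(clean == "?")] <- NA

# Optional extra step (not in the analysis): drop the unused '?' level
clean <- droplevels(clean)

levels(clean)
table(clean, useNA = "ifany")
```

The three spellings of "tornado" collapse into a single level, which is all this cleaning step aims for; near-duplicates that differ by more than case are left untouched.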

Health

We are interested in FATALITIES and INJURIES. First, we introduce a new variable, total.health, which is the sum of FATALITIES and INJURIES (in the same step we also add damage, the sum of PROPDMG and CROPDMG, for use in the economic summary). We then calculate the sum and mean of these values per event type.

data <- data %>% 
    mutate(total.health = INJURIES + FATALITIES,
           damage = PROPDMG + CROPDMG)

health.summary <- data %>% 
    group_by(EVTYPE) %>%
    summarise(fatality.sum = sum(FATALITIES), fatality.mean = mean(FATALITIES),
              injury.sum = sum(INJURIES), injury.mean = mean(INJURIES),
              health.sum = sum(total.health), health.mean = mean(total.health)) %>%
    arrange(desc(health.sum), desc(health.mean))

For plotting purposes, we also want a tidier version of this data, in which the kind of health consequence (injury or fatality) is identified by a new variable and the aggregate values (mean and sum) are columns of their own:

library(tidyr)

health.clean <- health.summary %>% 
    # We only want to look at those events with more than 500 injuries and fatalities
    filter(health.sum > 500) %>% 
    # We want to shift some variables in to columns of their own
    gather(variable, value, c(injury.sum, fatality.sum, injury.mean, fatality.mean)) %>% 
    # And then separate out the type of health related event in to its own column
    separate(variable, into=c("health.type", "aggregate"), sep="\\.") %>% 
    # and then add sum / mean back as columns of their own
    spread(aggregate, value)
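To make the reshaping easier to follow, here is a minimal sketch of the same gather / separate / spread steps on a two-row toy summary. The data frame toy.summary and its values are invented for illustration only.

```r
library(dplyr)
library(tidyr)

# Toy per-event summary (invented values), shaped like health.summary
toy.summary <- data.frame(
    EVTYPE        = c("A", "B"),
    injury.sum    = c(10, 4),
    fatality.sum  = c(2, 1),
    injury.mean   = c(5, 2),
    fatality.mean = c(1, 0.5)
)

toy.clean <- toy.summary %>%
    # Stack the four aggregate columns into name / value pairs
    gather(variable, value, c(injury.sum, fatality.sum, injury.mean, fatality.mean)) %>%
    # Split names such as "injury.sum" into "injury" and "sum"
    separate(variable, into = c("health.type", "aggregate"), sep = "\\.") %>%
    # Turn the "sum" and "mean" labels back into columns
    spread(aggregate, value)

toy.clean
```

Each event type ends up with one row per health type, which is the shape ggplot2 needs to fill a stacked bar by health.type.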

And this is what the clean health data looks like:

knitr::kable(head(health.clean %>% arrange(desc(health.sum), EVTYPE), n = 10),
             digits = 2)

EVTYPE          health.sum  health.mean  health.type  mean    sum
TORNADO              96979         1.60  fatality     0.09   5633
TORNADO              96979         1.60  injury       1.51  91346
EXCESSIVE HEAT        8428         5.02  fatality     1.13   1903
EXCESSIVE HEAT        8428         5.02  injury       3.89   6525
TSTM WIND             7461         0.03  fatality     0.00    504
TSTM WIND             7461         0.03  injury       0.03   6957
FLOOD                 7259         0.29  fatality     0.02    470
FLOOD                 7259         0.29  injury       0.27   6789
LIGHTNING             6046         0.38  fatality     0.05    816
LIGHTNING             6046         0.38  injury       0.33   5230

Economy

On the economic side, we’re mostly interested in the total economic consequence, which we calculate as the sum of the property damage and crop damage. Below, we calculate this value for each event type.

economy.summary <- data %>%
    group_by(EVTYPE) %>%
    # damage is the sum of property damage and crop damage
    summarise(total.damage = sum(damage), mean = mean(damage)) %>%
    arrange(desc(total.damage), desc(mean))
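The sums above use PROPDMG and CROPDMG exactly as recorded. The storm data also carries PROPDMGEXP and CROPDMGEXP columns, which are commonly read as magnitude codes; this report does not use them. The sketch below shows one way such a scaling could be applied, assuming that K, M and B stand for thousands, millions and billions. The helper exp_multiplier and the toy values are hypothetical and are not part of the analysis.

```r
# Hypothetical helper: map a magnitude code to a multiplier.
# Assumption: K = thousands, M = millions, B = billions;
# any other code (including blank) is left unscaled.
exp_multiplier <- function(code) {
    lookup <- c(K = 1e3, M = 1e6, B = 1e9)
    mult <- lookup[toupper(as.character(code))]
    mult[is.na(mult)] <- 1
    unname(mult)
}

# Toy rows (invented values)
toy <- data.frame(PROPDMG    = c(25, 2.5, 10),
                  PROPDMGEXP = c("K", "M", ""))

toy$prop.usd <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy
```

Treating unrecognised codes as a multiplier of 1 is a simplification made for this sketch; a fuller treatment would need to decide how to handle each code that appears in the data.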

Results

Health consequences

Below we show the total number of health consequences for the storm event types that have accumulated more than 500 such consequences. We differentiate between injuries and fatalities using a stacked bar graph.

library(ggplot2)
library(scales)

ggplot(health.clean, aes(x = EVTYPE, y = sum, fill = health.type)) + 
    geom_bar(stat = "identity") +
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous(labels = comma) +
    labs(title = "Total number of health consequences for storm events",
         x = "Storm event",
         y = "Total number of health consequences (injuries + fatalities)") 

Notice that the TORNADO event type has by far the greatest number of health consequences in absolute terms. Tornadoes account for 96,979 injuries and fatalities, while the event type with the second largest number, EXCESSIVE HEAT, has an order of magnitude fewer: 8,428.
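The "order of magnitude" comparison can be checked with a few lines of arithmetic. This is only a sketch using the injury and fatality sums reported in the health table earlier; the variable names are made up for the example.

```r
# Sums taken from the health table (fatalities + injuries)
tornado.total <- 5633 + 91346
heat.total    <- 1903 + 6525

# How many times larger is the tornado total?
ratio <- tornado.total / heat.total
round(ratio, 1)
```

The ratio comes out at roughly 11.5, which is consistent with describing the gap as about one order of magnitude.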

Here are the ten event types with the most health consequences:

knitr::kable(head(health.summary %>% select(EVTYPE, health.sum), n = 10))

EVTYPE             health.sum
TORNADO                 96979
EXCESSIVE HEAT           8428
TSTM WIND                7461
FLOOD                    7259
LIGHTNING                6046
HEAT                     3037
FLASH FLOOD              2755
ICE STORM                2064
THUNDERSTORM WIND        1621
WINTER STORM             1527

In contrast, tornadoes do not have a high number of injuries and fatalities per event on average. That distinction goes to hurricanes and typhoons, as can be seen in the chart below, which shows the average number of health consequences for the event types that have had more than 500 health consequences in total.

ggplot(health.clean, aes(x = EVTYPE, y = mean, fill = health.type)) + 
    geom_bar(stat = "identity") +
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    labs(title = "Average number of health consequences for storm events",
         x = "Storm event",
         y = "Average number of health consequences (injuries + fatalities)") 

Here are the event types with the highest means, again restricted to those with more than 500 health consequences in total:

knitr::kable(head(health.summary %>% 
                      filter(health.sum > 500) %>%
                      arrange(desc(health.mean)), n = 10),
             digits = 2)

EVTYPE             fatality.sum  fatality.mean  injury.sum  injury.mean  health.sum  health.mean
HURRICANE/TYPHOON            64           0.73        1275        14.49        1339        15.22
HEAT WAVE                   172           2.29         379         5.05         551         7.35
EXCESSIVE HEAT             1903           1.13        6525         3.89        8428         5.02
HEAT                        937           1.22        2100         2.74        3037         3.96
RIP CURRENTS                204           0.67         297         0.98         501         1.65
TORNADO                    5633           0.09       91346         1.51       96979         1.60
FOG                          62           0.12         734         1.36         796         1.48
RIP CURRENT                 368           0.78         232         0.49         600         1.28
ICE STORM                    89           0.04        1975         0.98        2064         1.03
LIGHTNING                   816           0.05        5230         0.33        6046         0.38
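Because each mean is the corresponding sum divided by the number of recorded events, the event counts behind this table can be approximated by dividing health.sum by health.mean. The sketch below does this for two rows of the table; the results are only approximate, since the means shown are rounded to two decimals.

```r
# Values copied from the table above (means are rounded to two decimals)
health.sum  <- c("TORNADO" = 96979, "HURRICANE/TYPHOON" = 1339)
health.mean <- c("TORNADO" = 1.60,  "HURRICANE/TYPHOON" = 15.22)

# Approximate number of recorded events per type
approx.events <- round(health.sum / health.mean)
approx.events
```

This suggests why tornadoes rank low on the mean: their total is spread over roughly sixty thousand recorded events, while the hurricane/typhoon total comes from fewer than a hundred.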

Economic consequences

Below we plot the total damage for the storm event types that have exceeded $20,000 in estimated damage to property and crops combined.

ggplot(filter(economy.summary, total.damage > 20000), 
       aes(x = EVTYPE, y = total.damage)) +
    geom_bar(stat = "identity", fill="grey") +
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous(labels = comma) +
    labs(title = "Total economic damage in US dollars",
         x = "Storm event",
         y = "Damage in US dollars (property + crop damage)") 

Notice that tornadoes have the greatest economic consequences, causing an estimated total of $3,312,277 of damage. The top ten storm events by economic consequence are:

knitr::kable(head(economy.summary %>%
                      mutate(total.damage = paste0('$', comma(total.damage))), 
                  n = 10),
             digits = 2)

EVTYPE               total.damage   mean
TORNADO             $3,312,276.68  54.61
FLASH FLOOD         $1,599,325.05  29.47
TSTM WIND           $1,445,198.21   6.57
HAIL                $1,268,289.66   4.39
FLOOD               $1,067,976.36  42.17
THUNDERSTORM WIND     $943,635.62  11.43
LIGHTNING             $606,932.39  38.53
THUNDERSTORM WINDS    $464,978.11  22.31
HIGH WIND             $342,014.77  16.92
WINTER STORM          $134,699.58  11.78