Economic and Health Consequences of Storms in the United States: 1950 - 2011

Synopsis

We wish to determine which storm event type a) is the most harmful to population health, measured here as the number of injuries and fatalities, and b) has the greatest economic consequences, measured here as the estimated USD value of property and crop damage. This is done by examining the NOAA Storm Database, a data set of storm events that have occurred across the United States of America, with data collected from 1950 to 2011. Tornadoes are the event type which causes the greatest number of fatalities and injuries (96,979) as well as costing the most in combined estimated property and crop damage (USD 3,312,277).

Data Processing

We have used the NOAA Storm database, which can be downloaded as a compressed CSV file from here. The data is described in this document, and this FAQ.

Downloading the data.

Our first step is to make sure that we’ve downloaded the data and then read it in.

if (!file.exists('StormData.csv.bz2')) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
                  destfile = "StormData.csv.bz2",
                  method="curl")
}

data <- read.csv('StormData.csv.bz2')
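The read step above passes the compressed file straight to read.csv, which works because R detects bzip2 compression when it opens a file for reading. A minimal, self-contained sketch of that behaviour, using a made-up two-row file in place of the real download:

```r
# Write a tiny toy CSV through a bzip2 connection (values are made up).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"), INJURIES = c(3, 0)),
          con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the compression itself,
# so no explicit decompression step is needed.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
print(toy)
```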

Brief description of the data

Our analysis is interested in the following five variables:

EVTYPE, being the type of storm event.
FATALITIES and INJURIES, being the number of deaths and injuries associated with a storm event.
PROPDMG and CROPDMG, being the estimated dollar value of the damage caused to property and crops.

Cleaning

The EVTYPE field is very dirty. It consists of a collection of free-text strings, some of which differ from each other only by case. We clean it in just two ways: by converting every value to upper case, and by replacing the one unknown value (represented with ‘?’) with NA. Near-duplicate labels such as TSTM WIND and THUNDERSTORM WIND are left as separate event types.

library(dplyr)

data <- data %>%
    mutate(EVTYPE = factor(toupper(EVTYPE)))

data$EVTYPE[which(data$EVTYPE == '?')] <- NA
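To make the effect of this cleaning concrete, here is a small base R sketch on a made-up vector of event labels. It is an illustration only, not the report's code: it sets the NA before building the factor, so that ‘?’ does not linger as an empty level.

```r
# Made-up labels that differ only by case, plus the '?' placeholder.
raw <- c("Tornado", "TORNADO", "tornado", "?", "Flood")

cleaned <- toupper(raw)          # one consistent case
cleaned[cleaned == "?"] <- NA    # unknown event type becomes NA
cleaned <- factor(cleaned)       # NA is not stored as a level

print(levels(cleaned))
print(table(cleaned, useNA = "ifany"))
```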

Health

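The two health variables that interest us are FATALITIES and INJURIES. First, we introduce a new variable, total.health, that is the sum of FATALITIES and INJURIES; the same step also adds damage, the sum of PROPDMG and CROPDMG, which is used in the Economy section. We then calculate the sum and mean of the health variables per event type.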

data <- data %>% 
    mutate(total.health = INJURIES + FATALITIES,
           damage = PROPDMG + CROPDMG)

health.summary <- data %>% 
    group_by(EVTYPE) %>%
    summarise(fatality.sum = sum(FATALITIES), fatality.mean = mean(FATALITIES),
              injury.sum = sum(INJURIES), injury.mean = mean(INJURIES),
              health.sum = sum(total.health), health.mean = mean(total.health)) %>%
    arrange(desc(health.sum), desc(health.mean))
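The sum and the mean can rank event types very differently, which matters for the results further down. A toy sketch of the same group_by/summarise pattern, with made-up event types FREQUENT and RARE, shows how a common but mild event can lead on the sum while a rare but severe one leads on the mean:

```r
library(dplyr)

# Made-up events: six mild FREQUENT events and one severe RARE event.
toy <- data.frame(
    EVTYPE     = c(rep("FREQUENT", 6), "RARE"),
    FATALITIES = c(0, 1, 0, 0, 1, 0, 2),
    INJURIES   = c(2, 1, 2, 2, 1, 2, 8),
    stringsAsFactors = FALSE
)

toy.summary <- toy %>%
    mutate(total.health = INJURIES + FATALITIES) %>%
    group_by(EVTYPE) %>%
    summarise(health.sum = sum(total.health),
              health.mean = mean(total.health)) %>%
    arrange(desc(health.sum))

# FREQUENT has the larger sum, RARE the larger mean.
print(toy.summary)
```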

For plotting purposes, we also want to produce a cleaner version of this data, where the type of health consequence (injury or fatality) is identified by a new variable, health.type, and the two aggregates (sum and mean) each get a column of their own:

library(tidyr)

health.clean <- health.summary %>% 
    # We only want to look at those events with more than 500 injuries and fatalities
    filter(health.sum > 500) %>% 
    # Collapse the four aggregate columns into key/value pairs (one row per aggregate)
    gather(variable, value, c(injury.sum, fatality.sum, injury.mean, fatality.mean)) %>% 
    # Split the key, e.g. "injury.sum", into the health type and the aggregate
    separate(variable, into=c("health.type", "aggregate"), sep="\\.") %>% 
    # and then spread sum / mean back out as columns of their own
    spread(aggregate, value)
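gather() and spread() still work, but from tidyr 1.0.0 onwards they are superseded by pivot_longer() and pivot_wider(). As a sketch only, the same reshaping could be written as follows, shown here on a made-up two-row stand-in for health.summary (toy.summary and its values are hypothetical):

```r
library(dplyr)
library(tidyr)

# Toy stand-in for health.summary; the numbers are made up.
toy.summary <- tibble(
    EVTYPE        = c("TORNADO", "FLOOD"),
    fatality.sum  = c(10, 2),
    fatality.mean = c(0.5, 0.1),
    injury.sum    = c(90, 8),
    injury.mean   = c(4.5, 0.4),
    health.sum    = c(100, 10),
    health.mean   = c(5.0, 0.5)
)

toy.clean <- toy.summary %>%
    # Split each column name at the dot: "injury.sum" -> health.type, aggregate
    pivot_longer(cols = c(injury.sum, fatality.sum, injury.mean, fatality.mean),
                 names_to = c("health.type", "aggregate"),
                 names_sep = "\\.") %>%
    # Give sum and mean a column each, one row per event type and health type
    pivot_wider(names_from = aggregate, values_from = value)

print(toy.clean)
```

The result has one row per event type and health type, with sum and mean as columns, which is the shape the stacked bar charts need.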

And this is what the clean health data looks like:

knitr::kable(head(health.clean %>% arrange(desc(health.sum), EVTYPE), n = 10),
             digits = 2)

EVTYPE	health.sum	health.mean	health.type	mean	sum
TORNADO	96979	1.60	fatality	0.09	5633
TORNADO	96979	1.60	injury	1.51	91346
EXCESSIVE HEAT	8428	5.02	fatality	1.13	1903
EXCESSIVE HEAT	8428	5.02	injury	3.89	6525
TSTM WIND	7461	0.03	fatality	0.00	504
TSTM WIND	7461	0.03	injury	0.03	6957
FLOOD	7259	0.29	fatality	0.02	470
FLOOD	7259	0.29	injury	0.27	6789
LIGHTNING	6046	0.38	fatality	0.05	816
LIGHTNING	6046	0.38	injury	0.33	5230

Economy

On the economic side, we’re mostly interested in the total economic consequence, which we calculate as the sum of the property damage and crop damage. Below, we calculate this value for each event type.

    economy.summary <- data %>%
        group_by(EVTYPE) %>%
        # damage is the sum of property damage and crop damage
        summarise(total.damage = sum(damage), mean = mean(damage)) %>%
        arrange(desc(total.damage), desc(mean))
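The summary above sums PROPDMG and CROPDMG exactly as recorded. The raw Storm Database also carries PROPDMGEXP and CROPDMGEXP columns that encode a magnitude for each figure. A minimal sketch of how such magnitudes could be applied follows, assuming the letter coding K, M and B (thousand, million, billion) and treating any other code as a multiplier of 1. The helper exp.multiplier and the toy rows are hypothetical and are not part of this analysis.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: K = thousand, M = million, B = billion; anything else -> 1.
exp.multiplier <- function(code) {
    lookup <- c(K = 1e3, M = 1e6, B = 1e9)
    mult <- lookup[toupper(as.character(code))]
    mult[is.na(mult)] <- 1
    unname(mult)
}

# Made-up rows in the shape of the raw data.
toy <- data.frame(PROPDMG    = c(25, 2.5, 10),
                  PROPDMGEXP = c("K", "m", ""),
                  stringsAsFactors = FALSE)

toy$prop.usd <- toy$PROPDMG * exp.multiplier(toy$PROPDMGEXP)
print(toy)
```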

Results

Health consequences

Below we show the total number of health consequences for the storm events which have reached a total of more than 500 such consequences. We have differentiated between injuries and fatalities by using a stacked bar graph.

library(ggplot2)
library(scales)

ggplot(health.clean, aes(x = EVTYPE, y = sum, fill = health.type)) + 
    geom_bar(stat = "identity") +
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous(labels = comma) +
    labs(title = "Total number of health consequences for storm events",
         x = "Storm event",
         y = "Total number of health consequences (injuries + fatalities)")
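By default the bars are drawn in the order of the EVTYPE factor levels, which is alphabetical. If sorting the bars by total were preferred, one option is base R's reorder(). The sketch below uses a made-up stand-in for health.clean and is a suggestion, not what the chart above does:

```r
# Toy stand-in for health.clean; the numbers are made up.
toy <- data.frame(
    EVTYPE      = c("FLOOD", "FLOOD", "TORNADO", "TORNADO", "HEAT", "HEAT"),
    health.type = rep(c("fatality", "injury"), times = 3),
    sum         = c(2, 8, 10, 90, 20, 30),
    health.sum  = c(10, 10, 100, 100, 50, 50),
    stringsAsFactors = FALSE
)

# reorder() sorts the factor levels by a summary (the mean by default) of its
# second argument; negating it puts the largest total first.
toy$EVTYPE <- reorder(factor(toy$EVTYPE), -toy$health.sum)
print(levels(toy$EVTYPE))
```

In the plot call this would correspond to using aes(x = reorder(EVTYPE, -health.sum), ...) in place of aes(x = EVTYPE, ...).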

Notice that the TORNADO event has by far the greatest number of health consequences in absolute terms. There have been 96,979 injuries and fatalities from tornadoes, while the storm event type with the second largest number of health consequences is EXCESSIVE HEAT, which has an order of magnitude fewer injuries and fatalities: 8,428.

Here are the ten events with the most health consequences:

knitr::kable(head(health.summary %>% select(EVTYPE, health.sum), n = 10))

EVTYPE	health.sum
TORNADO	96979
EXCESSIVE HEAT	8428
TSTM WIND	7461
FLOOD	7259
LIGHTNING	6046
HEAT	3037
FLASH FLOOD	2755
ICE STORM	2064
THUNDERSTORM WIND	1621
WINTER STORM	1527

In contrast, tornadoes do not have a high number of injuries and fatalities per event (1.60 on average). That distinction goes to hurricanes and typhoons (15.22 on average), as can be seen in the chart below, which shows the average number of health consequences per event for the event types which have had more than 500 health consequences in total.

ggplot(health.clean, aes(x = EVTYPE, y = mean, fill = health.type)) + 
    geom_bar(stat = "identity") +
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    labs(title = "Average number of health consequences for storm events",
         x = "Storm event",
         y = "Average number of health consequences (injuries + fatalities)")

Here are the event types with the highest means:

knitr::kable(head(health.summary %>% 
                      filter(health.sum > 500) %>%
                      arrange(desc(health.mean)), n = 10),
             digits = 2)

EVTYPE	fatality.sum	fatality.mean	injury.sum	injury.mean	health.sum	health.mean
HURRICANE/TYPHOON	64	0.73	1275	14.49	1339	15.22
HEAT WAVE	172	2.29	379	5.05	551	7.35
EXCESSIVE HEAT	1903	1.13	6525	3.89	8428	5.02
HEAT	937	1.22	2100	2.74	3037	3.96
RIP CURRENTS	204	0.67	297	0.98	501	1.65
TORNADO	5633	0.09	91346	1.51	96979	1.60
FOG	62	0.12	734	1.36	796	1.48
RIP CURRENT	368	0.78	232	0.49	600	1.28
ICE STORM	89	0.04	1975	0.98	2064	1.03
LIGHTNING	816	0.05	5230	0.33	6046	0.38

Economic consequences

Below we plot the total estimated damage for the storm event types which have exceeded $20,000 in estimated damage to property and crops combined.

ggplot(filter(economy.summary, total.damage > 20000), 
       aes(x = EVTYPE, y = total.damage)) +
    geom_bar(stat = "identity", fill="grey") +
    theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
    scale_y_continuous(labels = comma) +
    labs(title = "Total economic damage in US dollars",
         x = "Storm event",
         y = "Damage in US dollars (property + crop damage)")

Notice that tornadoes have the greatest economic consequences, causing an estimated total of $3,312,277 of damage. The top ten storm events by economic consequence are:

knitr::kable(head(economy.summary %>%
                      mutate(total.damage = paste0('$', comma(total.damage))), 
                  n = 10),
             digits = 2)

EVTYPE	total.damage	mean
TORNADO	$3,312,276.68	54.61
FLASH FLOOD	$1,599,325.05	29.47
TSTM WIND	$1,445,198.21	6.57
HAIL	$1,268,289.66	4.39
FLOOD	$1,067,976.36	42.17
THUNDERSTORM WIND	$943,635.62	11.43
LIGHTNING	$606,932.39	38.53
THUNDERSTORM WINDS	$464,978.11	22.31
HIGH WIND	$342,014.77	16.92
WINTER STORM	$134,699.58	11.78

Economic and Health Consequences of Storms in the United States: 1950 - 2011

RN

03/21/2015