Population Health and Economic Impact of Severe Weather Events in the U.S.

Synopsis

The U.S. National Oceanic and Atmospheric Administration (NOAA) maintains a database that tracks severe weather events in the U.S., including where and when each event occurred, as well as estimates of injuries, fatalities and property damage.

The database used for this analysis contains 902,297 records of severe weather events, starting in 1950 and ending in November 2011. The database is available at the following link:

NOAA’s documentation provides guidelines for entering event types and estimating property damage. It defines 48 Storm Data event types and is available at the following link:

Useful information regarding how the data is collected and published is available in the following document:

The aim of the analysis is to answer two questions about the population health and economic consequences of these events:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

1. Load required R Libraries

library(ggplot2)
library(stringr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

2. Download the data

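# Download the compressed file; mode="wb" keeps the binary .bz2 intact on Windows
my_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(my_url, destfile="StormData.csv.bz2", mode="wb")
# read.csv reads the .bz2 file directly, without manual decompression
my_df <- read.csv("StormData.csv.bz2")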
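
Because the file is large, it can help to skip the download when a local copy already exists. The following is a minimal sketch of that idea, not part of the original analysis; the helper name `fetch_once` is hypothetical.

```r
# Hypothetical helper: download only when the destination file is missing
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the file as binary, which matters on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Usage (commented out so the sketch does not trigger a download):
# fetch_once(my_url, "StormData.csv.bz2")
```

With this helper, re-running the report reuses the cached file instead of downloading it again.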

3. Clean the data

The events are entered by different users, so the recorded event type (EVTYPE) often differs from the types defined by NOAA. For instance, “Thunderstorm Wind” appears as “Thunderstormw”, “TSTM Wind” and “Thundertorm”, to name a few.

To clean the data, specifically EVTYPE, some tidying was done according to the following criteria:

  1. Correct some orthographic mistakes (Avalance -> Avalanche)
  2. Convert from plural to singular (Winds -> Wind)
  3. Remove special terms (G40, G52)
  4. Remove special characters (parentheses, double space)
  5. Group individually named events by one type (Hurricane George -> Hurricane)
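
The five criteria can be illustrated on a handful of values. This is a simplified sketch in base R rather than the stringr pipeline used in this report: the function name `clean_evtype` and the sample strings are hypothetical, and the patterns are more general than the ones applied to the real data.

```r
# Hypothetical sample of raw event types (for illustration only)
raw <- c("Avalance", "High Winds", "TSTM WIND (G40)",
         "Hurricane George", " Flash  Flood")

clean_evtype <- function(x) {
  x <- toupper(x)
  x <- gsub("AVALANCE", "AVALANCHE", x, fixed = TRUE)  # 1. fix spelling
  x <- gsub("WINDS", "WIND", x, fixed = TRUE)          # 2. plural to singular
  x <- gsub("TSTM", "THUNDERSTORM", x, fixed = TRUE)   #    expand abbreviation
  x <- gsub("G[0-9]+", "", x)                          # 3. drop terms such as G40
  x <- gsub("[()]", "", x)                             # 4. drop parentheses
  x <- gsub(" +", " ", trimws(x))                      #    trim and collapse spaces
  x[grepl("^HURRICANE", x)] <- "HURRICANE (TYPHOON)"   # 5. group named events
  x
}

print(clean_evtype(raw))
```

Each sample value exercises one criterion, so the effect of every rule can be checked in isolation before it is applied to the full database.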

For property damage, the amount in US dollars is the product of the numeric value PROPDMG and the multiplier encoded in PROPDMGEXP, which can be thousands (K), millions (M, m) or billions (B) of dollars. Any other PROPDMGEXP value was converted to a multiplier of 1.

ty_df <- my_df %>%
    select(EVTYPE, INJURIES, FATALITIES, PROPDMG, PROPDMGEXP) %>%
    # Normalize case and fix misspellings (the dot is escaped so it only matches a literal period)
    mutate(EVTYPE=toupper(EVTYPE)) %>%
    mutate(EVTYPE=str_replace_all(EVTYPE, c("AVALANCE"="AVALANCHE", "LIGHTNING\\."="LIGHTNING", "WINDSS"="WIND", "THUNDERSTORMW"="THUNDERSTORM WIND", "THUNDERTORM"="THUNDERSTORM"))) %>%
    # Convert plurals to singular and unify common variants
    mutate(EVTYPE=str_replace_all(EVTYPE, c("WINDS"="WIND", "STORMS"="STORM", "FLOODS"="FLOOD", "RAINS"="RAIN", "SLIDES"="SLIDE", "THUNDERSTORMS"="THUNDERSTORM"))) %>%
    mutate(EVTYPE=str_replace_all(EVTYPE, c("FLOODING"="FLOOD", "TSTM"="THUNDERSTORM"))) %>%
    # Remove special terms, then the empty parentheses and extra spaces they leave behind
    mutate(EVTYPE=str_replace_all(EVTYPE, c("G40"="", "G52"="", "13"="", "G35"="", "G45"="", "F2"="", "F3"=""))) %>%
    mutate(EVTYPE=str_replace_all(EVTYPE, c("[[(]][[)]]"="", "[[ ]]$"="", "^[[ ]]"="", "[[ ]][[ ]]"=""))) %>%
    # Group named hurricanes and typhoons, and every type ending in FLOOD
    mutate(EVTYPE=replace(EVTYPE, str_detect(EVTYPE, "^HURRICANE"), "HURRICANE (TYPHOON)")) %>%
    mutate(EVTYPE=replace(EVTYPE, str_detect(EVTYPE, "^TYPHOON"), "HURRICANE (TYPHOON)")) %>%
    mutate(EVTYPE=replace(EVTYPE, str_detect(EVTYPE, "FLOOD$"), "FLOOD")) %>%
    # Map PROPDMGEXP to a numeric multiplier: K, M/m and B are expanded, everything else becomes 1
    mutate(PROPDMGEXP=str_replace_all(PROPDMGEXP, "[012345678Hh]", "1")) %>%
    mutate(PROPDMGEXP=str_replace_all(PROPDMGEXP, c("[[+]]"="1", "[[-]]$"="1", "[[?]]$"="1", "^$"="1"))) %>%
    mutate(PROPDMGEXP=str_replace_all(PROPDMGEXP, c("K"="1000", "M"="1000000", "B"="1000000000", "m"="1000000"))) %>%
    mutate(PROPDMGEXP=as.numeric(PROPDMGEXP))
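
The exponent rule described above (K, M/m and B expanded, anything else treated as 1) can also be written as a lookup table. This is an alternative sketch, not the code used to produce the results; `exp_lookup`, `to_multiplier` and the sample records are hypothetical.

```r
# Multipliers for the recognized exponent codes
exp_lookup <- c(K = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Return the multiplier for each code; unrecognized codes fall back to 1
to_multiplier <- function(code) {
  mult <- unname(exp_lookup[code])
  mult[is.na(mult)] <- 1
  mult
}

# Hypothetical records, for illustration only
propdmg    <- c(25, 2.5, 1.2, 10, 3)
propdmgexp <- c("K", "M", "B", "", "?")

# Damage in US dollars = PROPDMG * multiplier
damage_usd <- propdmg * to_multiplier(propdmgexp)
print(damage_usd)
```

A lookup table keeps the code-to-multiplier mapping in one place, which makes the fallback to 1 explicit instead of spreading it over several replacement patterns.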

Results

1. Across the United States, which types of events are most harmful with respect to population health?

hh_df <- ty_df %>%
  select(EVTYPE, INJURIES, FATALITIES) %>%
    group_by(EVTYPE) %>%
    summarize(SUM.INJURIES=sum(INJURIES), SUM.FATALITIES=sum(FATALITIES)) %>%
    filter(SUM.INJURIES!=0 | SUM.FATALITIES!=0) %>%
    arrange(desc(SUM.INJURIES), desc(SUM.FATALITIES))
head(hh_df)
## # A tibble: 6 x 3
##   EVTYPE            SUM.INJURIES SUM.FATALITIES
##   <chr>                    <dbl>          <dbl>
## 1 TORNADO                  91364           5633
## 2 THUNDERSTORM WIND         9390            705
## 3 FLOOD                     8599           1523
## 4 EXCESSIVE HEAT            6525           1903
## 5 LIGHTNING                 5230            817
## 6 HEAT                      2100            937
# Select the top 10 explicitly by injuries; without a weight column, top_n() uses the last column (SUM.FATALITIES)
plot1 <- ggplot(top_n(hh_df, 10, SUM.INJURIES), aes(x=reorder(EVTYPE, SUM.INJURIES), y=SUM.INJURIES)) +
    geom_bar(fill="blue", stat="identity") +
    coord_flip() +
    labs(title="Number of Injuries by Severe Weather Events") +
    labs(subtitle="Top 10 Events") +
    labs(x="Severe Weather Event", y="Injuries")
print(plot1)

plot2 <- ggplot(top_n(hh_df, 10), aes(x=reorder(EVTYPE, SUM.FATALITIES), y=SUM.FATALITIES)) +
    geom_bar(fill="red", stat="identity") +
    coord_flip() +
    labs(title="Number of Fatalities by Severe Weather Events") +
    labs(subtitle="Top 10 Events") +
    labs(x="Severe Weather Event", y="Fatalities")
## Selecting by SUM.FATALITIES
print(plot2)

Tornado is the most harmful event for both injuries (91,364) and fatalities (5,633). Thunderstorm Wind (9,390) and Flood (8,599) account for a substantial number of injuries, though far fewer than Tornado. Excessive Heat (1,903) and Flood (1,523), on the other hand, account for a large number of fatalities.
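
Since injuries and fatalities can rank events differently, it helps to sort the totals by each measure separately. The following base R sketch shows the group, sum and rank pattern on a toy data frame; the values are hypothetical and unrelated to the NOAA totals.

```r
# Toy records, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 3, 0, 4, 0)
)

# Sum both measures per event type
totals <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank separately by each measure, largest first
by_injuries   <- totals[order(-totals$INJURIES), ]
by_fatalities <- totals[order(-totals$FATALITIES), ]

print(by_injuries)
print(by_fatalities)
```

In this toy example the event with the most injuries has the fewest fatalities, which is why each chart should select its top events by the measure it displays.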

2. Across the United States, which types of events have the greatest economic consequences?

pd_df <- ty_df %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP) %>%
    mutate(PROPDMGUSD=PROPDMG*PROPDMGEXP/10^9) %>%
    group_by(EVTYPE) %>%
    summarize(PROPDMG.BUSD=sum(PROPDMGUSD)) %>%
    arrange(desc(PROPDMG.BUSD))
head(pd_df)
## # A tibble: 6 x 2
##   EVTYPE              PROPDMG.BUSD
##   <chr>                      <dbl>
## 1 FLOOD                     167   
## 2 HURRICANE (TYPHOON)        85.4 
## 3 TORNADO                    56.9 
## 4 STORM SURGE                43.3 
## 5 HAIL                       15.7 
## 6 THUNDERSTORM WIND           9.72
plot3 <- ggplot(top_n(pd_df, 10), aes(x=reorder(EVTYPE, PROPDMG.BUSD), y=PROPDMG.BUSD)) +
    geom_bar(fill="green", stat="identity") +
    coord_flip() +
    labs(title="Estimated Property Damage by Severe Weather Events") +
    labs(subtitle="Top 10 Events") +
    labs(x="Severe Weather Event", y="Property Damages (Billions USD)")
## Selecting by PROPDMG.BUSD
print(plot3)

Regarding the economic impact of severe weather events, measured here as property damage, Flood is the most harmful event, with an estimated impact of USD 167 billion, followed by Hurricane (Typhoon), with USD 85.4 billion, and Tornado, with USD 56.9 billion.