Title: Analysis of Weather Events Recorded from 1966 - 2011 by the U.S. National Oceanic and Atmospheric Administration (NOAA)

Synopsis:

The purpose of this analysis is to examine the Storm Data dataset provided by the U.S. National Oceanic and Atmospheric Administration (NOAA), which captures weather events from 1950 to 2011, to answer the following questions: 1) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? 2) Across the United States, which types of events have the greatest economic consequences?

The analysis finds that tornadoes are by far the most harmful weather events for the U.S. population in terms of fatalities and injuries. For economic impact, floods cause the most property damage and drought causes the most crop damage.

Data Processing:

The data for this analysis comes in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file is available from the course web site; the download URL appears in the code under Setup.

NOAA also provides documentation for the database, which describes how some of the variables are constructed and defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Consequently, observations prior to 1966 have been discarded from this analysis because the data collected in those years is sparse and unreliable. Additionally, since the focus of this analysis is on the human and economic impact of weather events, observations are limited to those weather events that had associated fatalities, injuries, property damage, or crop damage.
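To see how sparse the early records are, one could tabulate the number of events per year. The sketch below does this in base R on a handful of made-up dates in the same month/day/year form as BGN_DATE; the sample values and the names bgn.dates and events.per.year are hypothetical and are not part of the analysis itself.

```r
# Made-up dates in the same "m/d/Y H:M:S" form as BGN_DATE (illustrative only)
bgn.dates <- c("4/18/1950 0:00:00", "6/8/1966 0:00:00", "7/4/1995 0:00:00",
               "5/22/2011 0:00:00", "11/28/2011 0:00:00")

# Parse the date part (the trailing time is ignored) and extract the year
years <- format(as.Date(bgn.dates, format = "%m/%d/%Y"), "%Y")

# Count events per year; on the real data, low counts would flag sparse years
events.per.year <- table(years)
events.per.year
```

On the full dataset, the same tabulation could be plotted to judge where a cutoff year is reasonable.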

The following variables will be analyzed from the Storm Data dataset:

  1. BGN_DATE: Date indicating beginning date of the weather event, as recorded by NOAA
  2. EVTYPE: factor variable indicating the type of weather event. The NOAA documentation indicates there are 48 defined event types.
  3. FATALITIES: numeric variable indicating the number of fatalities associated with the event
  4. INJURIES: numeric variable indicating the number of injuries associated with the event
  5. PROPDMG: numeric variable indicating the property damage
  6. PROPDMGEXP: exponential multiplier to be used in conjunction with PROPDMG
    • values in this analysis will be constrained to:
    1. H = 100
    2. K = 1000
    3. M = 1000000
    4. B = 1000000000
  7. CROPDMG: numeric variable indicating the crop damage
  8. CROPDMGEXP: exponential multiplier to be used in conjunction with CROPDMG
    • values in this analysis will be constrained to:
    1. H = 100
    2. K = 1000
    3. M = 1000000
    4. B = 1000000000
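
As a small illustration of how the exponent codes combine with the damage figures, the sketch below converts a few made-up (amount, code) pairs into dollar values using a named-vector lookup. The sample values and the helper name damage_value are hypothetical; the analysis itself performs this step with a join in the processing code.

```r
# Multipliers for the exponent codes kept in this analysis
exp.multiplier <- c(H = 100, K = 1000, M = 1000000, B = 1000000000)

# Hypothetical helper: damage amount times the multiplier for its code.
# Codes outside H/K/M/B (or missing codes) contribute 0, as in the analysis.
damage_value <- function(amount, exp.code) {
  multiplier <- unname(exp.multiplier[toupper(as.character(exp.code))])
  multiplier[is.na(multiplier)] <- 0
  amount * multiplier
}

# Made-up sample values: 250 K, 2.5 k (lowercase), 1.2 B, and an invalid code
damage_value(c(250, 2.5, 1.2, 5), c("K", "k", "B", "?"))
```

For example, an amount of 250 with code K corresponds to 250,000 dollars, while the entry with the unrecognised code "?" contributes nothing.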

Setup

R Packages needed for this analysis:

suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(lubridate))
suppressPackageStartupMessages(library(ggplot2))

#disable scientific notation
options(scipen=999)

Download the Storm Data dataset, unzip, and begin processing:

StormData = "Storm-Dataset.csv.bz2"
if ( ! file.exists(StormData)) {
    # mode = "wb" keeps the compressed file intact when downloading on Windows
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", StormData, mode = "wb", quiet = TRUE)
    print("Downloaded file to working directory")
}

# read directly from the bzip2 archive; keep text columns as character for the joins below
data <- read.csv(bzfile(StormData), stringsAsFactors = FALSE)

# convert BGN_DATE to a date
data$BGN_DATE <- mdy_hms(data$BGN_DATE)

# change EVTYPE to uppercase
data$EVTYPE <- toupper(data$EVTYPE)

One of the discoveries in the storm dataset was that the EVTYPE variable for the weather event did not always map to the list of 48 valid, documented weather types. To properly and consistently categorize the weather events captured, a mapping to the correct weather event was created. EVTYPE.CLEAN conforms to one of the 48 valid weather events.

# Read in Event Types
evtype.mapping <- read.csv("EVTYPE_Mapping.csv")

head(evtype.mapping, 10)
##            EVTYPE      EVTYPE.CLEAN
## 1         TORNADO           TORNADO
## 2  EXCESSIVE HEAT    EXCESSIVE HEAT
## 3     FLASH FLOOD       FLASH FLOOD
## 4            HEAT              HEAT
## 5       LIGHTNING         LIGHTNING
## 6       TSTM WIND THUNDERSTORM WIND
## 7           FLOOD             FLOOD
## 8     RIP CURRENT       RIP CURRENT
## 9       HIGH WIND         HIGH WIND
## 10      AVALANCHE         AVALANCHE
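
A mapping like this is only useful if it covers every raw EVTYPE value. The following sketch, which uses small made-up data frames in place of data and evtype.mapping, shows one way to list raw values that have no entry in the mapping. The object names raw.events, mapping, and unmapped are hypothetical, as is the misspelled sample value.

```r
# Made-up stand-ins for the storm data and the mapping file (illustrative only)
raw.events <- data.frame(EVTYPE = c("TSTM WIND", "TORNADO", "TORNDAO", "tstm wind"),
                         stringsAsFactors = FALSE)
mapping <- data.frame(EVTYPE       = c("TORNADO", "TSTM WIND"),
                      EVTYPE.CLEAN = c("TORNADO", "THUNDERSTORM WIND"),
                      stringsAsFactors = FALSE)

# Normalise case first, as the analysis does, then find values missing from the mapping
raw.events$EVTYPE <- toupper(raw.events$EVTYPE)
unmapped <- setdiff(unique(raw.events$EVTYPE), mapping$EVTYPE)
unmapped
```

Any value returned here would receive an NA event type after the left join, so it would need to be added to the mapping file.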

Define multipliers for the H, K, M, and B exponent codes. These are used to compute property and crop damage values from the PROPDMGEXP and CROPDMGEXP variables:

exp.multiplier.df <- data.frame(exp=c("H", "K", "M", "B"), multiplier=c(100, 1000, 1000000, 1000000000))

head(exp.multiplier.df)
##   exp multiplier
## 1   H        100
## 2   K       1000
## 3   M    1000000
## 4   B 1000000000

Begin processing the Storm Data dataset by limiting the recorded observations to those that had fatalities, injuries, property damage, or crop damage, and to those from 1966 onward. All observations not meeting these criteria are discarded for this analysis. This step also converts each EVTYPE to one of the 48 valid, documented event types.

stormdata <- data %>%
             filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) %>%
             filter(BGN_DATE >= as.POSIXct("1966-01-01", tz = "UTC")) %>%  # mdy_hms() returns UTC times
             left_join(evtype.mapping, by = c("EVTYPE" = "EVTYPE")) %>%
             transmute(event.type         = EVTYPE.CLEAN,  # convert to one of the 48 valid event types
                       fatalities         = ifelse(is.na(FATALITIES), 0, FATALITIES),
                       injuries           = ifelse(is.na(INJURIES),   0, INJURIES),
                       property.damage    = PROPDMG,
                       property.damage.exp  = toupper(PROPDMGEXP),  # convert lowercase exponents to uppercase
                       crop.damage          = CROPDMG,
                       crop.damage.exp      = toupper(CROPDMGEXP))  # convert lowercase exponents to uppercase

Finally, calculate the property and crop damage to create the data frame for the analysis:

stormdata <- 
   stormdata %>%
       left_join(exp.multiplier.df, by=c("property.damage.exp" = "exp"))  %>%
       left_join(exp.multiplier.df, by=c("crop.damage.exp" = "exp")) %>%
             mutate(property.damage.value = ifelse(is.na(property.damage), 0, property.damage) * 
                                            ifelse(is.na(multiplier.x), 0, multiplier.x),
                    crop.damage.value     = ifelse(is.na(crop.damage), 0, crop.damage) * 
                                            ifelse(is.na(multiplier.y), 0, multiplier.y))

stormdata$multiplier.x <- NULL
stormdata$multiplier.y <- NULL
stormdata$crop.damage.exp <- NULL
stormdata$property.damage.exp <- NULL

The resultant data looks like:

head(stormdata, 10)
##    event.type fatalities injuries property.damage crop.damage
## 1     TORNADO          0       11           250.0           0
## 2     TORNADO          0        2            25.0           0
## 3     TORNADO          1        3           250.0           0
## 4     TORNADO          0        0            25.0           0
## 5     TORNADO          0        0             2.5           0
## 6     TORNADO          0        1           250.0           0
## 7     TORNADO          0        0            25.0           0
## 8     TORNADO          0        0           250.0           0
## 9     TORNADO          0        0            25.0           0
## 10    TORNADO          0        0            25.0           0
##    property.damage.value crop.damage.value
## 1                 250000                 0
## 2                  25000                 0
## 3                 250000                 0
## 4                  25000                 0
## 5                   2500                 0
## 6                 250000                 0
## 7                  25000                 0
## 8                 250000                 0
## 9                  25000                 0
## 10                 25000                 0

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Results of the analysis - Fatalities by event:

stormdata.fatalities <- stormdata %>%
                        group_by(event.type)  %>%
                        summarise(fatalities = sum(fatalities)) %>%
                        top_n(10) %>%
                        transmute(event  = event.type,
                                  impact = "Fatalities",
                                  total  = fatalities)
## Selecting by fatalities
arrange(stormdata.fatalities, desc(total))
## Source: local data frame [10 x 3]
## 
##                      event     impact total
## 1                  TORNADO Fatalities  3681
## 2           EXCESSIVE HEAT Fatalities  2195
## 3              FLASH FLOOD Fatalities  1036
## 4                     HEAT Fatalities   937
## 5                LIGHTNING Fatalities   817
## 6        THUNDERSTORM WIND Fatalities   712
## 7              RIP CURRENT Fatalities   577
## 8                    FLOOD Fatalities   512
## 9                HIGH WIND Fatalities   323
## 10 EXTREME COLD/WIND CHILL Fatalities   317

Results of the analysis - Injuries by event:

stormdata.injuries <- stormdata %>%
                      group_by(event.type)  %>%
                      summarise(injuries = sum(injuries)) %>%
                      top_n(10) %>%
                      transmute(event  = event.type,
                                impact = "Injuries",
                                total  = injuries) 
## Selecting by injuries
arrange(stormdata.injuries, desc(total))
## Source: local data frame [10 x 3]
## 
##                event   impact total
## 1            TORNADO Injuries 67618
## 2  THUNDERSTORM WIND Injuries  9511
## 3     EXCESSIVE HEAT Injuries  7111
## 4              FLOOD Injuries  6873
## 5          LIGHTNING Injuries  5232
## 6          ICE STORM Injuries  2483
## 7               HEAT Injuries  2100
## 8        FLASH FLOOD Injuries  1800
## 9          HIGH WIND Injuries  1615
## 10          WILDFIRE Injuries  1608

Figure 1 shows that tornadoes have by far the greatest impact on the human population. Tornadoes rank first for both total fatalities and total injuries, and injuries from tornadoes (67,618) far exceed those from any other single event type. Excessive heat ranks second for fatalities (2,195) and third for injuries (7,111).

stormdata.top10 <- rbind(stormdata.fatalities, stormdata.injuries)

ggplot(stormdata.top10, aes(x=total, y=event)) +  geom_segment(aes(yend=event), xend=0, colour="grey50") +
  geom_point(size = 3, aes(colour = impact)) + 
  scale_colour_brewer(palette ="Set1", limits=c("Fatalities", "Injuries"), guide="none") + 
  theme_grey() + theme(panel.grid.major.y = element_blank()) +
  facet_grid(impact ~ ., scales ="free_y", space="free_y") +
  labs(title = "Figure 1: Top 10 Weather Events - Fatalities vs. Injuries ", x="Total", y="Weather Event")
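
The gap between tornadoes and the runner-up event in the two tables above can be quantified with a little arithmetic. The sketch below copies the relevant totals from those tables and computes how many times larger the tornado figure is; the vector names are hypothetical.

```r
# Totals copied from the fatalities and injuries tables above
fatalities <- c(TORNADO = 3681,  `EXCESSIVE HEAT`    = 2195)
injuries   <- c(TORNADO = 67618, `THUNDERSTORM WIND` = 9511)

# Ratio of the tornado total to the second-ranked event, rounded to one decimal
fatality.ratio <- round(fatalities[["TORNADO"]] / fatalities[["EXCESSIVE HEAT"]], 1)
injury.ratio   <- round(injuries[["TORNADO"]] / injuries[["THUNDERSTORM WIND"]], 1)

fatality.ratio  # 1.7
injury.ratio    # 7.1
```

Tornado fatalities are roughly 1.7 times those of excessive heat, while tornado injuries are roughly 7 times those of thunderstorm wind.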

Across the United States, which types of events have the greatest economic consequences?

Results of the analysis - Property Damage by event:

stormdata.property.damage <- stormdata %>%
                     group_by(event.type)  %>% na.omit() %>%
                     summarise(property.damage = sum(property.damage.value)) %>%
                     top_n(10) %>% 
                     transmute(event  = event.type,
                              impact = "Property Damage",
                              total  = property.damage) 
## Selecting by property.damage
arrange(stormdata.property.damage, desc(total))
## Source: local data frame [10 x 3]
## 
##                event          impact        total
## 1              FLOOD Property Damage 150234934300
## 2  HURRICANE/TYPHOON Property Damage  85366885010
## 3            TORNADO Property Damage  53040326470
## 4   STORM SURGE/TIDE Property Damage  47965244000
## 5        FLASH FLOOD Property Damage  16906877610
## 6               HAIL Property Damage  15974564720
## 7  THUNDERSTORM WIND Property Damage   9821780280
## 8           WILDFIRE Property Damage   8496628500
## 9     TROPICAL STORM Property Damage   7703890550
## 10      WINTER STORM Property Damage   6689064800

Results of the analysis - Crop Damage by event:

# Crop Damage 
stormdata.crop.damage <- stormdata %>%   
  group_by(event.type)  %>% 
  summarise(crop.damage = sum(crop.damage.value)) %>%
  top_n(10) %>% 
  transmute(event  = event.type,
            impact = "Crop Damage",
            total  = crop.damage) 
## Selecting by crop.damage
arrange(stormdata.crop.damage, desc(total))
## Source: local data frame [10 x 3]
## 
##                      event      impact       total
## 1                  DROUGHT Crop Damage 13972621780
## 2                    FLOOD Crop Damage 10855941050
## 3        HURRICANE/TYPHOON Crop Damage  5532667800
## 4                ICE STORM Crop Damage  5022114300
## 5                     HAIL Crop Damage  3046937600
## 6             FROST/FREEZE Crop Damage  1700831000
## 7              FLASH FLOOD Crop Damage  1532197150
## 8  EXTREME COLD/WIND CHILL Crop Damage  1330023000
## 9        THUNDERSTORM WIND Crop Damage  1258359900
## 10              HEAVY RAIN Crop Damage  1021770800

Figure 2 shows that floods, hurricanes/typhoons, and tornadoes are the most impactful weather events for property damage. Floods are by far the single biggest contributor to property damage in the United States.

Drought and floods are the most significant weather events for crop damage.

stormdata.top10.damage <- rbind(stormdata.property.damage, stormdata.crop.damage)

ggplot(stormdata.top10.damage, aes(x=total/10^9, y=event)) +  geom_segment(aes(yend=event), xend=0, colour="grey50") +
  geom_point(size = 3, aes(colour = impact)) + 
  scale_colour_brewer(palette ="Set1", limits=c("Property Damage", "Crop Damage"), guide="none") + 
  theme_grey()  + theme(panel.grid.major.y = element_blank()) +
  facet_grid(impact ~ ., scales ="free_y", space="free_y") +
  labs(title = "Figure 2: Top 10 Weather Events and Economic Impact", x="Total Damage (Billions USD)", y="Weather Event")
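
Property and crop damage are ranked separately above. One possible way to look at overall economic impact, sketched here on made-up rows shaped like the computed columns of stormdata, is to sum both damage values per event type. The toy data frame and the names toy, combined, and total.damage are hypothetical and do not reproduce any result from this analysis.

```r
suppressPackageStartupMessages(library(dplyr))

# Made-up rows with the same column names as stormdata's computed values
toy <- data.frame(event.type            = c("FLOOD", "FLOOD", "DROUGHT", "TORNADO"),
                  property.damage.value = c(5000000, 250000, 0, 2500000),
                  crop.damage.value     = c(1000000, 0, 4000000, 0),
                  stringsAsFactors = FALSE)

# Sum property and crop damage per event type and rank by the combined total
combined <- toy %>%
  group_by(event.type) %>%
  summarise(total.damage = sum(property.damage.value + crop.damage.value)) %>%
  arrange(desc(total.damage))
combined
```

Applied to stormdata, the same pipeline would give a single ranking that could be compared against the two separate top 10 lists.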