Using the dataset provided by the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, we analyzed the logged weather events and the impact each had on population health and the economy of the U.S. Because records prior to 1983 are sparse and of poor quality, that data was excluded from this analysis. A more thorough future study could include it, although damages should then be adjusted for inflation. All weather events were grouped into 19 distinct categories, and statistics were calculated for human fatalities, human injuries, and the total damage done to property and crops, expressed in U.S. dollars. Outcomes were counted whether they were directly or indirectly caused by the weather event.
Loss of human life was greatest for weather events related to extreme heat (the Hot Weather category, 3,178 fatalities), more than twice the number attributed to the third and fourth leading causes of weather-related fatalities, flooding (1,547) and high wind/dust storms (1,468). The second leading cause was tornadoes/water spouts at 2,164.
Injuries due to weather events were led by tornadoes/water spouts at 34,833, almost three times the number associated with the second leading cause, high winds and dust storms (11,974).
Economic impact was dominated by weather events involving water, led by floods at $179.5B, followed by hurricanes ($90.3B) and tide-related events ($48.5B).
The data file used for this analysis was downloaded from the Coursera Reproducible Research course cloud storage. The CSV file, repdata-data-StormData.csv.bz2, is compressed with the bzip2 algorithm to reduce its size. Since read.csv can read through a bzfile connection, we can load the compressed file directly into R without a separate extraction step.
storm_data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
There are 902,297 observations of 37 variables. This analysis is concerned with the impacts these events have on public health, more specifically which event types are the most harmful to people and which have the greatest economic consequences.
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
First we will remove all columns that are not relevant to this analysis, keeping BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
relevant_columns <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm_data_trimmed <- storm_data[, relevant_columns]
Next we convert the BGN_DATE strings into Date objects.
storm_data_trimmed$BGN_DATE <- as.Date(as.character(storm_data_trimmed$BGN_DATE), "%m/%d/%Y")
The data is known to be very sparse in the earlier years, most likely due to poor record keeping. Another issue, not mentioned in the documentation but worth considering, is the time value of money: to fairly compare economic impact across years, all dollar amounts would need to be adjusted for inflation. We want to drop the unreliable early data without throwing away a significant share of the observations; the histogram produced by the code below shows the justification for excluding data prior to 1983.
hist(storm_data_trimmed$BGN_DATE, breaks = 63)
storm_data_trimmed2 <- storm_data_trimmed[storm_data_trimmed$BGN_DATE >= '1983-01-01', ]
str(storm_data_trimmed2)
## 'data.frame': 809136 obs. of 8 variables:
## $ BGN_DATE : Date, format: "1983-02-01" "1983-02-01" ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 856 856 834 856 856 856 856 856 856 856 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 0 0 ...
## $ INJURIES : num 0 0 1 0 0 0 0 0 0 0 ...
## $ PROPDMG : num 0 0 25 0 0 0 0 0 0 0 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 1 1 17 1 1 1 1 1 1 1 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
The code above removes all observations occurring prior to 1983. Our dataset is now down to 809,136 observations and 8 variables.
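As noted earlier, damages in this report are not adjusted for inflation. A minimal sketch of how such an adjustment could be added in a future revision, assuming the records run through 2011 and using a hypothetical CPI-style lookup table (the index values below are placeholders, not real CPI figures):

# hypothetical CPI-style index by year (placeholder values, NOT real CPI data)
cpi <- data.frame(year = 1983:2011,
                  index = seq(1.0, 2.4, length.out = 29))

# re-express a dollar amount reported in a given year in 2011 dollars
adjust_to_2011 <- function(amount, year) {
    amount * cpi$index[cpi$year == 2011] / cpi$index[cpi$year == year]
}

adjust_to_2011(1000, 1990)  # e.g. $1,000 of 1990 damage expressed in 2011 dollars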
Looking at the National Weather Service Storm Data Documentation (NATIONAL WEATHER SERVICE INSTRUCTION 10-1605, available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf), we can see several details of interest to this analysis.
Section 7 of the table of contents lists 48 event types, some of which are further subdivided; however, the summary output of our raw data above shows 985 event types.
EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY"
This total of 48 event types is also confirmed in section 2.1.1, Storm Data Event Table, on p. 6 of the document. To look at the unique levels (event types), you can run the R code below; the output is suppressed here due to its length.
levels(storm_data_trimmed2$EVTYPE)
Unfortunately, the dataset we were given to download does not fully match the linked Storm Data Documentation. For example, the documentation states, "The event name of Landslide was renamed to Debris Flow."; however, among the unique EVTYPE values in this dataset there is no "Debris Flow", but there are "LANDSLIDE", "LANDSLIDE/URBAN FLOOD", "LANDSLIDES", "Landslump", and "LANDSLUMP".
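One quick way to confirm this from the data itself (a small check using base R's grep with value = TRUE on the factor levels):

# list every raw event type that looks like a debris-flow / landslide label
grep("debris|landsl", levels(storm_data_trimmed2$EVTYPE), ignore.case = TRUE, value = TRUE)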
Here grepl can save the day: we can match the names entered into the Storm Database case-insensitively and group them by keyword. This greatly reduces the number of event categories.
storm_data_events <- storm_data_trimmed2
storm_data_events$EVENT <- NA
storm_data_events[grepl("fire|smoke", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Fire & Smoke"
storm_data_events[grepl("dam", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Dam Break"
storm_data_events[grepl("tornado|spout|microburst|funnel", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Tornado/Water Spout"
storm_data_events[grepl("fog", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Fog"
storm_data_events[grepl("volcan", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Volcano"
storm_data_events[grepl("summary", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Miscellaneous"
storm_data_events[grepl("sleet|hail", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Sleet Or Hail"
storm_data_events[grepl("mud|slide", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Mudslide"
storm_data_events[grepl("ice|freeze", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Ice"
storm_data_events[grepl("thunderstorm|tunderstorm|tstm|torrential|lightning", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Thunderstorm"
storm_data_events[grepl("winter|wintry|snow|cold|Blizzard", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Winter Weather"
storm_data_events[grepl("wind|blow|wnd|southeast|dust", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "High Wind & Dust Storms"
storm_data_events[grepl("flood|erosion|erosin|fld|stream", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Flooding"
storm_data_events[grepl("tide|surge|seas|coastal", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Tide"
storm_data_events[grepl("dry|drought|driest", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Drought"
storm_data_events[grepl("heat|hot|warm", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Hot Weather"
storm_data_events[grepl("hurricane", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Hurricane"
storm_data_events[grepl("tropical|wet|wettest", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Tropical Storm"
storm_data_events[grepl("avalanche", storm_data_trimmed2$EVTYPE, ignore.case=TRUE), "EVENT"] <- "Avalanche"
storm_data_events$EVENT <- as.factor(storm_data_events$EVENT)
#storm_data_events$EVENT <- as.character(storm_data_events$EVENT)
This grouping has reduced the 985 distinct event types to 19.
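A quick sanity check of the regrouping, counting the original event types, the new categories, and any observations that matched none of the keyword patterns:

length(levels(storm_data_events$EVTYPE))  # 985 raw event types
length(levels(storm_data_events$EVENT))   # 19 grouped categories (NA is not a level)
sum(is.na(storm_data_events$EVENT))       # observations left uncategorized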
Now we will summarize the fatality and injury data by finding the total for each event type:
storm_data_fatalities <- storm_data_events
storm_data_injuries <- storm_data_events
storm_data_fatalities$BGN_DATE <- NULL
storm_data_fatalities$EVTYPE <- NULL
storm_data_fatalities$INJURIES <- NULL
storm_data_fatalities$PROPDMG <- NULL
storm_data_fatalities$PROPDMGEXP <- NULL
storm_data_fatalities$CROPDMG <- NULL
storm_data_fatalities$CROPDMGEXP <- NULL
storm_data_injuries$BGN_DATE <- NULL
storm_data_injuries$EVTYPE <- NULL
storm_data_injuries$FATALITIES <- NULL
storm_data_injuries$PROPDMG <- NULL
storm_data_injuries$PROPDMGEXP <- NULL
storm_data_injuries$CROPDMG <- NULL
storm_data_injuries$CROPDMGEXP <- NULL
storm_data_fatal <- storm_data_fatalities %>% group_by (EVENT) %>% summarise(Fatal = sum(FATALITIES))
storm_data_injury <- storm_data_injuries %>% group_by (EVENT) %>% summarise(Injuries = sum(INJURIES))
storm_data_fatal <- na.omit(storm_data_fatal)
storm_data_injury <- na.omit(storm_data_injury)
From the Storm Data Documentation: “Estimates should be rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions.”
We need a function that converts the magnitude letter into the power of 10 by which the value in PROPDMG or CROPDMG is multiplied. Each type of damage is calculated and the two are then added together for each event type to give the total damage by event category. The final total damage value is expressed in millions of U.S. dollars.
# convert the exponent code from PROPDMGEXP / CROPDMGEXP into a power of 10
exponent <- function(x){
    if(is.numeric(x)) {x <- x}                            # already numeric: use as-is
    else if(grepl("h", x, ignore.case=TRUE)) {x <- 2}     # "h"/"H" = hundreds
    else if(grepl("k", x, ignore.case=TRUE)) {x <- 3}     # "k"/"K" = thousands
    else if(grepl("m", x, ignore.case=TRUE)) {x <- 6}     # "m"/"M" = millions
    else if(grepl("b", x, ignore.case=TRUE)) {x <- 9}     # "b"/"B" = billions
    else if(x == "" || x == " "){x <- 0}                  # blank: no magnitude code, exponent 0
    else{x <- 0}                                          # any other code: treat as exponent 0
    x
}
# calculate total damage given the damage value and its exponent code
damage <- function(multiplier, exp){
    power <- exponent(exp)
    if(is.numeric(multiplier)){
        multiplier <- multiplier * (10 ^ power)   # scale the reported value by its magnitude
    }
    if(!is.numeric(multiplier)){
        multiplier <- 0                           # non-numeric damage values contribute nothing
    }
    multiplier
}
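# quick sanity check against the documentation's example: 1.55B should equal $1,550,000,000
damage(1.55, "B")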
# calculate the property and crop damages and combine them to give the total estimated damage
storm_data_events$PROPDAMAGE <- mapply(damage, storm_data_events$PROPDMG, storm_data_events$PROPDMGEXP)
storm_data_events$CROPDAMAGE <- mapply(damage, storm_data_events$CROPDMG, storm_data_events$CROPDMGEXP)
storm_data_events$TOTALDAMAGE = storm_data_events$PROPDAMAGE + storm_data_events$CROPDAMAGE
storm_data_damage <- storm_data_events
storm_data_damage$BGN_DATE <- NULL
storm_data_damage$EVTYPE <- NULL
storm_data_damage$FATALITIES <- NULL
storm_data_damage$INJURIES <- NULL
storm_data_damage$PROPDMG <- NULL
storm_data_damage$PROPDMGEXP <- NULL
storm_data_damage$CROPDMG <- NULL
storm_data_damage$CROPDMGEXP <- NULL
storm_data_damage$PROPDAMAGE <- NULL
storm_data_damage$CROPDAMAGE <- NULL
storm_data_damage <- storm_data_damage %>% group_by (EVENT) %>% summarise(Damage = (sum(TOTALDAMAGE)/1000000))
With respect to population health we consider two primary measurable impacts, the first being fatalities directly or indirectly attributed to the weather event, and the second being injuries directly or indirectly attributed to the weather event.
Loss of human life was greatest for weather events related to extreme heat (the Hot Weather category, 3,178 fatalities), more than twice the number attributed to the third and fourth leading causes of weather-related fatalities, flooding (1,547) and high wind/dust storms (1,468). The second leading cause was tornadoes/water spouts at 2,164.
Injuries due to weather events were led by tornadoes/water spouts at 34,833, almost three times the number associated with the second leading cause, high winds and dust storms (11,974).
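The leading causes cited above can be pulled directly from the summary tables, for example with dplyr's arrange (already loaded):

# rank event categories by fatalities and injuries
storm_data_fatal %>% arrange(desc(Fatal)) %>% head(4)
storm_data_injury %>% arrange(desc(Injuries)) %>% head(2)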
storm_data_fatal
## Source: local data frame [19 x 2]
##
## EVENT Fatal
## (fctr) (dbl)
## 1 Avalanche 224
## 2 Dam Break 0
## 3 Drought 3
## 4 Fire & Smoke 90
## 5 Flooding 1547
## 6 Fog 80
## 7 High Wind & Dust Storms 1468
## 8 Hot Weather 3178
## 9 Hurricane 135
## 10 Ice 98
## 11 Miscellaneous 0
## 12 Mudslide 44
## 13 Sleet Or Hail 17
## 14 Thunderstorm 818
## 15 Tide 59
## 16 Tornado/Water Spout 2164
## 17 Tropical Storm 66
## 18 Volcano 0
## 19 Winter Weather 754
ggplot(storm_data_fatal, aes(x = reorder(EVENT,-Fatal), y = Fatal)) + geom_bar(stat = "identity") +labs(title="Total Fatalities by Event Type",x = "Event", y = "Fatalities") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
storm_data_injury
## Source: local data frame [19 x 2]
##
## EVENT Injuries
## (fctr) (dbl)
## 1 Avalanche 171
## 2 Dam Break 0
## 3 Drought 33
## 4 Fire & Smoke 1608
## 5 Flooding 8676
## 6 Fog 1076
## 7 High Wind & Dust Storms 11974
## 8 Hot Weather 9243
## 9 Hurricane 1328
## 10 Ice 2152
## 11 Miscellaneous 0
## 12 Mudslide 55
## 13 Sleet Or Hail 1371
## 14 Thunderstorm 5275
## 15 Tide 85
## 16 Tornado/Water Spout 34833
## 17 Tropical Storm 383
## 18 Volcano 0
## 19 Winter Weather 4146
ggplot(storm_data_injury, aes(x = reorder(EVENT,-Injuries), y = Injuries)) + geom_bar(stat = "identity") +labs(title="Total Injuries by Event Type",x = "Event", y = "Injuries") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
Here the total damages (property plus crop) in U.S. dollars per event category are displayed; values are in millions of U.S. dollars. Economic impact was dominated by weather events involving water, led by floods at $179.5B, followed by hurricanes ($90.3B) and tide-related events ($48.5B). Note that this table also includes an NA row for events that matched none of the keyword categories; unlike the fatality and injury tables, the damage table was not passed through na.omit.
storm_data_damage
## Source: local data frame [20 x 2]
##
## EVENT Damage
## (fctr) (dbl)
## 1 Avalanche 8.7218
## 2 Dam Break 1.0020
## 3 Drought 15025.4196
## 4 Fire & Smoke 8900.0101
## 5 Flooding 179544.0857
## 6 Fog 25.0115
## 7 High Wind & Dust Storms 19681.4159
## 8 Hot Weather 924.8050
## 9 Hurricane 90271.4728
## 10 Ice 10888.8647
## 11 Miscellaneous 0.0000
## 12 Mudslide 346.8431
## 13 Sleet Or Hail 19020.9310
## 14 Thunderstorm 2177.5381
## 15 Tide 48450.2002
## 16 Tornado/Water Spout 39771.4003
## 17 Tropical Storm 8624.0585
## 18 Volcano 0.5000
## 19 Winter Weather 10121.6581
## 20 NA 4992.4017
ggplot(storm_data_damage, aes(x = reorder(EVENT,-Damage), y = Damage)) + geom_bar(stat = "identity") +labs(title="Total Damage ($) by Event Type",x = "Event", y = "Total Damages in Millions ($)") + theme(axis.text.x = element_text(angle = 45, hjust = 1))