Synopsis

In this report we aim to determine the types of severe storm events that have the most significant impact on the health of the U.S. population and the greatest economic consequences in the United States. The data was collected by the U.S. National Oceanic and Atmospheric Administration across the country between the years of 1950 and 2011. To investigate storm events’ impact on population health, we looked at the Fatalities and Injuries caused by events, while we focused on Property Damage and Crop Damage to explore economic impact. For all of these variables, we used the sum of each variable for each event type across all locations and years to assess impact.

From the data, we found that Tornadoes have by far the greatest impact on population health in terms of both Fatalities and Injuries. Heat-related events proved to be the next most fatal. Floods had significantly greater economic consequences than all other events, and property damage far outweighed crop damage for most event types. The exception to this was Droughts, which caused far more crop damage than property damage.

Data Processing

Load libraries to be used in analysis.

library(dplyr)
library(tidyr)
library(ggplot2)
library(tidytext)
library(stringdist)

Read in the Storm Data dataset and a list of event names, as specified by the National Weather Service Storm Data Documentation file.

allData <- read.csv("repdata_data_StormData.csv")
events <- read.csv("EventList.csv")

Population Health Data Processing

Select only the Event Type, Fatalities, and Injuries columns from the complete dataset, then group the data by Event Type and sum each column, giving totals for each Event Type across all years and locations.

popHealthData <- allData %>% select(EVTYPE, FATALITIES, INJURIES) %>% group_by(EVTYPE) %>% 
                summarise_all(sum) 
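
As a minimal sketch of the group-and-sum step above, the toy example below uses invented rows in place of allData. It also uses summarise(across(...)), which dplyr 1.0.0 and later recommend in place of summarise_all(); both forms should give the same totals here. The toy data frame and its values are assumptions made for illustration only.

```r
library(dplyr)

# Toy stand-in for allData; every value here is invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4),
  INJURIES   = c(10, 0, 5, 2),
  STATE      = c("TX", "OH", "OK", "IL")
)

# Keep the three columns of interest, then sum every remaining column
# within each event type. Grouping columns are excluded by across().
toy_totals <- toy %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(across(everything(), sum))

print(toy_totals)
```

The two TORNADO rows collapse into a single row, so the result has one row per distinct Event Type.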

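Clean up the variations in how Event Types were recorded by approximately matching each recorded name to the list of events provided by the Storm Data Documentation file, using the Jaro-Winkler distance method with a max distance of 5. Because Jaro-Winkler distance ranges from 0 to 1, this threshold never rejects a match, so every recorded Event Type is assigned to its closest official event name. Group by Event Type and sum the data again.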

cleanHealthData <- popHealthData %>% mutate(EVTYPE =
        events$EVTYPE[amatch(popHealthData$EVTYPE, toupper(events$EVTYPE), method="jw", 
        maxDist=5)]) %>% group_by(EVTYPE) %>% summarise_all(sum) 
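
To illustrate how the approximate matching above behaves, here is a small sketch using amatch() from stringdist on invented names. The official and recorded vectors are hypothetical stand-ins for events$EVTYPE and the recorded EVTYPE values, and the stricter threshold of 0.2 is an arbitrary choice for comparison, not a value used in this analysis.

```r
library(stringdist)

# Hypothetical short list standing in for events$EVTYPE.
official <- c("Thunderstorm Wind", "Flash Flood", "Tornado", "Excessive Heat")

# Invented examples of recorded names, including one with no sensible match.
recorded <- c("TORNADO", "TORNADOES", "FLASH FLOODING", "QQQ")

# maxDist = 5 as in the analysis: the closest official name is always returned.
idx_loose <- amatch(recorded, toupper(official), method = "jw", maxDist = 5)

# A stricter, purely illustrative threshold: poor matches come back as NA.
idx_strict <- amatch(recorded, toupper(official), method = "jw", maxDist = 0.2)

print(data.frame(
  recorded = recorded,
  loose    = official[idx_loose],
  strict   = official[idx_strict]
))
```

Comparing the two columns gives a rough sense of which recorded names are only weakly related to the official name they are assigned to.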

Economic Consequences Data Processing

Select the Event Type, Property Damage, Property Damage Exponent, Crop Damage, and Crop Damage Exponent columns from the complete dataset. Since we only care about the events with the most damage, filter out rows with no damage and rows where neither exponent is “K”, “M”, or “B”. This reduces the size of the dataset. The reasoning is shown below.

econData <- allData %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>% 
            filter((PROPDMG != 0 | CROPDMG != 0) & 
                (PROPDMGEXP %in% c('K','M','B') | CROPDMGEXP %in% c('K','M','B')))

These tables show that most Property/Crop Damage Exponents are missing. Missing exponents correspond to 0 Property/Crop Damage and are therefore not relevant to this analysis. The remaining values are overwhelmingly “K”, “M”, or “B”, so rows with other values are rare enough to have a negligible effect on the data and can be removed.

table(allData$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330
table(allData$CROPDMGEXP)
## 
##             ?      0      2      B      k      K      m      M 
## 618413      7     19      1      9     21 281832      1   1994
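
As a rough check of the claim above, the sketch below recomputes the share of non-missing exponents that are not “K”, “M”, or “B”, using the counts copied from the two tables. The helper other_share() is a hypothetical name introduced here. Note that this measures the share of rows, not the share of dollar damage.

```r
# Counts copied from the table(allData$PROPDMGEXP) output above,
# excluding the missing (empty) exponent.
prop_counts <- c("-" = 1, "?" = 8, "+" = 5, "0" = 216, "1" = 25, "2" = 13,
                 "3" = 4, "4" = 4, "5" = 28, "6" = 4, "7" = 5, "8" = 1,
                 "B" = 40, "h" = 1, "H" = 6, "K" = 424665, "m" = 7,
                 "M" = 11330)

# Counts copied from the table(allData$CROPDMGEXP) output above.
crop_counts <- c("?" = 7, "0" = 19, "2" = 1, "B" = 9, "k" = 21,
                 "K" = 281832, "m" = 1, "M" = 1994)

# Hypothetical helper: fraction of non-missing exponents outside `keep`.
other_share <- function(counts, keep = c("K", "M", "B")) {
  other <- sum(counts[!(names(counts) %in% keep)])
  other / sum(counts)
}

prop_other <- other_share(prop_counts)
crop_other <- other_share(crop_counts)

print(c(property = prop_other, crop = crop_other))
```

Both shares come out below 0.1% of the non-missing exponents, which is consistent with treating the other values as negligible by row count.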

The code below multiplies the Property & Crop Damage values by the factor corresponding to their associated Exponent values, then replaces the original value with the new calculated value.

econData$PROPDMG[econData$PROPDMGEXP == "K"] <- econData$PROPDMG[econData$PROPDMGEXP == "K"] * 1000
econData$PROPDMG[econData$PROPDMGEXP == "M"] <- econData$PROPDMG[econData$PROPDMGEXP == "M"] * 1000000
econData$PROPDMG[econData$PROPDMGEXP == "B"] <- econData$PROPDMG[econData$PROPDMGEXP == "B"] * 1000000000

econData$CROPDMG[econData$CROPDMGEXP == "K"] <- econData$CROPDMG[econData$CROPDMGEXP == "K"] * 1000
econData$CROPDMG[econData$CROPDMGEXP == "M"] <- econData$CROPDMG[econData$CROPDMGEXP == "M"] * 1000000
econData$CROPDMG[econData$CROPDMGEXP == "B"] <- econData$CROPDMG[econData$CROPDMGEXP == "B"] * 1000000000
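
The same scaling can be sketched more compactly with a named lookup vector, shown here on invented rows. The names toy, multiplier, and scale_factor are assumptions introduced for this example. Rows whose exponent is not “K”, “M”, or “B” are left unscaled, which mirrors what the code above does.

```r
# Toy rows standing in for econData; the values are invented for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Map each exponent code to its multiplier.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up the multiplier for every row. as.character() guards against the
# column being a factor; codes not in the lookup come back as NA.
scale_factor <- unname(multiplier[as.character(toy$PROPDMGEXP)])

# Leave rows with any other exponent unscaled.
scale_factor[is.na(scale_factor)] <- 1

toy$PROPDMG <- toy$PROPDMG * scale_factor

print(toy)
```

A lookup like this keeps the exponent-to-multiplier mapping in one place, so the same vector can be reused for CROPDMG.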

Use the same method of cleaning up documentation variation in Event Type as for the Population Health data.

cleanEconData <- econData %>% select(EVTYPE, PROPDMG, CROPDMG) %>% mutate(EVTYPE =
            events$EVTYPE[amatch(econData$EVTYPE, toupper(events$EVTYPE), method="jw", 
            maxDist=5)]) %>% group_by(EVTYPE) %>% summarise_all(sum) 

Results

Population Health Impact

Pull out the top 10 Event Types by Injuries, as there were more Injuries than Fatalities in most events and they therefore hold more weight in the impact on population health. Format the data to be plotted by Fatalities and Injuries.

healthPlot <- cleanHealthData %>% top_n(10, INJURIES) %>% gather("Impact", "Count", 2:3) %>%
    arrange(desc(Count))
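
For readers on newer versions of dplyr and tidyr, the reshaping above can be sketched with slice_max() and pivot_longer(), which supersede top_n() and gather(). The toy data frame below is invented for illustration, and it keeps the top 3 rows rather than the top 10.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for cleanHealthData; the values are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("Tornado", "Flood", "Heat", "Hail"),
  FATALITIES = c(50, 20, 30, 1),
  INJURIES   = c(900, 300, 100, 50)
)

# Keep the event types with the most injuries, then stack the two count
# columns into one Impact/Count pair so they can be faceted in a plot.
toy_long <- toy %>%
  slice_max(INJURIES, n = 3) %>%
  pivot_longer(c(FATALITIES, INJURIES),
               names_to = "Impact", values_to = "Count") %>%
  arrange(desc(Count))

print(toy_long)
```

Because rows are selected by Injuries only, the Fatalities values shown are those of the same event types, not necessarily the event types with the most fatalities.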

Plot the Population Health data, showing the approximate top 10 Event Types for both Injuries and Fatalities.

ggplot(healthPlot, aes(x = reorder_within(EVTYPE, -Count, Impact), y = Count)) +
    geom_bar(stat = "identity") +
    scale_x_reordered() +
    facet_wrap(~Impact, scales = "free") +
    theme(axis.text.x=element_text(angle=30, vjust=.8, hjust=0.8)) +
    labs(x = "Event Type", y = "Number of Fatalities/Injuries")

The plot above shows that across all years and locations in which Storm Data has been collected, Tornadoes have had by far the greatest impact on population health in terms of both fatalities and injuries. High Wind and Floods have the next greatest overall impact, with heavy weight on injuries over fatalities. Excessive Heat and Heat proved to be the next two most fatal event types.

Economic Consequences

Create a dataset with a totalCost column that is the sum of Property and Crop Damage for each Event Type, then pull out the top 10 Event Types based on totalCost. Create a second dataset that formats the first to be plotted by Property Damage and Crop Damage.

totalEconPlot <- cleanEconData %>% mutate(totalCost = PROPDMG + CROPDMG) %>% 
            top_n(10, totalCost) %>% arrange(desc(totalCost))

separateEconPlot <- totalEconPlot %>% gather("DmgType", "Cost", 2:3) %>% arrange(desc(Cost))

Plot the Economic Consequence data, showing the top 10 Event Types by total economic cost.

ggplot(totalEconPlot, aes(x = reorder(EVTYPE, -totalCost), totalCost)) +
    geom_bar(stat = "identity") +
    theme(axis.text.x=element_text(angle=30, vjust=.8, hjust=0.8)) +
    labs(x = "Event Type", y = "Total Economic Cost")

The plot above shows that across all years and locations in which Storm Data has been collected, Floods had significantly greater overall economic impact than any other type of event. Hurricanes (Typhoons) had the next greatest impact, followed by Tornadoes and Storm Surges/Tides. Now we will look at which types of events had greater Property versus Crop Damage costs, as well as which type of damage contributed most to the overall economic cost of each event.

Plot the Economic Consequence data, showing the approximate top 10 Event Types for both Property Damage and Crop Damage.

ggplot(separateEconPlot, aes(x = reorder_within(EVTYPE, -Cost, DmgType), y = Cost)) +
    geom_bar(stat = "identity") +
    scale_x_reordered() +
    facet_wrap(~DmgType, scales = "free") +
    theme(axis.text.x=element_text(angle=30, vjust=.8, hjust=0.8)) +
    labs(x = "Event Type", y = "Property/Crop Damage Cost")

In this plot, the property damage panel nearly mirrors the plot of overall economic cost, showing that property damage costs much more than crop damage for most events. The exception is Droughts, which caused more than double the crop damage of the next event type, and whose crop damage contributed far more to their overall cost than property damage did.