Synopsis

Many different types of storms wreak havoc across the United States. This report examines how these storm types affect population health in the United States and how much economic damage they cause. For population health, a subset of the data containing only the storm type, fatalities, and injuries was created, grouped by storm type, and summarized. A top ten list was then generated of the storm types that caused the most fatalities and injuries combined. For economic impact, the base damage values had to be combined with their exponents to get the true dollar amounts. The data was then grouped by storm type and summarized, and, as with population health, a top ten list of the storm types with the highest total economic damage was generated. Plots for both measures were then produced in RStudio using the ggplot2 plotting system. Tornadoes cause by far the most total harm to population health, while floods have the greatest economic consequences.

Data Processing

The first step was to load the raw Storm Data CSV file into R, followed by the libraries used in the analysis.

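setwd("~/Storm Data")                                 # folder holding the raw CSV
storm_data <- read.csv("repdata_data_StormData.csv")  # one row per storm event record

library(dplyr)    # data manipulation: select, group_by, summarise, mutate
library(ggplot2)  # plotting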

The first question we set out to answer was: “Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?” Our first step was to subset the data. The subset kept only the storm type (EVTYPE), fatalities (FATALITIES), and injuries (INJURIES).

health_data <- storm_data %>%
        select(EVTYPE, FATALITIES, INJURIES)

The data was then grouped by storm type. Within each group, fatalities and injuries were summed separately, and the two sums were added together to give a total harm variable. A list of the 10 most harmful storm types was then generated from this summary.

harm_summary <- health_data %>%
        group_by(EVTYPE) %>%
        summarise(
                Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
                Total_Injuries = sum(INJURIES, na.rm = TRUE),
                Total_Harm = Total_Fatalities + Total_Injuries
        ) %>%
        arrange(desc(Total_Harm))
top_harmful_events <- head(harm_summary, 10)
print(top_harmful_events)
## # A tibble: 10 × 4
##    EVTYPE            Total_Fatalities Total_Injuries Total_Harm
##    <chr>                        <dbl>          <dbl>      <dbl>
##  1 TORNADO                       5633          91346      96979
##  2 EXCESSIVE HEAT                1903           6525       8428
##  3 TSTM WIND                      504           6957       7461
##  4 FLOOD                          470           6789       7259
##  5 LIGHTNING                      816           5230       6046
##  6 HEAT                           937           2100       3037
##  7 FLASH FLOOD                    978           1777       2755
##  8 ICE STORM                       89           1975       2064
##  9 THUNDERSTORM WIND              133           1488       1621
## 10 WINTER STORM                   206           1321       1527
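The grouping step can be traced on a small hypothetical data frame (`toy_health`, with made-up counts, not drawn from the Storm Data). This sketch runs the same `group_by()`/`summarise()` pipeline and shows that each event type collapses to one row, that `na.rm = TRUE` skips missing counts, and that `Total_Harm` can use the sums created earlier in the same `summarise()` call.

```r
library(dplyr)

# Hypothetical toy data: three records, two event types (values are made up)
toy_health <- data.frame(
        EVTYPE     = c("TORNADO", "TORNADO", "HAIL"),
        FATALITIES = c(2, NA, 0),
        INJURIES   = c(10, 5, 3),
        stringsAsFactors = FALSE
)

toy_summary <- toy_health %>%
        group_by(EVTYPE) %>%
        summarise(
                Total_Fatalities = sum(FATALITIES, na.rm = TRUE),  # the NA is skipped
                Total_Injuries   = sum(INJURIES, na.rm = TRUE),
                # Columns created earlier in the same summarise() call can be reused
                Total_Harm       = Total_Fatalities + Total_Injuries
        ) %>%
        arrange(desc(Total_Harm))

print(toy_summary)
```

Here TORNADO ends up with 2 fatalities and 15 injuries (total harm 17), ranked above HAIL (total harm 3).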

A bar graph of this list was then generated using R’s ggplot2 plotting system.

ggplot(top_harmful_events, aes(x = reorder(EVTYPE, Total_Harm), y = Total_Harm)) +
        geom_col(fill = "steelblue") +  # bar heights taken directly from Total_Harm
        coord_flip() +                  # horizontal bars keep long event names readable
        labs(title = "Top 10 Most Harmful Weather Events in the US",
             x = "Event Type",
             y = "Total Harm (Fatalities + Injuries)") +
        theme_minimal()

The next question we aimed to answer was: “Across the United States, which types of events have the greatest economic consequences?” The first step in answering this question was to calculate the true dollar amount of storm-related damages. This required combining each base value (PROPDMG for property damage and CROPDMG for crop damage) with its exponent variable (PROPDMGEXP and CROPDMGEXP, respectively).

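First, a function was written to convert the exponent codes into numeric multipliers: K to 1,000, M to 1,000,000, and B to 1,000,000,000. Codes are converted to uppercase first, so lowercase k, m, and b are handled as well, and any other value, including a blank, is treated as a multiplier of 1.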

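# Convert a damage exponent code to a numeric multiplier:
# K = thousands, M = millions, B = billions; any other code (including blank) maps to 1
exp_to_num <- function(exp) {
        exp <- toupper(exp)  # treat lowercase k/m/b the same as K/M/B
        ifelse(exp == "K", 1e3,
               ifelse(exp == "M", 1e6,
                      ifelse(exp == "B", 1e9, 1)))
}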
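A quick check of how `exp_to_num()` treats codes it does not list explicitly can be run on an illustrative vector of codes (`codes` is made up for this sketch and is not a tally of the real data; the codes that actually occur can be listed with `table(storm_data$PROPDMGEXP)`). The function is repeated so the block runs on its own.

```r
# Same helper as above, repeated so this block is self-contained
exp_to_num <- function(exp) {
        exp <- toupper(exp)
        ifelse(exp == "K", 1e3,
               ifelse(exp == "M", 1e6,
                      ifelse(exp == "B", 1e9, 1)))
}

# Illustrative exponent codes (hypothetical input)
codes <- c("K", "k", "M", "m", "B", "", "H", "5", "+", NA)
multipliers <- exp_to_num(codes)

# Label each multiplier with the code it came from
print(setNames(multipliers, codes))
```

Lowercase codes map the same way as uppercase ones; blanks and any other code map to 1, so damages recorded with them are used unscaled. A missing (`NA`) code produces an `NA` multiplier, which the later `sum(..., na.rm = TRUE)` would drop.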

This function was then applied to the exponent variables, PROPDMGEXP and CROPDMGEXP, replacing each code with its numeric multiplier. Each base value was then multiplied by its multiplier to give PROPDMGVAL and CROPDMGVAL, and a new numeric variable, TOTALDMG, was created as the sum of PROPDMGVAL and CROPDMGVAL.

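storm_data <- storm_data %>%
        mutate(
                # Replace each exponent code with its numeric multiplier
                PROPDMGEXP = exp_to_num(PROPDMGEXP),
                CROPDMGEXP = exp_to_num(CROPDMGEXP),
                # mutate() evaluates in order, so these use the converted multipliers
                PROPDMGVAL = PROPDMG * PROPDMGEXP,
                CROPDMGVAL = CROPDMG * CROPDMGEXP,
                # Combined property and crop damage for each record
                TOTALDMG = PROPDMGVAL + CROPDMGVAL
        )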
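The arithmetic in this step can be traced on two hypothetical records (`toy_damage`, with made-up values). In the first record, 25 with code K becomes $25,000 of property damage and 1.5 with code M becomes $1,500,000 of crop damage, for a total of $1,525,000. This is a sketch under those assumptions, not a rerun of the real data.

```r
library(dplyr)

# Same helper as in the report, repeated so this block is self-contained
exp_to_num <- function(exp) {
        exp <- toupper(exp)
        ifelse(exp == "K", 1e3,
               ifelse(exp == "M", 1e6,
                      ifelse(exp == "B", 1e9, 1)))
}

# Two hypothetical records (made-up values) to trace the arithmetic
toy_damage <- data.frame(
        EVTYPE     = c("HAIL", "FLOOD"),
        PROPDMG    = c(25, 2.5),
        PROPDMGEXP = c("K", "B"),
        CROPDMG    = c(1.5, 0),
        CROPDMGEXP = c("M", ""),
        stringsAsFactors = FALSE
)

toy_damage <- toy_damage %>%
        mutate(
                PROPDMGVAL = PROPDMG * exp_to_num(PROPDMGEXP),  # HAIL row: 25 * 1e3
                CROPDMGVAL = CROPDMG * exp_to_num(CROPDMGEXP),  # HAIL row: 1.5 * 1e6
                TOTALDMG   = PROPDMGVAL + CROPDMGVAL
        )

print(toy_damage[, c("EVTYPE", "PROPDMGVAL", "CROPDMGVAL", "TOTALDMG")])
```

Unlike the report's step, this variant calls `exp_to_num()` inside the multiplication instead of overwriting PROPDMGEXP and CROPDMGEXP, so the original codes stay available for checking; the resulting TOTALDMG values are the same.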

Next, total economic damage was summarized by storm type. As with population health, a top 10 list of the storm types with the greatest economic damage was then generated.

economic_impact <- storm_data %>%
        group_by(EVTYPE) %>%
        summarise(Total_Economic_Damage = sum(TOTALDMG, na.rm = TRUE)) %>%
        arrange(desc(Total_Economic_Damage))
top_economic_events <- head(economic_impact, 10)
print(top_economic_events)
## # A tibble: 10 × 2
##    EVTYPE            Total_Economic_Damage
##    <chr>                             <dbl>
##  1 FLOOD                     150319678257 
##  2 HURRICANE/TYPHOON          71913712800 
##  3 TORNADO                    57352114049.
##  4 STORM SURGE                43323541000 
##  5 HAIL                       18758221521.
##  6 FLASH FLOOD                17562129167.
##  7 DROUGHT                    15018672000 
##  8 HURRICANE                  14610229010 
##  9 RIVER FLOOD                10148404500 
## 10 ICE STORM                   8967041360

A bar graph of this list was then generated, again using R’s ggplot2 plotting system.

ggplot(top_economic_events, aes(x = reorder(EVTYPE, Total_Economic_Damage), y = Total_Economic_Damage / 1e9)) +
        geom_col(fill = "darkgreen") +  # bar heights are damages in billions of USD
        coord_flip() +                  # horizontal bars keep long event names readable
        labs(title = "Top 10 Weather Events by Economic Impact in the US",
             x = "Event Type",
             y = "Total Damage (Billions USD)") +
        theme_minimal()
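The plot divides `Total_Economic_Damage` by `1e9` so the axis reads in billions of US dollars. The scaling can be checked on the top three totals from the printed summary (the TORNADO value is rounded to whole dollars here); `round(..., 1)` is used only for display.

```r
# Top three totals from the printed summary, in US dollars
# (TORNADO rounded to whole dollars)
top_totals <- c("FLOOD"             = 150319678257,
                "HURRICANE/TYPHOON" = 71913712800,
                "TORNADO"           = 57352114049)

# Same scaling as the plot's y aesthetic: dollars to billions of dollars
in_billions <- top_totals / 1e9

# One decimal place for display only
print(round(in_billions, 1))
```

The bar lengths in the figure therefore sit at roughly 150.3, 71.9, and 57.4 for the top three event types.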

Results

In terms of impact on the US population, tornadoes have done the most harm. Across the 61 years of data collection, tornadoes caused 5,633 fatalities and 91,346 injuries, for a total of 96,979 people killed or injured. Excessive heat ranks second, with 8,428 people killed or injured, and thunderstorm wind (recorded as TSTM WIND) ranks third, with 7,461.

Figure 1. Top 10 storm types by total harm (fatalities plus injuries).

In terms of economic consequences, floods rank highest, with a combined total of $150,319,678,257 in property and crop damage. Hurricanes/typhoons come second with $71,913,712,800 in combined damages, and tornadoes are third with $57,352,114,049.

Figure 2. Top 10 storm types by total economic impact (property damage plus crop damage).