Synopsis

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (from 1950 to November 2011) to determine which weather events are most harmful to population health and which have the greatest economic consequences.

This report provides a reproducible workflow that municipal managers can use to assess weather-related risks. The methods are described under Data Processing, and the findings under Results.

Data Processing

A. Loading the Data

Download the file from the source URL if it does not already exist in the working directory. Then, load the raw CSV file directly.

# Setting up code chunks and loading packages
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(ggplot2)
library(tidyr)

# URL for the dataset
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
fileName <- "repdata_data_StormData.csv.bz2"

# Download if not present
if (!file.exists(fileName)) {
    download.file(fileUrl, destfile = fileName, method = "libcurl", mode = "wb")
}

# Reading the data
# read.csv() decompresses the .bz2 file on the fly
storm_data <- read.csv(fileName, sep = ",", header = TRUE)

B. Preprocessing the Data

  • To process the data efficiently, “storm_data” was subset to contain only the relevant columns.
  • Next, the exponent codes stored in “PROPDMGEXP” and “CROPDMGEXP” were mapped to numeric multipliers so that total cost could be calculated:
    ‘H’ (hundreds), ‘K’ (thousands), ‘M’ (millions), or ‘B’ (billions). Any other code, including a blank, was treated as a multiplier of 1.
  • Thereafter, Property and Crop damage were calculated as numeric values by multiplying “PROPDMG” and “CROPDMG” by their multipliers.

# Subsetting the data to keep only the relevant columns
Keep_columns <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm_sub <- storm_data[, Keep_columns]

# Exponent conversion
# (named exp_map so that base R's exp() function is not masked)
exp_map <- c("H" = 100, "K" = 1000, "M" = 1e6, "B" = 1e9)

# Calculate Property and Crop damage as numeric values
# (toupper ensures that lowercase codes such as 'k' are matched;
# any code not in exp_map, including a blank, gets a multiplier of 1)
storm_sub <- storm_sub %>%
    mutate(
        prop_val = PROPDMG * ifelse(toupper(PROPDMGEXP) %in% names(exp_map),
                                    exp_map[toupper(PROPDMGEXP)], 1),
        crop_val = CROPDMG * ifelse(toupper(CROPDMGEXP) %in% names(exp_map),
                                    exp_map[toupper(CROPDMGEXP)], 1),
        total_econ = prop_val + crop_val
    )
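
The exponent conversion can be checked on a few rows by hand. The following is a minimal sketch in base R, assuming a hypothetical toy data frame (`toy`) whose values are invented for illustration and are not taken from the NOAA data; the name `multipliers` is likewise illustrative.

```r
# Hypothetical toy rows (NOT from the NOAA database)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3),
  PROPDMGEXP = c("K", "m", "", "B"),
  stringsAsFactors = FALSE
)

# Same mapping idea as in the report: code -> numeric multiplier
multipliers <- c("H" = 100, "K" = 1000, "M" = 1e6, "B" = 1e9)

# Upper-case the codes so that "m" is treated like "M"
code <- toupper(toy$PROPDMGEXP)

# Codes that are not in the mapping (here the blank) fall back to 1
mult <- ifelse(code %in% names(multipliers), multipliers[code], 1)

toy$prop_val <- toy$PROPDMG * unname(mult)
print(toy)
```

In this toy case the four rows work out to 25,000, 2,500,000, 10 and 3,000,000,000 dollars respectively, which shows that the blank code leaves the raw value unchanged.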

C. Aggregating the Data

  • The “storm_sub” data was aggregated by event type (EVTYPE) to find the totals for health and economic impact.

# Aggregate Health Data
health_totals <- storm_sub %>%
    group_by(EVTYPE) %>%
    summarise(health_impact = sum(FATALITIES + INJURIES, na.rm = TRUE)) %>%
    arrange(desc(health_impact))

# Aggregate Economic Data
econ_totals <- storm_sub %>%
    group_by(EVTYPE) %>%
    summarise(economic_impact = sum(total_econ, na.rm = TRUE)) %>%
    arrange(desc(economic_impact))
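
The group-and-sum step can also be illustrated without dplyr. The sketch below uses base R's `aggregate()` on a hypothetical toy data frame (`toy`, with invented values); it is an illustration of the same idea, not the code used to produce the results in this report.

```r
# Hypothetical toy rows (NOT from the NOAA database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# Combine fatalities and injuries per row, then sum within each event type
toy$health_impact <- toy$FATALITIES + toy$INJURIES
totals <- aggregate(health_impact ~ EVTYPE, data = toy, FUN = sum)

# Sort from most to least harmful
totals <- totals[order(-totals$health_impact), ]
print(totals)
```

Here the two TORNADO rows collapse into a single total of 17, ahead of HEAT (4) and FLOOD (1).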

Results

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

To determine which events are most harmful to population health, fatalities and injuries were summed for each event type.
The following table and plot show the 10 most harmful event types by this combined total.

# Getting the top 10 health-harming events
top10_health <- head(health_totals, 10)

# Display the summary table
print(top10_health)
## # A tibble: 10 × 2
##    EVTYPE            health_impact
##    <chr>                     <dbl>
##  1 TORNADO                   96979
##  2 EXCESSIVE HEAT             8428
##  3 TSTM WIND                  7461
##  4 FLOOD                      7259
##  5 LIGHTNING                  6046
##  6 HEAT                       3037
##  7 FLASH FLOOD                2755
##  8 ICE STORM                  2064
##  9 THUNDERSTORM WIND          1621
## 10 WINTER STORM               1527

# Plotting the health impact
ggplot(top10_health, 
       aes(x = reorder(EVTYPE, health_impact), y = health_impact)) +
       geom_bar(stat = "identity", fill = "blue") +
       labs(
        title = "Most Harmful Events to Population Health (Fatalities and Injuries Combined)",
        x = "Event Type",
        y = "Total Number of People Affected"
            ) +
        theme(axis.text.x = element_text(angle = 75, hjust = 1))

Figure 1: This plot illustrates that tornadoes account for by far the most combined fatalities and injuries in the NOAA database (96,979), followed by excessive heat (8,428) and thunderstorm winds (TSTM WIND, 7,461).
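
The table lists both TSTM WIND and THUNDERSTORM WIND, which appear to be two spellings of the same event type. The following is a minimal sketch of how the ranking could be checked for sensitivity to such labels, assuming the two should be merged. That recoding is an assumption and is not part of the analysis above; the column name `EVTYPE_clean` is illustrative. The figures are copied from the top-10 table only, so they ignore any other spelling variants in the full dataset.

```r
# Values copied from the top-10 health table in this report
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  health_impact = c(96979, 8428, 7461, 7259, 6046,
                    3037, 2755, 2064, 1621, 1527),
  stringsAsFactors = FALSE
)

# Assumed recoding: treat "TSTM WIND" as an abbreviation of "THUNDERSTORM WIND"
top10$EVTYPE_clean <- sub("^TSTM WIND$", "THUNDERSTORM WIND", top10$EVTYPE)

# Re-aggregate on the cleaned label and re-rank
merged <- aggregate(health_impact ~ EVTYPE_clean, data = top10, FUN = sum)
merged <- merged[order(-merged$health_impact), ]
head(merged, 3)
```

On these ten rows the merged label totals 9,082, which would place it above EXCESSIVE HEAT (8,428). Tornadoes remain first by a wide margin either way.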

2. Across the United States, which types of events have the greatest economic consequences?

To evaluate economic consequences, property and crop damage were summed for each event type.
The table reports totals in US dollars; in the plot, the totals are expressed in billions of dollars for easier interpretation by municipal managers.

# Getting the top 10 economic-impact events
top_econ <- head(econ_totals, 10)

# Display the summary table
print(top_econ)
## # A tibble: 10 × 2
##    EVTYPE            economic_impact
##    <chr>                       <dbl>
##  1 FLOOD               150319678257 
##  2 HURRICANE/TYPHOON    71913712800 
##  3 TORNADO              57352114049.
##  4 STORM SURGE          43323541000 
##  5 HAIL                 18758222016.
##  6 FLASH FLOOD          17562129167.
##  7 DROUGHT              15018672000 
##  8 HURRICANE            14610229010 
##  9 RIVER FLOOD          10148404500 
## 10 ICE STORM             8967041360

# Plotting the economic impact
ggplot(top_econ, 
       aes(x = reorder(EVTYPE, economic_impact), y = economic_impact / 1e9)) +
       geom_bar(stat = "identity", fill = "blue") +
       labs(
        title = "Events with Greatest Economic Impact (Property and Crop Damage Combined)",
        x = "Event Type",
        y = "Total Economic Damage (Billions of USD)"
           ) +
        theme(axis.text.x = element_text(angle = 75, hjust = 1))

Figure 2: This plot highlights that floods have the highest total economic impact (about 150 billion USD), followed by hurricanes/typhoons (about 72 billion USD) and tornadoes (about 57 billion USD).
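
The conversion to billions used on the plot's y-axis is a simple division by 1e9. As a quick check, the sketch below applies it to the top three totals copied from the table above; the vector name `top3` is illustrative, and the tornado figure is entered without its fractional part.

```r
# Top three totals in US dollars, copied from the economic impact table
top3 <- c(
  "FLOOD"             = 150319678257,
  "HURRICANE/TYPHOON" = 71913712800,
  "TORNADO"           = 57352114049
)

# Express in billions of USD, rounded to one decimal place
billions <- round(top3 / 1e9, 1)
print(billions)
```

This gives roughly 150.3, 71.9 and 57.4 billion USD, matching the bar heights in Figure 2.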

Two key questions answered:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

The analysis shows that tornadoes are the most harmful to population health, with 96,979 fatalities and injuries combined. The two measures were summed, so this ranking does not separate deaths from injuries.

2. Across the United States, which types of events have the greatest economic consequences?

The analysis shows that floods have caused the greatest property and crop damage combined, at approximately 150.3 billion USD.
