Synopsis

This project examines storm data obtained from the U.S. National Oceanic and Atmospheric Administration (NOAA), focusing on variables related to disaster type, public health effects, and economic losses. It seeks to answer two primary questions: (1) Which event types (as classified under the EVTYPE variable) pose the greatest threat to population health across the United States? (2) Which event types incur the most significant economic costs nationwide? Our analysis reveals that tornadoes are the most detrimental to population health, while floods result in the highest economic damage.

Data Processing

Loading necessary libraries and setting options

options(digits = 1) # Print with as few significant digits as possible
options(scipen = 999) # Turn off scientific notation for numbers
library(ggplot2)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

To load the data file, we employ the read.csv() function in conjunction with bzfile() to handle the compressed file format.

filename <- "~/R Projects/RepData_PeerAssessment2/repdata_data_StormData.csv.bz2"
data <- read.csv(bzfile(filename))

For the population health measure, we add fatalities and injuries together and store the total in a new column, populationHealth.

data$populationHealth <- data$FATALITIES + data$INJURIES

For economic impact, two additional variables, PROPDMGEXP and CROPDMGEXP, give the multiplier for property and crop damage, respectively. In these columns, “K” denotes thousands, “M” represents millions, and “B” signifies billions. To convert these alphabetic codes into their corresponding numeric values, we apply mutate() with case_when() from the dplyr package. Every other value, including blanks and any code that is not an uppercase K, M, or B, is assigned a multiplier of 1.

data <- data %>%
  mutate(PROPDMGEXP = case_when(
    PROPDMGEXP == "K" ~ 1000,
    PROPDMGEXP == "M" ~ 1000000,
    PROPDMGEXP == "B" ~ 1000000000,
    TRUE ~ 1
  ))

data <- data %>%
  mutate(CROPDMGEXP = case_when(
    CROPDMGEXP == "K" ~ 1000,
    CROPDMGEXP == "M" ~ 1000000,
    CROPDMGEXP == "B" ~ 1000000000,
    TRUE ~ 1
  ))
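Because the two case_when() blocks match only uppercase codes, a lowercase entry such as "m" would fall through to the default of 1. The following is a minimal sketch of a case-insensitive alternative, not the method used for the totals reported in this analysis. The helper name exp_to_multiplier is hypothetical, and the sample codes are made up for illustration.

```r
# Hypothetical helper (not part of the original analysis): map an exponent
# code to its numeric multiplier, ignoring letter case.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Codes that are not K, M, or B index to NA ...
  mult <- unname(lookup[toupper(as.character(code))])
  # ... and then fall back to a multiplier of 1
  mult[is.na(mult)] <- 1
  mult
}

# Made-up sample codes: uppercase, lowercase, blank, and numeric
exp_to_multiplier(c("K", "m", "B", "", "5"))
```

For the sample codes, the helper yields 1000, 1000000, 1000000000, 1, and 1. Applying it to the real columns could change the totals shown in the Results section, so it is offered only as an option to consider.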

With the numeric multipliers in place, we compute the total economic impact by multiplying the property and crop damage values by their respective factors and then summing the two products.

data$economicImpact <- data$PROPDMG * data$PROPDMGEXP + data$CROPDMG * data$CROPDMGEXP
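As a quick check of the arithmetic, the same formula can be applied to a few toy rows. The values below are invented for illustration and do not come from the NOAA file.

```r
# Toy rows with made-up values, for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 0),
  PROPDMGEXP = c(1e3, 1e6, 1),   # multipliers after conversion
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c(1, 1e3, 1e9)
)

# Same formula as above: property damage plus crop damage, each scaled
toy$economicImpact <- toy$PROPDMG * toy$PROPDMGEXP +
  toy$CROPDMG * toy$CROPDMGEXP
toy$economicImpact
```

The three rows work out to 25,000, 2,510,000, and 5,000,000,000 dollars, respectively.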

We then generate two summary tables: one with the total population health impact per event type and one with the total economic damage per event type. Each table is sorted from highest to lowest.

# Calculate total fatalities plus injuries by event type
health <- data %>%
  group_by(EVTYPE) %>%
  summarize(health = sum(populationHealth, na.rm = TRUE))
# Sort the event types from highest to lowest health impact
health <- arrange(health, desc(health))

# Calculate total economic damage by event type
economy <- data %>%
  group_by(EVTYPE) %>%
  summarize(economy = sum(economicImpact, na.rm = TRUE))
# Sorting the event types from highest to lowest cost of damage
economy <- arrange(economy, desc(economy))
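The group-sum-sort pattern used for both tables can be illustrated on a handful of invented records. This sketch uses base R (tapply() and sort()) rather than dplyr, purely to show the same logic in a different form. The toy data frame is an assumption and not part of the dataset.

```r
# Made-up event records, for illustration only
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  populationHealth = c(2, 10, 1, 0, 5)
)

# Sum the measure within each event type, then sort from highest to lowest
totals <- sort(tapply(toy$populationHealth, toy$EVTYPE, sum),
               decreasing = TRUE)
totals
```

Here TORNADO comes first with 15, followed by FLOOD with 3 and HAIL with 0, mirroring what group_by(), summarize(), and arrange(desc()) produce on the full data.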

Results

Impact on Population Health

View the 10 event types with the highest impact on population health.

head(health, n=10)
## # A tibble: 10 × 2
##    EVTYPE            health
##    <chr>              <dbl>
##  1 TORNADO            96979
##  2 EXCESSIVE HEAT      8428
##  3 TSTM WIND           7461
##  4 FLOOD               7259
##  5 LIGHTNING           6046
##  6 HEAT                3037
##  7 FLASH FLOOD         2755
##  8 ICE STORM           2064
##  9 THUNDERSTORM WIND   1621
## 10 WINTER STORM        1527

Tornadoes rank as the event type with the greatest impact on population health, accounting for nearly 97,000 fatalities and injuries combined. They are followed, in descending order, by excessive heat, thunderstorm wind (recorded as TSTM WIND), floods, lightning, heat, flash floods, ice storms, thunderstorm wind (recorded as THUNDERSTORM WIND), and winter storms (refer to Figure 1).

Figure 1. The Ten Event Types with the Most Severe Effect on Population Health.

ggplot(health %>% head(n = 10), aes(x = reorder(EVTYPE, -health), y = health, fill = EVTYPE)) + 
  geom_bar(stat = "identity") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) + 
  xlab("") + 
  ylab("Fatalities and Injuries") + 
  ggtitle("Top Ten Event Types by Population Health Impact")
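The top-ten table lists TSTM WIND and THUNDERSTORM WIND as separate event types, although TSTM is an abbreviation for thunderstorm. The sketch below shows how merging the two labels would change the ranking. It re-enters only the ten rows printed above, so it ignores any other spelling variants in the full dataset. Treating the two labels as one event type is an assumption and was not done in this analysis.

```r
# The ten rows printed above, re-entered by hand
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  health = c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527)
)

# Assumption: TSTM WIND and THUNDERSTORM WIND are the same event type
top10$EVTYPE <- sub("^TSTM WIND$", "THUNDERSTORM WIND", top10$EVTYPE)

# Re-aggregate after relabeling and sort from highest to lowest
merged <- aggregate(health ~ EVTYPE, data = top10, FUN = sum)
merged <- merged[order(-merged$health), ]
head(merged, 3)
```

With the labels merged, thunderstorm wind totals 9,082 and moves to second place, ahead of excessive heat. Tornadoes remain first by a wide margin.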

Impact on Economy

View the 10 event types with the highest economic damage.

head(economy, n=10)
## # A tibble: 10 × 2
##    EVTYPE                  economy
##    <chr>                     <dbl>
##  1 FLOOD             150319678257 
##  2 HURRICANE/TYPHOON  71913712800 
##  3 TORNADO            57340614060.
##  4 STORM SURGE        43323541000 
##  5 HAIL               18752904943.
##  6 FLASH FLOOD        17562129167.
##  7 DROUGHT            15018672000 
##  8 HURRICANE          14610229010 
##  9 RIVER FLOOD        10148404500 
## 10 ICE STORM           8967041360

Floods cause the highest economic losses, at just over $150 billion in combined property and crop damage. The remaining top event types, in descending order, are hurricanes/typhoons, tornadoes, storm surges, hail, flash floods, drought, hurricanes, river floods, and ice storms (refer to Figure 2).

Figure 2. The Ten Event Types with the Highest Economic Damage.

ggplot(economy %>% head(n = 10), aes(x = reorder(EVTYPE, -economy), y = economy, fill = EVTYPE)) + 
  geom_bar(stat = "identity") + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1)) + 
  xlab("") + 
  ylab("Damage Cost") + 
  ggtitle("Top Ten Event Types by Damage Cost in Dollars")