Analysis of the Impact of Severe Weather Events on Population Health and the Economy in the United States

Synopsis

This study examines the impact of severe weather events on public health and the economy in the United States, using records from the NOAA storm database. Ranked by combined fatalities and injuries, tornadoes, excessive heat, and thunderstorm wind (TSTM Wind) are the events most harmful to population health, each showing a distinct pattern over the years. Floods are the leading cause of economic losses, with more than $144 billion in total property damage. These findings underscore the importance of robust disaster preparedness and mitigation measures to reduce the toll on both human lives and the economy.

Data Processing

The analysis starts from the raw NOAA storm database, distributed as a bz2-compressed CSV file. The following steps prepare the data for analysis:

  1. First, the required libraries are loaded.

  2. Then, the bz2 file is downloaded locally and the CSV data it contains are read into R.

  3. Finally, the date and time columns are converted to the proper formats, and the data set is converted to a data frame for easier handling.

library(ggplot2)  
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(gridExtra)
library(kableExtra)
library(knitr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following object is masked from 'package:gridExtra':
## 
##     combine
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(viridisLite)
library(cowplot)
## 
## Attaching package: 'cowplot'
## The following object is masked from 'package:lubridate':
## 
##     stamp
# Download the bz2 file locally

url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "StormData.csv.bz2"
download.file(url, dest_file, mode = "wb")  # binary mode keeps the compressed file intact on Windows

# Read the compressed CSV (readr decompresses .bz2 files automatically)

dt <- readr::read_csv(dest_file)
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Convert BGN_DATE to Date format
dt$BGN_DATE <- as.Date(dt$BGN_DATE, format = "%m/%d/%Y")

# Convert BGN_TIME to Hour format (HH:MM)
dt$BGN_TIME <- strptime(dt$BGN_TIME, format = "%H%M")
dt$BGN_TIME <- format(dt$BGN_TIME, format = "%H:%M")

# Create dataframe df
df <- as.data.frame(dt)
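The date and time conversions above rely on `strptime`-style parsing. The short sketch below illustrates that behaviour on a few hypothetical strings. It assumes that raw BGN_DATE values carry a trailing time (for example "4/18/1950 0:00:00") and that some BGN_TIME values may use a layout other than four-digit HHMM; the strings are illustrative, not taken from the data set.

```r
# Hypothetical BGN_DATE-style strings (assumed layout: date followed by a time)
raw_dates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")
# Parsing stops after the year, so the trailing time is ignored
parsed_dates <- as.Date(raw_dates, format = "%m/%d/%Y")
print(parsed_dates)

# Hypothetical BGN_TIME-style strings: two HHMM values and one other layout
raw_times <- c("0130", "1445", "03:00:00 PM")
# HHMM strings become "HH:MM"; a string in any other layout becomes NA
parsed_times <- format(strptime(raw_times, format = "%H%M"), format = "%H:%M")
print(parsed_times)
```

Under these assumptions, times stored in any layout other than HHMM would become NA after the conversion, so counting NA values in BGN_TIME before relying on it may be worthwhile.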

Results

1. Most Harmful Events to Population Health

The data are summarised by event type, adding up the fatalities and injuries recorded for each. The ten events with the highest combined total are then displayed.

# Calculate total fatalities and injuries for each event type
events_health_impact <- df %>%
  group_by(EVTYPE) %>%
  summarise(total_fatalities = sum(FATALITIES, na.rm = TRUE),
            total_injuries = sum(INJURIES, na.rm = TRUE)) %>%
  filter(!is.na(total_fatalities) & !is.na(total_injuries)) %>%
  arrange(desc(total_fatalities + total_injuries))

# Display table of most harmful events to population health
head(events_health_impact,10)
## # A tibble: 10 × 3
##    EVTYPE            total_fatalities total_injuries
##    <chr>                        <dbl>          <dbl>
##  1 TORNADO                       5633          91346
##  2 EXCESSIVE HEAT                1903           6525
##  3 TSTM WIND                      504           6957
##  4 FLOOD                          470           6789
##  5 LIGHTNING                      816           5230
##  6 HEAT                           937           2100
##  7 FLASH FLOOD                    978           1777
##  8 ICE STORM                       89           1975
##  9 THUNDERSTORM WIND              133           1488
## 10 WINTER STORM                   206           1321

The table above lists the 10 most harmful events and their consequences, ordered by the sum of fatalities and injuries, i.e. their total impact on the population. By this measure, Tornado, Excessive Heat, and TSTM Wind are the most harmful events, together accounting for 104,828 injuries and 8,040 fatalities. Ranked by fatalities alone, however, the order changes: Flash Flood (978), Heat (937), and Lightning (816) all caused more deaths than TSTM Wind (504).
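As a quick check of this ranking, the sketch below re-ranks the ten rows of the printed table by combined harm and by fatalities alone. It uses only the values shown in that table, so it says nothing about event types outside the top 10 by combined total.

```r
# Values copied from the top-10 table printed above
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND", "WINTER STORM"),
  total_fatalities = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  total_injuries = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)

# Top 3 by combined fatalities + injuries (the ranking used in the analysis)
top3_harm <- head(top10$EVTYPE[order(-(top10$total_fatalities + top10$total_injuries))], 3)
# Top 3 by fatalities alone, within these ten rows
top3_deaths <- head(top10$EVTYPE[order(-top10$total_fatalities)], 3)

# Combined totals for the three most harmful events
injuries_top3 <- sum(top10$total_injuries[top10$EVTYPE %in% top3_harm])
fatalities_top3 <- sum(top10$total_fatalities[top10$EVTYPE %in% top3_harm])

print(top3_harm)
print(top3_deaths)
print(c(injuries = injuries_top3, fatalities = fatalities_top3))
```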

Next, the yearly injuries and fatalities of the three most harmful events are compared, using the code below:

top_3_events <- head(events_health_impact, 3)$EVTYPE

df_top3 <- df %>%
  filter(EVTYPE %in% top_3_events)

# BGN_DATE was already converted to Date during data processing; extract the year
df_top3$Year <- year(df_top3$BGN_DATE)

injuries_by_year <- df_top3 %>%
  group_by(Year, EVTYPE) %>%
  summarise(total_injuries = sum(INJURIES, na.rm = TRUE),
            total_fatalities = sum(FATALITIES, na.rm = TRUE),
            .groups = "drop")
#Injuries plot over the years for Top 3 Events

plot1 <- ggplot(injuries_by_year, aes(x = Year, y = total_injuries , color = EVTYPE)) +
  geom_line() +
  labs(title = "Total Injuries by Year for Top 3 Most Harmful Events",
       x = "Year",
       y = "Total Injuries",
       color = "Event Type") +
  scale_color_manual(values = c("black", "blue", "red"))+
  theme_minimal()

#Fatalities plot over the years for Top 3 Events

plot2 <- ggplot(injuries_by_year, aes(x = Year, y = total_fatalities , color = EVTYPE)) +
  geom_line() +
  labs(title = "Total Fatalities by Year for Top 3 Most Harmful Events",
       x = "Year",
       y = "Total Fatalities",
       color = "Event Type") +
  scale_color_manual(values = c("black", "blue", "red"))+
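  theme_minimal()

# Display the two plots stacked vertically (grid.arrange from gridExtra)
grid.arrange(plot1, plot2, ncol = 1)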

2. Events with the Greatest Economic Consequences

As with the health analysis, total damages in US dollars are summarised by event type. Property damage is recorded in two columns: PROPDMG holds the magnitude of the damage without units, and PROPDMGEXP holds a multiplier code indicating thousands (K), millions (M), or billions (B) of dollars, in either upper or lower case. The two must therefore be combined into a dollar amount before events can be compared; the code below treats any other PROPDMGEXP value as a multiplier of 1. Only property damage is considered here.
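The unit conversion can be sketched as a small standalone function. `exp_to_multiplier` is a hypothetical helper, not part of the analysis code, and the sample values are made up. It reproduces the same rule as the nested `ifelse()` calls: K, M and B (any case) scale the value by 10^3, 10^6 and 10^9, and any other code, including an empty or missing one, leaves it unscaled.

```r
# Hypothetical helper: map PROPDMGEXP codes to numeric multipliers
exp_to_multiplier <- function(exp_code) {
  code <- toupper(exp_code)
  mult <- rep(1, length(code))               # default: unscaled
  mult[!is.na(code) & code == "K"] <- 1e3    # thousands
  mult[!is.na(code) & code == "M"] <- 1e6    # millions
  mult[!is.na(code) & code == "B"] <- 1e9    # billions
  mult
}

# Made-up example values, not taken from the data set
prop_dmg <- c(25, 2.5, 1.5, 10, 5)
prop_exp <- c("K", "m", "B", "", NA)
converted <- prop_dmg * exp_to_multiplier(prop_exp)
print(converted)
```

Keeping the lookup in one function makes it easier to extend if other PROPDMGEXP codes need handling.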

# Calculate total property damage for each event type
events_economic_impact <- df %>%
  mutate(prop_damage = PROPDMG * ifelse(PROPDMGEXP %in% c("K", "k"), 1e3,
                                        ifelse(PROPDMGEXP %in% c("M", "m"), 1e6,
                                               ifelse(PROPDMGEXP %in% c("B", "b"), 1e9, 1)))) %>%
  group_by(EVTYPE) %>%
  summarise(total_prop_damage = sum(prop_damage, na.rm = TRUE)) %>%
  filter(!is.na(total_prop_damage)) %>%
  arrange(desc(total_prop_damage))

# Display table of events with the greatest economic consequences
head(events_economic_impact,10)
## # A tibble: 10 × 2
##    EVTYPE            total_prop_damage
##    <chr>                         <dbl>
##  1 FLOOD                 144657709807 
##  2 HURRICANE/TYPHOON      69305840000 
##  3 TORNADO                56937160779.
##  4 STORM SURGE            43323536000 
##  5 FLASH FLOOD            16140862067.
##  6 HAIL                   15732267048.
##  7 HURRICANE              11868319010 
##  8 TROPICAL STORM          7703890550 
##  9 WINTER STORM            6688497251 
## 10 HIGH WIND               5270046295
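The comparison drawn from this table can be checked against the printed totals. The sketch below copies the top four values (the table shows them rounded to whole dollars), tests whether Flood exceeds Hurricane/Typhoon and Tornado combined, and expresses the totals in billions of dollars.

```r
# Top 4 property damage totals in US dollars, copied from the printed table (rounded)
damage <- c(FLOOD = 144657709807,
            "HURRICANE/TYPHOON" = 69305840000,
            TORNADO = 56937160779,
            "STORM SURGE" = 43323536000)

# Does Flood exceed Hurricane/Typhoon and Tornado combined?
flood_exceeds <- damage[["FLOOD"]] > damage[["HURRICANE/TYPHOON"]] + damage[["TORNADO"]]

# Totals in billions of US dollars, rounded to one decimal place
damage_billions <- round(damage / 1e9, 1)

print(flood_exceeds)
print(damage_billions)
```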

Then, it is possible to create a bar plot for easier comparison using the following code:

# Plot the top 10 events with the greatest economic consequences, in descending order
dmge_plot_top10 <- ggplot(head(events_economic_impact, 10),
                          aes(x = reorder(EVTYPE, -total_prop_damage), y = total_prop_damage / 1e9)) +
  geom_col(fill = "skyblue") +
  labs(title = "Top 10 Events by Total Property Damage in US Dollars",
       x = "Event Type",
       y = "Total Property Damage (billions of US dollars)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
dmge_plot_top10
Figure 2: Bar plot of the top 10 events by total property damage in US dollars

This figure shows that Flood has by far the worst consequences in terms of monetary property damage, exceeding the combined damage caused by Hurricane/Typhoon and Tornado. It is also notable that the top four event types (Flood, Hurricane/Typhoon, Tornado, and Storm Surge) are closely related and commonly occur in similar seasons and places. The phenomena with the largest material losses share several features:

  1. Meteorological Interconnection

These events are closely linked to extreme weather phenomena such as low-pressure systems, strong winds, and intense precipitation. For instance, hurricanes and typhoons can produce heavy rainfall leading to flash floods, while tornadoes may form in conjunction with severe storms, including hurricanes. Storm surges, on the other hand, primarily occur during hurricane events and can significantly contribute to coastal flooding.

  2. Geographical Impact

These events tend to occur in regions prone to extreme weather conditions, such as coastal areas, river basins, and regions with high storm activity. As a result, a region struck by a hurricane commonly experiences flooding from heavy rainfall and storm surge, and may also see associated tornadoes.

  3. Risk Amplification

The presence of one extreme weather event can increase the likelihood of others or intensify their impact. For example, a hurricane striking a coastal region can cause significant flooding through heavy rainfall and storm surge, while its rainbands may also spawn tornadoes in the affected areas.