Title: Impact of Severe Weather Events on Public Health and Economic Consequences in the U.S.

Synopsis:

This analysis explores the National Oceanic and Atmospheric Administration (NOAA) Storm Database to assess the impact of severe weather events on public health and the economy in the United States from 1950 to 2011. We investigate which types of events cause the most harm to public health, in terms of fatalities and injuries, and which types result in the greatest economic losses, considering property and crop damage. Our results reveal the most damaging events in these two categories, providing valuable insights for prioritizing disaster preparedness and resource allocation in communities vulnerable to severe weather.

Data Processing:

In this section, we describe the steps taken to load and preprocess the data, followed by the methods used to analyze it.

Step 1: Load the Data

We begin by downloading the NOAA Storm Database and loading it into R. The data is a CSV file compressed with the bzip2 algorithm. The read.csv() function decompresses it transparently, so no separate extraction step is needed.

# Load the required package
library(dplyr)

# Download the file (skipped if a local copy already exists)
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("StormData.csv.bz2")) {
  download.file(url, destfile = "StormData.csv.bz2", mode = "wb")
}

# Load the data
storm_data <- read.csv("StormData.csv.bz2")
# Data check
dim(storm_data)
## [1] 902297     37
str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
head(storm_data, 2)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                        14   100 3   0          0       15    25.0
## 2         0                         2   150 2   0          0        0     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
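
As a quick check that the loaded records fall within the 1950 to 2011 period named in the synopsis, the BGN_DATE strings can be parsed into dates. The following is a minimal sketch on two sample values copied from the str() output above. The names sample_dates, parsed, and years are illustrative, and the format string is an assumption based on how the dates appear in that output.

```r
# Two BGN_DATE values as they appear in the str() output above
sample_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00")

# Parse month/day/year plus the time component
parsed <- as.Date(sample_dates, format = "%m/%d/%Y %H:%M:%S")

# Extract the year of each record and show the covered range
years <- as.integer(format(parsed, "%Y"))
print(range(years))
```

On the full data set, the same call would be applied to storm_data$BGN_DATE.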

Step 2: Data Cleaning

  • We clean the data to focus on the relevant columns for this analysis. We are primarily concerned with the following columns:
    • EVTYPE: The type of the event (e.g., tornado, flood, etc.)
    • FATALITIES: Number of fatalities caused by the event
    • INJURIES: Number of injuries caused by the event
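    • PROPDMG: Property damage amount, to be scaled by PROPDMGEXP
    • CROPDMG: Crop damage amount, to be scaled by CROPDMGEXP
    • PROPDMGEXP: Property damage exponent, the magnitude code for PROPDMG (e.g., K for thousands)
    • CROPDMGEXP: Crop damage exponent, the magnitude code for CROPDMG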
  • We drop rows where fatalities, injuries, property damage, and crop damage are all zero, since they contribute nothing to the totals. That is, we keep only rows where at least one of these variables is greater than zero.
  • We convert all column names to lowercase.
library(dplyr)
# Filter the relevant columns and rows
storm <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG, PROPDMGEXP, CROPDMGEXP) %>%
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

# Transform all column names to lowercase
names(storm) <- tolower(names(storm))
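
The evtype labels are free text, so the same kind of event may appear under slightly different spellings. As an optional extra cleaning step that the analysis here does not perform, labels could be normalized before grouping. A minimal sketch on made-up labels (the vector names labels and normalized are illustrative):

```r
# Hypothetical labels that differ only in case and surrounding whitespace
labels <- c("TORNADO", " Tornado", "tornado ", "Flood", "FLOOD")

# Trim whitespace and convert to uppercase so the variants collapse together
normalized <- toupper(trimws(labels))

print(table(normalized))
```

This only merges variants that differ in case or padding. Genuinely different spellings of the same event would still need a manual mapping.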

Step 3: Data Transformation

For the health analysis, we compute the total fatalities and injuries for each event type, aggregating by evtype with the summarise() function. Property and crop damage are not summed at this point: the raw propdmg and cropdmg values must first be scaled by their exponent columns (Step 4), so damage totals are computed in the Results section from the scaled values.

library(dplyr)
# Aggregate the data by event type and create a new variable for total health impact
event_health_impact <- storm %>%
  group_by(evtype) %>%
  summarise(
    total_fatalities = sum(fatalities, na.rm = TRUE),
    total_injuries = sum(injuries, na.rm = TRUE)) %>%
  mutate(total_health_impact = total_fatalities + total_injuries) %>%
  arrange(desc(total_health_impact))

Step 4: Scaling Damage Values

Since the damage values (propdmg and cropdmg) are recorded on different scales, with the magnitude given by the exponent columns propdmgexp and cropdmgexp (e.g., K for thousands, M for millions, B for billions), we convert them to dollars before comparing event types.
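
To make the scaling concrete: a propdmg of 25 with exponent "K" stands for 25 × 1,000 = 25,000 dollars. The sketch below works through this arithmetic on a few made-up rows using a named lookup vector. The names toy, multipliers, and codes are illustrative, and the lookup covers only three letter codes plus the empty exponent.

```r
# Made-up rows: a damage amount and its exponent code
toy <- data.frame(
  propdmg    = c(25, 2.5, 1.2, 3),
  propdmgexp = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Multiplier for each code; "none" stands in for the empty string,
# because a named vector cannot be indexed by ""
multipliers <- c(K = 1e3, M = 1e6, B = 1e9, none = 1)

# Map the empty exponent onto the "none" key, then look up each multiplier
codes <- ifelse(toy$propdmgexp == "", "none", toy$propdmgexp)
toy$propdmg_dollars <- toy$propdmg * unname(multipliers[codes])

print(toy)
```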

# Function to convert an exponent code to a numeric multiplier
convert_exp_to_numeric <- function(exp) {
  exp <- toupper(exp)  # The raw codes mix cases (e.g., "k"/"K", "m"/"M")
  if (exp == "K") {
    return(1000)
  } else if (exp == "M") {
    return(1e6)
  } else if (exp == "B") {
    return(1e9)
  } else if (exp == "H") {
    return(100)
  } else if (exp == "") {
    return(1)  # No exponent (i.e., the value is assumed to be in actual units)
  } else if (exp %in% as.character(0:8)) {
    return(10^as.numeric(exp))  # Digit codes are treated as powers of ten
  } else {
    return(NA_real_)  # Return NA for unexpected values (e.g., "+", "-", "?")
  }
}

# Apply the conversion function to both propdmgexp and cropdmgexp
storm$propdmgexp_numeric <- vapply(storm$propdmgexp, convert_exp_to_numeric,
                                   numeric(1), USE.NAMES = FALSE)
storm$cropdmgexp_numeric <- vapply(storm$cropdmgexp, convert_exp_to_numeric,
                                   numeric(1), USE.NAMES = FALSE)

# Adjust the damage columns by multiplying with the corresponding exponent values
storm$propdmg <- storm$propdmg * storm$propdmgexp_numeric
storm$cropdmg <- storm$cropdmg * storm$cropdmgexp_numeric
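
Exponent codes that the conversion does not recognize produce NA multipliers, and those rows silently drop out of any sum computed with na.rm = TRUE. A small check along the lines of the following sketch can show how many rows are affected. The data frame toy_scaled is made up for illustration and mirrors the propdmgexp and propdmgexp_numeric columns created above.

```r
# Made-up rows mirroring the columns created above
toy_scaled <- data.frame(
  propdmgexp         = c("K", "M", "?", "+", "K"),
  propdmgexp_numeric = c(1e3, 1e6, NA, NA, 1e3),
  stringsAsFactors = FALSE
)

# Flag the rows whose exponent code could not be converted
is_unmapped <- is.na(toy_scaled$propdmgexp_numeric)

# Count the affected rows and list the codes responsible
n_unmapped <- sum(is_unmapped)
unmapped_codes <- unique(toy_scaled$propdmgexp[is_unmapped])

print(n_unmapped)
print(unmapped_codes)
```

If the count is small relative to the data set, dropping those rows is unlikely to change the ranking. Otherwise the conversion function may need another branch.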

Results:

1. Most Harmful Weather Events to Population Health

We evaluate the types of events causing the most harm to population health by summing the fatalities and injuries. The event types that result in the highest total fatalities and injuries are tornadoes and heat waves.

library(dplyr)
library(ggplot2)

# Summarize the total health impact (total fatalities and total injuries for the 10 most harmful events)
health_impact <- event_health_impact %>%
  top_n(10, total_health_impact) %>%
  arrange(desc(total_health_impact))

# Plot the most harmful events in terms of health impact
ggplot(health_impact, aes(x = reorder(evtype, total_health_impact), 
                           y = total_health_impact)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Most Harmful Weather Events in Terms of Health Impact",
       x = "Event Type",
       y = "Health Impact (Total Fatalities & Injuries)") +
  theme_minimal()
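
The combined metric counts one fatality and one injury as equal. Where that weighting is a concern, the two measures can be ranked separately. A minimal sketch with made-up totals follows. The event names and numbers are placeholders, not results from the storm data.

```r
# Made-up per-event totals, for illustration only
toy_totals <- data.frame(
  evtype           = c("A", "B", "C"),
  total_fatalities = c(10, 50, 5),
  total_injuries   = c(900, 100, 300),
  stringsAsFactors = FALSE
)

# Rank event types by each measure on its own (largest first)
by_fatalities <- toy_totals$evtype[order(-toy_totals$total_fatalities)]
by_injuries   <- toy_totals$evtype[order(-toy_totals$total_injuries)]

print(by_fatalities)
print(by_injuries)
```

In this toy case the two rankings disagree, which is the situation where reporting both is more informative than the sum alone.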

2. Weather Events with the Greatest Economic Consequences

Next, we focus on the economic impact of severe weather events. This includes the total property and crop damage caused by each event type. Hurricanes and floods are among the most damaging events in terms of economic costs.

library(dplyr)
library(ggplot2)
library(scales)
# Summarize the total economic damage for each event type
economic_impact <- storm %>%
  group_by(evtype) %>%
  summarise(
    total_propdmg = sum(propdmg, na.rm = TRUE),
    total_cropdmg = sum(cropdmg, na.rm = TRUE)
  ) %>%
  mutate(total_damage = total_propdmg + total_cropdmg) %>%  # Create total damage variable
  arrange(desc(total_damage)) %>%
  top_n(10, total_damage)  # Get top 10 events by economic damage

# Function to format numbers in billions
format_billions <- function(x) {
  paste0(label_comma()(x / 1e9), "B")
}

# Plot the most damaging events in terms of economic impact
ggplot(economic_impact, aes(x = reorder(evtype, total_damage), y = total_damage)) +
  geom_bar(stat = "identity", aes(fill = total_damage), color = "black") +
  coord_flip() +
  labs(title =
         "Top 10 Weather Events by Economic Cost",
       x = "Event Type",
       y = "Economic Cost (Property + Crop Damage, Billions of USD)") +
  theme_minimal() +
  scale_fill_gradient(low = "blue", high = "red", name = "Total Damage", 
                      labels = format_billions) +  
  scale_y_continuous(labels = format_billions)  

Conclusion:

This analysis identifies the most harmful weather events in the U.S. in terms of both public health and economic consequences. Tornadoes and heat waves account for the highest combined fatalities and injuries, while hurricanes and floods are among the costliest in property and crop damage. By understanding these patterns, municipalities can better allocate resources for disaster preparedness and mitigation.