Synopsis

This analysis examines the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify which types of severe weather events are most harmful to population health and which have the greatest economic consequences. The dataset covers 1950 to 2011, with more complete records in recent years. Tornadoes are by far the most harmful event type for population health, causing more fatalities and injuries than any other event type. Floods inflict the greatest property damage, while droughts cause the most crop damage. Total economic impact is led by floods, hurricanes/typhoons, tornadoes, and storm surges. These findings can help government and municipal managers prioritize resources and preparation for different types of severe weather according to their impacts on public safety and economic stability.

Data Processing

Loading Required Packages

library(dplyr)
library(ggplot2)
library(tidyr)
library(stringr)
library(scales)

### Loading the Data

The data was loaded directly from the compressed CSV file, as required.

# Download and load the storm data from the provided URL
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "storm_data.csv.bz2"

# Download file if it doesn't exist
if (!file.exists(dest_file)) {
    download.file(file_url, dest_file, method = "curl")
}

# Load the data
storm_data <- read.csv(dest_file, stringsAsFactors = FALSE)

# Examine the data structure
cat("Dataset dimensions:", dim(storm_data), "\n")
## Dataset dimensions: 902297 37
cat("Number of event types:", length(unique(storm_data$EVTYPE)), "\n")
## Number of event types: 985

### Data Transformation and Cleaning

The data requires significant processing, particularly for the economic damage variables, which use exponent codes to represent multipliers.

# Select relevant variables for analysis
clean_storm <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# Examine the exponent columns to understand the coding system
cat("Unique PROPDMGEXP values:", unique(clean_storm$PROPDMGEXP), "\n")
## Unique PROPDMGEXP values: K M  B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
cat("Unique CROPDMGEXP values:", unique(clean_storm$CROPDMGEXP), "\n")
## Unique CROPDMGEXP values:  M K m B ? 0 k 2
# Function to convert damage exponents to numeric multipliers
convert_damage_exponent <- function(exp) {
  exp <- toupper(as.character(exp))
  # case_when() evaluates every right-hand side on the full vector, so the
  # digit conversion is done once here with coercion warnings suppressed;
  # non-digit codes become NA and are never selected by the digit branch
  digit_power <- suppressWarnings(as.numeric(exp))
  case_when(
    exp %in% c("", "+", "-", "?") ~ 1,   # Blank or symbolic codes: no multiplier
    exp == "H" ~ 100,                    # Hundreds
    exp == "K" ~ 1000,                   # Thousands
    exp == "M" ~ 1000000,                # Millions
    exp == "B" ~ 1000000000,             # Billions
    exp %in% as.character(0:9) ~ 10^digit_power,  # Digit codes: powers of ten
    TRUE ~ 1
  )
}

# Calculate actual damage values in dollars
clean_storm <- clean_storm %>%
  mutate(
    PROP_DAMAGE = PROPDMG * convert_damage_exponent(PROPDMGEXP),
    CROP_DAMAGE = CROPDMG * convert_damage_exponent(CROPDMGEXP),
    TOTAL_DAMAGE = PROP_DAMAGE + CROP_DAMAGE,
    TOTAL_HEALTH = FATALITIES + INJURIES
  )

# Standardize event types by converting to uppercase and trimming whitespace
clean_storm <- clean_storm %>%
  mutate(EVTYPE = str_to_upper(str_trim(EVTYPE)))

# Aggregate data by event type for health impact analysis
health_impact <- clean_storm %>%
  group_by(EVTYPE) %>%
  summarise(
    Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
    Total_Injuries = sum(INJURIES, na.rm = TRUE),
    Total_Health_Impact = sum(TOTAL_HEALTH, na.rm = TRUE),
    Event_Count = n()
  ) %>%
  arrange(desc(Total_Health_Impact))

# Aggregate data by event type for economic impact analysis
economic_impact <- clean_storm %>%
  group_by(EVTYPE) %>%
  summarise(
    Property_Damage = sum(PROP_DAMAGE, na.rm = TRUE),
    Crop_Damage = sum(CROP_DAMAGE, na.rm = TRUE),
    Total_Economic_Impact = sum(TOTAL_DAMAGE, na.rm = TRUE),
    Event_Count = n()
  ) %>%
  arrange(desc(Total_Economic_Impact))

# Get top events for detailed analysis
top_health <- health_impact %>% head(10)
top_economic <- economic_impact %>% head(10)
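
The exponent handling described above can be illustrated on a few made-up records. The sketch below is an alternative formulation, not the function used in this analysis: it swaps `case_when()` for a named lookup vector in base R. The names `exp_lookup`, `exponent_multiplier`, and the `toy` data frame are hypothetical, and treating digit codes as powers of ten simply mirrors the assumption made in `convert_damage_exponent()`.

```r
# Hypothetical lookup table: letter codes map to fixed multipliers;
# digit codes 0-9 are treated as 10^digit (same assumption as the analysis).
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9,
                setNames(10^(0:9), as.character(0:9)))

exponent_multiplier <- function(exp) {
  key <- toupper(trimws(as.character(exp)))
  mult <- unname(exp_lookup[key])  # NA for "", "+", "-", "?" and unknown codes
  ifelse(is.na(mult), 1, mult)     # fall back to a multiplier of 1
}

# Toy records for illustration only (not taken from the storm database)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10, 3),
  PROPDMGEXP = c("K", "M", "B", "", "2"),
  stringsAsFactors = FALSE
)
toy$PROP_DAMAGE <- toy$PROPDMG * exponent_multiplier(toy$PROPDMGEXP)
print(toy)
```

Because the lookup never coerces letter codes to numbers, it raises no coercion warnings. Whether a blank or symbolic code should count as a multiplier of 1 remains a judgment call in either formulation.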

## Results

### Most Harmful Events for Population Health

# Display top 10 most harmful events for population health
cat("Top 10 Most Harmful Events for Population Health:\n")
## Top 10 Most Harmful Events for Population Health:
print(top_health[, c("EVTYPE", "Total_Fatalities", "Total_Injuries", "Total_Health_Impact")])
## # A tibble: 10 × 4
##    EVTYPE            Total_Fatalities Total_Injuries Total_Health_Impact
##    <chr>                        <dbl>          <dbl>               <dbl>
##  1 TORNADO                       5633          91346               96979
##  2 EXCESSIVE HEAT                1903           6525                8428
##  3 TSTM WIND                      504           6957                7461
##  4 FLOOD                          470           6789                7259
##  5 LIGHTNING                      816           5230                6046
##  6 HEAT                           937           2100                3037
##  7 FLASH FLOOD                    978           1777                2755
##  8 ICE STORM                       89           1975                2064
##  9 THUNDERSTORM WIND              133           1488                1621
## 10 WINTER STORM                   206           1321                1527
# Prepare data for plotting
health_plot_data <- top_health %>%
  select(EVTYPE, Total_Fatalities, Total_Injuries) %>%
  pivot_longer(cols = c(Total_Fatalities, Total_Injuries), 
               names_to = "Impact_Type", 
               values_to = "Count")

# Create the plot
ggplot(health_plot_data, aes(x = reorder(EVTYPE, -Count), y = Count, fill = Impact_Type)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = c("Total_Fatalities" = "#E74C3C", "Total_Injuries" = "#F39C12"),
                    labels = c("Fatalities", "Injuries")) +
  labs(title = "Top 10 Most Harmful Weather Events for Population Health",
       x = "Event Type",
       y = "Total Count",
       fill = "Impact Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma_format())

Key Finding 1: Tornadoes are by far the most harmful event type for population health, causing 5,633 fatalities and 91,346 injuries, substantially more than any other weather event.

Key Finding 2: Among the top ten events, excessive heat (1,903) and flash floods (978) rank second and third by fatalities. By combined fatalities and injuries, excessive heat (8,428) and thunderstorm wind (TSTM WIND, 7,461) rank second and third, while flash floods rank seventh (2,755).
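
The top-10 table lists TSTM WIND and THUNDERSTORM WIND as separate rows, because the cleaning step only normalizes case and whitespace. The sketch below shows one way such labels could be merged before ranking. It is not part of the original analysis: the `synonyms` map is a hypothetical, deliberately tiny example, and the counts are copied from the table rather than recomputed from the raw data.

```r
# Counts copied from the top-10 health table in this report
health <- data.frame(
  EVTYPE           = c("TSTM WIND", "THUNDERSTORM WIND", "EXCESSIVE HEAT"),
  Total_Fatalities = c(504, 133, 1903),
  Total_Injuries   = c(6957, 1488, 6525),
  stringsAsFactors = FALSE
)

# Hypothetical synonym map (an assumption for illustration only)
synonyms <- c("TSTM WIND" = "THUNDERSTORM WIND")

# Replace a label when it appears in the map; otherwise keep it unchanged
canonical <- unname(synonyms[health$EVTYPE])
health$EVTYPE_MERGED <- ifelse(is.na(canonical), health$EVTYPE, canonical)

# Re-aggregate on the merged label and re-rank by combined impact
merged <- aggregate(cbind(Total_Fatalities, Total_Injuries) ~ EVTYPE_MERGED,
                    data = health, FUN = sum)
merged$Total_Health_Impact <- merged$Total_Fatalities + merged$Total_Injuries
merged <- merged[order(-merged$Total_Health_Impact), ]
print(merged)
```

On these rows alone, the two thunderstorm wind labels combine to 9,082 fatalities and injuries, which would place them ahead of excessive heat (8,428). A full treatment would need a reviewed mapping covering every spelling variant in the raw `EVTYPE` column.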

### Events with Greatest Economic Consequences

# Display top 10 events with greatest economic consequences
cat("Top 10 Events with Greatest Economic Consequences (in billions):\n")
## Top 10 Events with Greatest Economic Consequences (in billions):
top_economic_display <- top_economic %>%
  mutate(across(Property_Damage:Total_Economic_Impact, ~ . / 1e9))
print(top_economic_display[, c("EVTYPE", "Property_Damage", "Crop_Damage", "Total_Economic_Impact")])
## # A tibble: 10 × 4
##    EVTYPE            Property_Damage Crop_Damage Total_Economic_Impact
##    <chr>                       <dbl>       <dbl>                 <dbl>
##  1 FLOOD                      145.      5.66                    150.  
##  2 HURRICANE/TYPHOON           69.3     2.61                     71.9 
##  3 TORNADO                     56.9     0.415                    57.4 
##  4 STORM SURGE                 43.3     0.000005                 43.3 
##  5 HAIL                        15.7     3.03                     18.8 
##  6 FLASH FLOOD                 16.8     1.42                     18.2 
##  7 DROUGHT                      1.05   14.0                      15.0 
##  8 HURRICANE                   11.9     2.74                     14.6 
##  9 RIVER FLOOD                  5.12    5.03                     10.1 
## 10 ICE STORM                    3.94    5.02                      8.97
# Prepare data for plotting
economic_plot_data <- top_economic %>%
  select(EVTYPE, Property_Damage, Crop_Damage) %>%
  mutate(Property_Damage = Property_Damage / 1e9,
         Crop_Damage = Crop_Damage / 1e9) %>%
  pivot_longer(cols = c(Property_Damage, Crop_Damage), 
               names_to = "Damage_Type", 
               values_to = "Amount")

# Create the plot
ggplot(economic_plot_data, aes(x = reorder(EVTYPE, -Amount), y = Amount, fill = Damage_Type)) +
  geom_bar(stat = "identity", position = "stack") +
  scale_fill_manual(values = c("Property_Damage" = "#3498DB", "Crop_Damage" = "#27AE60"),
                    labels = c("Crop Damage", "Property Damage")) +
  labs(title = "Top 10 Weather Events with Greatest Economic Consequences",
       x = "Event Type",
       y = "Total Damage (Billions USD)",
       fill = "Damage Type") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = dollar_format())

Key Finding 1: Floods cause the most property damage at approximately $144.7 billion, while droughts cause the most crop damage at about $14.0 billion.

Key Finding 2: Hurricanes/typhoons ($71.9 billion in total damage) and storm surges ($43.3 billion) also cause massive economic damage, almost all of it to property, highlighting the devastating impact of coastal weather events. Tornadoes rank third overall at $57.4 billion.
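
The split between property and crop damage differs sharply by event type, and a simple ratio makes this visible. The sketch below computes the crop share of damage for each row of the top-10 economic table. The figures are copied from that table (billions of USD, rounded to three significant figures), so the shares are approximate. The `econ` data frame and the `Crop_Share` column are illustrative names, not objects from the analysis.

```r
# Damage in billions of USD, copied from the top-10 economic table
econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  Property_Damage = c(145, 69.3, 56.9, 43.3, 15.7, 16.8, 1.05, 11.9, 5.12, 3.94),
  Crop_Damage     = c(5.66, 2.61, 0.415, 0.000005, 3.03, 1.42, 14.0, 2.74, 5.03, 5.02),
  stringsAsFactors = FALSE
)

# Crop damage as a fraction of property plus crop damage
econ$Crop_Share <- econ$Crop_Damage / (econ$Property_Damage + econ$Crop_Damage)

# Rank from most crop-dominated to most property-dominated
econ <- econ[order(-econ$Crop_Share), ]
print(econ[, c("EVTYPE", "Crop_Share")], digits = 2)
```

On these rounded figures, drought is the only event type whose damage is overwhelmingly agricultural, while storm surge damage is almost entirely to property.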

### Comprehensive Impact Analysis

# Combine health and economic data for comprehensive analysis
combined_impact <- health_impact %>%
  inner_join(economic_impact, by = "EVTYPE", suffix = c("_health", "_economic")) %>%
  filter(Total_Health_Impact > 1000 | Total_Economic_Impact > 1e9) %>%
  arrange(desc(Total_Health_Impact)) %>%
  head(20)

# Create comprehensive scatter plot
ggplot(combined_impact, 
       aes(x = Total_Economic_Impact/1e9, 
           y = Total_Health_Impact,
           size = Total_Health_Impact,
           color = Total_Economic_Impact/1e9)) +
  geom_point(alpha = 0.7) +
  geom_text(aes(label = EVTYPE), size = 3, hjust = 0.5, vjust = -0.5, 
            check_overlap = FALSE) +
  scale_color_gradient(low = "blue", high = "red", 
                       name = "Economic Impact\n(Billions USD)") +
  scale_size_continuous(name = "Health Impact") +
  labs(title = "Comprehensive Analysis: Health vs Economic Impacts of Weather Events",
       x = "Total Economic Impact (Billions USD)",
       y = "Total Health Impact (Fatalities + Injuries)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_x_continuous(labels = dollar_format()) +
  scale_y_continuous(labels = comma_format())

### Summary and Conclusions

Based on the comprehensive analysis of NOAA storm data from 1950 to 2011:

* Population Health Protection Priority: Tornadoes warrant the highest priority for public safety measures, as they cause more fatalities and injuries than any other event type. Excessive heat and flash floods, which rank second and third by fatalities among the top ten events, also warrant significant attention.

* Economic Protection Priority: Flood mitigation should be the primary focus for economic protection, followed by hurricane/typhoon, tornado, and storm surge preparedness. Agricultural regions should prioritize drought preparedness.

* Resource Allocation Strategy: Government and municipal managers should adopt differentiated strategies:

  * Tornado-prone areas: Invest in early warning systems, shelters, and public education.

  * Flood-prone regions: Focus on infrastructure protection, zoning regulations, and flood control systems.

  * Coastal areas: Prioritize hurricane and storm surge protection.

  * Agricultural regions: Implement water conservation and drought-resistant farming practices.
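
As a rough check on how dominant tornadoes are, their share of fatalities and injuries can be computed within the ten event types reported in the results. The sketch below uses counts copied from the top-10 health table. The shares are relative to those ten rows only, not to the full dataset, and the `top10` data frame and `share()` helper are hypothetical names introduced for illustration.

```r
# Counts copied from the top-10 health table in the results
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  Total_Fatalities = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  Total_Injuries   = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)

# Hypothetical helper: each value as a fraction of the column total
share <- function(x) x / sum(x)

top10$Fatality_Share <- share(top10$Total_Fatalities)
top10$Injury_Share   <- share(top10$Total_Injuries)

# Tornado row: share of top-10 fatalities and of top-10 injuries
print(top10[top10$EVTYPE == "TORNADO",
            c("EVTYPE", "Fatality_Share", "Injury_Share")], digits = 3)
```

Within these ten event types, tornadoes account for a little under half of the fatalities and roughly three quarters of the injuries, which is why they lead the combined ranking by such a wide margin.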

The analysis demonstrates that different severe weather events pose distinct threats, requiring targeted preparation and resource allocation strategies to maximize public safety and economic protection.