Synopsis

This analysis uses the NOAA Storm Database to identify the severe weather event types that are most harmful across the United States, in terms of both population health and economic consequences. It starts from the raw CSV file so that every step is reproducible. Health impact is measured by reported fatalities and injuries; economic impact is measured by estimated property and crop damage. Processing consists of selecting the relevant columns, converting the damage exponent codes to numeric multipliers, and summarizing impacts by event type. Tornadoes are the most harmful to population health, with nearly 97,000 casualties (fatalities and injuries combined). Excessive heat, thunderstorms, floods, and lightning also contribute significantly to human harm. Economically, floods are the leading cause of damage, with losses exceeding $138 billion, followed by hurricanes, tornadoes, and hail. Ice storms and storm surges also cause substantial financial losses. Bar charts present the top ten event types for health impact and for economic impact. The results can help policymakers and emergency managers prioritize resources for disaster preparedness and mitigation.

Loading Libraries

library(readr)   # For reading CSV and other flat files efficiently
library(dplyr)   # For data manipulation (filter, select, mutate, etc.)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2) # For creating plots and visualizations
library(tidyr)   # For reshaping and tidying data (e.g., pivoting)

Data Processing

# Read the CSV file containing storm data into R
stormData <- read_csv("repdata_data_StormData.csv", show_col_types = FALSE)

# Display the first few rows of the raw storm data
head(stormData)
## # A tibble: 6 × 37
##   STATE__ BGN_DATE   BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
##     <dbl> <chr>      <chr>    <chr>      <dbl> <chr>      <chr> <chr>      <dbl>
## 1       1 4/18/1950… 0130     CST           97 MOBILE     AL    TORNA…         0
## 2       1 4/18/1950… 0145     CST            3 BALDWIN    AL    TORNA…         0
## 3       1 2/20/1951… 1600     CST           57 FAYETTE    AL    TORNA…         0
## 4       1 6/8/1951 … 0900     CST           89 MADISON    AL    TORNA…         0
## 5       1 11/15/195… 1500     CST           43 CULLMAN    AL    TORNA…         0
## 6       1 11/15/195… 2000     CST           77 LAUDERDALE AL    TORNA…         0
## # ℹ 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## #   END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## #   END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <dbl>,
## #   MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## #   PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## #   STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## #   LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
# Select relevant columns for analysis: location, event type, damage, and casualties
clean_stormData <- stormData %>%
  select(COUNTY, STATE, EVTYPE,        # Location and event type
         FATALITIES, INJURIES,         # Human impact
         PROPDMG, PROPDMGEXP,          # Property damage and its magnitude
         CROPDMG, CROPDMGEXP)          # Crop damage and its magnitude

# Display the first few rows of the cleaned dataset
head(clean_stormData)
## # A tibble: 6 × 9
##   COUNTY STATE EVTYPE  FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##    <dbl> <chr> <chr>        <dbl>    <dbl>   <dbl> <chr>        <dbl> <chr>     
## 1     97 AL    TORNADO          0       15    25   K                0 <NA>      
## 2      3 AL    TORNADO          0        0     2.5 K                0 <NA>      
## 3     57 AL    TORNADO          0        2    25   K                0 <NA>      
## 4     89 AL    TORNADO          0        2     2.5 K                0 <NA>      
## 5     43 AL    TORNADO          0        2     2.5 K                0 <NA>      
## 6     77 AL    TORNADO          0        6     2.5 K                0 <NA>
# Generate summary statistics for the cleaned dataset
summary(clean_stormData)  
##      COUNTY         STATE              EVTYPE            FATALITIES       
##  Min.   :  0.0   Length:902297      Length:902297      Min.   :  0.00000  
##  1st Qu.: 31.0   Class :character   Class :character   1st Qu.:  0.00000  
##  Median : 75.0   Mode  :character   Mode  :character   Median :  0.00000  
##  Mean   :100.6                                         Mean   :  0.01678  
##  3rd Qu.:131.0                                         3rd Qu.:  0.00000  
##  Max.   :873.0                                         Max.   :583.00000  
##     INJURIES            PROPDMG         PROPDMGEXP           CROPDMG       
##  Min.   :   0.0000   Min.   :   0.00   Length:902297      Min.   :  0.000  
##  1st Qu.:   0.0000   1st Qu.:   0.00   Class :character   1st Qu.:  0.000  
##  Median :   0.0000   Median :   0.00   Mode  :character   Median :  0.000  
##  Mean   :   0.1557   Mean   :  12.06                      Mean   :  1.527  
##  3rd Qu.:   0.0000   3rd Qu.:   0.50                      3rd Qu.:  0.000  
##  Max.   :1700.0000   Max.   :5000.00                      Max.   :990.000  
##   CROPDMGEXP       
##  Length:902297     
##  Class :character  
##  Mode  :character  
##                    
##                    
## 

Results

Most harmful events with respect to population health:

# Identify the top 10 most harmful event types based on total fatalities and injuries
most_harmful_events <- clean_stormData %>%
  group_by(EVTYPE) %>%  # Group data by event type
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),  # Sum total fatalities per event type
    total_injuries = sum(INJURIES, na.rm = TRUE),      # Sum total injuries per event type
    .groups = "drop"
  ) %>%
  mutate(total_harmful_events = total_fatalities + total_injuries) %>%  # Combine to get total harm
  arrange(desc(total_harmful_events)) %>%  # Sort by most harmful events
  slice(1:10)  # Select the top 10 most harmful events

# Reshape data from wide to long format for easier plotting with ggplot2
most_harmful_events_long <- most_harmful_events %>%
  pivot_longer(
    cols = c(total_fatalities, total_injuries),  # Columns to pivot
    names_to = "event",                          # New column for variable names
    values_to = "value"                          # New column for values
  )
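
To make the reshaping step concrete, here is a minimal sketch on a made-up two-row table. The `toy` data frame and its numbers are hypothetical and are not taken from the storm database. It shows how `pivot_longer()` turns one row per event type into one row per event type and measure, which is the shape `ggplot2` needs to draw dodged bars.

```r
library(tidyr)

# Hypothetical wide table: one row per event type (values invented for illustration)
toy <- data.frame(
  EVTYPE = c("A", "B"),
  total_fatalities = c(10, 3),
  total_injuries = c(200, 40)
)

# Stack the two measure columns into a name column ("event") and a value column ("value")
toy_long <- pivot_longer(
  toy,
  cols = c(total_fatalities, total_injuries),
  names_to = "event",
  values_to = "value"
)

print(toy_long)
```

Each event type now occupies two rows, one per measure, so `fill = event` can distinguish fatalities from injuries in the plot.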

# Create a bar plot showing the top 10 most harmful events, broken down by fatalities and injuries
ggplot(most_harmful_events_long, aes(
  fill = event,                # Use different fill colors for fatalities and injuries
  y = value,                   # Height of bars based on the value (count)
  x = reorder(EVTYPE, -value)  # Order event types by descending value
)) +
  geom_bar(position = "dodge", stat = "identity") +  # Create side-by-side bars
  scale_y_log10() +                                  # Use log scale for better visibility of differences
  theme_minimal() +                                  # Apply a clean theme
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) +  # Rotate x-axis labels for readability
  xlab("Top 10 harmful events") +                    # X-axis label
  ylab("Count (log scale)") +                        # Y-axis label
  ggtitle("Top 10 harmful events for population health in USA")  # Plot title

Events with the greatest economic consequences across the United States:

# View the unique property and crop damage exponents used in the dataset
unique(clean_stormData$PROPDMGEXP)
##  [1] "K" "M" NA  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(clean_stormData$CROPDMGEXP)
## [1] NA  "M" "K" "m" "B" "?" "0" "k" "2"
# Create a lookup table to map exponent symbols to their numeric multipliers
exp_map <- setNames(
  c(1e3, 1e3, 1e6, 1e6, 1e9, 1e9,                  # K/k = thousand, M/m = million, B/b = billion
    1, 1,                                          # "" and "0" = multiplier of 1
    1e1, 1e2, 1e3, 1e4, 1e5, 1e6, 1e7, 1e8, 1e9,   # Digits "1" to "9" = powers of ten
    1, 1, 1),                                      # "+", "-", "?" = multiplier of 1
  c("K", "k", "M", "m", "B", "b",
    "", "0",
    "1", "2", "3", "4", "5", "6", "7", "8", "9",
    "+", "-", "?")
)
# Note: "h" and "H" appear in PROPDMGEXP but are not mapped here, so they yield NA
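
The lookup relies on indexing a named vector by character codes. The sketch below illustrates that mechanism on a small stand-in table; `demo_map` and `codes` are hypothetical names, and `demo_map` contains only a subset of the mapping. It also shows what happens to a code that has no entry and to a missing value.

```r
# Small stand-in for the lookup table (subset of the mapping, for illustration only)
demo_map <- c(K = 1e3, M = 1e6, B = 1e9, "0" = 1)

# Hypothetical exponent codes, including an unmapped code ("H") and a missing value
codes <- c("K", "M", "H", NA, "0")

# Indexing by name returns the multiplier, or NA when the code is unmapped or missing
multipliers <- unname(demo_map[codes])
print(multipliers)
```

Both the unmapped code and the missing value come back as `NA`, so any damage figure multiplied by them also becomes `NA`.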

# Apply the multiplier mapping and calculate total damage values
clean_stormData <- clean_stormData %>%
  mutate(
    prop_multiplier = exp_map[as.character(PROPDMGEXP)],  # Get numeric multiplier for property damage
    crop_multiplier = exp_map[as.character(CROPDMGEXP)],  # Get numeric multiplier for crop damage
    prop_cost = PROPDMG * prop_multiplier,  # Compute actual property damage
    crop_cost = CROPDMG * crop_multiplier,  # Compute actual crop damage
    total_cost = prop_cost + crop_cost      # Compute total economic damage
  )
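
One caveat with this step follows from how R handles missing values: an `NA` or unmapped exponent yields an `NA` multiplier, and `NA` propagates through `*` and `+`. A row with a valid property cost but a missing crop exponent therefore gets `total_cost = NA` and is dropped by `sum(..., na.rm = TRUE)`. The sketch below uses a hypothetical three-row table to show the effect, together with one possible NA-safe variant based on `dplyr::coalesce()`. Treating a missing cost as 0 is an assumption made for illustration, not something the data documentation is quoted as stating here.

```r
library(dplyr)

# Hypothetical costs after the multiplier step (NA where the exponent was missing)
toy <- tibble(
  prop_cost = c(25000, 2500, NA),
  crop_cost = c(NA, 1000, 500)
)

toy <- toy %>%
  mutate(
    total_naive = prop_cost + crop_cost,                           # NA if either side is NA
    total_safe  = coalesce(prop_cost, 0) + coalesce(crop_cost, 0)  # Assumption: missing cost counts as 0
  )

naive_sum <- sum(toy$total_naive, na.rm = TRUE)  # Only the second row contributes
safe_sum  <- sum(toy$total_safe)                 # All three rows contribute
print(c(naive = naive_sum, safe = safe_sum))
```

In this toy table only one of the three rows contributes to the naive total. Whether the same effect changes the totals or rankings reported in this document has not been checked.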

# Aggregate economic damages by event type
economic_impact <- clean_stormData %>%
  group_by(EVTYPE) %>%
  summarise(
    total_property_damage = sum(prop_cost, na.rm = TRUE),  # Total property damage per event
    total_crop_damage = sum(crop_cost, na.rm = TRUE),      # Total crop damage per event
    total_economic_damage = sum(total_cost, na.rm = TRUE), # Combined damage
    .groups = "drop"
  ) %>%
  arrange(desc(total_economic_damage))  # Sort events by highest total economic loss

# Select the top 10 events with the highest economic impact
top_economic <- head(economic_impact, 10)

# Reshape the data to long format for plotting side-by-side bars
economic_damage_long <- top_economic %>%
  pivot_longer(
    cols = c(total_property_damage, total_crop_damage),  # Columns to pivot
    names_to = "type",     # New column to store damage type
    values_to = "value"    # New column to store damage values
  )

# Plot the top 10 economically damaging events
ggplot(economic_damage_long, aes(
  fill = type,                          # Fill bars by damage type (property or crop)
  y = value / 1e6,                      # Convert damage values to millions for easier reading
  x = reorder(EVTYPE, -value)          # Order event types by descending damage
)) +
  geom_bar(position = "dodge", stat = "identity") +  # Side-by-side bar chart
  scale_y_log10() +                                  # Use log scale for better comparison
  theme_minimal() +                                  # Clean, minimal theme
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) +  # Rotate x-axis labels
  xlab("Top 10 harmful events") +                    # X-axis label
  ylab("Damage (in million USD, log scale)") +       # Y-axis label
  ggtitle("Top 10 harmful events for economic loss in USA") +  # Plot title
  scale_fill_manual(                                 # Custom colors and labels for damage types
    values = c("total_property_damage" = "steelblue",
               "total_crop_damage" = "darkgreen"),
    breaks = c("total_property_damage", "total_crop_damage"),  # Fix the order so labels match the right series
    labels = c("Property Damage", "Crop Damage"),
    name = "Damage Type"
  )

Conclusion

This analysis shows how the impact of severe weather events across the United States differs between population health and economic loss. Tornadoes emerge as the deadliest and most injurious event type, with the highest combined number of fatalities and injuries. Excessive heat, thunderstorms, and floods also contribute significantly to human harm. Economically, floods cause the greatest damage, followed by hurricanes, tornadoes, and hail. These findings support targeting disaster preparedness and resource allocation at the most damaging event types, so that policymakers and emergency managers can prioritize the mitigation strategies that best protect communities. Continued analysis and monitoring of storm data remain essential for improving resilience against future severe weather, and the results here provide a data-driven foundation for decisions aimed at reducing its impact on public safety and economic stability.