This analysis uses the NOAA Storm Database to identify which severe weather events across the United States are most harmful to population health and which have the greatest economic consequences. It starts from the raw CSV file so that every step can be reproduced. Harm to population health is measured by reported fatalities and injuries, and economic consequences by estimated property and crop damage costs. Data processing consists of cleaning the data, converting the damage exponent codes to numeric multipliers, and summarizing impacts by event type. Tornadoes are the most harmful to population health, causing nearly 97,000 casualties (fatalities plus injuries); excessive heat, thunderstorms, floods, and lightning also cause significant harm. Floods cause the greatest economic damage, with losses exceeding $138 billion, followed by hurricanes, tornadoes, and hail; ice storms and storm surges also result in substantial costs. Bar charts show the top ten event types for each category. These results can help policymakers and emergency managers prioritize resources for disaster preparedness and mitigation.
```r
library(readr)   # For reading CSV and other flat files efficiently
library(dplyr)   # For data manipulation (filter, select, mutate, etc.)
library(ggplot2) # For creating plots and visualizations
library(tidyr)   # For reshaping and tidying data (e.g., pivoting)
```
```r
stormData <- read_csv("repdata_data_StormData.csv", show_col_types = FALSE)
head(stormData)
## # A tibble: 6 × 37
##   STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
```
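The code that builds `clean_stormData` is not shown in this report. The printed outputs suggest it keeps nine analysis columns and all 902,297 rows, so a plain `select()` is one plausible reconstruction. The sketch below assumes exactly that; the helper name `select_analysis_columns()` is hypothetical, and the small tibble only stands in for `stormData`.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not from the original report): keep only the nine
# analysis columns, in the order printed by head(clean_stormData).
# Assumes no rows are filtered out, matching the 902,297 rows in summary().
select_analysis_columns <- function(storm_df) {
  storm_df %>%
    select(COUNTY, STATE, EVTYPE,
           FATALITIES, INJURIES,
           PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
}

# Toy stand-in for stormData; STATE__ and REFNUM play the role of the
# columns that are dropped
toy_storm <- tibble(
  STATE__ = c(1, 1), COUNTY = c(97, 3), STATE = c("AL", "AL"),
  EVTYPE = c("TORNADO", "TORNADO"), FATALITIES = c(0, 0),
  INJURIES = c(15, 0), PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "K"),
  CROPDMG = c(0, 0), CROPDMGEXP = c(NA_character_, NA_character_),
  REFNUM = c(1, 2)
)

toy_clean <- select_analysis_columns(toy_storm)
print(toy_clean)
```

Applied to the real data, the call would be `clean_stormData <- select_analysis_columns(stormData)`.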
```r
head(clean_stormData)
## # A tibble: 6 × 9
##   COUNTY STATE EVTYPE  FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
##    <dbl> <chr> <chr>        <dbl>    <dbl>   <dbl> <chr>        <dbl> <chr>
## 1     97 AL    TORNADO          0       15    25   K                0 <NA>
## 2      3 AL    TORNADO          0        0     2.5 K                0 <NA>
## 3     57 AL    TORNADO          0        2    25   K                0 <NA>
## 4     89 AL    TORNADO          0        2     2.5 K                0 <NA>
## 5     43 AL    TORNADO          0        2     2.5 K                0 <NA>
## 6     77 AL    TORNADO          0        6     2.5 K                0 <NA>
```
```r
summary(clean_stormData)
##      COUNTY         STATE              EVTYPE            FATALITIES
##  Min.   :  0.0   Length:902297      Length:902297      Min.   :  0.00000
##  1st Qu.: 31.0   Class :character   Class :character   1st Qu.:  0.00000
##  Median : 75.0   Mode  :character   Mode  :character   Median :  0.00000
##  Mean   :100.6                                         Mean   :  0.01678
##  3rd Qu.:131.0                                         3rd Qu.:  0.00000
##  Max.   :873.0                                         Max.   :583.00000
##     INJURIES            PROPDMG         PROPDMGEXP           CROPDMG
##  Min.   :   0.0000   Min.   :   0.00   Length:902297      Min.   :  0.000
##  1st Qu.:   0.0000   1st Qu.:   0.00   Class :character   1st Qu.:  0.000
##  Median :   0.0000   Median :   0.00   Mode  :character   Median :  0.000
##  Mean   :   0.1557   Mean   :  12.06                      Mean   :  1.527
##  3rd Qu.:   0.0000   3rd Qu.:   0.50                      3rd Qu.:  0.000
##  Max.   :1700.0000   Max.   :5000.00                      Max.   :990.000
##   CROPDMGEXP
##  Length:902297
##  Class :character
##  Mode  :character
```
Most harmful events with respect to population health:

```r
# Identify the 10 event types with the highest combined fatalities and injuries
most_harmful_events <- clean_stormData %>%
  group_by(EVTYPE) %>%                                  # Group data by event type
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),   # Total fatalities per event type
    total_injuries   = sum(INJURIES, na.rm = TRUE),     # Total injuries per event type
    .groups = "drop"
  ) %>%
  mutate(total_harmful_events = total_fatalities + total_injuries) %>% # Combined harm
  arrange(desc(total_harmful_events)) %>%               # Most harmful first
  slice(1:10)                                           # Keep the top 10
```
```r
most_harmful_events_long <- most_harmful_events %>%
  pivot_longer(
    cols = c(total_fatalities, total_injuries), # Columns to pivot
    names_to = "event",                          # New column for variable names
    values_to = "value"                          # New column for values
  )
```
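`pivot_longer()` stacks the two count columns so that each event type contributes one row per measure, which is the long shape `geom_bar(position = "dodge")` uses to draw side-by-side bars. A toy illustration with hypothetical event names and counts (not NOAA data):

```r
library(tidyr)
library(tibble)

# Hypothetical summary in the same wide shape as most_harmful_events
toy_harm <- tibble(
  EVTYPE = c("EVENT A", "EVENT B"),
  total_fatalities = c(10, 3),
  total_injuries   = c(200, 40)
)

# One row per (event type, measure) pair: 2 events x 2 measures = 4 rows
toy_harm_long <- pivot_longer(
  toy_harm,
  cols = c(total_fatalities, total_injuries),
  names_to = "event",
  values_to = "value"
)

print(toy_harm_long)
```

Each original row is expanded in place, so the fatalities and injuries for one event type end up next to each other.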
```r
ggplot(most_harmful_events_long, aes(
    fill = event,                 # Different fill colours for fatalities and injuries
    y = value,                    # Bar height is the count
    x = reorder(EVTYPE, -value)   # Order event types by descending combined count
  )) +
  geom_bar(position = "dodge", stat = "identity") + # Side-by-side bars
  scale_y_log10() +               # Log scale keeps the smaller counts visible
  theme_minimal() +               # Apply a clean theme
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) + # Rotate x-axis labels
  xlab("Top 10 harmful events") +
  ylab("Count (log scale)") +
  ggtitle("Top 10 harmful events for population health in USA")
```

Events with the greatest economic consequences:
```r
unique(clean_stormData$PROPDMGEXP)
## [1] "K" "M" NA  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(stormData$CROPDMGEXP)
## [1] NA  "M" "K" "m" "B" "?" "0" "k" "2"

# Lookup table mapping exponent symbols to numeric multipliers.
# Blank exponents are read as NA by read_csv() and have no entry here.
exp_map <- setNames(
  c(1e3, 1e3, 1e6, 1e6,        # K/k = thousand, M/m = million
    1e9, 1e9, 1e2, 1e2,        # B/b = billion, H/h = hundred (assumed)
    1, 10, 100, 1e3, 1e4,      # Numeric exponents: "0" = 1, "1" = 10, ..., "4" = 1e4
    1e5, 1e6, 1e7, 1e8, 1e9,   # "5" = 1e5, ..., "9" = 1e9
    1, 1, 1),                  # "+", "-", "?" are treated as a multiplier of 1
  c("K", "k", "M", "m",
    "B", "b", "H", "h",
    "0", "1", "2", "3", "4",
    "5", "6", "7", "8", "9",
    "+", "-", "?")
)
```
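Looking up multipliers by indexing a named vector with character keys has two behaviours that matter here. A key with no matching name (for example `"h"` in a table that lacks it) returns `NA`, and neither an `NA` key nor an empty string `""` ever matches a name, even if the table contains an entry named `""` (see R's `?Extract`). Since `read_csv()` reads blank exponents as `NA`, those rows get an `NA` multiplier. The toy sketch below uses a small hypothetical table to demonstrate this and shows one way to fall back to a default multiplier with `dplyr::coalesce()`; the fallback value of 1 is an assumption.

```r
library(dplyr)

# Small hypothetical lookup table in the same style as exp_map,
# including an entry named "" to show that it is never matched
toy_map <- setNames(c(1e3, 1e6, 1), c("K", "M", ""))

codes <- c("K", "M", NA, "", "h")

# Plain lookup: NA, "" and the unknown code "h" all come back as NA
raw_multiplier <- unname(toy_map[codes])
print(raw_multiplier)

# Fallback: treat any code without a match as a multiplier of 1 (an assumption)
safe_multiplier <- coalesce(raw_multiplier, 1)
print(safe_multiplier)
```

Whether a missing exponent should mean "times 1" or "no usable value" is a judgment call; the point of the sketch is only that the plain lookup turns it into `NA` silently.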
```r
clean_stormData <- clean_stormData %>%
  mutate(
    # Unmatched or missing (NA) exponents have no entry in exp_map; treat them
    # as a multiplier of 1 (an assumption) instead of letting NA drop the row
    prop_multiplier = coalesce(unname(exp_map[PROPDMGEXP]), 1), # Multiplier for property damage
    crop_multiplier = coalesce(unname(exp_map[CROPDMGEXP]), 1), # Multiplier for crop damage
    prop_cost  = PROPDMG * prop_multiplier, # Property damage in USD
    crop_cost  = CROPDMG * crop_multiplier, # Crop damage in USD
    total_cost = prop_cost + crop_cost      # Total economic damage in USD
  )
```
```r
economic_impact <- clean_stormData %>%
  group_by(EVTYPE) %>%
  summarise(
    total_property_damage = sum(prop_cost, na.rm = TRUE), # Total property damage per event type
    total_crop_damage     = sum(crop_cost, na.rm = TRUE), # Total crop damage per event type
    # Add the two sums instead of summing total_cost, so that a missing value
    # in one component does not discard the other
    total_economic_damage = total_property_damage + total_crop_damage,
    .groups = "drop"
  ) %>%
  arrange(desc(total_economic_damage)) # Highest total economic loss first
```
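When property and crop costs are combined into one total, R's missing-value rules matter: `NA + x` is `NA`, and `sum(..., na.rm = TRUE)` then drops that whole row, including its valid property or crop component. The toy sketch below (hypothetical numbers, not NOAA data) compares summing a row-wise total with adding the two separately summed components; only the second keeps every non-missing dollar.

```r
library(dplyr)
library(tibble)

# Hypothetical rows: the second row has a valid property cost but a missing crop cost
toy_costs <- tibble(
  EVTYPE    = c("FLOOD", "FLOOD", "HAIL"),
  prop_cost = c(5e6, 2e6, 1e5),
  crop_cost = c(1e6, NA, 0)
)

toy_totals <- toy_costs %>%
  mutate(total_cost = prop_cost + crop_cost) %>%  # NA for the second row
  group_by(EVTYPE) %>%
  summarise(
    total_property_damage = sum(prop_cost, na.rm = TRUE),
    total_crop_damage     = sum(crop_cost, na.rm = TRUE),
    # Row-wise total: silently loses the second row's 2e6 property damage
    rowwise_total   = sum(total_cost, na.rm = TRUE),
    # Sum of components: keeps every non-missing value
    component_total = total_property_damage + total_crop_damage,
    .groups = "drop"
  )

print(toy_totals)
```

For FLOOD the row-wise total is 6 million while the component total is 8 million; for HAIL, which has no missing values, both agree.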
```r
top_economic <- head(economic_impact, 10) # Keep the 10 costliest event types

economic_damage_long <- top_economic %>%
  pivot_longer(
    cols = c(total_property_damage, total_crop_damage), # Columns to pivot
    names_to = "type",                                   # New column to store damage type
    values_to = "value"                                  # New column to store damage values
  )
```
```r
ggplot(economic_damage_long, aes(
    fill = type,                  # Fill bars by damage type (property or crop)
    y = value / 1e6,              # Convert damage values to millions of USD
    x = reorder(EVTYPE, -value)   # Order event types by descending damage
  )) +
  geom_bar(position = "dodge", stat = "identity") + # Side-by-side bar chart
  scale_y_log10() +               # Log scale for easier comparison
  theme_minimal() +               # Clean, minimal theme
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) + # Rotate x-axis labels
  xlab("Top 10 harmful events") +
  ylab("Damage (in million USD, log scale)") +
  ggtitle("Top 10 harmful events for economic loss in USA") +
  scale_fill_manual(
    values = c("total_property_damage" = "steelblue",
               "total_crop_damage" = "darkgreen"),
    # Set breaks explicitly: an unnamed labels vector is matched to the breaks by
    # position, and the default breaks are sorted alphabetically (crop first)
    breaks = c("total_property_damage", "total_crop_damage"),
    labels = c("Property Damage", "Crop Damage"),
    name = "Damage Type"
  )
```
# Summary

This analysis compares the impact of severe weather events across the United States on population health and on the economy. Tornadoes are the deadliest and most injurious event type, with the highest combined count of fatalities and injuries; excessive heat, thunderstorms, and floods also cause substantial harm. Economically, floods cause the greatest financial damage, followed by hurricanes, tornadoes, and hailstorms. These findings indicate where disaster preparedness and resource allocation could be targeted: knowing which hazards cause the most harm helps policymakers and emergency managers prioritize mitigation strategies that protect communities. Continued monitoring and analysis of storm data will be needed to keep these priorities current and to improve resilience against future severe weather.