Synopsis

This analysis investigates the NOAA Storm Database to identify the most harmful severe weather events across the United States, focusing on both population health and economic consequences. The study uses raw data directly from the CSV file to ensure a fully reproducible analysis. Event severity is measured by reported fatalities, injuries, and estimated property and crop damage costs. Data processing involves cleaning, transforming damage exponents to numeric values, and summarizing impacts by event type. The results reveal that tornadoes are the most damaging to population health, causing nearly 97,000 casualties. Excessive heat, thunderstorms, floods, and lightning also contribute significantly to human harm. Economically, floods are the leading cause of damage with losses exceeding $138 billion, followed by hurricanes, tornadoes, and hail. Other events such as ice storms and storm surges also result in substantial financial costs. Visualizations present the top ten event types in both health and economic impact categories. This report provides valuable insights for policymakers and emergency managers to prioritize resources for disaster preparedness and mitigation efforts.

Data Processing

library(readr)   # For reading CSV and other flat files efficiently
library(dplyr)   # For data manipulation (filter, select, mutate, etc.)
##
## Attaching package: ‘dplyr’
## The following objects are masked from ‘package:stats’:
##
##     filter, lag
## The following objects are masked from ‘package:base’:
##
##     intersect, setdiff, setequal, union
library(ggplot2) # For creating plots and visualizations
library(tidyr)   # For reshaping and tidying data (e.g., pivoting)


Load the data and select relevant columns

Read the CSV file containing storm data into R

stormData <- read_csv("repdata_data_StormData.csv", show_col_types = FALSE)

Display the first few rows of the raw storm data

head(stormData)
## # A tibble: 6 × 37
##   STATE__ BGN_DATE   BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## 1       1 4/18/1950… 0130     CST           97 MOBILE     AL    TORNA…         0
## 2       1 4/18/1950… 0145     CST            3 BALDWIN    AL    TORNA…         0
## 3       1 2/20/1951… 1600     CST           57 FAYETTE    AL    TORNA…         0
## 4       1 6/8/1951 … 0900     CST           89 MADISON    AL    TORNA…         0
## 5       1 11/15/195… 1500     CST           43 CULLMAN    AL    TORNA…         0
## 6       1 11/15/195… 2000     CST           77 LAUDERDALE AL    TORNA…         0
## # ℹ 28 more variables: BGN_AZI, BGN_LOCATI, END_DATE,
## #   END_TIME, COUNTY_END, COUNTYENDN, END_RANGE,
## #   END_AZI, END_LOCATI, LENGTH, WIDTH, F,
## #   MAG, FATALITIES, INJURIES, PROPDMG,
## #   PROPDMGEXP, CROPDMG, CROPDMGEXP, WFO,
## #   STATEOFFIC, ZONENAMES, LATITUDE, LONGITUDE,
## #   LATITUDE_E, LONGITUDE_, REMARKS, REFNUM

# Select relevant columns for analysis: location, event type, damage, and casualties
clean_stormData <- stormData %>%
  select(COUNTY, STATE, EVTYPE,     # Location and event type
         FATALITIES, INJURIES,      # Human impact
         PROPDMG, PROPDMGEXP,       # Property damage and its magnitude
         CROPDMG, CROPDMGEXP)       # Crop damage and its magnitude

Display the first few rows of the cleaned dataset

head(clean_stormData)
## # A tibble: 6 × 9
##   COUNTY STATE EVTYPE  FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1     97 AL    TORNADO          0       15    25   K                0 <NA>
## 2      3 AL    TORNADO          0        0     2.5 K                0 <NA>
## 3     57 AL    TORNADO          0        2    25   K                0 <NA>
## 4     89 AL    TORNADO          0        2     2.5 K                0 <NA>
## 5     43 AL    TORNADO          0        2     2.5 K                0 <NA>
## 6     77 AL    TORNADO          0        6     2.5 K                0 <NA>

# Generate summary statistics for the cleaned dataset
summary(clean_stormData)
##      COUNTY          STATE              EVTYPE            FATALITIES
##  Min.   :  0.0   Length:902297      Length:902297      Min.   :  0.00000
##  1st Qu.: 31.0   Class :character   Class :character   1st Qu.:  0.00000
##  Median : 75.0   Mode  :character   Mode  :character   Median :  0.00000
##  Mean   :100.6                                         Mean   :  0.01678
##  3rd Qu.:131.0                                         3rd Qu.:  0.00000
##  Max.   :873.0                                         Max.   :583.00000
##     INJURIES            PROPDMG          PROPDMGEXP           CROPDMG
##  Min.   :   0.0000   Min.   :   0.00   Length:902297      Min.   :  0.000
##  1st Qu.:   0.0000   1st Qu.:   0.00   Class :character   1st Qu.:  0.000
##  Median :   0.0000   Median :   0.00   Mode  :character   Median :  0.000
##  Mean   :   0.1557   Mean   :  12.06                      Mean   :  1.527
##  3rd Qu.:   0.0000   3rd Qu.:   0.50                      3rd Qu.:  0.000
##  Max.   :1700.0000   Max.   :5000.00                      Max.   :990.000
##   CROPDMGEXP
##  Length:902297
##  Class :character
##  Mode  :character
##
##
##

Results

Most harmful events with respect to population health:

# Identify the top 10 most harmful event types based on total fatalities and injuries
most_harmful_events <- clean_stormData %>%
  group_by(EVTYPE) %>%                                                  # Group data by event type
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),                   # Sum total fatalities per event type
    total_injuries = sum(INJURIES, na.rm = TRUE),                       # Sum total injuries per event type
    .groups = "drop"
  ) %>%
  mutate(total_harmful_events = total_fatalities + total_injuries) %>%  # Combine to get total harm
  arrange(desc(total_harmful_events)) %>%                               # Sort by most harmful events
  slice(1:10)                                                           # Select the top 10 most harmful events
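To make the group, sum, and rank steps concrete, here is a minimal sketch on a made-up five-row table. It uses base R `aggregate()` rather than the dplyr pipeline from the report, and the data frame `toy` and the column `casualties` are illustrative names, not part of the original analysis.

```r
# Toy data (invented for illustration; not taken from the NOAA file)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(1, 0, 2, 3, 1),
  INJURIES = c(10, 2, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type.
# Note: the formula interface drops rows containing NA, which is not
# identical to sum(..., na.rm = TRUE) applied to each column separately.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combine both measures and rank event types from most to least harmful
totals$casualties <- totals$FATALITIES + totals$INJURIES
ranked <- totals[order(-totals$casualties), ]

# Keep the top 2 here; the report keeps the top 10
top <- head(ranked, 2)
print(top)
```

In this toy table TORNADO ranks first with 18 combined casualties, followed by FLOOD with 4.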

Reshape data from wide to long format for easier plotting with ggplot2

most_harmful_events_long <- most_harmful_events %>%
  pivot_longer(
    cols = c(total_fatalities, total_injuries),  # Columns to pivot
    names_to = "event",                          # New column for variable names
    values_to = "value"                          # New column for values
  )

Create a bar plot showing the top 10 most harmful events, broken down by fatalities and injuries

ggplot(most_harmful_events_long, aes(
  fill = event,                      # Use different fill colors for fatalities and injuries
  y = value,                         # Height of bars based on the value (count)
  x = reorder(EVTYPE, -value)        # Order event types by descending value
)) +
  geom_bar(position = "dodge", stat = "identity") +              # Create side-by-side bars
  scale_y_log10() +                                              # Use log scale for better visibility of differences
  theme_minimal() +                                              # Apply a clean theme
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) +   # Rotate x-axis labels for readability
  xlab("Top 10 harmful events") +                                # X-axis label
  ylab("Count (log scale)") +                                    # Y-axis label
  ggtitle("Top 10 harmful events for population health in USA")  # Plot title

Events with the greatest economic consequences:

View the unique property and crop damage exponents used in the dataset

unique(clean_stormData$PROPDMGEXP)
##  [1] "K" "M" NA  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(stormData$CROPDMGEXP)
## [1] NA  "M" "K" "m" "B" "?" "0" "k" "2"

# Create a lookup table to map exponent symbols to their numeric multipliers
exp_map <- setNames(
  c(1e3, 1e3, 1e6, 1e6,         # K/k = thousand, M/m = million
    1e9, 1e9, 1, 1,             # B/b = billion, "" and "0" = 1
    10, 100, 1000, 10000,       # Numeric exponents: "1" = 10, "2" = 100, etc.
    1e5, 1e6, 1e7, 1e8, 1e9,    # Continuing numeric values
    1, 1, 1),                   # Symbols like "+", "-", "?" are treated as multiplier of 1
  c("K", "k", "M", "m", "B", "b", "", "0",
    "1", "2", "3", "4", "5", "6", "7", "8", "9",
    "+", "-", "?")
)
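Before relying on a lookup table like this, it can help to count how often each exponent code occurs and to list codes that the table does not cover. The sketch below does this in base R on a small invented vector; `codes`, `known`, `code_counts`, and `unmapped` are illustrative names, and `known` simply repeats the names used in `exp_map` above.

```r
# Invented sample of exponent codes (the real columns have 902,297 entries)
codes <- c("K", "K", "M", NA, "h", "K", NA, "?")

# Frequency of each code, including missing values
code_counts <- sort(table(codes, useNA = "ifany"), decreasing = TRUE)
print(code_counts)

# Codes covered by the lookup table in the report
known <- c("K", "k", "M", "m", "B", "b", "", "0",
           "1", "2", "3", "4", "5", "6", "7", "8", "9",
           "+", "-", "?")

# Codes present in the data but absent from the lookup table
unmapped <- setdiff(unique(codes), known)
print(unmapped)
```

On this sample the unmapped codes are `NA` and "h". Both also appear in the `unique()` output of the real data, so the same check on the full columns would show how many rows they affect.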

Apply the multiplier mapping and calculate total damage values

clean_stormData <- clean_stormData %>%
  mutate(
    # Codes missing from exp_map (NA, "h", "H") yield an NA multiplier
    prop_multiplier = exp_map[as.character(PROPDMGEXP)],  # Get numeric multiplier for property damage
    crop_multiplier = exp_map[as.character(CROPDMGEXP)],  # Get numeric multiplier for crop damage
    prop_cost = PROPDMG * prop_multiplier,                # Compute actual property damage
    crop_cost = CROPDMG * crop_multiplier,                # Compute actual crop damage
    total_cost = prop_cost + crop_cost                    # Total economic damage; NA if either component is NA
  )
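Because a missing or unrecognised exponent produces an NA multiplier, `total_cost` becomes NA whenever either cost is NA, and `sum(..., na.rm = TRUE)` then drops the whole row from the combined total. One possible way to avoid this is sketched below. The helper `exp_to_multiplier()` is hypothetical and not part of the original analysis; it assumes that "H"/"h" means hundred and that missing or unrecognised codes should fall back to a multiplier of 1. Using it would change the reported totals.

```r
# Hypothetical helper (an assumption, not the report's method): map exponent
# codes to multipliers, falling back to 1 for NA or unrecognised codes.
exp_to_multiplier <- function(code) {
  lookup <- c(
    H = 1e2, K = 1e3, M = 1e6, B = 1e9,        # "H" = hundred is an assumption
    setNames(10^(0:9), as.character(0:9))      # "0" = 1, "1" = 10, ..., "9" = 1e9
  )
  code <- as.character(code)
  code[is.na(code)] <- ""                      # Treat missing codes as "no exponent"
  mult <- unname(lookup[toupper(code)])        # Case-insensitive lookup by name
  mult[is.na(mult)] <- 1                       # "", "+", "-", "?" and others fall back to 1
  mult
}

# Invented rows covering the awkward cases: lower case, NA, "h", and "?"
toy <- data.frame(
  PROPDMG = c(25, 2.5, 10, 4),
  PROPDMGEXP = c("K", "m", NA, "h"),
  CROPDMG = c(0, 1, 0, 2),
  CROPDMGEXP = c(NA, "K", NA, "?"),
  stringsAsFactors = FALSE
)

toy$prop_cost <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy$crop_cost <- toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
toy$total_cost <- toy$prop_cost + toy$crop_cost  # No NA values remain
print(toy)
```

With this fallback every row contributes to the combined total. Whether a multiplier of 1 is the right reading of a missing code is a judgment call that should be checked against the database documentation.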

Aggregate economic damages by event type

economic_impact <- clean_stormData %>%
  group_by(EVTYPE) %>%
  summarise(
    total_property_damage = sum(prop_cost, na.rm = TRUE),   # Total property damage per event
    total_crop_damage = sum(crop_cost, na.rm = TRUE),       # Total crop damage per event
    total_economic_damage = sum(total_cost, na.rm = TRUE),  # Combined damage
    .groups = "drop"
  ) %>%
  arrange(desc(total_economic_damage))                      # Sort events by highest total economic loss

Select the top 10 events with the highest economic impact

top_economic <- head(economic_impact, 10)

Reshape the data to long format for plotting side-by-side bars

economic_damage_long <- top_economic %>%
  pivot_longer(
    cols = c(total_property_damage, total_crop_damage),  # Columns to pivot
    names_to = "type",                                   # New column to store damage type
    values_to = "value"                                  # New column to store damage values
  )

Plot the top 10 economically damaging events

ggplot(economic_damage_long, aes(
  fill = type,                       # Fill bars by damage type (property or crop)
  y = value / 1e6,                   # Convert damage values to millions for easier reading
  x = reorder(EVTYPE, -value)        # Order event types by descending damage
)) +
  geom_bar(position = "dodge", stat = "identity") +              # Side-by-side bar chart
  scale_y_log10() +                                              # Use log scale for better comparison
  theme_minimal() +                                              # Clean, minimal theme
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) +   # Rotate x-axis labels
  xlab("Top 10 harmful events") +                                # X-axis label
  ylab("Damage (in million USD, log scale)") +                   # Y-axis label
  ggtitle("Top 10 harmful events for economic loss in USA") +    # Plot title
  scale_fill_manual(                                             # Custom colors and labels for damage types
    values = c("total_property_damage" = "steelblue", "total_crop_damage" = "darkgreen"),
    breaks = c("total_property_damage", "total_crop_damage"),    # Fix the legend order so each label matches its series
    labels = c("Property Damage", "Crop Damage"),
    name = "Damage Type"
  )

Summary

This analysis highlights how the impact of severe weather events across the United States varies by event type, for both population health and economic loss. Tornadoes emerge as the deadliest and most injurious event type, causing the highest number of combined fatalities and injuries. Excessive heat, thunderstorms, and floods also contribute significantly to human harm. From an economic perspective, floods cause the greatest financial damage, followed by hurricanes, tornadoes, and hail. These findings point to the need for disaster preparedness and resource allocation targeted at the most damaging event types. Knowing which weather hazards cause the most harm lets policymakers and emergency managers prioritize mitigation strategies to better protect communities. Continued analysis and monitoring of storm data remain essential for improving resilience against future severe weather.