This analysis investigates the NOAA Storm Database to identify the most harmful severe weather events across the United States, focusing on both population health and economic consequences. The study reads the raw CSV file directly so that the analysis is fully reproducible. Event severity is measured by reported fatalities, injuries, and estimated property and crop damage costs. Data processing involves three steps: selecting the relevant columns, converting the damage exponent codes (such as K, M, and B) to numeric multipliers, and summarizing impacts by event type. The results show that tornadoes are the most harmful to population health, causing nearly 97,000 casualties (fatalities plus injuries). Excessive heat, thunderstorms, floods, and lightning also contribute significantly to human harm. Economically, floods are the leading cause of damage, with losses exceeding $138 billion, followed by hurricanes, tornadoes, and hail. Other events, such as ice storms and storm surges, also cause substantial financial losses. Visualizations present the top ten event types in both the health and economic impact categories. These rankings can help policymakers and emergency managers prioritize resources for disaster preparedness and mitigation.
library(readr) # For reading CSV and other flat files efficiently
library(dplyr) # For data manipulation (filter, select, mutate, etc.)
library(ggplot2) # For creating plots and visualizations
library(tidyr) # For reshaping and tidying data (e.g., pivoting)
# Read the CSV file containing storm data into R
stormData <- read_csv("repdata_data_StormData.csv", show_col_types = FALSE)
# Display the first few rows of the raw storm data
head(stormData)
## # A tibble: 6 × 37
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
## <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl>
## 1 1 4/18/1950… 0130 CST 97 MOBILE AL TORNA… 0
## 2 1 4/18/1950… 0145 CST 3 BALDWIN AL TORNA… 0
## 3 1 2/20/1951… 1600 CST 57 FAYETTE AL TORNA… 0
## 4 1 6/8/1951 … 0900 CST 89 MADISON AL TORNA… 0
## 5 1 11/15/195… 1500 CST 43 CULLMAN AL TORNA… 0
## 6 1 11/15/195… 2000 CST 77 LAUDERDALE AL TORNA… 0
## # ℹ 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## # END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## # END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <dbl>,
## # MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## # PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## # STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## # LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
# Select relevant columns for analysis: location, event type, damage, and casualties
clean_stormData <- stormData %>%
  select(COUNTY, STATE, EVTYPE,   # Location and event type
         FATALITIES, INJURIES,    # Human impact
         PROPDMG, PROPDMGEXP,     # Property damage and its magnitude
         CROPDMG, CROPDMGEXP)     # Crop damage and its magnitude
# Display the first few rows of the cleaned dataset
head(clean_stormData)
## # A tibble: 6 × 9
## COUNTY STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <dbl> <chr>
## 1 97 AL TORNADO 0 15 25 K 0 <NA>
## 2 3 AL TORNADO 0 0 2.5 K 0 <NA>
## 3 57 AL TORNADO 0 2 25 K 0 <NA>
## 4 89 AL TORNADO 0 2 2.5 K 0 <NA>
## 5 43 AL TORNADO 0 2 2.5 K 0 <NA>
## 6 77 AL TORNADO 0 6 2.5 K 0 <NA>
# Generate summary statistics for the cleaned dataset
summary(clean_stormData)
## COUNTY STATE EVTYPE FATALITIES
## Min. : 0.0 Length:902297 Length:902297 Min. : 0.00000
## 1st Qu.: 31.0 Class :character Class :character 1st Qu.: 0.00000
## Median : 75.0 Mode :character Mode :character Median : 0.00000
## Mean :100.6 Mean : 0.01678
## 3rd Qu.:131.0 3rd Qu.: 0.00000
## Max. :873.0 Max. :583.00000
## INJURIES PROPDMG PROPDMGEXP CROPDMG
## Min. : 0.0000 Min. : 0.00 Length:902297 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 0.00 Class :character 1st Qu.: 0.000
## Median : 0.0000 Median : 0.00 Mode :character Median : 0.000
## Mean : 0.1557 Mean : 12.06 Mean : 1.527
## 3rd Qu.: 0.0000 3rd Qu.: 0.50 3rd Qu.: 0.000
## Max. :1700.0000 Max. :5000.00 Max. :990.000
## CROPDMGEXP
## Length:902297
## Class :character
## Mode :character
##
##
##
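For character columns, `summary()` reports only `Length`, `Class`, and `Mode`. It therefore does not show how many exponent codes are missing. The following is a minimal sketch of an explicit per-column NA count. It runs on a small hypothetical tibble (`toy`, with invented values) that mimics the columns of `clean_stormData`:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical miniature of clean_stormData; values are invented for illustration
toy <- tibble(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 0, 5),
  PROPDMGEXP = c("K", NA, "M"),
  CROPDMG    = c(0, 0, 1.5),
  CROPDMGEXP = c(NA, NA, "K")
)

# Count missing values in every column, including character columns
na_counts <- toy %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

print(na_counts)
```

On the real data, the same `summarise(across(...))` call can be applied to `clean_stormData` directly.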
# Identify the top 10 most harmful event types based on total fatalities and injuries
most_harmful_events <- clean_stormData %>%
  group_by(EVTYPE) %>%                                  # Group data by event type
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),   # Sum total fatalities per event type
    total_injuries   = sum(INJURIES, na.rm = TRUE),     # Sum total injuries per event type
    .groups = "drop"
  ) %>%
  mutate(total_harmful_events = total_fatalities + total_injuries) %>%  # Combine to get total harm
  arrange(desc(total_harmful_events)) %>%               # Sort by most harmful events
  slice(1:10)                                           # Select the top 10 most harmful events
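The grouping-and-ranking pipeline above can be checked on a small hypothetical dataset. This sketch uses invented numbers. It also replaces `arrange()` plus `slice(1:10)` with dplyr's `slice_max()`, which sorts and truncates in one step. Here `n = 2` fits the toy data:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical event records; numbers are invented for illustration only
toy_events <- tibble(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(2, 0, 1, 0, 3),
  INJURIES   = c(10, 5, 4, 1, NA)
)

toy_harm <- toy_events %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),  # NA injuries are skipped, as in the report
    total_injuries   = sum(INJURIES, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(total_harmful_events = total_fatalities + total_injuries) %>%
  # Sort by combined harm and keep the top n rows in one step
  slice_max(total_harmful_events, n = 2)

print(toy_harm)
```

By default, `slice_max()` keeps ties (`with_ties = TRUE`), so it can return more than `n` rows. Passing `with_ties = FALSE` reproduces the exactly-ten-rows behavior of `slice(1:10)`.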
# Reshape data from wide to long format for easier plotting with ggplot2
most_harmful_events_long <- most_harmful_events %>%
  pivot_longer(
    cols = c(total_fatalities, total_injuries),  # Columns to pivot
    names_to = "event",                          # New column for variable names
    values_to = "value"                          # New column for values
  )
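To see what the reshape produces, here is a minimal sketch on a hypothetical two-row wide table with invented values. After pivoting, each event type contributes one row per measure. This is the shape that lets ggplot2 map the measure name to `fill`:

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

# Hypothetical wide summary: one row per event type, one column per measure
wide <- tibble(
  EVTYPE           = c("TORNADO", "HEAT"),
  total_fatalities = c(2, 4),
  total_injuries   = c(15, 4)
)

# Stack the two measure columns into name/value pairs
long <- wide %>%
  pivot_longer(
    cols = c(total_fatalities, total_injuries),
    names_to = "event",
    values_to = "value"
  )

print(long)
```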
# Create a bar plot showing the top 10 most harmful events, broken down by fatalities and injuries
ggplot(most_harmful_events_long, aes(
  fill = event,                   # Use different fill colors for fatalities and injuries
  y = value,                      # Height of bars based on the value (count)
  x = reorder(EVTYPE, -value)     # Order event types by descending total harm (reorder() averages the two bars)
)) +
  geom_col(position = "dodge") +  # Create side-by-side bars
  scale_y_log10() +               # Use log scale for better visibility of differences
  theme_minimal() +               # Apply a clean theme
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) +  # Rotate x-axis labels for readability
  xlab("Top 10 harmful events") +                               # X-axis label
  ylab("Count (log scale)") +                                   # Y-axis label
  ggtitle("Top 10 harmful events for population health in USA") # Plot title
### Across the United States, which types of events have the greatest economic consequences?
# View the unique property and crop damage exponents used in the dataset
unique(clean_stormData$PROPDMGEXP)
## [1] "K" "M" NA "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(clean_stormData$CROPDMGEXP)
## [1] NA "M" "K" "m" "B" "?" "0" "k" "2"
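The `NA` entries in both lists are blank cells in the CSV. By default, `read_csv()` treats empty strings as missing (`na = c("", "NA")`). A minimal sketch on a hypothetical three-row CSV string shows the effect. It assumes readr 2.0 or later, where literal data can be wrapped in `I()`:

```r
suppressPackageStartupMessages(library(readr))

# Hypothetical three-row CSV whose second PROPDMGEXP cell is blank
csv_text <- "PROPDMG,PROPDMGEXP\n25,K\n0,\n5,M\n"

# read_csv() defaults to na = c("", "NA"), so the blank cell is read as NA
parsed <- read_csv(I(csv_text), show_col_types = FALSE)

print(parsed$PROPDMGEXP)
```

Any lookup keyed on the empty string therefore never sees these rows. They need explicit `NA` handling.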
# Create a lookup table to map exponent symbols to their numeric multipliers
# (missing exponents are read as NA and are handled separately when costs are computed)
exp_map <- setNames(
  c(1e3, 1e3, 1e6, 1e6,          # K/k = thousand, M/m = million
    1e9, 1e9, 1e2, 1e2,          # B/b = billion, H/h assumed to mean hundred
    1,                           # "0" = multiplier of 1
    10, 100, 1000, 10000,        # Numeric exponents: "1" = 10, "2" = 100, etc.
    1e5, 1e6, 1e7, 1e8, 1e9,     # "5" through "9"
    1, 1, 1),                    # Symbols "+", "-", "?" are treated as a multiplier of 1
  c("K", "k", "M", "m",
    "B", "b", "H", "h",
    "0",
    "1", "2", "3", "4",
    "5", "6", "7", "8", "9",
    "+", "-", "?")
)
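R's character subscripting has two edge cases that matter for this lookup. First, an `NA` code returns `NA`. Second, per the `?Extract` documentation, an empty-string index never matches any name, not even an empty one. The sketch below uses a hypothetical three-entry `mini_map` to show both cases. It also shows one way to treat a missing exponent as a multiplier of 1 while leaving unmapped codes visible as `NA`:

```r
# Hypothetical, trimmed-down exponent lookup used only for illustration
mini_map <- setNames(c(1e3, 1e6, 1), c("K", "M", "0"))

codes <- c("K", "M", NA, "0", "h")

# Character subscripts: NA codes and codes absent from the map both give NA
raw_lookup <- unname(mini_map[codes])

# An empty-string name can be stored, but "" as an index still matches nothing
map_with_blank <- setNames(c(1, 1e3), c("", "K"))
blank_lookup <- unname(map_with_blank[""])

# Treat a missing exponent as 1; unmapped codes such as "h" stay NA for inspection
multiplier <- ifelse(is.na(codes), 1, raw_lookup)

print(raw_lookup)
print(multiplier)
```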
# Apply the multiplier mapping and calculate total damage values
clean_stormData <- clean_stormData %>%
  mutate(
    # Look up each exponent's multiplier; read_csv() stored blank exponents as NA,
    # so a missing exponent is treated as a multiplier of 1
    prop_multiplier = coalesce(unname(exp_map[PROPDMGEXP]), 1),
    crop_multiplier = coalesce(unname(exp_map[CROPDMGEXP]), 1),
    prop_cost  = PROPDMG * prop_multiplier,   # Compute actual property damage (USD)
    crop_cost  = CROPDMG * crop_multiplier,   # Compute actual crop damage (USD)
    total_cost = prop_cost + crop_cost        # Compute total economic damage (USD)
  )
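`NA` propagates through arithmetic. As a result, `prop_cost + crop_cost` is `NA` whenever either term is missing. `sum(total_cost, na.rm = TRUE)` then discards the whole row, including its non-missing property or crop damage. The following two-row sketch uses hypothetical values. It contrasts the naive sum with one that treats a missing cost as zero via `dplyr::coalesce()`:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical rows: the first has property damage but a missing crop cost
toy_costs <- tibble(
  EVTYPE    = c("TORNADO", "FLOOD"),
  prop_cost = c(25000, 5e6),
  crop_cost = c(NA, 1500)
)

toy_costs <- toy_costs %>%
  mutate(
    naive_total = prop_cost + crop_cost,                            # NA + x is NA
    safe_total  = coalesce(prop_cost, 0) + coalesce(crop_cost, 0)   # missing cost counted as 0
  )

# The naive total silently loses the TORNADO row's property damage
print(sum(toy_costs$naive_total, na.rm = TRUE))
print(sum(toy_costs$safe_total))
```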
# Aggregate economic damages by event type
economic_impact <- clean_stormData %>%
  group_by(EVTYPE) %>%
  summarise(
    total_property_damage = sum(prop_cost, na.rm = TRUE),   # Total property damage per event
    total_crop_damage     = sum(crop_cost, na.rm = TRUE),   # Total crop damage per event
    total_economic_damage = sum(total_cost, na.rm = TRUE),  # Combined damage
    .groups = "drop"
  ) %>%
  arrange(desc(total_economic_damage))  # Sort events by highest total economic loss
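Grouping by the raw `EVTYPE` string treats labels that differ only in letter case as separate event types. Whether such variants exist in this dataset is an assumption worth checking with `unique(clean_stormData$EVTYPE)`. If they do, a minimal normalization before `group_by()` merges them, as this sketch on hypothetical labels and values shows:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical labels: three case variants of one event type
toy_damage <- tibble(
  EVTYPE     = c("FLOOD", "Flood", "flood", "HAIL"),
  total_cost = c(1e6, 2e6, 5e5, 3e5)
)

# Raw grouping: every spelling becomes its own group
by_raw <- toy_damage %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(total_cost), .groups = "drop")

# Normalized grouping: upper-case labels first so case variants merge
by_normalized <- toy_damage %>%
  mutate(EVTYPE = toupper(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(total_cost), .groups = "drop")

print(by_raw)
print(by_normalized)
```

Upper-casing alone does not merge differently worded labels for the same hazard. That would require a hand-built mapping.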
# Select the top 10 events with the highest economic impact
top_economic <- head(economic_impact, 10)
# Reshape the data to long format for plotting side-by-side bars
economic_damage_long <- top_economic %>%
  pivot_longer(
    cols = c(total_property_damage, total_crop_damage),  # Columns to pivot
    names_to = "type",                                   # New column to store damage type
    values_to = "value"                                  # New column to store damage values
  )
# Plot the top 10 economically damaging events
ggplot(economic_damage_long, aes(
  fill = type,                    # Fill bars by damage type (property or crop)
  y = value / 1e6,                # Convert damage values to millions for easier reading
  x = reorder(EVTYPE, -value)     # Order event types by descending total damage
)) +
  geom_col(position = "dodge") +  # Side-by-side bar chart
  scale_y_log10() +               # Use log scale for better comparison
  theme_minimal() +               # Clean, minimal theme
  theme(axis.text.x = element_text(angle = 30, vjust = 0.7)) +  # Rotate x-axis labels
  xlab("Top 10 harmful events") +                               # X-axis label
  ylab("Damage (in million USD, log scale)") +                  # Y-axis label
  ggtitle("Top 10 harmful events for economic loss in USA") +   # Plot title
  scale_fill_manual(              # Custom colors and labels for damage types
    values = c("total_property_damage" = "steelblue",
               "total_crop_damage" = "darkgreen"),
    # Explicit breaks so each label is paired with the matching damage type
    breaks = c("total_property_damage", "total_crop_damage"),
    labels = c("Property Damage", "Crop Damage"),
    name = "Damage Type"
  )
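In `scale_fill_manual()`, an unnamed `labels` vector is paired with the scale's breaks by position. For a character column, the default breaks are the sorted unique values, the same order `factor()` uses for its levels. This base-R sketch reproduces that pairing for the two damage types. It shows why supplying `breaks` explicitly alongside `labels` is the safer pattern:

```r
# The damage-type values as they appear in the long-format `type` column
damage_types <- c("total_property_damage", "total_crop_damage")

# Default breaks of a discrete scale on a character column follow sorted order
default_breaks <- levels(factor(damage_types))

# Unnamed labels are paired with those breaks by position
paired <- setNames(c("Property Damage", "Crop Damage"), default_breaks)

# "total_crop_damage" ends up labelled "Property Damage"
print(paired)
```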
This analysis highlights the varying impacts of severe weather events across the United States on population health and the economy. Tornadoes are the deadliest and most injurious event type, with the highest combined number of fatalities and injuries. Excessive heat, thunderstorms, and floods also contribute significantly to human harm. Economically, floods cause the greatest financial damage, followed by hurricanes, tornadoes, and hail. These findings support directing disaster preparedness and resource allocation toward the most damaging event types. By knowing which weather hazards cause the most harm, policymakers and emergency managers can prioritize mitigation strategies that better protect communities. Continued analysis and monitoring of storm data remain essential for improving resilience against future severe weather. Overall, the results provide a data-driven foundation for decisions that reduce the impact of severe weather events on public safety and economic stability.