This analysis studies severe weather events across the United States using the joined Storm Events CSV file (94,364 rows and 70 columns). The main question is which event types caused the most harm to population health, measured by direct and indirect deaths and injuries. Because the file contains joined data, some events appear more than once, so the analysis keeps one row per EVENT_ID before summarizing event-level results. Excessive Heat, Tornado, Flash Flood, Heat, and Thunderstorm Wind were the five most harmful event types for population health. Thunderstorm Wind, Hail, Flash Flood, High Wind, and Winter Weather were among the most commonly reported event types overall. Several event types were strongly seasonal, such as Heat in July and August, winter events in December, January, and February, and Tornadoes in April, May, and June. The analysis also examines which state and event type combinations were reported most often. As an additional question, the report studies which event types caused the greatest property and crop damage.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.3
## Warning: package 'tibble' was built under R version 4.5.3
## Warning: package 'tidyr' was built under R version 4.5.2
## Warning: package 'readr' was built under R version 4.5.2
## Warning: package 'dplyr' was built under R version 4.5.3
## Warning: package 'forcats' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.2.1 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.3 ✔ tibble 3.3.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
storm <- read_csv("StormEvents_joined_data.csv")
## Rows: 94364 Columns: 70
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (31): STATE, MONTH_NAME, EVENT_TYPE, CZ_TYPE, CZ_NAME, WFO, BEGIN_DATE_T...
## dbl (38): BEGIN_YEARMONTH, BEGIN_DAY, BEGIN_TIME, END_YEARMONTH, END_DAY, EN...
## lgl (1): CATEGORY
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Check structure
glimpse(storm)
## Rows: 94,364
## Columns: 70
## $ BEGIN_YEARMONTH <dbl> 202503, 202503, 202501, 202501, 202501, 202501, 202…
## $ BEGIN_DAY <dbl> 31, 30, 5, 3, 3, 3, 3, 3, 3, 3, 19, 13, 13, 13, 13,…
## $ BEGIN_TIME <dbl> 1104, 1552, 1800, 1300, 1300, 1300, 1547, 1527, 130…
## $ END_YEARMONTH <dbl> 202503, 202503, 202501, 202501, 202501, 202501, 202…
## $ END_DAY <dbl> 31, 30, 6, 3, 3, 3, 3, 3, 3, 3, 19, 13, 13, 13, 13,…
## $ END_TIME <dbl> 1106, 1555, 2227, 1900, 1900, 1900, 1619, 1619, 190…
## $ EPISODE_ID.x <dbl> 201366, 200337, 197733, 197761, 197761, 197761, 197…
## $ EVENT_ID <dbl> 1252415, 1241136, 1222851, 1223112, 1223113, 122311…
## $ STATE <chr> "GEORGIA", "MICHIGAN", "VIRGINIA", "MARYLAND", "MAR…
## $ STATE_FIPS <dbl> 13, 26, 51, 24, 24, 24, 24, 51, 24, 24, 27, 27, 27,…
## $ YEAR <dbl> 2025, 2025, 2025, 2025, 2025, 2025, 2025, 2025, 202…
## $ MONTH_NAME <chr> "March", "March", "January", "January", "January", …
## $ EVENT_TYPE <chr> "Thunderstorm Wind", "Tornado", "Winter Storm", "Wi…
## $ CZ_TYPE <chr> "C", "C", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "Z", "…
## $ CZ_FIPS <dbl> 45, 27, 56, 506, 504, 503, 14, 53, 5, 505, 89, 71, …
## $ CZ_NAME <chr> "CARROLL", "CASS", "SPOTSYLVANIA", "CENTRAL AND SOU…
## $ WFO <chr> "FFC", "IWX", "LWX", "LWX", "LWX", "LWX", "LWX", "L…
## $ BEGIN_DATE_TIME <chr> "3/31/2025 11:04", "3/30/2025 15:52", "1/5/2025 18:…
## $ CZ_TIMEZONE <chr> "EST-5", "EST-5", "EST-5", "EST-5", "EST-5", "EST-5…
## $ END_DATE_TIME <chr> "3/31/2025 11:06", "3/30/2025 15:55", "1/6/2025 22:…
## $ INJURIES_DIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ INJURIES_INDIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DEATHS_DIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DEATHS_INDIRECT <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ DAMAGE_PROPERTY <chr> "1.00K", "100.00K", NA, NA, NA, NA, "0.00K", NA, NA…
## $ DAMAGE_CROPS <chr> NA, "0.00K", NA, NA, NA, NA, "0.00K", NA, NA, NA, "…
## $ SOURCE <chr> "Emergency Manager", "NWS Storm Survey", "Trained S…
## $ MAGNITUDE <dbl> 52.0, NA, NA, NA, NA, NA, NA, NA, NA, NA, 38.0, NA,…
## $ MAGNITUDE_TYPE <chr> "EG", NA, NA, NA, NA, NA, NA, NA, NA, NA, "MS", NA,…
## $ FLOOD_CAUSE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ CATEGORY <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_F_SCALE <chr> NA, "EF1", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ TOR_LENGTH <dbl> NA, 2.59, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ TOR_WIDTH <dbl> NA, 100, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ TOR_OTHER_WFO <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_OTHER_CZ_STATE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_OTHER_CZ_FIPS <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ TOR_OTHER_CZ_NAME <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BEGIN_RANGE <dbl> 2.22, 1.24, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BEGIN_AZIMUTH <chr> "W", "SW", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ BEGIN_LOCATION <chr> "TYUS", "EDWARDSBURG", NA, NA, NA, NA, NA, NA, NA, …
## $ END_RANGE <dbl> 2.22, 1.47, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_AZIMUTH <chr> "W", "NNE", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_LOCATION <chr> "TYUS", "EDWARDSBURG", NA, NA, NA, NA, NA, NA, NA, …
## $ BEGIN_LAT <dbl> 33.4757, 41.7900, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ BEGIN_LON <dbl> -85.238, -86.100, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ END_LAT <dbl> 33.4757, 41.8200, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ END_LON <dbl> -85.238, -86.070, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ EPISODE_NARRATIVE <chr> "A cold-front initiated a line of thunderstorms acr…
## $ EVENT_NARRATIVE <chr> "Tree down at the intersection of highway 5 and old…
## $ DATA_SOURCE <chr> "CSV", "CSV", "CSV", "CSV", "CSV", "CSV", "CSV", "C…
## $ YEARMONTH <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ EPISODE_ID.y <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LOCATION_INDEX <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RANGE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ AZIMUTH <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LOCATION <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LATITUDE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LONGITUDE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LAT2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LON2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FAT_YEARMONTH <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FAT_DAY <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FAT_TIME <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_ID <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_TYPE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_DATE <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_AGE <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_SEX <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ FATALITY_LOCATION <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
# Because this is joined data, some EVENT_ID values appear more than once.
# To avoid double-counting deaths, injuries, and damages, keep one row per event.
storm_events <- storm %>%
  distinct(EVENT_ID, .keep_all = TRUE)
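The effect of this step can be illustrated on a small example. The sketch below uses a made-up four-row table (the values are hypothetical, not taken from the Storm Events file) in which one event appears twice, and compares the death total before and after keeping one row per EVENT_ID.

```r
library(dplyr)

# Toy stand-in for the joined file (hypothetical values):
# event 2 appears twice because it matched two rows in a joined table.
toy <- tibble(
  EVENT_ID      = c(1, 2, 2, 3),
  EVENT_TYPE    = c("Hail", "Tornado", "Tornado", "Heat"),
  DEATHS_DIRECT = c(0, 3, 3, 1)
)

# Which EVENT_ID values occur on more than one row?
dup_ids <- toy %>%
  count(EVENT_ID, name = "n_rows") %>%
  filter(n_rows > 1)

# Death totals before and after keeping one row per event
deaths_raw <- sum(toy$DEATHS_DIRECT)
deaths_dedup <- toy %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  summarise(deaths = sum(DEATHS_DIRECT)) %>%
  pull(deaths)

deaths_raw
deaths_dedup
```

In this toy table the raw sum counts the tornado's three deaths twice. Running the same `count()` and `filter()` pair on `storm` would show how many events are affected in the real file.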
# Convert health impact columns to numeric and replace missing values with 0
storm_events <- storm_events %>%
  mutate(
    injuries_direct = as.numeric(INJURIES_DIRECT),
    injuries_indirect = as.numeric(INJURIES_INDIRECT),
    deaths_direct = as.numeric(DEATHS_DIRECT),
    deaths_indirect = as.numeric(DEATHS_INDIRECT),
    injuries_direct = replace_na(injuries_direct, 0),
    injuries_indirect = replace_na(injuries_indirect, 0),
    deaths_direct = replace_na(deaths_direct, 0),
    deaths_indirect = replace_na(deaths_indirect, 0),
    total_injuries = injuries_direct + injuries_indirect,
    total_deaths = deaths_direct + deaths_indirect,
    total_health_impact = total_injuries + total_deaths
  )
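The same cleaning can be written more compactly with `across()`. This is a sketch on a hypothetical three-row table, and it differs from the code above in one respect: it overwrites the original upper-case columns instead of creating lower-case copies.

```r
library(dplyr)
library(tidyr)

# Toy health-impact columns with some missing values (hypothetical data)
toy <- tibble(
  INJURIES_DIRECT   = c(1, NA, 0),
  INJURIES_INDIRECT = c(0, 2, NA),
  DEATHS_DIRECT     = c(NA, 1, 0),
  DEATHS_INDIRECT   = c(0, 0, 1)
)

toy_clean <- toy %>%
  mutate(
    # Coerce to numeric and treat missing counts as 0, in place
    across(
      c(INJURIES_DIRECT, INJURIES_INDIRECT, DEATHS_DIRECT, DEATHS_INDIRECT),
      ~ replace_na(as.numeric(.x), 0)
    ),
    total_injuries      = INJURIES_DIRECT + INJURIES_INDIRECT,
    total_deaths        = DEATHS_DIRECT + DEATHS_INDIRECT,
    total_health_impact = total_injuries + total_deaths
  )

toy_clean
```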
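# Convert damage strings such as "1.00K", "2.50M", or "1.00B" to dollar amounts.
# Missing or unparseable values are counted as 0, so they add nothing to the totals.
convert_damage <- function(x) {
  x <- toupper(as.character(x))
  # Numeric part, for example "2.50" from "2.50M"
  number <- as.numeric(str_extract(x, "[0-9.]+"))
  # The suffix sets the scale; values without a suffix are taken as plain dollars
  multiplier <- case_when(
    str_detect(x, "K") ~ 1000,
    str_detect(x, "M") ~ 1000000,
    str_detect(x, "B") ~ 1000000000,
    TRUE ~ 1
  )
  replace_na(number * multiplier, 0)
}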
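The converter above matches the letters K, M, and B anywhere in the string and turns missing values into 0. As a minimal sketch of an alternative, the hypothetical `parse_damage()` below reads only a trailing suffix and returns NA for missing input, so that "not reported" stays distinct from "reported as zero". The function name and the NA policy are assumptions for illustration and are not part of the original analysis.

```r
library(stringr)

# Hypothetical stricter variant of the damage converter:
# only a trailing K, M, or B is treated as a scale suffix,
# and missing or unparseable input yields NA instead of 0.
parse_damage <- function(x) {
  x <- str_trim(str_to_upper(as.character(x)))
  number <- as.numeric(str_extract(x, "^[0-9]*\\.?[0-9]+"))
  suffix <- str_extract(x, "[KMB]$")
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)[suffix]
  multiplier[is.na(suffix)] <- 1  # no suffix: plain dollars
  unname(number * multiplier)
}

res <- parse_damage(c("1.00K", "2.50M", "1.00B", "0.00K", NA, "500"))
res
```

With this variant, later sums would need `na.rm = TRUE`, and the number of NA values would show how many events had no damage estimate at all.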
storm_events <- storm_events %>%
  mutate(
    property_damage_num = convert_damage(DAMAGE_PROPERTY),
    crop_damage_num = convert_damage(DAMAGE_CROPS),
    total_damage = property_damage_num + crop_damage_num
  )
health_summary <- storm_events %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    total_deaths = sum(total_deaths),
    total_injuries = sum(total_injuries),
    total_health_impact = sum(total_health_impact),
    number_of_events = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(total_health_impact))
health_summary %>%
  slice_head(n = 10)
## # A tibble: 10 × 5
## EVENT_TYPE total_deaths total_injuries total_health_impact number_of_events
## <chr> <dbl> <dbl> <dbl> <int>
## 1 Excessive H… 90 326 416 1439
## 2 Tornado 64 257 321 1591
## 3 Flash Flood 209 20 229 5393
## 4 Heat 163 51 214 2864
## 5 Thunderstor… 41 141 182 21807
## 6 Winter Weat… 31 123 154 4436
## 7 Lightning 21 98 119 288
## 8 Wildfire 63 43 106 350
## 9 Dust Storm 17 78 95 320
## 10 Rip Current 39 49 88 72
health_summary %>%
  slice_head(n = 10) %>%
  ggplot(aes(x = reorder(EVENT_TYPE, total_health_impact),
             y = total_health_impact)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Top 10 Weather Event Types by Population Health Impact",
    x = "Event Type",
    y = "Deaths + Injuries"
  )
Excessive Heat caused the greatest combined number of deaths and injuries (416), followed by Tornado (321) and Flash Flood (229). Flash Flood had the most deaths of any event type (209), while Excessive Heat had the most injuries (326). For a government or municipal manager, this means that both visually dramatic events, such as tornadoes and flash floods, and less visible hazards, such as heat, should be taken seriously in emergency planning.
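Totals favor event types that are reported often. One complementary view, sketched below, divides the combined deaths and injuries by the number of events. The three rows are copied from the health summary table; the column name `impact_per_event` is an assumption introduced for this illustration.

```r
library(dplyr)

# Three rows copied from the health summary table
health_rows <- tibble(
  EVENT_TYPE          = c("Thunderstorm Wind", "Lightning", "Rip Current"),
  total_health_impact = c(182, 119, 88),
  number_of_events    = c(21807, 288, 72)
)

# Deaths plus injuries per reported event
health_rate <- health_rows %>%
  mutate(impact_per_event = total_health_impact / number_of_events) %>%
  arrange(desc(impact_per_event))

health_rate
```

On these rows Rip Current comes first, with more than one death or injury per reported event, even though it ranks tenth by total.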
state_event_summary <- storm_events %>%
  group_by(STATE, EVENT_TYPE) %>%
  summarise(number_of_events = n(), .groups = "drop") %>%
  arrange(desc(number_of_events))
state_event_summary %>%
  slice_head(n = 20)
## # A tibble: 20 × 3
## STATE EVENT_TYPE number_of_events
## <chr> <chr> <int>
## 1 ALABAMA Thunderstorm Wind 1532
## 2 TEXAS Hail 1453
## 3 TEXAS Thunderstorm Wind 1205
## 4 VIRGINIA Thunderstorm Wind 1096
## 5 GEORGIA Thunderstorm Wind 1025
## 6 PENNSYLVANIA Thunderstorm Wind 1010
## 7 ILLINOIS Thunderstorm Wind 978
## 8 MISSOURI Thunderstorm Wind 886
## 9 OKLAHOMA Hail 836
## 10 KANSAS Thunderstorm Wind 832
## 11 SOUTH DAKOTA Thunderstorm Wind 749
## 12 NORTH CAROLINA Thunderstorm Wind 734
## 13 ATLANTIC NORTH Marine Thunderstorm Wind 724
## 14 OHIO Thunderstorm Wind 717
## 15 INDIANA Thunderstorm Wind 714
## 16 MISSISSIPPI Thunderstorm Wind 698
## 17 WEST VIRGINIA Thunderstorm Wind 681
## 18 KENTUCKY Thunderstorm Wind 677
## 19 TENNESSEE Thunderstorm Wind 658
## 20 NEW YORK Thunderstorm Wind 635
state_event_summary %>%
  slice_head(n = 15) %>%
  ggplot(aes(x = reorder(paste(STATE, EVENT_TYPE, sep = " - "), number_of_events),
             y = number_of_events)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Most Frequent State and Event Type Combinations",
    x = "State and Event Type",
    y = "Number of Events"
  )
Thunderstorm Wind dominates the highest-count combinations, accounting for 17 of the top 20. Alabama recorded the single highest count (1,532 Thunderstorm Wind events), and Texas appears twice, for Hail (1,453) and Thunderstorm Wind (1,205). Oklahoma also ranks high for Hail (836). This suggests that severe weather exposure varies strongly by region and that states may face very different severe weather profiles.
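A single ranked list is dominated by the most common event type. To see the leading states within each event type, the counts can be grouped by EVENT_TYPE before ranking. The sketch below uses five rows copied from the table above; in the full analysis the same pipeline would start from `state_event_summary`.

```r
library(dplyr)

# Five rows copied from the state and event type table
state_counts <- tibble(
  STATE            = c("ALABAMA", "TEXAS", "TEXAS", "VIRGINIA", "OKLAHOMA"),
  EVENT_TYPE       = c("Thunderstorm Wind", "Hail", "Thunderstorm Wind",
                       "Thunderstorm Wind", "Hail"),
  number_of_events = c(1532, 1453, 1205, 1096, 836)
)

# Top two states within each event type
top_states <- state_counts %>%
  group_by(EVENT_TYPE) %>%
  slice_max(number_of_events, n = 2, with_ties = FALSE) %>%
  ungroup()

top_states
```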
monthly_event_summary <- storm_events %>%
  group_by(MONTH_NAME, EVENT_TYPE) %>%
  summarise(number_of_events = n(), .groups = "drop") %>%
  arrange(desc(number_of_events))
monthly_event_summary %>%
  slice_head(n = 20)
## # A tibble: 20 × 3
## MONTH_NAME EVENT_TYPE number_of_events
## <chr> <chr> <int>
## 1 June Thunderstorm Wind 5266
## 2 May Thunderstorm Wind 3895
## 3 July Thunderstorm Wind 3793
## 4 May Hail 2835
## 5 April Thunderstorm Wind 2766
## 6 March Thunderstorm Wind 2390
## 7 April Hail 1820
## 8 March High Wind 1669
## 9 August Thunderstorm Wind 1629
## 10 July Flash Flood 1570
## 11 July Heat 1399
## 12 June Hail 1384
## 13 February Winter Weather 1369
## 14 December Winter Weather 1274
## 15 January Winter Storm 1120
## 16 March Hail 1084
## 17 December High Wind 1061
## 18 January Winter Weather 1006
## 19 June Flash Flood 919
## 20 August Heat 894
top_month_events <- storm_events %>%
  count(EVENT_TYPE, sort = TRUE) %>%
  slice_head(n = 8) %>%
  pull(EVENT_TYPE)
storm_events %>%
  filter(EVENT_TYPE %in% top_month_events) %>%
  mutate(
    MONTH_NAME = factor(
      MONTH_NAME,
      levels = month.name
    )
  ) %>%
  count(MONTH_NAME, EVENT_TYPE) %>%
  ggplot(aes(x = MONTH_NAME, y = n, group = EVENT_TYPE)) +
  geom_line() +
  facet_wrap(~ EVENT_TYPE, scales = "free_y") +
  labs(
    title = "Seasonal Patterns for Common Weather Event Types",
    x = "Month",
    y = "Number of Events"
  ) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
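The data show clear seasonal patterns. Thunderstorm Wind peaked in June (5,266 events) and Hail peaked in May (2,835). Heat events were concentrated in summer, especially July (1,399) and August (894). Winter Weather and Winter Storm events were concentrated in December, January, and February. Tornadoes were reported most often in spring and early summer, especially April, May, and June, although Tornado is not among the eight most common event types plotted above.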
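Event types outside the eight most common are not drawn by the faceted plot, so a single type such as Tornado can be checked on its own. The sketch below uses hypothetical toy rows rather than the real file, and it keeps months with no events as explicit zeros so that the seasonal profile has all twelve months.

```r
library(dplyr)

# Toy event-level rows (hypothetical), one row per event
toy_events <- tibble(
  EVENT_TYPE = c("Tornado", "Tornado", "Tornado", "Hail", "Tornado"),
  MONTH_NAME = c("April", "May", "May", "May", "June")
)

tornado_by_month <- toy_events %>%
  filter(EVENT_TYPE == "Tornado") %>%
  # Calendar order, with all twelve months as factor levels
  mutate(MONTH_NAME = factor(MONTH_NAME, levels = month.name)) %>%
  # .drop = FALSE keeps months that have zero events
  count(MONTH_NAME, .drop = FALSE) %>%
  mutate(share = n / sum(n))

tornado_by_month
```

Replacing `toy_events` with `storm_events` would give the monthly tornado counts for the real data.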
damage_summary <- storm_events %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    total_damage = sum(total_damage),
    property_damage = sum(property_damage_num),
    crop_damage = sum(crop_damage_num),
    number_of_events = n(),
    .groups = "drop"
  ) %>%
  arrange(desc(total_damage))
damage_summary %>%
  slice_head(n = 10)
## # A tibble: 10 × 5
## EVENT_TYPE total_damage property_damage crop_damage number_of_events
## <chr> <dbl> <dbl> <dbl> <int>
## 1 Tornado 1909699500 1906326500 3373000 1591
## 2 Flash Flood 1297935550 1297150550 785000 5393
## 3 Wildfire 982347110 788932110 193415000 350
## 4 Thunderstorm Wind 269116580 210492330 58624250 21807
## 5 Flood 92402950 92357950 45000 2261
## 6 Hail 62372500 60072500 2300000 9205
## 7 Debris Flow 50601200 50600200 1000 163
## 8 Drought 39503250 37133250 2370000 3283
## 9 Lightning 22615550 22600150 15400 288
## 10 High Wind 12211600 12162600 49000 4603
damage_summary %>%
  slice_head(n = 10) %>%
  ggplot(aes(x = reorder(EVENT_TYPE, total_damage),
             y = total_damage)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = dollar) +
  labs(
    title = "Top 10 Weather Event Types by Property and Crop Damage",
    x = "Event Type",
    y = "Total Damage"
  )
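Tornadoes caused the greatest economic damage in this dataset (about $1.91 billion), followed by Flash Floods (about $1.30 billion) and Wildfires (about $0.98 billion). Wildfire also had the largest crop damage among the top ten (about $193 million). The event types that are most harmful to health are not always the ones that cause the most property and crop damage: Excessive Heat and Heat rank first and fourth for health impact but do not appear in the top ten for damage.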
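The contrast between the two rankings can be made explicit by joining the summaries and ranking each measure. The sketch below uses the top five rows of each table as printed above; an NA rank therefore means only that the event type is missing from those five rows, not that its value is zero. In the full analysis the join would use `health_summary` and `damage_summary` directly.

```r
library(dplyr)

# Top five rows of each summary table, copied from the output above
health_top <- tibble(
  EVENT_TYPE = c("Excessive Heat", "Tornado", "Flash Flood",
                 "Heat", "Thunderstorm Wind"),
  total_health_impact = c(416, 321, 229, 214, 182)
)
damage_top <- tibble(
  EVENT_TYPE = c("Tornado", "Flash Flood", "Wildfire",
                 "Thunderstorm Wind", "Flood"),
  total_damage = c(1909699500, 1297935550, 982347110, 269116580, 92402950)
)

# Keep every event type from either table and rank both measures
rank_comparison <- full_join(health_top, damage_top, by = "EVENT_TYPE") %>%
  mutate(
    health_rank = min_rank(desc(total_health_impact)),
    damage_rank = min_rank(desc(total_damage))
  ) %>%
  arrange(health_rank)

rank_comparison
```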
The analysis shows that severe weather risk should be understood in multiple ways. Excessive Heat, Tornado, Flash Flood, Heat, and Thunderstorm Wind were the most harmful event types for population health, whereas Tornado, Flash Flood, and Wildfire caused the greatest economic damage. The results also show that severe weather events are strongly seasonal and geographically concentrated. For example, heat-related events mostly occurred in the summer months, while winter events occurred mainly in December, January, and February. Alabama, Texas, Virginia, Georgia, and Pennsylvania had the highest counts for individual state and event type combinations, mostly for Thunderstorm Wind. Overall, the data suggest that emergency planning should consider both human health impacts and economic impacts, while also accounting for the season and the specific risks faced by each state.