Analysis of Severe Weather Events in the United States

Synopsis

This document analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to identify the most harmful types of severe weather events with respect to population health and economic consequences. The analysis includes summaries and visualizations of the total number of fatalities, injuries, and estimated property damage by event type. The focus is on the top 10 event types that have the greatest impact on both health and economy.

Data Processing

The data were loaded from a CSV file using the readr package. Fatalities, injuries, and property damage were then totaled by event type (EVTYPE) using the dplyr package from the tidyverse. The property damage totals are sums of the raw PROPDMG values; the magnitude codes in PROPDMGEXP are not applied.
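
As a small illustration of the summarize-and-rank step, the sketch below runs the same kind of pipeline on a toy table. The event names and counts are invented, and `toy`, `toy_summary`, `top_fatal`, and `top_injury` are hypothetical names that do not appear in the analysis. It ranks with `slice_max()`, which names the ranking column explicitly, whereas `top_n()` without a `wt` argument ranks by the last column of the table.

```r
library(dplyr)

# Toy data: event names and counts are invented for illustration only
toy <- tibble(
  EVTYPE     = c("A", "A", "B", "C", "C"),
  FATALITIES = c(5, 1, 2, 0, 0),
  INJURIES   = c(0, 1, 3, 20, 10)
)

# Total fatalities and injuries per event type
toy_summary <- toy %>%
  group_by(EVTYPE) %>%
  summarize(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries   = sum(INJURIES, na.rm = TRUE)
  )

# Rank explicitly by the column of interest; the two rankings can differ
top_fatal  <- slice_max(toy_summary, total_fatalities, n = 2)
top_injury <- slice_max(toy_summary, total_injuries, n = 2)
```

In this toy table the two rankings disagree (A and B lead on fatalities, C and B on injuries), which is why a single top-10 table may not serve both plots.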

Load the dataset

# Load necessary libraries
library(readr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ purrr     1.0.2
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Load the dataset
repdata_data_StormData <- read_csv("repdata_data_StormData.csv")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display the structure of the dataset
str(repdata_data_StormData)
## spc_tbl_ [902,297 × 37] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ STATE__   : num [1:902297] 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr [1:902297] "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr [1:902297] "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr [1:902297] "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num [1:902297] 97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr [1:902297] "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr [1:902297] "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr [1:902297] "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr [1:902297] NA NA NA NA ...
##  $ BGN_LOCATI: chr [1:902297] NA NA NA NA ...
##  $ END_DATE  : chr [1:902297] NA NA NA NA ...
##  $ END_TIME  : chr [1:902297] NA NA NA NA ...
##  $ COUNTY_END: num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi [1:902297] NA NA NA NA NA NA ...
##  $ END_RANGE : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr [1:902297] NA NA NA NA ...
##  $ END_LOCATI: chr [1:902297] NA NA NA NA ...
##  $ LENGTH    : num [1:902297] 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num [1:902297] 100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : num [1:902297] 3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num [1:902297] 0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num [1:902297] 15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num [1:902297] 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr [1:902297] "K" "K" "K" "K" ...
##  $ CROPDMG   : num [1:902297] 0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr [1:902297] NA NA NA NA ...
##  $ WFO       : chr [1:902297] NA NA NA NA ...
##  $ STATEOFFIC: chr [1:902297] NA NA NA NA ...
##  $ ZONENAMES : chr [1:902297] NA NA NA NA ...
##  $ LATITUDE  : num [1:902297] 3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num [1:902297] 8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num [1:902297] 3051 0 0 0 0 ...
##  $ LONGITUDE_: num [1:902297] 8806 0 0 0 0 ...
##  $ REMARKS   : chr [1:902297] NA NA NA NA ...
##  $ REFNUM    : num [1:902297] 1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   STATE__ = col_double(),
##   ..   BGN_DATE = col_character(),
##   ..   BGN_TIME = col_character(),
##   ..   TIME_ZONE = col_character(),
##   ..   COUNTY = col_double(),
##   ..   COUNTYNAME = col_character(),
##   ..   STATE = col_character(),
##   ..   EVTYPE = col_character(),
##   ..   BGN_RANGE = col_double(),
##   ..   BGN_AZI = col_character(),
##   ..   BGN_LOCATI = col_character(),
##   ..   END_DATE = col_character(),
##   ..   END_TIME = col_character(),
##   ..   COUNTY_END = col_double(),
##   ..   COUNTYENDN = col_logical(),
##   ..   END_RANGE = col_double(),
##   ..   END_AZI = col_character(),
##   ..   END_LOCATI = col_character(),
##   ..   LENGTH = col_double(),
##   ..   WIDTH = col_double(),
##   ..   F = col_double(),
##   ..   MAG = col_double(),
##   ..   FATALITIES = col_double(),
##   ..   INJURIES = col_double(),
##   ..   PROPDMG = col_double(),
##   ..   PROPDMGEXP = col_character(),
##   ..   CROPDMG = col_double(),
##   ..   CROPDMGEXP = col_character(),
##   ..   WFO = col_character(),
##   ..   STATEOFFIC = col_character(),
##   ..   ZONENAMES = col_character(),
##   ..   LATITUDE = col_double(),
##   ..   LONGITUDE = col_double(),
##   ..   LATITUDE_E = col_double(),
##   ..   LONGITUDE_ = col_double(),
##   ..   REMARKS = col_character(),
##   ..   REFNUM = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
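
The loading message above suggests specifying column types. One possible way to do that, sketched here under the assumption that only the columns used in the summaries are needed, is `cols_only()` from readr. The helper name `read_storm_subset` is hypothetical, and the temporary file with made-up rows exists only so the sketch runs on its own.

```r
library(readr)

# Hypothetical helper, not part of the original analysis:
# read only the columns the summaries use, with explicit types
read_storm_subset <- function(path) {
  read_csv(
    path,
    col_types = cols_only(
      EVTYPE     = col_character(),
      FATALITIES = col_double(),
      INJURIES   = col_double(),
      PROPDMG    = col_double(),
      PROPDMGEXP = col_character()
    )
  )
}

# Self-contained check using a temporary file with made-up rows
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,REMARKS",
  "TORNADO,0,15,25,K,example",
  "HAIL,0,0,2.5,K,example"
), tmp)
storm_subset <- read_storm_subset(tmp)
```

Supplying `col_types` also silences the column specification message, and dropping unused columns such as REMARKS should reduce memory use on the full file.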

Processing the dataset

# Summarize the data by event type, calculating total fatalities and injuries
event_summary <- repdata_data_StormData %>%
  group_by(EVTYPE) %>%
  summarize(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  arrange(desc(total_fatalities), desc(total_injuries))

# Keep the 10 event types with the most injuries
# (top_n() without a wt argument ranks by the last column, total_injuries)
top_10_harm <- event_summary %>%
  slice_max(total_injuries, n = 10)
# Summarize the data by event type, calculating total estimated property damage
property_damage_summary <- repdata_data_StormData %>%
  group_by(EVTYPE) %>%
  summarize(
    total_property_damage = sum(PROPDMG, na.rm = TRUE)
  ) %>%
  arrange(desc(total_property_damage))

# Filter the data to include only the top 10 event types by property damage
top_10_property_damage <- property_damage_summary %>%
  slice_max(total_property_damage, n = 10)
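
The property damage summary adds up PROPDMG without reference to PROPDMGEXP. A minimal sketch of how the magnitude codes could be applied before summing is shown below, on invented rows. It assumes that K, M, and B stand for thousands, millions, and billions of dollars and treats every other code (including missing values) as a multiplier of 1, which is a simplification. The names `exp_to_multiplier`, `toy`, and `toy_damage` are hypothetical.

```r
library(dplyr)

# Assumed mapping of magnitude codes to multipliers; codes other than
# K/M/B (and missing codes) fall through to 1, which is a simplification
exp_to_multiplier <- function(code) {
  case_when(
    toupper(code) == "K" ~ 1e3,
    toupper(code) == "M" ~ 1e6,
    toupper(code) == "B" ~ 1e9,
    TRUE ~ 1
  )
}

# Toy data: values are invented for illustration only
toy <- tibble(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "TORNADO"),
  PROPDMG    = c(2, 500, 900, 25),
  PROPDMGEXP = c("B", "K", "K", "M")
)

# Scale each record, then total by event type and keep the top 2
toy_damage <- toy %>%
  mutate(damage_usd = PROPDMG * exp_to_multiplier(PROPDMGEXP)) %>%
  group_by(EVTYPE) %>%
  summarize(total_damage_usd = sum(damage_usd, na.rm = TRUE)) %>%
  slice_max(total_damage_usd, n = 2)
```

In the toy rows, HAIL has the largest raw PROPDMG sum (900) but the smallest scaled total, so a ranking based on raw values can differ from one based on scaled values.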

Results

Top 10 Event Types by Total Fatalities and Injuries

Fatalities

A bar plot showing the total number of fatalities for the top 10 event types.

# Create a bar plot showing the total number of fatalities for the top 10 event types
# (ranked by fatalities here, because top_10_harm is ranked by injuries)
ggplot(slice_max(event_summary, total_fatalities, n = 10),
       aes(x = reorder(EVTYPE, total_fatalities), y = total_fatalities)) +
  geom_col(fill = "steelblue") +
  labs(
    title = "Top 10 Total Fatalities by Event Type",
    x = "Event Type",
    y = "Total Fatalities"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Injuries

A bar plot showing the total number of injuries for the top 10 event types.

# Create a bar plot showing the total number of injuries for the top 10 event types
ggplot(top_10_harm, aes(x = reorder(EVTYPE, total_injuries), y = total_injuries)) +
  geom_col(fill = "indianred") +
  labs(
    title = "Top 10 Total Injuries by Event Type",
    x = "Event Type",
    y = "Total Injuries"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

Top 10 Event Types by Total Estimated Property Damage

A bar plot showing the total estimated property damage for the top 10 event types.

# Create a bar plot showing the total estimated property damage for the top 10 event types
ggplot(top_10_property_damage, aes(x = reorder(EVTYPE, total_property_damage), y = total_property_damage)) +
  geom_col(fill = "darkgreen") +
  labs(
    title = "Top 10 Event Types by Total Estimated Property Damage",
    x = "Event Type",
    y = "Total PROPDMG (raw values, PROPDMGEXP not applied)"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
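
The plots above rotate the event-type labels by 90 degrees. An alternative that some readers may find easier to scan is a horizontal bar chart, sketched below with `coord_flip()`. The totals are made up, and `toy_totals` and `p` are hypothetical names.

```r
library(ggplot2)

# Made-up totals for illustration only
toy_totals <- data.frame(
  EVTYPE = c("THUNDERSTORM WIND", "FLASH FLOOD", "TORNADO"),
  total  = c(30, 20, 50)
)

# Horizontal bars keep long event names readable without rotating text
p <- ggplot(toy_totals, aes(x = reorder(EVTYPE, total), y = total)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(x = "Event Type", y = "Total") +
  theme_minimal()
```

Printing `p` draws the chart with the largest total at the top, since `reorder()` sorts the event types in ascending order and `coord_flip()` places the last level highest.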

This document provides an overview of the most harmful severe weather events in the United States, summarizing the impact on population health and economic consequences. The plots above highlight the top 10 event types that contribute the most to fatalities, injuries, and property damage.