Title: Analysis of NOAA Storm Database, containing records of severe weather events across the U.S., and the impact of varying weather events

Synopsis: This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks the characteristics of major storms and weather events occurring in the U.S., including when and where they occur, along with estimates of any fatalities, injuries or property damage.

Storms and other weather events can cause both public health and economic issues for towns and cities, and this data is likely to illustrate which areas are most affected by natural disasters.

This data analysis includes the following questions:

  1. Across the United States, which types of events (as indicated in the EVENT_TYPE variable) are most harmful with respect to population health? In the data used for this analysis, Flash Floods (830 total harmed) were the most harmful to population health, followed by Tornadoes (440) and Floods (174).

  2. Across the U.S., which types of events happen most in which states? In the data used for this analysis, the three highest state-level counts are Flash Floods in Tennessee (104), Floods in Missouri (24), and Tornadoes in Florida (16).

  3. Which types of events are characteristic of which months? In the data used for this analysis, the three months with the most distinct event types were August (6), May (5), and September (5). Summed across these three months, the most common events were Flash Floods (123), Floods (37), and Tornadoes (28), which are also the three most common event types in the data overall.

  4. What is the most common weather event in each month, and how often did it occur? In ten of the twelve months, the most common weather event was a flash flood, tornado, or flood, consistent with those three event types ranking highest overall in this analysis.

Data Processing:

Load necessary libraries for data analysis

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(readr)
library(ggplot2)
library(knitr)

Read CSV files

The files ending in d2024_c20250818 were used because the d2025 locations file contained no data when the analysis was conducted.

details <- read.csv("C:/Users/rjani/Downloads/StormEvents - Details/StormEvents_details-ftp_v1.0_d2024_c20250818.csv")
fatalities <- read.csv("C:/Users/rjani/Downloads/StormEvents - Fatalities/StormEvents_fatalities-ftp_v1.0_d2024_c20250818.csv")
locations <- read.csv("C:/Users/rjani/Downloads/StormEvents - Locations/StormEvents_locations-ftp_v1.0_d2024_c20250818.csv")
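
The absolute paths above only resolve on the author's machine. As a minimal sketch, the file names could be built from one folder and one date stamp. The helper `storm_path()` and the `data` folder are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: builds the path to one Storm Events file.
# `data_dir = "data"` is an assumed folder, not the author's actual location.
storm_path <- function(kind, data_dir = "data", stamp = "d2024_c20250818") {
  file.path(data_dir, sprintf("StormEvents_%s-ftp_v1.0_%s.csv", kind, stamp))
}

# One named path per file type
paths <- vapply(c("details", "fatalities", "locations"), storm_path, character(1))
print(paths)

# Read only the files that are actually present, so the chunk does not error
# when the folder is missing (the result is then an empty list).
tables <- lapply(paths[file.exists(paths)], read.csv)
```

Switching to another release of the data then only needs a different `stamp` argument.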

Merge data files by event ID

StormEvents_joined_data <- merge(details, fatalities, by = "EVENT_ID")
StormEvents_joined_data <- merge(StormEvents_joined_data, locations, by = "EVENT_ID")
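
By default `merge()` performs an inner join and returns one row for every matching pair of rows. The sketch below uses made-up toy tables to show the effect on row counts, assuming the fatalities and locations files can hold several rows per EVENT_ID. The values are illustrative only.

```r
# Toy tables (made-up values) shaped like the three Storm Events files.
details <- data.frame(EVENT_ID = c(1, 2, 3),
                      EVENT_TYPE = c("Flash Flood", "Tornado", "Hail"),
                      DEATHS_DIRECT = c(3, 1, 0))
fatalities <- data.frame(EVENT_ID = c(1, 1, 1, 2),
                         FATALITY_ID = 101:104)
locations <- data.frame(EVENT_ID = c(1, 1, 2),
                        LOCATION_INDEX = c(1, 2, 1))

joined <- merge(details, fatalities, by = "EVENT_ID")
joined <- merge(joined, locations, by = "EVENT_ID")

nrow(details)           # 3 events
nrow(joined)            # 7 rows: event 1 appears 3 x 2 = 6 times, event 2 once
table(joined$EVENT_ID)  # event 3 has no fatality rows, so the inner join drops it
```

Comparing `nrow()` before and after each merge on the real files is a quick way to see whether this applies to the analysis.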

#1 Harmful events for population

health_impact <- StormEvents_joined_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    total_fatalities = sum(DEATHS_DIRECT, DEATHS_INDIRECT, na.rm = TRUE),
    total_injuries = sum(INJURIES_DIRECT, INJURIES_INDIRECT, na.rm = TRUE),
    total_harmed = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_harmed))
head(health_impact, 8)
## # A tibble: 8 × 4
##   EVENT_TYPE         total_fatalities total_injuries total_harmed
##   <chr>                         <int>          <int>        <int>
## 1 Flash Flood                     826              4          830
## 2 Tornado                         137            303          440
## 3 Flood                           174              0          174
## 4 Thunderstorm Wind                32             19           51
## 5 Debris Flow                      24             24           48
## 6 Marine Strong Wind               16             12           28
## 7 Dust Devil                        8              8           16
## 8 Lightning                         6              5           11
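
The totals above are summed over the merged table. If an event occupies several rows after the merge (an assumption worth checking with `table(StormEvents_joined_data$EVENT_ID)`), its death and injury counts are added once per row. A minimal sketch of counting each event once, using made-up toy data:

```r
library(dplyr)

# Toy joined table (made-up values): event 1 is repeated 6 times, event 2 once.
joined <- data.frame(
  EVENT_ID = c(rep(1, 6), 2),
  EVENT_TYPE = c(rep("Flash Flood", 6), "Tornado"),
  DEATHS_DIRECT = c(rep(3, 6), 1),
  DEATHS_INDIRECT = 0,
  INJURIES_DIRECT = c(rep(0, 6), 4),
  INJURIES_INDIRECT = 0
)

per_event <- joined %>%
  distinct(EVENT_ID, .keep_all = TRUE) %>%   # keep one row per event
  group_by(EVENT_TYPE) %>%
  summarise(
    total_fatalities = sum(DEATHS_DIRECT, DEATHS_INDIRECT, na.rm = TRUE),
    total_injuries = sum(INJURIES_DIRECT, INJURIES_INDIRECT, na.rm = TRUE),
    total_harmed = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_harmed))
per_event
```

In this toy example the Tornado (5 harmed) ranks above the Flash Flood (3 harmed). Without the `distinct()` step the Flash Flood would total 18.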

#2 Events by state

event_counts_by_state <- StormEvents_joined_data %>%
  group_by(STATE, EVENT_TYPE) %>%
  summarise(event_count = n(), .groups = "drop") %>%
  arrange(desc(event_count))

most_common_event_state <- event_counts_by_state %>%
  group_by(STATE) %>%
  slice_max(event_count, n = 1) %>%
  select(STATE, EVENT_TYPE, event_count) %>%
  arrange(desc(event_count))
print(most_common_event_state, n = Inf)
## # A tibble: 32 × 3
## # Groups:   STATE [30]
##    STATE          EVENT_TYPE         event_count
##    <chr>          <chr>                    <int>
##  1 TENNESSEE      Flash Flood                104
##  2 MISSOURI       Flood                       24
##  3 FLORIDA        Tornado                     16
##  4 VERMONT        Flash Flood                 15
##  5 SOUTH CAROLINA Flood                       14
##  6 ARKANSAS       Flood                       13
##  7 CALIFORNIA     Flood                       12
##  8 OKLAHOMA       Flood                       12
##  9 ALASKA         Debris Flow                  8
## 10 TEXAS          Flash Flood                  8
## 11 ARIZONA        Flash Flood                  6
## 12 OHIO           Flash Flood                  5
## 13 WEST VIRGINIA  Flood                        5
## 14 COLORADO       Flood                        4
## 15 GEORGIA        Tornado                      4
## 16 GUAM WATERS    Marine Strong Wind           4
## 17 NEW MEXICO     Flash Flood                  4
## 18 NORTH CAROLINA Tornado                      4
## 19 PUERTO RICO    Flash Flood                  4
## 20 ALABAMA        Thunderstorm Wind            2
## 21 ALABAMA        Tornado                      2
## 22 INDIANA        Tornado                      2
## 23 NEW YORK       Tornado                      2
## 24 SOUTH DAKOTA   Thunderstorm Wind            2
## 25 IDAHO          Thunderstorm Wind            1
## 26 KANSAS         Thunderstorm Wind            1
## 27 LOUISIANA      Thunderstorm Wind            1
## 28 MISSISSIPPI    Thunderstorm Wind            1
## 29 NEVADA         Thunderstorm Wind            1
## 30 UTAH           Lightning                    1
## 31 UTAH           Thunderstorm Wind            1
## 32 VIRGINIA       Thunderstorm Wind            1
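
The table has 32 rows for 30 states because `slice_max()` keeps ties by default, so ALABAMA and UTAH each appear twice. If exactly one row per state is wanted, `with_ties = FALSE` is one option. The sketch below uses five rows copied from the table above.

```r
library(dplyr)

# Five rows copied from the output above.
counts <- data.frame(
  STATE = c("ALABAMA", "ALABAMA", "UTAH", "UTAH", "FLORIDA"),
  EVENT_TYPE = c("Thunderstorm Wind", "Tornado", "Lightning",
                 "Thunderstorm Wind", "Tornado"),
  event_count = c(2, 2, 1, 1, 16)
)

# Default behaviour: tied rows are all kept.
with_ties <- counts %>%
  group_by(STATE) %>%
  slice_max(event_count, n = 1)

# with_ties = FALSE returns a single row per state, picking one of the tied rows.
one_per_state <- counts %>%
  group_by(STATE) %>%
  slice_max(event_count, n = 1, with_ties = FALSE) %>%
  ungroup()

nrow(with_ties)      # 5
nrow(one_per_state)  # 3
```

Keeping the ties, as the original code does, is arguably the more honest choice here, since neither tied event is more common than the other.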

#3 Events by month

event_month_summary <- StormEvents_joined_data %>%
  group_by(MONTH_NAME, EVENT_TYPE) %>%
  summarise(event_count = n(), .groups = "drop") %>%
  arrange(MONTH_NAME, desc(event_count))
print(event_month_summary, n = Inf)
## # A tibble: 32 × 3
##    MONTH_NAME EVENT_TYPE         event_count
##    <chr>      <chr>                    <int>
##  1 April      Flood                        4
##  2 August     Flash Flood                 10
##  3 August     Debris Flow                  8
##  4 August     Dust Devil                   4
##  5 August     Thunderstorm Wind            3
##  6 August     Tornado                      2
##  7 August     Lightning                    1
##  8 December   Marine Strong Wind           4
##  9 December   Tornado                      1
## 10 February   Flood                        9
## 11 February   Debris Flow                  8
## 12 January    Tornado                      4
## 13 July       Flash Flood                 15
## 14 July       Thunderstorm Wind            7
## 15 July       Lightning                    2
## 16 July       Tornado                      2
## 17 June       Thunderstorm Wind            8
## 18 June       Lightning                    1
## 19 March      Tornado                      2
## 20 May        Tornado                     22
## 21 May        Flood                       17
## 22 May        Thunderstorm Wind            9
## 23 May        Flash Flood                  5
## 24 May        Lightning                    1
## 25 November   Flood                       54
## 26 November   Flash Flood                 20
## 27 October    Tornado                     12
## 28 September  Flash Flood                108
## 29 September  Flood                       20
## 30 September  Tornado                      4
## 31 September  Thunderstorm Wind            3
## 32 September  Lightning                    1
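
Because MONTH_NAME is a character column, the table sorts alphabetically (April, August, December, ...). The sketch below re-enters the month and count columns from the output above, orders the months by calendar using base R's `month.name`, and reports both the number of distinct event types and the total row count per month. The column names `n_types` and `total_records` are illustrative.

```r
library(dplyr)

# MONTH_NAME and event_count copied from the 32 rows above
# (each row is one month / event-type pair; EVENT_TYPE is omitted for brevity).
event_month_summary <- data.frame(
  MONTH_NAME = rep(
    c("April", "August", "December", "February", "January", "July",
      "June", "March", "May", "November", "October", "September"),
    times = c(1, 6, 2, 2, 1, 4, 2, 1, 5, 2, 1, 5)
  ),
  event_count = c(4, 10, 8, 4, 3, 2, 1, 4, 1, 9, 8, 4, 15, 7, 2, 2,
                  8, 1, 2, 22, 17, 9, 5, 1, 54, 20, 12, 108, 20, 4, 3, 1)
)

by_month <- event_month_summary %>%
  mutate(MONTH_NAME = factor(MONTH_NAME, levels = month.name)) %>%  # calendar order
  group_by(MONTH_NAME) %>%
  summarise(n_types = n(), total_records = sum(event_count)) %>%
  arrange(MONTH_NAME)
print(by_month, n = Inf)
```

On these numbers August has the most distinct event types (6), while September has the most rows in total (136), followed by November (74) and May (54).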

#4 Most Common Events by month

most_common_event_month <- event_month_summary %>%
  group_by(MONTH_NAME) %>%
  slice_max(event_count, n = 1) %>%
  select(MONTH_NAME, EVENT_TYPE, event_count) %>%
  arrange(desc(event_count))
print(most_common_event_month, n = Inf)
## # A tibble: 12 × 3
## # Groups:   MONTH_NAME [12]
##    MONTH_NAME EVENT_TYPE         event_count
##    <chr>      <chr>                    <int>
##  1 September  Flash Flood                108
##  2 November   Flood                       54
##  3 May        Tornado                     22
##  4 July       Flash Flood                 15
##  5 October    Tornado                     12
##  6 August     Flash Flood                 10
##  7 February   Flood                        9
##  8 June       Thunderstorm Wind            8
##  9 April      Flood                        4
## 10 December   Marine Strong Wind           4
## 11 January    Tornado                      4
## 12 March      Tornado                      2
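
The statement that flash floods, floods, or tornadoes lead in ten of the twelve months can be checked directly against this table. A small base R sketch, with the month and event columns copied from the output above:

```r
# Most common event per month, copied from the table above.
top_event <- c(
  September = "Flash Flood", November = "Flood", May = "Tornado",
  July = "Flash Flood", October = "Tornado", August = "Flash Flood",
  February = "Flood", June = "Thunderstorm Wind", April = "Flood",
  December = "Marine Strong Wind", January = "Tornado", March = "Tornado"
)

top_three <- c("Flash Flood", "Flood", "Tornado")
in_top_three <- top_event %in% top_three

sum(in_top_three)                # 10
names(top_event)[!in_top_three]  # "June" "December"
```

The two exceptions are June (Thunderstorm Wind) and December (Marine Strong Wind).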

Plots:

This plot shows the results for question 1: combined injuries and fatalities for each event type. The underlying table is reported in the Results section.

event_graph <- ggplot(data=health_impact, aes(x=reorder(EVENT_TYPE, -total_harmed), y=total_harmed, fill=EVENT_TYPE))
event_graph <- event_graph + geom_bar(stat="identity")
event_graph <- event_graph + theme(axis.text.x = element_text(angle=90)) + xlab("Type of Event")
event_graph <- event_graph + ylab("Combined Injuries & Fatalities") + ggtitle("Most Harmful Events") + theme(legend.position = "none")
event_graph <- event_graph + ylim(c(0,850))
event_graph
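
The same chart can be written as a single ggplot2 expression. This is a sketch of an equivalent form, not a change to the result: `geom_col()` is shorthand for `geom_bar(stat = "identity")`, and `labs()` sets both axis labels and the title together. The data frame is re-entered from the question 1 table so the block runs on its own.

```r
library(ggplot2)

# EVENT_TYPE and total_harmed copied from the health_impact output.
health_impact <- data.frame(
  EVENT_TYPE = c("Flash Flood", "Tornado", "Flood", "Thunderstorm Wind",
                 "Debris Flow", "Marine Strong Wind", "Dust Devil", "Lightning"),
  total_harmed = c(830, 440, 174, 51, 48, 28, 16, 11)
)

event_graph <- ggplot(
  health_impact,
  aes(x = reorder(EVENT_TYPE, -total_harmed), y = total_harmed, fill = EVENT_TYPE)
) +
  geom_col() +                        # bar height taken directly from total_harmed
  labs(x = "Type of Event",
       y = "Combined Injuries & Fatalities",
       title = "Most Harmful Events") +
  ylim(0, 850) +                      # same upper limit as the original plot
  theme(axis.text.x = element_text(angle = 90),
        legend.position = "none")
```

Typing `event_graph` on its own line, as in the original chunk, draws the plot.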

Results:

#1: Most harmful events with respect to population health

  EVENT_TYPE         total_fatalities total_injuries total_harmed
1 Flash Flood                     826              4          830
2 Tornado                         137            303          440
3 Flood                           174              0          174
4 Thunderstorm Wind                32             19           51
5 Debris Flow                      24             24           48
6 Marine Strong Wind               16             12           28
7 Dust Devil                        8              8           16
8 Lightning                         6              5           11

#2: Most frequent events occurring by State

   STATE          EVENT_TYPE         event_count
 1 TENNESSEE      Flash Flood                104
 2 MISSOURI       Flood                       24
 3 FLORIDA        Tornado                     16
 4 VERMONT        Flash Flood                 15
 5 SOUTH CAROLINA Flood                       14
 6 ARKANSAS       Flood                       13
 7 CALIFORNIA     Flood                       12
 8 OKLAHOMA       Flood                       12
 9 ALASKA         Debris Flow                  8
10 TEXAS          Flash Flood                  8
11 ARIZONA        Flash Flood                  6
12 OHIO           Flash Flood                  5
13 WEST VIRGINIA  Flood                        5
14 COLORADO       Flood                        4
15 GEORGIA        Tornado                      4
16 GUAM WATERS    Marine Strong Wind           4
17 NEW MEXICO     Flash Flood                  4
18 NORTH CAROLINA Tornado                      4
19 PUERTO RICO    Flash Flood                  4
20 ALABAMA        Thunderstorm Wind            2
21 ALABAMA        Tornado                      2
22 INDIANA        Tornado                      2
23 NEW YORK       Tornado                      2
24 SOUTH DAKOTA   Thunderstorm Wind            2
25 IDAHO          Thunderstorm Wind            1
26 KANSAS         Thunderstorm Wind            1
27 LOUISIANA      Thunderstorm Wind            1
28 MISSISSIPPI    Thunderstorm Wind            1
29 NEVADA         Thunderstorm Wind            1
30 UTAH           Lightning                    1
31 UTAH           Thunderstorm Wind            1
32 VIRGINIA       Thunderstorm Wind            1

#3: Events categorized by Month

   MONTH_NAME EVENT_TYPE         event_count
 1 April      Flood                        4
 2 August     Flash Flood                 10
 3 August     Debris Flow                  8
 4 August     Dust Devil                   4
 5 August     Thunderstorm Wind            3
 6 August     Tornado                      2
 7 August     Lightning                    1
 8 December   Marine Strong Wind           4
 9 December   Tornado                      1
10 February   Flood                        9
11 February   Debris Flow                  8
12 January    Tornado                      4
13 July       Flash Flood                 15
14 July       Thunderstorm Wind            7
15 July       Lightning                    2
16 July       Tornado                      2
17 June       Thunderstorm Wind            8
18 June       Lightning                    1
19 March      Tornado                      2
20 May        Tornado                     22
21 May        Flood                       17
22 May        Thunderstorm Wind            9
23 May        Flash Flood                  5
24 May        Lightning                    1
25 November   Flood                       54
26 November   Flash Flood                 20
27 October    Tornado                     12
28 September  Flash Flood                108
29 September  Flood                       20
30 September  Tornado                      4
31 September  Thunderstorm Wind            3
32 September  Lightning                    1

#4: Most frequent events occurring each month, along with total number of events

   MONTH_NAME EVENT_TYPE         event_count
 1 September  Flash Flood                108
 2 November   Flood                       54
 3 May        Tornado                     22
 4 July       Flash Flood                 15
 5 October    Tornado                     12
 6 August     Flash Flood                 10
 7 February   Flood                        9
 8 June       Thunderstorm Wind            8
 9 April      Flood                        4
10 December   Marine Strong Wind           4
11 January    Tornado                      4
12 March      Tornado                      2