Title: Analysis of the NOAA Storm Database: The Impact of Severe Weather Events Across the U.S.
Synopsis: This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks the characteristics of major storms and weather events in the U.S., including when and where they occur and estimates of any fatalities, injuries, or property damage.
Storms and other severe weather events can cause both public health and economic problems for towns and cities, and this data can help show which areas are most affected by natural disasters.
This analysis addresses the following questions:
Across the United States, which types of events (as indicated in the EVENT_TYPE variable) are most harmful with respect to population health? In the data used for this analysis, Flash Floods (830 people killed or injured) were the most harmful to population health, followed by Tornadoes (440) and Floods (174).
Across the U.S., which types of events occur most often in each state? The three highest state-level counts are Flash Floods in Tennessee (104), Floods in Missouri (24), and Tornadoes in Florida (16).
Which types of events characterize which months? The months with the greatest variety of event types were August (6 types), May (5), and September (5). Across these three months, the most common events were Flash Floods (123), Floods (37), and Tornadoes (28), which are also the three most common event types in the data overall.
What is the most common weather event in each month, and how many times did it occur? In ten of the twelve months, the most common event was a Flash Flood, Tornado, or Flood, consistent with those three event types ranking highest overall.
Data Processing:
library(dplyr)
library(readr)
library(ggplot2)
library(knitr)
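# Read the 2024 NOAA Storm Events files (paths are specific to the author's machine; change data_dir to reproduce)
data_dir <- "C:/Users/rjani/Downloads"
details <- read.csv(file.path(data_dir, "StormEvents - Details", "StormEvents_details-ftp_v1.0_d2024_c20250818.csv"))
fatalities <- read.csv(file.path(data_dir, "StormEvents - Fatalities", "StormEvents_fatalities-ftp_v1.0_d2024_c20250818.csv"))
locations <- read.csv(file.path(data_dir, "StormEvents - Locations", "StormEvents_locations-ftp_v1.0_d2024_c20250818.csv"))
# merge() performs an inner join by default, so only events present in all three files are kept.
# The fatalities file has one row per fatality and the locations file can list several locations
# per event, so one event can appear in several rows of the joined data.
StormEvents_joined_data <- merge(details, fatalities, by = "EVENT_ID")
StormEvents_joined_data <- merge(StormEvents_joined_data, locations, by = "EVENT_ID")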
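The join above determines which events, and how many copies of each, reach the later summaries. As a minimal sketch (the tables `toy_details`, `toy_fatalities`, and `toy_locations` are hypothetical stand-ins shaped like the three NOAA files, not real data), an inner join drops an event that has no fatality or location rows and repeats an event once per fatality-location pair, while `all.x = TRUE` keeps every event from the details table:

```r
# Hypothetical toy tables shaped like the NOAA files (not real data):
# one row per event in details, one row per fatality, one row per listed location
toy_details <- data.frame(
  EVENT_ID = c(1, 2, 3),
  EVENT_TYPE = c("Flash Flood", "Tornado", "Hail")
)
toy_fatalities <- data.frame(FATALITY_ID = c(10, 11, 12), EVENT_ID = c(1, 1, 2))
toy_locations <- data.frame(EVENT_ID = c(1, 1, 2), LOCATION_INDEX = c(1, 2, 1))

# Inner join (the merge() default), as in the analysis:
# event 3 is dropped, and event 1 appears 2 fatalities x 2 locations = 4 times
inner_joined <- merge(merge(toy_details, toy_fatalities, by = "EVENT_ID"),
                      toy_locations, by = "EVENT_ID")
print(table(inner_joined$EVENT_ID))  # event 1: 4 rows, event 2: 1 row

# Left join: all.x = TRUE keeps every event in toy_details, filling unmatched columns with NA
left_joined <- merge(merge(toy_details, toy_fatalities, by = "EVENT_ID", all.x = TRUE),
                     toy_locations, by = "EVENT_ID", all.x = TRUE)
print(table(left_joined$EVENT_ID))  # event 1: 4 rows, events 2 and 3: 1 row each
```

Which behavior is wanted depends on the question: the inner join limits the analysis to events with at least one recorded fatality and at least one recorded location, which is worth keeping in mind when reading the counts that follow.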
# 1. Most harmful event types for population health
health_impact <- StormEvents_joined_data %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    # direct + indirect deaths and injuries, summed over the rows of the joined data
    total_fatalities = sum(DEATHS_DIRECT, DEATHS_INDIRECT, na.rm = TRUE),
    total_injuries = sum(INJURIES_DIRECT, INJURIES_INDIRECT, na.rm = TRUE),
    total_harmed = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_harmed))
# Show the eight most harmful event types
head(health_impact, 8)
## # A tibble: 8 × 4
##   EVENT_TYPE         total_fatalities total_injuries total_harmed
##   <chr>                         <int>          <int>        <int>
## 1 Flash Flood                     826              4          830
## 2 Tornado                         137            303          440
## 3 Flood                           174              0          174
## 4 Thunderstorm Wind                32             19           51
## 5 Debris Flow                      24             24           48
## 6 Marine Strong Wind               16             12           28
## 7 Dust Devil                        8              8           16
## 8 Lightning                         6              5           11
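`DEATHS_DIRECT`, `DEATHS_INDIRECT`, `INJURIES_DIRECT`, and `INJURIES_INDIRECT` are per-event columns from the details file, so if an event appears in several joined rows, its counts are added once per row. If each event should count only once, one option is to keep a single row per `EVENT_ID` before summing. A minimal sketch on a hypothetical `toy_joined` table (not the real data); this is not the method used to produce the table above, and applied to the real data it could give different totals:

```r
library(dplyr)

# Hypothetical joined rows (not real data): event 1 is repeated three times,
# as it would be after a one-to-many merge
toy_joined <- data.frame(
  EVENT_ID = c(1, 1, 1, 2, 3),
  EVENT_TYPE = c("Flash Flood", "Flash Flood", "Flash Flood", "Tornado", "Flash Flood"),
  DEATHS_DIRECT = c(2, 2, 2, 1, 1),
  DEATHS_INDIRECT = c(0, 0, 0, 0, 1),
  INJURIES_DIRECT = c(1, 1, 1, 3, 0),
  INJURIES_INDIRECT = c(0, 0, 0, 0, 0)
)

health_impact_distinct <- toy_joined %>%
  # keep the first row for each event so per-event counts are summed only once
  distinct(EVENT_ID, .keep_all = TRUE) %>%
  group_by(EVENT_TYPE) %>%
  summarise(
    total_fatalities = sum(DEATHS_DIRECT, DEATHS_INDIRECT, na.rm = TRUE),
    total_injuries = sum(INJURIES_DIRECT, INJURIES_INDIRECT, na.rm = TRUE),
    total_harmed = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_harmed))

print(health_impact_distinct)  # Flash Flood: 5 harmed, Tornado: 4 harmed
```

Run without the `distinct()` step, the same pipeline reports 11 people harmed by Flash Floods in `toy_joined` rather than 5, because event 1's counts are added three times.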
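# 2. Most common event type in each state
# n() counts rows of the joined data, not necessarily distinct events
event_counts_by_state <- StormEvents_joined_data %>%
  group_by(STATE, EVENT_TYPE) %>%
  summarise(event_count = n(), .groups = "drop") %>%
  arrange(desc(event_count))
# Keep the top event type per state; slice_max() keeps ties by default,
# so a state can appear more than once
most_common_event_state <- event_counts_by_state %>%
  group_by(STATE) %>%
  slice_max(event_count, n = 1) %>%
  select(STATE, EVENT_TYPE, event_count) %>%
  arrange(desc(event_count))
print(most_common_event_state, n = Inf)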
## # A tibble: 32 × 3
## # Groups:   STATE [30]
##    STATE          EVENT_TYPE         event_count
##    <chr>          <chr>                    <int>
##  1 TENNESSEE      Flash Flood                104
##  2 MISSOURI       Flood                       24
##  3 FLORIDA        Tornado                     16
##  4 VERMONT        Flash Flood                 15
##  5 SOUTH CAROLINA Flood                       14
##  6 ARKANSAS       Flood                       13
##  7 CALIFORNIA     Flood                       12
##  8 OKLAHOMA       Flood                       12
##  9 ALASKA         Debris Flow                  8
## 10 TEXAS          Flash Flood                  8
## 11 ARIZONA        Flash Flood                  6
## 12 OHIO           Flash Flood                  5
## 13 WEST VIRGINIA  Flood                        5
## 14 COLORADO       Flood                        4
## 15 GEORGIA        Tornado                      4
## 16 GUAM WATERS    Marine Strong Wind           4
## 17 NEW MEXICO     Flash Flood                  4
## 18 NORTH CAROLINA Tornado                      4
## 19 PUERTO RICO    Flash Flood                  4
## 20 ALABAMA        Thunderstorm Wind            2
## 21 ALABAMA        Tornado                      2
## 22 INDIANA        Tornado                      2
## 23 NEW YORK       Tornado                      2
## 24 SOUTH DAKOTA   Thunderstorm Wind            2
## 25 IDAHO          Thunderstorm Wind            1
## 26 KANSAS         Thunderstorm Wind            1
## 27 LOUISIANA      Thunderstorm Wind            1
## 28 MISSISSIPPI    Thunderstorm Wind            1
## 29 NEVADA         Thunderstorm Wind            1
## 30 UTAH           Lightning                    1
## 31 UTAH           Thunderstorm Wind            1
## 32 VIRGINIA       Thunderstorm Wind            1
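The state table has 32 rows for 30 states because `slice_max()` keeps tied rows by default (`with_ties = TRUE`), which is why ALABAMA and UTAH each appear twice. The sketch below uses a hypothetical `toy_counts` table (not the real data) to contrast the default with `with_ties = FALSE`, which returns exactly one row per group:

```r
library(dplyr)

# Hypothetical state/event counts (not real data) with a tie in ALABAMA
toy_counts <- data.frame(
  STATE = c("ALABAMA", "ALABAMA", "TENNESSEE", "TENNESSEE"),
  EVENT_TYPE = c("Thunderstorm Wind", "Tornado", "Flash Flood", "Flood"),
  event_count = c(2, 2, 104, 3)
)

# Default (with_ties = TRUE): both tied ALABAMA rows are kept
top_with_ties <- toy_counts %>%
  group_by(STATE) %>%
  slice_max(event_count, n = 1) %>%
  ungroup()

# with_ties = FALSE: exactly one row per state; which of the tied rows
# is returned should not be treated as meaningful
one_per_state <- toy_counts %>%
  group_by(STATE) %>%
  slice_max(event_count, n = 1, with_ties = FALSE) %>%
  ungroup()

print(nrow(top_with_ties))  # 3
print(nrow(one_per_state))  # 2
```

Keeping ties, as the analysis does, makes it visible when a state's top two event types were equally frequent; `with_ties = FALSE` is simpler to read but silently drops one of them.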
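# 3. Event counts by month and event type
event_month_summary <- StormEvents_joined_data %>%
  group_by(MONTH_NAME, EVENT_TYPE) %>%
  summarise(event_count = n(), .groups = "drop") %>%
  # MONTH_NAME is a character column, so months sort alphabetically, not in calendar order
  arrange(MONTH_NAME, desc(event_count))
print(event_month_summary, n = Inf)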
## # A tibble: 32 × 3
##    MONTH_NAME EVENT_TYPE         event_count
##    <chr>      <chr>                    <int>
##  1 April      Flood                        4
##  2 August     Flash Flood                 10
##  3 August     Debris Flow                  8
##  4 August     Dust Devil                   4
##  5 August     Thunderstorm Wind            3
##  6 August     Tornado                      2
##  7 August     Lightning                    1
##  8 December   Marine Strong Wind           4
##  9 December   Tornado                      1
## 10 February   Flood                        9
## 11 February   Debris Flow                  8
## 12 January    Tornado                      4
## 13 July       Flash Flood                 15
## 14 July       Thunderstorm Wind            7
## 15 July       Lightning                    2
## 16 July       Tornado                      2
## 17 June       Thunderstorm Wind            8
## 18 June       Lightning                    1
## 19 March      Tornado                      2
## 20 May        Tornado                     22
## 21 May        Flood                       17
## 22 May        Thunderstorm Wind            9
## 23 May        Flash Flood                  5
## 24 May        Lightning                    1
## 25 November   Flood                       54
## 26 November   Flash Flood                 20
## 27 October    Tornado                     12
## 28 September  Flash Flood                108
## 29 September  Flood                       20
## 30 September  Tornado                      4
## 31 September  Thunderstorm Wind            3
## 32 September  Lightning                    1
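Because `MONTH_NAME` is a character column, `arrange(MONTH_NAME, ...)` sorts months alphabetically (April, August, December, ...). One way to get calendar order is to convert it to a factor whose levels are base R's built-in `month.name` vector. A minimal sketch on a hypothetical `toy_months` table (not the real data):

```r
library(dplyr)

# Hypothetical month summary (not real data), in alphabetical month order
toy_months <- data.frame(
  MONTH_NAME = c("April", "August", "February", "January"),
  EVENT_TYPE = c("Flood", "Flash Flood", "Flood", "Tornado"),
  event_count = c(4, 10, 9, 4)
)

calendar_ordered <- toy_months %>%
  # month.name is base R's built-in vector "January", "February", ..., "December"
  mutate(MONTH_NAME = factor(MONTH_NAME, levels = month.name)) %>%
  # a factor sorts by its level order, so this is calendar order
  arrange(MONTH_NAME, desc(event_count))

print(as.character(calendar_ordered$MONTH_NAME))
# "January" "February" "April" "August"
```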
# 4. Most common event type in each month (slice_max() keeps ties by default)
most_common_event_month <- event_month_summary %>%
  group_by(MONTH_NAME) %>%
  slice_max(event_count, n = 1) %>%
  select(MONTH_NAME, EVENT_TYPE, event_count) %>%
  arrange(desc(event_count))
print(most_common_event_month, n = Inf)
## # A tibble: 12 × 3
## # Groups:   MONTH_NAME [12]
##    MONTH_NAME EVENT_TYPE         event_count
##    <chr>      <chr>                    <int>
##  1 September  Flash Flood                108
##  2 November   Flood                       54
##  3 May        Tornado                     22
##  4 July       Flash Flood                 15
##  5 October    Tornado                     12
##  6 August     Flash Flood                 10
##  7 February   Flood                        9
##  8 June       Thunderstorm Wind            8
##  9 April      Flood                        4
## 10 December   Marine Strong Wind           4
## 11 January    Tornado                      4
## 12 March      Tornado                      2
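Counting rows of `event_month_summary` per month gives the number of distinct event types recorded in that month, whereas summing `event_count` gives the number of recorded events, and the two measures can rank months differently. The sketch below copies the August and September rows from the month table above into a data frame (`toy_summary` is just a name chosen here) and computes both:

```r
library(dplyr)

# August and September rows copied from the event_month_summary output
toy_summary <- data.frame(
  MONTH_NAME = c(rep("August", 6), rep("September", 5)),
  EVENT_TYPE = c("Flash Flood", "Debris Flow", "Dust Devil", "Thunderstorm Wind", "Tornado", "Lightning",
                 "Flash Flood", "Flood", "Tornado", "Thunderstorm Wind", "Lightning"),
  event_count = c(10, 8, 4, 3, 2, 1, 108, 20, 4, 3, 1)
)

month_totals <- toy_summary %>%
  group_by(MONTH_NAME) %>%
  summarise(
    n_event_types = n(),              # distinct event types (rows) in the month
    total_events = sum(event_count),  # events summed across all event types
    .groups = "drop"
  ) %>%
  arrange(desc(total_events))

print(month_totals)
```

For these rows, August has more event types than September (6 vs. 5), but September has far more recorded events (136 vs. 28).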
Plot of the most harmful event types (combined injuries and fatalities):
# Bar chart of combined injuries and fatalities by event type, sorted from most to least harmful
event_graph <- ggplot(health_impact, aes(x = reorder(EVENT_TYPE, -total_harmed), y = total_harmed, fill = EVENT_TYPE)) +
  geom_col() +  # same as geom_bar(stat = "identity")
  theme(axis.text.x = element_text(angle = 90), legend.position = "none") +
  xlab("Type of Event") +
  ylab("Combined Injuries & Fatalities") +
  ggtitle("Most Harmful Events") +
  ylim(c(0, 850))
event_graph
Results:
#1: Most harmful events with respect to population health
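| EVENT_TYPE | total_fatalities | total_injuries | total_harmed |
|:---|---:|---:|---:|
| Flash Flood | 826 | 4 | 830 |
| Tornado | 137 | 303 | 440 |
| Flood | 174 | 0 | 174 |
| Thunderstorm Wind | 32 | 19 | 51 |
| Debris Flow | 24 | 24 | 48 |
| Marine Strong Wind | 16 | 12 | 28 |
| Dust Devil | 8 | 8 | 16 |
| Lightning | 6 | 5 | 11 |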
#2: Most frequent event type in each state
| STATE | EVENT_TYPE | event_count |
|:---|:---|---:|
| TENNESSEE | Flash Flood | 104 |
| MISSOURI | Flood | 24 |
| FLORIDA | Tornado | 16 |
| VERMONT | Flash Flood | 15 |
| SOUTH CAROLINA | Flood | 14 |
| ARKANSAS | Flood | 13 |
| CALIFORNIA | Flood | 12 |
| OKLAHOMA | Flood | 12 |
| ALASKA | Debris Flow | 8 |
| TEXAS | Flash Flood | 8 |
| ARIZONA | Flash Flood | 6 |
| OHIO | Flash Flood | 5 |
| WEST VIRGINIA | Flood | 5 |
| COLORADO | Flood | 4 |
| GEORGIA | Tornado | 4 |
| GUAM WATERS | Marine Strong Wind | 4 |
| NEW MEXICO | Flash Flood | 4 |
| NORTH CAROLINA | Tornado | 4 |
| PUERTO RICO | Flash Flood | 4 |
| ALABAMA | Thunderstorm Wind | 2 |
| ALABAMA | Tornado | 2 |
| INDIANA | Tornado | 2 |
| NEW YORK | Tornado | 2 |
| SOUTH DAKOTA | Thunderstorm Wind | 2 |
| IDAHO | Thunderstorm Wind | 1 |
| KANSAS | Thunderstorm Wind | 1 |
| LOUISIANA | Thunderstorm Wind | 1 |
| MISSISSIPPI | Thunderstorm Wind | 1 |
| NEVADA | Thunderstorm Wind | 1 |
| UTAH | Lightning | 1 |
| UTAH | Thunderstorm Wind | 1 |
| VIRGINIA | Thunderstorm Wind | 1 |
#3: Event counts by month and event type
| MONTH_NAME | EVENT_TYPE | event_count |
|:---|:---|---:|
| April | Flood | 4 |
| August | Flash Flood | 10 |
| August | Debris Flow | 8 |
| August | Dust Devil | 4 |
| August | Thunderstorm Wind | 3 |
| August | Tornado | 2 |
| August | Lightning | 1 |
| December | Marine Strong Wind | 4 |
| December | Tornado | 1 |
| February | Flood | 9 |
| February | Debris Flow | 8 |
| January | Tornado | 4 |
| July | Flash Flood | 15 |
| July | Thunderstorm Wind | 7 |
| July | Lightning | 2 |
| July | Tornado | 2 |
| June | Thunderstorm Wind | 8 |
| June | Lightning | 1 |
| March | Tornado | 2 |
| May | Tornado | 22 |
| May | Flood | 17 |
| May | Thunderstorm Wind | 9 |
| May | Flash Flood | 5 |
| May | Lightning | 1 |
| November | Flood | 54 |
| November | Flash Flood | 20 |
| October | Tornado | 12 |
| September | Flash Flood | 108 |
| September | Flood | 20 |
| September | Tornado | 4 |
| September | Thunderstorm Wind | 3 |
| September | Lightning | 1 |
#4: Most frequent event type in each month, with its event count
| MONTH_NAME | EVENT_TYPE | event_count |
|:---|:---|---:|
| September | Flash Flood | 108 |
| November | Flood | 54 |
| May | Tornado | 22 |
| July | Flash Flood | 15 |
| October | Tornado | 12 |
| August | Flash Flood | 10 |
| February | Flood | 9 |
| June | Thunderstorm Wind | 8 |
| April | Flood | 4 |
| December | Marine Strong Wind | 4 |
| January | Tornado | 4 |
| March | Tornado | 2 |