Synopsis:
This analysis explores the NOAA Storm Database to identify the severe weather events with the greatest impact on public health and the economy in the United States. We download the raw data, keep the variables for event type, fatalities, injuries, property damage, and crop damage, and convert the damage figures to dollar amounts. We then rank event types and report the top 10 by health impact (fatalities plus injuries) and by economic impact (property plus crop damage). Tornadoes cause by far the most harm to public health, while floods, followed by hurricanes/typhoons, cause the largest economic losses. These findings can help government and municipal managers allocate resources and prioritize planning for severe weather events. Bar plots show the relative impact of each event type on population health and the economy.
Data Processing:
# Load the tidyverse (dplyr for data manipulation, ggplot2 for plotting)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.0
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Download and Read the Data
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(url, "StormData.csv.bz2")
data <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)
# Data Processing: keep only the relevant columns, standardize event types to upper case, and convert the damage exponent codes to numeric multipliers.
clean_data <- data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(
    EVTYPE = toupper(EVTYPE),
    # Exponent codes: K = thousands, M = millions, B = billions.
    # Any other code (including blank) maps to 0, so those rows add no damage.
    PROPDMGEXP = case_when(
      PROPDMGEXP %in% c("K", "k") ~ 1e3,
      PROPDMGEXP %in% c("M", "m") ~ 1e6,
      PROPDMGEXP %in% c("B", "b") ~ 1e9,
      TRUE ~ 0
    ),
    CROPDMGEXP = case_when(
      CROPDMGEXP %in% c("K", "k") ~ 1e3,
      CROPDMGEXP %in% c("M", "m") ~ 1e6,
      CROPDMGEXP %in% c("B", "b") ~ 1e9,
      TRUE ~ 0
    ),
    # Damage in dollars = reported amount times its multiplier
    PropertyDamage = PROPDMG * PROPDMGEXP,
    CropDamage = CROPDMG * CROPDMGEXP
  )
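To make the exponent handling above concrete, here is a minimal sketch on a made-up four-row data frame. The `toy` data and the `exp_multiplier()` helper are hypothetical and not part of the original analysis. The sketch applies the same K/M/B mapping and shows that any other code, including a blank, yields a multiplier of 0, so such rows contribute nothing to the damage totals.

```r
library(dplyr)

# Hypothetical helper mirroring the case_when() mapping used in the analysis.
exp_multiplier <- function(code) {
  case_when(
    code %in% c("K", "k") ~ 1e3,
    code %in% c("M", "m") ~ 1e6,
    code %in% c("B", "b") ~ 1e9,
    TRUE ~ 0  # blanks and any other code contribute no damage
  )
}

# Made-up rows for illustration only; these are not values from the Storm Database.
toy <- tibble(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", "")
)

# Dollar damage = reported amount times the multiplier for its exponent code.
toy <- toy %>%
  mutate(PropertyDamage = PROPDMG * exp_multiplier(PROPDMGEXP))

toy$PropertyDamage
```

Running `table(data$PROPDMGEXP)` on the full dataset is one way to check how many rows fall into the zero-multiplier branch.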
# Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
events_health <- clean_data %>%
  group_by(EVTYPE) %>%
  summarise(
    Fatalities = sum(FATALITIES),
    Injuries = sum(INJURIES),
    Total = Fatalities + Injuries
  ) %>%
  arrange(desc(Total)) %>%
  head(10)
events_health
## # A tibble: 10 × 4
## EVTYPE Fatalities Injuries Total
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
## 6 HEAT 937 2100 3037
## 7 FLASH FLOOD 978 1777 2755
## 8 ICE STORM 89 1975 2064
## 9 THUNDERSTORM WIND 133 1488 1621
## 10 WINTER STORM 206 1321 1527
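The table lists TSTM WIND and THUNDERSTORM WIND as separate rows because the analysis groups on the EVTYPE label after upper-casing it only. As a sketch, and assuming one wants to treat TSTM as an abbreviation of THUNDERSTORM, the labels could be merged before grouping. The `normalize_evtype()` helper is hypothetical, and the input below simply reuses the two wind rows from the table above.

```r
library(dplyr)

# Hypothetical helper: upper-case, trim whitespace, and expand the TSTM abbreviation.
normalize_evtype <- function(x) {
  x <- trimws(toupper(x))
  gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)
}

# The two wind rows reported in the health-impact table.
wind <- tibble(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WIND"),
  Fatalities = c(504, 133),
  Injuries   = c(6957, 1488)
)

# After normalization both rows share one label and collapse into a single group.
merged <- wind %>%
  mutate(EVTYPE = normalize_evtype(EVTYPE)) %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(Fatalities), Injuries = sum(Injuries)) %>%
  mutate(Total = Fatalities + Injuries)

merged
```

On these two rows alone the combined total is 9,082, which is higher than the 8,428 reported for EXCESSIVE HEAT. The full dataset may contain further spelling variants that this sketch does not cover.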
# Question 2: Across the United States, which types of events have the greatest economic consequences?
events_economic <- clean_data %>%
  group_by(EVTYPE) %>%
  summarise(
    PropertyDamage = sum(PropertyDamage),
    CropDamage = sum(CropDamage),
    Total = PropertyDamage + CropDamage
  ) %>%
  arrange(desc(Total)) %>%
  head(10)
events_economic
## # A tibble: 10 × 4
## EVTYPE PropertyDamage CropDamage Total
## <chr> <dbl> <dbl> <dbl>
## 1 FLOOD 144657709800 5661968450 150319678250
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937160480 414953110 57352113590
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15732266720 3025954450 18758221170
## 6 FLASH FLOOD 16140811510 1421317100 17562128610
## 7 DROUGHT 1046106000 13972566000 15018672000
## 8 HURRICANE 11868319010 2741910000 14610229010
## 9 RIVER FLOOD 5118945500 5029459000 10148404500
## 10 ICE STORM 3944927810 5022113500 8967041310
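Eleven- and twelve-digit dollar totals are hard to compare at a glance. The following is a small sketch, assuming one simply wants the same totals expressed in billions of dollars. It reuses the top three rows of the table above, and the `TotalBillions` column name is hypothetical.

```r
library(dplyr)

# Top three totals copied from the economic-impact table.
top3 <- tibble(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  Total  = c(150319678250, 71913712800, 57352113590)
)

# Express each total in billions of dollars, rounded to one decimal place.
top3 <- top3 %>%
  mutate(TotalBillions = round(Total / 1e9, 1))

top3
```

The same `Total / 1e9` transformation could be applied to `events_economic` before plotting so that the y-axis reads in billions rather than raw dollars.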
Results:
# Visualizing the results with bar plots.
# Health impact: bars ordered from highest to lowest total.
ggplot(events_health, aes(x = reorder(EVTYPE, -Total), y = Total)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Event Type") +
  ylab("Total Health Impact (Fatalities + Injuries)") +
  ggtitle("Top 10 Event Types with Highest Health Impact")
# Economic impact: bars ordered from highest to lowest total, in dollars.
ggplot(events_economic, aes(x = reorder(EVTYPE, -Total), y = Total)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab("Event Type") +
  ylab("Total Economic Impact (Property Damage + Crop Damage)") +
  ggtitle("Top 10 Event Types with Highest Economic Impact")
Summary of Results:
Our analysis of the NOAA Storm Database revealed the top 10 weather events with the highest impact on public health and the economy in the United States. The results are as follows:
Health Impact (Fatalities + Injuries):
Tornadoes have the highest health impact, with 5,633 fatalities and 91,346 injuries, for a total of 96,979 casualties. Excessive heat is second, with 1,903 fatalities and 6,525 injuries, for a total of 8,428. The rest of the top 10, in order, are TSTM wind, flood, lightning, heat, flash flood, ice storm, thunderstorm wind, and winter storm.
Economic Impact (Property Damage + Crop Damage):
Floods cause the greatest economic losses, with $144,657,709,800 in property damage and $5,661,968,450 in crop damage, for a total of $150,319,678,250. Hurricanes/typhoons are second, with $69,305,840,000 in property damage and $2,607,872,800 in crop damage, for a total of $71,913,712,800. The rest of the top 10, in order, are tornadoes, storm surge, hail, flash flood, drought, hurricane, river flood, and ice storm.
These findings can help government and municipal decision-makers allocate resources and prioritize planning for severe weather events. The bar plots show the relative impact of each event type on population health and the economy.