This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database (from 1950 to November 2011) to determine which weather events are most harmful to population health and which have the greatest economic consequences.
This report provides a reproducible workflow that municipal managers can use to assess environmental risks. The steps are as follows.
The compressed data file is downloaded from the source URL if it does not already exist in the working directory, and the raw CSV is then read directly from the bz2 archive.
# Setting up code chunks and loading packages
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(ggplot2)
library(tidyr)
# URL for the dataset
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
fileName <- "repdata_data_StormData.csv.bz2"
# Download if not present
if (!file.exists(fileName)) {
  download.file(fileUrl, destfile = fileName, method = "libcurl", mode = "wb")
}
# Reading the data (read.csv decompresses the .bz2 file transparently)
storm_data <- read.csv(fileName, sep = ",", header = TRUE)
# Subsetting the data to keep only the columns needed for the analysis
Keep_columns <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
storm_sub <- storm_data[, Keep_columns]
# Exponent conversion: map damage exponent codes to numeric multipliers
# (named exp_multiplier to avoid masking base R's exp() function)
exp_multiplier <- c("H" = 100, "K" = 1000, "M" = 1e6, "B" = 1e9)
# Calculate property and crop damage as numeric values
# (toupper ensures 'k' and 'K' are both matched; any other code uses a multiplier of 1)
storm_sub <- storm_sub %>%
  mutate(
    prop_val = PROPDMG * ifelse(toupper(PROPDMGEXP) %in% names(exp_multiplier),
                                exp_multiplier[toupper(PROPDMGEXP)], 1),
    crop_val = CROPDMG * ifelse(toupper(CROPDMGEXP) %in% names(exp_multiplier),
                                exp_multiplier[toupper(CROPDMGEXP)], 1),
    total_econ = prop_val + crop_val
  )
# Aggregate Health Data
health_totals <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(health_impact = sum(FATALITIES + INJURIES, na.rm = TRUE)) %>%
  arrange(desc(health_impact))
# Aggregate Economic Data
econ_totals <- storm_sub %>%
  group_by(EVTYPE) %>%
  summarise(economic_impact = sum(total_econ, na.rm = TRUE)) %>%
  arrange(desc(economic_impact))
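The exponent conversion above can be checked on a small example. The sketch below uses base R only and a made-up five-row data frame (the values and the names `toy`, `multipliers`, and `looked_up` are illustrative assumptions, not taken from the NOAA file). It applies the same rule as the report: H, K, M, and B map to multipliers, case is ignored, and every other code, including a blank, falls back to 1.

```r
# Illustrative lookup table mirroring the report's exponent rule
multipliers <- c("H" = 100, "K" = 1000, "M" = 1e6, "B" = 1e9)

# Hypothetical rows: two valid codes, one lowercase code, a blank, and a digit
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "m", "B", "", "5"),
  stringsAsFactors = FALSE
)

# Normalise case, then look up each code; unmatched codes come back as NA
codes     <- toupper(toy$PROPDMGEXP)
looked_up <- unname(multipliers[codes])

# Fall back to a multiplier of 1 for anything outside H, K, M, B
toy$multiplier <- ifelse(is.na(looked_up), 1, looked_up)
toy$prop_val   <- toy$PROPDMG * toy$multiplier

print(toy)
```

Under this rule the blank and the digit code leave the damage figure unchanged, so it may be worth tabulating the distinct exponent codes in the real data before relying on the totals.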
To determine which events are most harmful to population health,
fatalities and injuries were summed for each event type and combined
into a single health impact total.
The following table and plot show the 10 event types with the highest totals.
# Getting the top 10 health-harming events
top10_health <- head(health_totals, 10)
# Display the summary table
print(top10_health)
## # A tibble: 10 × 2
##    EVTYPE            health_impact
##    <chr>                     <dbl>
##  1 TORNADO                   96979
##  2 EXCESSIVE HEAT             8428
##  3 TSTM WIND                  7461
##  4 FLOOD                      7259
##  5 LIGHTNING                  6046
##  6 HEAT                       3037
##  7 FLASH FLOOD                2755
##  8 ICE STORM                  2064
##  9 THUNDERSTORM WIND          1621
## 10 WINTER STORM               1527
# Plotting the health impact
ggplot(top10_health,
       aes(x = reorder(EVTYPE, health_impact), y = health_impact)) +
  geom_col(fill = "blue") +
  labs(
    title = "Most Harmful Events to Population Health (Fatalities and Injuries Combined)",
    x = "Event Type",
    y = "Total Number of People Affected"
  ) +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
Figure 1: Tornadoes account for by far the largest number of combined fatalities and injuries in the NOAA database, followed by excessive heat and thunderstorm (TSTM) wind.
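Because the health total adds fatalities and injuries together, it can hide differences between the two. The base R sketch below, on a made-up data frame (the rows and the names `toy` and `by_type` are illustrative assumptions, not NOAA figures), shows one way to keep both columns alongside the combined total.

```r
# Hypothetical event records, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 5, 0, 3),
  INJURIES   = c(40, 10, 1, 6, 0),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries separately for each event type
by_type <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined total, matching the report's definition of health impact
by_type$health_impact <- by_type$FATALITIES + by_type$INJURIES

# Sort from most to least harmful by the combined total
by_type <- by_type[order(-by_type$health_impact), ]

print(by_type)
```

In this toy data, TORNADO leads on the combined total while HEAT has the most fatalities, which is the kind of difference a combined ranking alone would not reveal.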
To evaluate economic consequences, property and crop damage were
summed for each event type.
The table reports totals in US dollars; the plot expresses them in
billions of dollars for easier interpretation by municipal managers.
# Getting the top 10 economic-impact events
top_econ <- head(econ_totals, 10)
# Display the summary table
print(top_econ)
## # A tibble: 10 × 2
##    EVTYPE            economic_impact
##    <chr>                       <dbl>
##  1 FLOOD               150319678257
##  2 HURRICANE/TYPHOON    71913712800
##  3 TORNADO              57352114049.
##  4 STORM SURGE          43323541000
##  5 HAIL                 18758222016.
##  6 FLASH FLOOD          17562129167.
##  7 DROUGHT              15018672000
##  8 HURRICANE            14610229010
##  9 RIVER FLOOD          10148404500
## 10 ICE STORM             8967041360
# Plotting the economic impact
ggplot(top_econ,
       aes(x = reorder(EVTYPE, economic_impact), y = economic_impact / 1e9)) +
  geom_col(fill = "blue") +
  labs(
    title = "Events with Greatest Economic Impact (Property and Crop Damage Combined)",
    x = "Event Type",
    y = "Total Economic Damage (Billions of USD)"
  ) +
  theme(axis.text.x = element_text(angle = 75, hjust = 1))
Figure 2: Floods have the highest total economic impact, followed by hurricanes/typhoons and tornadoes.
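The summary table prints raw dollar amounts, which are hard to read at this scale. As a minimal sketch, the top three totals from the table can be rescaled to billions in base R. The data frame `top3` and the column `impact_billion` are illustrative names, and the tornado figure is entered without its fractional part.

```r
# Top three totals copied from the summary table (whole dollars)
top3 <- data.frame(
  EVTYPE          = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  economic_impact = c(150319678257, 71913712800, 57352114049),
  stringsAsFactors = FALSE
)

# Rescale to billions of USD, rounded to two decimal places
top3$impact_billion <- round(top3$economic_impact / 1e9, 2)

print(top3[, c("EVTYPE", "impact_billion")])
```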
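The analysis shows that tornadoes are the most harmful event type for population health, with 96,979 combined fatalities and injuries, more than ten times the total for excessive heat (8,428).
Floods have caused the greatest combined property and crop damage, roughly 150 billion USD, about twice the total for hurricanes/typhoons (roughly 72 billion USD).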