Synopsis

This analysis uses data from the U.S. National Oceanic and Atmospheric Administration (NOAA) to explore which types of storm events have the greatest impact on public health and the economy. By combining data on injuries, fatalities, property damage, and crop damage, it gives readers and municipal decision-makers the information they need to prioritize resources. The analysis addresses two main questions: (1) Which types of events are most harmful with respect to population health? (2) Which types of events have the greatest economic consequences? The original dataset comes from NOAA's storm database, and NOAA's documentation explains how some of the variables are constructed and defined. The following sections cover data processing, computations, figures, and results. All code is shown for full reproducibility.
Part 1: Data Processing

In this section, we load and process the data using the following R packages:
dplyr, tidyverse, knitr, ggplot2, and scales.

Load Packages and Data
library(dplyr)
library(tidyverse)
library(knitr)
library(ggplot2)
library(scales)
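# Download the data file only if it is not already present
# (mode = "wb" keeps the compressed file intact on Windows)
if (!file.exists("StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "StormData.csv.bz2", mode = "wb")
}
# read.csv() decompresses the .bz2 file transparently (adjust the path if needed)
stormdata <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)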
Create a Smaller Dataset with Relevant Variables

Here we extract the columns for event type (EVTYPE), fatalities, injuries, property damage (PROPDMG), and crop damage (CROPDMG), bind them into a new data frame, and make sure the impact and damage measures are numeric, with missing values set to 0.
# Extract the relevant columns into a new data frame.
# data.frame() keeps each column's type, whereas cbind() on these vectors
# builds a character matrix and turns the numbers into text.
stormdata2 <- data.frame(Event      = stormdata$EVTYPE,
                         Fatalities = stormdata$FATALITIES,
                         Injuries   = stormdata$INJURIES,
                         Prop       = stormdata$PROPDMG,
                         Crop       = stormdata$CROPDMG)
# Make sure the impact and damage columns are numeric
stormdata2 <- transform(stormdata2,
                        Fatalities = as.numeric(Fatalities),
                        Injuries   = as.numeric(Injuries),
                        Prop       = as.numeric(Prop),
                        Crop       = as.numeric(Crop))
# Replace any NA values with 0
stormdata2 <- stormdata2 %>%
  replace_na(list(Fatalities = 0, Injuries = 0, Prop = 0, Crop = 0))
# Create a new column for overall health impact (fatalities + injuries)
stormdata2 <- mutate(stormdata2, Health.Impact = Fatalities + Injuries)
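The data frame built above keeps PROPDMG and CROPDMG exactly as recorded. According to NOAA's storm data documentation, these columns hold only the leading digits of each estimate, and the magnitude is stored separately in PROPDMGEXP and CROPDMGEXP ("K" for thousands, "M" for millions, "B" for billions). The sketch below shows one way to turn those codes into multipliers. The helper name `exp_multiplier` is hypothetical, and treating every code other than K, M, or B as a multiplier of 1 is an assumption, not NOAA guidance.

```r
# Hypothetical helper (not part of the original analysis): convert NOAA
# magnitude codes to numeric multipliers. Assumption: only K, M and B (in any
# case) are magnitudes; blanks, NA and any other code get a multiplier of 1.
exp_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  code[is.na(code)] <- ""
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(multipliers[code])  # unmatched codes come back as NA
  out[is.na(out)] <- 1
  out
}

# Toy values for illustration only (not NOAA records)
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "m", "B", ""))
toy$PropUSD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy$PropUSD

# Applied to the full dataset, the same idea would be:
# stormdata$PropUSD <- stormdata$PROPDMG * exp_multiplier(stormdata$PROPDMGEXP)
# stormdata$CropUSD <- stormdata$CROPDMG * exp_multiplier(stormdata$CROPDMGEXP)
```

Scaling before summing matters because a single record coded "B" outweighs thousands of records coded "K"; whether this changes the rankings reported below is not checked here.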
Part 2: Computations and Figures

We now address our two main questions: identifying the events that are most harmful to public health and those with the greatest economic consequences.
2.1 Population Health Impact

Compute Top 5 Events for Health Impact
PublicHealth <- stormdata2 %>%
  select(Health.Impact, Event) %>%
  group_by(Event) %>%
  summarise(Health.Impact = sum(Health.Impact, na.rm = TRUE)) %>%
  slice_max(Health.Impact, n = 5)
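The ranking above groups on the raw EVTYPE strings, and the NOAA data records the same phenomenon under several labels (for example "TSTM WIND" and "THUNDERSTORM WIND"), so one event type's totals can be split across rows. A minimal sketch of normalizing labels before grouping follows. The helper `normalize_event` is hypothetical and its rules are illustrative only: they cover the thunderstorm-wind variants and are not an official NOAA mapping.

```r
# Hypothetical cleanup helper: the rules below are illustrative, not an
# official NOAA mapping, and cover only the thunderstorm-wind variants.
normalize_event <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("\\s+", " ", x)                                 # collapse repeated spaces
  x <- sub("^TSTM", "THUNDERSTORM", x)                      # expand the abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # drop the plural
  x
}

# Toy records for illustration only (not NOAA data)
toy <- data.frame(Event = c("TSTM WIND", "Thunderstorm Winds",
                            " THUNDERSTORM  WIND", "TORNADO"),
                  Health.Impact = c(5, 3, 2, 10))
toy$Event <- normalize_event(toy$Event)

# The three thunderstorm-wind rows now fall into a single group
aggregate(Health.Impact ~ Event, data = toy, FUN = sum)
```

In the analysis itself, the equivalent step would be `stormdata2$Event <- normalize_event(stormdata2$Event)` before any `group_by(Event)`; a fuller mapping would need to be checked against the NOAA event-type documentation.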
Bar Plot for Health Impact
ggplot(PublicHealth, aes(x = reorder(Event, -Health.Impact), y = Health.Impact)) +
  geom_col(fill = "firebrick3") +
  labs(title = "Health Impact of Storms by Type of Event",
       caption = "Data between 1950 and 2011",
       x = "Event Type",
       y = "Fatalities + Injuries") +
  # vjust = -0.5 places each label just above its bar, whatever the axis scale
  geom_text(aes(label = comma(Health.Impact)), fontface = "bold", vjust = -0.5)
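The Health.Impact measure weights one fatality the same as one injury, so an event type with many injuries can outrank one with more deaths. A minimal sketch of ranking each measure separately is shown below. The helper `top_events` is hypothetical, it uses base R `aggregate()` rather than the dplyr pipeline above, and the toy numbers are invented for illustration.

```r
# Hypothetical helper: total one measure per event type and keep the top n.
top_events <- function(df, measure, n = 5) {
  totals <- aggregate(df[[measure]], by = list(Event = df$Event),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- measure                 # aggregate() names it "x"
  head(totals[order(-totals[[measure]]), ], n)
}

# Toy records for illustration only (not NOAA data)
toy <- data.frame(Event      = c("HEAT", "TORNADO", "HEAT", "LIGHTNING", "TORNADO"),
                  Fatalities = c(10, 4, 6, 3, 2),
                  Injuries   = c(5, 60, 10, 8, 40))
top_events(toy, "Fatalities", n = 2)
top_events(toy, "Injuries", n = 2)
```

With these toy numbers HEAT leads on fatalities (16 vs. 6) while TORNADO leads on injuries (100 vs. 15); the combined sum ranks TORNADO first and hides the difference.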
2.2 Economic Consequences

Compute Top 5 Events for Property Damage
Property.Damage <- stormdata2 %>%
  select(Prop, Event) %>%
  group_by(Event) %>%
  summarise(Prop = sum(Prop, na.rm = TRUE)) %>%
  slice_max(Prop, n = 5)
Compute Top 5 Events for Crop Damage
Crop.Damage <- stormdata2 %>%
  select(Crop, Event) %>%
  group_by(Event) %>%
  summarise(Crop = sum(Crop, na.rm = TRUE)) %>%
  slice_max(Crop, n = 5)
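Question (2) asks about economic consequences as a whole, while the code above ranks property and crop damage separately. A minimal sketch of a combined ranking follows, in base R. It assumes Prop and Crop are already expressed in the same units, and the toy values are invented for illustration.

```r
# Sketch: rank event types by combined property + crop damage.
# Assumes Prop and Crop share the same units; toy values, not NOAA data.
toy <- data.frame(Event = c("HAIL", "FLOOD", "HAIL", "TORNADO"),
                  Prop  = c(100, 500, 50, 300),
                  Crop  = c(400, 50, 100, 10))
toy$Total <- toy$Prop + toy$Crop

# The formula interface drops rows with NA in any listed column (na.omit)
econ <- aggregate(cbind(Prop, Crop, Total) ~ Event, data = toy, FUN = sum)
econ <- econ[order(-econ$Total), ]
econ
```

Here HAIL ranks first overall (650) even though FLOOD has the most property damage (500), which is the kind of difference a combined measure can reveal.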
Bar Plot for Property Damage
ggplot(Property.Damage, aes(x = reorder(Event, -Prop), y = Prop)) +
  geom_col(fill = "deepskyblue4") +
  labs(title = "Property Damage Totals by Type of Event",
       caption = "Data between 1950 and 2011",
       x = "Event Type",
       # PROPDMG holds only the leading digits of each estimate; the magnitude
       # (K, M, B) is in PROPDMGEXP, so these totals are not dollar amounts
       y = "Property Damage (summed PROPDMG, unscaled)") +
  scale_y_continuous(labels = scales::comma) +
  geom_text(aes(label = comma(Prop)), fontface = "bold", vjust = -0.5)
Bar Plot for Crop Damage
ggplot(Crop.Damage, aes(x = reorder(Event, -Crop), y = Crop)) +
  geom_col(fill = "forestgreen") +
  labs(title = "Crop Damage Totals by Type of Event",
       caption = "Data between 1950 and 2011",
       x = "Event Type",
       # CROPDMG holds only the leading digits of each estimate; the magnitude
       # (K, M, B) is in CROPDMGEXP, so these totals are not dollar amounts
       y = "Crop Damage (summed CROPDMG, unscaled)") +
  scale_y_continuous(labels = scales::comma) +
  geom_text(aes(label = comma(Crop)), fontface = "bold", vjust = -0.5)
Part 3: Results

The analysis of the NOAA Storm Database indicates that tornadoes have the largest impact on public health, as measured by the sum of injuries and fatalities. In terms of economic consequences, tornadoes account for the most property damage overall, while hail is responsible for the most crop damage. Both economic rankings are based on the summed PROPDMG and CROPDMG values without the magnitude codes (K, M, B) recorded in PROPDMGEXP and CROPDMGEXP. These insights can help government and municipal managers prioritize resource allocation and planning for severe weather events.