This report analyzes the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which types of severe weather events are most harmful to population health and which have the greatest economic consequences. The data covers events from 1950 to 2011, with more complete records in later years. The analysis groups records by their raw event type, sums fatalities and injuries, and converts the coded property and crop damage figures into dollar amounts. The results show that tornadoes are by far the most harmful event type to population health, causing the most fatalities and injuries. Regarding economic consequences, floods have inflicted the greatest total property damage, while droughts have caused the most crop damage. When property and crop damage are combined, floods have the greatest overall economic impact of any single event type. These findings can help government and municipal managers prioritize resource allocation and preparation strategies for different severe weather events.
The analysis starts from the original raw data file. The compressed CSV file is downloaded from the source URL if it is not already present in the working directory.
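# Packages used throughout the analysis
library(dplyr)
library(tidyr)
library(ggplot2)

file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "repdata_data_StormData.csv.bz2"
# Download the file if it doesn't exist
if (!file.exists(dest_file)) {
  # mode = "wb" keeps the compressed file intact on Windows
  download.file(file_url, destfile = dest_file, mode = "wb")
}
# Read the data into R (read.csv decompresses .bz2 files transparently)
storm_data <- read.csv(dest_file)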
To address the first question on population health, the data is grouped by event type (EVTYPE). Fatalities (FATALITIES) and injuries (INJURIES) are summed for each group. The 10 event types with the highest combined total are selected and reshaped into a long format suitable for plotting.
# Summarize health impact data
health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES, na.rm = TRUE),
            Injuries = sum(INJURIES, na.rm = TRUE),
            Total_Health = Fatalities + Injuries) %>%
  arrange(desc(Total_Health)) %>%  # Sort by total impact
  head(10)                         # Select top 10 events
# Transform data from wide to long format for ggplot2
health_impact_long <- health_impact %>%
  select(EVTYPE, Fatalities, Injuries) %>%
  pivot_longer(cols = c(Fatalities, Injuries),
               names_to = "Impact_Type",
               values_to = "Count")
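To make the wide-to-long step concrete, here is a minimal sketch of the same group, sum, and pivot pattern on a tiny made-up data frame. The name toy_events and its counts are hypothetical and are not taken from the storm database.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature dataset; the counts are invented for illustration.
toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 1, 4),
  INJURIES   = c(10, 5, 0)
)

# Sum per event type, then stack the two measures into one Count column.
toy_long <- toy_events %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES),
            Injuries = sum(INJURIES)) %>%
  pivot_longer(cols = c(Fatalities, Injuries),
               names_to = "Impact_Type",
               values_to = "Count")

# Each event type now occupies two rows, one per impact type.
print(toy_long)
```

The long layout is what lets ggplot2 map Impact_Type to the fill colour and stack the two measures in a single bar.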
The economic data requires significant transformation. The cost values are stored in two pairs of columns: PROPDMG/PROPDMGEXP and CROPDMG/CROPDMGEXP. The *DMGEXP columns contain alphabetic characters (e.g., ‘K’, ‘M’, ‘B’) that signify the magnitude (thousands, millions, billions). A function is created to map these exponents to numerical multipliers. The actual damage in U.S. dollars is then calculated for both property and crop damage.
Justification: This conversion is critical. Without it, a PROPDMG value of 5 with PROPDMGEXP = 'B' ($5 billion) would be treated the same as a value of 5 with PROPDMGEXP = 'K' ($5 thousand), producing totals that can be wrong by six orders of magnitude.
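The size of that error can be checked on two toy rows. This is a minimal base R sketch; the data frame toy and the lookup vector exp_lookup are hypothetical names used only for this illustration, and the values are not taken from the database.

```r
# Two illustrative rows with the same base value but different exponents.
toy <- data.frame(
  PROPDMG    = c(5, 5),
  PROPDMGEXP = c("B", "K"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from exponent letter to multiplier.
exp_lookup <- c(K = 1e3, M = 1e6, B = 1e9)

# Multiply each base value by the multiplier its letter stands for.
toy$damage_usd <- toy$PROPDMG * unname(exp_lookup[toy$PROPDMGEXP])

# Ignoring the exponent treats both rows as 5 each.
naive_total <- sum(toy$PROPDMG)

# Applying it gives 5e9 and 5e3 dollars respectively.
converted_total <- sum(toy$damage_usd)

print(toy)
```

Here the naive total is 10, while the converted total is dominated almost entirely by the billion-dollar row.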
# Function to convert exponent letters to numeric multipliers.
# case_when() is vectorized, so the function accepts a whole column at once.
convert_exp <- function(exp) {
  exp <- toupper(exp)  # Convert to uppercase for consistency
  multiplier <- case_when(
    exp %in% c("", "+", "-", "?") ~ 1,  # No usable exponent: take the value as given
    exp == "H" ~ 100,                   # Hundreds
    exp == "K" ~ 1000,                  # Thousands
    exp == "M" ~ 1e6,                   # Millions
    exp == "B" ~ 1e9,                   # Billions
    TRUE ~ NA_real_                     # Any other code becomes NA
  )
  return(multiplier)
}
# Calculate economic damage in dollars and summarize
econ_impact <- storm_data %>%
  mutate(
    # convert_exp() is vectorized, so no sapply() is needed
    Prop_Damage_Dollars = PROPDMG * convert_exp(PROPDMGEXP),
    Crop_Damage_Dollars = CROPDMG * convert_exp(CROPDMGEXP)
  ) %>%
  group_by(EVTYPE) %>%
  summarise(
    Property_Damage = sum(Prop_Damage_Dollars, na.rm = TRUE),
    Crop_Damage = sum(Crop_Damage_Dollars, na.rm = TRUE),
    # Add the two sums rather than summing a row-wise total, so a row with
    # one NA component still contributes its other component
    Total_Damage = Property_Damage + Crop_Damage
  ) %>%
  arrange(desc(Total_Damage)) %>%
  head(10)
# Transform data for plotting, converting dollars to billions for better axis labels
econ_impact_long <- econ_impact %>%
  select(EVTYPE, Property_Damage, Crop_Damage) %>%
  pivot_longer(cols = c(Property_Damage, Crop_Damage),
               names_to = "Damage_Type",
               values_to = "Cost_Dollars") %>%
  mutate(Cost_Billions = Cost_Dollars / 1e9)
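Because exponent codes outside the mapping become NA and are then dropped by na.rm = TRUE, it may be worth counting how many rows that affects before trusting the totals. The sketch below runs on a made-up vector; toy_exp, known, and the code "5" are assumptions for illustration, with "5" standing in for any code the mapping does not cover.

```r
# Illustrative exponent codes, not taken from the database.
toy_exp <- c("K", "m", "", "5", "B", "5")

# Codes the conversion treats as valid (compared after upper-casing).
known <- c("", "+", "-", "?", "H", "K", "M", "B")

# TRUE for every row whose code would be converted to NA.
unmapped <- !(toupper(toy_exp) %in% known)

# Tally of the unmapped codes and the share of rows they account for.
unmapped_counts <- table(toupper(toy_exp)[unmapped])
unmapped_share <- mean(unmapped)

print(unmapped_counts)
print(unmapped_share)
```

The same check can be pointed at the PROPDMGEXP and CROPDMGEXP columns to see whether the dropped rows are numerous enough to matter.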
The following plot shows the top 10 most harmful weather event types by their total impact on population health, broken down into fatalities and injuries.
ggplot(health_impact_long, aes(x = reorder(EVTYPE, -Count), y = Count, fill = Impact_Type)) +
  geom_col() +  # Same as geom_bar(stat = "identity")
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +  # Rotate x-axis labels
  labs(title = "Top 10 Most Harmful Weather Events to Population Health",
       subtitle = "U.S., 1950-2011",
       x = "Event Type",
       y = "Total Number of Fatalities & Injuries",
       fill = "Type of Impact") +
  scale_fill_manual(values = c("Fatalities" = "red3", "Injuries" = "orange"))
Figure 1: Impact of Severe Weather Events on Population Health. This bar chart displays the top 10 event types with the highest combined number of fatalities and injuries. Tornadoes are the most devastating event by a significant margin, causing over 90,000 combined casualties. Excessive heat and flash floods are also major contributors to fatalities, while thunderstorm winds cause a substantial number of injuries.
The plot below illustrates the top 10 most economically damaging event types, with costs separated into property and crop damage (in billions of U.S. dollars).
ggplot(econ_impact_long, aes(x = reorder(EVTYPE, -Cost_Billions), y = Cost_Billions, fill = Damage_Type)) +
  geom_col() +  # Same as geom_bar(stat = "identity")
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Weather Events with the Greatest Economic Consequences",
       subtitle = "U.S., 1950-2011",
       x = "Event Type",
       y = "Total Cost (Billions of USD)",
       fill = "Damage Type") +
  # Named labels are matched to values by name, so legend text cannot be swapped by ordering
  scale_fill_manual(values = c("Property_Damage" = "steelblue", "Crop_Damage" = "goldenrod2"),
                    labels = c("Property_Damage" = "Property Damage", "Crop_Damage" = "Crop Damage"))
Figure 2: Economic Impact of Severe Weather Events. This bar chart shows the top 10 most costly event types. Floods have caused the greatest total economic damage, predominantly through property destruction. Hurricanes/typhoons and storm surges are also immensely damaging to property. In contrast, drought is the leading cause of crop damage by a wide margin, followed by floods and river flooding. This highlights how different events threaten different sectors of the economy.