This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which severe weather events cause the most harm to population health and the economy. The data spans from 1950 to 2011.
Our analysis shows that Tornadoes are by far the most harmful event type for population health, causing the highest combined number of fatalities and injuries. In economic terms, Floods cause the greatest total property and crop damage.
First, we load the required libraries and download the dataset directly from the course website.
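# Load the packages used by the pipelines and plots below
library(dplyr)
library(ggplot2)
# Download the data once and reuse the local copy on later runs
file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "StormData.csv.bz2"
if (!file.exists(dest_file)) {
  download.file(file_url, dest_file, method = "auto", mode = "wb")
}
# read.csv reads the bz2-compressed file directly
storm_data <- read.csv(dest_file)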
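No explicit decompression step is needed because R's file connections detect bzip2 compression when reading. A minimal sketch of that behavior on a throwaway file follows; the toy columns and values are invented for illustration and are not taken from the storm data.

```r
# Toy data frame; values are illustrative only
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it as a bzip2-compressed CSV to a temporary path
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv decompresses transparently, with no explicit bzfile() call
toy_back <- read.csv(tmp)
print(toy_back)
```

The same call pattern is what loads StormData.csv.bz2 above, only on a much larger file.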
To answer our questions, we only need a subset of the columns: EVTYPE (Event Type), FATALITIES, INJURIES, PROPDMG (Property Damage), PROPDMGEXP (Property Damage Exponent), CROPDMG (Crop Damage), and CROPDMGEXP (Crop Damage Exponent).
The economic damage columns store a magnitude code alongside each amount: K, M, and B signify thousands, millions, and billions. We convert these codes into numerical multipliers to calculate the total cost in dollars. Any other code, including a blank, is treated as a multiplier of 1.
clean_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(
    # Standardize exponents to uppercase
    PROPDMGEXP = toupper(PROPDMGEXP),
    CROPDMGEXP = toupper(CROPDMGEXP),
    # Convert characters to numerical multipliers
    prop_mult = case_when(
      PROPDMGEXP == "K" ~ 1000,
      PROPDMGEXP == "M" ~ 1000000,
      PROPDMGEXP == "B" ~ 1000000000,
      TRUE ~ 1
    ),
    crop_mult = case_when(
      CROPDMGEXP == "K" ~ 1000,
      CROPDMGEXP == "M" ~ 1000000,
      CROPDMGEXP == "B" ~ 1000000000,
      TRUE ~ 1
    ),
    # Calculate actual dollar amounts
    TotalPropDmg = PROPDMG * prop_mult,
    TotalCropDmg = CROPDMG * crop_mult,
    TotalEconDmg = TotalPropDmg + TotalCropDmg,
    # Calculate total health impact
    TotalHealthImpact = FATALITIES + INJURIES
  )
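To make the conversion concrete, here is a small base R sketch of the same rule applied to invented rows. It swaps the case_when calls for a named lookup vector, which is only an alternative way to express the mapping. The helper exp_to_mult and the sample values are hypothetical and not part of the analysis.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Codes other than K, M, B (including blanks) fall back to 1,
# mirroring the TRUE ~ 1 branch in the pipeline.
exp_to_mult <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1
  mult
}

# Invented sample rows: amount and its exponent code
dmg     <- c(25, 2.5, 1.2, 10, 7)
dmg_exp <- c("K", "m", "B", "", "?")

# Dollar amounts after applying the multipliers
dmg_dollars <- dmg * exp_to_mult(dmg_exp)
print(dmg_dollars)
```

In this sketch an amount of 25 with code "K" becomes 25,000 dollars, while the rows with a blank or "?" code keep their raw amounts of 10 and 7.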
We aggregate the total fatalities and injuries by event type and identify the top 10 most harmful events.
health_data <- clean_data %>%
  group_by(EVTYPE) %>%
  summarize(Total_Casualties = sum(TotalHealthImpact, na.rm = TRUE)) %>%
  arrange(desc(Total_Casualties)) %>%
  slice(1:10)
# Plot the health data
ggplot(health_data, aes(x = reorder(EVTYPE, Total_Casualties), y = Total_Casualties)) +
  geom_bar(stat = "identity", fill = "darkred") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Population Health Impact",
       x = "Event Type",
       y = "Total Casualties (Fatalities + Injuries)") +
  theme_minimal()
As seen in the chart above, Tornadoes overwhelmingly cause the highest number of casualties.
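The ranking is easier to follow on a handful of rows. Below is a toy run of the same group, sum, sort, and slice pattern. The data frame toy, its numbers, and the top-2 cutoff are assumptions chosen for illustration, not real storm figures.

```r
library(dplyr)

# Invented records; not real storm figures
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 0, 4, 0, 0),
  INJURIES   = c(10, 3, 5, 1, 2, 0)
)

top_events <- toy %>%
  mutate(TotalHealthImpact = FATALITIES + INJURIES) %>%
  group_by(EVTYPE) %>%
  summarize(Total_Casualties = sum(TotalHealthImpact, na.rm = TRUE)) %>%
  arrange(desc(Total_Casualties)) %>%
  slice(1:2)  # top 2 here; the report keeps the top 10

print(top_events)
```

Because fatalities and injuries are summed with equal weight, the toy HEAT rows have the most deaths (4) yet rank third behind TORNADO and FLOOD. The same weighting applies to the real ranking, so it measures total casualties rather than deaths alone.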
We aggregate the combined property and crop damage by event type to find the top 10 most economically damaging events.
econ_data <- clean_data %>%
  group_by(EVTYPE) %>%
  summarize(Total_Damage = sum(TotalEconDmg, na.rm = TRUE)) %>%
  arrange(desc(Total_Damage)) %>%
  slice(1:10)
# Plot the economic data
ggplot(econ_data, aes(x = reorder(EVTYPE, Total_Damage), y = Total_Damage)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Weather Events by Economic Impact",
       x = "Event Type",
       y = "Total Economic Damage (in US Dollars)") +
  theme_minimal()
As shown in the chart, Floods have caused the greatest economic damage, followed by Hurricanes/Typhoons and Tornadoes.