This analysis uses the NOAA Storm Database to identify the weather events in the United States that are most harmful to public health and the economy. The data cover events from 1950 to 2011. The analysis addresses two questions: (1) Which types of events are most harmful to population health, measured by fatalities and injuries? (2) Which types of events have the greatest economic consequences, measured by property and crop damage? The results show that tornadoes cause the most combined fatalities and injuries, while floods cause the most economic damage. All processing and visualization were done in R.
knitr::opts_chunk$set(echo = TRUE, cache = TRUE)
library(dplyr)
library(ggplot2)
library(readr)
# Load the raw data
data <- read.csv("repdata_data_StormData (1).csv.bz2")
# Check column names
colnames(data)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
# Select relevant columns
storm_data <- data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
         CROPDMG, CROPDMGEXP)
# Convert exponent codes to numeric multipliers:
# K/k = thousand, M/m = million, B/b = billion.
# Any other code (blank, digits, symbols, H/h) falls through to a multiplier of 1.
exp_converter <- function(exp) {
  ifelse(exp %in% c("K", "k"), 1e3,
    ifelse(exp %in% c("M", "m"), 1e6,
      ifelse(exp %in% c("B", "b"), 1e9, 1)))
}
# Compute property, crop, and total damage in USD
storm_data <- storm_data %>%
  mutate(PROPDMGVAL = PROPDMG * exp_converter(PROPDMGEXP),
         CROPDMGVAL = CROPDMG * exp_converter(CROPDMGEXP),
         TOTALDMG = PROPDMGVAL + CROPDMGVAL)
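The converter above treats every code other than K, M, and B as a multiplier of 1. As a rough sketch of how more codes could be handled, the hypothetical `exp_converter_ext` below also maps H/h to 100 and a single digit d to 10^d. Those two rules are assumptions rather than part of this analysis, and other conventions for the digit codes exist, so treat the mapping as illustrative only.

```r
# Hypothetical extended converter (not part of the original analysis).
# ASSUMPTIONS: "H"/"h" means hundreds, and a single digit d means 10^d.
exp_converter_ext <- function(exp) {
  exp <- toupper(trimws(as.character(exp)))
  exp[is.na(exp)] <- ""                  # treat missing codes like blank codes
  mult <- rep(1, length(exp))            # default for blank, symbols, unknown codes
  mult[exp == "H"] <- 1e2
  mult[exp == "K"] <- 1e3
  mult[exp == "M"] <- 1e6
  mult[exp == "B"] <- 1e9
  is_digit <- grepl("^[0-9]$", exp)
  mult[is_digit] <- 10 ^ as.numeric(exp[is_digit])
  mult
}

# Small check on a handful of made-up codes
codes <- c("K", "m", "B", "h", "2", "", "?")
multipliers <- exp_converter_ext(codes)
print(multipliers)
```

Running `table(data$PROPDMGEXP)` on the raw data first would show which codes actually occur and how many rows such a choice affects.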
# Sum fatalities and injuries per event type and keep the 10 largest totals
health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(Fatalities = sum(FATALITIES, na.rm = TRUE),
            Injuries = sum(INJURIES, na.rm = TRUE)) %>%
  mutate(Total = Fatalities + Injuries) %>%
  arrange(desc(Total)) %>%
  top_n(10, Total)
ggplot(health_impact, aes(x = reorder(EVTYPE, -Total), y = Total)) +
  geom_bar(stat = "identity", fill = "tomato") +
  labs(title = "Top 10 Weather Events Causing Most Health Impact",
       x = "Event Type", y = "Total Injuries + Fatalities") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
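Grouping by `EVTYPE` treats every distinct spelling as its own event type. If the raw labels differ in capitalization or surrounding whitespace (an assumption worth checking with `unique(data$EVTYPE)`), one event could be split across several groups. The sketch below uses a made-up data frame and base R's `aggregate` instead of the dplyr pipeline, purely to show the effect of normalizing the labels before summing.

```r
# Toy data (made up for illustration; not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "Tornado", " TORNADO", "FLOOD", "Flood"),
  FATALITIES = c(5, 2, 1, 3, 1),
  INJURIES   = c(50, 10, 4, 7, 2),
  stringsAsFactors = FALSE
)

# Normalize labels: trim whitespace and convert to upper case
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Sum fatalities and injuries per normalized event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
totals$Total <- totals$FATALITIES + totals$INJURIES

# Sort from largest to smallest combined total
totals <- totals[order(-totals$Total), ]
print(totals)
```

Without the normalization step, the five toy rows would form five separate groups instead of two.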
# Sum total damage per event type and keep the 10 largest totals
economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(Economic_Damage = sum(TOTALDMG, na.rm = TRUE)) %>%
  arrange(desc(Economic_Damage)) %>%
  top_n(10, Economic_Damage)
ggplot(economic_impact, aes(x = reorder(EVTYPE, -Economic_Damage), y = Economic_Damage / 1e9)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Top 10 Weather Events Causing Most Economic Damage",
       x = "Event Type", y = "Damage (in Billion USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
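The ranking above combines property and crop damage into one total. As a hedged illustration of how the two components could be examined separately, the sketch below computes the crop share of total damage per event type. The data frame and the `CropShare` column are hypothetical, and base R's `aggregate` is used so the example stands alone.

```r
# Toy damage values in USD (made up for illustration; not NOAA figures)
toy_dmg <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "DROUGHT", "HAIL"),
  PROPDMGVAL = c(2e9, 1e9, 1e8, 5e8),
  CROPDMGVAL = c(1e8, 0, 9e8, 5e8),
  stringsAsFactors = FALSE
)

# Sum property and crop damage per event type
by_type <- aggregate(cbind(PROPDMGVAL, CROPDMGVAL) ~ EVTYPE,
                     data = toy_dmg, FUN = sum)

# Total damage and the fraction of it that is crop damage
by_type$TOTALDMG  <- by_type$PROPDMGVAL + by_type$CROPDMGVAL
by_type$CropShare <- by_type$CROPDMGVAL / by_type$TOTALDMG

# Sort from largest to smallest total damage
by_type <- by_type[order(-by_type$TOTALDMG), ]
print(by_type)
```

A view like this can show whether an event type ranks highly because of property losses, crop losses, or both.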
The analysis reveals that:

- Tornadoes are the leading cause of human injuries and fatalities.
- Floods, followed by hurricanes and tornadoes, cause the most economic damage.

This information can help municipalities prioritize resources for disaster preparedness and response.