This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine which types of severe weather events are most harmful to population health and have the greatest economic consequences. The database contains information on major storms and weather events from 1950 to November 2011, including fatalities, injuries, and property damage estimates. We analyzed the data to identify the top weather events causing health impacts (fatalities and injuries) and economic damage (property and crop damage). The results show that tornadoes are the most harmful to population health, causing the highest combined fatalities and injuries. For economic consequences, floods cause the greatest total damage when combining property and crop losses. These findings can help government and municipal managers prioritize resources and prepare for the most impactful severe weather events. The analysis was conducted using R and follows reproducible research principles with all code and data processing steps documented.
First, we load the necessary R packages for data manipulation and visualization.
# Install packages if you don't have them
# Uncomment these lines if needed:
# install.packages("dplyr")
# install.packages("ggplot2")
library(dplyr)
library(ggplot2)
We download the storm data file from the course website and read it directly from the compressed format.
# Download the data file if it doesn't exist
if (!file.exists("stormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormData.csv.bz2")
}
# Read the data directly from the compressed file
storm_data <- read.csv("stormData.csv.bz2")
# Check the structure of the data
dim(storm_data)
## [1] 902297 37
head(storm_data, 3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
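Reading the compressed file directly works because `read.csv()` opens the path as a file connection, and R detects bzip2 compression when reading. The following is a minimal sketch of that behavior on a small round trip; the `toy` data frame and its values are hypothetical stand-ins, not rows from the storm database.

```r
# Hypothetical toy data standing in for the storm file
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 0, 1),
  stringsAsFactors = FALSE
)

# Write the toy data to a bzip2-compressed CSV in a temporary location
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back with a plain read.csv() call, with no manual decompression
toy_back <- read.csv(tmp)
unlink(tmp)

print(toy_back)
```

The same call pattern is what the analysis uses on `stormData.csv.bz2`; only the file is different.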
We calculate total health impact by summing fatalities and injuries for each event type.
# Select only the columns we need for health analysis
health_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarize(
    Total_Fatalities = sum(FATALITIES, na.rm = TRUE),
    Total_Injuries = sum(INJURIES, na.rm = TRUE),
    Total_Health_Impact = Total_Fatalities + Total_Injuries
  ) %>%
  arrange(desc(Total_Health_Impact)) %>%
  head(10)
# Display the top 10 events
print(health_data)
## # A tibble: 10 × 4
##    EVTYPE            Total_Fatalities Total_Injuries Total_Health_Impact
##    <chr>                        <dbl>          <dbl>               <dbl>
##  1 TORNADO                       5633          91346               96979
##  2 EXCESSIVE HEAT                1903           6525                8428
##  3 TSTM WIND                      504           6957                7461
##  4 FLOOD                          470           6789                7259
##  5 LIGHTNING                      816           5230                6046
##  6 HEAT                           937           2100                3037
##  7 FLASH FLOOD                    978           1777                2755
##  8 ICE STORM                       89           1975                2064
##  9 THUNDERSTORM WIND              133           1488                1621
## 10 WINTER STORM                   206           1321                1527
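The table above lists TSTM WIND and THUNDERSTORM WIND as separate event types because the totals are grouped on the raw EVTYPE label. As a hedged illustration only, the sketch below shows one way such labels could be merged before grouping. The `clean_evtype()` helper, the single alias it handles, and the toy values are all assumptions for this example; this step is not part of the analysis reported here.

```r
# Hypothetical toy records; the values are made up for illustration
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", " tstm wind", "TORNADO"),
  FATALITIES = c(1, 2, 3, 4),
  INJURIES = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Hypothetical helper: trim, upper-case, collapse repeated spaces,
# then map one known alias onto a single label
clean_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- gsub("\\s+", " ", x)
  x[x == "TSTM WIND"] <- "THUNDERSTORM WIND"
  x
}

toy$EVTYPE_CLEAN <- clean_evtype(toy$EVTYPE)

# Sum fatalities and injuries per cleaned label (base R equivalent of
# the group_by() + summarize() step)
toy_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE_CLEAN,
                        data = toy, FUN = sum)
print(toy_totals)
```

In the toy data the three thunderstorm wind rows collapse into one group, so the result has two rows instead of four.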
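We then compute property and crop damage in dollars. PROPDMG and CROPDMG hold a base amount, and PROPDMGEXP and CROPDMGEXP hold a magnitude code that we convert to a numeric multiplier: H = 100, K = 1,000, M = 1,000,000, and B = 1,000,000,000 (case-insensitive). Digit codes 0 through 8 are treated as a multiplier of 10, and blank or unrecognized codes as 1.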
# Convert damage exponent codes to numeric multipliers (vectorized)
convert_exponent <- function(exp_value) {
  exp_value <- toupper(as.character(exp_value))
  ifelse(exp_value == "H", 1e2,
  ifelse(exp_value == "K", 1e3,
  ifelse(exp_value == "M", 1e6,
  ifelse(exp_value == "B", 1e9,
  # Digit codes 0-8 are treated as a multiplier of 10
  ifelse(exp_value %in% as.character(0:8), 10,
         1)))))  # blank or unrecognized codes leave the amount unchanged
}
# Calculate total economic damage
economic_data <- storm_data %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(
    # convert_exponent() is vectorized, so it is applied to whole columns
    Property_Damage = PROPDMG * convert_exponent(PROPDMGEXP),
    Crop_Damage = CROPDMG * convert_exponent(CROPDMGEXP),
    Total_Damage = Property_Damage + Crop_Damage
  ) %>%
  group_by(EVTYPE) %>%
  summarize(
    Total_Property_Damage = sum(Property_Damage, na.rm = TRUE),
    Total_Crop_Damage = sum(Crop_Damage, na.rm = TRUE),
    Total_Economic_Impact = sum(Total_Damage, na.rm = TRUE)
  ) %>%
  arrange(desc(Total_Economic_Impact)) %>%
  head(10)
# Display the top 10 events
print(economic_data)
## # A tibble: 10 × 4
##    EVTYPE          Total_Property_Damage Total_Crop_Damage Total_Economic_Impact
##    <chr>                           <dbl>             <dbl>                 <dbl>
##  1 FLOOD                    144657709807        5661968450          150319678257
##  2 HURRICANE/TYPH…           69305840000        2607872800           71913712800
##  3 TORNADO                   56937162900         414954710           57352117610
##  4 STORM SURGE               43323536000              5000           43323541000
##  5 HAIL                      15732269934        3025954653           18758224587
##  6 FLASH FLOOD               16140815218        1421317100           17562132318
##  7 DROUGHT                    1046106000       13972566000           15018672000
##  8 HURRICANE                 11868319010        2741910000           14610229010
##  9 RIVER FLOOD                5118945500        5029459000           10148404500
## 10 ICE STORM                  3944928310        5022113500            8967041810
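To make the exponent conversion concrete, here is a small worked example written as a named-vector lookup rather than nested `ifelse()` calls. It is a sketch under stated assumptions: `exp_lookup` and `exponent_multiplier()` are hypothetical names introduced for illustration, the input values are made up, and the mapping simply mirrors the rules used in this analysis (H, K, M, B as powers of ten, digit codes 0 through 8 as 10, anything else as 1). Unlike the nested `ifelse()` form, this sketch also maps a missing code to 1.

```r
# Hypothetical lookup table for the letter codes
exp_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: map exponent codes to numeric multipliers
exponent_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- unname(exp_lookup[code])          # NA where the code is not a known letter
  mult[code %in% as.character(0:8)] <- 10   # digit codes are treated as 10
  mult[is.na(mult)] <- 1                    # blank, unknown, or missing codes
  mult
}

# Made-up base amounts and codes
amount <- c(25, 2.5, 1, 3, 7)
code   <- c("K", "m", "B", "", "5")

dollars <- amount * exponent_multiplier(code)
print(dollars)
# 25 K -> 25,000; 2.5 m -> 2,500,000; 1 B -> 1,000,000,000; 3 (blank) -> 3; 7 with code 5 -> 70
```

A lookup of this kind keeps the mapping in one place, which can make it easier to audit or adjust than a chain of conditionals.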
The following figure shows the top 10 weather event types that caused the most combined fatalities and injuries across the United States.
# Create bar plot for health impact
ggplot(health_data, aes(x = reorder(EVTYPE, Total_Health_Impact),
                        y = Total_Health_Impact)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(title = "Top 10 Weather Events Most Harmful to Population Health",
       x = "Event Type",
       y = "Total Health Impact (Fatalities + Injuries)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))
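Finding: Tornadoes are by far the most harmful weather event to population health, with 96,979 combined fatalities and injuries (5,633 fatalities and 91,346 injuries), more than eleven times the total for the next event type. Excessive heat (8,428) and thunderstorm wind, recorded as TSTM WIND (7,461), follow.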
The following figure shows the top 10 weather event types that caused the most economic damage (property and crop damage combined).
# Create bar plot for economic impact
ggplot(economic_data, aes(x = reorder(EVTYPE, Total_Economic_Impact),
                          y = Total_Economic_Impact / 1e9)) +
  geom_col(fill = "darkred") +
  coord_flip() +
  labs(title = "Top 10 Weather Events with Greatest Economic Consequences",
       x = "Event Type",
       y = "Total Economic Damage (Billions of USD)") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))
Finding: Floods cause the greatest economic damage, about $150.3 billion in combined property and crop losses. Hurricanes/typhoons (about $71.9 billion) and tornadoes (about $57.4 billion) follow.
This analysis identified tornadoes as the most harmful weather event for population health and floods as the event with the greatest economic consequences. Municipal and government managers should prioritize preparedness and response resources for these event types.