This report uses storm event data from the National Oceanic and Atmospheric Administration (NOAA) to find out which types of severe weather cause the most harm to people (injuries and deaths) and which cause the most financial damage (property and crop losses).
The data includes events from 1950 to 2011. I’ll use R to read the raw data file, clean it up, and then create graphs to help answer these questions.
# These packages help us with data, plotting, and reading files
library(dplyr)
library(ggplot2)
library(readr)
# Load the dataset (the file must be in R's working directory);
# read.csv() reads the bzip2-compressed file directly
storm_data <- read.csv("repdata_data_StormData.csv.bz2")
# Take a quick look at the structure of the data
str(storm_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
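The load step above passes a .bz2 path straight to read.csv() with no separate unzip step. The following is a minimal sketch of why that works, using a made-up two-row table written to a temporary file; the names toy, path, and round_trip are illustrative and not part of the analysis.

```r
# Made-up two-row table standing in for the storm file
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(1, 0))

# Write it as a bzip2-compressed CSV in a temporary location
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects the
# compression when reading, so no manual decompression is needed
round_trip <- read.csv(path)
print(round_trip)

# Clean up the temporary file
unlink(path)
```

The same call works on an uncompressed CSV, so the loading code does not need to change if the file is unpacked first.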
# Keep only relevant columns
clean_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES,
         PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
# Map the damage exponent codes to numeric multipliers
exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9,
             "k" = 1e3, "m" = 1e6, "b" = 1e9)
# Convert the damage columns to dollars. Blank or unrecognized exponent
# codes are not in exp_map, so the lookup returns NA and the multiplier
# falls back to 1.
clean_data <- clean_data %>%
  mutate(
    prop_mult = unname(exp_map[as.character(PROPDMGEXP)]),
    crop_mult = unname(exp_map[as.character(CROPDMGEXP)]),
    prop_mult = ifelse(is.na(prop_mult), 1, prop_mult),
    crop_mult = ifelse(is.na(crop_mult), 1, crop_mult),
    property_damage = PROPDMG * prop_mult,
    crop_damage = CROPDMG * crop_mult
  )
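To make the exponent conversion concrete, here is a small sketch in base R that applies the same lookup to four made-up records. The data frame toy and the vector mult are illustrative names only, and the values are invented rather than taken from the NOAA file.

```r
# Lookup table mirroring exp_map from the report
exp_map <- c("K" = 1e3, "M" = 1e6, "B" = 1e9,
             "k" = 1e3, "m" = 1e6, "b" = 1e9)

# Made-up records for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "", "?"),
  stringsAsFactors = FALSE
)

# Character indexing returns NA for codes that are not in the table,
# including the empty string
mult <- unname(exp_map[toy$PROPDMGEXP])
mult[is.na(mult)] <- 1

# Damage in dollars: 25 K becomes 25000, 2.5 M becomes 2500000
toy$property_damage <- toy$PROPDMG * mult
print(toy)
```

The last two rows keep their raw values because "" and "?" have no entry in the table. That fallback is a simplification: any other code that appears in the real data is also treated as a multiplier of 1.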
# Summarize total fatalities and injuries by event type
health_impact <- clean_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    total_impact = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_impact)) %>%
  slice_head(n = 10)
# Plot top 10 most harmful events
ggplot(health_impact, aes(x = reorder(EVTYPE, -total_impact), y = total_impact)) +
  geom_bar(stat = "identity", fill = "tomato") +
  labs(title = "Top 10 Most Harmful Weather Events to Population Health",
       x = "Event Type", y = "Total Fatalities + Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
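The health ranking above can be checked by hand on a tiny table. This sketch runs the same group, sum, and sort steps on four made-up rows; the counts are invented and toy and toy_ranked are illustrative names, so the output says nothing about the real data.

```r
library(dplyr)

# Made-up counts for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(1, 0, 2, 5),
  INJURIES   = c(15, 2, 0, 1)
)

toy_ranked <- toy %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES),
    total_injuries   = sum(INJURIES),
    # summarise() can refer to columns created earlier in the same call
    total_impact     = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_impact))

print(toy_ranked)
```

Here TORNADO totals 18, HEAT 6, and FLOOD 2. Adding the two counts weights one injury the same as one death, so an event type with many injuries can outrank one with more fatalities.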
# Summarize total property + crop damage by event type
economic_impact <- clean_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_damage = sum(property_damage + crop_damage, na.rm = TRUE)
  ) %>%
  arrange(desc(total_damage)) %>%
  slice_head(n = 10)
# Plot top 10 events by economic damage
ggplot(economic_impact, aes(x = reorder(EVTYPE, -total_damage), y = total_damage / 1e9)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type", y = "Total Damage (in Billions USD)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
Based on the analysis, tornadoes are by far the most harmful weather event when it comes to public health. The first plot shows that tornadoes have caused the highest number of combined injuries and fatalities, far more than any other type of event. Other harmful events include excessive heat, floods, and lightning.
In terms of economic consequences, floods have caused the most total damage, followed by hurricanes/typhoons, tornadoes, and storm surges. The second plot shows the top 10 events in terms of property and crop damage. These findings can help emergency planners know which types of events have historically had the biggest impact on both people and infrastructure.