This analysis explores the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database to identify which severe weather events pose the greatest risks to public health and the economy. The dataset spans from 1950 to November 2011 and includes records of fatalities, injuries, property damage, and crop damage across the United States.
Data Preparation:
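# Load dplyr, which provides the pipe (%>%) and the grouping verbs used below
library(dplyr)
# Download the compressed storm data
download.file(
"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "repdata_data_StormData.csv.bz2"
)
# Read the CSV directly from the bzip2 archive
storm_data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))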
Next, I identify the most harmful event types in terms of fatalities and injuries. I sum both measures per event type, add a column called total_health that combines the two, and sort the result by this new column in descending order.
# Summarize health impact
health_impact <- storm_data %>%
group_by(EVTYPE) %>%
summarise(
total_fatalities = sum(FATALITIES, na.rm = TRUE),
total_injuries = sum(INJURIES, na.rm = TRUE),
.groups = "drop"
) %>%
mutate(total_health = total_fatalities + total_injuries) %>%
arrange(desc(total_health))
head(health_impact,5)
## # A tibble: 5 × 4
## EVTYPE total_fatalities total_injuries total_health
## <chr> <dbl> <dbl> <dbl>
## 1 TORNADO 5633 91346 96979
## 2 EXCESSIVE HEAT 1903 6525 8428
## 3 TSTM WIND 504 6957 7461
## 4 FLOOD 470 6789 7259
## 5 LIGHTNING 816 5230 6046
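I now turn to the second question: which event types have the greatest economic consequences. First we need to identify how to extract the economic data from the dataset. There are two main considerations: 1 - Damage is recorded separately for property (PROPDMG and PROPDMGEXP) and for crops (CROPDMG and CROPDMGEXP), and 2 - Each *EXP column contains the magnitude of the corresponding amount, with K = thousands, M = millions, and B = billions.
To handle this encoding, I created the helper function below, which converts an exponent code into a numeric multiplier and treats any other code as a multiplier of 1.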
exp_to_mul <- function(exp) {
exp <- toupper(as.character(exp))
ifelse(exp == "K", 1e3,
ifelse(exp == "M", 1e6,
ifelse(exp == "B", 1e9, 1)))
}
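As a quick check of how the multipliers behave, the sketch below applies exp_to_mul to a few made-up amount/exponent pairs. The data frame toy and its values are hypothetical and are not taken from the dataset. The helper is repeated so the snippet runs on its own. It illustrates that lower-case codes are matched and that any code other than K, M, or B falls back to a multiplier of 1.

```r
# Same helper as above, repeated so this snippet is self-contained
exp_to_mul <- function(exp) {
  exp <- toupper(as.character(exp))
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9, 1)))
}

# Hypothetical amount/exponent pairs, for illustration only
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 5, 10, 3),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Scale each recorded amount by the multiplier for its exponent code
toy$prop_dmg <- toy$PROPDMG * exp_to_mul(toy$PROPDMGEXP)
print(toy)
```

Because unrecognized codes are silently treated as 1, it may be worth running table(storm_data$PROPDMGEXP) and table(storm_data$CROPDMGEXP) on the real data to see how many rows use other codes before relying on the totals.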
I then add new columns to the dataset holding the property damage, the crop damage, and the total damage for each event (row).
storm_data <- storm_data %>%
mutate(
prop_dmg = PROPDMG * exp_to_mul(PROPDMGEXP),
crop_dmg = CROPDMG * exp_to_mul(CROPDMGEXP),
total_dmg = prop_dmg + crop_dmg
)
I can now aggregate the data by event type, using the new column that contains the total damage (property plus crop) per event.
economic_impact <- storm_data %>%
group_by(EVTYPE) %>%
summarise(total_economic_dmg = sum(total_dmg, na.rm = TRUE), .groups = "drop") %>%
arrange(desc(total_economic_dmg))
head(economic_impact, 10)
## # A tibble: 10 × 2
## EVTYPE total_economic_dmg
## <chr> <dbl>
## 1 FLOOD 150319678257
## 2 HURRICANE/TYPHOON 71913712800
## 3 TORNADO 57352114049.
## 4 STORM SURGE 43323541000
## 5 HAIL 18758221521.
## 6 FLASH FLOOD 17562129167.
## 7 DROUGHT 15018672000
## 8 HURRICANE 14610229010
## 9 RIVER FLOOD 10148404500
## 10 ICE STORM 8967041360
Results
Impact on Population Health
Across the United States, tornadoes are by far the most harmful weather event to population health.
Figure 1 shows the top 5 event types ranked by combined fatalities and injuries. Tornadoes clearly stand out as the dominant hazard: with 96,979 combined fatalities and injuries, they account for more than ten times the total of the next event type, excessive heat (8,428).
Figure 1. Total fatalities and injuries for the five most harmful weather events in the U.S. between 1950 and 2011.
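The plotting code for Figure 1 is not shown in this report. One way such a figure could be drawn is sketched below with base R graphics. The data frame health_top5 is a hypothetical name that hard-codes the five rows printed in the health summary, so the snippet runs without the full dataset. The stacked layout, titles, and colors are arbitrary choices rather than the original figure's settings.

```r
# Top five rows copied from the printed health_impact table
health_top5 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  total_fatalities = c(5633, 1903, 504, 470, 816),
  total_injuries   = c(91346, 6525, 6957, 6789, 5230),
  stringsAsFactors = FALSE
)
health_top5$total_health <-
  health_top5$total_fatalities + health_top5$total_injuries

# barplot() stacks the rows of a matrix, with one bar per column
counts <- rbind(
  Fatalities = health_top5$total_fatalities,
  Injuries   = health_top5$total_injuries
)
colnames(counts) <- health_top5$EVTYPE

# Draw the stacked bars; the legend labels come from the matrix row names
bar_mids <- barplot(
  counts,
  col = c("firebrick", "steelblue"),
  legend.text = TRUE,
  cex.names = 0.7,
  main = "Fatalities and injuries by event type, 1950-2011",
  ylab = "Number of people affected"
)
```

With the full analysis in memory, the hard-coded data frame could be replaced by head(health_impact, 5).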
Economic Consequences
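When examining economic damages, floods are the most costly event type, with about $150 billion in combined property and crop damage. Hurricanes/typhoons (about $72 billion), tornadoes (about $57 billion), and storm surges (about $43 billion) also caused significant financial damage.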
Figure 2 illustrates the top 10 event types ranked by total economic damage, highlighting floods as the leading cause of financial loss.
Figure 2. Total economic damage (property + crop) for the ten most costly weather events in the U.S. between 1950 and 2011.
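The code behind Figure 2 is likewise not included. The sketch below shows one possible rendering as a horizontal bar chart in billions of dollars, again using base R graphics. The data frame economic_top10 is a hypothetical name holding the ten totals exactly as printed above, so values that the console output displayed rounded are rounded here too. The margins, labels, and color are assumptions.

```r
# Top ten rows copied from the printed economic_impact table
economic_top10 <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
             "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  total_economic_dmg = c(150319678257, 71913712800, 57352114049, 43323541000,
                         18758221521, 17562129167, 15018672000, 14610229010,
                         10148404500, 8967041360),
  stringsAsFactors = FALSE
)

# Express damage in billions to keep the axis readable
economic_top10$dmg_billions <- economic_top10$total_economic_dmg / 1e9

# Horizontal bars are drawn bottom-up, so reverse the order to put
# the most costly event type at the top
ord <- rev(seq_len(nrow(economic_top10)))

# Widen the left margin so the event names fit, then restore it afterwards
old_par <- par(mar = c(5, 9, 4, 2))
bar_mids <- barplot(
  economic_top10$dmg_billions[ord],
  names.arg = economic_top10$EVTYPE[ord],
  horiz = TRUE,
  las = 1,
  cex.names = 0.7,
  col = "steelblue",
  main = "Property and crop damage by event type, 1950-2011",
  xlab = "Total damage (billions of dollars)"
)
par(old_par)
```

With the full analysis in memory, the hard-coded data frame could be replaced by head(economic_impact, 10).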
Summary
In summary, tornadoes are the most hazardous to human health, while floods and hurricanes dominate in terms of economic consequences. These results highlight the importance of prioritizing preparedness and resource allocation for these specific types of extreme weather events.