SYNOPSIS

This analysis explores the impact of storms and other severe weather events on public health and the economy of communities and municipalities in the US. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

Data are taken from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which spans from 1950 to 2011, documenting the characteristics of major storms and weather phenomena.

Regarding public health, total fatalities and injuries were aggregated by event type, and tornadoes were identified as the most harmful event. Floods were found to be the main cause of overall economic loss, and drought the primary driver of crop damage.

This report details the data cleaning process, including the normalization of event-type labels and the conversion of damage estimates into a uniform numeric format.

DATA PROCESSING

After downloading the compressed CSV file and loading it into R, only the columns needed for the health and economic impact analysis were kept, to reduce memory usage.

library(dplyr)
library(ggplot2)
library(tidyr)

# Download the compressed storm data only once
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists("storm_data.csv.bz2")) {
    download.file(url, "storm_data.csv.bz2", mode = "wb")  # binary mode avoids corruption on Windows
}
# read.csv() decompresses .bz2 files transparently
df <- read.csv("storm_data.csv.bz2")

# EVTYPE: event type
# FATALITIES/INJURIES: health impact
# PROPDMG/PROPDMGEXP/CROPDMG/CROPDMGEXP: economic impact (damage value + magnitude code)
df2 <- df |> select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

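To calculate the economic impact, each damage magnitude code (PROPDMGEXP, CROPDMGEXP) was converted into a numeric multiplier (K = thousands, M = millions, B = billions) and multiplied by the corresponding damage value: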

# Convert a single magnitude code into a numeric multiplier
per <- function(exp) {
  exp <- toupper(exp)
  if (exp == "K") return(1000)        # thousands
  if (exp == "M") return(1000000)     # millions
  if (exp == "B") return(1000000000)  # billions
  return(1) # Default for empty or unknown codes (e.g. "" or "H")
}

# Damage in USD = damage value * multiplier
df2$PROPTOTAL <- df2$PROPDMG * sapply(df2$PROPDMGEXP, per)
df2$CROPTOTAL <- df2$CROPDMG * sapply(df2$CROPDMGEXP, per)
df2$ECONOMICTOTAL <- df2$PROPTOTAL + df2$CROPTOTAL
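Because `per()` is applied through `sapply()`, it is called once per record, which can be slow on a file of this size. A vectorized alternative is a named lookup vector. The sketch below is an assumption, not part of the original analysis (the helper name `exp_multiplier` is hypothetical): it reproduces the same mapping, treating any code other than K, M or B (including empty strings, "H" and digits) as a multiplier of 1, and additionally maps `NA` to 1.

```r
# Hypothetical vectorized alternative to per(), with the same mapping:
# K, M, B (case-insensitive) -> 1e3, 1e6, 1e9; everything else -> 1
exp_multiplier <- function(exp) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(exp)])  # NA where the code is not K/M/B
  mult[is.na(mult)] <- 1                # empty, unknown or missing codes
  mult
}

# Returns c(1e3, 1e6, 1e9, 1, 1, 1)
exp_multiplier(c("K", "m", "B", "", "H", "5"))
```

With this helper, the totals could be computed as `df2$PROPTOTAL <- df2$PROPDMG * exp_multiplier(df2$PROPDMGEXP)`. Keeping the original mapping means records coded "H" (hundreds) or with digit codes are not rescaled; changing that would change the totals reported in the results.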

Data were aggregated by event type, after cleaning the event names by trimming surrounding whitespace and converting them to uppercase:

# Aggregate health data
HEALTHIMPACT <- df2 |>
  group_by(EVTYPE = toupper(trimws(EVTYPE))) |>
  summarise(TOTALFATALITIES = sum(FATALITIES),
            TOTALINJURIES = sum(INJURIES),
            TOTALHEALTH = sum(FATALITIES + INJURIES)) |>
  arrange(desc(TOTALHEALTH))

# Aggregate economic data
ECONIMPACT <- df2 |>
  group_by(EVTYPE = toupper(trimws(EVTYPE))) |>
  summarise(TOTALDAMAGE = sum(ECONOMICTOTAL)) |>
  arrange(desc(TOTALDAMAGE))
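Normalizing the labels matters because grouping relies on exact string matches: without it, "TORNADO" and " tornado " would be counted as separate event types. The sketch below illustrates this on a small hypothetical data frame (values invented for illustration only), using base R's `aggregate()` in place of the dplyr pipeline so that it runs without extra packages.

```r
# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", " tornado ", "Flood", "FLOOD"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(10, 5, 1, 0)
)

# Normalize labels as above: trim whitespace, then uppercase
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Sum fatalities and injuries per event type
# (base R counterpart of group_by() + summarise())
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
health$TOTALHEALTH <- health$FATALITIES + health$INJURIES

# Sort by total harm, largest first (counterpart of arrange(desc()))
health <- health[order(health$TOTALHEALTH, decreasing = TRUE), ]
health
```

After normalization the four rows collapse into two event types, TORNADO and FLOOD, instead of four distinct labels.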

RESULTS

IMPACT ON POPULATION HEALTH

According to our analysis, tornadoes are the weather event most harmful to population health in the US (counting both fatalities and injuries), followed by excessive heat.

TOPHEALTH <- head(HEALTHIMPACT, 5)
TOPHEALTH[1,]
## # A tibble: 1 × 4
##   EVTYPE  TOTALFATALITIES TOTALINJURIES TOTALHEALTH
##   <chr>             <dbl>         <dbl>       <dbl>
## 1 TORNADO            5633         91346       96979
ggplot(TOPHEALTH, aes(x = reorder(EVTYPE, TOTALHEALTH), y = TOTALHEALTH)) +
  geom_bar(stat = "identity", fill = "grey") +
  coord_flip() +
  labs(title = "Top 5 Most Harmful Weather Events (Health)",
       x = "Event Type", y = "Fatalities + Injuries")

IMPACT ON ECONOMY

According to our analysis, floods are the weather event with the greatest economic impact in the US (about USD 150.3 billion in combined property and crop damage), followed by hurricanes/typhoons.

TOPECON <- head(ECONIMPACT, 5)
TOPECON[1,]
## # A tibble: 1 × 2
##   EVTYPE  TOTALDAMAGE
##   <chr>         <dbl>
## 1 FLOOD  150319678257
ggplot(TOPECON, aes(x = reorder(EVTYPE, TOTALDAMAGE), y = TOTALDAMAGE/1e9)) +
  geom_bar(stat = "identity", fill = "grey") +
  coord_flip() +
  labs(title = "Top 5 Weather Events with Greatest Economic Impact",
       x = "Event Type", y = "Total Damage (Billions of USD)")
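The economic ranking above combines property and crop damage into `ECONOMICTOTAL`, so on its own it does not show which event dominates crop losses. A minimal sketch of a single-column ranking follows; `rank_damage()` is a hypothetical helper (not part of the original analysis) that applies the same label normalization and sums one damage column by event type using base R.

```r
# Hypothetical helper: total one damage column by event type, return the top n
rank_damage <- function(data, col, n = 5) {
  # Same label cleaning as in the aggregation step
  evtype <- toupper(trimws(data$EVTYPE))
  # Sum the chosen column within each event type
  totals <- tapply(data[[col]], evtype, sum)
  # Order event types from largest to smallest total
  ord <- order(totals, decreasing = TRUE)
  out <- data.frame(EVTYPE = names(totals)[ord],
                    TOTAL = as.numeric(totals)[ord])
  head(out, n)
}
```

For example, `rank_damage(df2, "CROPTOTAL")` would rank events by crop damage alone, and `rank_damage(df2, "PROPTOTAL")` by property damage alone.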

CONCLUSIONS

The following conclusions can be drawn from this analysis: