Synopsis

This analysis explores the NOAA storm database to identify the most significant weather events in the United States between 1950 and 2011. The objective is to determine which types of events are most harmful to population health and which have the greatest economic consequences. The analysis processes the raw data to quantify the damages, aggregating fatalities and injuries to assess health impact, and calculating property and crop damage costs to assess economic impact. The results indicate that tornadoes are by far the most harmful event to public health. Regarding economic consequences, floods cause the greatest total damage, followed by hurricanes and tornadoes.

Data Processing

In this section, the raw data from the NOAA storm database is loaded and processed to prepare it for analysis.

Data Loading

This script assumes that the repdata_data_StormData.csv.bz2 file has been manually downloaded and is located in the working directory. Loading the file can be slow, so the chunk option cache=TRUE is used to avoid reloading the data each time the document is knitted.

# Read the compressed file repdata_data_StormData.csv.bz2 with read.csv
# and store the result in a data frame called "storm_data"
storm_data <- read.csv("repdata_data_StormData.csv.bz2")
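
As a minimal sketch of why no separate decompression step is needed, the example below writes a small bzip2-compressed CSV to a temporary file and reads it back with read.csv. The toy data frame and the names toy, toy_path, toy_back and storm_file are hypothetical and used only for illustration; the final check is one possible guard to run before the real load, not part of the original script.

```r
# Hypothetical toy data standing in for the real storm file
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(3, 1)
)

# Write the toy data as a bzip2-compressed CSV in a temporary location
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv detects the compression and decompresses while reading
toy_back <- read.csv(toy_path)
print(toy_back)

# Possible guard before the real load (assumption: file sits in the working directory)
storm_file <- "repdata_data_StormData.csv.bz2"
if (!file.exists(storm_file)) {
  message("File not found in the working directory: ", storm_file)
}
```

The same call pattern applies to the full file; only the path changes.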

Data Cleaning and Transformation

The analysis needs only a subset of the columns: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. We select these columns and save them in a new data frame called “relevant_data”. The damage figures in PROPDMG and CROPDMG must be scaled by the exponent codes in PROPDMGEXP and CROPDMGEXP, so a function called “get_multiplier” translates the letters K, M, and B into the multipliers 1,000, 1,000,000, and 1,000,000,000, respectively; any other code is treated as a multiplier of 1. Property and crop damage are then summed into a new variable called “TotalDamage”.

# Load dplyr for the pipe (%>%) and the data manipulation verbs used below
library(dplyr)

# Select only the relevant columns for the analysis
relevant_data <- storm_data %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

# Function to convert exponents (K, M, B) into numeric multipliers
get_multiplier <- function(exp) {
  exp <- toupper(exp)
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9, 1)))
}

# Calculate the total cost of economic damage
# A new column 'TotalDamage' is created, summing property and crop damage
transformed_data <- relevant_data %>%
  mutate(
    PropDamage = PROPDMG * get_multiplier(PROPDMGEXP),
    CropDamage = CROPDMG * get_multiplier(CROPDMGEXP),
    TotalDamage = PropDamage + CropDamage
  )
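
To make the conversion concrete, here is a small sketch that applies get_multiplier to a handful of exponent codes and scales some damage values with the result. The vectors codes, amounts and damage are made-up illustrations, not values from the storm database. Note that lower-case letters are handled through toupper, and that an empty or unrecognized code falls through to a multiplier of 1.

```r
# Same function as in the analysis, repeated so the example is self-contained
get_multiplier <- function(exp) {
  exp <- toupper(exp)
  ifelse(exp == "K", 1e3,
         ifelse(exp == "M", 1e6,
                ifelse(exp == "B", 1e9, 1)))
}

# Hypothetical exponent codes: upper case, lower case, empty, unrecognized
codes <- c("K", "m", "B", "", "?")
multipliers <- get_multiplier(codes)

# Hypothetical damage amounts scaled to dollars
amounts <- c(25, 2.5, 1, 10, 10)
damage <- amounts * multipliers

print(data.frame(codes, multipliers, damage))
```

Because unrecognized codes map to 1, any record with such a code contributes its raw PROPDMG or CROPDMG value unchanged, which is a simplification worth keeping in mind when reading the totals.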

Results

In this section, the results of the analysis are presented to answer the two main questions: 1. Which types of events are most harmful with respect to population health? 2. Which types of events have the greatest economic consequences?

1. Which events are most harmful to population health?

To determine the impact on health, we aggregate the total number of fatalities and injuries for each event type (EVTYPE) into a new object called “health_impact” with two columns, TotalFatalities and TotalInjuries; missing values are ignored in the sums. We then take the top 10 event types for fatalities and for injuries, which are plotted afterwards with ggplot.

health_impact <- transformed_data %>%
  group_by(EVTYPE) %>%
  summarise(
    TotalFatalities = sum(FATALITIES, na.rm = TRUE),
    TotalInjuries = sum(INJURIES, na.rm = TRUE)
  )

# Get the top 10 events for fatalities and injuries
top_fatalities <- health_impact %>%
  arrange(desc(TotalFatalities)) %>%
  top_n(10, TotalFatalities)

top_injuries <- health_impact %>%
  arrange(desc(TotalInjuries)) %>%
  top_n(10, TotalInjuries)
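
The aggregation and ranking steps can be checked on a toy data frame. The sketch below uses made-up rows (the object names toy, toy_health, with_ties and without_ties are hypothetical) to show two details: na.rm = TRUE drops missing counts from the sums, and top_n keeps tied rows, so it can return more rows than requested. The example also shows slice_max with with_ties = FALSE as one possible alternative when exactly n rows are wanted; it assumes dplyr 1.0.0 or later and is not what the analysis above uses.

```r
library(dplyr)

# Hypothetical toy records; one fatality count is missing
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "FLOOD", "FLOOD", "HEAT", "HEAT"),
  FATALITIES = c(3, 2, 1, 3, 4, NA)
)

# Sum fatalities per event type, ignoring missing values
toy_health <- toy %>%
  group_by(EVTYPE) %>%
  summarise(TotalFatalities = sum(FATALITIES, na.rm = TRUE))

# top_n keeps ties: FLOOD and HEAT both total 4, so 3 rows come back
with_ties <- toy_health %>%
  arrange(desc(TotalFatalities)) %>%
  top_n(2, TotalFatalities)

# slice_max with with_ties = FALSE returns exactly 2 rows, sorted descending
without_ties <- toy_health %>%
  slice_max(TotalFatalities, n = 2, with_ties = FALSE)

print(with_ties)
print(without_ties)
```

With the full storm data, ties at the tenth position are unlikely for totals this large, but the difference matters for smaller subsets.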

The following plots show the 10 most harmful event types in terms of fatalities and injuries.

# Plot for fatalities (ggplot2 provides ggplot and the layers used below)
library(ggplot2)
plot_fatalities <- ggplot(top_fatalities, aes(x = reorder(EVTYPE, TotalFatalities), y = TotalFatalities)) +
  geom_bar(stat = "identity", fill = "red4") +
  coord_flip() +
  labs(title = "Top 10 Events by Fatalities", x = "Event Type", y = "Total Number of Fatalities")
print(plot_fatalities)

Figure 1: Top 10 most harmful weather events to population health (1950-2011).

# Plot for injuries
plot_injuries <- ggplot(top_injuries, aes(x = reorder(EVTYPE, TotalInjuries), y = TotalInjuries)) +
  geom_bar(stat = "identity", fill = "orangered") +
  coord_flip() +
  labs(title = "Top 10 Events by Injuries", x = "Event Type", y = "Total Number of Injuries")
print(plot_injuries)

Figure 1: Top 10 most harmful weather events to population health (1950-2011).

As can be seen in the plots, TORNADOES are by far the leading cause of both fatalities and injuries.

2. Which events have the greatest economic consequences?

To assess the economic impact, we aggregate the total damage (property plus crops) for each event type. From “transformed_data” we create a data frame called “economic_impact” that holds the 10 event types with the highest total cost, in descending order. The totals are then converted to billions of dollars.

economic_impact <- transformed_data %>%
  group_by(EVTYPE) %>%
  summarise(TotalDamage = sum(TotalDamage, na.rm = TRUE)) %>%
  arrange(desc(TotalDamage)) %>%
  top_n(10, TotalDamage)

# Convert damage to billions of dollars for easier reading
economic_impact$TotalDamageBillions <- economic_impact$TotalDamage / 1e9

The following plot shows the 10 event types with the highest economic cost.

ggplot(economic_impact, aes(x = reorder(EVTYPE, TotalDamageBillions), y = TotalDamageBillions)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  coord_flip() +
  labs(title = "Top 10 Events by Total Economic Damage",
       x = "Event Type",
       y = "Total Cost of Damage (in billions of $)")

Figure 2: Top 10 weather events with the greatest economic impact (1950-2011).

In economic terms, FLOODS are the most costly type of event, followed by HURRICANES/TYPHOONS and TORNADOES.