Synopsis

In this analysis, we examine the impact of severe weather events in the United States with a focus on population health and economic consequences. We use data from the NOAA Storm Database, analyzing events from 1950 to 2011. First, we calculate a composite metric, the HARM_INDEX, to assess health impacts based on fatalities and injuries. Then, we investigate economic losses using property and crop damage data. Key findings show that tornadoes have the highest impact on health, while floods and hurricanes impose the greatest economic costs. Through visualizations and summary tables, we identify trends over time, highlight the most harmful event types, and examine regional patterns across states. This analysis supports efforts to prioritize disaster preparedness and mitigate risks associated with the most damaging weather events.

Research Questions

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

2. Across the United States, which types of events have the greatest economic consequences?

Data Processing

1. Loading Required Packages

#### Loading the packages 

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.3
library(readxl)
library(readr)
library("wordcloud")
## Warning: package 'wordcloud' was built under R version 4.2.3
## Loading required package: RColorBrewer
library("RColorBrewer")
library(maps)
## Warning: package 'maps' was built under R version 4.2.3
library(knitr)

2. Reading the dataset

# Set the working directory to the folder that holds the raw data
setwd("C:/Users/QARA/Documents/Coursera/Data Science Specialization-JH/Reproducible Research/Weather_Data")

# Read the raw Storm Database CSV from the working directory
weather_data <- read.csv("repdata_data_StormData.csv")

# Save the dataset
saveRDS(weather_data, file = "weather_data.rds")

# Load the dataset
weather_data <- readRDS("weather_data.rds")

Harmful Weather Events for Population Health

Harm Index combining fatality and injury

To evaluate the impact of different weather events on population health, we developed a composite variable called HARM_INDEX. This metric combines the fatalities and injuries associated with each event, weighted by the relative severity of the two outcomes: fatalities were assigned a weight of 3 and injuries a weight of 1. The weights are an analyst's choice that reflects the greater severity of a fatality compared with an injury.

The HARM_INDEX was calculated as follows:

$$\text{HARM\_INDEX} = (3 \times \text{FATALITIES}) + (1 \times \text{INJURIES})$$

Only event types whose summed HARM_INDEX exceeds 100 were included in the analysis, to focus on events with significant health impacts. This threshold allows for an in-depth examination of the most impactful events, helping to highlight which event types pose the greatest risks to population health.

This approach enables a more comprehensive assessment of health impacts by accounting for both fatalities and injuries in a single metric.
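As a small worked illustration of the weighting and the threshold described above, the following sketch applies them to a hypothetical toy data frame. The values are invented for illustration and are not taken from the Storm Database; the sketch uses base R so that it runs without additional packages.

```r
# Hypothetical toy records (illustrative values only, not real storm data)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "HEAT"),
  FATALITIES = c(30, 5, 0, 40),
  INJURIES   = c(50, 10, 12, 0)
)

# Weight fatalities by 3 and injuries by 1, row by row
toy$HARM_INDEX <- 3 * toy$FATALITIES + 1 * toy$INJURIES

# Sum the index within each event type
totals <- aggregate(HARM_INDEX ~ EVTYPE, data = toy, FUN = sum)

# Keep only event types whose summed index exceeds 100, largest first
kept <- totals[totals$HARM_INDEX > 100, ]
kept <- kept[order(-kept$HARM_INDEX), ]
print(kept)
```

In this toy case HAIL is dropped because its summed index (12) falls below the threshold, while TORNADO (165) and HEAT (120) are retained.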

harm_index <- (3 * weather_data$FATALITIES) + (1 * weather_data$INJURIES)

weather_data$HARM_INDEX <- harm_index

# Sum HARM_INDEX by event type, keep types above the threshold, and rank them
fatility_index <- weather_data %>%
  group_by(EVTYPE) %>%
  summarize(severe_events = sum(HARM_INDEX)) %>%
  filter(severe_events > 100) %>%
  arrange(desc(severe_events))


### Generating a wordcloud to get a broader sense of events with greater and lesser harm
wordcloud(words = fatility_index$EVTYPE,
          freq = fatility_index$severe_events,
          scale = c(4, 0.6),           # Adjusts the size range of words
          colors = brewer.pal(8, "Dark2"), 
          random.order = FALSE,        # Largest words in the center
          rot.per = 0.35)              # Proportion of words with 90 degree rotation

### Generating a table containing the harm index for event types

fatility_index
## # A tibble: 47 × 2
##    EVTYPE            severe_events
##    <chr>                     <dbl>
##  1 TORNADO                  108245
##  2 EXCESSIVE HEAT            12234
##  3 TSTM WIND                  8469
##  4 FLOOD                      8199
##  5 LIGHTNING                  7678
##  6 HEAT                       4911
##  7 FLASH FLOOD                4711
##  8 ICE STORM                  2242
##  9 WINTER STORM               1939
## 10 THUNDERSTORM WIND          1887
## # ℹ 37 more rows

The table above shows the first ten of the 47 event types whose total HARM_INDEX exceeds 100, ranked by their cumulative impact on population health through fatalities and injuries. Tornadoes top the list by a substantial margin, with a HARM_INDEX of 108,245, roughly nine times that of the next event type. Excessive Heat and TSTM (Thunderstorm) Wind follow with HARM_INDEX values of 12,234 and 8,469, respectively. Floods (8,199), Lightning (7,678), and Heat (4,911) also cause substantial harm. Lower on the list but still noteworthy are Flash Floods, Ice Storms, Winter Storms, and Thunderstorm Wind, the last of which is recorded under a separate label from TSTM Wind. This ranking highlights the diverse range of severe weather events that contribute to population health risks in the U.S., with tornadoes and extreme heat being particularly hazardous.
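Because the ranking groups on the literal EVTYPE label, closely related labels such as TSTM WIND and THUNDERSTORM WIND are counted separately. The sketch below shows one possible way to merge such labels before summing. It is not part of the analysis above: the toy values are hypothetical, the column name `EVTYPE_CLEAN` is introduced only for illustration, and treating these labels as the same event type is an assumption.

```r
# Hypothetical toy totals (illustrative values only)
toy <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "TORNADO", "THUNDERSTORM WINDS"),
  harm   = c(10, 5, 20, 1)
)

# Assumption: these three wind labels describe the same event type
toy$EVTYPE_CLEAN <- sub("^(TSTM|THUNDERSTORM) WINDS?$",
                        "THUNDERSTORM WIND",
                        toy$EVTYPE)

# Sum harm within the harmonized labels and rank from largest to smallest
combined <- aggregate(harm ~ EVTYPE_CLEAN, data = toy, FUN = sum)
combined <- combined[order(-combined$harm), ]
print(combined)
```

Whether merging is appropriate depends on how the labels were used by the original reporters, so any such mapping should be checked against the database documentation before it is applied to the real data.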

Event Harm across States

The map below illustrates the total harm caused by severe weather events across states in the United States, based on our calculated HARM_INDEX. The HARM_INDEX is a composite measure that combines fatalities and injuries from weather events, with a higher weight given to fatalities to reflect their greater severity. This analysis enables us to identify the states where weather events have had the most significant impact on population health.

In the map, states with higher HARM_INDEX values are shaded in darker colors, indicating a greater cumulative impact from severe weather. For instance, Texas stands out with the darkest shade, suggesting it has experienced the highest levels of harm relative to other states. This map allows us to visually compare the impact across states, revealing regional patterns of vulnerability to severe weather.

Our analysis involved aggregating the HARM_INDEX by state to produce a total score for each. By mapping these totals, we gain insight into the states most affected by harmful weather events, which may inform public health and disaster preparedness efforts. This visualization highlights where preventive measures and resources could be prioritized to mitigate the health impacts of future severe weather events.
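The aggregation and name-matching steps can be sketched on a few hypothetical rows. The values are invented, and the code "PR" is included only to show what happens to a code that is not one of the 50 entries in R's built-in `state.abb` vector: it receives no region name and therefore cannot be joined to the map polygons.

```r
# Hypothetical toy records (illustrative values only)
toy <- data.frame(
  STATE      = c("TX", "TX", "AL", "PR"),
  HARM_INDEX = c(120, 80, 150, 300)
)

# Total HARM_INDEX per state code
state_totals <- aggregate(HARM_INDEX ~ STATE, data = toy, FUN = sum)

# Look up each code in state.abb; codes that are not found return NA
idx <- match(state_totals$STATE, state.abb)
state_totals$region <- tolower(state.name[idx])

# Codes that would be dropped from the map
unmatched <- state_totals$STATE[is.na(idx)]
print(state_totals)
print(unmatched)
```

Listing the unmatched codes in this way is a quick check on how much of the total harm falls outside the mapped states.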

# Step 1: Summarize HARM_INDEX by state
state_analysis <- weather_data %>%
  group_by(STATE) %>%                          # Group by state only, not event type
  summarize(total_Harm = sum(HARM_INDEX, na.rm = TRUE), .groups = "drop") %>%
  filter(total_Harm > 100)            # Filter for states with a total harm index over 100

# Step 2: Convert state abbreviations to full state names in lowercase
state_analysis <- state_analysis %>%
  mutate(STATE = tolower(state.name[match(STATE, state.abb)]))

# Step 3: Load map data
us_map <- map_data("state")

# Step 4: Join map data with summarized harm index data
map_data_events <- us_map %>%
  left_join(state_analysis, by = c("region" = "STATE"))

# Step 5: Replace NA values in `total_Harm` with 0 for states with no matching harm total
map_data_events$total_Harm[is.na(map_data_events$total_Harm)] <- 0

# Step 6: Plot the heatmap
g <- ggplot(map_data_events, aes(long, lat, group = group, fill = total_Harm)) +
  geom_polygon(color = "white") +
  scale_fill_gradient(low = "lightyellow", high = "darkred", na.value = "grey90") +
  labs(title = "Total HARM_INDEX by State in the USA",
       fill = "Total Harm") +
  theme_minimal() +
  theme(axis.text = element_blank(),
        axis.title = element_blank(),
        panel.grid = element_blank())

g

Event Harm over the Years

The line plot illustrates the annual trend in the Total HARM_INDEX for the United States, representing the combined impact of severe weather events on population health, weighted by fatalities and injuries. The plot reveals distinct peaks in the early 1950s, late 1970s, and late 1990s to early 2000s, indicating years with particularly high impacts, likely due to major or clustered weather events. In recent decades, there is an upward trend in the HARM_INDEX, possibly due to increasing event frequency, severity, or improved reporting. The fluctuations suggest variability in the annual impact, highlighting years of both high and low harm. These findings underscore the importance of continued disaster preparedness efforts to mitigate the health impacts of severe weather, especially given the recent upward trend.
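The yearly totals behind the line plot depend on parsing the date strings and extracting the year. The sketch below walks through that step on a few hypothetical records. The date strings follow the month/day/year format assumed by the analysis, and the harm values are invented for illustration.

```r
# Hypothetical toy records (illustrative values only)
toy <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
                 "4/27/2011 0:00:00", "5/22/2011 0:00:00"),
  HARM_INDEX = c(15, 2, 100, 160)
)

# Parse the strings as dates; the time portion is discarded by as.Date()
dates <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

# Extract the year as a number so that it can be used on a continuous axis
toy$Year <- as.numeric(format(dates, "%Y"))

# Sum HARM_INDEX within each year
yearly <- aggregate(HARM_INDEX ~ Year, data = toy, FUN = sum)
print(yearly)
```

Any string that does not match the format is parsed as NA, so checking `sum(is.na(dates))` on the real data is a simple way to confirm that no records are silently lost.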

# Convert BGN_DATE to Date format
weather_data$BGN_DATE <- as.Date(weather_data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

# Extract the year and aggregate HARM_INDEX by year
weather_data_yearly <- weather_data %>%
  mutate(Year = format(BGN_DATE, "%Y")) %>%  # Extract year from BGN_DATE
  group_by(Year) %>%
  summarize(total_harm_index = sum(HARM_INDEX, na.rm = TRUE)) %>%
  mutate(Year = as.numeric(Year))  # Convert Year to numeric for plotting

# Plot the aggregated HARM_INDEX by year
ggplot(weather_data_yearly, aes(x = Year, y = total_harm_index)) +
  geom_line(color = "blue") +
  labs(title = "Total HARM_INDEX Over Time",
       x = "Year",
       y = "Total HARM_INDEX") +
  theme_minimal()

Economic Impact of the Events

# Define a function to convert damage values based on the exponent
convert_damage <- function(dmg, exp) {
  multiplier <- case_when(
    exp == "K" ~ 1e3,
    exp == "M" ~ 1e6,
    exp == "B" ~ 1e9,
    TRUE ~ 1
  )
  dmg * multiplier
}

# Apply the conversion function to property and crop damage
weather_data <- weather_data %>%
  mutate(
    PROPDMG_TOTAL = convert_damage(PROPDMG, PROPDMGEXP),
    CROPDMG_TOTAL = convert_damage(CROPDMG, CROPDMGEXP),
    TOTAL_ECONOMIC_DAMAGE = PROPDMG_TOTAL + CROPDMG_TOTAL
  )
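To make the exponent arithmetic concrete, the sketch below reproduces the K/M/B rules with a base R lookup table on hypothetical values. It is not the `convert_damage()` function itself, and the helper name `to_dollars` is introduced only for illustration. The last input uses a lowercase "m" to show that an exact-match rule treats any unlisted code as a multiplier of 1. Whether lowercase codes should be read as their uppercase equivalents is an assumption that would need to be checked against the database documentation.

```r
# Hypothetical toy inputs (illustrative values only)
dmg <- c(25, 2.5, 1, 10, 5)
exp <- c("K", "M", "B", "", "m")

# Exact-match multipliers, mirroring the K/M/B rules
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

to_dollars <- function(dmg, exp) {
  m <- unname(multipliers[exp])  # NA for any code not in the table
  m[is.na(m)] <- 1               # unlisted codes fall back to 1
  dmg * m
}

# Exact matching: the lowercase "m" row stays at 5 dollars
print(to_dollars(dmg, exp))

# Assumption: normalizing case first would treat "m" as millions
print(to_dollars(dmg, toupper(exp)))
```

Tabulating the distinct exponent codes in the real data, for example with `table(weather_data$PROPDMGEXP)`, shows how many records fall into the default branch.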



# Summarize total economic damage by event type
economic_impact <- weather_data %>%
  group_by(EVTYPE) %>%
  summarize(total_economic_damage = sum(TOTAL_ECONOMIC_DAMAGE, na.rm = TRUE)) %>%
  arrange(desc(total_economic_damage))

top_economic_impact <- economic_impact %>% head(10)


library(scales)
## Warning: package 'scales' was built under R version 4.2.3
## 
## Attaching package: 'scales'
## The following object is masked from 'package:readr':
## 
##     col_factor
# Convert the total economic damage to billions for readability
top_economic_impact <- top_economic_impact %>%
  mutate(total_economic_damage = total_economic_damage / 1e9)  # Convert to billions

# Format the numbers for readability
top_economic_impact %>%
  mutate(total_economic_damage = dollar(total_economic_damage, scale = 1, prefix = "$", suffix = " B"))
## # A tibble: 10 × 2
##    EVTYPE            total_economic_damage
##    <chr>             <chr>                
##  1 FLOOD             $150.32 B            
##  2 HURRICANE/TYPHOON $71.91 B             
##  3 TORNADO           $57.34 B             
##  4 STORM SURGE       $43.32 B             
##  5 HAIL              $18.75 B             
##  6 FLASH FLOOD       $17.56 B             
##  7 DROUGHT           $15.02 B             
##  8 HURRICANE         $14.61 B             
##  9 RIVER FLOOD       $10.15 B             
## 10 ICE STORM         $8.97 B

The table above lists the ten weather event types in the United States with the highest economic impact, measured in billions of dollars of combined property and crop damage. Floods lead the list with an estimated economic impact of $150.32 billion, reflecting their potential to cause extensive damage to infrastructure, homes, and agricultural land. Hurricanes/Typhoons ($71.91 billion) and Tornadoes ($57.34 billion) follow at a considerable distance, at roughly half and just over a third of the flood total, respectively. Other events, such as Storm Surges, Hail, and Flash Floods, also impose significant costs, especially in vulnerable areas. Note that several related labels (for example HURRICANE/TYPHOON and HURRICANE, or FLOOD, FLASH FLOOD, and RIVER FLOOD) appear as separate rows because the totals are grouped on the EVTYPE label as recorded. This analysis highlights the severe financial impact that various types of weather events have on the U.S. economy, underscoring the need for effective risk management and resilience strategies.

# top_economic_impact was already converted to billions of dollars above,
# so no further scaling is applied before plotting

# Create the bar chart with improved x-axis formatting
ggplot(top_economic_impact, aes(x = reorder(EVTYPE, total_economic_damage), y = total_economic_damage)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = dollar_format(prefix = "$", suffix = " B"), breaks = pretty_breaks()) +
  labs(title = "Top 10 Weather Events with the Greatest Economic Impact",
       x = "Event Type",
       y = "Total Economic Damage (Billions of Dollars)") +
  theme_minimal()