Synopsis

This analysis examines the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to assess how severe weather events affect population health and the economy across the United States. The dataset records major storms and weather events, including event type, fatalities, injuries, and property and crop damage. The primary goal is to identify which types of events are most harmful to population health and which have the greatest economic consequences. Events are categorized by the EVTYPE variable. For population health, we sum fatalities and injuries by event type and rank the results. For economic consequences, we convert property and crop damage to a consistent dollar value and total them by event type. The results show that tornadoes are the most harmful to population health, causing the highest number of fatalities and injuries. Floods, however, lead to the greatest economic losses due to extensive property damage. Bar plots present the top 10 event types for both health and economic impacts. This report is intended for government or municipal managers who need to prioritize resources for severe weather preparedness.

Data Processing

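Loading Required Packages

The code in this report uses read_csv() from readr, the pipe and data-manipulation verbs from dplyr, and ggplot2 for plotting.

library(readr)    # read_csv()
library(dplyr)    # %>%, mutate(), group_by(), summarize(), arrange()
library(ggplot2)  # ggplot() and related plotting functions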

Reading the Data

The data come as a comma-separated-value (CSV) file that was originally compressed with bzip2. The file has already been downloaded and decompressed, and is located on the desktop, so we load the raw CSV file into R directly.

# Set working directory to the desktop (Please adjust path based on your system)
setwd("~/Desktop/")
# Read the uncompressed CSV file
storm_data <- read_csv("repdata_data_StormData.csv")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(storm_data)
## # A tibble: 6 × 37
##   STATE__ BGN_DATE   BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE
##     <dbl> <chr>      <chr>    <chr>      <dbl> <chr>      <chr> <chr>      <dbl>
## 1       1 4/18/1950… 0130     CST           97 MOBILE     AL    TORNA…         0
## 2       1 4/18/1950… 0145     CST            3 BALDWIN    AL    TORNA…         0
## 3       1 2/20/1951… 1600     CST           57 FAYETTE    AL    TORNA…         0
## 4       1 6/8/1951 … 0900     CST           89 MADISON    AL    TORNA…         0
## 5       1 11/15/195… 1500     CST           43 CULLMAN    AL    TORNA…         0
## 6       1 11/15/195… 2000     CST           77 LAUDERDALE AL    TORNA…         0
## # ℹ 28 more variables: BGN_AZI <chr>, BGN_LOCATI <chr>, END_DATE <chr>,
## #   END_TIME <chr>, COUNTY_END <dbl>, COUNTYENDN <lgl>, END_RANGE <dbl>,
## #   END_AZI <chr>, END_LOCATI <chr>, LENGTH <dbl>, WIDTH <dbl>, F <dbl>,
## #   MAG <dbl>, FATALITIES <dbl>, INJURIES <dbl>, PROPDMG <dbl>,
## #   PROPDMGEXP <chr>, CROPDMG <dbl>, CROPDMGEXP <chr>, WFO <chr>,
## #   STATEOFFIC <chr>, ZONENAMES <chr>, LATITUDE <dbl>, LONGITUDE <dbl>,
## #   LATITUDE_E <dbl>, LONGITUDE_ <dbl>, REMARKS <chr>, REFNUM <dbl>
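
The manual decompression step described above can be skipped if preferred: base R's bzfile() opens a bzip2-compressed file as a connection that read.csv() can read directly. The following is a minimal sketch that uses a tiny temporary file with made-up rows; the file and its contents are illustrative assumptions, not the NOAA data.

```r
# Write a tiny CSV to a bzip2-compressed temporary file (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# Read it back without decompressing it first;
# read.csv() opens and closes the connection itself
demo <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(demo)

# Clean up the temporary file
unlink(tmp)
```

readr's read_csv() also decompresses files whose names end in .bz2 automatically, so passing it the path of the compressed download should work as well.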

Cleaning and Processing the Data

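The NOAA storm database contains inconsistencies in event types and damage values. We’ll clean the data to make it suitable for analysis.

Standardize Event Types

The EVTYPE column records the type of weather event, but the same event often appears under several labels that differ only in letter case or surrounding whitespace (e.g., “HIGH WINDS” and “ HIGH WINDS”). We normalize these by converting all event types to uppercase and trimming leading and trailing spaces. Labels that differ in spelling or abbreviation, such as “TSTM WIND” and “THUNDERSTORM WIND”, are not merged by this step and are counted as separate event types in the results below.

# Convert event types to uppercase and trim whitespace
storm_data <- storm_data %>%
  mutate(EVTYPE = toupper(trimws(EVTYPE)))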
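
To make the effect of this step concrete, here is a small sketch on a made-up vector of labels (the values are illustrative and not drawn from the dataset). It suggests that case and whitespace variants collapse into one label, while spelling variants stay separate.

```r
# Hypothetical raw labels with mixed case and stray whitespace
raw <- c("High Wind", " HIGH WIND", "high wind ", "HIGH WINDS", "TSTM WIND")

# Same transformation as the one applied to EVTYPE
clean <- toupper(trimws(raw))

# The first three labels collapse to one; the other two remain distinct
unique(clean)
# "HIGH WIND", "HIGH WINDS", "TSTM WIND"
```

Merging the remaining variants would need an explicit mapping between labels, which this report does not attempt.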

Results

Question 1: Which Types of Events Are Most Harmful to Population Health?

To assess the impact on population health, we sum fatalities and injuries by event type (EVTYPE). We then identify the 10 event types with the highest combined impact (fatalities + injuries).

# Aggregate fatalities and injuries by event type
health_impact <- storm_data %>% group_by(EVTYPE) %>% summarize(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    total_health_impact = total_fatalities + total_injuries) %>%
  arrange(desc(total_health_impact))

# Display the top 10 event types
head(health_impact, 10)
## # A tibble: 10 × 4
##    EVTYPE            total_fatalities total_injuries total_health_impact
##    <chr>                        <dbl>          <dbl>               <dbl>
##  1 TORNADO                       5633          91346               96979
##  2 EXCESSIVE HEAT                1903           6525                8428
##  3 TSTM WIND                      504           6957                7461
##  4 FLOOD                          470           6789                7259
##  5 LIGHTNING                      816           5230                6046
##  6 HEAT                           937           2100                3037
##  7 FLASH FLOOD                    978           1777                2755
##  8 ICE STORM                       89           1975                2064
##  9 THUNDERSTORM WIND              133           1488                1621
## 10 WINTER STORM                   206           1321                1527
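
The combined measure weights one fatality the same as one injury, so it may be worth checking how sensitive the ranking is to that choice. As a rough sketch, the code below copies the ten rows printed above into a data frame and re-ranks them by fatalities alone. It covers only those ten rows, so it is not a full ranking of the dataset by fatalities.

```r
# Values copied from the top-10 table above
top10 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  total_fatalities = c(5633, 1903, 504, 470, 816, 937, 978, 89, 133, 206),
  total_injuries = c(91346, 6525, 6957, 6789, 5230, 2100, 1777, 1975, 1488, 1321),
  stringsAsFactors = FALSE
)

# Re-rank the same ten rows by fatalities only
by_fatalities <- top10[order(-top10$total_fatalities), ]

head(by_fatalities$EVTYPE, 5)
# TORNADO, EXCESSIVE HEAT, FLASH FLOOD, HEAT, LIGHTNING
```

Among these ten rows, tornadoes and excessive heat keep the top two places, while flash floods and heat move ahead of thunderstorm wind and floods once injuries are set aside.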

Visualization of Health Impact

We’ll create a bar plot to visualize the top 10 event types by total health impact.

ggplot(head(health_impact, 10),
       aes(x = reorder(EVTYPE, -total_health_impact), y = total_health_impact)) +
  geom_col(fill = "blue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Weather Events by Population Health Impact",
       x = "Event Type",
       y = "Total Fatalities + Injuries")

The plot shows that tornadoes are by far the most harmful to population health: their combined 96,979 fatalities and injuries are more than ten times the 8,428 attributed to excessive heat, the second-ranked event type.

Question 2: Which Types of Events Have the Greatest Economic Consequences?

To evaluate economic consequences, we aggregate the total property and crop damage by event type. First, however, the damage figures must be converted to dollars. Property and crop damage are each stored in two columns: PROPDMG and PROPDMGEXP for property damage, and CROPDMG and CROPDMGEXP for crop damage. The EXP columns contain multipliers (e.g., “K” for thousands, “M” for millions, “B” for billions) that scale the corresponding amounts. We’ll combine each pair into a single numeric dollar value.

# Function to convert damage amounts and their multiplier codes into dollars
convert_damage <- function(dmg, exp) {
  # Normalize codes: uppercase, and map missing or empty codes to "0" (multiplier 1)
  exp <- toupper(exp)
  exp <- ifelse(is.na(exp) | exp == "", "0", exp)

  # Define multipliers for the recognized codes
  multipliers <- c("0" = 1, "K" = 1e3, "M" = 1e6, "B" = 1e9)

  # Scale each amount; codes not listed above yield NA here
  dmg_numeric <- dmg * unname(multipliers[exp])

  # Replace NAs (missing amounts or unrecognized codes) with 0
  dmg_numeric[is.na(dmg_numeric)] <- 0

  return(dmg_numeric)
}

# Apply the function to property and crop damage
storm_data <- storm_data %>%
  mutate(
    prop_dmg_numeric = convert_damage(PROPDMG, PROPDMGEXP),
    crop_dmg_numeric = convert_damage(CROPDMG, CROPDMGEXP),
    total_dmg = prop_dmg_numeric + crop_dmg_numeric)
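
As a worked example of the multiplier lookup, the sketch below repeats the same steps on a handful of made-up amounts and codes (the values, including the unrecognized code "H", are hypothetical and chosen only to exercise each branch).

```r
# Lookup table mirroring the one in convert_damage()
multipliers <- c("0" = 1, "K" = 1e3, "M" = 1e6, "B" = 1e9)

# Hypothetical damage amounts and multiplier codes
dmg <- c(25, 2.5, 1, 10, 5)
exp <- c("K", "m", "B", "", "H")

# Normalize codes: uppercase, and treat missing or empty codes as "0" (multiplier 1)
exp <- toupper(exp)
exp <- ifelse(is.na(exp) | exp == "", "0", exp)

# Indexing a named vector by name returns NA for unknown codes such as "H"
dollars <- dmg * unname(multipliers[exp])
dollars[is.na(dollars)] <- 0

dollars
# 25000, 2500000, 1000000000, 10, 0 (dollars)
```

Under this scheme a record with an unrecognized code contributes zero damage, so totals could be understated if such codes are common in the data.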

We then sum the total_dmg column created above, which combines property and crop damage in dollars, within each event type.

# Aggregate total damage by event type
economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarize(total_damage = sum(total_dmg, na.rm = TRUE)) %>%
  arrange(desc(total_damage))
head(economic_impact, 10)
## # A tibble: 10 × 2
##    EVTYPE             total_damage
##    <chr>                     <dbl>
##  1 FLOOD             150319678257 
##  2 HURRICANE/TYPHOON  71913712800 
##  3 TORNADO            57352113886.
##  4 STORM SURGE        43323541000 
##  5 HAIL               18758221486.
##  6 FLASH FLOOD        17562179078.
##  7 DROUGHT            15018672000 
##  8 HURRICANE          14610229010 
##  9 RIVER FLOOD        10148404500 
## 10 ICE STORM           8967041360

Visualization of Economic Impact

We’ll create a bar plot to visualize the top 10 event types by total economic damage.

ggplot(head(economic_impact, 10),
       aes(x = reorder(EVTYPE, -total_damage), y = total_damage / 1e9)) +
  geom_col(fill = "orange") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Top 10 Weather Events by Economic Damage",
       x = "Event Type",
       y = "Total Damage (Billions of USD)")

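The plot indicates that floods cause the greatest economic damage, primarily through extensive property damage: roughly $150 billion in total, about twice the $72 billion recorded for hurricanes/typhoons, the second-ranked category.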