Synopsis

This analysis investigates the impact of severe weather events in the U.S. using the NOAA Storm Data dataset. The study identifies the event types (EVTYPE) most harmful to population health by measuring combined fatalities and injuries, with tornadoes and excessive heat ranking highest. Economic consequences were evaluated by converting property and crop damage estimates to dollar amounts using their exponent codes (PROPDMGEXP, CROPDMGEXP). Floods, hurricanes/typhoons, and tornadoes were found to cause the greatest economic damage.

To visualize these findings, we plot the top 10 event types by population health impact and the top 10 event types by economic damage. In addition, a heatmap shows year-by-year harm for the 10 most harmful event types. Data preprocessing consisted of excluding missing values from the totals, parsing begin dates into years, and decoding damage exponent codes. The analysis uses R packages such as dplyr, ggplot2, and data.table for data manipulation and visualization. Overall, the results can inform disaster preparedness, policy planning, and risk management at multiple levels of government.

Data Processing

# Load libraries
library('ggplot2')
## Warning: package 'ggplot2' was built under R version 4.4.3
library('data.table')
library('readr')
library('magrittr')
library("dplyr")
## Warning: package 'dplyr' was built under R version 4.4.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Define the file path
file_path <- "repdata_data_StormData.csv.bz2"
# Read the BZ2 compressed CSV file directly
storm_data <- read.csv(bzfile(file_path))
# View the structure
str(storm_data)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
head(storm_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
colnames(storm_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Summarize population health impact (fatalities + injuries)

This section focuses on identifying which types of severe weather events have had the greatest impact on public health across the United States. By summing the number of fatalities and injuries associated with each event type (EVTYPE), we quantify the total harm caused to people. The data is grouped and ranked to reveal the most dangerous event types, such as tornadoes and excessive heat, helping to highlight where public safety efforts and emergency preparedness should be prioritized.

# Total fatalities and injuries per event type, ranked by combined harm
health_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE),
    total_harmed = total_fatalities + total_injuries
  ) %>%
  arrange(desc(total_harmed))
print(health_impact)
## # A tibble: 985 × 4
##    EVTYPE            total_fatalities total_injuries total_harmed
##    <chr>                        <dbl>          <dbl>        <dbl>
##  1 TORNADO                       5633          91346        96979
##  2 EXCESSIVE HEAT                1903           6525         8428
##  3 TSTM WIND                      504           6957         7461
##  4 FLOOD                          470           6789         7259
##  5 LIGHTNING                      816           5230         6046
##  6 HEAT                           937           2100         3037
##  7 FLASH FLOOD                    978           1777         2755
##  8 ICE STORM                       89           1975         2064
##  9 THUNDERSTORM WIND              133           1488         1621
## 10 WINTER STORM                   206           1321         1527
## # ℹ 975 more rows

View top 10 harmful events

top_health <- head(health_impact, 10)
print(top_health)
## # A tibble: 10 × 4
##    EVTYPE            total_fatalities total_injuries total_harmed
##    <chr>                        <dbl>          <dbl>        <dbl>
##  1 TORNADO                       5633          91346        96979
##  2 EXCESSIVE HEAT                1903           6525         8428
##  3 TSTM WIND                      504           6957         7461
##  4 FLOOD                          470           6789         7259
##  5 LIGHTNING                      816           5230         6046
##  6 HEAT                           937           2100         3037
##  7 FLASH FLOOD                    978           1777         2755
##  8 ICE STORM                       89           1975         2064
##  9 THUNDERSTORM WIND              133           1488         1621
## 10 WINTER STORM                   206           1321         1527
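The raw EVTYPE labels are not standardized: the table above lists TSTM WIND and THUNDERSTORM WIND as separate rows, although TSTM is an abbreviation of thunderstorm. The sketch below shows one way to harmonize such labels before grouping. It assumes a hypothetical helper, `normalize_evtype()`, that only unifies case and whitespace and expands the TSTM prefix; it is not part of the original analysis, and a fuller mapping onto the official event types would need many more rules.

```r
# Hypothetical helper (not part of the original analysis): harmonise a few
# EVTYPE spellings so that equivalent events are counted together.
normalize_evtype <- function(evtype) {
  ev <- toupper(trimws(evtype))                      # unify case, drop outer spaces
  ev <- gsub("\\s+", " ", ev)                        # collapse repeated inner spaces
  ev <- sub("^TSTM WIND", "THUNDERSTORM WIND", ev)   # expand the TSTM abbreviation
  ev
}

# Demonstration on labels of the kind that appear in the table above
labels <- c("TSTM WIND", " tstm wind ", "THUNDERSTORM WIND", "TORNADO")
print(normalize_evtype(labels))
```

Applying it would mean adding `mutate(EVTYPE = normalize_evtype(EVTYPE))` before `group_by(EVTYPE)`; the rankings could then differ from the tables shown here, so they would need to be recomputed rather than assumed.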

Plot top 10 events by population health impact

This section visualizes, as a bar chart, the 10 weather event types with the highest combined number of fatalities and injuries. The chart highlights which events, such as tornadoes, excessive heat, and floods, have had the greatest impact on public health; tornadoes alone account for more than ten times the harm of the next most harmful type. Comparing bar lengths makes the relative severity of each disaster type easy to judge and underlines the importance of targeted emergency response planning for the most dangerous events.
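The plotting code for this chart is not shown in the report. A minimal ggplot2 sketch follows, assuming a hypothetical helper `plot_top_health()` that takes a data frame with the `EVTYPE` and `total_harmed` columns of `top_health`; the colours and labels are illustrative choices, not the author's.

```r
library(ggplot2)

# Hypothetical helper (not from the original report): horizontal bar chart of
# combined fatalities + injuries. Expects columns EVTYPE and total_harmed,
# as in the top_health tibble.
plot_top_health <- function(df) {
  ggplot(df, aes(x = reorder(EVTYPE, total_harmed), y = total_harmed)) +
    geom_col(fill = "firebrick") +
    coord_flip() +  # put the long event names on the vertical axis
    labs(
      title = "Top 10 Weather Events by Population Health Impact",
      x = "Event Type",
      y = "Fatalities + Injuries"
    ) +
    theme_minimal()
}

# Demonstration with the first three rows of the table above
example_health <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND"),
  total_harmed = c(96979, 8428, 7461)
)
p_health <- plot_top_health(example_health)
print(p_health)
```

In the report, `plot_top_health(top_health)` would draw all 10 event types; `reorder()` sorts the bars so the most harmful type appears at the top.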

Summarize economic damage (property + crop)

This section evaluates the financial consequences of severe weather events using the property damage (PROPDMG) and crop damage (CROPDMG) columns. Each value is multiplied by the factor encoded in its exponent column (PROPDMGEXP or CROPDMGEXP), for example "K" for thousands, "M" for millions, and "B" for billions, to obtain dollar amounts. The analysis then aggregates total economic damage for each event type to identify which disasters, such as floods, hurricanes, and tornadoes, have caused the greatest financial losses. This summary is relevant to disaster cost mitigation, insurance planning, and infrastructure resilience.

# Convert damage exponent codes (H, K, M, B, or a digit 0-8) to numeric multipliers;
# NA, blank, and unrecognized codes fall back to a multiplier of 1
exp_converter <- function(exp) {
  if (is.na(exp)) return(1)
  exp <- toupper(trimws(as.character(exp)))
  switch(exp,
         "K" = 1e3,
         "M" = 1e6,
         "B" = 1e9,
         "H" = 1e2,
         "0" = 1,
         "1" = 10,
         "2" = 100,
         "3" = 1000,
         "4" = 10000,
         "5" = 1e5,
         "6" = 1e6,
         "7" = 1e7,
         "8" = 1e8,
         1)  # default
}

# Apply exponent conversion; PROPDMGEXP and CROPDMGEXP are overwritten with the numeric multipliers
storm_data <- storm_data %>%
  mutate(
    PROPDMGEXP = sapply(PROPDMGEXP, exp_converter),
    CROPDMGEXP = sapply(CROPDMGEXP, exp_converter),
    prop_damage = PROPDMG * PROPDMGEXP,
    crop_damage = CROPDMG * CROPDMGEXP,
    total_damage = prop_damage + crop_damage
  )

# Summarize total economic damage
economic_impact <- storm_data %>%
  group_by(EVTYPE) %>%
  summarise(
    total_prop = sum(prop_damage, na.rm = TRUE),
    total_crop = sum(crop_damage, na.rm = TRUE),
    total_econ_damage = sum(total_damage, na.rm = TRUE)
  ) %>%
  arrange(desc(total_econ_damage))

# View top 10 economic damage events
top_econ <- head(economic_impact, 10)
print(top_econ)
## # A tibble: 10 × 4
##    EVTYPE               total_prop  total_crop total_econ_damage
##    <chr>                     <dbl>       <dbl>             <dbl>
##  1 FLOOD             144657709807   5661968450     150319678257 
##  2 HURRICANE/TYPHOON  69305840000   2607872800      71913712800 
##  3 TORNADO            56947380676.   414953270      57362333946.
##  4 STORM SURGE        43323536000         5000      43323541000 
##  5 HAIL               15735267513.  3025954473      18761221986.
##  6 FLASH FLOOD        16822673978.  1421317100      18243991078.
##  7 DROUGHT             1046106000  13972566000      15018672000 
##  8 HURRICANE          11868319010   2741910000      14610229010 
##  9 RIVER FLOOD         5118945500   5029459000      10148404500 
## 10 ICE STORM           3944927860   5022113500       8967041360
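The `sapply()` step above calls `exp_converter()` once per row, which is slow on roughly 900,000 records. A minimal vectorized sketch, assuming the same mapping as `exp_converter()` (blank, `NA`, and unrecognized codes all default to 1), replaces the `switch()` with a named lookup vector. The names `exp_lookup` and `exp_multiplier()` are hypothetical and not part of the original analysis.

```r
# Hypothetical named lookup vector encoding the same codes as exp_converter()
exp_lookup <- c(
  K = 1e3, M = 1e6, B = 1e9, H = 1e2,
  "0" = 1, "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8
)

# Vectorized multiplier: one subsetting step instead of one call per row
exp_multiplier <- function(exp) {
  key <- toupper(trimws(as.character(exp)))  # same normalisation as exp_converter()
  mult <- unname(exp_lookup[key])            # NA for blank, NA, or unknown codes
  mult[is.na(mult)] <- 1                     # default multiplier of 1, as in the switch()
  mult
}

# K -> 1e3, m -> 1e6; blank, NA and "+" fall back to 1
print(exp_multiplier(c("K", "m", "", NA, "+")))

# Worked example: PROPDMG = 25 with code "K" is 25,000 dollars
print(25 * exp_multiplier("K"))
```

Inside the existing `mutate()`, `PROPDMGEXP = exp_multiplier(PROPDMGEXP)` and `CROPDMGEXP = exp_multiplier(CROPDMGEXP)` could stand in for the `sapply()` calls; they should produce the same multipliers while avoiding a per-row function call.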

Plot top 10 events by economic impact

This section presents a bar chart of the 10 weather event types that have caused the most economic damage, measured as the combined cost of property and crop losses. The chart uses the exponent-decoded dollar values, which are not adjusted for inflation. Floods, hurricanes/typhoons, and tornadoes stand out as the most economically devastating, with flood damage roughly double that of the next event type. The plot helps identify which disasters carry the greatest financial implications, supporting decisions about disaster risk reduction and resource allocation.
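The plotting code for this chart is also not shown. A minimal sketch follows, assuming a hypothetical helper `plot_top_econ()` that stacks property and crop damage for each event type (the columns of `top_econ`) and reports values in billions of nominal dollars; the layout choices are illustrative, not the author's.

```r
library(ggplot2)

# Hypothetical helper (not from the original report): stacked bars of
# property and crop damage per event type. Expects columns EVTYPE,
# total_prop and total_crop, as in the top_econ tibble.
plot_top_econ <- function(df) {
  # Reshape to long format: one row per event type and damage type
  long <- rbind(
    data.frame(EVTYPE = df$EVTYPE, type = "Property", damage = df$total_prop),
    data.frame(EVTYPE = df$EVTYPE, type = "Crop", damage = df$total_crop)
  )
  # Order event types by combined damage so the costliest appears at the top
  total <- df$total_prop + df$total_crop
  long$EVTYPE <- factor(long$EVTYPE, levels = df$EVTYPE[order(total)])

  ggplot(long, aes(x = EVTYPE, y = damage / 1e9, fill = type)) +
    geom_col() +
    coord_flip() +
    labs(
      title = "Top 10 Weather Events by Economic Damage",
      x = "Event Type",
      y = "Damage (billion USD)",
      fill = "Damage type"
    ) +
    theme_minimal()
}

# Demonstration with the first three rows of the table above
example_econ <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  total_prop = c(144657709807, 69305840000, 56947380676),
  total_crop = c(5661968450, 2607872800, 414953270)
)
p_econ <- plot_top_econ(example_econ)
print(p_econ)
```

Stacking the two damage types keeps each bar equal to the combined total while still showing how much of it comes from crops, which matters for event types such as DROUGHT where crop damage dominates.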

Heatmap of Total Harm by Event Type and Year

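This visualization shows how harm (fatalities plus injuries) varies across years for the 10 most harmful event types. It makes temporal patterns, such as years with unusually high harm for a given event type, easy to spot; blank cells mark years with no recorded events of that type.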
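The date step in the code that follows relies on `as.Date()` reading only the leading date part of strings such as `"4/18/1950 0:00:00"`: the underlying `strptime()` parser ignores trailing characters not covered by the format. A quick check on two `BGN_DATE` values from the `head()` output, using base R's `format()` to extract the year as an alternative to a package `year()` function:

```r
# Two BGN_DATE values copied from the head() output above
bgn <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00")

# "%m/%d/%Y" consumes the date; the trailing " 0:00:00" is ignored
dates <- as.Date(bgn, format = "%m/%d/%Y")
print(dates)  # expected: "1950-04-18" "1951-11-15"

# Base-R year extraction (an alternative to data.table::year or lubridate::year)
years <- as.integer(format(dates, "%Y"))
print(years)  # expected: 1950 1951
```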

# Prepare the data: total harm per event type per year.
# The result goes into a new object so that storm_data keeps its row-level records.
harm_by_year <- storm_data %>%
  mutate(
    BGN_DATE = as.Date(BGN_DATE, format = "%m/%d/%Y"),  # trailing "0:00:00" is ignored
    year = data.table::year(BGN_DATE),                  # calendar year as an integer
    total_harm = FATALITIES + INJURIES
  ) %>%
  group_by(year, EVTYPE) %>%
  summarise(total_harm = sum(total_harm, na.rm = TRUE), .groups = "drop")

# Identify the 10 event types with the highest total harm across all years
top_events <- harm_by_year %>%
  group_by(EVTYPE) %>%
  summarise(total = sum(total_harm)) %>%
  arrange(desc(total)) %>%
  slice_head(n = 10) %>%
  pull(EVTYPE)

# Keep only those event types for the heatmap
heatmap_data <- harm_by_year %>%
  filter(EVTYPE %in% top_events)

# Plot the heatmap
ggplot(heatmap_data, aes(x = year, y = EVTYPE, fill = total_harm)) +
  geom_tile(color = "white") +
  scale_fill_viridis_c(option = "inferno", name = "Harm") +
  theme_minimal() +
  labs(
    title = "Heatmap of Total Harm by Event Type and Year",
    x = "Year",
    y = "Event Type"
  )

Results

The analysis of the NOAA Storm Data dataset shows clear differences in how severe weather events affect population health and the economy. Tornadoes are by far the most harmful event type for public health, with 5,633 fatalities and 91,346 injuries (96,979 people harmed in total). Other major contributors to human harm include excessive heat, thunderstorm wind (recorded as TSTM WIND), floods, and lightning. In terms of economic impact, floods caused the greatest total damage (about $150 billion), followed by hurricanes/typhoons, tornadoes, and storm surges. Property and crop damages were converted to dollar amounts by decoding the exponent codes (PROPDMGEXP, CROPDMGEXP).

Visualizations support these findings: bar charts compare the ten most harmful and the ten costliest event types, and a heatmap illustrates how human harm varies across event types and years. Data preparation consisted of excluding missing values from the sums, parsing begin dates into years, and decoding damage exponents, using R packages such as dplyr, ggplot2, and data.table. Overall, the results indicate which weather events pose the greatest threats, offering guidance for public safety efforts, economic planning, and disaster preparedness.