Synopsis

This report explores the NOAA Storm Database to analyze the effects of severe weather events in the United States. The data cover the period from 1950 to 2011 and include information on the type of event, as well as its impact on population health and economic damage. The main objective of this analysis is to identify which event types are most harmful with respect to fatalities and injuries, and which events have the greatest economic consequences. The dataset was processed by selecting relevant variables and transforming economic damage data into consistent units. The analysis shows that tornadoes are the most harmful to population health, while floods and hurricanes cause the greatest economic losses. The results highlight the importance of prioritizing resources for events with the most severe consequences.

Data Processing

The analysis uses the NOAA Storm Database, provided as a compressed CSV file. The data were loaded directly into R from the raw file without external preprocessing. For the purpose of this report, we selected only the variables relevant to population health (fatalities, injuries) and economic consequences (property damage, crop damage, and their exponents).

Load the dataset

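# Packages used below: dplyr (%>%, select, count, mutate, summarize),
# stringr (str_* helpers) and ggplot2 (plots)
library(dplyr)
library(stringr)
library(ggplot2)

# read.csv() reads the bzip2-compressed file directly
storm_data <- read.csv("repdata_data_StormData.csv.bz2")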
dim(storm_data)
## [1] 902297     37
head(storm_data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6

After loading, the dataset has 902,297 rows and 37 columns, one row per recorded event.
Most of these columns do not bear on the research questions, so we keep only the event type, begin date, and state, together with the population health variables (fatalities and injuries) and the economic variables (property and crop damage with their exponent codes).

Keep only relevant columns

storm_data <- storm_data %>%
  select(EVTYPE, BGN_DATE, FATALITIES, INJURIES,
         PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP,
         STATE)
head(storm_data)
##    EVTYPE           BGN_DATE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1 TORNADO  4/18/1950 0:00:00          0       15    25.0          K       0
## 2 TORNADO  4/18/1950 0:00:00          0        0     2.5          K       0
## 3 TORNADO  2/20/1951 0:00:00          0        2    25.0          K       0
## 4 TORNADO   6/8/1951 0:00:00          0        2     2.5          K       0
## 5 TORNADO 11/15/1951 0:00:00          0        2     2.5          K       0
## 6 TORNADO 11/15/1951 0:00:00          0        6     2.5          K       0
##   CROPDMGEXP STATE
## 1               AL
## 2               AL
## 3               AL
## 4               AL
## 5               AL
## 6               AL

Count unique event types

ev_freq <- storm_data %>%
  count(EVTYPE, sort = TRUE)

# Show top 50 event types
head(ev_freq, 50)
##                      EVTYPE      n
## 1                      HAIL 288661
## 2                 TSTM WIND 219940
## 3         THUNDERSTORM WIND  82563
## 4                   TORNADO  60652
## 5               FLASH FLOOD  54277
## 6                     FLOOD  25326
## 7        THUNDERSTORM WINDS  20843
## 8                 HIGH WIND  20212
## 9                 LIGHTNING  15754
## 10               HEAVY SNOW  15708
## 11               HEAVY RAIN  11723
## 12             WINTER STORM  11433
## 13           WINTER WEATHER   7026
## 14             FUNNEL CLOUD   6839
## 15         MARINE TSTM WIND   6175
## 16 MARINE THUNDERSTORM WIND   5812
## 17               WATERSPOUT   3796
## 18              STRONG WIND   3566
## 19     URBAN/SML STREAM FLD   3392
## 20                 WILDFIRE   2761
## 21                 BLIZZARD   2719
## 22                  DROUGHT   2488
## 23                ICE STORM   2006
## 24           EXCESSIVE HEAT   1678
## 25               HIGH WINDS   1533
## 26         WILD/FOREST FIRE   1457
## 27             FROST/FREEZE   1342
## 28                DENSE FOG   1293
## 29       WINTER WEATHER/MIX   1104
## 30           TSTM WIND/HAIL   1028
## 31  EXTREME COLD/WIND CHILL   1002
## 32                     HEAT    767
## 33                HIGH SURF    725
## 34           TROPICAL STORM    690
## 35           FLASH FLOODING    682
## 36             EXTREME COLD    655
## 37            COASTAL FLOOD    650
## 38         LAKE-EFFECT SNOW    636
## 39        FLOOD/FLASH FLOOD    624
## 40                LANDSLIDE    600
## 41                     SNOW    587
## 42          COLD/WIND CHILL    539
## 43                      FOG    538
## 44              RIP CURRENT    470
## 45              MARINE HAIL    442
## 46               DUST STORM    427
## 47                AVALANCHE    386
## 48                     WIND    340
## 49             RIP CURRENTS    304
## 50              STORM SURGE    261

Clean EVTYPE

storm_data <- storm_data %>%
  mutate(
    EVTYPE_clean = str_to_lower(EVTYPE),
    EVTYPE_clean = str_replace_all(EVTYPE_clean, "[[:punct:]]", " "),
    EVTYPE_clean = str_squish(EVTYPE_clean)  # remove extra spaces
  )

# Check first rows
head(storm_data$EVTYPE_clean)
## [1] "tornado" "tornado" "tornado" "tornado" "tornado" "tornado"
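
To see what the cleaning step does to less tidy labels than "TORNADO", the sketch below applies the same three operations (lower-casing, replacing punctuation with spaces, collapsing whitespace) to a few event names taken from the frequency table above. It is written in base R rather than stringr so that it runs without extra packages. The helper name `clean_evtype` is hypothetical and not part of the report.

```r
# Base-R sketch of the normalisation above (the report itself uses stringr).
# clean_evtype is a hypothetical helper name introduced for illustration.
clean_evtype <- function(x) {
  x <- tolower(x)                    # same role as str_to_lower()
  x <- gsub("[[:punct:]]", " ", x)   # punctuation becomes a space
  x <- gsub("\\s+", " ", x)          # collapse runs of whitespace
  trimws(x)                          # drop leading/trailing spaces
}

raw <- c("TSTM WIND/HAIL", "  COLD/WIND CHILL ", "FLOOD/FLASH FLOOD", "LAKE-EFFECT SNOW")
cleaned <- clean_evtype(raw)
print(cleaned)
```

Slashes and hyphens disappear from the cleaned labels, so any pattern applied afterwards has to be written with spaces in their place.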

Map EVTYPE_clean to main categories

# Categories are tested in the order listed and the first match wins
# (see map_evtype below), so earlier entries take precedence.
pattern_map <- list(
  tornado = c("tornado"),
  thunderstorm_wind = c("thunderstorm wind", "thunderstorm winds", "tstm wind", "tstm"),
  # "flood" is tested before coastal_flood, so "coastal flood" lands here
  flood = c("flood", "flash flood", "flooding"),
  hail = c("hail"),
  heat = c("heat wave", "extreme heat", "\\bheat\\b"),
  # EVTYPE_clean has "/" replaced by a space, so "cold/wind chill" cannot match as written
  cold = c("extreme cold", "cold wave", "cold/wind chill"),
  hurricane = c("hurricane", "tropical storm", "tropical depression"),
  wind = c("\\bwind\\b", "high wind"),
  lightning = c("lightning"),
  snow = c("snow", "blizzard"),
  ice = c("ice", "sleet"),
  coastal_flood = c("coastal flood", "storm surge", "surge"),
  rip_current = c("rip current")
)

map_evtype <- function(ev) {
  for (name in names(pattern_map)) {
    pats <- pattern_map[[name]]
    pattern <- paste0("(", paste(pats, collapse = "|"), ")")
    if (str_detect(ev, pattern)) return(name)
  }
  return("other")  # group all unmatched events
}

storm_data <- storm_data %>%
  mutate(EVTYPE_simple = vapply(EVTYPE_clean, map_evtype, FUN.VALUE = character(1)))

# Check top 20 simplified event types
top_ev <- storm_data %>%
  count(EVTYPE_simple, sort = TRUE)
head(top_ev, 20)
##        EVTYPE_simple      n
## 1  thunderstorm_wind 336688
## 2               hail 289283
## 3              flood  82726
## 4              other  61390
## 5            tornado  60700
## 6               wind  26607
## 7               snow  20404
## 8          lightning  15765
## 9               heat   2647
## 10               ice   2192
## 11              cold   1662
## 12         hurricane   1045
## 13       rip_current    777
## 14     coastal_flood    411
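
Because the mapping function returns the first category whose pattern matches, the order of `pattern_map` affects the result. The sketch below illustrates this with a reduced copy of the pattern list (same relative order, fewer categories) and base R's `grepl()` standing in for `str_detect()`; the `probe` vector is a made-up set of already-cleaned labels.

```r
# Reduced copy of the report's pattern list, kept in the same relative order.
pattern_map <- list(
  flood         = c("flood", "flash flood", "flooding"),
  cold          = c("extreme cold", "cold wave", "cold/wind chill"),
  wind          = c("\\bwind\\b", "high wind"),
  coastal_flood = c("coastal flood", "storm surge", "surge")
)

# Same first-match-wins logic, with grepl() in place of str_detect().
map_evtype <- function(ev) {
  for (name in names(pattern_map)) {
    pattern <- paste0("(", paste(pattern_map[[name]], collapse = "|"), ")")
    if (grepl(pattern, ev, perl = TRUE)) return(name)
  }
  "other"
}

# Hypothetical cleaned labels used only to probe the mapping.
probe  <- c("coastal flood", "storm surge", "cold wind chill", "drought")
mapped <- vapply(probe, map_evtype, FUN.VALUE = character(1), USE.NAMES = FALSE)
print(data.frame(probe, mapped))
```

In this sketch "coastal flood" is assigned to `flood` and "cold wind chill" to `wind`, which suggests that listing the more specific categories first, and writing patterns without punctuation, would give a cleaner split.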

Prepare economic damage columns

exp_to_num <- function(exp) {
  case_when(
    exp %in% c("H","h") ~ 100,
    exp %in% c("K","k") ~ 1e3,
    exp %in% c("M","m") ~ 1e6,
    exp %in% c("B","b") ~ 1e9,
    TRUE ~ 1  # any other code (blank, digit, symbol) is treated as a multiplier of 1
  )
}

storm_data <- storm_data %>%
  mutate(
    PROPDMG_num = PROPDMG * exp_to_num(PROPDMGEXP),
    CROPDMG_num = CROPDMG * exp_to_num(CROPDMGEXP),
    TOTAL_ECONOMIC = PROPDMG_num + CROPDMG_num
  )
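
As a worked example of the conversion, a row with `PROPDMG = 25` and `PROPDMGEXP = "K"` represents 25 × 1,000 = 25,000 USD. The sketch below applies the same multipliers to a small made-up sample, using a base-R lookup vector instead of `case_when()`. The names `sample_dmg`, `multiplier` and `exp_to_num_base` are hypothetical.

```r
# Hypothetical mini-sample; values mimic the PROPDMG / PROPDMGEXP columns.
sample_dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 1.5, 5, 10),
  PROPDMGEXP = c("K", "k", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Same mapping as exp_to_num(), written as a named lookup vector.
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

exp_to_num_base <- function(exp) {
  m <- multiplier[toupper(exp)]  # unknown or blank codes give NA
  m[is.na(m)] <- 1               # fall back to 1, as in the report
  unname(m)
}

sample_dmg$PROPDMG_num <- sample_dmg$PROPDMG * exp_to_num_base(sample_dmg$PROPDMGEXP)
print(sample_dmg)
```

On the full dataset, `table(storm_data$PROPDMGEXP, useNA = "ifany")` would list how many rows carry a code that falls through to the default multiplier.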

Aggregate fatalities and injuries by event type

health_summary <- storm_data %>%
  group_by(EVTYPE_simple) %>%
  summarize(
    total_fatalities = sum(FATALITIES, na.rm = TRUE),
    total_injuries = sum(INJURIES, na.rm = TRUE)
  ) %>%
  arrange(desc(total_fatalities + total_injuries))

# Show top 10 events affecting population health
head(health_summary, 10)
## # A tibble: 10 × 3
##    EVTYPE_simple     total_fatalities total_injuries
##    <chr>                        <dbl>          <dbl>
##  1 tornado                       5661          91407
##  2 heat                          3138           9224
##  3 thunderstorm_wind              728           9503
##  4 flood                         1525           8604
##  5 other                         1267           6638
##  6 lightning                      817           5232
##  7 wind                           538           1931
##  8 ice                             99           2152
##  9 snow                           265           1928
## 10 hurricane                      201           1711
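
The table is sorted by fatalities plus injuries, which lets the much larger injury counts dominate the ordering. As a rough check, the sketch below re-ranks the first six rows printed above by each measure separately. The figures are copied from the output, and the comparison covers only those six rows, not the full summary.

```r
# First six rows of the printed health summary, copied by hand.
health_top <- data.frame(
  EVTYPE_simple    = c("tornado", "heat", "thunderstorm_wind", "flood", "other", "lightning"),
  total_fatalities = c(5661, 3138, 728, 1525, 1267, 817),
  total_injuries   = c(91407, 9224, 9503, 8604, 6638, 5232),
  stringsAsFactors = FALSE
)

# Rank the same rows by each measure on its own.
by_fatalities <- health_top$EVTYPE_simple[order(-health_top$total_fatalities)]
by_injuries   <- health_top$EVTYPE_simple[order(-health_top$total_injuries)]

print(by_fatalities)
print(by_injuries)
```

Tornadoes lead on both measures, while heat ranks second by fatalities and thunderstorm wind ranks second by injuries among these rows.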

Aggregate economic damage by event type

economic_summary <- storm_data %>%
  group_by(EVTYPE_simple) %>%
  summarize(
    total_economic = sum(TOTAL_ECONOMIC, na.rm = TRUE)
  ) %>%
  arrange(desc(total_economic))

# Show top 10 events causing economic damage
head(economic_summary, 10)
## # A tibble: 10 × 2
##    EVTYPE_simple     total_economic
##    <chr>                      <dbl>
##  1 flood              179909795032.
##  2 hurricane           98682496360 
##  3 tornado             59010559549.
##  4 coastal_flood       47966079000 
##  5 other               39624569341 
##  6 hail                19021507166.
##  7 thunderstorm_wind   11020123044.
##  8 ice                  8984168660 
##  9 wind                 7005879233 
## 10 snow                 1932486802.
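
Totals of this size are easier to compare in billions of dollars. The sketch below rescales the top three rows of the printed summary; the values are copied from the output with any fractional part dropped, and the column name `billions_usd` is introduced here for illustration.

```r
# Top three rows of the printed economic summary, copied by hand.
economic_top <- data.frame(
  EVTYPE_simple  = c("flood", "hurricane", "tornado"),
  total_economic = c(179909795032, 98682496360, 59010559549),
  stringsAsFactors = FALSE
)

# Express the totals in billions of USD, rounded to one decimal place.
economic_top$billions_usd <- round(economic_top$total_economic / 1e9, 1)
print(economic_top[, c("EVTYPE_simple", "billions_usd")])
```

The same `mutate()`-style rescaling could be applied to `economic_summary` directly before printing or plotting.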

Results

Health Impact

The table below shows the top 10 event types with the highest combined fatalities and injuries. The bar plot visualizes the same information, highlighting which events are most harmful to the population:

head(health_summary, 10)
## # A tibble: 10 × 3
##    EVTYPE_simple     total_fatalities total_injuries
##    <chr>                        <dbl>          <dbl>
##  1 tornado                       5661          91407
##  2 heat                          3138           9224
##  3 thunderstorm_wind              728           9503
##  4 flood                         1525           8604
##  5 other                         1267           6638
##  6 lightning                      817           5232
##  7 wind                           538           1931
##  8 ice                             99           2152
##  9 snow                           265           1928
## 10 hurricane                      201           1711

top10_health <- health_summary %>%
  slice_max(total_fatalities + total_injuries, n = 10)

ggplot(top10_health, aes(x = reorder(EVTYPE_simple, total_fatalities + total_injuries),
                         y = total_fatalities + total_injuries)) +
  geom_col(fill = "tomato") +
  coord_flip() +
  labs(
    title = "Top 10 Event Types by Health Impact (Fatalities + Injuries)",
    x = "Event Type",
    y = "Total Health Impact"
  )

Figure 1. Bar plot showing the top 10 weather event types by health impact (fatalities + injuries).

Economic Impact

The table below shows the top 10 event types with the highest total economic damage (property + crop). The bar plot visualizes the same information, highlighting which events have the greatest economic consequences:

head(economic_summary, 10)
## # A tibble: 10 × 2
##    EVTYPE_simple     total_economic
##    <chr>                      <dbl>
##  1 flood              179909795032.
##  2 hurricane           98682496360 
##  3 tornado             59010559549.
##  4 coastal_flood       47966079000 
##  5 other               39624569341 
##  6 hail                19021507166.
##  7 thunderstorm_wind   11020123044.
##  8 ice                  8984168660 
##  9 wind                 7005879233 
## 10 snow                 1932486802.

top10_econ <- economic_summary %>%
  slice_max(total_economic, n = 10)

ggplot(top10_econ, aes(x = reorder(EVTYPE_simple, total_economic),
                       y = total_economic)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  scale_y_continuous(labels = scales::dollar_format()) +
  labs(
    title = "Top 10 Event Types by Economic Damage",
    x = "Event Type",
    y = "Total Economic Damage (USD)"
  )

Figure 2. Bar plot showing the top 10 weather event types by total economic damage (property + crops).

Conclusion

The analysis shows that tornadoes are the most harmful weather events to population health, with 5,661 fatalities and 91,407 injuries recorded, the highest totals of any category.
In terms of economic consequences, floods (about $180 billion in combined property and crop damage) and hurricanes/tropical storms (about $99 billion) are responsible for the greatest financial losses.
These findings suggest that preventive resources and emergency planning should prioritize these high-impact event types.