1. Synopsis

This analysis explores which NOAA-recorded weather event types have historically caused the most fatalities and injuries, and which have produced the greatest property and crop damage costs. The data come from the NOAA Storm Data database and cover 1950 through November 2011. After loading the raw file (distributed as a bzip2-compressed CSV), I clean and aggregate fatalities and injuries by event type, and convert the property and crop damage exponents into dollar estimates to find which event types had the highest economic impact. In summary, tornadoes are by far the most harmful to population health, followed by excessive heat, thunderstorm wind, and floods, while floods, hurricanes/typhoons, and tornadoes generate the highest direct economic damage.

2. Data Processing

2.1. Load Required Libraries

library(dplyr)       # for data manipulation
library(readr)       # for fast CSV reading
library(lubridate)   # for working with dates (if needed)
library(ggplot2)     # for plots

2.2. Read the Raw CSV Data into R

# Read the raw data; this path points at the decompressed CSV,
# though read_csv() also accepts .bz2 files directly
raw_data <- read_csv("repdata_data_StormData.csv")
## Rows: 902297 Columns: 37
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (18): BGN_DATE, BGN_TIME, TIME_ZONE, COUNTYNAME, STATE, EVTYPE, BGN_AZI,...
## dbl (18): STATE__, COUNTY, BGN_RANGE, COUNTY_END, END_RANGE, LENGTH, WIDTH, ...
## lgl  (1): COUNTYENDN
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Check basic structure
glimpse(raw_data)
## Rows: 902,297
## Columns: 37
## $ STATE__    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ BGN_DATE   <chr> "4/18/1950 0:00:00", "4/18/1950 0:00:00", "2/20/1951 0:00:0…
## $ BGN_TIME   <chr> "0130", "0145", "1600", "0900", "1500", "2000", "0100", "09…
## $ TIME_ZONE  <chr> "CST", "CST", "CST", "CST", "CST", "CST", "CST", "CST", "CS…
## $ COUNTY     <dbl> 97, 3, 57, 89, 43, 77, 9, 123, 125, 57, 43, 9, 73, 49, 107,…
## $ COUNTYNAME <chr> "MOBILE", "BALDWIN", "FAYETTE", "MADISON", "CULLMAN", "LAUD…
## $ STATE      <chr> "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL",…
## $ EVTYPE     <chr> "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TORNADO", "TOR…
## $ BGN_RANGE  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ BGN_AZI    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BGN_LOCATI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_DATE   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_TIME   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ COUNTY_END <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ COUNTYENDN <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_RANGE  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ END_AZI    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ END_LOCATI <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LENGTH     <dbl> 14.0, 2.0, 0.1, 0.0, 0.0, 1.5, 1.5, 0.0, 3.3, 2.3, 1.3, 4.7…
## $ WIDTH      <dbl> 100, 150, 123, 100, 150, 177, 33, 33, 100, 100, 400, 400, 2…
## $ F          <dbl> 3, 2, 2, 2, 2, 2, 2, 1, 3, 3, 1, 1, 3, 3, 3, 4, 1, 1, 1, 1,…
## $ MAG        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ FATALITIES <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 4, 0, 0, 0, 0,…
## $ INJURIES   <dbl> 15, 0, 2, 2, 2, 6, 1, 0, 14, 0, 3, 3, 26, 12, 6, 50, 2, 0, …
## $ PROPDMG    <dbl> 25.0, 2.5, 25.0, 2.5, 2.5, 2.5, 2.5, 2.5, 25.0, 25.0, 2.5, …
## $ PROPDMGEXP <chr> "K", "K", "K", "K", "K", "K", "K", "K", "K", "K", "M", "M",…
## $ CROPDMG    <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ CROPDMGEXP <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ WFO        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ STATEOFFIC <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ ZONENAMES  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ LATITUDE   <dbl> 3040, 3042, 3340, 3458, 3412, 3450, 3405, 3255, 3334, 3336,…
## $ LONGITUDE  <dbl> 8812, 8755, 8742, 8626, 8642, 8748, 8631, 8558, 8740, 8738,…
## $ LATITUDE_E <dbl> 3051, 0, 0, 0, 0, 0, 0, 0, 3336, 3337, 3402, 3404, 0, 3432,…
## $ LONGITUDE_ <dbl> 8806, 0, 0, 0, 0, 0, 0, 0, 8738, 8737, 8644, 8640, 0, 8540,…
## $ REMARKS    <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ REFNUM     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, …

Notes on Important Variables

  • EVTYPE: event type (character).
  • BGN_DATE: beginning date of the event (character).
  • FATALITIES, INJURIES: counts of deaths and injuries (read in as doubles).
  • PROPDMG, PROPDMGEXP: numeric magnitude (e.g., 1, 2.5) and exponent (e.g., “K” for thousands, “M” for millions).
  • CROPDMG, CROPDMGEXP: same idea for crop damage.
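
Since BGN_DATE arrives as a character string, it has to be parsed before any date-based check (for example, confirming the 1950–2011 span). The following is a minimal sketch in base R, not part of the analysis pipeline. The first sample value is copied from the glimpse() output above; the second is a hypothetical value in the same format. The names bgn_date, bgn_parsed, and bgn_year are illustrative only. lubridate::mdy_hms() would be an alternative for the same format.

```r
# First value is from the glimpse() output; second is a hypothetical example
bgn_date <- c("4/18/1950 0:00:00", "11/30/2011 0:00:00")

# as.Date() parses only as far as the format requires,
# so the trailing " 0:00:00" is ignored
bgn_parsed <- as.Date(bgn_date, format = "%m/%d/%Y")

# Extract the year as an integer, e.g. to check the span of the records
bgn_year <- as.integer(format(bgn_parsed, "%Y"))

print(bgn_parsed)
print(range(bgn_year))
```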

Before proceeding, check which exponent codes PROPDMGEXP and CROPDMGEXP actually contain. Only K, M, B, or blank are expected, but the tables show that digits, symbols, and mixed-case letters occur as well:

table(raw_data$PROPDMGEXP, useNA = "always")
## 
##      -      ?      +      0      1      2      3      4      5      6      7 
##      1      8      5    216     25     13      4      4     28      4      5 
##      8      B      h      H      K      m      M   <NA> 
##      1     40      1      6 424665      7  11330 465934
table(raw_data$CROPDMGEXP, useNA = "always")
## 
##      ?      0      2      B      k      K      m      M   <NA> 
##      7     19      1      9     21 281832      1   1994 618413

2.3. Convert Damage Exponents to Numeric Multipliers

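Create a mapping table that converts each character exponent into a numeric multiplier (K→1e3, M→1e6, B→1e9). The mapping below covers only K, M, and B in either case; H/h, digits, symbols, and blanks are left unmapped. Unmapped exponents get no multiplier, so their damage value becomes NA and is excluded from the totals. Such codes are rare: apart from blanks, 321 property records and 27 crop records out of 902,297 carry a code outside K/M/B. A simple approach is: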

# Define a named vector for exponent → multiplier
exp_map <- c(
  "K"  = 1e3,
  "k"  = 1e3,
  "M"  = 1e6,
  "m"  = 1e6,
  "B"  = 1e9,
  "b"  = 1e9
)

# Other codes appear in the tables above (digits 0-8, h/H, and symbols).
# Digits could be mapped to 10^as.numeric(code); for simplicity, only K/M/B are handled here.
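
For readers who want to handle the remaining codes, here is one possible sketch of a fuller conversion. It is not the mapping used for the results in this report. The helper name exp_to_multiplier is hypothetical, and three choices in it are assumptions: H/h is read as hundreds, a single digit d is read as 10^d, and anything else (blank, NA, "?", "+", "-") is left unscaled with a multiplier of 1.

```r
# Hypothetical helper: convert Storm Data exponent codes to numeric multipliers.
# Assumptions: H = hundreds, digit d = 10^d, everything else = 1 (no scaling).
exp_to_multiplier <- function(code) {
  is_na <- is.na(code)
  code  <- toupper(as.character(code))

  # Default multiplier of 1 covers blanks, NA, and symbols
  mult <- rep(1, length(code))

  # Letter codes, case-insensitive after toupper()
  letter_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter  <- !is_na & code %in% names(letter_map)
  mult[is_letter] <- unname(letter_map[code[is_letter]])

  # Single digits are treated as powers of ten
  is_digit <- !is_na & grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  mult
}

codes <- c("K", "m", "B", "h", "5", "0", "?", NA, "")
print(data.frame(code = codes, multiplier = exp_to_multiplier(codes)))
```

Because so few records carry these codes, switching to a helper like this would be expected to change the totals only marginally, but that has not been verified here.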

2.4. Create New Columns for Dollar Values

Use mutate() to compute actual property and crop damage in dollars:

storm_clean <- raw_data %>%
  mutate(
    # Keep exponents that are keys in exp_map; everything else becomes "",
    # which has no entry in exp_map, so the damage for that row is NA
    PROPDMGEXP_clean = if_else(PROPDMGEXP %in% names(exp_map),
                                PROPDMGEXP, 
                                ""),
    CROPDMGEXP_clean = if_else(CROPDMGEXP %in% names(exp_map),
                               CROPDMGEXP, 
                               ""),
    PropDamage = PROPDMG * exp_map[PROPDMGEXP_clean],
    CropDamage = CROPDMG * exp_map[CROPDMGEXP_clean]
  ) %>%
  # Select only columns needed for analysis to save memory:
  select(
    BGN_DATE, EVTYPE, FATALITIES, INJURIES,
    PropDamage, CropDamage
  )

# Check the new dollar columns (the NA's are records without a K/M/B exponent):
summary(storm_clean$PropDamage)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## 0.00e+00 0.00e+00 1.00e+03 9.80e+05 1.00e+04 1.15e+11   466255
summary(storm_clean$CropDamage)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
## 0.00e+00 0.00e+00 0.00e+00 1.73e+05 0.00e+00 5.00e+09   618440
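
The large NA counts in these summaries follow from how a named vector is indexed: looking up a name that is not present, including the empty string, returns NA. The toy example below is a sketch with made-up magnitudes and a shortened copy of exp_map. It reproduces that behaviour in base R and contrasts it with an assumed alternative that treats a missing multiplier as 1.

```r
# Shortened copy of the mapping, for illustration only
exp_map <- c(K = 1e3, M = 1e6, B = 1e9)

# Made-up records: two recognized codes, one NA, one unexpected symbol
magnitude <- c(25, 2.5, 10, 3)
exponent  <- c("K", "M", NA, "?")

# Same cleaning rule as in mutate(): unrecognized codes become ""
clean <- ifelse(exponent %in% names(exp_map), exponent, "")

# "" is not a name in exp_map, so the lookup yields NA for those rows
damage_na <- magnitude * unname(exp_map[clean])

# Assumed alternative: treat a missing multiplier as 1 (no scaling)
mult <- unname(exp_map[clean])
mult[is.na(mult)] <- 1
damage_unscaled <- magnitude * mult

print(damage_na)
print(sum(damage_na, na.rm = TRUE))  # skips the two unrecognized rows
print(sum(damage_unscaled))          # counts them at face value
```

With na.rm = TRUE in the later sums, the NA rows simply drop out of the totals, which is the behaviour the results in section 3 rely on.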

3. Results

3.1. Population Health Impact

health_summary <- storm_clean %>%
  group_by(EVTYPE) %>%
  summarize(
    TotalFatalities = sum(FATALITIES, na.rm = TRUE),
    TotalInjuries   = sum(INJURIES, na.rm = TRUE)
  ) %>%
  mutate(TotalCasualties = TotalFatalities + TotalInjuries) %>%
  arrange(desc(TotalCasualties))
head(health_summary, 10)
## # A tibble: 10 × 4
##    EVTYPE            TotalFatalities TotalInjuries TotalCasualties
##    <chr>                       <dbl>         <dbl>           <dbl>
##  1 TORNADO                      5633         91346           96979
##  2 EXCESSIVE HEAT               1903          6525            8428
##  3 TSTM WIND                     504          6957            7461
##  4 FLOOD                         470          6789            7259
##  5 LIGHTNING                     816          5230            6046
##  6 HEAT                          937          2100            3037
##  7 FLASH FLOOD                   978          1777            2755
##  8 ICE STORM                      89          1975            2064
##  9 THUNDERSTORM WIND             133          1488            1621
## 10 WINTER STORM                  206          1321            1527
top10_health <- health_summary %>%
  slice_max(order_by = TotalCasualties, n = 10)

ggplot(top10_health, aes(
  x = reorder(EVTYPE, TotalCasualties),
  y = TotalCasualties
)) +
  geom_col(fill = "steelblue") +
  coord_flip() +
  labs(
    x = "Event Type",
    y = "Total Fatalities + Injuries",
    title = "Top 10 Weather Event Types by Human Casualties"
  ) +
  theme_minimal()

Figure 1: Top 10 NOAA Storm Event Types (1950–2011) ranked by combined fatalities and injuries.

3.2. Economic Consequences

econ_summary <- storm_clean %>%
  group_by(EVTYPE) %>%
  summarize(
    TotalPropDamage = sum(PropDamage, na.rm = TRUE),
    TotalCropDamage = sum(CropDamage, na.rm = TRUE)
  ) %>%
  mutate(TotalEconomicDamage = TotalPropDamage + TotalCropDamage) %>%
  arrange(desc(TotalEconomicDamage))
head(econ_summary, 10)
## # A tibble: 10 × 4
##    EVTYPE            TotalPropDamage TotalCropDamage TotalEconomicDamage
##    <chr>                       <dbl>           <dbl>               <dbl>
##  1 FLOOD                144657709800      5661968450        150319678250
##  2 HURRICANE/TYPHOON     69305840000      2607872800         71913712800
##  3 TORNADO               56937160480       414953110         57352113590
##  4 STORM SURGE           43323536000            5000         43323541000
##  5 HAIL                  15732266720      3025954450         18758221170
##  6 FLASH FLOOD           16140861510      1421317100         17562178610
##  7 DROUGHT                1046106000     13972566000         15018672000
##  8 HURRICANE             11868319010      2741910000         14610229010
##  9 RIVER FLOOD            5118945500      5029459000         10148404500
## 10 ICE STORM              3944927810      5022113500          8967041310

top10_econ <- econ_summary %>%
  slice_max(order_by = TotalEconomicDamage, n = 10)

ggplot(top10_econ, aes(
  x = reorder(EVTYPE, TotalEconomicDamage),
  y = TotalEconomicDamage / 1e9
)) +
  geom_col(fill = "coral") +
  coord_flip() +
  labs(
    x = "Event Type",
    y = "Total Damage (Billion USD)",
    title = "Top 10 Weather Event Types by Economic Damage"
  ) +
  theme_minimal()

Figure 2: Top 10 NOAA Storm Event Types (1950–2011) by total property + crop damage (in billions of USD).

4. Discussion

The table and Figure 1 indicate that tornadoes top the list by a wide margin in combined fatalities and injuries, with 5,633 deaths and 91,346 injuries recorded between 1950 and 2011. Heat accounts for substantial mortality, particularly among elderly and chronically ill populations: EXCESSIVE HEAT ranks second in fatalities (1,903) and HEAT adds another 937. Thunderstorm wind, floods, and lightning each caused more than 6,000 casualties. Flash floods rank only seventh in casualties but third in fatalities (978) among the ten event types listed. These results suggest that emergency planning and public awareness campaigns should highlight tornado sheltering, heat-wave warning systems, and flood preparedness.

Figure 2 shows that floods cause the greatest economic damage, about $150 billion in combined property and crop losses over the study period. Hurricanes follow: HURRICANE/TYPHOON accounts for about $72 billion and the separate HURRICANE category for a further $14.6 billion. Tornadoes ($57 billion) and storm surge ($43 billion) come next, while hail, flash floods, and drought are each responsible for $15 to $19 billion in direct losses. Municipalities in flood-prone and coastal regions should therefore allocate resources to flood and hurricane mitigation efforts: reinforcing infrastructure, updating building codes, and improving early-warning systems.
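
One caveat about both rankings is that EVTYPE is used exactly as recorded, so near-duplicate labels such as TSTM WIND and THUNDERSTORM WIND, or HURRICANE and HURRICANE/TYPHOON, are counted as separate event types. The sketch below shows one way such labels could be merged before aggregating. The alias table and the helper normalize_evtype are hypothetical, and which labels belong together is a judgment call (merging HEAT into EXCESSIVE HEAT in particular is only an assumption). The casualty totals are copied from the table in section 3.1.

```r
# Hypothetical lookup: variant label -> label it is merged into
evtype_alias <- c(
  "TSTM WIND"         = "THUNDERSTORM WIND",
  "HEAT"              = "EXCESSIVE HEAT",
  "HURRICANE/TYPHOON" = "HURRICANE"
)

# Hypothetical helper: trim, upper-case, then apply the alias table
normalize_evtype <- function(evtype) {
  cleaned <- toupper(trimws(evtype))
  hit <- cleaned %in% names(evtype_alias)
  cleaned[hit] <- unname(evtype_alias[cleaned[hit]])
  cleaned
}

# Casualty totals copied from the health table in section 3.1
health <- data.frame(
  EVTYPE = c("TSTM WIND", "THUNDERSTORM WIND", "EXCESSIVE HEAT", "HEAT"),
  TotalCasualties = c(7461, 1621, 8428, 3037),
  stringsAsFactors = FALSE
)

# Merge the variants and re-aggregate
health$EVTYPE <- normalize_evtype(health$EVTYPE)
merged <- aggregate(TotalCasualties ~ EVTYPE, data = health, FUN = sum)
print(merged[order(-merged$TotalCasualties), ])
```

Under these assumed merges, thunderstorm wind would come to 9,082 casualties and the combined heat category to 11,465. Tornadoes would remain first by a wide margin.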