1 Synopsis

Severe weather events can cause substantial harm to communities, including fatalities, injuries, and major economic losses.
This report analyzes the U.S. NOAA Storm Data database (1950–November 2011) to identify the event types associated with the greatest public health and economic impacts.
Public health impact is measured as the total number of fatalities and injuries aggregated by event type (EVTYPE).
Economic impact is measured as total property and crop damages, converted into U.S. dollars using the magnitude exponent fields (e.g., K, M, B).
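To improve comparability, EVTYPE labels are standardized (trimmed and converted to uppercase) to reduce duplicates caused by inconsistent formatting. Synonymous labels are not merged, so pairs such as TSTM WIND and THUNDERSTORM WIND appear as separate categories in the results.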
Results are presented using summary tables and two bar charts to support prioritization of preparedness resources.
A decision table combines both dimensions (health and economy) to help stakeholders identify event types that are critical under different priorities.
All steps are documented with code to ensure reproducibility starting from the raw compressed CSV file.


2 Data Processing

This section describes how the data are loaded from the original compressed file and transformed for analysis. We retain only the variables required for the assignment, standardize event type labels (EVTYPE) for consistent grouping, and convert recorded damages into U.S. dollars using the magnitude exponent fields (e.g., K, M, B).

knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

# Install/load required packages
pkgs <- c("dplyr", "ggplot2", "scales", "knitr")
for (p in pkgs) {
  if (!requireNamespace(p, quietly = TRUE)) {
    install.packages(p, repos = "https://cloud.r-project.org")
  }
}

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
library(scales)
## Warning: package 'scales' was built under R version 4.4.2
library(knitr)
## Warning: package 'knitr' was built under R version 4.4.3
# Loading the raw data (from the original .csv.bz2 file)

file <- "repdata_data_StormData.csv.bz2"
storm <- read.csv(file, stringsAsFactors = FALSE)

dim(storm)
## [1] 902297     37
names(storm)[1:15]
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
# Keep required variables and standardize EVTYPE

data <- storm %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>%
  mutate(EVTYPE = toupper(trimws(EVTYPE)))
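
To illustrate what this standardization does and does not merge, the sketch below applies the same `toupper(trimws())` call to a few labels. The `raw_labels` vector is made up for the example and is not taken from the dataset.

```r
# Hypothetical raw labels; the first three differ only by case or whitespace.
raw_labels <- c("Tornado", " TORNADO", "tornado ", "TSTM WIND", "THUNDERSTORM WIND")

# Same cleaning step as in the report.
clean_labels <- toupper(trimws(raw_labels))

# Formatting variants collapse to one label; synonyms stay distinct.
unique(clean_labels)
```

Merging synonyms would need an explicit recoding step, which this report does not attempt.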

# Convert damage exponents into dollar multipliers
# (letters H/K/M/B and digits 0-9 are read as powers of ten, a blank as 1;
#  any other code maps to NA and is counted in the diagnostics below)

exp_map <- function(x) {
  x <- toupper(trimws(x))
  dplyr::case_when(
    x == "H" ~ 1e2,
    x == "K" ~ 1e3,
    x == "M" ~ 1e6,
    x == "B" ~ 1e9,
    x == ""  ~ 1,
    x == "0" ~ 1,
    x == "1" ~ 10,
    x == "2" ~ 1e2,
    x == "3" ~ 1e3,
    x == "4" ~ 1e4,
    x == "5" ~ 1e5,
    x == "6" ~ 1e6,
    x == "7" ~ 1e7,
    x == "8" ~ 1e8,
    x == "9" ~ 1e9,
    TRUE ~ NA_real_
  )
}
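
As a worked example of the conversion arithmetic, the sketch below uses a named lookup vector rather than `case_when()`. This is an alternative formulation for illustration only; `mult`, `dmg`, and `code` are hypothetical names and values, and only the letter codes are covered.

```r
# Illustrative subset of multipliers as a named lookup vector.
mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical records: 25 K, 2.5 M, 1.2 B, and an unrecognized code "?".
dmg  <- c(25, 2.5, 1.2, 10)
code <- c("K", "m", "B", "?")

# Unknown codes index to NA, mirroring the NA_real_ fallback in exp_map().
dollars <- dmg * unname(mult[toupper(trimws(code))])
dollars
```

The first three records come out as 25,000, 2,500,000, and 1,200,000,000 dollars. The last stays missing because its code is not in the lookup.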

data <- data %>%
  mutate(
    prop_mult   = exp_map(PROPDMGEXP),
    crop_mult   = exp_map(CROPDMGEXP),
    prop_damage = PROPDMG * prop_mult,
    crop_damage = CROPDMG * crop_mult,
    # NA whenever either component has an unrecognized exponent code
    econ_damage = prop_damage + crop_damage
  )

# Diagnostics for unexpected exponent codes

sum(is.na(data$prop_mult))
## [1] 14
sum(is.na(data$crop_mult))
## [1] 7
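
The counts above say how many rows have an unmapped exponent, but not which codes are involved or how the missing values affect the totals. The sketch below shows one way to check both on a toy data frame. All values in `toy` are invented, and `econ_lenient` is a hypothetical alternative, not the rule used in this report.

```r
# Hypothetical mini-dataset; "?" and "-" stand in for unrecognized codes.
toy <- data.frame(
  PROPDMG     = c(10, 5, 0, 3),
  PROPDMGEXP  = c("K", "?", "-", "M"),
  prop_mult   = c(1e3, NA, NA, 1e6),
  crop_damage = c(0, 2000, 0, 500)
)

# Which raw codes produced a missing multiplier, and how often?
table(toy$PROPDMGEXP[is.na(toy$prop_mult)])

# Strict rule (as in the report): NA propagates through the sum.
toy$prop_damage <- toy$PROPDMG * toy$prop_mult
toy$econ_strict <- toy$prop_damage + toy$crop_damage

# Lenient rule: treat an unknown property figure as 0 and keep the crop part.
toy$econ_lenient <- ifelse(is.na(toy$prop_damage), 0, toy$prop_damage) + toy$crop_damage

sum(toy$econ_strict,  na.rm = TRUE)
sum(toy$econ_lenient, na.rm = TRUE)
```

Under the strict rule, the crop damage on a row with an unknown property exponent is dropped from the combined total. With only a handful of affected rows in the real data, the difference is likely small.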
# Results


# Health impacts

health <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    fatalities   = sum(FATALITIES, na.rm = TRUE),
    injuries     = sum(INJURIES, na.rm = TRUE),
    total_health = fatalities + injuries,
    .groups = "drop"
  ) %>%
  arrange(desc(total_health))

top_health <- health %>% slice_head(n = 10)

kable(top_health,
      caption = "Table 1. Top 10 Event Types by Public Health Impact (Fatalities + Injuries)")
Table 1. Top 10 Event Types by Public Health Impact (Fatalities + Injuries)

| EVTYPE            | fatalities | injuries | total_health |
|:------------------|-----------:|---------:|-------------:|
| TORNADO           |       5633 |    91346 |        96979 |
| EXCESSIVE HEAT    |       1903 |     6525 |         8428 |
| TSTM WIND         |        504 |     6957 |         7461 |
| FLOOD             |        470 |     6789 |         7259 |
| LIGHTNING         |        816 |     5230 |         6046 |
| HEAT              |        937 |     2100 |         3037 |
| FLASH FLOOD       |        978 |     1777 |         2755 |
| ICE STORM         |         89 |     1975 |         2064 |
| THUNDERSTORM WIND |        133 |     1488 |         1621 |
| WINTER STORM      |        206 |     1321 |         1527 |

# Figure 1: Health impacts (top 10)

ggplot(top_health, aes(x = reorder(EVTYPE, total_health), y = total_health)) +
  geom_col() +
  coord_flip() +
  labs(
    title = "Figure 1. Event Types Most Harmful to Population Health",
    subtitle = "Top 10 EVTYPE categories by total fatalities + injuries (1950–Nov 2011).",
    x = "Event type (EVTYPE)",
    y = "Total fatalities + injuries"
  )

# Economic impacts

econ <- data %>%
  group_by(EVTYPE) %>%
  summarise(
    property_damage = sum(prop_damage, na.rm = TRUE),
    crop_damage     = sum(crop_damage, na.rm = TRUE),
    total_econ      = sum(econ_damage, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  arrange(desc(total_econ))

top_econ <- econ %>% slice_head(n = 10)

kable(top_econ,
      caption = "Table 2. Top 10 Event Types by Total Economic Damage (Property + Crops, USD)")
Table 2. Top 10 Event Types by Total Economic Damage (Property + Crops, USD)

| EVTYPE            | property_damage | crop_damage |   total_econ |
|:------------------|----------------:|------------:|-------------:|
| FLOOD             |    144657709807 |  5661968450 | 150319678257 |
| HURRICANE/TYPHOON |     69305840000 |  2607872800 |  71913712800 |
| TORNADO           |     56947380617 |   414953270 |  57362333887 |
| STORM SURGE       |     43323536000 |        5000 |  43323541000 |
| HAIL              |     15735267513 |  3025954473 |  18761221986 |
| FLASH FLOOD       |     16822723979 |  1421317100 |  18244041079 |
| DROUGHT           |      1046106000 | 13972566000 |  15018672000 |
| HURRICANE         |     11868319010 |  2741910000 |  14610229010 |
| RIVER FLOOD       |      5118945500 |  5029459000 |  10148404500 |
| ICE STORM         |      3944927860 |  5022113500 |   8967041360 |

# Figure 2: Economic impacts (top 10)

ggplot(top_econ, aes(x = reorder(EVTYPE, total_econ), y = total_econ)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = label_dollar(scale = 1e-9, suffix = "B")) +
  labs(
    title = "Figure 2. Event Types with the Greatest Economic Consequences",
    subtitle = "Top 10 EVTYPE categories by total property + crop damage (1950–Nov 2011).",
    x = "Event type (EVTYPE)",
    y = "Total economic damage (Billions of USD)"
  )

# Data Analysis


# Helper formatters

fmt_int <- function(x) format(round(x), big.mark = ",")
fmt_usd <- function(x) paste0("$", format(round(x), big.mark = ","))
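
One side effect of `format()` is that it pads all elements of a vector to a common width, which is why values such as `$ 57,362,333,886` in Table 3 carry a space after the dollar sign. The sketch below shows a variant that avoids the padding. `fmt_usd_trim` is a hypothetical name, and the input values are chosen only for illustration.

```r
# Variant of fmt_usd (hypothetical name) that avoids width padding.
fmt_usd_trim <- function(x) {
  paste0("$", format(round(x), big.mark = ",", trim = TRUE, scientific = FALSE))
}

vals <- c(5000, 57362333886)

# Original style: elements are padded to a common width.
paste0("$", format(round(vals), big.mark = ","))

# Trimmed style: no leading spaces after the dollar sign.
fmt_usd_trim(vals)
```

Whether trimming is preferable depends on the output: padding helps align columns in a monospaced console, while trimmed strings read better inside sentences and rendered tables.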

# Top 3 health and economic

h3 <- health %>% slice_head(n = 3)
e3 <- econ   %>% slice_head(n = 3)

# Paragraphs for decision-makers

cat(paste0(
  "Public health impacts are dominated by ", h3$EVTYPE[1],
  ", followed by ", h3$EVTYPE[2], " and ", h3$EVTYPE[3], ". ",
  "Across 1950–Nov 2011, ", h3$EVTYPE[1], " accounts for ",
  fmt_int(h3$fatalities[1]), " fatalities and ", fmt_int(h3$injuries[1]),
  " injuries (total ", fmt_int(h3$total_health[1]), "). ",
  "The next two event types total ", fmt_int(h3$total_health[2]),
  " and ", fmt_int(h3$total_health[3]), " affected people, respectively.\n\n"
))
## Public health impacts are dominated by TORNADO, followed by EXCESSIVE HEAT and TSTM WIND. Across 1950–Nov 2011, TORNADO accounts for 5,633 fatalities and 91,346 injuries (total 96,979). The next two event types total 8,428 and 7,461 affected people, respectively.
cat(paste0(
  "Economic losses are dominated by ", e3$EVTYPE[1],
  ", followed by ", e3$EVTYPE[2], " and ", e3$EVTYPE[3], ". ",
  "Total damage (property + crops) is estimated at ", fmt_usd(e3$total_econ[1]),
  " for ", e3$EVTYPE[1], " (property: ", fmt_usd(e3$property_damage[1]),
  "; crops: ", fmt_usd(e3$crop_damage[1]), "). ",
  "The next two event types are associated with ", fmt_usd(e3$total_econ[2]),
  " and ", fmt_usd(e3$total_econ[3]), " in total losses, respectively.\n\n"
))
## Economic losses are dominated by FLOOD, followed by HURRICANE/TYPHOON and TORNADO. Total damage (property + crops) is estimated at $150,319,678,257 for FLOOD (property: $144,657,709,807; crops: $5,661,968,450). The next two event types are associated with $71,913,712,800 and $57,362,333,886 in total losses, respectively.
cat(
  "Prioritization should recognize that the leading event types for human harm and for financial loss are not always identical; preparedness planning should balance life-safety interventions with mitigation strategies that reduce high-cost damages.\n\n"
)
## Prioritization should recognize that the leading event types for human harm and for financial loss are not always identical; preparedness planning should balance life-safety interventions with mitigation strategies that reduce high-cost damages.
# Decision table (union of top-10 lists)

candidate_evtypes <- union(top_health$EVTYPE, top_econ$EVTYPE)

health_ranked <- health %>% mutate(health_rank = row_number())
econ_ranked   <- econ   %>% mutate(econ_rank   = row_number())

rng01 <- function(x) {
  if (all(is.na(x))) return(rep(NA_real_, length(x)))
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}
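
Min-max scaling divides by the range, so a vector whose values are all equal would yield `NaN` (0/0). That case does not arise with the data used here, but a guarded variant is easy to sketch. `rng01_safe` is a hypothetical name, and mapping a constant vector to 0 is an assumption rather than a rule taken from this report.

```r
# Guarded min-max scaling (hypothetical variant of rng01).
rng01_safe <- function(x) {
  if (all(is.na(x))) return(rep(NA_real_, length(x)))
  lo <- min(x, na.rm = TRUE)
  hi <- max(x, na.rm = TRUE)
  # Constant input: return 0 for observed values instead of dividing by zero.
  if (hi == lo) return(ifelse(is.na(x), NA_real_, 0))
  (x - lo) / (hi - lo)
}

rng01_safe(c(4, 107, 96979))  # smallest value maps to 0, largest to 1
rng01_safe(c(5, 5, NA))       # constant input: 0 for observed values, NA kept
```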

decision <- tibble(EVTYPE = candidate_evtypes) %>%
  left_join(health_ranked %>% select(EVTYPE, health_rank, fatalities, injuries, total_health), by = "EVTYPE") %>%
  left_join(econ_ranked %>% select(EVTYPE, econ_rank, property_damage, crop_damage, total_econ), by = "EVTYPE") %>%
  mutate(
    health_score_0_100 = 100 * rng01(total_health),
    econ_score_0_100   = 100 * rng01(total_econ),
    health_score_0_100 = ifelse(is.na(health_score_0_100), 0, health_score_0_100),
    econ_score_0_100   = ifelse(is.na(econ_score_0_100), 0, econ_score_0_100)
  )

# Combined score weights (adjust if needed)

w_health <- 0.5
w_econ   <- 0.5

decision <- decision %>%
  mutate(
    combined_priority_score = w_health * health_score_0_100 + w_econ * econ_score_0_100,
    total_econ_fmt = fmt_usd(total_econ),
    property_fmt   = fmt_usd(property_damage),
    crop_fmt       = fmt_usd(crop_damage)
  ) %>%
  arrange(desc(combined_priority_score)) %>%
  select(
    EVTYPE,
    health_rank, total_health, fatalities, injuries, health_score_0_100,
    econ_rank, total_econ_fmt, property_fmt, crop_fmt, econ_score_0_100,
    combined_priority_score
  )

kable(decision, digits = 1,
      caption = "Table 3. Decision Table Comparing Health and Economic Impacts (Union of Top-10 Lists; Scores normalized to 0–100)")
Table 3. Decision Table Comparing Health and Economic Impacts (Union of Top-10 Lists; Scores normalized to 0–100)

| EVTYPE | health_rank | total_health | fatalities | injuries | health_score_0_100 | econ_rank | total_econ_fmt | property_fmt | crop_fmt | econ_score_0_100 | combined_priority_score |
|:--|--:|--:|--:|--:|--:|--:|:--|:--|:--|--:|--:|
| TORNADO | 1 | 96979 | 5633 | 91346 | 100.0 | 3 | $ 57,362,333,886 | $ 56,947,380,616 | $ 414,953,270 | 38.0 | 69.0 |
| FLOOD | 4 | 7259 | 470 | 6789 | 7.5 | 1 | $150,319,678,257 | $144,657,709,807 | $ 5,661,968,450 | 100.0 | 53.7 |
| HURRICANE/TYPHOON | 13 | 1339 | 64 | 1275 | 1.4 | 2 | $ 71,913,712,800 | $ 69,305,840,000 | $ 2,607,872,800 | 47.7 | 24.5 |
| STORM SURGE | 51 | 51 | 13 | 38 | 0.0 | 4 | $ 43,323,541,000 | $ 43,323,536,000 | $ 5,000 | 28.6 | 14.3 |
| FLASH FLOOD | 7 | 2755 | 978 | 1777 | 2.8 | 6 | $ 18,244,041,078 | $ 16,822,723,978 | $ 1,421,317,100 | 11.9 | 7.4 |
| HAIL | 12 | 1376 | 15 | 1361 | 1.4 | 5 | $ 18,761,221,986 | $ 15,735,267,513 | $ 3,025,954,473 | 12.2 | 6.8 |
| TSTM WIND | 3 | 7461 | 504 | 6957 | 7.7 | 15 | $ 5,047,065,845 | $ 4,493,058,495 | $ 554,007,350 | 3.1 | 5.4 |
| DROUGHT | 115 | 4 | 0 | 4 | 0.0 | 7 | $ 15,018,672,000 | $ 1,046,106,000 | $ 13,972,566,000 | 9.7 | 4.9 |
| HURRICANE | 41 | 107 | 61 | 46 | 0.1 | 8 | $ 14,610,229,010 | $ 11,868,319,010 | $ 2,741,910,000 | 9.5 | 4.8 |
| EXCESSIVE HEAT | 2 | 8428 | 1903 | 6525 | 8.7 | 33 | $ 500,155,700 | $ 7,753,700 | $ 492,402,000 | 0.1 | 4.4 |
| ICE STORM | 8 | 2064 | 89 | 1975 | 2.1 | 10 | $ 8,967,041,360 | $ 3,944,927,860 | $ 5,022,113,500 | 5.7 | 3.9 |
| LIGHTNING | 5 | 6046 | 816 | 5230 | 6.2 | 28 | $ 942,471,520 | $ 930,379,430 | $ 12,092,090 | 0.4 | 3.3 |
| RIVER FLOOD | 121 | 4 | 2 | 2 | 0.0 | 9 | $ 10,148,404,500 | $ 5,118,945,500 | $ 5,029,459,000 | 6.5 | 3.3 |
| WINTER STORM | 10 | 1527 | 206 | 1321 | 1.6 | 12 | $ 6,715,441,251 | $ 6,688,497,251 | $ 26,944,000 | 4.2 | 2.9 |
| THUNDERSTORM WIND | 9 | 1621 | 133 | 1488 | 1.7 | 17 | $ 3,897,965,522 | $ 3,483,122,472 | $ 414,843,050 | 2.3 | 2.0 |
| HEAT | 6 | 3037 | 937 | 2100 | 3.1 | 35 | $ 403,258,500 | $ 1,797,000 | $ 401,461,500 | 0.0 | 1.6 |
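
To make the scoring concrete, the sketch below reproduces TORNADO's economic and combined scores by hand from the totals in the tables. It then checks how the ordering of the top two rows would change under a different weighting. The 0.2/0.8 weights are hypothetical and serve only as a sensitivity check.

```r
# Values copied from the tables above (USD).
tornado_econ <- 57362333886
min_econ     <- 403258500       # HEAT, the smallest total among the candidates
max_econ     <- 150319678257    # FLOOD, the largest total

# Min-max scaling to 0-100, as in rng01().
econ_score   <- 100 * (tornado_econ - min_econ) / (max_econ - min_econ)
health_score <- 100             # TORNADO has the largest health total

# Combined score under the report's equal weights.
combined <- 0.5 * health_score + 0.5 * econ_score
round(c(econ_score = econ_score, combined = combined), 1)

# Sensitivity check with hypothetical weights favouring economic loss.
flood_combined   <- 0.2 * 7.5 + 0.8 * 100
tornado_combined <- 0.2 * 100 + 0.8 * econ_score
flood_combined > tornado_combined
```

The hand calculation gives 38.0 and 69.0, matching the TORNADO row. With the economic weight raised to 0.8, FLOOD overtakes TORNADO, so the ranking at the top of the table depends on the chosen weights.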

3 Reproducibility Note

All figures, tables, and numerical summaries in this report are generated from the raw dataset file (repdata_data_StormData.csv.bz2) within this document. Re-knitting the document should reproduce the same results, provided the raw data file is available in the working directory.
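
A re-run can be made easier to verify by checking the raw file up front and recording the session details. The sketch below is one possible approach, not part of the original analysis. `check_raw_file` is a hypothetical helper, and no reference checksum is given here because none appears in this report.

```r
# Hypothetical helper: stop early with a clear message if the raw file is missing.
check_raw_file <- function(path) {
  if (!file.exists(path)) {
    stop("Raw data file not found: ", path)
  }
  # The MD5 checksum lets two runs confirm they started from the same bytes.
  unname(tools::md5sum(path))
}

# Example usage (commented out because it needs the actual file):
# check_raw_file("repdata_data_StormData.csv.bz2")

# Recording the session makes package-version differences visible.
sessionInfo()
```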